15 May 2023

AWS

AWS Certified SysOps Administrator - Associate (SOA-C02)

Summary of concepts for AWS SysOps Administrator Certification.

CloudWatch

AWS CloudWatch Metrics

CloudWatch provides metrics for every service in AWS

Metric is a variable to monitor (CPUUtilization, NetworkIn…)
Metrics belong to namespaces
Dimension is an attribute of a metric (instance id, environment, etc…).
Up to 30 dimensions per metric
Metrics have timestamps
Can create CloudWatch dashboards of metrics

AWS Provided metrics (AWS pushes them):

Basic Monitoring (default): metrics are collected at a 5 minute interval
Detailed Monitoring (paid): metrics are collected at a 1 minute interval
Includes CPU, Network, Disk and Status Check Metrics

Custom metric (yours to push):

Standard Resolution: 1 minute resolution
High Resolution: all the way to 1 second resolution
Include RAM, application level metrics
Make sure the IAM permissions on the EC2 instance role are correct!

RAM is NOT included in the AWS EC2 metrics

CloudWatch Custom Metrics

You can retrieve custom metrics from your applications or services using the StatsD and collectd protocols through the CloudWatch agent. StatsD is supported on both Linux servers and servers running Windows Server. collectd is supported only on Linux.

Possibility to define and send your own custom metrics to CloudWatch
Example: memory (RAM) usage, disk space, number of logged in users …
Use API call PutMetricData
Ability to use dimensions (attributes) to segment metrics
- Instance.id
- Environment.name
Metric resolution (StorageResolution API parameter - two possible values):
- Standard: 1 minute (60 seconds)
- High Resolution: 1/5/10/30 second(s) - Higher cost
Important 👀 EXAM: Accepts metric data points two weeks in the past and two hours in the future (make sure to configure your EC2 instance time correctly)
You can use AWS CLI or API to upload the data metrics to CloudWatch.

aws cloudwatch put-metric-data --metric-name PageViewCount --namespace MyService --value 2 --timestamp 2023-01-14T08:00:00.000Z

High-resolution custom metrics: your applications can publish metrics to CloudWatch with 1-second resolution. High-resolution alarms let you react and take action faster, and they support the same actions as standard 1-minute alarms.
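The timestamp window from the exam note above can be checked before a data point is sent. This is a minimal sketch: the function name and the sample dates are made up for illustration, and only the two-week and two-hour limits come from the notes.

```python
from datetime import datetime, timedelta, timezone

# Window quoted in the notes: two weeks in the past, two hours in the future.
MAX_PAST = timedelta(weeks=2)
MAX_FUTURE = timedelta(hours=2)


def is_accepted_timestamp(ts: datetime, now: datetime) -> bool:
    """Return True if ts falls inside the window CloudWatch accepts."""
    return now - MAX_PAST <= ts <= now + MAX_FUTURE


# Hypothetical "current time" used only for the demonstration.
now = datetime(2023, 1, 14, 8, 0, tzinfo=timezone.utc)
print(is_accepted_timestamp(now - timedelta(days=3), now))   # three days old
print(is_accepted_timestamp(now - timedelta(days=20), now))  # older than two weeks
print(is_accepted_timestamp(now + timedelta(hours=3), now))  # more than two hours ahead
```

A drifting clock on the EC2 instance pushes timestamps toward one of the two edges, which is why the note insists on correct instance time.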

CloudWatch Dashboards

Great way to setup custom dashboards for quick access to key metrics and alarms
Dashboards are global
Dashboards can include graphs from different AWS accounts and regions - 👀 EXAM
You can change the time zone & time range of the dashboards
You can setup automatic refresh (10s, 1m, 2m, 5m, 15m)
Dashboards can be shared with people who don’t have an AWS account (public, email address, 3rd party SSO provider through Amazon Cognito)

CloudWatch Logs - Sources

SDK, CloudWatch Logs Agent, CloudWatch Unified Agent
Elastic Beanstalk: collection of logs from application
ECS: collection from containers
AWS Lambda: collection from function logs
VPC Flow Logs: VPC specific logs
API Gateway
CloudTrail based on filter
Route53: Log DNS queries

CloudWatch Logs Subscriptions

Get real-time log events from CloudWatch Logs for processing and analysis
Send to Kinesis Data Streams, Kinesis Data Firehose, or Lambda
Subscription Filter - filter which log events are delivered to your destination
Cross-Account Subscription - send log events to resources in a different AWS account (KDS, KDF)

Alarms

CloudWatch alarms allow you to monitor metrics and trigger actions based on defined thresholds. For example, you can create a CloudWatch alarm that monitors the CPU utilization metric of an EC2 instance. When the CPU utilization reaches 100%, the alarm is triggered, and you can configure actions such as sending notifications or executing automated actions to address an unresponsive instance.

Alarm Targets - 👀 EXAM

EC2 - Stop, Terminate, Reboot, or Recover an EC2 Instance
EC2 Auto Scaling - Trigger Auto Scaling Action, scaling up or down.
SNS - Send notification to SNS (from which you can do pretty much anything)
Systems Manager - Create an OpsItem
Composite Alarms monitor the states of multiple other alarms

EC2 Instance Recovery

StatusCheckFailed_System

Status Check:
- Instance status = check the EC2 VM
- System status = check the underlying hardware

Recovery: Same Private, Public, Elastic IP, metadata, placement group

👀 Alarms can be created based on CloudWatch Logs Metrics Filters

👀 Test an alarm using aws cloudwatch set-alarm-state

aws cloudwatch set-alarm-state --alarm-name "TerminateInHighCPU" --state-value ALARM --state-reason "testing purposes"

CloudWatch Synthetics

CloudWatch Synthetics canaries are automated/configurable scripts that monitor the availability and performance of applications, endpoints, and APIs. They are designed to simulate user interactions with an application and provide insights into its behavior.

Canaries are created using scripts written in Node.js or Python and are scheduled to run at regular intervals. These scripts perform tasks such as navigating through a website, clicking on specific elements, submitting forms, and validating responses. By executing these predefined actions, canaries can monitor the functionality, responsiveness, and performance of an application or API.

CloudWatch Synthetics canaries collect data on metrics like response time, latency, availability, and success rates. They can also be configured to generate alarms when certain conditions are met, allowing proactive identification and remediation of issues.

Reference

https://docs.aws.amazon.com/awssupport/latest/user/trusted-advisor-check-reference.html

Amazon EventBridge (formerly CloudWatch Events)

Schedule: Cron jobs (scheduled scripts) - Schedule Every hour -> Trigger script on Lambda function
Event Pattern: Event rules to react to a service doing something - IAM Root User Sign in Event -> SNS Topic with Email Notification
Trigger Lambda functions, send SQS/SNS messages…
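An event pattern is a JSON document that is compared with each incoming event. The sketch below shows one possible pattern for the root sign-in example and a deliberately simplified matcher (exact-value lists and nested objects only). The field names follow the CloudTrail console sign-in event as I understand it, so verify them against a real event before relying on them. The `matches` helper is hypothetical and is not part of any AWS SDK.

```python
from typing import Any

# Assumed pattern for "IAM Root User Sign in"; check field names against a real event.
ROOT_SIGN_IN_PATTERN = {
    "detail-type": ["AWS Console Sign In via CloudTrail"],
    "detail": {"userIdentity": {"type": ["Root"]}},
}


def matches(pattern: dict[str, Any], event: dict[str, Any]) -> bool:
    """Simplified matcher: every pattern key must exist in the event.

    A list in the pattern means "the event value must be one of these";
    a dict in the pattern means "recurse into the nested object".
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Trimmed, made-up event used only for the demonstration.
sample_event = {
    "detail-type": "AWS Console Sign In via CloudTrail",
    "detail": {"userIdentity": {"type": "Root"}, "eventName": "ConsoleLogin"},
}
print(matches(ROOT_SIGN_IN_PATTERN, sample_event))
```

Real EventBridge patterns support more operators (prefix, numeric, anything-but, and so on); this sketch only covers the exact-match case used in the example above.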

EventBridge Rules

Service Quotas CloudWatch Alarms

Notify you when you’re close to a service quota value threshold
Create CloudWatch Alarms on the Service Quotas console
Example: Lambda concurrent executions
Helps you know if you need to request a quota increase or shut down resources before the limit is reached

Alternative: Trusted Advisor CW Alarms

Limited number of Service Limits checks in Trusted Advisor (~50)
Trusted Advisor publishes its check results to CloudWatch

`👀` For each production EC2 instance, create an `Amazon CloudWatch alarm` for `Status Check Failed: System`. Set the alarm action to `recover the EC2 instance`. Configure the alarm notification to be published to an Amazon Simple Notification Service (Amazon SNS) topic.

Explanation: By creating a CloudWatch alarm for Status Check Failed: System, you can automate the recovery task of EC2 instances (stop, terminate, reboot, or recover your Amazon EC2 instances). When the system health check fails for an EC2 instance, the alarm will be triggered and perform the configured action to recover the instance. This eliminates the need for manual intervention. Additionally, configuring the alarm to publish notifications to an SNS topic allows you to receive notifications whenever a system health check fails.

Status Check

Automated checks to identify hardware and software issues.

System Status Checks

Monitors problems with AWS systems (software/hardware issues on the physical host, loss of system power, …)
Check Personal Health Dashboard for any scheduled critical maintenance by AWS to your instance’s host
Resolution: stop and start the instance (instance migrated to a new host)
- Either wait for AWS to fix the host, OR
- Move the EC2 instance to a new host = STOP & START the instance (if EBS backed)

Instance Status Checks

Monitors software/network configuration of your instance (invalid network configuration, exhausted memory, …)
Resolution: reboot the instance or change instance configuration.

Status Checks - CW Metrics & Recovery - `👀 EXAM`

CloudWatch Metrics (1 minute interval)
- StatusCheckFailed_System
- StatusCheckFailed_Instance
- StatusCheckFailed (for both)
Option 1: CloudWatch Alarm
- Recover EC2 instance with the same private/public IP, EIP, metadata, and Placement Group
- Send notifications using SNS trigger
Option 2: Auto Scaling Group
- Set min/max/desired to 1 to recover an instance, but it won't keep the same private and Elastic IP.
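Option 1 can be sketched as the set of parameters passed to the PutMetricAlarm API. This is an illustration under assumptions: the alarm name scheme, the two evaluation periods, and the example ARNs are placeholders, while the metric name, namespace, and the recover action come from the notes above.

```python
def recovery_alarm_params(instance_id: str, region: str, sns_topic_arn: str) -> dict:
    """Build keyword arguments for a PutMetricAlarm call that recovers an instance."""
    return {
        "AlarmName": f"recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,                # status check metrics arrive every minute
        "EvaluationPeriods": 2,      # assumed: two consecutive failed minutes
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [
            f"arn:aws:automate:{region}:ec2:recover",  # EC2 recover action
            sns_topic_arn,                             # notification target
        ],
    }


# Placeholder identifiers, not real resources.
params = recovery_alarm_params(
    "i-1234567890abcdef0",
    "us-east-1",
    "arn:aws:sns:us-east-1:111122223333:ops-alerts",
)
print(params["AlarmActions"])
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`.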

Determine which instances use the most bandwidth

NetworkIn and NetworkOut

Identify the processing power required

👀 CPUUtilization specifies the percentage of allocated EC2 compute units that are currently in use on the instance. This metric identifies the processing power required to run an application on a selected instance. This metric is expressed in Percent.

Number of users.

👀 ActiveConnectionCount This metric represents the total number of concurrent TCP connections active from clients to the load balancer and from the load balancer to targets.

RAMUtilization is NOT available as an EC2 metric

To track RAM utilization, publish it as a custom metric. You can publish your own metrics to CloudWatch using the AWS CLI or an API. You can view statistical graphs of your published metrics with the AWS Management Console. Metrics produced by AWS services are standard resolution by default.

5xx server errors

To monitor the number of 500 Internal Error responses that you’re getting, you can enable Amazon CloudWatch metrics. Amazon S3 CloudWatch request metrics include a metric for 5xx server errors.

4xx

You can set an alarm to notify operators when the 404 filter metric exceeds a threshold. 👀 HTTPCode_ELB_4XX_Count metric stands for the number of HTTP 4XX client error codes that originate from the load balancer. This count does not include response codes generated by targets.

Events

You can run CloudWatch Events rules according to a schedule.

EBS Snapshots

It is possible to create an automated snapshot of an existing Amazon Elastic Block Store (Amazon EBS) volume on a schedule. You can choose a fixed rate to create a snapshot every few minutes or use a cron expression to specify that the snapshot is made at a specific time of day.

Snapshots are incremental backups, which means that only the blocks on the device that have changed after your most recent snapshot are saved. This minimizes the time required to create the snapshot and saves on storage costs by not duplicating data. Each snapshot contains all of the information that is needed to restore your data (from the moment when the snapshot was taken) to a new EBS volume.

Reference: Schedule Automated Amazon EBS Snapshots Using CloudWatch Events

Filters - QUESTION

You can create a count of 404 errors and exclude other 4xx errors with a filter pattern on 404 errors.
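As a rough illustration of what such a filter counts, the sketch below applies the same idea locally to access-log lines. It assumes a space-delimited, Apache-style log format. The `FILTER_PATTERN` string shows what the corresponding CloudWatch Logs pattern could look like, where the field names are arbitrary labels.

```python
import re

# Possible metric filter pattern for space-delimited logs (labels are arbitrary).
FILTER_PATTERN = "[ip, id, user, timestamp, request, status_code=404, size]"

# Local stand-in for the same seven fields of an Apache-style access log line.
LOG_LINE = re.compile(r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)$')


def count_404(lines: list[str]) -> int:
    """Count lines whose status field is exactly 404, ignoring other 4xx codes."""
    count = 0
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group(6) == "404":
            count += 1
    return count


# Made-up sample lines.
sample = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '127.0.0.1 - frank [10/Oct/2000:13:55:37 -0700] "GET /missing HTTP/1.0" 404 209',
    '127.0.0.1 - frank [10/Oct/2000:13:55:38 -0700] "GET /admin HTTP/1.0" 403 199',
]
print(count_404(sample))
```

In CloudWatch the count becomes a metric, so an alarm can fire when it crosses a threshold.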

Agents

If your AMI already contains the CloudWatch agent, it is present on every EC2 instance launched by your EC2 Auto Scaling group. With the stock Amazon Linux AMI, you need to install it yourself (AWS recommends installing it via yum).

Install Agents to track the state of each of the instances

You must attach the CloudWatchAgentServerRole IAM role to the EC2 instance to be able to run the CloudWatch agent on the instance. This role enables the CloudWatch agent to perform actions on the instance.

Publish custom metrics to CloudWatch.

You can publish your own metrics to CloudWatch using the AWS CLI or an API. You can view statistical graphs of your published metrics with the AWS Management Console. CloudWatch stores data about a metric as a series of data points. Each data point has an associated time stamp. You can even publish an aggregated set of data points called a statistic set.

The put-metric-data command publishes metric data to Amazon CloudWatch, which associates it with the specified metric. If the specified metric does not exist, CloudWatch creates it; the new metric can take up to fifteen minutes to appear in calls to ListMetrics.
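A statistic set, as mentioned above, replaces many individual data points with four aggregated values. A minimal sketch of that aggregation, using the field names of the PutMetricData StatisticValues structure (the helper name and the sample values are made up):

```python
def statistic_set(values: list[float]) -> dict[str, float]:
    """Aggregate raw samples into the four fields of a statistic set."""
    if not values:
        raise ValueError("need at least one sample")
    return {
        "SampleCount": float(len(values)),
        "Sum": float(sum(values)),
        "Minimum": float(min(values)),
        "Maximum": float(max(values)),
    }


# Hypothetical page-view counts collected during one minute.
print(statistic_set([2, 5, 3, 8]))
```

Publishing one statistic set per period instead of every sample reduces the number of API calls while still allowing CloudWatch to compute averages.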

Collect process metrics with the `procstat` plugin

The procstat plugin enables you to collect metrics from individual processes. It is supported on Linux servers and on servers running Windows Server 2012 or later.

Dashboard Body Structure and Syntax - EXAM

A DashboardBody is a string in JSON format. It can include an array of between 0 and 500 widget objects, as well as a few other parameters. The dashboard must include a widgets array, but that array can be empty.

When deploying resources using AWS CloudFormation, the goal is often to define as much of the desired infrastructure as possible directly within the template. This is achieved by taking the JSON representation of the prototype dashboard and embedding it directly within the CloudFormation template using the DashboardBody property.

Resources:
  MyCloudWatchDashboard:
    Type: AWS::CloudWatch::Dashboard
    Properties:
      DashboardName: MyDashboard
      DashboardBody: |
        {
          "widgets": [
            {
              "type": "text",
              "x": 0,
              "y": 0,
              "width": 12,
              "height": 1,
              "properties": {
                "markdown": "### My Dashboard"
              }
            },
            {
              "type": "metric",
              "x": 0,
              "y": 1,
              "width": 12,
              "height": 6,
              "properties": {
                "metrics": [["AWS/EC2", "CPUUtilization", "InstanceId", "i-1234567890abcdef0"]],
                "title": "EC2 CPU Utilization",
                "period": 300,
                "stat": "Average",
                "region": "us-east-1",
                "yAxis": {"left": {"min": 0, "max": 100}}
              }
            }
          ]
        }
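The same DashboardBody string can also be generated instead of written by hand. The sketch below assumes the structure shown in the template above; the helper names are hypothetical, and the 500-widget limit is the one quoted in these notes.

```python
import json

MAX_WIDGETS = 500  # limit quoted in the notes above


def cpu_widget(instance_id: str, region: str, y: int = 0) -> dict:
    """Build one metric widget mirroring the template above."""
    return {
        "type": "metric",
        "x": 0,
        "y": y,
        "width": 12,
        "height": 6,
        "properties": {
            "metrics": [["AWS/EC2", "CPUUtilization", "InstanceId", instance_id]],
            "title": "EC2 CPU Utilization",
            "period": 300,
            "stat": "Average",
            "region": region,
        },
    }


def build_dashboard_body(widgets: list[dict]) -> str:
    """Serialize widgets into a DashboardBody JSON string."""
    if len(widgets) > MAX_WIDGETS:
        raise ValueError(f"a dashboard holds at most {MAX_WIDGETS} widgets")
    # The widgets array is mandatory but may be empty.
    return json.dumps({"widgets": widgets})


body = build_dashboard_body([cpu_widget("i-1234567890abcdef0", "us-east-1")])
print(body)
```

The resulting string is what the DashboardBody property (or the PutDashboard API) expects.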

CloudTrail

Provides governance, compliance and audit for your AWS Account

CloudTrail is enabled by default!
Get a history of events / API calls made within your AWS Account by:
- Console
- SDK
- CLI
- AWS Services
Can put logs from CloudTrail into CloudWatch Logs or S3
A trail can be applied to All Regions (default) or a single Region
If a resource is deleted in AWS, investigate CloudTrail first!

CloudTrail Insights

👀 Enable CloudTrail Insights to detect unusual activity in your account:
- Inaccurate resource provisioning
- Hitting service limits
- Bursts of AWS IAM actions
- Gaps in periodic maintenance activity
CloudTrail Insights analyzes normal management events to create a baseline
And then continuously analyzes write events to detect unusual patterns.
- Anomalies appear in the CloudTrail console
- Event is sent to Amazon S3
- An EventBridge event is generated (for automation needs)

CloudTrail - Integration with EventBridge

Used to react to any API call being made in your account
CloudTrail is not “real-time”:
- Delivers an event within 15 minutes of an API call
- Delivers log files to an S3 bucket every 5 minutes

CloudTrail - Organizations Trails

A trail that will log all events for all AWS accounts in an AWS Organization
Log events for management and member accounts
Trail with the same name will be created in every AWS account (IAM permissions)
Member accounts can’t remove or modify the organization trail (view only)

CloudTrail - Log File Integrity Validation

Digest Files:

References the log files for the last hour and contains a hash of each
Stored in the same S3 bucket as log files (different folder)
Helps you determine whether a log file was modified/deleted after CloudTrail delivered it
Hashing using SHA-256, Digital Signing using SHA-256 with RSA
Protect the S3 bucket using bucket policy, versioning, MFA Delete protection, encryption, object lock
Protect files using IAM

Q. To ensure that SysOps administrators can easily verify that the CloudTrail log files have not been deleted or changed, the following action should be taken:
Enable CloudTrail log file integrity validation when the trail is created or updated.

Explanation: Enabling CloudTrail log file integrity validation ensures that the log files are protected against tampering or unauthorized modification. CloudTrail uses SHA-256 hashes to validate the integrity of the log files stored in Amazon S3. By enabling this feature, the SysOps administrators can easily verify the integrity of the log files and ensure that they have not been deleted or changed
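The hash comparison at the heart of integrity validation can be sketched as follows. This is a simplification under stated assumptions: the digest is modeled as a plain mapping from file name to SHA-256 hex hash, and the RSA signature check on the digest file itself is left out. In practice the `aws cloudtrail validate-logs` command performs the full validation.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 hash of a log file's content."""
    return hashlib.sha256(data).hexdigest()


def find_tampered(digest_entries: dict[str, str], delivered: dict[str, bytes]) -> list[str]:
    """Return log file names that are missing or whose hash no longer matches."""
    problems = []
    for name, expected_hash in digest_entries.items():
        content = delivered.get(name)
        if content is None or sha256_hex(content) != expected_hash:
            problems.append(name)
    return problems


# Made-up file names and contents for the demonstration.
original = {"log-0800.json.gz": b"events A", "log-0805.json.gz": b"events B"}
digest = {name: sha256_hex(content) for name, content in original.items()}

current = {"log-0800.json.gz": b"events A (edited)"}  # one modified, one deleted
print(find_tampered(digest, current))
```

Both a modified file and a deleted file show up in the result, which matches the two cases the digest files are meant to detect.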

`👀` AWS Config

Helps with auditing and recording compliance of your AWS resources.
Helps record configurations and changes over time. Questions that can be solved by AWS Config:
- Is there unrestricted SSH access to my security groups?
- Do my buckets have any public access?
- How has my ALB configuration changed over time?
You can receive alerts (SNS notifications) for any changes
AWS Config is a per-region service.
Can be aggregated across regions and accounts.
Possibility of storing the configuration data into S3 (analyzed by Athena)

AWS Config keeps track of the configuration of your AWS resources and their relationships to other resources. It can also evaluate those AWS resources for compliance. This service uses rules that can be configured to evaluate AWS resources against desired configurations.

For example, AWS Config can track changes to CloudFormation stacks. A CloudFormation stack is a collection of AWS resources that you can manage as a single unit. With AWS Config, you can review the historical configuration of your CloudFormation stacks and review all changes that occurred to them.

For more information about how AWS Config can track changes to CloudFormation deployments, see cloudformation-stack-drift-detection-check.

There are also AWS Config rules that check whether your Amazon S3 buckets have logging enabled or your IAM users have an MFA device enabled.

👀 AWS Config is a service that enables you to assess, audit, and evaluate the configurations of your AWS resources. It provides detailed inventory and configuration history of your resources, as well as configuration change notifications. With AWS Config, you can track the configuration of your S3 bucket, including its bucket policy.

AWS Config rules use AWS Lambda functions to perform the compliance evaluations, and the Lambda functions return the compliance status of the evaluated resources as compliant or noncompliant. The non-compliant resources are remediated using the remediation action associated with the AWS Config rule. With the Auto-Remediation feature of AWS Config rules, the remediation action can be executed automatically when a resource is found non-compliant.

The AWS Config Auto Remediation feature can automatically remediate non-compliant S3 buckets using the following AWS Config rules: s3-bucket-logging-enabled, s3-bucket-server-side-encryption-enabled, s3-bucket-public-read-prohibited, and s3-bucket-public-write-prohibited. These rules detect non-compliant S3 configurations; they do not block the change itself, so remediation runs after the fact.

Config Rules

AWS Config provides a number of AWS managed rules that address a wide range of security concerns such as checking if you encrypted your Amazon Elastic Block Store (Amazon EBS) volumes, tagged your resources appropriately, and enabled multi-factor authentication (MFA) for root accounts.

Can use AWS managed config rules (over 75)
Can make custom config rules (must be defined in AWS Lambda).
- Ex: evaluate if each EBS disk is of type gp2
- Ex: evaluate if each EC2 instance is t2.micro
Rules can be evaluated / triggered:
- For each config change
- And / or: at regular time intervals
AWS Config Rules do not prevent actions from happening (no deny).
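The gp2 example above can be sketched as the evaluation logic a custom rule's Lambda function might contain. The shape of the configuration item (the `resourceType` and `configuration.volumeType` keys) is an assumption to verify against a real item, and the function only returns a verdict; reporting it back to AWS Config is omitted.

```python
def evaluate_volume(configuration_item: dict, desired_type: str = "gp2") -> str:
    """Return the compliance verdict for one EBS volume configuration item."""
    if configuration_item.get("resourceType") != "AWS::EC2::Volume":
        return "NOT_APPLICABLE"
    volume_type = configuration_item.get("configuration", {}).get("volumeType")
    return "COMPLIANT" if volume_type == desired_type else "NON_COMPLIANT"


# Made-up configuration items for the demonstration.
print(evaluate_volume({"resourceType": "AWS::EC2::Volume", "configuration": {"volumeType": "gp2"}}))
print(evaluate_volume({"resourceType": "AWS::EC2::Volume", "configuration": {"volumeType": "io1"}}))
print(evaluate_volume({"resourceType": "AWS::S3::Bucket", "configuration": {}}))
```

The rule only reports the state: an io1 volume is flagged as non-compliant, but nothing stops it from being created.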

Managed rules:

require-tags: managed rule in AWS Config. This rule checks if a resource contains the tags that you specify.

Config Rules - Remediations

Auto remediation is available for non-compliant S3 buckets using the following AWS Config rules:

- s3-bucket-logging-enabled
- s3-bucket-server-side-encryption-enabled
- s3-bucket-public-read-prohibited
- s3-bucket-public-write-prohibited

These AWS Config rules act as detective controls: they flag non-compliant S3 configurations rather than preventing them.

Automate remediation of non-compliant resources using SSM Automation Documents.
Use AWS-Managed Automation Documents or create custom Automation Documents
- Tip: you can create custom Automation Documents that invoke a Lambda function.
You can set Remediation Retries if the resource is still non-compliant after auto-remediation.

AWS Config Auto Remediation

Config Rules - Notifications

Use EventBridge to trigger notifications when AWS resources are noncompliant
Ability to send configuration changes and compliance state notifications to SNS (all events - use SNS Filtering or filter at client-side)
👀 QUESTION
If there are EC2s that are terminated in an environment, you should use the [EIP-attached Config rule](https://docs.aws.amazon.com/config/latest/developerguide/eip-attached.html) to find EIPs that are unattached in your environment.

AWS Config - Aggregators

The aggregator is created in one central aggregator account.
Aggregates rules, resources, etc... across multiple accounts & regions.
If using AWS Organizations, no need for individual Authorization
Rules are created in each individual source AWS account
Can deploy rules to multiple target accounts using CloudFormation StackSets

CloudWatch vs CloudTrail vs Config

CloudWatch
- Performance monitoring (metrics, CPU, network, etc…) & dashboards
- Events & Alerting
- Log Aggregation & Analysis
CloudTrail
- Record API calls made within your Account by everyone
- Can define trails for specific resources
- Global Service
Config
- Record configuration changes
- Evaluate resources against compliance rules
- Get timeline of changes and compliance

AWS Task Orchestrator and Executor (AWSTOE) - 👀 EXAM

Use the AWS Task Orchestrator and Executor (AWSTOE) application to orchestrate complex workflows, modify system configurations, and test your systems without writing code. This application uses a declarative document schema. Because it is a standalone application, it does not require additional server setup.

AWS Artifact - 👀 EXAM

AWS Artifact keeps compliance-related reports and agreements.

RDS

Advantage of using RDS versus deploying a database yourself

RDS is a managed service:
- Automated provisioning, OS patching
- Continuous backups and restore to specific timestamp (Point in Time Restore)!
- Monitoring dashboards
- Read replicas for improved read performance
- Multi AZ setup for DR (Disaster Recovery)
- Maintenance windows for upgrades
- Scaling capability (vertical and horizontal)
- Storage backed by EBS (gp2 or io1)
BUT you can’t SSH into your instances

RDS Read Replicas for read scalability

Up to 15 Read Replicas
Within AZ, Cross AZ or Cross Region.
Replication is ASYNC.
Replicas can be promoted to their own DB.

RDS Read Replicas - Network Cost

In AWS there’s a network cost when data goes from one AZ to another
For RDS Read Replicas within the same region, you don’t pay that fee.

RDS Multi AZ (Disaster Recovery)

SYNC replication.
One DNS name - automatic app failover to standby
Increase availability.
Failover in case of loss of AZ, loss of network, instance or storage failure

👀 Exam - Read Replicas can be set up as Multi AZ for Disaster Recovery (DR).

Lambda in VPC

You must define the VPC ID, the Subnets and the Security Groups
Lambda will create an ENI (Elastic Network Interface) in your subnets
The Lambda execution role needs the AWSLambdaVPCAccessExecutionRole managed policy to create and manage these ENIs

RDS Proxy for AWS Lambda

When Lambda functions connect to RDS, each running function instance opens and maintains its own database connection
With many concurrent invocations, this can result in a “TooManyConnections” exception
With RDS Proxy, you no longer need code that handles cleaning up idle connections and managing connection pools

DB Parameter Groups

You can configure the DB engine using Parameter Groups
Dynamic parameters are applied immediately
Static parameters are applied after instance reboot
You can modify parameter group associated with a DB (must reboot)
Must-know parameter:
- PostgreSQL / SQL Server: rds.force_ssl=1 => force SSL connections
- MySQL / MariaDB: require_secure_transport=1 => force SSL connections

RDS Events & Event Subscriptions

RDS keeps record of events related to:

DB instances
Snapshots
Parameter groups, security groups …
RDS Event Subscriptions
- Subscribe to events to be notified when an event occurs using SNS
- Specify the Event Source (instances, SGs, …) and the Event Category (creation, failover, …)
RDS delivers events to EventBridge

RDS with CloudWatch

CloudWatch metrics associated with RDS (gathered from the hypervisor):

DatabaseConnections
SwapUsage
ReadIOPS / WriteIOPS
ReadLatency / WriteLatency
ReadThroughput / WriteThroughput
DiskQueueDepth
FreeStorageSpace - To monitor the available storage space for an RDS DB instance
BinLogDiskUsage: Tracks the amount of disk space occupied by binary logs on the master.
FreeableMemory: Tracks the amount of available random access memory and not the available storage space.
DiskQueueDepth: Provides the number of outstanding IOs (read/write requests) waiting to access the disk.
Enhanced Monitoring (gathered from an agent on the DB instance). 👀
- Useful when you need to see how different processes or threads use the CPU.
Access to over 50 new CPU, memory, file system, and disk I/O metrics

Amazon RDS provides metrics in real time for the operating system (OS) that your DB instance runs on. You can view the metrics for your DB instance using the console. Also, you can consume the `Enhanced Monitoring` JSON output from Amazon CloudWatch Logs in a monitoring system of your choice.

RDS storage autoscaling - `👀` EXAM

With RDS storage autoscaling, you can set the desired maximum storage limit. Autoscaling will manage the storage size. RDS storage autoscaling monitors actual storage consumption and then scales capacity automatically when actual utilization approaches the provisioned storage capacity.

Amazon Aurora DB

Aurora is a proprietary technology from AWS (not open sourced)
Postgres and MySQL are both supported as Aurora DB (that means your drivers will work as if Aurora was a Postgres or MySQL database)
Aurora is “AWS cloud optimized” and claims 5x performance improvement over MySQL on RDS, over 3x the performance of Postgres on RDS
Aurora storage automatically grows in increments of 10GB, up to 128 TB.
Aurora can have up to 15 replicas and the replication process is faster than MySQL (sub 10 ms replica lag)
Failover in Aurora is instantaneous. It’s HA (High Availability) native.
Aurora costs more than RDS (20% more) - but is more efficient

Aurora High Availability and Read Scaling

One Aurora Instance takes writes (master)
Support for Cross Region Replication

Shared storage Volume: Replication + Self Healing + Auto expanding

Reader Endpoint Connection Load Balancing

RDS & Aurora Security

At-rest encryption:
- Database master & replicas encryption using AWS KMS - must be defined at launch time
- If the master is not encrypted, the read replicas cannot be encrypted
- To encrypt an un-encrypted database, go through a DB snapshot & restore as encrypted
In-flight encryption: TLS-ready by default, use the AWS TLS root certificates client-side
IAM Authentication: IAM roles to connect to your database (instead of username/pw)
Security Groups: Control Network access to your RDS / Aurora DB
No SSH available except on RDS Custom
Audit Logs can be enabled and sent to CloudWatch Logs for longer retention

Aurora for SysOps

You can associate a priority tier (0-15) on each Read Replica
- Controls the failover priority
- RDS will promote the Read Replica with the highest priority (lowest tier)
- If replicas have the same priority, RDS promotes the largest in size
- If replicas have the same priority and size, RDS promotes an arbitrary replica
You can migrate an RDS MySQL snapshot to Aurora MySQL Cluster
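The promotion rules listed above amount to a two-level sort. A minimal sketch, where the `Replica` class and the numeric `size_rank` are hypothetical stand-ins for real instance metadata:

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    tier: int       # promotion tier: 0 is the highest priority, 15 the lowest
    size_rank: int  # hypothetical numeric stand-in for instance size


def pick_failover_target(replicas: list[Replica]) -> Replica:
    """Pick the replica to promote: lowest tier first, then largest size."""
    if not replicas:
        raise ValueError("no replicas available")
    # Remaining ties are arbitrary; min() happens to keep the first candidate.
    return min(replicas, key=lambda r: (r.tier, -r.size_rank))


# Made-up cluster members for the demonstration.
cluster = [
    Replica("reader-a", tier=1, size_rank=2),
    Replica("reader-b", tier=0, size_rank=1),
    Replica("reader-c", tier=0, size_rank=4),
]
print(pick_failover_target(cluster).name)
```

Here reader-b and reader-c share tier 0, so the larger of the two is chosen over reader-a even though reader-a is bigger than reader-b.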

Connect to Amazon Aurora DB cluster from outside a VPC

To connect to an Amazon Aurora DB cluster directly from outside the VPC, the instances in the cluster must meet the following requirements:

The DB instance must have a public IP address.
The DB instance must be running in a publicly accessible subnet.

For Amazon Aurora DB instances, you can’t choose a specific subnet. Instead, choose a DB subnet group when you create the instance. Create a DB subnet group with subnets of similar network configuration. For example, a DB subnet group for Public subnets.

Aurora Replicas - TODO

Aurora Replicas are independent endpoints in an Aurora DB cluster, best used for scaling read operations and increasing availability. Up to 15 Aurora Replicas can be distributed across the Availability Zones that a DB cluster spans within an AWS Region. The DB cluster volume is made up of multiple copies of the data for the DB cluster. However, the data in the cluster volume is represented as a single, logical volume to the primary instance and to Aurora Replicas in the DB cluster.

Alternatively, you can also use Amazon Aurora Multi-Master which is a feature of the Aurora MySQL-compatible edition that adds the ability to scale out write performance across multiple Availability Zones, allowing applications to direct read/write workloads to multiple instances in a database cluster and operate with higher availability.

Metrics to generate reports on the Aurora DB Cluster and its replicas

AuroraReplicaLagMaximum - This metric captures the maximum amount of lag between the primary instance and each Aurora DB instance in the DB cluster.
AuroraBinlogReplicaLag - This metric captures the amount of time a replica DB cluster running on Aurora MySQL-Compatible Edition lags behind the source DB cluster. This metric reports the value of the Seconds_Behind_Master field of the MySQL SHOW SLAVE STATUS command. This metric is useful for monitoring replica lag between Aurora DB clusters that are replicating across different AWS Regions.
AuroraReplicaLag - This metric captures the amount of lag an Aurora replica experiences when replicating updates from the primary instance.
InsertLatency - This metric captures the average duration of insert operations.

Aurora Reader Endpoint - 👀 EXAM

To perform queries, you can connect to the reader endpoint, with Aurora automatically performing load-balancing among all the Aurora Replicas.

A reader endpoint for an Aurora DB cluster provides load-balancing support for read-only connections to the DB cluster. Use the reader endpoint for read operations, such as queries. By processing those statements on the read-only Aurora Replicas, this endpoint reduces the overhead on the primary instance. It also helps the cluster to scale the capacity to handle simultaneous SELECT queries, proportional to the number of Aurora Replicas in the cluster. Each Aurora DB cluster has one reader endpoint.

Reference: Amazon Aurora connection management

Amazon ElastiCache Overview

The same way RDS is to get managed Relational Databases…
ElastiCache is to get managed Redis or Memcached
Caches are in-memory databases with really high performance, low latency
Helps reduce load off of databases for read intensive workloads
Helps make your application stateless
AWS takes care of OS maintenance / patching, optimizations, setup, configuration, monitoring, failure recovery and backups
Using ElastiCache involves heavy application code changes

ElastiCache Replication (Redis): Cluster Mode Disabled

One primary node, up to 5 replicas
Asynchronous Replication.
- Therefore, when a primary node fails over to a replica, a small amount of data might be lost due to replication lag.
The primary node is used for read/write
The other nodes are read-only
One shard, all nodes have all the data
Guards against data loss if a node fails
Multi-AZ enabled by default for failover
Helpful to scale read performance
Horizontal and vertical

ElastiCache Replication: Cluster Mode Enabled

Data is partitioned across shards (helpful to scale writes)

Automatically increase/decrease the desired shards or replicas
Supports both Target Tracking and Scheduled Scaling Policies
Works only for Redis with Cluster Mode Enabled

Memcached

Fix high Memcached evictions

To fix the issue of high Memcached evictions in Amazon ElastiCache, the following actions should be taken:

Increase the size of the nodes in the cluster: This allows for more available memory in each node, reducing the likelihood of evictions due to limited cache space.
Increase the number of nodes in the cluster: By adding more nodes, the overall cache capacity increases, reducing the chance of evictions.

The Evictions metric for Amazon ElastiCache for Memcached represents the number of nonexpired items that the cache evicted to provide space for new items. If you are experiencing evictions with your cluster, it is usually a sign that you need to scale up (use a node that has a larger memory footprint) or scale out (add additional nodes to the cluster) to accommodate the additional data

VPC

With Amazon Virtual Private Cloud (Amazon VPC), you can launch AWS resources in a logically isolated virtual network that you’ve defined. This virtual network closely resembles a traditional network that you’d operate in your own data center, with the benefits of using the scalable infrastructure of AWS.

Configuration

Regardless of the type of subnet, the internal IPv4 address range of the subnet is always private. AWS never announces these address blocks to the internet.

When you create a VPC, you must specify a range of IPv4 addresses for the VPC in the form of a Classless Inter-Domain Routing (CIDR) block; for example, 10.0.0.0/16. This is the primary CIDR block for your VPC.

Subnets created in a VPC can communicate with each other by default. The main route table facilitates this communication.

Reference: How Amazon VPC works

VPC Diagram

CIDR - IPv4

Classless Inter-Domain Routing - a method for allocating IP addresses
Used in Security Groups rules and AWS networking in general
A CIDR consists of two components
Base IP
- Represents an IP contained in the range (XX.XX.XX.XX)
- Example: 10.0.0.0, 192.168.0.0, …
Subnet Mask
- Defines how many bits can change in the IP
- Example: /0, /24, /32
- Can take two forms:
  - /8 <=> 255.0.0.0
  - /16 <=> 255.255.0.0
  - /24 <=> 255.255.255.0
  - /32 <=> 255.255.255.255
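
The two forms of a subnet mask can be checked with Python's standard `ipaddress` module. This is a minimal sketch; the helper name `describe_cidr` is an assumption made for illustration.

```python
import ipaddress


def describe_cidr(cidr: str) -> dict:
    """Return the base IP, dotted-decimal mask and address count of a CIDR block."""
    network = ipaddress.ip_network(cidr, strict=True)
    return {
        "base_ip": str(network.network_address),
        "netmask": str(network.netmask),
        "addresses": network.num_addresses,
    }


# Print both mask notations side by side for a few example blocks.
for block in ("10.0.0.0/8", "192.168.0.0/16", "192.168.0.0/24", "192.168.0.1/32"):
    info = describe_cidr(block)
    print(block, "<=>", info["netmask"], "addresses:", info["addresses"])
```

The fewer bits the mask fixes, the more addresses the block contains: a /32 is a single IP, a /24 has 256 addresses.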

Public vs. Private IP (IPv4)

The Internet Assigned Numbers Authority (IANA) established certain blocks of IPv4 addresses for the use of private (LAN) and public (Internet) addresses
Private IP can only allow certain values:
- 10.0.0.0 - 10.255.255.255 (10.0.0.0/8) <- in big networks
- 172.16.0.0 - 172.31.255.255 (172.16.0.0/12) <- AWS default VPC in that range
- 192.168.0.0 - 192.168.255.255 (192.168.0.0/16) <- e.g., home networks
All the rest of the IP addresses on the Internet are Public
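
As a quick illustration, the three private ranges listed above can be tested in Python. This sketch deliberately checks only those three blocks; the function name `is_private_ipv4` is an assumption.

```python
import ipaddress

# The three private IPv4 blocks listed in the notes above.
PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_private_ipv4(address: str) -> bool:
    """True if the address falls inside one of the three private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PRIVATE_RANGES)
```

For example, `is_private_ipv4("172.31.255.255")` is true (last address of 172.16.0.0/12), while `is_private_ipv4("172.32.0.1")` is false.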

VPC in AWS - IPv4

VPC = Virtual Private Cloud
You can have multiple VPCs in an AWS region (max. 5 per region - soft limit)
Max. CIDR per VPC is 5, for each CIDR:
- Min. size is /28 (16 IP addresses)
- Max. size is /16 (65536 IP addresses)
Because VPC is private, only the Private IPv4 ranges are allowed:
- 10.0.0.0 - 10.255.255.255 (10.0.0.0/8)
- 172.16.0.0 - 172.31.255.255 (172.16.0.0/12)
- 192.168.0.0 - 192.168.255.255 (192.168.0.0/16)
Your VPC CIDR should NOT overlap with your other networks (e.g., corporate)
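
A candidate VPC CIDR can be sanity-checked before creation. The sketch below assumes two hypothetical helpers, `valid_vpc_size` and `cidrs_overlap`, built on the standard `ipaddress` module; it only mirrors the size and overlap rules listed above.

```python
import ipaddress


def valid_vpc_size(cidr: str) -> bool:
    """A VPC CIDR must be between /16 (largest) and /28 (smallest)."""
    prefix = ipaddress.ip_network(cidr).prefixlen
    return 16 <= prefix <= 28


def cidrs_overlap(cidr_a: str, cidr_b: str) -> bool:
    """True if the two blocks share at least one address."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))
```

For instance, a VPC on 10.0.0.0/16 overlaps a corporate network on 10.0.1.0/24, but not one on 10.1.0.0/16.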

VPC - Subnet (IPv4)

AWS reserves 5 IP addresses (first 4 & last 1) in each subnet
These 5 IP addresses are not available for use and can’t be assigned to an EC2 instance
Example: if CIDR block 10.0.0.0/24, then reserved IP addresses are:
- 10.0.0.0 - Network Address
- 10.0.0.1 - reserved by AWS for the VPC router
- 10.0.0.2 - reserved by AWS for mapping to Amazon-provided DNS
- 10.0.0.3 - reserved by AWS for future use
- 10.0.0.255 - Network Broadcast Address. AWS does not support broadcast in a VPC, therefore the address is reserved
Exam Tip, if you need 29 IP addresses for EC2 instances:
- You can’t choose a subnet of size /27 (32 IP addresses, 32 - 5 = 27 < 29)
- You need to choose a subnet of size /26 (64 IP addresses, 64 - 5 = 59 > 29)
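
The exam-tip arithmetic can be written out as a short Python sketch. The function names are assumptions; the constant 5 and the /16 to /28 bounds come from the notes above.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # first 4 addresses + last 1


def usable_addresses(prefix_length: int) -> int:
    """Addresses left for your resources in a subnet of the given prefix length."""
    return 2 ** (32 - prefix_length) - AWS_RESERVED_PER_SUBNET


def smallest_subnet_prefix(needed: int) -> int:
    """Smallest subnet (largest prefix) that still offers `needed` usable addresses."""
    for prefix in range(28, 15, -1):  # try /28 first, then grow up to /16
        if usable_addresses(prefix) >= needed:
            return prefix
    raise ValueError("no subnet between /28 and /16 is large enough")


def reserved_addresses(cidr: str) -> list:
    """The five addresses AWS reserves in the subnet."""
    network = ipaddress.ip_network(cidr)
    first_four = [network.network_address + offset for offset in range(4)]
    return [str(address) for address in first_four + [network.broadcast_address]]
```

With these helpers, `smallest_subnet_prefix(29)` returns 26, matching the tip: a /27 leaves only 27 usable addresses, a /26 leaves 59.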

Internet Gateway (IGW)

Allows resources (e.g., EC2 instances) in a VPC to connect to the Internet
It scales horizontally and is highly available and redundant
Must be created separately from a VPC
One VPC can only be attached to one IGW and vice versa
Internet Gateways on their own do not allow Internet access…
Route tables must also be edited!

Bastion Hosts

A bastion host is an EC2 instance. It is special because it sits in a public subnet and has its own security group

We can use a Bastion Host to SSH into our private EC2 instances
The bastion is in the public subnet which is then connected to all other private subnets
Bastion Host security group must allow inbound from the internet on port 22 from restricted CIDR, for example the public CIDR of your corporation
Security Group of the EC2 Instances must allow the Security Group of the Bastion Host, or the private IP of the Bastion host

NAT Instance (outdated, but still at the exam)

NAT = Network Address Translation
Allows EC2 instances in private subnets to connect to the Internet
Must be launched in a public subnet
Must disable EC2 setting: Source / destination Check
Must have Elastic IP attached to it
Route Tables must be configured to route traffic from private subnets to the NAT instance

NAT Gateway

AWS-managed NAT, higher bandwidth, high availability, no administration
Pay per hour for usage and bandwidth
NATGW is created in a specific Availability Zone, uses an Elastic IP
Can’t be used by EC2 instance in the same subnet (only from other subnets)
Requires an IGW (Private Subnet => NATGW => IGW)
5 Gbps of bandwidth with automatic scaling up to 45 Gbps
No Security Groups to manage / required

NAT Gateway with High Availability

NAT Gateway is resilient within a single Availability Zone
Must create multiple NAT Gateways in multiple AZs for fault-tolerance
There is no cross-AZ failover needed because if an AZ goes down it doesn’t need NAT

Connect the Lambda function to a private subnet that has a route to a NAT gateway deployed in a public subnet of the VPC.

Explanation: By connecting the Lambda function to a private subnet with a route to a NAT gateway, the function can access resources within the VPC while also leveraging the NAT gateway to access the internet and communicate with third-party APIs. The NAT gateway acts as a bridge between the private subnet and the internet, allowing the Lambda function to securely access external resources.

DNS Resolution in VPC

DNS Resolution (enableDnsSupport)
- Decides if DNS resolution from Route 53 Resolver server is supported for the VPC
- True (default): it queries the Amazon-provided DNS server at 169.254.169.253 or the reserved IP address at the base of the VPC IPv4 network range plus two (.2).

enableDnsSupport - Indicates whether the DNS resolution is supported for the VPC. If this attribute is false, the Amazon-provided DNS server in the VPC that resolves public DNS hostnames to IP addresses is not enabled. If this attribute is true, queries to the Amazon provided DNS server at the 169.254.169.253 IP address, or the reserved IP address at the base of the VPC IPv4 network range plus two will succeed.
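
The "base of the VPC IPv4 network range plus two" rule is simple address arithmetic. A minimal sketch, with the helper name `vpc_dns_resolver` assumed for illustration:

```python
import ipaddress

# Link-local address of the Amazon-provided DNS server, as quoted above.
LINK_LOCAL_RESOLVER = "169.254.169.253"


def vpc_dns_resolver(vpc_cidr: str) -> str:
    """Return the reserved resolver address: VPC base address plus two."""
    network = ipaddress.ip_network(vpc_cidr)
    return str(network.network_address + 2)
```

For a VPC on 10.0.0.0/16 this gives 10.0.0.2, which is why the .2 address appears in the list of reserved subnet addresses.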

DNS Hostnames (enableDnsHostnames)
- By default:
  - True => default VPC
  - False => newly created VPCs
- Won’t do anything unless enableDnsSupport=true
- If True, assigns public hostname to EC2 instance if it has a public IPv4

enableDnsHostnames - Indicates whether the instances launched in the VPC get public DNS hostnames. If this attribute is true, instances in the VPC get public DNS hostnames, but only if the enableDnsSupport attribute is also set to true.

By default, both attributes are set to true in a default VPC or a VPC created by the VPC wizard. By default, only the enableDnsSupport attribute is set to true in a VPC created on the Your VPCs page of the VPC console or using the AWS CLI, API, or an AWS SDK.

DNS Resolution in VPC

If you use custom DNS domain names in a Private Hosted Zone in Route 53, you must set both these attributes (enableDnsSupport & enableDnsHostnames) to true

Network Access Control List (NACL)

NACLs are like a firewall that controls traffic from and to subnets
One NACL per subnet, new subnets are assigned the Default NACL
You define NACL Rules:
- Rules have a number (1-32766), higher precedence with a lower number
- First rule match will drive the decision
- Example: if you define #100 ALLOW 10.0.0.10/32 and #200 DENY 10.0.0.10/32, the IP address will be allowed because 100 has a higher precedence over 200
- The last rule is an asterisk (*) and denies a request in case of no rule match
- AWS recommends adding rules by increment of 100
Newly created NACLs will deny everything
NACL are a great way of blocking a specific IP address at the subnet level
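
The "lowest number wins, first match decides" behaviour can be modelled in a few lines of Python. This is a simplified sketch, not how AWS implements NACLs: the `NaclRule` class and `evaluate` function are assumptions, and rules match on source CIDR only (real rules also match protocol and port range).

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int   # 1-32766, lower number = higher precedence
    action: str   # "ALLOW" or "DENY"
    cidr: str     # source CIDR the rule applies to


def evaluate(rules, source_ip: str) -> str:
    """Return the action of the first matching rule, in ascending rule number order."""
    ip = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if ip in ipaddress.ip_network(rule.cidr):
            return rule.action
    return "DENY"  # the final asterisk (*) rule denies anything unmatched


rules = [
    NaclRule(200, "DENY", "10.0.0.10/32"),
    NaclRule(100, "ALLOW", "10.0.0.10/32"),
]
print(evaluate(rules, "10.0.0.10"))  # rule 100 is evaluated before rule 200
```

Swapping the two rule numbers would flip the result to DENY, which is the usual way to block a single IP ahead of a broader allow rule.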

Default NACL

Accepts everything inbound/outbound with the subnets it’s associated with

Ephemeral Ports

For any two endpoints to establish a connection, they must use ports
Clients connect to a defined port, and expect a response on an ephemeral port
Different Operating Systems use different port ranges, examples:
- IANA & MS Windows 10 -> 49152 - 65535
- Many Linux Kernels -> 32768 - 60999

Security Group	NACL
Operates at the instance level	Operates at the subnet level
Supports allow rules only	Supports allow rules and deny rules
`Stateful`: return traffic is automatically allowed, regardless of any rules	`Stateless`: return traffic must be explicitly allowed by rules (think of ephemeral ports)
All rules are evaluated before deciding whether to allow traffic	Rules are evaluated in order (lowest to highest) when deciding whether to allow traffic, first match wins
Applies to an EC2 instance when specified by someone	Automatically applies to all EC2 instances in the subnet that it’s associated with

VPC - Reachability Analyzer

A network diagnostics tool that troubleshoots network connectivity between two endpoints in your VPC(s)
It builds a model of the network configuration, then checks the reachability based on these configurations (it doesn’t send packets)
When the destination is
- Reachable - it produces hop-by-hop details of the virtual network path
- Not reachable - it identifies the blocking component(s) (e.g., configuration issues in SGs, NACLs, Route Tables, …)
Use cases: troubleshoot connectivity issues, ensure network configuration is as intended, …

VPC Peering

Privately connect two VPCs using AWS’ network
Make them behave as if they were in the same network
Must not have overlapping CIDRs
VPC Peering connection is NOT transitive (must be established for each VPC that needs to communicate with one another)
You must update route tables in each VPC’s subnets to ensure EC2 instances can communicate with each other
You can create VPC Peering connection between VPCs in different AWS accounts/regions
You can reference a security group in a peered VPC (works cross accounts - same region)
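
Non-transitivity is easy to see with a toy model: only VPC pairs with their own peering connection can talk. The sketch below uses hypothetical VPC names and helper functions; it does not call any AWS API.

```python
from itertools import combinations


def directly_peered(peerings, vpc_a: str, vpc_b: str) -> bool:
    """True only if a peering connection exists between exactly these two VPCs."""
    established = {frozenset(pair) for pair in peerings}
    return frozenset((vpc_a, vpc_b)) in established


def missing_peerings(vpcs, peerings) -> list:
    """Pairs that still need their own peering connection for a full mesh."""
    return [
        (a, b)
        for a, b in combinations(vpcs, 2)
        if not directly_peered(peerings, a, b)
    ]


peerings = [("vpc-a", "vpc-b"), ("vpc-b", "vpc-c")]
# vpc-a and vpc-c are both peered with vpc-b, yet cannot reach each other.
print(missing_peerings(["vpc-a", "vpc-b", "vpc-c"], peerings))
```

With many VPCs the number of required connections grows quickly, which is the motivation for a hub-and-spoke design such as Transit Gateway.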

VPC Endpoint (AWS PrivateLink)

A VPC Endpoint allows you to connect your VPC directly to AWS services without the need for internet gateways, NAT gateways, or VPN connections. It enables private communication between your VPC and the AWS service without going over the internet.

Most secure & scalable way to expose a service to 1000s of VPC (own or other accounts)
Does not require VPC peering, internet gateway, NAT, route tables… (magical..)
Requires a network load balancer (Service VPC) and ENI (Customer VPC) or GWLB
If the NLB is in multiple AZ, and the ENIs in multiple AZ, the solution is fault tolerant!

To configure a VPC Endpoint for accessing AWS Systems Manager APIs, you can follow these steps:

Create a VPC Endpoint for AWS Systems Manager in your Amazon VPC. This creates an elastic network interface with a private IP address within your VPC.
Enable private DNS on the endpoint (or configure clients to use the endpoint-specific DNS name) so that requests to the AWS Systems Manager API resolve to the private IP address of the endpoint instead of going over the internet. Interface endpoints need no route table changes; only gateway endpoints are used as route table targets.
Verify that your on-premises instances and AWS managed instances are configured to use the appropriate VPC and route tables.

Types of Endpoints

Interface Endpoints (powered by PrivateLink)
- Provisions an ENI (private IP address) as an entry point (must attach a Security Group)
- Supports most AWS services
- $ per hour + $ per GB of data processed
Gateway Endpoints
- Provisions a gateway and must be used as a target in a route table (does not use security groups)
- Supports both S3 and DynamoDB
- Free

Gateway or Interface Endpoint for S3?

Gateway is most likely going to be preferred all the time at the exam.
Cost: free for Gateway, $ for interface endpoint
Interface Endpoint is preferred when access is required from on-premises (Site-to-Site VPN or Direct Connect), a different VPC or a different region.

VPC Flow Logs

Capture information about IP traffic going into your interfaces:
- VPC Flow Logs
- Subnet Flow Logs
- Elastic Network Interface (ENI) Flow Logs
Helps to monitor & troubleshoot connectivity issues
Flow logs data can go to S3, CloudWatch Logs, and Kinesis Data Firehose
Captures network information from AWS managed interfaces too: ELB, RDS, ElastiCache, Redshift, WorkSpaces, NATGW, Transit Gateway…

VPC Flow Logs Syntax

srcaddr & dstaddr - help identify problematic IP
srcport & dstport - help identify problematic ports
Action - success or failure of the request due to Security Group / NACL
Can be used for analytics on usage patterns, or malicious behavior
Query VPC flow logs using Athena on S3 or CloudWatch Logs Insights
Flow Logs examples: https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs-records-examples.html
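
To make the syntax concrete, here is a minimal Python sketch that splits a record into named fields. It assumes the default (version 2) space-separated format; custom formats have a different field order. The helper names and the sample records are illustrative.

```python
from collections import Counter

# Field order of the default (version 2) flow log format.
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]


def parse_flow_log(line: str) -> dict:
    """Split one default-format record into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def rejected_sources(lines) -> Counter:
    """Count REJECT records per source address to spot problematic IPs."""
    records = (parse_flow_log(line) for line in lines)
    return Counter(r["srcaddr"] for r in records if r["action"] == "REJECT")


sample = [
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK",
    "2 123456789010 eni-1235b8ca123456789 172.31.9.69 172.31.9.12 49761 3389 6 20 4249 1418530010 1418530070 REJECT OK",
]
print(rejected_sources(sample))
```

At scale the same filtering is usually done with Athena or CloudWatch Logs Insights, as noted above; the sketch only shows which fields carry the information.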

AWS Site-to-Site VPN

Virtual Private Gateway (VGW)
- VPN concentrator on the AWS side of the VPN connection
- VGW is created and attached to the VPC from which you want to create the Site-to-Site VPN connection
- Possibility to customize the ASN (Autonomous System Number)
Customer Gateway (CGW)
- Software application or physical device on customer side of the VPN connection
- https://docs.aws.amazon.com/vpn/latest/s2svpn/your-cgw.html#DevicesTested

AWS Site-to-Site VPN

Site-to-Site VPN Connections

Customer Gateway Device (On-premises)
- 👀 What IP address to use?
  - Public Internet-routable IP address for your Customer Gateway device
  - If it’s behind a NAT device that’s enabled for NAT traversal (NAT-T), use the public IP address of the NAT device
👀 Important step: enable Route Propagation for the Virtual Private Gateway in the route table that is associated with your subnets
👀 If you need to ping your EC2 instances from on-premises, make sure you add the ICMP protocol on the inbound of your security groups.

AWS VPN CloudHub

Provide secure communication between multiple sites, if you have multiple VPN connections
Low-cost hub-and-spoke model for primary or secondary network connectivity between different locations (VPN only)
It’s a VPN connection so it goes over the public Internet
To set it up, connect multiple VPN connections on the same VGW, setup dynamic routing and configure route tables

To create a VPN

Create Customer Gateway
Create Virtual Private Gateway
Create a Site-to-Site VPN connection that links the Virtual Private Gateway and the Customer Gateway.

Direct Connect (DX)

Provides a dedicated private connection from a remote network to your VPC.
Dedicated connection must be set up between your DC and AWS Direct Connect locations
You need to set up a Virtual Private Gateway on your VPC
Access public resources (S3) and private (EC2) on same connection
Use Cases:
- Increase bandwidth throughput - working with large data sets - lower cost
- More consistent network experience - applications using real-time data feeds
- Hybrid Environments (on prem + cloud)
Supports both IPv4 and IPv6

Direct Connect

Direct Connect Gateway

If you want to setup a Direct Connect to one or more VPC in many different regions (same account), you must use a Direct Connect Gateway. - Exam 👀

Direct Connect Gateway

Direct Connect - Connection Types

Dedicated Connections: 1 Gbps, 10 Gbps and 100 Gbps capacity
- Physical ethernet port dedicated to a customer
- Request made to AWS first, then completed by AWS Direct Connect Partners
Hosted Connections: 50 Mbps, 500 Mbps, up to 10 Gbps
- Connection requests are made via AWS Direct Connect Partners
- Capacity can be added or removed on demand
- 1, 2, 5, 10 Gbps available at select AWS Direct Connect Partners
Lead times are often longer than 1 month to establish a new connection - 👀 EXAM

Direct Connect - Encryption

Data in transit is not encrypted but is private
AWS Direct Connect + VPN provides an IPsec-encrypted private connection

Direct Connect - Resiliency

High Resiliency for Critical Workloads

Maximum Resiliency for Critical Workloads

Site-to-Site VPN connection as a backup

In case Direct Connect fails, you can set up a backup Direct Connect connection (expensive), or a Site-to-Site VPN connection.

Transit Gateway

For having transitive peering between thousands of VPC and on-premises, hub-and-spoke (star) connection
Regional resource, can work cross-region
Share cross-account using Resource Access Manager (RAM)
You can peer Transit Gateways across regions
Route Tables: limit which VPC can talk with other VPC
Works with Direct Connect Gateway, VPN connections
Supports IP Multicast (not supported by any other AWS service)

Transit Gateway: Site-to-Site VPN ECMP

ECMP = Equal-cost multi-path routing.
Routing strategy that allows packets to be forwarded over multiple best paths.

Use case: create multiple Site-to-Site VPN connections to increase the bandwidth of your connection to AWS.

VPC - Traffic Mirroring

Capture the traffic
- From (Source) - ENIs
- To (Targets) - an ENI or a Network Load Balancer
Source and Target can be in the same VPC or different VPCs (VPC Peering)
Use cases: content inspection, threat monitoring, troubleshooting, …

IPv6 in VPC

IPv4 cannot be disabled for your VPC and subnets
You can enable IPv6 (they’re public IP addresses) to operate in dual-stack mode.
Your EC2 instances will get at least a private internal IPv4 and a public IPv6
They can communicate using either IPv4 or IPv6 to the internet through an Internet Gateway

IPv6 Troubleshooting

IPv4 cannot be disabled for your VPC and subnets
So, if you cannot launch an EC2 instance in your subnet
- It’s not because it cannot acquire an IPv6 (the space is very large)
- It’s because there are no available IPv4 in your subnet
Solution: create a new IPv4 CIDR in your subnet

Egress-only Internet Gateway

An egress-only internet gateway is a horizontally scaled, redundant, and highly available VPC component that allows outbound communication over IPv6 from instances in your VPC to the internet, and prevents the internet from initiating an IPv6 connection with your instances. You must update the Route Tables

An egress-only internet gateway is for use with IPv6 traffic only. To enable outbound-only internet communication over IPv4, use a NAT gateway instead.

Reference: Enable outbound IPv6 traffic using an egress-only internet gateway

Carrier gateway

A carrier gateway is used with subnets in AWS Wavelength Zones. It allows inbound traffic from a telecommunications carrier network in a specific location, and outbound traffic to the carrier network and the internet. To give EC2 instances in private subnets outbound-only IPv6 access while keeping them isolated from inbound internet connections, use an egress-only internet gateway instead.

Security Groups

By default, security groups allow all outbound traffic.
Security group rules are always permissive; you can’t create rules that deny access.
Security groups are stateful

The reason for the issue where the new EC2 instances are unable to mount the Amazon EFS file system in a new Availability Zone could be:

The security group for the mount target does not allow inbound NFS connections from the security group used by the EC2 instances.

Explanation: When mounting an Amazon EFS file system from EC2 instances, the security group associated with the mount target should allow inbound NFS (Network File System) connections from the security group used by the EC2 instances. By default, the security group associated with the mount target allows inbound connections from the default security group of the VPC. If the EC2 instances are using a different security group, it needs to be added to the mount target’s security group’s inbound rules to allow NFS connections.

👀 - Only support allow rules. You have to allow incoming traffic from your customers to your instances

The following provides an overview of the steps to enable your VPC and subnets to use IPv6:

Step 1: Associate an IPv6 CIDR Block with Your VPC and Subnets - Associate an Amazon-provided IPv6 CIDR block with your VPC and with your subnets.
Step 2: Update Your Route Tables - Update your route tables to route your IPv6 traffic. For a public subnet, create a route that routes all IPv6 traffic from the subnet to the Internet gateway. For a private subnet, create a route that routes all Internet-bound IPv6 traffic from the subnet to an egress-only Internet gateway.
Step 3: Update Your Security Group Rules - Update your security group rules to include rules for IPv6 addresses. This enables IPv6 traffic to flow to and from your instances. If you’ve created custom network ACL rules to control the flow of traffic to and from your subnet, you must include rules for IPv6 traffic.
Step 4: Change Your Instance Type - If your instance type does not support IPv6, you must resize the instance to a supported instance type. In the example, the instance is an m3.large instance type, which does not support IPv6, so it must be resized to a supported type, for example, m4.large.
Step 5: Assign IPv6 Addresses to Your Instances - Assign IPv6 addresses to your instances from the IPv6 address range of your subnet.
Step 6: (Optional) Configure IPv6 on Your Instances - If your instance was launched from an AMI that is not configured to use DHCPv6, you must manually configure your instance to recognize an IPv6 address assigned to the instance.

VPC Section Summary

CIDR - IP Range
VPC - Virtual Private Cloud => we define a list of IPv4 & IPv6 CIDR
Subnets - tied to an AZ, we define a CIDR
Internet Gateway - at the VPC level, provide IPv4 & IPv6 Internet Access
Route Tables - must be edited to add routes from subnets to the IGW, VPC Peering Connections, VPC Endpoints, …
Bastion Host - public EC2 instance to SSH into, that has SSH connectivity to EC2 instances in private subnets
NAT Instances - gives Internet access to EC2 instances in private subnets. Old, must be setup in a public subnet, disable Source / Destination check flag
NAT Gateway - managed by AWS, provides scalable Internet access to private EC2 instances, IPv4 only
Private DNS + Route 53 - enable DNS Resolution + DNS Hostnames (VPC)
NACL - stateless, subnet rules for inbound and outbound, don’t forget Ephemeral Ports
Security Groups - stateful, operate at the EC2 instance level
Reachability Analyzer - perform network connectivity testing between AWS resources
VPC Peering - connect two VPCs with non overlapping CIDR, non-transitive
VPC Endpoints - provide private access to AWS Services (S3, DynamoDB, CloudFormation, SSM) within a VPC
VPC Flow Logs - can be setup at the VPC / Subnet / ENI Level, for ACCEPT and REJECT traffic, helps identifying attacks, analyze using Athena or CloudWatch Logs Insights
Site-to-Site VPN - setup a Customer Gateway on DC, a Virtual Private Gateway on VPC, and site-to-site VPN over public Internet
AWS VPN CloudHub - hub-and-spoke VPN model to connect your sites
Direct Connect - setup a Virtual Private Gateway on VPC, and establish a direct private connection to an AWS Direct Connect Location
Direct Connect Gateway - setup a Direct Connect to many VPCs in different AWS regions
AWS PrivateLink / VPC Endpoint Services:
- Connect services privately from your service VPC to customers VPC
- Doesn’t need VPC Peering, public Internet, NAT Gateway, Route Tables
- Must be used with Network Load Balancer & ENI
ClassicLink - connect EC2-Classic EC2 instances privately to your VPC
Transit Gateway - transitive peering connections for VPC, VPN & DX
Traffic Mirroring - copy network traffic from ENIs for further analysis
Egress-only Internet Gateway - like a NAT Gateway, but for IPv6

Networking Costs in AWS per GB

Use Private IP instead of Public IP for good savings and better network performance
Use same AZ for maximum savings (at the cost of high availability) - Exam 👀

S3 Data Transfer Pricing - Analysis for USA

S3 ingress: free
S3 to Internet: $0.09 per GB
S3 Transfer Acceleration:
- Faster transfer times (50 to 500% better)
- Additional cost on top of Data Transfer Pricing: +$0.04 to $0.08 per GB
S3 to CloudFront: $0.00 per GB
CloudFront to Internet: $0.085 per GB (slightly cheaper than S3)
- Caching capability (lower latency)
- Reduce costs associated with S3 Requests Pricing (7x cheaper with CloudFront)
S3 Cross Region Replication: $0.02 per GB
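
The per-GB figures above can be turned into a quick comparison. This is a rough sketch under stated assumptions: it uses the flat rates quoted in these notes (real pricing is tiered and changes over time), ignores request charges, and the function name is hypothetical.

```python
# Per-GB rates quoted in the notes above (USA, illustrative only).
S3_TO_INTERNET_PER_GB = 0.09
S3_TO_CLOUDFRONT_PER_GB = 0.00
CLOUDFRONT_TO_INTERNET_PER_GB = 0.085


def monthly_egress_cost(gigabytes: float, via_cloudfront: bool) -> float:
    """Data transfer cost of serving `gigabytes` to the internet."""
    if via_cloudfront:
        per_gb = S3_TO_CLOUDFRONT_PER_GB + CLOUDFRONT_TO_INTERNET_PER_GB
    else:
        per_gb = S3_TO_INTERNET_PER_GB
    return gigabytes * per_gb


direct = monthly_egress_cost(1000, via_cloudfront=False)
cached = monthly_egress_cost(1000, via_cloudfront=True)
print(f"S3 direct: ${direct:.2f}, via CloudFront: ${cached:.2f}")
```

The transfer saving per GB is small; the larger benefit noted above comes from caching, which reduces the number of S3 requests.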

AWS Network Firewall - 👀 EXAM

Protect your entire Amazon VPC
- Pass traffic through only from known AWS service domains or IP address endpoints, such as Amazon S3.
- Use custom lists of known bad domains to limit the types of domain names that your applications can access.
- Perform deep packet inspection (DPI) on traffic entering or leaving your VPC.
- Use stateful protocol detection to filter protocols like HTTPS, independent of the port used.
- From Layer 3 to Layer 7 protection
Any direction, you can inspect
- VPC to VPC traffic
- Outbound to internet
- Inbound from internet
- To / from Direct Connect & Site-to-Site VPN
Internally, the AWS Network Firewall uses the AWS Gateway Load Balancer 👀
Rules can be centrally managed cross-account by AWS Firewall Manager to apply to many VPCs

Network Firewall - Fine Grained Controls

Supports 1000s of rules
IP & port - example: 10,000s of IPs filtering
Protocol - example: block the SMB protocol for outbound communications
Stateful domain list rule groups: only allow outbound traffic to *.mycorp.com or third-party software repo
General pattern matching using regex
Traffic filtering: Allow, drop, or alert for the traffic that matches the rules
Active flow inspection to protect against network threats with intrusion-prevention capabilities (like Gateway Load Balancer, but all managed by AWS)
Send logs of rule matches to Amazon S3, CloudWatch Logs, Kinesis Data Firehose

CloudFormation

cfn-init

AWS::CloudFormation::Init must be in the Metadata of a resource
The cfn-init script helps make complex EC2 configurations readable
The EC2 instance will query the CloudFormation service to get init data
Logs go to /var/log/cfn-init.log

(More readable compared with user data scripts)

UserData:
  Fn::Base64:
    !Sub |
      #!/bin/bash -xe
      # Get the latest CloudFormation package
      yum update -y aws-cfn-bootstrap
      # Start cfn-init
      /opt/aws/bin/cfn-init -s ${AWS::StackId} -r MyInstance --region ${AWS::Region} || error_exit 'Failed to run cfn-init'

Metadata:
  Comment: Install a simple Apache HTTP page
  AWS::CloudFormation::Init:
    config:
      packages:
        yum:
          httpd: []
      files:
        "/var/www/html/index.html":
          content: |
            <h1>Hello World from EC2 instance!</h1>
            <p>This was created using cfn-init</p>
          mode: '000644'
      commands:
        hello:
          command: "echo 'hello world'"
      services:
        sysvinit:
          httpd:
            enabled: 'true'
            ensureRunning: 'true'

cfn-signal & wait conditions

We still don’t know how to tell CloudFormation that the EC2 instance got properly configured after a cfn-init
For this, we can use the cfn-signal script!
- We run cfn-signal right after cfn-init
- Tell CloudFormation service to keep on going or fail
We need to define WaitCondition:
- Block the template until it receives a signal from cfn-signal
- We attach a CreationPolicy (also works on EC2, ASG)
  - The creation policy is invoked only when AWS CloudFormation creates the associated resource. Currently, the only AWS CloudFormation resources that support creation policies are AWS::AutoScaling::AutoScalingGroup, AWS::EC2::Instance, and AWS::CloudFormation::WaitCondition.

# Start cfn-signal to the wait condition
/opt/aws/bin/cfn-signal -e $? --stack ${AWS::StackId} --resource SampleWaitCondition --region ${AWS::Region}

SampleWaitCondition:
  Type: AWS::CloudFormation::WaitCondition
  CreationPolicy:
    ResourceSignal:
      Timeout: PT2M
      Count: 1

CloudFormation StackSets

Create, update, or delete stacks across multiple accounts and regions with a single operation
Administrator account to create StackSets
Trusted accounts to create, update, delete stack instances from StackSets

Use AWS CloudFormation StackSets for Multiple Accounts in an AWS Organization:

Use AWS CloudFormation StackSets to deploy a template to each account to create the new IAM roles.

Explanation: AWS CloudFormation StackSets allows you to deploy a CloudFormation template across multiple AWS accounts. By using StackSets, you can create and manage the same IAM roles in each account within the organization. This ensures consistent deployment of roles across accounts and simplifies the management process.

Reference: New: Use AWS CloudFormation StackSets for Multiple Accounts in an AWS Organization

`QUESTION` To launch the latest AMI.

Use the Parameters section in the template to specify the Systems Manager (SSM) Parameter, which contains the latest version of the Windows regional AMI ID.

Parameters:
  LatestWindowsAMIParameter:
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: /LatestWindowsAMI

UpdatePolicy attribute - `👀 EXAM`

By adding the UpdatePolicy attribute in CloudFormation and enabling the WaitOnResourceSignals property, the Auto Scaling group update process will be handled more gracefully. This approach allows CloudFormation to monitor the health and success of each instance during the update process before moving on to the next instance.

Appending a health check at the end of the user data script allows the instance to signal CloudFormation that it has successfully completed its initialization. This helps ensure that the instance is fully operational before proceeding to the next instance in the Auto Scaling group update process.

CreationPolicy:
  AutoScalingCreationPolicy:
    MinSuccessfulInstancesPercent: Integer
  ResourceSignal:
    Count: '3'
    Timeout: PT15M

UpdatePolicy:
  AutoScalingRollingUpdate:
    MinInstancesInService: '1'
    MaxBatchSize: '2'
    PauseTime: PT1M
    WaitOnResourceSignals: 'true'

👀 `DependsOn` attribute

With the DependsOn attribute, you can specify that the creation of a specific resource follows another. When you add a DependsOn attribute to a resource, that resource is created only after the creation of the resource specified in the DependsOn attribute.
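
The ordering that DependsOn imposes is a topological sort of the dependency graph. The sketch below models it with Python's standard `graphlib` module (Python 3.9+); the resource names are hypothetical and this is not how CloudFormation itself is implemented.

```python
from graphlib import TopologicalSorter


def creation_order(depends_on: dict) -> list:
    """Order resources so that each one comes after everything it depends on.

    `depends_on` maps a resource name to the resources listed in its DependsOn.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical template: the web server waits for the database,
# which in turn waits for the VPC.
template_dependencies = {
    "WebServer": ["Database"],
    "Database": ["Vpc"],
    "Vpc": [],
}
print(creation_order(template_dependencies))
```

A circular dependency makes `static_order()` raise `graphlib.CycleError`, much like CloudFormation rejects templates with circular dependencies.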

Set Tags consistently in AWS across all accounts

Use the CloudFormation Resource Tags property to apply tags to certain resource types upon creation.

Resources:
  MyEC2Instance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: ami-0123456789abcdef0
      InstanceType: t2.micro
      Tags:
        - Key: Name
          Value: MyEC2Instance
        - Key: Environment
          Value: Production
        - Key: CostCenter
          Value: MyCostCenter

Output and Export

Outputs:
  SubnetId:
    Description: Subnet ID created in Stack A
    Value: !Ref YourSubnetResourceName
    Export:
      Name: SubnetId-Exported

Resources:
  EC2Instance:
    Type: AWS::EC2::Instance
    Properties:
      SubnetId: !ImportValue SubnetId-Exported
      # Other properties for the EC2 instance

No conditions in parameters

Parameters:
  InstanceTypeParameter:
    Type: String
    Default: t2.micro
    AllowedValues:
      - t2.micro
      - m1.small
      - m1.large
    Description: Enter t2.micro, m1.small, or m1.large. Default is t2.micro.

!GetAtt - The Fn::GetAtt intrinsic function returns the value of an attribute from a resource in the template. This example snippet returns a string containing the DNS name of the load balancer with the logical name myELB. YAML: !GetAtt myELB.DNSName JSON: "Fn::GetAtt" : [ "myELB" , "DNSName" ]
!Sub - The intrinsic function Fn::Sub substitutes variables in an input string with values that you specify. In your templates, you can use this function to construct commands or outputs that include values that aren’t available until you create or update a stack.
!Ref - The intrinsic function Ref returns the value of the specified parameter or resource.
!FindInMap - The intrinsic function Fn::FindInMap returns the value corresponding to keys in a two-level map that is declared in the Mappings section. For example, you can use this in the Mappings section that contains a single map, RegionMap, that associates AMIs with AWS regions.

AWS Backup

Fully managed service
Centrally manage and automate backups across AWS services.
No need to create custom scripts and manual processes
Supported services:
- Amazon EC2 / Amazon EBS
- Amazon S3
- Amazon RDS (all DBs engines) / Amazon Aurora / Amazon DynamoDB
- Amazon DocumentDB / Amazon Neptune
- Amazon EFS / Amazon FSx (Lustre & Windows File Server)
- AWS Storage Gateway (Volume Gateway)
Supports cross-region backups
Supports cross-account backups
On-Demand and Scheduled backups
Tag-based backup policies
You create backup policies known as Backup Plans
- Backup frequency (every 12 hours, daily, weekly, monthly, cron expression)
- Backup window
- Transition to Cold Storage (Never, Days, Weeks, Months, Years)
- Retention Period (Always, Days, Weeks, Months, Years)

`👀 QUESTION`

AWS Backup is a fully managed and cost-effective backup service that simplifies and automates data backup across AWS services including Amazon EBS, Amazon EC2, Amazon RDS, Amazon Aurora, Amazon DynamoDB, Amazon EFS, and AWS Storage Gateway. In addition, AWS Backup leverages AWS Organizations to implement and maintain a central view of backup policy across resources in a multi-account AWS environment. Customers simply tag and associate their AWS resources with backup policies managed by AWS Backup for Cross-Region data replication.

`👀 QUESTION` For the production account, a SysOps administrator must ensure that all data is backed up daily for all current and future Amazon EC2 instances and Amazon Elastic File System (Amazon EFS) file systems. Backups must be retained for 30 days.

Create a backup plan in AWS Backup. Assign resources by resource type (or by tag), selecting EC2 and EFS, so that existing and future resources are covered without editing the plan. Schedule the backup plan to run every day with a lifecycle policy to expire backups after 30 days.

Explanation: AWS Backup provides a centralized and automated solution for backing up data. Assigning resources by resource ID only covers the resources that exist when the plan is created, and editing the plan every day to add new ones is manual work that defeats the automation. Assigning by resource type (or by tag) means new EC2 instances and EFS file systems are picked up automatically. Scheduling the backup plan to run every day and configuring a lifecycle policy to expire backups after 30 days meets the requirement of daily backups with a retention period of 30 days.
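As a rough sketch, a plan that covers current and future resources can select by resource type with wildcard ARNs instead of listing resource IDs. The dictionaries below follow the request shapes of the AWS Backup CreateBackupPlan and CreateBackupSelection operations as I understand them; the plan name, selection name, vault name, and role ARN are hypothetical, and nothing is sent to AWS here.

```python
import json


def daily_backup_plan(vault_name, retention_days=30):
    """Build a backup plan document: one rule, run daily, expire after retention_days."""
    return {
        "BackupPlanName": "daily-ec2-efs",  # hypothetical name
        "Rules": [
            {
                "RuleName": "daily",
                "TargetBackupVaultName": vault_name,
                "ScheduleExpression": "cron(0 5 ? * * *)",  # every day at 05:00 UTC
                "Lifecycle": {"DeleteAfterDays": retention_days},
            }
        ],
    }


def all_ec2_and_efs_selection(role_arn):
    """Select by resource type so that future instances and file systems are included."""
    return {
        "SelectionName": "all-ec2-and-efs",  # hypothetical name
        "IamRoleArn": role_arn,
        "Resources": [
            "arn:aws:ec2:*:*:instance/*",
            "arn:aws:elasticfilesystem:*:*:file-system/*",
        ],
    }


plan = daily_backup_plan("Default")
selection = all_ec2_and_efs_selection("arn:aws:iam::111122223333:role/BackupRole")
print(json.dumps(plan, indent=2))
```

With boto3, these documents would typically be passed as the BackupPlan and BackupSelection arguments of the backup client.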

AWS Backup does not reboot EC2 instances - `👀 QUESTION`

AWS Backup does not reboot EC2 instances at any time. To maintain the file integrity of the images created, you have to apply the reboot parameter when taking the images.

Solution: create a Lambda function that calls the CreateImage API with the reboot parameter, then schedule the function to run daily via Amazon EventBridge (Amazon CloudWatch Events).
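A minimal sketch of such a function, assuming the instance ID arrives in a hypothetical `instance_id` field of the event. The EC2 client is passed in so the logic can be exercised without AWS; `NoReboot=False` is the CreateImage setting that lets EC2 reboot the instance before imaging.

```python
from datetime import datetime, timezone


def create_daily_image(ec2, instance_id, now=None):
    """Call CreateImage with a reboot so the file system is consistent in the AMI."""
    now = now or datetime.now(timezone.utc)
    name = f"{instance_id}-{now:%Y-%m-%d}"  # hypothetical naming scheme
    response = ec2.create_image(InstanceId=instance_id, Name=name, NoReboot=False)
    return response["ImageId"]


def lambda_handler(event, context):
    """Entry point for the scheduled EventBridge rule."""
    import boto3  # provided by the Lambda Python runtime

    return create_daily_image(boto3.client("ec2"), event["instance_id"])
```

The daily schedule itself lives in the EventBridge rule, not in the function.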

AWS Backup Vault Lock

Enforce a WORM (Write Once Read Many) state for all the backups that you store in your AWS Backup Vault
Additional layer of defense to protect your backups against:
- Inadvertent or malicious delete operations
- Updates that shorten or alter retention periods
Even the root user cannot delete backups when enabled

AWS Shared Responsibility Model

AWS responsibility - Security of the Cloud
- Protecting infrastructure (hardware, software, facilities, and networking) that runs all the AWS services
- Managed services like S3, DynamoDB, RDS, etc.
Customer responsibility - Security in the Cloud
- For EC2 instance, customer is responsible for management of the guest OS (including security patches and updates), firewall & network configuration, IAM
- Encrypting application data
Shared controls:
- Patch Management, Configuration Management, Awareness & Training

DDoS (Distributed Denial-of-service) Protection on AWS

AWS Shield Standard: protects against DDoS attacks for your websites and applications, for all customers at no additional cost.
AWS Shield Advanced: 24/7 premium DDoS protection.
AWS WAF: Filter specific requests based on rules.
CloudFront and Route 53:
- Availability protection using global edge network
- Combined with AWS Shield, provides attack mitigation at the edge
Be ready to scale - leverage AWS Auto Scaling.

AWS WAF - Web Application Firewall

Protects your web applications from common web exploits (Layer 7)
Layer 7 is HTTP (vs Layer 4 is TCP)
Deploy on Application Load Balancer, API Gateway, CloudFront
Define Web ACL (Web Access Control List):
- Rules can include: IP addresses, HTTP headers, HTTP body, or URI strings
- Protects from common attacks - SQL injection and Cross-Site Scripting (XSS).
- Size constraints, geo-match (block countries)
- Rate-based rules (to count occurrences of events) - for DDoS protection

Penetration Testing on AWS Cloud

AWS customers are welcome to carry out security assessments or penetration tests against their AWS infrastructure without prior approval for 8 services:
- Amazon EC2 instances, NAT Gateways, and Elastic Load Balancers
- Amazon RDS
- Amazon CloudFront
- Amazon Aurora
- Amazon API Gateways
- AWS Lambda and Lambda Edge functions
- Amazon Lightsail resources
- Amazon Elastic Beanstalk environments

Penetration Testing on your AWS Cloud

Prohibited Activities
- DNS zone walking via Amazon Route 53 Hosted Zones
- Denial of Service (DoS), Distributed Denial of Service (DDoS), Simulated DoS, Simulated DDoS
- Port flooding
- Protocol flooding
- Request flooding (login request flooding, API request flooding)
Read more: https://aws.amazon.com/security/penetration-testing/

Amazon Inspector: Security Compliance for EC2 and Software deployed on AWS.

AWS Inspector - 👀 EXAM

Amazon Inspector is used for security compliance of instances and applications deployed on AWS.
Amazon Inspector checks for unintended network accessibility of your Amazon EC2 instances and vulnerabilities on those EC2 instances.
Amazon Inspector also integrates with AWS Security Hub to provide a view of your security posture across multiple AWS accounts.

Amazon Inspector is an automated security assessment service that helps you test the network accessibility of your Amazon EC2 instances and the security state of your applications running on the instances.

An Amazon Inspector assessment report can be generated for an assessment run once it has been successfully completed. An assessment report is a document that details what is tested in the assessment run, and the results of the assessment. The results of your assessment are formatted into a standard report, which can be generated to share results within your team for remediation actions, to enrich compliance audit data, or to store for future reference.

You can select from two types of report for your assessment, a findings report or a full report. The findings report contains an executive summary of the assessment, the instances targeted, the rules packages tested, the rules that generated findings, and detailed information about each of these rules along with the list of instances that failed the check. The full report contains all the information in the findings report and additionally provides the list of rules that were checked and passed on all instances in the assessment target.

👀 Amazon Inspector

Automated Security Assessments
For EC2 instances
- Leveraging the AWS Systems Manager (SSM) agent
- Analyze against unintended network accessibility
- Analyze the running OS against known vulnerabilities
For Container Images pushed to Amazon ECR
- Assessment of Container Images as they are pushed to Amazon ECR
For Lambda Functions
- Identifies software vulnerabilities in function code and package dependencies
- Assessment of functions as they are deployed

-> Reporting & integration with AWS Security Hub, to provide a view of your security posture across multiple AWS accounts.
-> Send findings to Amazon EventBridge

AWS Security Hub - TODO

Supports automated security checks aligned to the Center for Internet Security’s (CIS) AWS Foundations Benchmark version 1.4.0 requirements for Level 1 and 2 (CIS v1.4.0).

CIS AWS Foundations Benchmark - TODO

Serves as a set of security configuration best practices for AWS. These industry-accepted best practices provide you with clear, step-by-step implementation and assessment procedures. Ranging from operating systems to cloud services and network devices, the controls in this benchmark help you protect the specific systems that your organization uses.

https://docs.aws.amazon.com/securityhub/latest/userguide/cis-aws-foundations-benchmark.html
https://aws.amazon.com/about-aws/whats-new/2022/11/security-hub-center-internet-securitys-cis-foundations-benchmark-version-1-4-0/

What does Amazon Inspector evaluate?

Only for EC2 instances, Container Images & Lambda functions - EXAM
Continuous scanning of the infrastructure, only when needed
Package vulnerabilities (EC2, ECR & Lambda) - database of CVE
Network reachability (EC2)
A risk score is associated with all vulnerabilities for prioritization

Amazon Inspector discovers potential security issues by using security rules to analyze AWS resources. Amazon Inspector also integrates with AWS Security Hub to provide a view of your security posture across multiple AWS accounts.

Amazon GuardDuty - 👀 EXAM

Intelligent Threat discovery to protect your AWS Account.
Uses Machine Learning algorithms, anomaly detection, 3rd party data
One click to enable (30 days trial), no need to install software
Input data includes:
- CloudTrail Event Logs - unusual API calls, unauthorized deployments
  - CloudTrail Management Events - create VPC subnet, create trail, …
  - CloudTrail S3 Data Events - get object, list objects, delete object, …
- VPC Flow Logs - unusual internal traffic, unusual IP address
- DNS Logs - compromised EC2 instances sending encoded data within DNS queries
- Optional Feature - EKS Audit Logs, RDS & Aurora, EBS, Lambda, S3 Data Events…
Can setup EventBridge rules to be notified in case of findings
EventBridge rules can target AWS Lambda or SNS
Can protect against CryptoCurrency attacks (has a dedicated “finding” for it) - 👀 EXAM
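As an illustration, an EventBridge rule for GuardDuty findings is driven by an event pattern like the one below. The `source` and `detail-type` values are the ones GuardDuty uses for findings; the prefix filter on cryptocurrency finding types and the tiny local matcher are assumptions added for this sketch, and the matcher only handles this one pattern shape.

```python
import json

# Event pattern for an EventBridge rule that reacts to cryptocurrency-related findings.
PATTERN = {
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Finding"],
    "detail": {"type": [{"prefix": "CryptoCurrency:"}]},
}


def matches(event):
    """Local check of the pattern above (not a full EventBridge implementation)."""
    finding_type = event.get("detail", {}).get("type", "")
    return (
        event.get("source") in PATTERN["source"]
        and event.get("detail-type") in PATTERN["detail-type"]
        and any(finding_type.startswith(m["prefix"]) for m in PATTERN["detail"]["type"])
    )


sample = {
    "source": "aws.guardduty",
    "detail-type": "GuardDuty Finding",
    "detail": {"type": "CryptoCurrency:EC2/BitcoinTool.B!DNS"},
}
print(json.dumps(PATTERN))
print(matches(sample))
```

The rule's target would then be an SNS topic or a Lambda function, as noted above.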

AWS Macie

Amazon Macie is a fully managed data security and data privacy service that uses machine learning and pattern matching to discover and protect your sensitive data in AWS.
Macie helps identify and alert you to sensitive data, such as personally identifiable information (PII).
Notify to Amazon EventBridge => Integrations

`QUESTION` AWS Trusted Advisor

Trusted Advisor provides real-time guidance to help users follow AWS best practices when provisioning their resources. It is a high-level account assessment.

E.g. AWS Trusted Advisor checks for service usage that is more than 80% of the service limit.

Check categories - 👀 EXAM

Cost optimization
Performance
Security
Fault tolerance
Service limits

Trusted Advisor - Support Plans - Exam

7 CORE CHECKS for Basic & Developer Support plan
- S3 Bucket Permissions
- Security Groups - Specific Ports Unrestricted
- IAM Use (one IAM user minimum)
- MFA on Root Account
- EBS Public Snapshots
- RDS Public Snapshots
- Service Limits
FULL CHECKS
- Full Checks available on the 5 categories
- Ability to set CloudWatch alarms when reaching limits
- Programmatic Access using AWS Support API - Exam

AWS KMS (Key Management Service)

Anytime you hear “encryption” for an AWS service, it’s most likely KMS
AWS manages encryption keys for us
Fully integrated with IAM for authorization
Easy way to control access to your data
Able to audit KMS Key usage using CloudTrail - Exam
Seamlessly integrated into most AWS services (EBS, S3, RDS, SSM…)
Never ever store your secrets in plaintext, especially in your code!
KMS Key Encryption also available through API calls (SDK, CLI)
Encrypted secrets can be stored in the code / environment variables

KMS Automatic Key Rotation

For Customer-managed CMK (not AWS managed CMK)
If enabled: automatic key rotation happens every 1 year.
Previous key is kept active so you can decrypt old data
New Key has the same CMK ID (only the backing key is changed)

KMS Manual Key Rotation

When you want to rotate key every 90 days, 180 days, etc...
New Key has a different CMK ID
Keep the previous key active so you can decrypt old data
Better to use aliases in this case (to hide the change of key for the application)
Good solution to rotate CMK that are not eligible for automatic rotation (like asymmetric CMK)
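A sketch of manual rotation behind an alias, assuming a boto3-style KMS client is passed in. `create_key` and `update_alias` are the KMS operations involved; the function name and alias are hypothetical. The application keeps referring to the alias, so it never sees the key ID change.

```python
def rotate_manually(kms, alias_name, description="manually rotated key"):
    """Create a new key and repoint the alias to it; the old key stays for decryption."""
    new_key_id = kms.create_key(Description=description)["KeyMetadata"]["KeyId"]
    kms.update_alias(AliasName=alias_name, TargetKeyId=new_key_id)
    return new_key_id
```

Usage would look like `rotate_manually(boto3.client("kms"), "alias/my-app")`, run on whatever schedule (90 days, 180 days, ...) is required. The previous key is deliberately left untouched so that data encrypted under it can still be decrypted.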

Changing The KMS Key For An Encrypted EBS Volume

You can’t change the encryption key used by an EBS volume.
Instead, create an EBS snapshot, then create a new EBS volume from it and specify the new KMS key.

You can share RDS DB snapshots encrypted with KMS CMK with other accounts, but must first share the KMS CMK with the target account using Key Policy.

KMS Key Deletion Considerations

Schedule CMK for deletion with a waiting period of 7 to 30 days
CMK’s status is “Pending deletion” during the waiting period
During the CMK’s deletion waiting period:
- The CMK can’t be used for cryptographic operations (e.g., can’t decrypt KMS-encrypted objects in S3 - SSE-KMS)
- The key is not rotated even if planned
- You can cancel the key deletion during the waiting period
Consider disabling your key instead of deleting it if you’re not sure!

👀

You can allow IAM users or roles in one AWS account to use a customer master key (CMK) in a different AWS account. You can add these permissions when you create the CMK or change the permissions for an existing CMK.

To permit the usage of a CMK to users and roles in another account, you must use two different types of policies:

The key policy for the CMK must give the external account (or users and roles in the external account) permission to use the CMK. The key policy is in the account that owns the CMK.
IAM policies in the external account must delegate the key policy permissions to its users and roles. These policies are set in the external account and give permissions to users and roles in that account.
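The two policies can be sketched as plain JSON documents. The account ID, key ARN, and the exact list of actions are illustrative assumptions; adjust the actions to what the external principals actually need.

```python
import json

# Typical "use the key" permissions (an assumption for this sketch, not an exhaustive list).
USE_ACTIONS = [
    "kms:Encrypt",
    "kms:Decrypt",
    "kms:ReEncrypt*",
    "kms:GenerateDataKey*",
    "kms:DescribeKey",
]


def key_policy_statement(external_account_id):
    """Statement added to the key policy in the account that owns the CMK."""
    return {
        "Sid": "AllowUseByExternalAccount",
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{external_account_id}:root"},
        "Action": USE_ACTIONS,
        "Resource": "*",
    }


def external_iam_policy(key_arn):
    """IAM policy attached to users or roles in the external account."""
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": USE_ACTIONS, "Resource": key_arn}],
    }


statement = key_policy_statement("444455556666")
policy = external_iam_policy("arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID")
print(json.dumps(statement, indent=2))
```

Both halves are needed: the key policy alone does not give the external users access, and the IAM policy alone has nothing to delegate.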

CloudHSM - Dedicated Hardware (HSM = Hardware Security Module)

KMS => AWS manages the software for encryption.
CloudHSM => AWS provisions encryption hardware.
CUSTOMER MANAGED CMK.
You manage your own encryption keys entirely (not AWS)
HSM device is tamper resistant, FIPS 140-2 Level 3 compliance.
Supports both symmetric and asymmetric encryption (SSL/TLS keys)
No free tier available
Must use the CloudHSM Client Software
Redshift supports CloudHSM for database encryption and key management
Good option to use with SSE-C encryption.

CloudHSM - High Availability

CloudHSM clusters are spread across Multi AZ (HA)
Great for availability and durability

👀 `AWS Artifact` (Not really a service) - 👀 EXAM

AWS Artifact provides on-demand access to AWS compliance reports and other relevant documents.

Artifact Reports - Allows you to download AWS security and compliance documents from third-party auditors, like AWS ISO certifications, Payment Card Industry (PCI), and System and Organization Control (SOC) reports
Artifact Agreements - Allows you to review, accept, and track the status of AWS agreements such as the Business Associate Addendum (BAA) for the Health Insurance Portability and Accountability Act (HIPAA), for an individual account or in your organization

Can be used to support internal audit or compliance.

AWS Secrets Manager

Newer service, meant for storing secrets
Capability to force rotation of secrets every X days
Automate generation of secrets on rotation (uses Lambda)
Integration with Amazon RDS (MySQL, PostgreSQL, Aurora)
Secrets are encrypted using KMS
Mostly meant for RDS integration

AWS Secrets Manager - Multi-Region Secrets

Replicate Secrets across multiple AWS Regions
Secrets Manager keeps read replicas in sync with the primary Secret
Ability to promote a read replica Secret to a standalone Secret
Use cases: multi-region apps, disaster recovery strategies, multi-region DB…

Secrets Manager - Monitoring

CloudTrail captures API calls to the Secrets Manager API
CloudTrail captures other related events that might have a security or compliance impact on your AWS account or might help you troubleshoot operational problems.
CloudTrail records these events as non-API service events:
- RotationStarted event
- RotationSucceeded event
- RotationFailed event
- RotationAbandoned event - a manual change to a secret instead of automated rotation
- StartSecretVersionDelete event
- CancelSecretVersionDelete event
- EndSecretVersionDelete event
Combine with CloudWatch Logs and CloudWatch alarms for automations.

SSM Parameter Store vs Secrets Manager

Secrets Manager ($$$):
- Automatic rotation
- Lambda function is provided for RDS, Redshift, DocumentDB
- KMS encryption is mandatory.
- Integration with CloudFormation
SSM Parameter Store ($)
- Simple API
- No secret rotation
- KMS enc optional
- Integration with CloudFormation
- Pull secrets from SSM Parameter Store

SSM - Patch Manager

Patch Baseline
- Defines which patches should and shouldn’t be installed on your instances
- Ability to create custom Patch Baselines (specify approved/rejected patches)
- Patches can be auto-approved within days of their release
- By default, install only critical patches and patches related to security
Patch Group
- Associate a set of instances with a specific Patch Baseline
- Example: create Patch Groups for different environments (dev, test, prod)
- Instances should be defined with the tag key Patch Group.
- An instance can only be in one Patch Group
- Patch Group can be registered with only one Patch Baseline

SSM - Patch Manager Patch Baselines

Pre-Defined Patch Baseline
- Managed by AWS for different Operating Systems (can’t be modified)
- AWS-RunPatchBaseline (SSM Document) - apply both operating system and application patches (Linux, macOS, Windows Server)
Custom Patch Baseline
- Create your own Patch Baseline and choose which patches to auto-approve
- Operating System, allowed patches, rejected patches, …
- Ability to specify custom and alternative patch repositories

EBS

QUESTION

Use separate Amazon EBS volumes for the operating system and your data, even though root volume persistence feature is available.
EBS snapshots only capture data that has been written to your Amazon EBS volume, which might exclude any data that has been locally cached by your application or operating system.
By default, data on a non-root EBS volume is preserved even if the instance is shut down or terminated. By default, when you attach a non-root EBS volume to an instance, its DeleteOnTermination attribute is set to false. Therefore, the default is to preserve these volumes. After the instance terminates, you can take a snapshot of the preserved volume or attach it to another instance. You must delete a volume to avoid incurring further charges.

`QUESTION.` To set up a backup strategy for an Amazon Elastic Block Store (Amazon EBS) volume storing a custom database on an Amazon EC2 instance, the following action should be taken:

Create an Amazon Data Lifecycle Manager (Amazon DLM) policy to take a snapshot of the EBS volume on a recurring schedule.

Explanation: Amazon Data Lifecycle Manager (Amazon DLM) allows you to create automated snapshot lifecycle policies for your Amazon EBS volumes. By creating an Amazon DLM policy, you can define the desired backup schedule and retention period for the EBS volume. The policy will then automatically create snapshots according to the defined schedule. This ensures that regular backups are taken and can be used for data recovery if needed.
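For illustration, the policy details of such a DLM policy could look like the document below. The field names follow the DLM PolicyDetails structure as I understand it; the tag, schedule name, start time, and retention count are hypothetical choices, and nothing is sent to AWS here.

```python
import json


def daily_snapshot_policy(tag_key, tag_value, retain_count=7):
    """Build DLM policy details: snapshot tagged volumes every 24 hours, keep the last N."""
    return {
        "ResourceTypes": ["VOLUME"],
        "TargetTags": [{"Key": tag_key, "Value": tag_value}],
        "Schedules": [
            {
                "Name": "daily-snapshots",  # hypothetical name
                "CreateRule": {"Interval": 24, "IntervalUnit": "HOURS", "Times": ["03:00"]},
                "RetainRule": {"Count": retain_count},
                "CopyTags": True,
            }
        ],
    }


details = daily_snapshot_policy("Backup", "custom-db")
print(json.dumps(details, indent=2))
```

The policy targets volumes by tag, so the EBS volume holding the database needs the matching tag (here the assumed `Backup=custom-db`).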

EBS volumes deleted with the `TerminateInstances` API call continue to show for some time on AWS Config console - 👀 EXAM

Terminated Amazon EC2 instances use the DeleteOnTermination attribute of each attached EBS volume to determine whether to delete the volume. Amazon EC2 deletes an Amazon EBS volume that has the DeleteOnTermination attribute set to true, but it does not publish the DeleteVolume API call. Because AWS Config uses the DeleteVolume API call as the trigger for the rule, the resource change isn’t recorded for the EBS volume, and the volume still shows as compliant or noncompliant. AWS Config performs a baseline every six hours to check for new configuration items with the ResourceDeleted status. The AWS Config rule then removes the deleted EBS volumes from the evaluation results.

Amazon EBS volumes deleted using the DeleteVolume API call invoke a DescribeVolumes API call on volume. The DescribeVolumes API call returns an InvalidVolume.NotFound error code and the Amazon EBS volume is removed from the list of resources in AWS Config

SSD-backed volumes (IOPS-intensive)

EBS Volume Types

EBS Volumes come in 6 types

gp2 / gp3 (SSD): General purpose SSD volume that balances price and performance for a wide variety of workloads.
io1 / io2 (SSD): Highest-performance SSD volume for mission-critical low-latency or high-throughput workloads. Supports Multi-Attach to up to 16 instances at a time
st1 (HDD): Low cost HDD volume designed for frequently accessed, throughput intensive workloads
sc1 (HDD): Lowest cost HDD volume designed for less frequently accessed workloads

AWS EBS Types

EBS Volumes are characterized in Size | Throughput | IOPS (I/O Ops Per Sec).

Only gp2/gp3 and io1/io2 can be used as boot volumes

EBS-optimized instance uses an optimized configuration stack and provides additional, dedicated capacity for Amazon EBS I/O. This optimization provides the best performance for your EBS volumes by minimizing contention between Amazon EBS I/O and other traffic from your instance.

Provisioned IOPS SSD (io2 Block Express, io2 & `io1`) volumes

Provisioned IOPS SSD volumes are designed to deliver a maximum of 256,000 IOPS, 4,000 MB/s of throughput, and 64 TiB in size per volume. io2 Block Express is the latest generation of the Provisioned IOPS SSD volumes that delivers 4x higher throughput, IOPS, and capacity than regular io2 volumes, along with sub-millisecond latency - at the same price as io2. io2 Block Express provides the highest block storage performance for the largest, most I/O-intensive, mission-critical deployments of Oracle, Microsoft SQL Server, SAP HANA, and SAS Analytics.

General purpose SSD (`gp3 and gp2`) volumes

General-purpose volumes are backed by solid-state drives (SSDs) and are suitable for a broad range of transactional workloads, virtual desktops, medium sized single instance databases, latency sensitive interactive applications, dev/test environments, and boot volumes.

HDD-backed volumes (MB/s-intensive)

Throughput optimized HDD (`st1`) volumes

ST1 is backed by hard disk drives (HDDs) and is ideal for frequently accessed, throughput intensive workloads with large datasets and large I/O sizes, such as MapReduce, Kafka, log processing, data warehouse, and ETL workloads.

Cold HDD (`sc1`) volumes

SC1 is backed by hard disk drives (HDDs) and provides the lowest cost per GB of all EBS volume types. It is ideal for less frequently accessed workloads with large, cold datasets.

Changing the instance type

The possibility to resize an instance depends on whether the root device is an EBS volume. If it is, you can easily change the instance size by modifying its instance type, also known as resizing. However, if the root device is an instance store volume, you need to migrate your application to a new instance with the desired instance type.
Before changing the instance type of your Amazon EBS-backed instance, you must stop it. AWS will then move the instance to new hardware, but the instance ID will remain the same.
If your instance belongs to an Auto Scaling group, the Amazon EC2 Auto Scaling service considers the stopped instance as unhealthy and may terminate it, launching a replacement instance instead. To avoid this, you can temporarily suspend the scaling processes for the group while resizing your instance.
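The stop / modify / start sequence for an EBS-backed instance can be sketched as follows, assuming a boto3-style EC2 client is passed in. The function name is hypothetical, and the sketch does not suspend Auto Scaling processes, so that step would have to happen beforehand for instances in an Auto Scaling group.

```python
def resize_instance(ec2, instance_id, new_type):
    """Stop the instance, change its type, and start it again (instance ID is kept)."""
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # The instance type can only be changed while the instance is stopped.
    ec2.modify_instance_attribute(InstanceId=instance_id, InstanceType={"Value": new_type})

    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
```

Usage would look like `resize_instance(boto3.client("ec2"), "i-0123456789abcdef0", "m5.large")`, where both the instance ID and the target type are placeholders.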

Reference: Amazon EBS features

S3

`QUESTION` S3 inventory

Is one of the tools Amazon S3 provides to help manage your storage. You can use it to audit and report on the replication and encryption status of your objects for business, compliance, and regulatory needs. You can also simplify and speed up business workflows and big data jobs using Amazon S3 inventory, which provides a scheduled alternative to the Amazon S3 synchronous List API operation.

Retain Until Date

A retention period safeguards an object version for a specified duration. When a retention period is assigned to an object version, Amazon S3 records a timestamp in the object version’s metadata, indicating when the retention period concludes. Once the retention period ends, the object version can be overwritten or deleted, unless a legal hold has also been placed on it.

You can assign a retention period to an object version either explicitly or through a default setting at the bucket level. Explicitly applying a retention period involves specifying a “Retain Until Date” for the object version. Amazon S3 stores this setting in the object version’s metadata and ensures the protection of the object version until the retention period expires.
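A small sketch of how a Retain Until Date could be computed and checked. The `Mode` / `RetainUntilDate` shape mirrors the S3 Object Lock retention setting; the GOVERNANCE mode name is not from these notes, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def retention_for(days, mode="GOVERNANCE", now=None):
    """Build an explicit retention setting that ends `days` from now."""
    now = now or datetime.now(timezone.utc)
    return {"Mode": mode, "RetainUntilDate": now + timedelta(days=days)}


def is_protected(retention, at, legal_hold=False):
    """An object version is protected until the date passes, or while a legal hold is set."""
    return legal_hold or at < retention["RetainUntilDate"]


start = datetime(2023, 5, 15, tzinfo=timezone.utc)
retention = retention_for(30, now=start)
print(retention["RetainUntilDate"].isoformat())
```

The `legal_hold` flag reflects the point above that a legal hold keeps protecting the version even after the retention period ends.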

S3 RTC (Replication Time Control)

S3 Replication Time Control (S3 RTC) helps you meet compliance or business requirements for data replication and provides visibility into Amazon S3 replication times. S3 RTC replicates most objects that you upload to Amazon S3 in seconds, and 99.99 percent of those objects within 15 minutes. Amazon S3 events are available through Amazon SQS, Amazon SNS, or AWS Lambda.

Reference: Using S3 Replication Time Control

Amazon S3 Access Points

Allow you to create unique entry points for accessing your S3 buckets. Each access point can have its own access policy, allowing you to control access at a granular level. By using access points, you can assign specific permissions to each application or team accessing the shared bucket without affecting other applications. This helps in maintaining access control and minimizing the risk of unintended changes to the bucket policy.

Reference: Amazon S3 Access Points

Logs

S3 Server Access Logging

To track requests for access to your bucket, you can enable server access logging. Each access log record provides details about a single access request, such as the requester, bucket name, request time, request action, response status, and an error code, if relevant.

There is no extra charge for enabling server access logging on an Amazon S3 bucket, and you are not charged when the logs are PUT to your bucket. However, any log files that the system delivers to your bucket accrue the usual charges for storage. You can delete these log files at any time. Subsequent reads and other requests to these log files are charged normally, as for any other object, including data transfer charges.

By default, logging is disabled. When logging is enabled, logs are saved to a bucket in the same AWS Region as the source bucket.

Reference: Using Amazon S3 access logs to identify requests

As a Website

If you use an Amazon S3 bucket configured as a website endpoint, you must set it up with CloudFront as a custom origin. You can’t use the origin access identity feature. However, you can restrict access to content on a custom origin by setting up custom headers and configuring your origin to require them.

Enable MFA-Delete

Note that only the bucket owner (root account) can enable MFA Delete, and only via the AWS CLI or API, not the console. However, the bucket owner (the AWS account that created the bucket, i.e. the root account) and all authorized IAM users can enable versioning.

Vault Lock Policy

S3 Glacier Vault Lock allows you to easily deploy and enforce compliance controls for individual S3 Glacier vaults with a vault lock policy. You can specify controls such as “write once read many” (WORM) in a vault lock policy and lock the policy from future edits. Once locked, the policy can no longer be changed.

`QUESTION` Snowball Edge

AWS Snowball is a service designed for large-scale data transfers. Snowball Edge appliances are rugged, petabyte-scale data transfer devices that can be used for offline data migration. By using multiple instances of the AWS Snowball client and multiple Snowball Edge Appliances, the company can achieve fast and cost-effective data migration.

Using multiple instances of the AWS Snowball client and Snowball Edge Appliances allows for parallel data transfers, significantly reducing the migration time. It also avoids the need for network-based transfers, which could be slower and potentially costly due to data transfer charges.

Amazon S3 Storage Classes

Amazon S3 Standard-Infrequent Access (S3 Standard-IA)

S3 Standard-IA is for data that is accessed less frequently, but requires rapid access when needed. S3 Standard-IA offers the high durability, high throughput, and low latency of S3 Standard, with a low per GB storage price and per GB retrieval charge.

`QUESTION` Access to Amazon S3 via Direct Connect

It’s not possible to directly access an S3 bucket through a private virtual interface (VIF) using Direct Connect. This is true even if you have an Amazon Virtual Private Cloud (Amazon VPC) endpoint for Amazon S3 in your VPC because VPC endpoint connections can't extend outside of a VPC. Additionally, Amazon S3 resolves to public IP addresses, even if you enable a VPC endpoint for Amazon S3.

However, you can establish access to Amazon S3 using Direct Connect by following these steps (This configuration doesn’t require a VPC endpoint for Amazon S3, because traffic doesn’t traverse the VPC):

Create a connection. You can request a dedicated connection or a hosted connection.
Establish a cross-network connection with the help of your network provider, and then create a public virtual interface for your connection.
Configure an end router for use with the public virtual interface.

After the BGP is up and established, the Direct Connect router advertises all global public IP prefixes, including Amazon S3 prefixes. Traffic heading to Amazon S3 is routed through the Direct Connect public virtual interface through a private network connection between AWS and your data center or corporate network.

Amazon S3 - Security 👀

User-Based
- IAM Policies - which API calls should be allowed for a specific user from IAM
Resource-Based
- Bucket Policies - bucket wide rules from the S3 console - allows cross account
- Object Access Control List (ACL) - finer grain (can be disabled)
- Bucket Access Control List (ACL) - less common (can be disabled)
Note: an IAM principal can access an S3 object if
- The user IAM permissions ALLOW it OR the resource policy ALLOWS it
- AND there’s no explicit DENY
Encryption: encrypt objects in Amazon S3 using encryption keys

S3 Bucket Policies 👀 EXAM

Use S3 bucket policies to:
- Grant public access to the bucket
- Force objects to be encrypted at upload
- Grant access to another account (Cross Account)
Optional Conditions on:
- Public IP or Elastic IP (not on Private IP)
- Source VPC or Source VPC Endpoint - only works with VPC Endpoints
- CloudFront Origin Access Identity
- MFA
Examples here: https://docs.aws.amazon.com/AmazonS3/latest/dev/example-bucket-policies.html

S3 Performance 👀 EXAM

Multi-Part upload:
- recommended for files > 100MB, must use for files > 5GB
- Can help parallelize uploads (speed up transfers)
S3 Transfer Acceleration
- Increase transfer speed by transferring file to an AWS edge location which will forward the data to the S3 bucket in the target region
- Compatible with multi-part upload
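
A minimal sketch of how a part size could be chosen for a multipart upload. The 100 MB threshold comes from the notes above; the 5 MiB minimum part size and the 10,000-part maximum are S3 limits as I understand them and are labeled as assumptions in the code. The function name `plan_multipart` and the 8 MiB preferred part size are hypothetical.

```python
import math

MIB = 1024 * 1024
GIB = 1024 * MIB

RECOMMENDED_THRESHOLD = 100 * MIB  # notes: multipart recommended above 100 MB
MIN_PART_SIZE = 5 * MIB            # assumption: S3 minimum part size (last part may be smaller)
MAX_PARTS = 10_000                 # assumption: S3 maximum number of parts per upload


def plan_multipart(size_bytes, preferred_part_size=8 * MIB):
    """Return (use_multipart, part_size, part_count) for an object of size_bytes."""
    if size_bytes <= RECOMMENDED_THRESHOLD:
        # Small object: a single PUT is enough.
        return (False, size_bytes, 1)
    # Grow the part size when the preferred one would need more than MAX_PARTS parts.
    part_size = max(preferred_part_size, MIN_PART_SIZE,
                    math.ceil(size_bytes / MAX_PARTS))
    part_count = math.ceil(size_bytes / part_size)
    return (True, part_size, part_count)
```

Each part can then be uploaded in parallel, which is where the speed-up comes from.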

S3 Batch Operations

E.g., encrypt unencrypted objects.

You can use S3 Inventory to get the object list and S3 Select to filter the objects.

S3 Inventory 👀 EXAM

List objects and their corresponding metadata (alternative to S3 List API operation)
Usage examples:
- Audit and report on the replication and encryption status of your objects
- Get the number of objects in an S3 bucket
- Identify the total storage of previous object versions
Generate daily or weekly reports
Output files: CSV, ORC, or Apache Parquet
You can query all the data using Amazon Athena, Redshift, Presto, Hive, Spark...
You can filter generated report using S3 Select
Use cases: Business, Compliance, Regulatory needs, …

Amazon S3 Analytics - Storage Class Analysis 👀 EXAM

Help you decide when to transition objects to the right storage class
Recommendations for Standard and Standard IA
Does NOT work for One-Zone IA or Glacier
Report is updated daily

24 to 48 hours to start seeing data analysis

Good first step to put together Lifecycle Rules

Amazon S3 Glacier - Vault Policies & Vault Lock 👀 EXAM

Each Vault has:
- ONE vault access policy
- ONE vault lock policy
Vault Policies are written in JSON
Vault Access Policy is like a bucket policy (restrict user / account permissions)
Vault Lock Policy is a policy you lock, for regulatory and compliance requirements.
- The policy is immutable, it can never be changed (that’s why it’s called LOCK)
- Example 1: forbid deleting an archive if less than 1 year old
- Example 2: implement WORM policy (write once read many)

Glacier - Notifications for Restore Operations

S3 Event Notifications

S3 supports the restoration of objects archived to S3 Glacier storage classes
s3:ObjectRestore:Post => notify when object restoration initiated
s3:ObjectRestore:Completed => notify when object restoration completed

Amazon S3 - Object Encryption

You can encrypt objects in S3 buckets using one of 4 methods

Server-Side Encryption (SSE)

Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3) - Enabled by Default
- Encrypts S3 objects using keys handled, managed, and owned by AWS
- Must set header "x-amz-server-side-encryption": "AES256"
Server-Side Encryption with KMS Keys stored in AWS KMS (SSE-KMS)
- Leverage AWS Key Management Service (AWS KMS) to manage encryption keys
- Must set header "x-amz-server-side-encryption": "aws:kms"
Server-Side Encryption with Customer-Provided Keys (SSE-C)
- When you want to manage your own encryption keys
- HTTPS must be used
Client-Side Encryption

Amazon S3 - Force Encryption in Transit aws:SecureTransport
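
A minimal sketch of a bucket policy that forces encryption in transit, built as a Python dictionary so it can be printed as JSON. It denies every S3 action when aws:SecureTransport is false (i.e. the request came over plain HTTP). The bucket name `example-bucket` and the helper name are hypothetical.

```python
import json


def deny_insecure_transport_policy(bucket_name):
    """Build a bucket policy that denies any request not sent over HTTPS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and the objects inside it.
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


policy = deny_insecure_transport_policy("example-bucket")  # hypothetical bucket name
print(json.dumps(policy, indent=2))
```

Because this is an explicit Deny, it takes precedence over any Allow granted elsewhere.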

CORS

Cross-origin resource sharing (CORS) defines a way for client web applications that are loaded in one domain to interact with resources in a different domain

Cross-Origin Resource Sharing (CORS)
Origin = scheme (protocol) + host (domain) + port
- example: https://www.example.com (implied port is 443 for HTTPS, 80 for HTTP)
Web Browser based mechanism to allow requests to other origins while visiting the main origin
Same origin: http://example.com/app1 & http://example.com/app2
Different origins: http://www.example.com & http://other.example.com
The requests won’t be fulfilled unless the other origin allows for the requests, using CORS Headers (example: **Access-Control-Allow-Origin**)
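
The origin definition above (scheme + host + port) can be sketched in a few lines of Python. This only illustrates how two URLs are compared; it is not how a browser or S3 implements CORS, and the helper names are hypothetical.

```python
from urllib.parse import urlsplit

# Implied ports when the URL does not specify one.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url):
    """Return the (scheme, host, port) triple that identifies a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, parts.hostname, port)


def same_origin(url_a, url_b):
    """Two URLs share an origin only if scheme, host and port all match."""
    return origin_of(url_a) == origin_of(url_b)
```

With these helpers, the two paths on http://example.com compare as the same origin, while www.example.com and other.example.com do not.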

👀 EXAM question

If a client makes a cross-origin request on our S3 bucket, we need to enable the correct CORS headers. You can allow for a specific origin or for * (all origins)

Amazon S3 - MFA Delete

MFA (Multi-Factor Authentication) - force users to generate a code on a device (usually a mobile phone or hardware) before doing important operations on S3
MFA will be required to:
- Permanently delete an object version
- Suspend Versioning on the bucket
MFA won’t be required to:
- Enable Versioning
- List deleted versions
To use MFA Delete, Versioning must be enabled on the bucket
Only the bucket owner (root account) can enable/disable MFA Delete

Allow access if users are MFA authenticated

Use an MFA condition in a policy to check the following properties:

Existence: To simply verify that the user did authenticate with MFA, check that the aws:MultiFactorAuthPresent key is True in a Bool condition. The key is only present when the user authenticates with short-term credentials. Long-term credentials, such as access keys, do not include this key.
Duration: If you want to grant access only within a specified time after MFA authentication, use a numeric condition type to compare the aws:MultiFactorAuthAge key’s age to a value (such as 3600 seconds). Note that the aws:MultiFactorAuthAge key is not present if MFA was not used.

"Condition": {
  "Bool": {
    "aws:MultiFactorAuthPresent": "true"
  }
}

S3 Access Logs

For audit purpose, you may want to log all access to S3 buckets
Any request made to S3, from any account, authorized or denied, will be logged into another S3 bucket
That data can be analyzed using data analysis tools…
The target logging bucket must be in the same AWS region

Amazon Athena

Serverless query service to analyze data stored in Amazon S3
Uses standard SQL language to query the files (built on Presto)
Supports CSV, JSON, ORC, Avro, and Parquet
Pricing: $5.00 per TB of data scanned
Commonly used with Amazon Quicksight for reporting/dashboards
Use cases: Business intelligence / analytics / reporting, analyze & query VPC Flow Logs, ELB Logs, CloudTrail trails, etc…
Exam Tip: analyze data in S3 using serverless SQL, use Athena

Amazon Athena - Performance Improvement

Use columnar data for cost-savings (less scan).
Compress data for smaller retrievals (bzip2, gzip, lz4, snappy, zlib, zstd…).
Partition datasets in S3 for easy querying on virtual columns.
Use larger files (> 128 MB) to minimize overhead.
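
Since Athena is priced per TB scanned, the tips above translate directly into cost. A rough sketch, using the $5.00 per TB figure from these notes; treating 1 TB as 2**40 bytes and the 10% figure for the columnar case are assumptions made only for illustration.

```python
TB = 1024 ** 4  # assumption: 1 TB treated as 2**40 bytes for this estimate


def athena_query_cost(bytes_scanned, price_per_tb=5.00):
    """Estimate the cost in USD of a query that scans bytes_scanned bytes."""
    return bytes_scanned / TB * price_per_tb


# Row-oriented data: the query has to scan the whole 2 TB dataset.
full_scan = athena_query_cost(2 * TB)

# Columnar data: hypothetical case where the query only reads 10% of the bytes.
columnar_scan = athena_query_cost(2 * TB * 0.1)

print(f"full scan: ${full_scan:.2f}, columnar: ${columnar_scan:.2f}")
```

Partitioning and compression reduce the scanned bytes in the same way, so the same estimate applies.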

Amazon Athena - Federated Query

Allows you to run SQL queries across data stored in relational, non-relational, object, and custom data sources (AWS or on-premises)

Uses Data Source Connectors that run on AWS Lambda to run Federated Queries (e.g., CloudWatch Logs, DynamoDB, RDS, …)

Store the results back in Amazon S3

AWS OpsHub

AWS OpsHub for Snow Family is an application that you can use to manage your devices and local AWS services. You use AWS OpsHub on a client computer to perform tasks such as unlocking and configuring single or clustered devices, transferring files, and launching and managing instances running on Snow Family Devices. You can use AWS OpsHub to manage both the Storage Optimized and Compute Optimized device types and the Snow device. The AWS OpsHub application is available at no additional cost to you.

AWS OpsHub takes all the existing operations available in the Snowball API and presents them as a graphical user interface. This interface helps you quickly migrate data to the AWS Cloud and deploy edge computing applications on Snow Family Devices.

When your Snow device arrives at your site, you download, install, and launch the AWS OpsHub application on a client machine, such as a laptop. After installation, you can unlock the device and start managing it and using supported AWS services locally. AWS OpsHub provides a dashboard that summarizes key metrics such as storage capacity and active instances on your device. It also provides a selection of AWS services that are supported on the Snow Family Devices. Within minutes, you can begin transferring files to the device.

Amazon FSx - Overview

Launch 3rd party high-performance file systems on AWS

FSx for Lustre

Lustre is a type of parallel distributed file system, for large-scale computing. The name Lustre is derived from “Linux” and “cluster”.

Machine Learning, High Performance Computing (HPC)
Video Processing, Financial Modeling, Electronic Design Automation
Seamless integration with S3
- Can “read S3” as a file system (through FSx)
- Can write the output of the computations back to S3 (through FSx)
Can be used from on-premises servers (VPN or Direct Connect)

FSx Lustre - File System Deployment Options

Scratch File System: Temporary storage
Persistent File System:
- Long-term storage
- Data is replicated within the same AZ

FSx for Windows File Server

FSx for Windows is a fully managed Windows file system share drive
Supports SMB protocol & Windows NTFS
Microsoft Active Directory integration, ACLs, user quotas
Can be mounted on Linux EC2 instances
Supports Microsoft's Distributed File System (DFS) Namespaces (group files across multiple FS)

FSx for NetApp ONTAP

Managed NetApp ONTAP on AWS
File System compatible with NFS, SMB, iSCSI protocols
Point-in-time instantaneous cloning (helpful for testing new workloads)

FSx for OpenZFS

Managed OpenZFS file system on AWS
File System compatible with NFS (v3, v4, v4.1, v4.2)
Point-in-time instantaneous cloning (helpful for testing new workloads)

AWS Storage Gateway

Bridge between on-premises data and cloud data

AWS Storage Gateway Architecture

File Gateway is POSIX compliant (Linux file system)
- POSIX metadata ownership, permissions, and timestamps stored in the object’s metadata in S3
Reboot Storage Gateway VM: (e.g., maintenance)
- File Gateway: simply restart the Storage Gateway VM
- Volume and Tape Gateway:
  - Stop Storage Gateway Service (AWS Console, VM local Console, Storage Gateway API)
  - Reboot the Storage Gateway VM
  - Start Storage Gateway Service (AWS Console, VM local Console, Storage Gateway API)

AWS Storage Gateway

Types of Storage Gateway:

S3 File Gateway

Configured S3 buckets are accessible using the NFS and SMB protocol
Most recently used data is cached in the file gateway
Supports S3 Standard, S3 Standard IA, S3 One Zone-IA, S3 Intelligent Tiering
Transition to S3 Glacier using a Lifecycle Policy

FSx File Gateway

Native access to Amazon FSx for Windows File Server
Local cache for frequently accessed data
Windows native compatibility (SMB, NTFS, Active Directory...)
Useful for group file shares and home directories

Volume Gateway

Block storage using iSCSI protocol backed by S3
Backed by EBS snapshots which can help restore on-premises volumes!
Cached volumes: low latency access to most recent data
Stored volumes: entire dataset is on premises, scheduled backups to S3

Tape Gateway

Some companies have backup processes using physical tapes (!)
With Tape Gateway, companies use the same processes, but in the cloud
Virtual Tape Library (VTL) backed by Amazon S3 and Glacier
Back up data using existing tape-based processes (and iSCSI interface)
Works with leading backup software vendors

Storage Gateway - Activations

Two ways to get Activation Key:

Using the Gateway VM CLI
Make a web request to the Gateway VM on port 80 (the old way)

Troubleshooting Activation Failures - 👀 exam

Make sure the Gateway VM has port 80 opened
Check that the Gateway VM has the correct time and is synchronizing its time automatically with a Network Time Protocol (NTP) server

Amazon CloudFront

Content Delivery Network (CDN)

Improves read performance, content is cached at the edge
Improves user experience
216 Points of Presence globally (edge locations)
DDoS protection (because worldwide), integration with Shield, AWS Web Application Firewall

CloudFront - Origins

S3 bucket

For distributing files and caching them at the edge
Enhanced security with CloudFront Origin Access Control (OAC)
OAC is replacing Origin Access Identity (OAI)
CloudFront can be used as an ingress (to upload files to S3)

Custom Origin (HTTP)

Application Load Balancer
EC2 instance
S3 website (must first enable the bucket as a static S3 website)
Any HTTP backend you want

AWS Origin Shield - 👀 EXAM

Enabling the Origin Shield feature in CloudFront helps reduce the load on the origin server by adding an additional caching layer between CloudFront edge locations and the origin. It improves cache hit ratios and reduces the number of requests hitting the origin by serving content from the Origin Shield cache.

CloudFront Origin Shield

AWS WAF (Web Application Firewall)

AWS WAF is a web application firewall that lets you monitor the HTTP and HTTPS requests that are forwarded to an Amazon CloudFront distribution, an Amazon API Gateway REST API, an Application Load Balancer, or an AWS AppSync GraphQL API.

`QUESTION` Change AWS Firewall Manager administration account

You can designate only one account in an organization as a Firewall Manager administrator account. To create a new Firewall Manager administrator account, you must revoke the original administrator account first.

CloudFront Origin Headers vs Cache Behavior

Origin Custom Headers:

Origin-level setting
Set a constant header / header value for all requests to origin

Behavior setting:

Cache-related settings
Contains the whitelist of headers to forward

CloudFront Caching TTL

**“Cache-Control: max-age”** is preferred to “Expires” header

CloudFront - Increasing Cache Ratio

Monitor the CloudWatch metric CacheHitRate

Specify how long to cache your objects: Cache-Control max-age header
Specify none or the minimally required headers
Specify none or the minimally required cookies
Specify none or the minimally required query string parameters
Separate static and dynamic distributions (two origins)

CloudFront with ALB sticky sessions - EXAM QUESTION

- Must forward / whitelist the cookie that controls the session affinity to the origin to allow the session affinity to work

Set the TTL to a value lower than the time at which the authentication cookie expires

Cookie: AWSALB=...

AWS Health Dashboard - Service History

Shows all regions, all services health
Shows historical information for each day
Has an RSS feed you can subscribe to
Previously called AWS Service Health Dashboard

AWS Health Dashboard - Your Account

Previously called AWS Personal Health Dashboard (PHD).
AWS Account Health Dashboard provides alerts and remediation guidance when AWS is experiencing events that may impact you.
While the Service Health Dashboard displays the general status of AWS services, Account Health Dashboard gives you a personalized view into the performance and availability of the AWS services underlying your AWS resources.
The dashboard displays relevant and timely information to help you manage events in progress and provides proactive notification to help you plan for scheduled activities.
Can aggregate data from an entire AWS Organization
Global service
Shows how AWS outages directly impact you & your AWS resources
Alert, remediation, proactive, scheduled activities

Health Event Notifications

Use EventBridge to react to changes for AWS Health events in your AWS account
Example: receive email notifications when EC2 instances in your AWS account are scheduled for updates
This is possible for Account events (resources that are affected in your account) and Public Events (Regional availability of a service)
Use cases: send notifications, capture event information, take corrective action…

👀 `QUESTION` AWS Personal Health Dashboard

AWS Personal Health Dashboard provides alerts and remediation guidance when AWS is experiencing events that may impact you. While the Service Health Dashboard displays the general status of AWS services, Personal Health Dashboard gives you a personalized view into the performance and availability of the AWS services underlying your AWS resources.

What’s more, Personal Health Dashboard proactively notifies you when AWS experiences any events that may affect you, helping provide quick visibility and guidance to help you minimize the impact of events in progress, and plan for any scheduled changes, such as AWS hardware maintenance.

The AWS Health API provides programmatic access to the AWS Health information that appears in the AWS Personal Health Dashboard. You can use the API operations to get information about events that might affect your AWS services and resources.

AWS Organizations

If you have created an organization in AWS Organizations, you can also create a trail that will log all events for all AWS accounts in that organization. This is referred to as an organization trail.

Offers policy-based management for multiple AWS accounts. With Organizations, you can create groups of accounts, automate account creation, and apply and manage policies for those groups. Organizations enable you to centrally manage policies across multiple accounts without requiring custom scripts and manual processes. It allows you to create Service Control Policies (SCPs) that centrally control AWS service use across multiple AWS accounts.

Global service.
Allows you to manage multiple AWS accounts.
The main account is the management account.
Other accounts are member accounts.
Member accounts can only be part of one organization.
Consolidated Billing across all accounts - single payment method.
Pricing benefits from aggregated usage (volume discount for EC2, S3…).
Shared reserved instances and Savings Plans discounts across accounts.
API is available to automate AWS account creation.

Advantages

Multi Account vs One Account Multi VPC.
Use tagging standards for billing purposes.
Enable CloudTrail on all accounts, send logs to central S3 account.
Send CloudWatch Logs to central logging account.
Establish Cross Account Roles for Admin purposes.

Security: Service Control Policies (SCP)

IAM policies applied to OU or Accounts to restrict Users and Roles.
They do not apply to the management account (full admin power).
Must have an explicit allow (does not allow anything by default - like IAM).

AWS Organizations - Reserved Instances

For billing purposes, the consolidated billing feature of AWS Organizations treats all the accounts in the organization as one account.
This means that all accounts in the organization can receive the hourly cost benefit of Reserved Instances that are purchased by any other account.
The payer account (master account) of an organization can turn off Reserved Instance (RI) discount and Savings Plans discount sharing for any accounts in that organization, including the payer account
This means that RIs and Savings Plans discounts aren’t shared between any accounts that have sharing turned off.
To share an RI or Savings Plans discount with an account, both accounts must have sharing turned on.
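
The sharing rule above can be sketched as a small predicate: a discount flows between two accounts only when both have sharing turned on. The account IDs and the function name are hypothetical; the rule that an account always benefits from its own purchases is my understanding of the behaviour rather than something stated in these notes.

```python
def discount_is_shared(sharing_enabled, purchaser, consumer):
    """Return True if an RI / Savings Plans discount bought by purchaser can apply to consumer.

    sharing_enabled maps an account ID to whether discount sharing is turned on for it.
    """
    if purchaser == consumer:
        # Assumption: an account always benefits from its own purchases.
        return True
    # Both sides must have sharing turned on.
    return sharing_enabled.get(purchaser, False) and sharing_enabled.get(consumer, False)


# Hypothetical organization: sharing is turned off for the third account.
sharing = {"111111111111": True, "222222222222": True, "333333333333": False}
print(discount_is_shared(sharing, "111111111111", "222222222222"))
print(discount_is_shared(sharing, "111111111111", "333333333333"))
```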

AWS Organizations - IAM Policies

Use aws:PrincipalOrgID condition key in your resource-based policies to restrict access to IAM principals from accounts in an AWS Organization

{
  "Version": "2012-10-17",
  "Statement": {
    "Sid": "AllowGetObject",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::policy-heneral-luna/*",
    "Condition": {
      "StringEquals": {
        "aws:PrincipalOrgID": ["o-xxxxxxxxxxx"]
      }
    }
  }
}

Note: "Principal": {"AWS": "*"} is equivalent to "Principal": "*" in this statement.

AWS Organizations - Tag Policies

Helps you standardize tags across resources in an AWS Organization
Ensure consistent tags, audit tagged resources, maintain proper resources categorization, …
You define tag keys and their allowed values
Helps with AWS Cost Allocation Tags and Attribute-based Access Control
Prevent any non-compliant tagging operations on specified services and resources (has no effect on resources without tags)
Generate a report that lists all tagged/non-compliant resources
Use CloudWatch Events to monitor non-compliant tags

AWS Control Tower (runs on top of AWS Organizations) - 👀 EXAM

  Offers the easiest way to `set up` and `govern` a `secure, multi-account AWS environment`.

Offers the easiest way to set up and govern a secure, multi-account AWS environment. It establishes a landing zone that is based on the best-practices blueprints and enables governance using guardrails you can choose from a pre-packaged list. The landing zone is a well-architected, multi-account baseline that follows AWS best practices. Guardrails implement governance rules for security, compliance, and operations.

Benefits:
- Automate the set up of your environment in a few clicks
- Automate ongoing policy management using guardrails
- Detect policy violations and remediate them
- Monitor compliance through an interactive dashboard
AWS Control Tower runs on top of AWS Organizations:
- It automatically sets up AWS Organizations to organize accounts and implement SCPs (Service Control Policies)

AWS Control Tower provides three methods for creating member accounts:

Through the Account Factory console that is part of AWS Service Catalog.
Through the Enroll account feature within AWS Control Tower.
From your AWS Control Tower landing zone’s management account, using Lambda code and appropriate IAM roles.

👀 AWS Service Catalog

AWS Service Catalog allows organizations to create and manage catalogs of IT services that are approved for use on AWS. These IT services can include everything from virtual machine images, servers, software, and databases to complete multi-tier application architectures. AWS Service Catalog allows you to centrally manage deployed IT services and your applications, resources, and metadata. This helps you achieve consistent governance and meet your compliance requirements while enabling users to quickly deploy only the approved IT services they need.

To make your Service Catalog products available to users who are not in your AWS accounts, such as users who belong to other organizations or to other AWS accounts in your organization, you share your portfolios with them. You can share in several ways, including account-to-account sharing, organizational sharing, and deploying catalogs using stack sets.

Before you share your products and portfolios to other accounts, you must decide whether you want to share a reference of the catalog or to deploy a copy of the catalog into each recipient account. Note that if you deploy a copy, you must redeploy if there are updates you want to propagate to the recipient accounts.

Users that are new to AWS have too many options, and may create stacks that are not compliant / in line with the rest of the organization
Some users just want a quick self-service portal to launch a set of authorized products pre-defined by admins
Includes: virtual machines, databases, storage options, etc…

Share portfolios with individual AWS accounts or AWS Organizations.

AWS Service Catalog - TagOptions Library - EXAM

Easily manage tags on provisioned products
TagOption:
- Key-value pair managed in AWS Service Catalog
- Used to create an AWS Tag
Can be associated with Portfolios and Products.
Use cases: proper resources tagging, defined allowed tags, …
Can be shared with other AWS accounts and AWS Organizations.
A consistent taxonomy - 👀 EXAM

Cost Explorer - 👀 EXAM - IMPORTANT

Visualize, understand, and manage your AWS costs and usage over time
Create custom reports that analyze cost and usage data.
Analyze your data at a high level: total costs and usage across all accounts
Or Monthly, hourly, resource level granularity
Choose an optimal Savings Plan (to lower prices on your bill)
Forecast usage up to 12 months based on previous usage

`👀` Collect information about the service costs of each developer

AWS defines the AWS-generated tag createdBy and applies it to supported AWS resources for cost allocation purposes. To use AWS-generated tags, a management account owner must activate them in the Billing and Cost Management console. When a management account owner activates the tag, it is also activated for all member accounts.
Cost Explorer is a tool that enables you to view and analyze your costs and usage. You can explore your usage and costs using the main graph, the Cost Explorer cost and usage reports, or the Cost Explorer RI reports.
AWS Cost Explorer provides the following prebuilt reports:
- EC2 RI Utilization % offers relevant data to identify and act on opportunities to increase your Reserved Instance usage efficiency. It’s calculated by dividing Reserved Instance used hours by total Reserved Instance purchased hours.
- EC2 RI Coverage % shows how much of your overall instance usage is covered by Reserved Instances. This lets you make informed decisions about when to purchase or modify a Reserved Instance to ensure
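
The two percentages can be written out as simple ratios. The utilization formula follows the definition above (used hours divided by purchased hours); the coverage formula is my reading of "how much of your overall instance usage is covered by Reserved Instances", so treat it as an assumption. Function names are hypothetical.

```python
def ri_utilization_pct(used_hours, purchased_hours):
    """Share of purchased Reserved Instance hours that were actually used."""
    if purchased_hours <= 0:
        raise ValueError("purchased_hours must be positive")
    return 100.0 * used_hours / purchased_hours


def ri_coverage_pct(ri_covered_hours, total_instance_hours):
    """Share of all running instance hours covered by Reserved Instances (assumed formula)."""
    if total_instance_hours <= 0:
        raise ValueError("total_instance_hours must be positive")
    return 100.0 * ri_covered_hours / total_instance_hours


# One RI purchased for a 720-hour month, but the matching instance ran only half of it.
print(ri_utilization_pct(360, 720))
# 1,000 instance hours ran in total, 360 of them covered by the RI.
print(ri_coverage_pct(360, 1000))
```

Low utilization suggests unused reservations; low coverage suggests usage still billed at On-Demand rates.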

AWS Budgets

Create budgets and send alarms when costs exceed the budget.
4 types of budgets: Usage, Cost, Reservation, Savings Plans.
- Usage e.g. to create a cost budget exclusively for data transfer expenses
For Reserved Instances (RI)
- Track utilization
- Supports EC2, ElastiCache, RDS, Redshift
Up to 5 SNS notifications per budget
Can filter by: Service, Linked Account, Tag, Purchase Option, Instance Type, Region, Availability Zone, API Operation, etc…
Same options as AWS Cost Explorer!
2 budgets are free, then $0.02/day/budget

Cost Allocation Tags - EXAM

Use cost allocation tags to track your AWS costs on a detailed level
AWS generated tags
- Automatically applied to the resource you create
- Starts with Prefix aws: (e.g. aws:createdBy)
User-defined tags
- Defined by the user
- Starts with Prefix user:

Cost and Usage Reports - EXAM

Dive deeper into your AWS costs and usage
The AWS Cost & Usage Report contains the most comprehensive set of AWS cost and usage data available
Includes additional metadata about AWS services, pricing, and reservations (e.g., Amazon EC2 Reserved Instances (RIs))
The AWS Cost & Usage Report lists AWS usage for each:
- service category used by an account
- in hourly or daily line items
- any tags that you have activated for cost allocation purposes
Can be configured for daily exports to S3
Can be integrated with Athena, Redshift or QuickSight

AWS Compute Optimizer - 👀 EXAM - IMPORTANT

Reduce costs and improve performance by recommending optimal AWS resources for your workloads
Helps you choose optimal configurations and right-size your workloads (over/under provisioned) - 👀 EXAM
Uses Machine Learning to analyze your resources configurations and their utilization CloudWatch metrics
Supported resources
- EC2 instances
- EC2 Auto Scaling Groups
- EBS volumes
- Lambda functions
Lower your costs by up to 25%
Recommendations can be exported to S3

IAM

If an IAM user, with full access to IAM and Amazon S3, assigns a bucket policy to an Amazon S3 bucket and doesn’t specify the AWS account root user as a principal, the root user is denied access to that bucket.

To fix this issue, the CTO needs to ensure that an IAM user with full access to both IAM and Amazon S3 explicitly includes the AWS account root user as a principal in the bucket policy of the S3 bucket. By adding the root user as a principal, access will be granted to the CTO and they will be able to access the S3 bucket in their AWS account.

Reference:

https://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_iam-s3.html

IAM Security Tools

IAM Credentials Report (account-level)
- a report that lists all your account’s users and the status of their various credentials
IAM Access Advisor (user-level) - QUESTION
- Access advisor shows the service permissions granted to a user and when those services were last accessed.
- You can use this information to revise your policies.

IAM Access Analyzer - 👀 EXAM

Helps you identify the resources in your organization and accounts, such as Amazon S3 buckets or IAM roles, shared with an external entity. This lets you identify unintended access to your resources and data, which is a security risk. Access Analyzer identifies resources shared with external principals by using logic-based reasoning to analyze the resource-based policies in your AWS environment. For each instance of a resource shared outside of your account, Access Analyzer generates a finding.

Find out which resources are shared externally - 👀 EXAM
- S3 Buckets
- IAM Roles
- KMS Keys
- Lambda Functions and Layers
- SQS queues
- Secrets Manager Secrets
Define Zone of Trust = AWS Account or AWS Organization
Access outside the zone of trust => findings

IAM Policy Types

You manage access in AWS by creating policies and attaching them to IAM identities (users, groups of users, or roles) or AWS resources. A policy is an object in AWS that, when associated with an identity or resource, defines their permissions. Resource-based policies are JSON policy documents that you attach to a resource such as an Amazon S3 bucket. These policies grant the specified principal permission to perform specific actions on that resource and define under what conditions this applies.

Identity-based policies are attached to an IAM user, group, or role. These policies let you specify what that identity can do (its permissions).
Resource-based policies are attached to a resource. For example, you can attach resource-based policies to Amazon S3 buckets, Amazon SQS queues, and AWS Key Management Service encryption keys.
Identity-based policies and resource-based policies are both permissions policies and are evaluated together. For a request to which only permissions policies apply, AWS first checks all policies for a Deny. If one exists, then the request is denied. Then AWS checks for each Allow. If at least one policy statement allows the action in the request, the request is allowed. It doesn’t matter whether the Allow is in the identity-based policy or the resource-based policy.
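
The evaluation order described above (explicit Deny first, then any Allow, otherwise an implicit deny) can be sketched as follows. This is a deliberately simplified model for a single account: statements are reduced to an effect and a set of action names, with no resources, conditions, SCPs or permission boundaries. The function name and data shape are hypothetical.

```python
def evaluate(statements, action):
    """Decide a request given (effect, actions) pairs.

    statements combines the identity-based and resource-based policy statements
    that apply to the request; it does not matter which policy an Allow comes from.
    """
    matching = [effect for effect, actions in statements
                if action in actions or "*" in actions]
    if "Deny" in matching:
        return "Deny"   # an explicit Deny always wins
    if "Allow" in matching:
        return "Allow"  # at least one Allow and no Deny
    return "Deny"       # nothing matched: implicit deny


statements = [
    ("Allow", {"s3:GetObject"}),     # e.g. from the identity-based policy
    ("Allow", {"s3:DeleteObject"}),  # e.g. from the bucket policy
    ("Deny", {"s3:DeleteObject"}),   # explicit Deny somewhere else
]
print(evaluate(statements, "s3:GetObject"))
print(evaluate(statements, "s3:DeleteObject"))
print(evaluate(statements, "s3:PutObject"))
```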

References: Identity-based policies and resource-based policies - 👀 EXAM

Trust policy

Defines which principal entities (accounts, users, roles, and federated users) can assume the role. An IAM role is both an identity and a resource that supports resource-based policies. For this reason, you must attach both a trust policy and an identity-based policy to an IAM role. The IAM service supports only one type of resource-based policy called a role trust policy, which is attached to an IAM role.

References: Identity-based policies and resource-based policies

Identity Federation

Federation lets users outside of AWS assume a temporary role to access AWS resources.
These users assume an identity-provided access role.
Federation assumes a form of 3rd party authentication
- LDAP
- Microsoft Active Directory (~= SAML) - 👀 EXAM
- Single Sign On - 👀 EXAM
- Open ID
- Cognito - 👀 EXAM
Using federation, you don’t need to create IAM users (user management is outside of AWS).

Custom Identity Broker Application

For Enterprises

Use only if identity provider is not compatible with SAML 2.0.
The identity broker must determine the appropriate IAM policy.

AWS DataSync

Move large amounts of data to and from:
- On-premises / other cloud to AWS (NFS, SMB, HDFS, S3 API…) - needs agent
- AWS to AWS (different storage services) - no agent needed
Can synchronize to:
- Amazon S3 (any storage classes - including Glacier)
- Amazon EFS
- Amazon FSx (Windows, Lustre, NetApp, OpenZFS…)
Replication tasks can be scheduled hourly, daily, weekly.
File permissions and metadata are preserved (NFS POSIX, SMB...). - Exam
A single agent task can use up to 10 Gbps; you can set a bandwidth limit.

For large quantities of data, consider AWS Snowcone, which has the DataSync agent pre-installed.

AWS STS - Security Token Service

Lets you grant limited and temporary access to AWS resources.
Token is valid for a limited time (one hour by default for AssumeRole) and must be refreshed
AssumeRole
- Within your own account: for enhanced security
- Cross Account Access: assume role in target account to perform actions there
AssumeRoleWithSAML
- returns credentials for users logged in with SAML
AssumeRoleWithWebIdentity
- returns credentials for users logged in with an IdP (Facebook Login, Google Login, OIDC compatible…)
- AWS recommends using Cognito instead
GetSessionToken
- for MFA, from a user or AWS account root user

Cognito Identity Pools - IAM Roles

Default IAM roles for authenticated and guest users
Define rules to choose the role for each user based on the user’s ID
You can partition your users’ access using policy variables.
IAM credentials are obtained by Cognito Identity Pools through STS
The roles must have a trust policy that trusts Cognito Identity Pools
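
To make the policy-variable idea concrete, here is a sketch of an identity-based policy that limits each user to their own S3 prefix. The bucket name and helper name are hypothetical; `${cognito-identity.amazonaws.com:sub}` is the policy variable that resolves to the caller's Cognito identity ID.

```python
def per_user_s3_policy(bucket):
    """Identity-based policy limiting each Cognito identity to its own prefix."""
    # IAM replaces this variable with the caller's identity ID at request time.
    prefix = "${cognito-identity.amazonaws.com:sub}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            }
        ],
    }

# Hypothetical bucket name.
user_policy = per_user_s3_policy("my-app-bucket")
```

One role with this policy can serve every authenticated user, because the partitioning happens when the variable is resolved rather than in per-user policies.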

Cognito User Pools vs Identity Pools

Cognito User Pools (for authentication = identity verification):
- Database of users for your web and mobile application
- Lets you federate logins through Public Social, OIDC, SAML…
- Can customize the hosted UI for authentication (including the logo)
- Has triggers with AWS Lambda during the authentication flow
- Adapt the sign-in experience to different risk levels (MFA, etc…)
Cognito Identity Pools (for authorization = access control):
- Obtain AWS credentials for your users
- Users can login through Public Social, OIDC, SAML & Cognito User Pools
- Users can be unauthenticated (guests)
- Users are mapped to IAM roles & policies, can leverage policy variables
CUP + CIP = manage user / password + access AWS services

Amazon Route 53

A highly available, scalable, fully managed and Authoritative DNS
- Authoritative = the customer (you) can update the DNS records
Route 53 is also a Domain Registrar
Ability to check the health of your resources
The only AWS service which provides 100% availability SLA
Why Route 53? 53 is a reference to the traditional DNS port

Route 53 - Records

How you want to route traffic for a domain
Each record contains:
- Domain/subdomain Name - e.g., example.com
- Record Type - e.g., A or AAAA
- Value - e.g., 12.34.56.78
- Routing Policy - how Route 53 responds to queries
- TTL - amount of time the record is cached at DNS Resolvers
Route 53 supports the following DNS record types:
(must know) A / AAAA / CNAME / NS
(advanced) CAA / DS / MX / NAPTR / PTR / SOA / TXT / SPF / SRV

Route 53 - Record Types

A - maps a hostname to IPv4
AAAA - maps a hostname to IPv6
CNAME - maps a hostname to another hostname
- The target is a domain name which must have an A or AAAA record
- Can’t create a CNAME record for the top node of a DNS namespace (Zone Apex)
- Example: you can’t create for example.com, but you can create for www.example.com
NS - Name Servers for the Hosted Zone
- Control how traffic is routed for a domain

Route 53 - Hosted Zones

A container for records that define how to route traffic to a domain and its subdomains
Public Hosted Zones - contain records that specify how to route traffic on the Internet (public domain names), e.g., application1.mypublicdomain.com
Private Hosted Zones - contain records that specify how you route traffic within one or more VPCs (private domain names), e.g., application1.company.internal
You pay $0.50 per month per hosted zone

Route 53 - Records TTL (Time To Live)

High TTL - e.g., 24 hr
- Less traffic on Route 53
- Possibly outdated records
Low TTL - e.g., 60 sec.
- More traffic on Route 53 ($$)
- Records are outdated for less time
- Easy to change records
Except for Alias records, TTL is mandatory for each DNS record

CNAME vs Alias

AWS Resources (Load Balancer, CloudFront…) expose an AWS hostname:
- lb1-1234.us-east-2.elb.amazonaws.com and you want myapp.mydomain.com
CNAME:
- Points a hostname to any other hostname. (app.mydomain.com => blabla.anything.com)
- ONLY FOR NON ROOT DOMAIN (aka. something.mydomain.com)
Alias:
- Points a hostname to an AWS Resource (app.mydomain.com => blabla.amazonaws.com)
- Works for ROOT DOMAIN and NON ROOT DOMAIN (aka mydomain.com)
- Free of charge.
- Native health check.

Route 53 - Alias Records

Maps a hostname to an AWS resource
An extension to DNS functionality
Automatically recognizes changes in the resource’s IP addresses
Unlike CNAME, it can be used for the top node of a DNS namespace (Zone Apex), e.g.: example.com
Alias Record is always of type A/AAAA for AWS resources (IPv4 / IPv6)
You can’t set the TTL

Route 53 - Alias Records Targets

You cannot set an ALIAS record for an EC2 DNS name - 👀 EXAM

Route 53 - Routing Policies

Define how Route 53 responds to DNS queries
Don’t get confused by the word “Routing”
- It’s not the same as Load balancer routing which routes the traffic
- DNS does not route any traffic, it only responds to the DNS queries
Route 53 Supports the following Routing Policies
- Simple
- Weighted
- Failover
- Latency based
- Geolocation
- Multi-Value Answer
- Geoproximity (using Route 53 Traffic Flow feature)

Types of health checks - 👀 EXAM

Health checks that monitor an endpoint - You can configure a health check that monitors an endpoint that you specify either by IP address or by domain name. At regular intervals that you specify, Route 53 submits automated requests over the internet to your application, server, or other resources to verify that it’s reachable, available, and functional. Optionally, you can configure the health check to make requests similar to those that your users make, such as requesting a web page from a specific URL.
Health checks that monitor other health checks (calculated health checks) - You can create a health check that monitors whether Route 53 considers other health checks healthy or unhealthy. One situation where this might be useful is when you have multiple resources that perform the same function, such as multiple web servers, and your chief concern is whether some minimum number of your resources are healthy. You can create a health check for each resource without configuring notifications for those health checks. Then you can create a health check that monitors the status of the other health checks, and that notifies you only when the number of available web resources drops below a specified threshold.
Health checks that monitor CloudWatch alarms - You can create CloudWatch alarms that monitor the status of CloudWatch metrics, such as the number of throttled read events for an Amazon DynamoDB database or the number of Elastic Load Balancing hosts that are considered healthy. After you create an alarm, you can create a health check that monitors the same data stream that CloudWatch monitors for the alarm.
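
The calculated health check idea can be sketched as a small function. This is only a model of the threshold logic described above, with hypothetical names, not the Route 53 implementation.

```python
def calculated_health(child_statuses, healthy_threshold):
    """Return True when at least healthy_threshold child checks are healthy."""
    healthy_children = sum(1 for is_healthy in child_statuses if is_healthy)
    return healthy_children >= healthy_threshold

# Three web servers, two of which must be healthy.
web_servers = [True, False, True]
overall = calculated_health(web_servers, healthy_threshold=2)
```

Notifications are then attached to the parent check only, so a single failing web server does not raise an alert.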

“Evaluate Target Health” - 👀 EXAM

You need to set the Evaluate Target Health flag to true on the Route 53 records. This way, Route 53 checks both ALB entries to ensure that your ALBs are responding.

Routing Policies - Weighted

Control the % of the requests that go to each specific resource
Assign each record a relative weight:
- traffic (%) = weight of a specific record / sum of the weights of all records
- Weights don’t need to sum up to 100
DNS records must have the same name and type
Can be associated with Health Checks.
Use cases: load balancing between regions, testing new application versions…
Assign a weight of 0 to a record to stop sending traffic to a resource.
If all records have weight of 0, then all records will be returned equally.
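
The weight arithmetic above can be checked with a short sketch. The function and record names are hypothetical; it only models the expected share of DNS responses per record.

```python
def traffic_shares(weights):
    """Map record name -> expected fraction of DNS responses."""
    if not weights:
        return {}
    total = sum(weights.values())
    if total == 0:
        # All weights are 0: every record is returned with equal probability.
        return {name: 1 / len(weights) for name in weights}
    # share = weight of the record / sum of all weights
    return {name: weight / total for name, weight in weights.items()}

# Weights do not need to sum to 100.
shares = traffic_shares({"blue": 3, "green": 1})
```

Here `blue` receives three quarters of the responses and `green` one quarter; setting `green` to 0 would send everything to `blue`.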

Routing Policies - Latency-based

Redirect to the resource that has the lowest latency for the user
Super helpful when latency for users is a priority
Latency is based on traffic between users and AWS Regions
Users in Germany may be directed to the US (if that’s the lowest latency)
Can be associated with Health Checks (has a failover capability)

Route 53 - Health Checks

HTTP Health Checks are only for public resources
Health Check => Automated DNS Failover:
1. Health checks that monitor an endpoint (application, server, other AWS resource)
2. Health checks that monitor other health checks (Calculated Health Checks)
3. Health checks that monitor CloudWatch Alarms (full control !!) - e.g., throttles of DynamoDB, alarms on RDS, custom metrics, … (helpful for private resources)
Health Checks are integrated with CW metrics

Health Checks - Private Hosted Zones

Route 53 health checkers are outside the VPC.
They can’t access private endpoints (private VPC or on-premises resource)
You can create a CloudWatch Metric and associate a CloudWatch Alarm, then create a Health Check that checks the alarm itself.

Routing Policies - Geolocation

Different from Latency-based!
This routing is based on user location
Specify location by Continent, Country or by US State (if there’s overlapping, most precise location selected)
Should create a “Default” record (in case there’s no match on location)
Use cases: website localization, restrict content distribution, load balancing, …
Can be associated with Health Checks
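
The "most precise location wins" rule, with a fallback to the Default record, can be sketched as follows. The record keys and user fields are a hypothetical representation chosen for this example, not a Route 53 data format.

```python
def pick_geolocation_record(records, user):
    """Pick the record for a user, preferring the most precise location.

    records: dict mapping (level, code) -> record value
    user: dict with optional 'state', 'country' and 'continent' codes
    """
    # Most precise first: US state, then country, then continent.
    for level in ("state", "country", "continent"):
        key = (level, user.get(level))
        if key in records:
            return records[key]
    # No location matched: fall back to the Default record (if any).
    return records.get(("default", None))

records = {
    ("state", "US-CA"): "california-site",
    ("country", "US"): "us-site",
    ("continent", "EU"): "eu-site",
    ("default", None): "global-site",
}
choice = pick_geolocation_record(
    records, {"state": "US-CA", "country": "US", "continent": "NA"}
)
```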

Routing Policies - Geoproximity

Route traffic to your resources based on the geographic location of users and resources
Ability to shift more traffic to resources based on the defined bias
To change the size of the geographic region, specify bias values:
- To expand (1 to 99) - more traffic to the resource
- To shrink (-1 to -99) - less traffic to the resource
Resources can be:
- AWS resources (specify AWS region)
- Non-AWS resources (specify Latitude and Longitude)
You must use Route 53 Traffic Flow to use this feature

Route 53 - Traffic flow

Simplify the process of creating and maintaining records in large and complex configurations
Visual editor to manage complex routing decision trees
Configurations can be saved as Traffic Flow Policy
- Can be applied to different Route 53 Hosted Zones (different domain names)
- Supports versioning

Route 53 - Hybrid DNS - 👀 EXAM

By default, Route 53 Resolver automatically answers DNS queries for:
- Local domain names for EC2 instances
- Records in Private Hosted Zones
- Records in public Name Servers
Hybrid DNS - resolving DNS queries between VPC (Route 53 Resolver) and your networks (other DNS Resolvers)
Networks can be:
- VPC itself / Peered VPC
- On-premises Network (connected through Direct Connect or AWS VPN)

Route 53 - Resolver Endpoints - QUESTION

Inbound Endpoint
- DNS Resolvers on your network can forward DNS queries to Route 53 Resolver
- Allows your DNS Resolvers to resolve domain names for AWS resources (e.g., EC2 instances) and records in Route 53 Private Hosted Zones
Outbound Endpoint
- Route 53 Resolver conditionally forwards DNS queries to your DNS Resolvers
- Use Resolver Rules to forward DNS queries to your DNS Resolvers
Associated with one or more VPCs in the same AWS Region
Create in two AZs for high availability
Each Endpoint supports 10,000 queries per second per IP address

Route 53 - Resolver Rules

Control which DNS queries are forwarded to DNS Resolvers on your network
Conditional Forwarding Rules (Forwarding Rules)
- Forward DNS queries for a specified domain and all its subdomains to target IP addresses
System Rules
- Selectively overriding the behavior defined in Forwarding Rules (e.g., don’t forward DNS queries for a subdomain acme.example.com)
Auto-defined System Rules
- Defines how DNS queries for selected domains are resolved (e.g., AWS internal domain names, Private Hosted Zones)
If multiple rules matched, Route 53 Resolver chooses the most specific match
Resolver Rules can be shared across accounts using AWS RAM
- Manage them centrally in one account
- Send DNS queries from multiple VPCs to the target IP defined in the rule
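
The "most specific match" rule can be sketched as a longest-suffix lookup over domain names. The rule table and action labels below are hypothetical; the sketch only shows how a rule for acme.example.com takes precedence over a rule for example.com.

```python
def most_specific_rule(query_name, rules):
    """Return (domain, action) for the longest matching domain, or None."""
    labels = query_name.rstrip(".").lower().split(".")
    # Try the full name first, then each parent domain in turn.
    for start in range(len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in rules:
            return candidate, rules[candidate]
    return None

rules = {
    "example.com": "FORWARD",       # forwarding rule for the whole domain
    "acme.example.com": "SYSTEM",   # system rule overriding one subdomain
}
match = most_specific_rule("host.acme.example.com", rules)
```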

ELB

Elastic Load Balancing and AWS X-Ray

Elastic Load Balancing application load balancers add a trace ID to incoming HTTP requests in a header named X-Amzn-Trace-Id.

  X-Amzn-Trace-Id: Root=1-5759e988-bd862e3fe1be46a994272793

Load balancers do not send data to X-Ray, and do not appear as a node on your service map.
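
An application behind the load balancer can read this header to correlate logs. The following sketch parses the header value shown above; the function name is hypothetical, and the three-part split of the Root value (version, hexadecimal epoch time, 24-character identifier) follows the X-Ray trace ID format.

```python
def parse_trace_header(value):
    """Split 'Root=...;Parent=...;Sampled=1' into a dict of fields."""
    fields = {}
    for part in value.split(";"):
        key, separator, field_value = part.strip().partition("=")
        if separator:
            fields[key] = field_value
    return fields

header = "Root=1-5759e988-bd862e3fe1be46a994272793"
root = parse_trace_header(header)["Root"]
# Trace ID layout: version, epoch time in hex, unique identifier.
version, epoch_hex, unique_id = root.split("-")
```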

ELB access logs

ELB access logs are an optional feature of Elastic Load Balancing that is disabled by default. The access logs capture detailed information about requests sent to your load balancer. Each log contains information such as the time the request was received, the client's IP address, latencies, request paths, and server responses. You can use these access logs to analyze traffic patterns and troubleshoot issues. Each access log file is automatically encrypted using SSE-S3 before it is stored in your S3 bucket and decrypted when you access it. You do not need to take any action; the encryption and decryption are performed transparently.

`VPC Flow Logs` only capture information about the `IP traffic` going to and from network interfaces in a VPC

Reference: Access logs for your Application Load Balancer

CloudTrail logs

Elastic Load Balancing is integrated with AWS CloudTrail, a service that provides a record of actions taken by a user, role, or an AWS service in Elastic Load Balancing. CloudTrail captures all API calls for Elastic Load Balancing as events. The calls captured include calls from the AWS Management Console and code calls to the Elastic Load Balancing API operations.

Amazon Machine Images (AMIs)

The key points to consider before planning the expansion and sharing of Amazon Machine Images (AMIs) are:

1/ AMIs are regional resources and can be shared across Regions: AMIs are specific to a particular AWS Region. If you want to use an AMI in a different Region, you need to copy the AMI to that Region. Sharing an AMI across Regions requires creating a new copy in each desired Region.
2/ You need to share any CMKs used to encrypt snapshots and any Amazon EBS snapshots that the AMI references: If the AMI references Amazon Elastic Block Store (EBS) snapshots, you must also share those snapshots. Additionally, if the snapshots are encrypted using customer-managed customer master keys (CMKs), you need to share the CMKs as well.

Application-consistent AMI: Create the AMI by disabling the No reboot option.
Crash-consistent AMI: If the No reboot option is selected, the AMI will be crash-consistent (all the volumes are snapshotted at the same time), but not application-consistent (the operating system buffers are not flushed to disk before the snapshots are created).

EC2

Errors

`Client.InternalError: Client error on launch`

This error occurs when an Auto Scaling group attempts to launch an instance that has an encrypted EBS volume, but the service-linked role does not have access to the customer-managed CMK used to encrypt it. Additional setup is required to allow the Auto Scaling group to launch instances.

Termination Policy

You can't enable termination protection for Spot Instances. A Spot Instance is terminated when the Spot price exceeds the amount you’re willing to pay for Spot Instances. However, you can prepare your application to handle Spot Instance interruptions.
To prevent instances that are part of an Auto Scaling group from terminating on scale in, use instance protection. The DisableApiTermination attribute does not prevent Amazon EC2 Auto Scaling from terminating an instance.

Spot Instances Interruptions

You can specify that Amazon EC2 should do one of the following when it interrupts a Spot Instance:

Stop the Spot Instance
Hibernate the Spot Instance
Terminate the Spot Instance

The default is to terminate Spot Instances when they are interrupted.

`EXAM` Spot Fleet Config Cost Optimization

Using lowestPrice allocation strategy a Spot Fleet automatically deploys the lowest price combination of instance types and Availability Zones based on the current Spot price across the number of Spot pools specified. You can use this combination to avoid the most expensive Spot Instances.

Spot Fleets allow us to automatically request Spot Instances with the lowest price

You can specify one of the following allocation strategies:

priceCapacityOptimized
capacityOptimized
diversified
lowestPrice
InstancePoolsToUseCount (not a strategy itself: the number of Spot pools to use, valid only with lowestPrice)
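
As a rough model of lowestPrice, the sketch below ranks Spot pools (instance type plus Availability Zone) by current price and keeps the cheapest N. The prices and function name are invented for illustration, and the real service also accounts for capacity and how the target is spread across pools.

```python
def lowest_price_pools(spot_prices, pools_to_use=1):
    """Return the pools_to_use cheapest pools, cheapest first.

    spot_prices: dict mapping (instance_type, availability_zone) -> price
    """
    ranked = sorted(spot_prices, key=spot_prices.get)
    return ranked[:pools_to_use]

# Invented prices, for illustration only.
prices = {
    ("m5.large", "us-east-1a"): 0.035,
    ("c5.large", "us-east-1b"): 0.031,
    ("r5.large", "us-east-1a"): 0.042,
}
chosen = lowest_price_pools(prices, pools_to_use=2)
```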

`QUESTION` Get public IP address

EC2 instances in AWS have metadata associated with them that can be accessed from within the instance. This metadata includes information about the instance, such as its IP address, instance type, security groups, and more.

From within the instance, you can make an HTTP GET request to a URL served by the instance metadata service. The URL is http://169.254.169.254/latest/meta-data/public-ipv4.
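
A minimal sketch of that request using only the Python standard library is shown below. It assumes the code runs on an EC2 instance and uses the IMDSv2 flow (a PUT for a session token, then the GET with that token), which goes beyond the plain GET mentioned above; the helper names are hypothetical.

```python
import urllib.request

IMDS = "http://169.254.169.254/latest"

def token_request(ttl_seconds=21600):
    """Build the IMDSv2 PUT request that returns a session token."""
    return urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )

def public_ipv4_request(token):
    """Build the GET request for the instance's public IPv4 address."""
    return urllib.request.Request(
        f"{IMDS}/meta-data/public-ipv4",
        headers={"X-aws-ec2-metadata-token": token},
    )

def get_public_ipv4(timeout=2):
    """Fetch the public IPv4 address; only works from inside an EC2 instance."""
    with urllib.request.urlopen(token_request(), timeout=timeout) as response:
        token = response.read().decode()
    with urllib.request.urlopen(public_ipv4_request(token), timeout=timeout) as response:
        return response.read().decode()
```

Calling `get_public_ipv4()` anywhere other than on an instance fails with a timeout, because 169.254.169.254 is a link-local address.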

👀 EC2 Detailed monitoring 👀

Metrics are the fundamental concept in CloudWatch. A metric represents a time-ordered set of data points that are published to CloudWatch. Think of a metric as a variable to monitor, and the data points as representing the values of that variable over time.

By default, your instance is enabled for basic monitoring. You can optionally enable detailed monitoring. After you enable detailed monitoring, the Amazon EC2 console displays monitoring graphs with a 1-minute period for the instance. In basic monitoring, data is available automatically in 5-minute periods at no charge.

Reference: Enable or turn off detailed monitoring for your instances

👀 EC2 Launch Troubleshooting - 👀 EXAM

InstanceLimitExceeded: if you get this error, it means that you have reached your limit of max number of vCPUs per region.
InsufficientInstanceCapacity: if you get this error, it means AWS does not have enough On-Demand capacity in the particular AZ where the instance is launched.

Instance Terminates Immediately (goes from pending to terminated) - 👀 EXAM

You’ve reached your EBS volume limit.
An EBS snapshot is corrupt.
The root EBS volume is encrypted and you do not have permissions to access the KMS key for decryption.
The instance store-backed AMI that you used to launch the instance is missing a required part (an image.part.xx file).

👀 EC2 SSH Troubleshooting 👀

SG is not configured correctly
NACL is not configured correctly
Check the route table for the subnet (routes traffic destined outside VPC to IGW)
Instance doesn’t have a public IPv4
CPU load of the instance is high

EC2 Instances Purchasing Options

On-Demand Instances - short workload, predictable pricing, pay by second
Reserved (1 & 3 years)
- Reserved Instances - long workloads
- Convertible Reserved Instances - long workloads with flexible instances
Savings Plans (1 & 3 years) - commitment to an amount of usage, long workload
Spot Instances - short workloads, cheap, can lose instances (less reliable)
Dedicated Hosts - book an entire physical server, control instance placement
Dedicated Instances - no other customers will share your hardware
Capacity Reservations - reserve capacity in a specific AZ for any duration

AWS Storage Gateway

AWS Storage Gateway is a set of hybrid cloud storage services that provide on-premises access to virtually unlimited cloud storage.

AWS Storage Gateway uses SSL/TLS (Secure Sockets Layer/Transport Layer Security) to encrypt data that is transferred between your gateway appliance and AWS storage. By default, Storage Gateway uses Amazon S3-Managed Encryption Keys (SSE-S3) to server-side encrypt all data it stores in Amazon S3. You have an option to use the Storage Gateway API to configure your gateway to encrypt data stored in the cloud using server-side encryption with AWS Key Management Service (SSE-KMS) customer master keys (CMKs).

File, Volume and Tape Gateway data is stored in Amazon S3 buckets by AWS Storage Gateway. Tape Gateway supports backing up data to Amazon S3 Glacier in addition to standard storage.

Encrypting a file share: For a file share, you can configure your gateway to encrypt your objects with AWS KMS-managed keys by using SSE-KMS.

Encrypting a volume: For cached and stored volumes, you can configure your gateway to encrypt volume data stored in the cloud with AWS KMS-managed keys by using the Storage Gateway API.

Encrypting a tape: For a virtual tape, you can configure your gateway to encrypt tape data stored in the cloud with AWS KMS-managed keys by using the Storage Gateway API.

Tape Gateway

Tape Gateway enables you to replace using physical tapes on-premises with virtual tapes in AWS without changing existing backup workflows. Tape Gateway supports all leading backup applications and caches virtual tapes on-premises for low-latency data access. Tape Gateway encrypts data between the gateway and AWS for secure data transfer and compresses data and transitions virtual tapes between Amazon S3 and Amazon S3 Glacier, or Amazon S3 Glacier Deep Archive, to minimize storage costs.

File Gateway

File Gateway provides a seamless way to connect to the cloud in order to store application data files and backup images as durable objects in Amazon S3 cloud storage. File Gateway offers SMB or NFS-based access to data in Amazon S3 with local caching.

Volume Gateway

You can configure the AWS Storage Gateway service as a Volume Gateway to present cloud-based iSCSI block storage volumes to your on-premises applications. The Volume Gateway provides either a local cache or full volumes on-premises while also storing full copies of your volumes in the AWS cloud. Volume Gateway also provides Amazon EBS Snapshots of your data for backup, disaster recovery, and migration. It’s easy to get started with the Volume Gateway: Deploy it as a virtual machine or hardware appliance, give it local disk resources, connect it to your applications, and start using your hybrid cloud storage for block data.

Reference: AWS Storage Gateway

👀 Storage Optimized Instances

Designed for workloads that require high, sequential read and write access to very large data sets on local storage. They are optimized to deliver tens of thousands of low-latency, random I/O operations per second (IOPS) to applications compared with EBS-backed EC2 instances.

Ports

RDP traffic: Port 3389, TCP protocol.

👀 AWS Directory Services

AWS Directory Service automatically creates an AWS security group in your VPC with network rules for traffic in and out of AWS managed domain controllers. The default inbound rules allow traffic from any source (0.0.0.0/0) to ports required by Active Directory. These rules do not introduce security vulnerabilities, as traffic to the domain controllers is limited to traffic from your VPC, other peered VPCs, or networks connected using AWS Direct Connect, AWS Transit Gateway or Virtual Private Network.

By default, AWS Directory Services creates security groups that allow unrestricted access, which can be flagged as a security concern. To address this issue, you need to review the security group rules and make necessary adjustments to restrict access based on your specific requirements and security best practices.

Using AWS Trusted Advisor can provide additional insights into security best practices and potential misconfigurations, but it may not specifically highlight the security group issue related to AWS Directory Services.

`QUESTION` SAML federation

SAML federation between AWS and the corporate Active Directory and mapping Active Directory groups to IAM groups is the recommended way to make access more secure and streamlined.

Enhanced networking

QUESTION Consider using enhanced networking for the following scenarios of network performance issues:

If your packets-per-second rate reaches its ceiling, consider moving to enhanced networking. If your rate reaches its ceiling, you’ve likely reached the upper thresholds of the virtual network interface driver.
If your throughput is near or exceeding 20K packets per second (PPS) on the VIF driver, it’s a best practice to use enhanced networking.

All current generation instance types support enhanced networking, except for T2 instances.

Cost Allocation Tags

A tag is a label that you or AWS assigns to an AWS resource. Each tag consists of a key and a value. For each resource, each tag key must be unique, and each tag key can have only one value. You can use tags to organize your resources, and cost allocation tags to track your AWS costs on a detailed level. After you activate cost allocation tags, AWS uses the cost allocation tags to organize your resource costs on your cost allocation report, to make it easier for you to categorize and track your AWS costs.

👀 `QUESTION` AWS Resource Groups Tag Editor

With Resource Groups, you can create, maintain, and view a collection of resources that share common tags. Tag Editor manages tags across services and AWS Regions. Tag Editor can perform a global search and can edit a large number of tags at one time.

OpsWorks

AWS OpsWorks is a configuration management service that provides managed instances of Chef and Puppet.

👀 Chef Server

You can add nodes automatically to your Chef Server using the unattended method. The recommended method of unattended (or automatic) association of new nodes is to configure the Chef Client Cookbook. With this method, a script is used to run the opsworks-cm API associate-node command to associate a new node with your Chef server. Steps are found in the references.

`QUESTION` AWS Service Health Dashboard

Publishes the most up-to-the-minute information on the status and availability of all AWS services in tabular form for all Regions that AWS is present in. You can check on this page https://status.aws.amazon.com/ to get current status information.

👀 Cost Allocation Tags - Account Level

Protecting data using encryption - QUESTION

SSE-S3 - Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3)

Using SSE-S3 each object is encrypted with a unique key employing strong multi-factor encryption. As an additional safeguard, it encrypts the key itself with a master key that it regularly rotates. Amazon S3 server-side encryption uses one of the strongest block ciphers available, 256-bit Advanced Encryption Standard (AES-256), to encrypt your data.

SSE-KMS

Similar to SSE-S3 and also provides you with an audit trail of when your key was used and by whom. Additionally, you have the option to create and manage encryption keys yourself.

SSE-C

You manage the encryption keys and Amazon S3 manages the encryption as it writes to disks and decryption when you access your objects.

Client-Side Encryption

You can encrypt data client-side and upload the encrypted data to Amazon S3. In this case, you manage the encryption process, the encryption keys, and related tools.

AWS Elastic Beanstalk

Deployment policies in Elastic Beanstalk:

All at once: Deploy the new version to all instances simultaneously. All instances in your environment are out of service for a short time while the deployment occurs.
Rolling: Deploy the new version in batches. Each batch is taken out of service during the deployment phase, reducing your environment’s capacity by the number of instances in a batch.
Rolling with additional batch: Deploy the new version in batches, but first launch a new batch of instances to ensure full capacity during the deployment process.
Immutable: Deploy the new version to a fresh group of instances by performing an immutable update.
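
The capacity impact of each policy can be compared with a small sketch. The policy labels and function name are hypothetical, and the model only captures the lowest number of instances still serving traffic during a deployment, as described in the list above.

```python
def min_in_service(policy, instances, batch_size):
    """Lowest number of instances serving traffic during a deployment."""
    if policy == "all_at_once":
        # Every instance is out of service while the deployment occurs.
        return 0
    if policy == "rolling":
        # One batch at a time is taken out of service.
        return max(instances - batch_size, 0)
    if policy in ("rolling_with_additional_batch", "immutable"):
        # New instances are launched first, so full capacity is kept.
        return instances
    raise ValueError(f"unknown deployment policy: {policy}")

capacity = {
    policy: min_in_service(policy, instances=4, batch_size=1)
    for policy in ("all_at_once", "rolling", "rolling_with_additional_batch", "immutable")
}
```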

With deployment policies such as ‘All at once’, AWS Elastic Beanstalk performs an in-place update when you update your application versions and your application can become unavailable to users for a short period of time. You can avoid this downtime by performing a blue/green deployment, where you deploy the new version to a separate environment, and then swap CNAMEs (via Route 53) of the two environments to redirect traffic to the new version instantly. In case of any deployment issues, the rollback process is very quick via swapping the URLs for the two environments.

Reference: Deploying applications to Elastic Beanstalk environments

Dedicated Hosts and Dedicated Instances

Dedicated Instances

Are Amazon EC2 instances that run in a virtual private cloud (VPC) on hardware that’s dedicated to a single customer. Dedicated Instances that belong to different AWS accounts are physically isolated at a hardware level, even if those accounts are linked to a single-payer account. Note that Dedicated Instances may share hardware with other instances from the same AWS account that are not Dedicated Instances.

Dedicated Host

Is a physical server with EC2 instance capacity fully dedicated to your use.

Dedicated Hosts allow you to use your existing software licenses on EC2 instances. With a Dedicated Host, you have visibility and control over how instances are placed on the server.
Dedicated Hosts allow you to use your existing per-socket, per-core, or per-VM software licenses, including Windows Server, Microsoft SQL Server, SUSE, and Linux Enterprise Server.

Reference: Dedicated Hosts

AWS CloudHSM (Hardware Security Module)

CloudHSM provides tamper-resistant hardware that is available in multiple Availability Zones (AZs), ensuring high availability and durability of the keys.

AWS CloudHSM provides dedicated hardware security modules to store and manage cryptographic KEYS securely. It offers FIPS 140-2 Level 3 validated HSMs, which are ideal for meeting compliance requirements. With CloudHSM, you have full control over the key lifecycle and can perform key operations within the HSM, ensuring strong security and compliance for your keys.

AWS Service Catalog: You can use stack sets to deploy your catalog to many accounts at the same time. If you want to share a reference (an imported version of your portfolio that stays in sync with the original), you can use account-to-account sharing or you can share using AWS Organizations.

Amazon EFS - Elastic File System

Use cases: content management, web serving, data sharing, WordPress
Uses NFSv4.1 protocol
Uses security group to control access to EFS
Compatible with Linux based AMI (not Windows)
Encryption at rest using KMS
POSIX file system (~Linux) that has a standard file API
File system scales automatically, pay-per-use, no capacity planning!

Creating security groups

To enable traffic between an EC2 instance and a mount target (and thus the file system), you must configure the following rules in these security groups:

The security groups that you associate with a mount target must allow inbound access for the TCP protocol on the NFS port from all EC2 instances on which you want to mount the file system.
Each EC2 instance that mounts the file system must have a security group that allows outbound access to the mount target on the NFS port.
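
The two rules can be written down as data. The sketch below uses the rule shape of the EC2 security group API (IpProtocol, FromPort, ToPort, UserIdGroupPairs) and the standard NFS port 2049; the security group IDs and function names are placeholders.

```python
NFS_PORT = 2049  # standard NFS port used by EFS mount targets

def mount_target_inbound_rule(instance_sg_id):
    """Inbound rule for the mount target's security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": NFS_PORT,
        "ToPort": NFS_PORT,
        # Allow NFS traffic coming from the instances' security group.
        "UserIdGroupPairs": [{"GroupId": instance_sg_id}],
    }

def instance_outbound_rule(mount_target_sg_id):
    """Outbound rule for the EC2 instances' security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": NFS_PORT,
        "ToPort": NFS_PORT,
        # Allow NFS traffic going to the mount target's security group.
        "UserIdGroupPairs": [{"GroupId": mount_target_sg_id}],
    }

# Placeholder security group IDs.
inbound = mount_target_inbound_rule("sg-0instances0000000")
outbound = instance_outbound_rule("sg-0mounttarget00000")
```

Referencing the other security group instead of a CIDR range keeps the rules valid when instances are replaced.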

EFS vs EBS

EFS - Access Points

Easily manage application access to NFS environments
Enforce a POSIX user and group to use when accessing the file system
Restrict access to a directory within the file system and optionally specify a different root directory
Can restrict access from NFS clients using IAM policies

EFS - Operations

Operations that can be done in place:
- Lifecycle Policy (enable IA or change IA settings)
- Throughput Mode and Provisioned Throughput Numbers
- EFS Access Points
Operations that require a migration using DataSync (replicates all file attributes and metadata)
- Migration to encrypted EFS
- Performance Mode (e.g. Max IO)

Amazon Data Lifecycle Manager - `👀 EXAM`

Automate the creation, retention, and deletion of EBS snapshots and EBS-backed AMIs.
Schedule backups, cross-account snapshot copies, delete outdated backups, …
Uses resource tags to identify the resources (EC2 instances, EBS volumes).
Can’t be used to manage snapshots/AMIs created outside DLM.
Can’t be used to manage instance-store backed AMIs

EFS - Storage Classes

Storage Tiers (lifecycle management feature - move file after N days)

Standard: for frequently accessed files
Infrequent access (EFS-IA): cost to retrieve files, lower price to store. Enable EFS-IA with a Lifecycle Policy

Availability and durability

Standard: Multi-AZ, great for prod
One Zone: One AZ, great for dev, backup enabled by default, compatible with IA (EFS One Zone-IA)
Over 90% in cost savings

EFS - CloudWatch Metrics

PercentIOLimit

How close the file system is to reaching the I/O limit (General Purpose performance mode)
If at 100%, move to Max I/O (migration) - 👀 EXAM

BurstCreditBalance

The number of burst credits the file system can use to achieve higher throughput levels

StorageBytes

File system’s size in bytes (15-minute interval)
Dimensions: Standard, IA, Total (Standard + IA)

Enforce creation of file systems that are encrypted at rest

Use the elasticfilesystem:Encrypted IAM condition key in AWS IAM identity-based policies to require that users create only encrypted-at-rest Amazon EFS file systems

You can create an AWS Identity and Access Management (IAM) identity-based policy to control whether users can create Amazon EFS file systems that are encrypted at rest. The Boolean condition key elasticfilesystem:Encrypted specifies the type of file system, encrypted or unencrypted, that the policy applies to. You use the condition key with the elasticfilesystem:CreateFileSystem action and the policy effect, allow or deny, to create a policy for creating encrypted or unencrypted file systems.
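
A minimal sketch of such an identity-based policy, written as a Python dictionary and printed as JSON. The action and condition key come from the paragraph above; the variable name is hypothetical.

```python
import json

# Identity-based policy sketch: allow CreateFileSystem only when the
# request asks for an encrypted file system.
allow_encrypted_only = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "elasticfilesystem:CreateFileSystem",
            "Resource": "*",
            # Boolean condition key: "true" means the file system must be encrypted
            "Condition": {"Bool": {"elasticfilesystem:Encrypted": "true"}},
        }
    ],
}

print(json.dumps(allow_encrypted_only, indent=2))
```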

Define Service Control Policies (SCPs) inside AWS Organizations to enforce EFS encryption for all AWS accounts in your organization.

Service control policies (SCPs) are a type of organization policy that you can use to manage permissions in your organization. SCPs offer central control over the maximum available permissions for all accounts in your organization.

An SCP restricts permissions for IAM users and roles in member accounts, including the member account’s root user. If a user or role has an IAM permission policy that grants access to an action that is either not allowed or explicitly denied by the applicable SCPs, the user or role can’t perform that action.
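
One way an SCP could enforce EFS encryption is with an explicit Deny on unencrypted creation requests. The sketch below is an assumption about how to combine the condition key with an SCP, not a policy taken from these notes; the `Sid` is a hypothetical name.

```python
import json

# SCP sketch: deny CreateFileSystem whenever the request is for an
# unencrypted file system. An explicit Deny overrides any Allow granted
# by IAM policies in the member accounts.
scp_deny_unencrypted_efs = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUnencryptedEfs",  # hypothetical statement ID
            "Effect": "Deny",
            "Action": "elasticfilesystem:CreateFileSystem",
            "Resource": "*",
            "Condition": {"Bool": {"elasticfilesystem:Encrypted": "false"}},
        }
    ],
}

print(json.dumps(scp_deny_unencrypted_efs, indent=2))
```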

EC2 instances are unable to mount the file system.

The security groups that you associate with a mount target must allow inbound access for the TCP protocol on the NFS port from the security group used by the instances. To connect your Amazon EFS file system to your Amazon EC2 instance, you must create two security groups: one for your Amazon EC2 instance and another for your Amazon EFS mount target.
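
The two rules can be sketched as data. NFS uses TCP port 2049; the security group IDs below are hypothetical placeholders, and the dictionary shape mirrors the EC2 `IpPermissions` structure.

```python
NFS_PORT = 2049  # standard NFS port (TCP)


def nfs_rule(peer_security_group_id: str) -> dict:
    """Build a TCP/2049 rule that references another security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": NFS_PORT,
        "ToPort": NFS_PORT,
        "UserIdGroupPairs": [{"GroupId": peer_security_group_id}],
    }


# Hypothetical security group IDs, for illustration only.
instance_sg = "sg-0123456789abcdef0"
mount_target_sg = "sg-0fedcba9876543210"

# Inbound rule on the mount target's group: accept NFS from the instances' group.
mount_target_inbound = nfs_rule(instance_sg)
# Outbound rule on the instances' group: allow NFS to the mount target's group.
instance_outbound = nfs_rule(mount_target_sg)

print(mount_target_inbound)
print(instance_outbound)
```

Referencing the peer security group instead of IP ranges keeps the rule valid as instances come and go.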

Reference: Creating security groups

Mounting EFS file systems

Mounting on supported EC2 instances.
Mounting with IAM authorization.
Mounting with Amazon EFS access points.
Mounting with an on-premises Linux client.
Auto-mounting EFS file systems when an EC2 instance reboots.
Mounting a file system when creating a new EC2 instance.

AWS X-Ray

Debugging in Production, the good old way:
- Test locally
- Add log statements everywhere
- Re-deploy in production
Log formats differ across applications and log analysis is hard.
Debugging: one big monolith “easy”, distributed services “hard”
No common views of your entire architecture

S3 - AWS X-Ray integrates with Amazon S3 to trace upstream requests to update your application’s S3 buckets.
Lambda functions - Lambda runs the X-Ray daemon and records a segment with details about the function invocation and execution.
API Gateway APIs - You can use X-Ray to trace and analyze user requests as they travel through your Amazon API Gateway APIs to the underlying services.

AWS X-Ray advantages

Troubleshooting performance (bottlenecks)
Understand dependencies in a microservice architecture
Pinpoint service issues
Review request behavior
Find errors and exceptions
Are we meeting time SLA?
Where am I throttled?
Identify users that are impacted

DNS Resolution

DNS Resolution is used to enable resolution of public DNS hostnames to private IP addresses when queried from the peered VPC.

`👀` Traffic Mirroring

Traffic Mirroring provides the ability to create a copy of a packet flow to examine the contents of a packet. This feature is useful for threat monitoring, content inspection, and troubleshooting.

A packet is truncated to the MTU value when both of the following are true:

The traffic mirror target is a standalone instance.
The traffic packet size from the mirror source is greater than the MTU size for the traffic mirror target.
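
The two conditions above can be expressed as a small predicate. The function name and the byte values in the example calls are illustrative assumptions.

```python
def is_truncated(target_is_standalone_instance: bool, packet_size: int, target_mtu: int) -> bool:
    """Return True when a mirrored packet would be truncated to the target's MTU."""
    # Both conditions must hold: standalone-instance target AND oversized packet.
    return target_is_standalone_instance and packet_size > target_mtu


print(is_truncated(True, 9001, 1500))   # standalone target, packet larger than MTU
print(is_truncated(False, 9001, 1500))  # target is not a standalone instance
print(is_truncated(True, 1400, 1500))   # packet fits within the MTU
```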

👀 IMPORTANT NOTES 👀

By default, Amazon EC2 and Amazon VPC use the IPv4 addressing protocol

Amazon EC2 and Amazon VPC support both the IPv4 and IPv6 addressing protocols. By default, Amazon EC2 and Amazon VPC use the IPv4 addressing protocol; you can’t disable this behavior. When you create a VPC, you must specify an IPv4 CIDR block (a range of private IPv4 addresses). You can optionally assign an IPv6 CIDR block to your VPC and subnets, and assign IPv6 addresses from that block to instances in your subnet.

DynamoDB

Cross Account access

When you export your DynamoDB tables from Account A to an S3 bucket in Account B, the objects are still owned by Account A. The AWS Identity and Access Management (IAM) users in Account B can’t access the objects by default. The export function doesn’t write data with the access control list (ACL) bucket-owner-full-control. As a workaround to this object ownership issue, include the PutObjectAcl permission on all exported objects after the export is complete. This workaround grants access to all exported objects for the bucket owners in Account B.

`ClientConnections` Metric

To track the number of Amazon EC2 instances that are connected to a file system, you can monitor the Sum statistic of the ClientConnections metric. To calculate the average ClientConnections for periods greater than one minute, divide the sum by the number of minutes in the period.
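
The averaging rule is simple arithmetic. A small sketch follows; the function name and the sample numbers are made up for illustration.

```python
def average_client_connections(sum_statistic: float, period_seconds: int) -> float:
    """Average ClientConnections over a period longer than one minute."""
    minutes = period_seconds / 60
    return sum_statistic / minutes


# Example: a Sum of 150 over a 5-minute (300-second) period
# averages out to 30 connections per minute.
print(average_client_connections(150, 300))
```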

👀 AWS Budgets

Gives you the ability to set custom budgets that alert you when your costs or usage exceed (or are forecasted to exceed) your budgeted amount.

You can also use AWS Budgets to set reservation utilization or coverage targets and receive alerts when your utilization drops below the threshold you define. Reservation alerts are supported for Amazon EC2, Amazon RDS, Amazon Redshift, Amazon ElastiCache, and Amazon Elasticsearch reservations.

👀 AWS Cost

In AWS Cost and Usage Reports, you can choose to have AWS publish billing reports to an Amazon Simple Storage Service (Amazon S3) bucket that you own. You can receive reports that break down your costs by the hour or month, by product or product resource, or by tags that you define yourself. AWS updates the report in your bucket once a day in a comma-separated value (CSV) format. You can view the reports using spreadsheet software such as Microsoft Excel or Apache OpenOffice Calc or access them from an application using the Amazon S3 API.

AWS Database Migration Service (DMS)

Amazon Macie

is a fully managed data security and data privacy service that uses machine learning and pattern matching to help you discover, monitor, and protect sensitive data in your AWS environment. Macie automates the discovery of sensitive data, such as personally identifiable information (PII) and financial data, to provide you with a better understanding of the data that your organization stores in Amazon S3. Amazon Macie only supports S3 as a data source.

👀 Run Command

`👀` EC2Rescue

EC2Rescue can help you diagnose and troubleshoot problems on Amazon EC2 Linux and Windows Server instances.

You can run the tool manually, as described in Using EC2Rescue for Linux Server and Using EC2Rescue for Windows Server. Or, you can run the tool automatically by using Systems Manager Automation and the AWSSupport-ExecuteEC2Rescue document. The AWSSupport-ExecuteEC2Rescue document is designed to perform a combination of Systems Manager actions, AWS CloudFormation actions, and Lambda functions that automate the steps normally required to use EC2Rescue.

EC2Rescue for EC2 Windows is a convenient, straightforward, GUI-based troubleshooting tool that can be run on your Amazon EC2 Windows Server instances to troubleshoot operating system-level issues and collect advanced logs and configuration files for further analysis. EC2Rescue simplifies and expedites the troubleshooting of EC2 Windows instances.

Service Control Policies

Groups

Groups can be granted permissions using access control policies. This makes it easier to manage permissions for a collection of users, rather than having to manage permissions for each user. Reference: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_groups.html

Billing Stuff

Activate a cost allocation tag that is named Department in the AWS Billing and Cost Management console in the Organizations management account. Use a tag policy to mandate a Department tag on new resources.

You must activate a tag in the Billing and Cost Management console before you can view expenses by cost allocation tag. You should mandate the use of tags to ensure that the resources are tagged correctly.

https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_tag-policies.html

👀 Generate Billing Alerts

Before you can create an alarm for your estimated charges, you must enable billing alerts on your Accounts Preferences page first, so that you can monitor your estimated AWS charges and create an alarm using billing metric data. After you enable billing alerts, you cannot disable data collection, but you can delete any billing alarms that you created.

Use the AWS Resource Groups Tag Editor to identify resources that are not tagged in each account. Apply a tag that is named Department to any untagged resources.

With Resource Groups, you can create, maintain, and view a collection of resources that share common tags. Tag Editor manages tags across services and AWS Regions. Tag Editor can perform a global search and can edit a large number of tags at one time.

For more information about resource groups and tagging, see Tag Editor.

👀 Billing Preferences

The management account of an organization can change this setting by turning off RI (Reserved Instances) sharing for an individual member account.

👀 AWS Shield Advanced

is more suitable to be used against distributed denial of service (DDoS) attacks but NOT for common web exploits such as cross-site scripting, SQL injection, and brute-force HTTP flood attacks. For those, the more suitable service is AWS WAF.

👀 A placement group

is a logical grouping of instances within a single Availability Zone. By placing the EC2 instances in a placement group, you can ensure that the instances are physically located close to each other, which can significantly reduce network latency between them. This can improve the performance of inter-instance communication and reduce the overall latency in data transfer.

Sometimes you want control over the EC2 instance placement strategy. When you create a placement group, you specify one of the following strategies for the group:

Cluster - clusters instances into a low-latency group in a single Availability Zone.
- Pros: Great network (10 Gbps bandwidth between instances with Enhanced Networking enabled - recommended)
- Cons: If the rack fails, all instances fail at the same time
- Use case: Big Data job that needs to complete fast
Spread - spreads instances across underlying hardware (max 7 instances per group per AZ) - critical applications
- Pros:
  - Can span across Availability Zones (AZ)
  - Reduced risk of simultaneous failure
  - EC2 Instances are on different physical hardware
- Cons:
  - Limited to 7 instances per AZ per placement group
- Use case:
  - Application that needs to maximize high availability
  - Critical Applications where each instance must be isolated from failure from each other
Partition - spreads instances across many different partitions (which rely on different sets of racks) within an AZ. Scales to 100s of EC2 instances per group (Hadoop, Cassandra, Kafka)
- Up to 7 partitions per AZ
- Can span across multiple AZs in the same region
- Up to 100s of EC2 instances
- The instances in a partition do not share racks with the instances in the other partitions
- A partition failure can affect many EC2 instances but won’t affect other partitions
- EC2 instances get access to the partition information as metadata
- Use cases: HDFS, HBase, Cassandra, Kafka
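
To make the spread limit concrete, here is a small sketch that checks a planned layout against the 7-instances-per-AZ rule. The function names and AZ names are illustrative assumptions.

```python
MAX_SPREAD_INSTANCES_PER_AZ = 7  # limit per spread placement group, per AZ


def max_spread_instances(az_count: int) -> int:
    """Upper bound on instances in one spread placement group across az_count AZs."""
    return MAX_SPREAD_INSTANCES_PER_AZ * az_count


def fits_spread_group(instances_per_az: dict) -> bool:
    """Check that no AZ exceeds the per-AZ spread limit."""
    return all(count <= MAX_SPREAD_INSTANCES_PER_AZ for count in instances_per_az.values())


# Hypothetical layouts across three AZs.
print(max_spread_instances(3))
print(fits_spread_group({"az-a": 7, "az-b": 5, "az-c": 2}))
print(fits_spread_group({"az-a": 8, "az-b": 1, "az-c": 1}))
```

If a workload needs more instances than this bound, the partition strategy is the usual alternative.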

👀 Access Analyzer

helps you identify the resources in your organization and accounts, such as Amazon S3 buckets or IAM roles, shared with an external entity. This lets you identify unintended access to your resources and data, which is a security risk. Access Analyzer identifies resources shared with external principals by using logic-based reasoning to analyze the resource-based policies in your AWS environment. For each instance of a resource shared outside of your account, Access Analyzer generates a finding.

AWS Systems Manager

👀 AWS Systems Manager provides a unified, centralized way to manage both your Amazon EC2 instances and on-premises servers (including devices such as Raspberry Pi running Raspbian) through a single interface. It offers a wide range of capabilities, including inventory management, patch management, automation, and configuration management, allowing you to efficiently manage your hybrid infrastructure from a single console. With Systems Manager, you can maintain consistent configurations, apply patches, and automate administrative tasks for your on-premises servers, just like you would for your EC2 instances.

Helps you manage your EC2 and On-Premises systems at scale.
Get operational insights about the state of your infrastructure.
Easily detect problems.
Patching automation for enhanced compliance.
Works for both Windows and Linux OS.
Integrated with CloudWatch metrics / dashboards.
Integrated with AWS Config.
Free service.

Main Features for the EXAM

Resource Groups
Shared Resources Documents
Change Management
- Automation
- Maintenance Windows
Application Management
- Parameter Store
Node Management
- Inventory
- Session Manager
- Run Command
- State Manager
- Patch Manager
Create custom runbooks or use pre-defined runbooks maintained by AWS.
Receive notifications about Automation tasks and runbooks by using Amazon EventBridge.
Monitor Automation progress and details by using the AWS Systems Manager console.

👀 Fleet Manager

Helps you remotely manage your server fleet that runs on AWS or on premises. With Fleet Manager, you can gather data from individual instances to perform common troubleshooting and management tasks from a single console. However, you cannot use Fleet Manager to upload a script to start or stop instances.

Recover impaired instances

A Systems Manager Automation document defines the Automation workflow (the actions that Systems Manager performs on your managed instances and AWS resources). Automation includes several pre-defined Automation documents that you can use to perform common tasks like restarting one or more EC2 instances or creating an Amazon Machine Image (AMI).

Use the AWSSupport-ExecuteEC2Rescue document to recover impaired instances.

👀 AWS Systems Manager Inventory

AWS Systems Manager Inventory provides visibility into your Amazon EC2 and on-premises computing environment. You can use Inventory to collect metadata from your managed instances. You can store this metadata in a central Amazon Simple Storage Service (Amazon S3) bucket, and then use built-in tools to query the data and quickly determine which instances are running the software and configurations required by your software policy, and which instances need to be updated. You can configure Inventory on all of your managed instances by using a one-click procedure. You can also configure and view inventory data from multiple AWS Regions and AWS accounts.

If the pre-configured metadata types collected by Systems Manager Inventory don’t meet your needs, then you can create custom inventory. Custom inventory is simply a JSON file with information that you provide and add to the managed instance in a specific directory. When Systems Manager Inventory collects data, it captures this custom inventory data.

Systems Manager Inventory collects only metadata from your managed instances. Inventory doesn’t access proprietary information or data.
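
A custom inventory item might look like the sketch below. The `SchemaVersion`, `TypeName`, and `Content` keys and the `Custom:` prefix follow the custom inventory format as best recalled, so confirm them in the Systems Manager documentation. The rack values are hypothetical.

```python
import json

# Hypothetical custom inventory item describing where a server is racked.
custom_inventory = {
    "SchemaVersion": "1.0",
    "TypeName": "Custom:RackInformation",  # custom types use the "Custom:" prefix
    "Content": {
        "Location": "RACK-42",   # made-up rack identifier
        "Zone": "example-zone",  # made-up zone label
    },
}

print(json.dumps(custom_inventory, indent=2))
```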

AWS Tags

You can add text key-value pairs called Tags to many AWS resources
Commonly used in EC2
Free naming, common tags are Name, Environment, Team …
They’re used for
- Resource grouping
- Automation
- Cost allocation

Resource Groups

Create, view or manage logical groups of resources thanks to tags.
Allows creation of logical groups of resources such as
- Applications
- Different layers of an application stack
- Production versus development environments
Regional service
Works with EC2, S3, DynamoDB, Lambda, etc…

SSM - Inventory

Collect metadata from your managed instances (EC2/On-premises)
Metadata includes installed software, OS drivers, configurations, installed updates, running services …
View data in AWS Console or store in S3 and query and analyze using Athena and QuickSight
Specify metadata collection interval (minutes, hours, days)
Query data from multiple AWS accounts and regions
Create Custom Inventory for your custom metadata (e.g., rack location of each managed instance)

AWS Systems Manager - Distributor

Allows you to securely distribute and install software packages, like your custom software’s .msi installer, across a large set of instances.

AWS Systems Manager State Manager

Uses associations to enforce a desired state for your instances. By setting up an association to run the AWS-ConfigureAWSPackage document, you’re effectively telling Systems Manager to install or update the specified package (in this case, your custom software) on instances with the specified tags.

Scalability & High Availability

Scalability means that an application / system can handle greater loads by adapting.

Scalability is linked to, but different from, High Availability

Vertical Scalability

Vertical scalability means increasing the size of the resource (instance)

Horizontal Scalability

Horizontal Scalability means increasing the number of instances / systems for your application

High Availability & Scalability For EC2

Vertical Scaling: Increase instance size (= scale up / down)
- From: t2.nano - 0.5 GB of RAM, 1 vCPU
- To: u-12tb1.metal - 12.3 TB of RAM, 448 vCPUs
Horizontal Scaling: Increase number of instances (= scale out / in)
- Auto Scaling Group
- Load Balancer
High Availability: Run instances for the same application across multi-AZ
- Auto Scaling Group multi-AZ
- Load Balancer multi-AZ

Gateway Load Balancer

Uses the GENEVE protocol on port 6081

Application Load Balancers

Monitoring

**RequestCountPerTarget**
👀 EXAM SpilloverCount (a Classic Load Balancer metric) represents the total number of requests that were rejected because the surge queue is full. To address this, configure the Auto Scaling group to scale your instances based on the SurgeQueueLength metric.
**SurgeQueueLength**: The total number of requests (HTTP listener) or connections (TCP listener) that are pending routing to a healthy instance (also a Classic Load Balancer metric). Helps decide when to scale out the ASG. Max value is 1024.

Target Groups Settings

deregistration_delay.timeout_seconds: time the load balancer waits before deregistering a target.
slow_start.duration_seconds: period during which a newly registered target receives a gradually increasing share of requests.
load_balancing.algorithm.type: how the load balancer selects targets when routing requests (Round Robin, Least Outstanding Requests).
stickiness.enabled.
stickiness.type: application-based or duration-based cookie.
stickiness.app_cookie.cookie_name: name of the application cookie.
stickiness.app_cookie.duration_seconds: application-based cookie expiration period.
stickiness.lb_cookie.duration_seconds: duration-based cookie expiration period.
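
These settings are passed to the load balancer as key/value pairs, with every value given as a string. The sketch below shows one plausible combination; the specific values are assumptions for illustration, not recommendations.

```python
# Target group attributes as a list of Key/Value pairs (values are strings).
# The values chosen here are illustrative assumptions.
target_group_attributes = [
    {"Key": "deregistration_delay.timeout_seconds", "Value": "120"},
    {"Key": "slow_start.duration_seconds", "Value": "60"},
    {"Key": "load_balancing.algorithm.type", "Value": "round_robin"},
    {"Key": "stickiness.enabled", "Value": "true"},
    {"Key": "stickiness.type", "Value": "lb_cookie"},  # duration-based cookie
    {"Key": "stickiness.lb_cookie.duration_seconds", "Value": "86400"},
]

# Flatten to a dictionary for easy lookup.
attributes = {item["Key"]: item["Value"] for item in target_group_attributes}
print(attributes["stickiness.type"])
```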

ASG

Good metrics to scale on

CPUUtilization: Average CPU utilization across your instances (Processing power)
RequestCountPerTarget: to make sure the number of requests per EC2 instances is stable
Average Network In / Out (if your application is network bound)
Any custom metric (that you push using CloudWatch)

ApproximateNumberOfMessages -

AWS Auto Scaling

Backbone service of auto scaling for scalable resources in AWS:

Amazon EC2 Auto Scaling groups: Launch or terminate EC2 instances
Amazon EC2 Spot Fleet requests: Launch or terminate instances from a Spot Fleet request, or automatically replace instances that get interrupted for price or capacity reasons.
Amazon ECS: Adjust the ECS service desired count up or down
Amazon DynamoDB (table or global secondary index): WCU & RCU
Amazon Aurora: Dynamic Read Replicas Auto Scaling

Target Groups

EC2 instances (can be managed by an Auto Scaling Group) - HTTP
ECS tasks (managed by ECS itself) - HTTP
Lambda functions - HTTP request is translated into a JSON event
IP Addresses - must be private IPs
ALB can route to multiple target groups
Health checks are at the target group level

UpdatePolicy Attribute

Use it to handle updates for below resources

AWS::AppStream::Fleet
AWS::AutoScaling::AutoScalingGroup
AWS::ElastiCache::ReplicationGroup
AWS::OpenSearchService::Domain
AWS::Elasticsearch::Domain
AWS::Lambda::Alias

AutoScalingReplacingUpdate policy - `EXAM`

AutoScalingRollingUpdate policy

With rolling updates, you can specify whether CloudFormation performs updates in batches or all at once for instances that are in an Auto Scaling group. The AutoScalingRollingUpdate policy is the only CloudFormation feature that provides such an incremental update throughout the Auto Scaling group.
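
As a sketch, the rolling-update settings attach to the Auto Scaling group resource as an `UpdatePolicy` attribute. The structure below mirrors that attribute in Python form; the batch size, minimum in service, and pause time are hypothetical values.

```python
import json

# UpdatePolicy sketch for an AWS::AutoScaling::AutoScalingGroup resource.
# Numbers are illustrative assumptions.
asg_update_policy = {
    "UpdatePolicy": {
        "AutoScalingRollingUpdate": {
            "MinInstancesInService": 1,     # keep at least one instance serving traffic
            "MaxBatchSize": 2,              # replace up to two instances per batch
            "PauseTime": "PT5M",            # ISO 8601 duration: wait 5 minutes per batch
            "WaitOnResourceSignals": True,  # wait for success signals from new instances
        }
    }
}

print(json.dumps(asg_update_policy, indent=2))
```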

AutoScalingScheduledAction policy

Applies when you update a stack that includes an Auto Scaling group with an associated scheduled action.

Lambda Tracing with X-Ray

Enable in Lambda configuration (Active Tracing)
Environment variables to communicate with X-Ray
- _X_AMZN_TRACE_ID: contains the tracing header
- AWS_XRAY_CONTEXT_MISSING: by default, LOG_ERROR
- AWS_XRAY_DAEMON_ADDRESS: the X-Ray Daemon IP_ADDRESS:PORT
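
A small sketch of reading the tracing header from the environment and splitting it into its parts. The sample header value is a made-up example in the `Root=...;Parent=...;Sampled=...` shape and is only used when the variable is not set (for instance, outside Lambda).

```python
import os


def parse_trace_header(header: str) -> dict:
    """Split an X-Ray tracing header into its key=value fields."""
    fields = {}
    for item in header.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


# Made-up sample value, used as a fallback when running outside Lambda.
sample = "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1"
header = os.environ.get("_X_AMZN_TRACE_ID", sample)
print(parse_trace_header(header))
```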

Lambda Function Configuration

RAM - The more RAM you add, the more vCPU credits you get
- At 1,792 MB, a function has the equivalent of one full vCPU

If your application is CPU-bound (computation heavy), increase RAM - EXAM
Timeout: default 3 seconds, maximum is 900 seconds (15 minutes)

Cold Starts & Provisioned Concurrency

Cold Start:
- New instance => code is loaded and code outside the handler runs (init)
- If the init is large (code, dependencies, SDK…) this process can take some time.
- First request served by new instances has higher latency than the rest
Provisioned Concurrency:
- Concurrency is allocated before the function is invoked (in advance)
- So the cold start never happens and all invocations have low latency
- Application Auto Scaling can manage concurrency (schedule or target utilization)
Note:
- Cold starts in VPC were dramatically reduced in Oct & Nov 2019
- https://aws.amazon.com/blogs/compute/announcing-improved-vpc-networking-for-aws-lambda-functions/

Lambda Monitoring - CloudWatch Metrics

Invocations - number of times your function is invoked (success/failure)
Duration - amount of time your function spends processing an event
Errors - number of invocations that result in a function error
Throttles - number of invocation requests that are throttled (no concurrency available)
DeadLetterErrors - number of times Lambda failed to send an event to a DLQ (async invocations)
IteratorAge - time between when a Stream receives a record and when the Event Source Mapping sends the event to the function (for Event Source Mapping that reads from Stream)
ConcurrentExecutions - number of function instances that are processing events

CodeDeploy - 👀 EXAM

CodeDeploy is a deployment service that automates application deployments to Amazon EC2 instances, on-premises instances, or serverless Lambda functions. It allows you to rapidly release new features, update Lambda function versions, avoid downtime during application deployment, and handle the complexity of updating your applications, without many of the risks associated with error-prone manual deployments.

Application: Container
Deployment Group: configuration settings in CodeDeploy
- Refers to the set of instances or Lambda functions where you deploy the code revision
- You can create multiple deployment groups within the application
Deployment Configuration
- Set of conditions and deployment rules that CodeDeploy applies during a deployment.
Application Specification (AppSpec) file
- manages deployment stages as lifecycle event hooks.

ElasticSearch - OpenSearch

Amazon ElasticSearch Service is now Amazon OpenSearch Service

May be called Amazon ES at the exam
Managed version of ElasticSearch (open source project)
Needs to run on servers (not a serverless offering)
Use cases:
- Log Analytics
- Real Time application monitoring
- Security Analytics
- Full Text Search
- Clickstream Analytics
- Indexing

ElasticSearch Access Policy

IP-based Policies
- Resource-based policies used to restrict access to an ES domain to IP address(es) or CIDR blocks
- Allows unsigned requests to an ES domain (e.g., curl, Kibana, …)
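
An IP-based resource policy could look roughly like the sketch below. The account ID, Region, domain name, and CIDR block are placeholders; the statement allows HTTP calls to the domain only from the listed source range.

```python
import json

# IP-based access policy sketch for an ES/OpenSearch domain.
# Account ID, Region, domain name, and CIDR block are hypothetical placeholders.
ip_based_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "*"},  # anonymous access, restricted by the condition below
            "Action": "es:ESHttp*",
            "Resource": "arn:aws:es:us-east-1:111122223333:domain/example-domain/*",
            "Condition": {"IpAddress": {"aws:SourceIp": ["192.0.2.0/24"]}},
        }
    ],
}

print(json.dumps(ip_based_policy, indent=2))
```

Because the principal is a wildcard, the source IP condition is the only thing limiting access, which is why unsigned requests from tools like curl work.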

ElasticSearch - Production Setup

It’s recommended to:

Use 3 dedicated Master nodes
Use at least 2 Data nodes per AZ (for replication)
Deploy the domain across 3 AZ
Create at least one replica for each index in the cluster

AWS IAM Identity Center (successor to AWS Single Sign-On) - EXAM

One login (single sign-on) for all your
- AWS accounts in AWS Organizations
- Business cloud applications (e.g., Salesforce, Box, Microsoft 365, …)
- SAML2.0-enabled applications
- EC2 Windows Instances
Identity providers
- Built-in identity store in IAM Identity Center
- 3rd party: Active Directory (AD), OneLogin, Okta…

AWS Systems Manager - AWS Systems Manager OpsCenter

Provides a central location where operations engineers and IT professionals can manage operational work items (OpsItems) related to AWS resources.

An OpsItem is any operational issue or interruption that needs investigation and remediation. Using OpsCenter, you can view contextual investigation data about each OpsItem, including related OpsItems and related resources. You can also run Systems Manager Automation runbooks to resolve OpsItems.

Amazon EMR - Optional

is the industry-leading cloud big data solution for petabyte-scale data processing, interactive analytics, and machine learning using open-source frameworks such as Apache Spark, Apache Hive, and Presto.

AWS Global Accelerator

service that improves the availability and performance of your applications with local or global users. It provides static IP addresses that act as a fixed entry point to your application endpoints in a single or multiple AWS Regions, such as your Application Load Balancers, Network Load Balancers or Amazon EC2 instances. AWS Global Accelerator will not help in accelerating the file transfer speeds into S3 for the given use-case.

Global Accelerator service does not work with S3. It only supports endpoints like application load balancers, network load balancers, EC2 instances, or elastic IP addresses.

ASG WARM Pool

A warm pool gives you the ability to decrease latency for apps that have exceptionally long boot times, e.g. because instances need to write massive amounts of data to disk.

How to get an account activation email? (can take up to 24 hours, usually few minutes)

After you choose a Support plan, a confirmation page indicates that your account is being activated. Accounts are usually activated within a few minutes, but the process might take up to 24 hours.

You can sign in to your AWS account during this time. The AWS home page might display a button that shows “Complete Sign Up” during this time, even if you’ve completed all the steps in the sign-up process.

When your account is fully activated, you’ll receive a confirmation email. After you receive this email, you have full access to all AWS services.

Troubleshooting delays in account activation

Account activation can sometimes be delayed. If the process takes more than 24 hours, check the following:

Finish the account activation process. You might have accidentally closed the window for the sign-up process before you’ve added all the necessary information. To finish the sign-up process, open https://aws-portal.amazon.com/gp/aws/developer/registration/index.html and sign in using the email address and password you chose for the account.

Check the information associated with your payment method. Check Payment Methods in the AWS Billing and Cost Management console. Fix any errors in the information.

Contact your financial institution. Financial institutions occasionally reject authorization requests from AWS for various reasons. Contact your payment method’s issuing institution and ask that they approve authorization requests from AWS. Note: AWS cancels the authorization request as soon as it’s approved by your financial institution. You aren’t charged for authorization requests from AWS. Authorization requests might still appear as a small charge (usually 1 USD) on statements from your financial institution.

Check your email for requests for additional information. Check your email to see if AWS needs any information from you to complete the activation process.

Try a different browser.

Contact AWS Support. Contact AWS Support for help. Be sure to mention any troubleshooting steps that you already tried. Note: Don’t provide sensitive information, such as credit card numbers, in any correspondence with AWS.

How to Contact AWS Support?

The top right corner will have Support > Support Center.

Ask them to activate the account; this can take up to 24 hours.

Where to find more details?

Details can be found here: https://aws.amazon.com/premiumsupport/knowledge-center/create-and-activate-aws-account/

Section 3: AWS Fundamentals: IAM & EC2

11. AWS Regions and AZs

AWS has Regions all around the world.
Names can be: us-east-1, eu-west-3…
A region is a cluster of data centers.
Most AWS services are region-scoped.

AWS Availability Zones

Each region has many availability zones (usually 3, min is 2, max is 6). Example:
- ap-southeast-2a
- ap-southeast-2b
- ap-southeast-2c
Each availability zone (AZ) is one or more discrete data centers with redundant power, networking, and connectivity.
They’re separate from each other, so that they’re isolated from disasters.
They’re connected with high bandwidth, ultra-low latency networking.

This helps guarantee that multiple AZs won’t all fail at once (due to a meteorological disaster for example). Read more here: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html

https://aws.amazon.com/about-aws/global-infrastructure/

IAM Introduction

IAM (Identity and Access Management)
Your whole AWS security is there:
- Users
- Groups
- Roles
Root account should never be used (or shared)
Users must be created with proper permissions.
IAM is at the center of AWS.
Policies are written in JSON (JavaScript Object Notation).
IAM has a global view.
Permissions are governed by Policies (JSON).
MFA (Multi Factor Authentication) can be set up.
IAM has predefined “managed policies”.
We’ll see IAM policies in detail in the future.
It’s best to give users the minimal amount of permissions they need to perform their job (least privilege principles).
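
To illustrate least privilege, the sketch below defines a narrowly scoped JSON policy and a tiny helper that flags wildcard actions. The bucket name and the helper are hypothetical, and the check is deliberately simplistic (it only looks at the `Action` field).

```python
import json

# Hypothetical policy: read-only access to objects in one bucket.
read_only_reports_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-reports-bucket/*",
        }
    ],
}


def uses_wildcard_action(policy: dict) -> bool:
    """Return True if any statement grants '*' or a 'service:*' action."""
    for statement in policy["Statement"]:
        actions = statement["Action"]
        if isinstance(actions, str):
            actions = [actions]
        if any(action == "*" or action.endswith(":*") for action in actions):
            return True
    return False


print(json.dumps(read_only_reports_policy, indent=2))
print(uses_wildcard_action(read_only_reports_policy))
```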

IAM is a global service (encompasses all regions)

Q: You are getting started with AWS and your manager wants things to remain simple yet secure. He wants the management of engineers to be easy, and not re-invent the wheel every time someone joins your company. What will you do?
A: I’ll create multiple IAM users and groups, and assign policies to groups. New users will be added to groups.

IAM Federation

Big enterprises usually integrate their own repository of users with IAM
This way, one can log in to AWS using their company credentials
Identity Federation uses the SAML standard (Active Directory)

IAM 101 Brain Dump

One IAM User per PHYSICAL PERSON
One IAM Role per Application
IAM credentials should NEVER BE SHARED
Never, ever, ever, ever, write IAM credentials in code. EVER.
And even less, NEVER EVER EVER COMMIT YOUR IAM credentials
Never use the ROOT account except for initial setup.
Never use ROOT IAM Credentials

What is EC2?

EC2 is one of the most popular AWS offerings
It mainly consists of the capability of:
Renting virtual machines (EC2)
Storing data on virtual drives (EBS)
Distributing load across machines (ELB)
Scaling the services using an auto-scaling group (ASG)
Knowing EC2 is fundamental to understanding how the Cloud works

Hands-On: Launching an EC2 Instance running Linux

We’ll be launching our first virtual server using the AWS Console
We’ll get a first high level approach to the various parameters
We’ll learn how to start / stop / terminate our instance.

Add Tags: define key-value pairs to identify the instances.

Security Group: define the firewall around the instances

How to SSH into your EC2 Instance

Linux / Mac OS X

We’ll learn how to SSH into your EC2 instance using Linux / Mac.
SSH is one of the most important functions. It allows you to control a remote machine, all using the command line.
We will see how we can configure OpenSSH ~/.ssh/config to facilitate the SSH into our EC2 instances.

Public DNS (IPv4): ec2-54-153-150-4.ap-southeast-2.compute.amazonaws.com
IPv4 Public IP: 54.153.150.4

ssh ec2-user@ec2-54-153-150-4.ap-southeast-2.compute.amazonaws.com
ssh ec2-user@54.153.150.4
ssh -i EC2Tutorial.pem ec2-user@54.153.150.4

Q: You are getting a permission error when trying to SSH into your Linux instance. A: The permissions on the key file are too open; run chmod 0400 on it. The exam expects you to know this, even if you used Windows / Putty to SSH into your instances. If you’re a Windows user, just have a quick look at the Linux SSH lecture!
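
As a small sketch of what chmod 0400 does, the snippet below applies the same mode from Python to a throwaway file and reads it back. The temporary file is only a stand-in for a real key such as EC2Tutorial.pem, and the snippet assumes a Linux / Mac file system.

```python
import os
import stat
import tempfile

# Create a throwaway file standing in for a downloaded key file.
with tempfile.NamedTemporaryFile(suffix=".pem", delete=False) as handle:
    key_path = handle.name

# Owner may read; nobody may write or execute (same as: chmod 0400 <key file>).
os.chmod(key_path, 0o400)
mode = stat.S_IMODE(os.stat(key_path).st_mode)
print(oct(mode))

os.remove(key_path)  # clean up the placeholder file
```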

EC2 Instance Connect

Connect to your EC2 instance within your browser
No need to use your key file that was downloaded
The “magic” is that a temporary key is uploaded onto EC2 by AWS
Works out of the box only with Amazon Linux 2
Need to make sure port 22 is still open!

EC2 Instance Connect (browser-based SSH connection): Connect to your instance using SSH in the console

Introduction to Security Groups

Security Groups are the foundation of network security in AWS
They control how traffic is allowed into or out of our EC2 Machines.
Understanding them is the most fundamental skill for troubleshooting networking issues.
In this lecture, we’ll learn how to use them to allow inbound and outbound ports.

Security Groups Deeper Dive

Security groups are acting as a “firewall” on EC2 instances.
They regulate:
- Access to Ports
- Authorised IP ranges – IPv4 and IPv6
- Control of inbound network (from other to the instance)
- Control of outbound network (from the instance to other)

Good to know

Can be attached to multiple instances
Locked down to a region / VPC combination
They live “outside” the EC2 instance – if traffic is blocked, the EC2 instance won’t see it
It’s good to maintain one separate security group for SSH access
If your application is not accessible (time out), then it’s a security group issue
If your application gives a “connection refused“ error, then it’s an application error or it’s not launched
All inbound traffic is blocked by default
All outbound traffic is authorised by default
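
The "inbound is blocked by default" behaviour can be sketched as a tiny rule matcher. This is only a model of the idea, not how AWS implements it; the rules and IP addresses below are hypothetical (taken from documentation-only ranges).

```python
import ipaddress

# Hypothetical inbound rules: (port, allowed CIDR). Anything not listed is denied.
INBOUND_RULES = [
    (22, "203.0.113.10/32"),  # SSH from a single admin IP
    (80, "0.0.0.0/0"),        # HTTP from anywhere
]

def inbound_allowed(source_ip: str, port: int) -> bool:
    """Return True if any rule matches; with no match, traffic is blocked."""
    address = ipaddress.ip_address(source_ip)
    return any(
        port == rule_port and address in ipaddress.ip_network(cidr)
        for rule_port, cidr in INBOUND_RULES
    )

print(inbound_allowed("203.0.113.10", 22))
print(inbound_allowed("198.51.100.7", 22))
print(inbound_allowed("198.51.100.7", 80))
```

A blocked request in this model simply gets no answer, which is why a time out points at the security group.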

Q: Security groups can reference all of the following except:
- IP Address
- CIDR block
- Security Group
- DNS name [ok]

Private vs Public IP (IPv4)

Networking has two sorts of IPs. IPv4 and IPv6:
IPv4: 1.160.10.240
IPv6: 3ffe:1900:4545:3:200:f8ff:fe21:67cf
In this course, we will only be using IPv4.
IPv4 is still the most common format used online.
IPv6 is newer and solves problems for the Internet of Things (IoT).
IPv4 allows for 3.7 billion different addresses in the public space
IPv4: [0-255].[0-255].[0-255].[0-255].

Fundamental Differences

Public IP:
- Public IP means the machine can be identified on the internet (WWW)
- Must be unique across the whole web (no two machines can have the same public IP).
- Can be geo-located easily
Private IP:
- Private IP means the machine can only be identified on a private network
- The IP must be unique across the private network
- BUT two different private networks (two companies) can have the same IPs.
- Machines connect to WWW using a NAT + internet gateway (a proxy)
- Only a specified range of IPs can be used as private IP
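
The "specified range" for private IPv4 addresses is the set of three blocks reserved by RFC 1918. A minimal sketch of checking an address against them (the helper name `is_private_ipv4` is just illustrative):

```python
import ipaddress

# The three IPv4 ranges reserved for private networks (RFC 1918).
PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_private_ipv4(ip: str) -> bool:
    """Return True if the address falls inside one of the private ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in PRIVATE_RANGES)

print(is_private_ipv4("10.0.1.5"))
print(is_private_ipv4("54.153.150.4"))
```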

Elastic IP

With an Elastic IP address, you can mask the failure of an instance or software by rapidly remapping the address to another instance in your account.
You can only have 5 Elastic IP in your account (you can ask AWS to increase that).
Overall, try to avoid using Elastic IP:
- They often reflect poor architectural decisions
- Instead, use a random public IP and register a DNS name to it
- Or, as we’ll see later, use a Load Balancer and don’t use a public IP at all (the best pattern) (*)

Private vs Public IP (IPv4)

AWS EC2 – Hands On

By default, your EC2 machine comes with:
- A private IP for the internal AWS Network
- A public IP, for the WWW.
When we are doing SSH into our EC2 machines:
- We can’t use a private IP, because we are not in the same network
- We can only use the public IP.
If your machine is stopped and then started, the public IP can change

Launching an Apache Server on EC2

Let’s leverage our EC2 instance
We’ll install an Apache Web Server to display a web page
We’ll create an index.html that shows the hostname of our machine

yum install -y httpd.x86_64

systemctl start httpd.service

# Enable across reboots
systemctl enable httpd.service

# First test page
echo "Hello World from $(hostname -f)" > /var/www/html/index.html

EC2 User Data

It is possible to bootstrap our instances using an EC2 User data script.
Bootstrapping means launching commands when a machine starts.
That script is only run once, at the instance’s first start
EC2 user data is used to automate boot tasks such as:
- Installing updates
- Installing software
- Downloading common files from the internet
- Anything you can think of
The EC2 User Data Script runs with the root user.

More EC2 User Data Hands-On

We want to make sure that this EC2 instance has an Apache HTTP server installed on it – to display a simple web page
For it, we are going to write a user-data script.
This script will be executed at the first boot of the instance.
Let’s get hands on!

Terminate instance.
Create instances

Linux 2

Step 3: User data, As text

#!/bin/bash
# Use this for your user data (script without newlines)
# install httpd (Linux 2 version)
yum update -y
yum install -y httpd.x86_64
systemctl start httpd.service
systemctl enable httpd.service
echo "Hello World from $(hostname -f)" > /var/www/html/index.html

EC2 Instance Launch Types

On Demand Instances: short workload, predictable pricing.
Reserved: (MINIMUM 1 year) E.g: A database.
- Reserved Instances: long workloads
- Convertible Reserved Instances: long workloads with flexible instances.
- Scheduled Reserved Instances: example – every Thursday between 3 and 6 pm
Spot Instances: short workloads, for cheap, can lose instances (less reliable)
Dedicated Instances: no other customers will share your hardware.
Dedicated Hosts: book an entire physical server, control instance placement.

EC2 On Demand

Pay for what you use (billing per second, after the first minute).
Has the highest cost but no upfront payment
No long term commitment
Recommended for short-term and uninterrupted workloads, where you can’t predict how the application will behave. Ideal for elastic workloads

EC2 Reserved Instances

Up to 75% discount compared to On-demand
Pay upfront for what you use with long term commitment
Reservation period can be 1 or 3 years
Reserve a specific instance type
Recommended for steady state usage applications (think database)
Convertible Reserved Instance
- Can change the EC2 instance type (you can make it evolve).
- Up to 54% discount.
Scheduled Reserved Instances
- launch within time window you reserve
- When you require a fraction of day / week / month

EC2 Spot Instances

Can get a discount of up to 90% compared to On-demand
Instances that you can “lose” at any point of time if your max price is less than the current spot price
The MOST cost-efficient instances in AWS
Useful for workloads that are resilient to failure
- Batch jobs
- Data analysis
- Image processing
- …
Not great for critical jobs or databases
Great combo: Reserved Instances for baseline (Web app) + On-Demand & Spot for peaks (Unpredictable, more agility, save money)

Q: You plan on running an open-source MongoDB database year-round on EC2. Which instance launch mode should you choose? A: Reserved instances. This will allow you to save cost, as you know in advance that the instance will be up for a full year

EC2 Dedicated Hosts

Physical dedicated EC2 server for your use
Full control of EC2 Instance placement
Visibility into the underlying sockets / physical cores of the hardware
Allocated for your account for a 3 year period reservation
More expensive
Useful for software that has a complicated licensing model (BYOL – Bring Your Own License)
Or for companies that have strong regulatory or compliance needs.

EC2 Dedicated Instances

Instances running on hardware that’s dedicated to you.
May share hardware with other instances in same account.
No control over instance placement (can move hardware after Stop / Start).

Which host is right for me?

On demand: coming and staying in resort whenever we like, we pay the full price
Reserved: like planning ahead and if we plan to stay for a long time, we may get a good discount.
Spot instances: the hotel allows people to bid for the empty rooms and the highest bidder keeps the rooms. You can get kicked out at any time
Dedicated Hosts: We book an entire building of the resort

Q: You would like to deploy a database technology and the vendor license bills you based on the physical cores and underlying socket visibility. Which EC2 launch mode allows you to get visibility into them? A: Dedicated Hosts.

Price Comparison Example – m4.large – us-east-1
Price Type | Price (per hour)
On-demand | $0.10
Spot Instance (Spot Price) | $0.032 - $0.045 (up to 90% off)
Spot Block (1 to 6 hours) | ~ Spot Price
Reserved Instance (12 months) – no upfront | $0.062
Reserved Instance (12 months) – all upfront | $0.058
Reserved Instance (36 months) – no upfront | $0.043
Reserved Convertible Instance (12 months) – no upfront | $0.071
Reserved Dedicated Instance (12 months) – all upfront | $0.064
Reserved Scheduled Instance (recurring schedule on 12 months term) | $0.090 – $0.095 (5%-10% off)
Dedicated Host | On-demand price
Dedicated Host Reservation | Up to 70% off
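
To see what these hourly prices mean over a year, here is a small worked calculation. It uses three of the m4.large figures from the comparison above and assumes the instance runs 24 hours a day for 365 days; the helper names are only illustrative.

```python
HOURS_PER_YEAR = 24 * 365

# Hourly prices copied from the m4.large comparison in these notes.
prices = {
    "on_demand": 0.10,
    "reserved_12m_no_upfront": 0.062,
    "reserved_36m_no_upfront": 0.043,
}

def yearly_cost(hourly_price: float) -> float:
    """Cost in USD of running one instance for a full year."""
    return round(hourly_price * HOURS_PER_YEAR, 2)

def savings_percent(hourly_price: float) -> float:
    """Discount relative to the On-demand hourly price."""
    return round((1 - hourly_price / prices["on_demand"]) * 100, 1)

for name, price in prices.items():
    print(name, yearly_cost(price), savings_percent(price))
```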

EC2 Spot Instance Requests

Can get a discount of up to 90% compared to On-demand
Define max spot price and get the instance while current spot price < max
- The hourly spot price varies based on offer and capacity
- If the current spot price > your max price you can choose to stop or terminate your instance with a 2-minute grace period.
Other strategy: Spot Block
- “block” spot instance during a specified time frame (1 to 6 hours) without interruptions
- In rare situations, the instance may be reclaimed
Used for batch jobs, data analysis, or workloads that are resilient to failures.
Not great for critical jobs or databases
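
The max price rule can be sketched with a simplified model: the instance keeps running while the spot price stays at or below your max price, and is interrupted the first time the price goes above it. The price history below is hypothetical, and the 2-minute grace period is not modelled.

```python
def hours_running(spot_prices, max_price):
    """Count the hours an instance runs before its first interruption."""
    hours = 0
    for price in spot_prices:
        if price > max_price:
            break  # spot price exceeded the max price: instance is reclaimed
        hours += 1
    return hours

# Hypothetical hourly spot prices (USD), for illustration only.
history = [0.032, 0.035, 0.041, 0.047, 0.033]
print(hours_running(history, max_price=0.045))
```

This is why Spot suits batch jobs that can resume after an interruption, and not databases.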

EC2 Spot Instances Pricing https://console.aws.amazon.com/

Spot Fleets

Spot Fleets = set of Spot Instances + (optional) On-Demand Instances
The Spot Fleet will try to meet the target capacity with price constraints
- Define possible launch pools: instance type (m5.large), OS, Availability Zone
- Can have multiple launch pools, so that the fleet can choose
- Spot Fleet stops launching instances when reaching capacity or max cost
Strategies to allocate Spot Instances:
- lowestPrice: from the pool with the lowest price (cost optimization, short workload)
- diversified: distributed across all pools (great for availability, long workloads)
- capacityOptimized: pool with the optimal capacity for the number of instances
Spot Fleets allow us to automatically request Spot Instances with the lowest price
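
A rough sketch of how the lowestPrice and diversified strategies differ. The pool names and prices are hypothetical, and the real Spot Fleet logic is more involved; this only shows the shape of the two allocations.

```python
def allocate(pools, target, strategy):
    """Return {pool_name: instance_count} for a simplified Spot Fleet.

    pools maps a hypothetical launch pool name to its current spot price.
    """
    counts = {name: 0 for name in pools}
    if strategy == "lowestPrice":
        # Put every instance in the cheapest pool.
        cheapest = min(pools, key=pools.get)
        counts[cheapest] = target
    elif strategy == "diversified":
        # Spread instances round-robin across all pools.
        names = sorted(pools)
        for i in range(target):
            counts[names[i % len(names)]] += 1
    else:
        raise ValueError(f"unsupported strategy: {strategy}")
    return counts

pools = {"m5.large/az-a": 0.035, "m5.large/az-b": 0.031, "c5.large/az-a": 0.040}
print(allocate(pools, 6, "lowestPrice"))
print(allocate(pools, 6, "diversified"))
```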

EC2 Instance Types – Main ones

R: applications that need a lot of RAM – in-memory caches
C: applications that need good CPU – compute / databases
M: applications that are balanced (think “medium”) – general / web app
I: applications that need good local I/O (instance storage) – databases
G: applications that need a GPU – video rendering / machine learning
T2 / T3: burstable instances (up to a capacity)
T2 / T3 - unlimited: unlimited burst
Real-world tip: use https://www.ec2instances.info

Burstable Instances (T2/T3)

AWS has the concept of burstable instances (T2/T3 machines).
Burst means that overall, the instance has OK CPU performance.
When the machine needs to process something unexpected (a spike in load for example), it can burst, and CPU can be VERY good.
If the machine bursts, it utilizes “burst credits”
If all the credits are gone, the CPU becomes BAD
If the machine stops bursting, credits are accumulated over time
Burstable instances can be amazing for handling unexpected traffic, with the assurance that it will be handled correctly
If your instance consistently runs low on credits, you need to move to a different, non-burstable kind of instance
(Figure: credit usage and credit balance)
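
The credit mechanism can be sketched as a minute-by-minute simulation. The numbers are assumptions chosen to resemble a small T2 instance (0.1 credits earned per minute, one credit = one vCPU at 100% for one minute); treat them as illustrative, not as a pricing reference.

```python
def simulate(requested, earn_per_minute=0.1, start_balance=0.0, max_balance=144.0):
    """Return (delivered CPU per minute, final credit balance).

    requested is a list of desired CPU utilizations (0.0 to 1.0), one per minute.
    With an empty balance, delivered CPU is capped at the earn rate (the baseline).
    """
    balance = start_balance
    delivered = []
    for wanted in requested:
        available = balance + earn_per_minute
        used = min(wanted, available)  # cannot spend more credits than we hold
        delivered.append(used)
        balance = min(max_balance, available - used)
    return delivered, balance

# One idle hour accumulates credits; a burst then drains them quickly.
_, idle_balance = simulate([0.0] * 60)
burst, burst_balance = simulate([1.0] * 3, start_balance=1.0)
print(round(idle_balance, 6))
print([round(value, 6) for value in burst])
```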

T2/T3 Unlimited

Nov 2017: It is possible to have an “unlimited burst credit balance”
You pay extra money if you go over your credit balance, but you don’t lose in performance
Overall, it is a new offering, so be careful, costs could go high if you’re not monitoring the health of your instances
Read more here: https://aws.amazon.com/blogs/aws/new-t2-unlimitedgoing-beyond-the-burst-with-high-performance

What’s an AMI?

As we saw, AWS comes with base images such as:
- Ubuntu
- Fedora
- RedHat
- Windows
- Etc…
These images can be customised at runtime using EC2 User data
But what if we could create our own image, ready to go?
That’s an AMI – an image to use to create our instances
AMIs can be built for Linux or Windows machines

Why would you use a custom AMI?

Using a custom built AMI can provide the following advantages:
- Pre-installed packages needed
- Faster boot time (no need for ec2 user data at boot time)
- Machine comes configured with monitoring / enterprise software
- Security concerns – control over the machines in the network
- Control of maintenance and updates of AMIs over time
- Active Directory Integration out of the box
- Installing your app ahead of time (for faster deploys when auto-scaling)
- Using someone else’s AMI that is optimised for running an app, DB, etc…
AMIs are built for a specific AWS region (!) and are NOT global

Q: You built and published an AMI in the ap-southeast-2 region, and your colleague in us-east-1 region cannot see it A: An AMI created for a region can only be seen in that region.

Q: You are launching an EC2 instance in us-east-1 using this Python script snippet:

(we will see the SDK in a later section, for now just look at the code reference ImageId)

ec2.create_instances(ImageId='ami-b23a5e7', MinCount=1, MaxCount=1)

It works well, so you decide to deploy your script in us-west-1 as well. There, the script does not work and fails with an “ami not found” error. What’s the problem? A: An AMI is region-locked, and the same ID cannot be used across regions
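
One common way to cope with region-locked AMI IDs is a per-region lookup table. The sketch below is only an illustration of that idea: the us-west-1 ID is a made-up placeholder, and the helper name `ami_for` is hypothetical.

```python
# Each region has its own ID for the "same" image.
AMI_BY_REGION = {
    "us-east-1": "ami-b23a5e7",    # ID from the question above
    "us-west-1": "ami-0example1",  # placeholder, not a real AMI ID
}

def ami_for(region: str) -> str:
    """Return the AMI ID registered for a region, or fail loudly."""
    try:
        return AMI_BY_REGION[region]
    except KeyError:
        raise LookupError(f"no AMI copied to region {region}") from None

print(ami_for("us-east-1"))
```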

Using Public AMIs

You can leverage AMIs from other people
You can also pay for other people’s AMI by the hour
- These people have optimised the software
- The machine is easy to run and configure
- You basically rent “expertise” from the AMI creator
AMIs can be found and published on the AWS Marketplace
Warning:
- Do not use an AMI you don’t trust!
- Some AMIs might come with malware or may not be secure for your enterprise

AMI Storage

Your AMIs take up space, and they live in Amazon S3
Amazon S3 is a durable, cheap and resilient storage where most of your backups will live (but you won’t see them in the S3 console)
By default, your AMIs are private, and locked for your account / region
You can also make your AMIs public and share them with other AWS accounts or sell them on the AWS Marketplace

AMI Pricing

AMIs live in Amazon S3, so you get charged for the actual space it takes in Amazon S3
Amazon S3 pricing in US-EAST-1:
- First 50 TB / month: $0.023 per GB
- Next 450 TB / month: $0.022 per GB
Overall it is quite inexpensive to store private AMIs.
Make sure to remove the AMIs you don’t use

Cross Account AMI Copy (FAQ + Exam Tip)

You can share an AMI with another AWS account.
Sharing an AMI does not affect the ownership of the AMI.
If you copy an AMI that has been shared with your account, you are the owner of the target AMI in your account.
To copy an AMI that was shared with you from another account, the owner of the source AMI must grant you read permissions for the storage that backs the AMI, either the associated EBS snapshot (for an Amazon EBS-backed AMI) or an associated S3 bucket (for an instance store-backed AMI).
Limits:
You can’t copy an encrypted AMI that was shared with you from another account. Instead, if the underlying snapshot and encryption key were shared with you, you can copy the snapshot while re-encrypting it with a key of your own. You own the copied snapshot, and can register it as a new AMI. [*]
You can’t copy an AMI with an associated billingProduct code that was shared with you from another account. This includes Windows AMIs and AMIs from the AWS Marketplace. To copy a shared AMI with a billingProduct code, launch an EC2 instance in your account using the shared AMI and then create an AMI from the instance. [*]

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Copy

Placement Groups

Sometimes you want control over the EC2 Instance placement strategy
That strategy can be defined using placement groups
When you create a placement group, you specify one of the following strategies for the group:
- Cluster—clusters instances into a low-latency group in a single Availability Zone
- Spread—spreads instances across underlying hardware (max 7 instances per group per AZ) - critical applications
- Partition—spreads instances across many different partitions (which rely on different sets of racks) within an AZ. Scales to 100s of EC2 instances per group (Hadoop, Cassandra, Kafka)

Cluster

Pros: Great network (10 Gbps bandwidth between instances)
Cons: If the rack fails, all instances fail at the same time
Use case:
- Big Data job that needs to complete fast
- Application that needs extremely low latency and high network throughput

Spread

Pros:
- Can span across Availability Zones (AZ)
- Reduced risk of simultaneous failure
- EC2 Instances are on different physical hardware
Cons:
- Limited to 7 instances per AZ per placement group [*]
Use case:
- Application that needs to maximize high availability
- Critical applications where each instance must be isolated from the failure of the others

Partition

Up to 7 partitions per AZ
Up to 100s of EC2 instances
The instances in a partition do not share racks with the instances in the other partitions
A partition failure can affect many EC2 but won’t affect other partitions
EC2 instances get access to the partition information as metadata
Use cases: HDFS, HBase, Cassandra, Kafka

Q: You would like to make sure your EC2 instances have the highest performance while talking to each other as you’re performing big data analysis. Which placement group should you choose? A: Cluster Cluster placement groups places your instances next to each other giving you high performance computing and networking

Q: You are launching an application on EC2 and the whole process of installing the application takes about 30 minutes. You would like to minimize the total time for your instance to boot up and be operational to serve traffic. What do you recommend? A: Create an AMI after installing and launch from the AMI Creating an AMI after installing the applications allows you to start more EC2 instances directly from that AMI, hence bypassing the need to install the application (as it’s already installed)

Q: You are running a critical workload of three hours per week, on Monday. As a solutions architect, which EC2 Instance Launch Type should you choose to maximize the cost savings while ensuring the application stability? A: Scheduled Reserved Instances

Elastic Network Interfaces (ENI)

Logical component in a VPC that represents a virtual network card
The ENI can have the following attributes:
- Primary private IPv4, one or more secondary IPv4
- One Elastic IP (IPv4) per private IPv4
- One Public IPv4
- One or more security groups
- A MAC address
You can create ENI independently and attach them on the fly (move them) on EC2 instances for failover
Bound to a specific availability zone (AZ)

EC2 Hibernate

We know we can stop, terminate instances
- Stop: the data on disk (EBS) is kept intact in the next start
- Terminate: any EBS volumes (root) set up to be destroyed on termination are lost
On start, the following happens:
- First start: the OS boots & the EC2 User Data script is run
- Following starts: the OS boots up
- Then your application starts, caches get warmed up, and that can take time!

Introducing EC2 Hibernate:
- The in-memory (RAM) state is preserved
- The instance boot is much faster! (the OS is not stopped / restarted)
- Under the hood: the RAM state is written to a file in the root EBS volume
- The root EBS volume must be encrypted
Use cases:
- long-running processing
- saving the RAM state
- services that take time to initialize
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Hibernate.html

EC2 Hibernate – Good to know

Supported instance families - C3, C4, C5, M3, M4, M5, R3, R4, and R5.
Instance RAM size - must be less than 150 GB.
Instance size - not supported for bare metal instances.
AMI: Amazon Linux 2, Linux AMI, Ubuntu & Windows…
Root Volume: must be EBS, encrypted, not instance store, and large
Available for On-Demand and Reserved Instances
An instance cannot be hibernated for more than 60 days

EC2 for Solutions Architects

EC2 instances are billed by the second, t2.micro is free tier
On Linux / Mac we use SSH, on Windows we use Putty
SSH is on port 22, lock down the security group to your IP
Timeout issues => Security groups issues
Permission issues on the SSH key => run “chmod 0400”
Security Groups can reference other Security Groups instead of IP ranges (very popular exam question)
Know the difference between Private, Public and Elastic IP
You can customize an EC2 instance at boot time using EC2 User Data

Know the 4 EC2 launch modes:
- On demand
- Reserved
- Spot instances
- Dedicated Hosts
Know the basic instance types: R,C,M,I,G, T2/T3
You can create AMIs to pre-install software on your EC2 => faster boot
AMI can be copied across regions and accounts
EC2 instances can be started in placement groups:
- Cluster
- Spread

Section 4: High Availability and Scalability: ELB & ASG

Scalability & High Availability

Scalability means that an application / system can handle greater loads by adapting.
There are two kinds of scalability:
- Vertical Scalability
- Horizontal Scalability (= elasticity)
Scalability is linked to, but different from, High Availability
Let’s deep dive into the distinction, using a call center as an example

Vertical Scalability

Vertical scalability means increasing the size of the instance
For example, your application runs on a t2.micro
Scaling that application vertically means running it on a t2.large
Vertical scalability is very common for non distributed systems, such as a database.
RDS, ElastiCache are services that can scale vertically.
There’s usually a limit to how much you can vertically scale (hardware limit)
(Figure: call center analogy, junior operator vs. senior operator)

Horizontal Scalability

Horizontal Scalability means increasing the number of instances / systems for your application
Horizontal scaling implies distributed systems.
This is very common for web applications / modern applications
It’s easy to horizontally scale thanks to cloud offerings such as Amazon EC2

High Availability

(Figure: call center analogy, first building in New York)

High Availability usually goes hand in hand with horizontal scaling
High availability means running your application / system in at least 2 data centers (== Availability Zones)
The goal of high availability is to survive a data center loss
The high availability can be passive (for RDS Multi AZ for example)
The high availability can be active (for horizontal scaling)

High Availability & Scalability For EC2

Vertical Scaling: Increase instance size (= scale up / down)
- From: t2.nano - 0.5G of RAM, 1 vCPU
- To: u-12tb1.metal – 12.3 TB of RAM, 448 vCPUs
Horizontal Scaling: Increase number of instances (= scale out / in)
- Auto Scaling Group
- Load Balancer
High Availability: Run instances for the same application across multi AZ
- Auto Scaling Group multi AZ
- Load Balancer multi AZ

What is load balancing?

Load balancers are servers that forward internet traffic to multiple servers (EC2 Instances) downstream.

Why use a load balancer?

Spread load across multiple downstream instances
Expose a single point of access (DNS) to your application
Seamlessly handle failures of downstream instances
Do regular health checks to your instances
Provide SSL termination (HTTPS) for your websites
Enforce stickiness with cookies
High availability across zones
Separate public traffic from private traffic

Why use an Elastic Load Balancer?

An ELB (Elastic Load Balancer) is a managed load balancer
- AWS guarantees that it will be working
- AWS takes care of upgrades, maintenance, high availability
- AWS provides only a few configuration knobs
It costs less to set up your own load balancer, but it will be a lot more effort on your end.
It is integrated with many AWS offerings / services

Health Checks

Health Checks are crucial for Load Balancers
They enable the load balancer to know if instances it forwards traffic to are available to reply to requests
The health check is done on a port and a route (/health is common)
If the response is not 200 (OK), then the instance is unhealthy
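
The health check rule can be sketched as a filter over the latest check results. The instance IDs are hypothetical, and real load balancers also apply thresholds over several consecutive checks, which this sketch leaves out.

```python
def healthy_targets(check_results):
    """check_results maps instance id -> HTTP status returned by GET /health."""
    return sorted(
        instance for instance, status in check_results.items() if status == 200
    )

# Hypothetical results from one round of health checks.
results = {"i-aaa": 200, "i-bbb": 503, "i-ccc": 200}
print(healthy_targets(results))
```

Only the instances returned here would keep receiving traffic.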

Types of load balancer on AWS

AWS has 3 kinds of managed Load Balancers
Classic Load Balancer (v1 - old generation) – 2009
- HTTP, HTTPS, TCP
Application Load Balancer (v2 - new generation) – 2016
- HTTP, HTTPS, WebSocket
Network Load Balancer (v2 - new generation) – 2017
- TCP, TLS (secure TCP) & UDP
Overall, it is recommended to use the newer / v2 generation load balancers as they provide more features
You can set up internal (private) or external (public) ELBs

Load Balancer Good to Know

LBs can scale but not instantaneously – contact AWS for a “warm-up”
Troubleshooting
- 4xx errors are client induced errors
- 5xx errors are application induced errors
- Load Balancer error 503 means at capacity or no registered target
- If the LB can’t connect to your application, check your security groups!
Monitoring
- ELB access logs will log all access requests (so you can debug per request)
- CloudWatch Metrics will give you aggregate statistics (ex: connections count)

Classic Load Balancers (v1)

Supports TCP (Layer 4), HTTP & HTTPS (Layer 7)
Health checks are TCP or HTTP based
Fixed hostname XXX.region.elb.amazonaws.com

Application Load Balancer (v2)

Application Load Balancers operate at Layer 7 (HTTP)
Load balancing to multiple HTTP applications across machines (target groups)
Load balancing to multiple applications on the same machine (ex: containers)
Support for HTTP/2 and WebSocket
Support redirects (from HTTP to HTTPS for example)
Routing tables to different target groups:
- Routing based on path in URL (example.com/users & example.com/posts)
- Routing based on hostname in URL (one.example.com & other.example.com)
- Routing based on Query String, Headers (example.com/users?id=123&order=false)
ALBs are a great fit for microservices & container-based applications (example: Docker & Amazon ECS)
Has a port mapping feature to redirect to a dynamic port in ECS
In comparison, we’d need one Classic Load Balancer per application
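
Host-based and path-based routing can be sketched as an ordered list of rules where the first match wins. The rule list and target group names are hypothetical; a real ALB evaluates listener rules by priority in a similar spirit.

```python
from urllib.parse import urlsplit

# Hypothetical rules, checked in order; the first match wins.
RULES = [
    {"host": "other.example.com", "path_prefix": "/", "target_group": "other-app"},
    {"host": "one.example.com", "path_prefix": "/users", "target_group": "users-service"},
    {"host": "one.example.com", "path_prefix": "/posts", "target_group": "posts-service"},
]
DEFAULT_TARGET_GROUP = "default-app"

def route(url: str) -> str:
    """Pick a target group from the hostname and path of the request URL."""
    parts = urlsplit(url)
    for rule in RULES:
        if parts.hostname == rule["host"] and parts.path.startswith(rule["path_prefix"]):
            return rule["target_group"]
    return DEFAULT_TARGET_GROUP

print(route("https://one.example.com/users?id=123&order=false"))
print(route("https://other.example.com/anything"))
```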

Application Load Balancer (v2) Target Groups

EC2 instances (can be managed by an Auto Scaling Group) – HTTP
ECS tasks (managed by ECS itself) – HTTP
Lambda functions – HTTP request is translated into a JSON event
IP Addresses – must be private IPs
ALB can route to multiple target groups
Health checks are at the target group level

Good to Know

Fixed hostname (XXX.region.elb.amazonaws.com)
The application servers don’t see the IP of the client directly
- The true IP of the client is inserted in the header X-Forwarded-For
- We can also get Port (X-Forwarded-Port) and proto (X-Forwarded-Proto)
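
On the application side, reading these headers might look like the sketch below. It assumes the headers arrive as a plain dict with exactly these key names, and that the left-most X-Forwarded-For entry is the original client (a value a client could also forge).

```python
def client_info(headers):
    """Extract the client IP, port and protocol from the forwarding headers."""
    forwarded = headers.get("X-Forwarded-For", "")
    # The header may hold a comma-separated chain; the first entry is the client.
    client_ip = forwarded.split(",")[0].strip() or None
    return {
        "ip": client_ip,
        "port": headers.get("X-Forwarded-Port"),
        "proto": headers.get("X-Forwarded-Proto"),
    }

# Hypothetical request headers as seen by the application server.
request_headers = {
    "X-Forwarded-For": "203.0.113.7, 10.0.0.12",
    "X-Forwarded-Port": "443",
    "X-Forwarded-Proto": "https",
}
print(client_info(request_headers))
```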

Network Load Balancer (v2)

Network load balancers (Layer 4) allow you to:
- Forward TCP & UDP traffic to your instances
- Handle millions of requests per second
- Achieve lower latency: ~100 ms (vs 400 ms for ALB)
NLB has one static IP per AZ, and supports assigning Elastic IP (helpful for whitelisting specific IP)
NLB are used for extreme performance, TCP or UDP traffic
Not included in the AWS free tier

Load Balancer Stickiness

It is possible to implement stickiness so that the same client is always redirected to the same instance behind a load balancer
This works for Classic Load Balancers & Application Load Balancers
The “cookie” used for stickiness has an expiration date you control
Use case: make sure the user doesn’t lose his session data
Enabling stickiness may bring imbalance to the load over the backend EC2 instances

Cross-Zone Load Balancing

With Cross Zone Load Balancing: each load balancer node distributes requests evenly across all registered instances in all AZs
Otherwise, each load balancer node distributes requests evenly across the registered instances in its Availability Zone only
Classic Load Balancer
- Disabled by default
- No charges for inter AZ data if enabled
Application Load Balancer
- Always on (can’t be disabled)
- No charges for inter AZ data
Network Load Balancer
- Disabled by default
- You pay charges ($) for inter AZ data if enabled
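
A worked example of the difference, assuming clients split traffic evenly between the load balancer nodes (one node per AZ). The layout of 2 instances in one AZ and 8 in the other is hypothetical.

```python
def traffic_share(instances_per_az, cross_zone):
    """Return {az: percent of total traffic received by each instance in that AZ}."""
    total_instances = sum(instances_per_az.values())
    az_count = len(instances_per_az)
    shares = {}
    for az, count in instances_per_az.items():
        if cross_zone:
            # Every instance gets an equal slice, whatever its AZ.
            shares[az] = 100 / total_instances
        else:
            # Each AZ receives an equal slice, split among its own instances.
            shares[az] = 100 / az_count / count
    return shares

layout = {"az-a": 2, "az-b": 8}
print(traffic_share(layout, cross_zone=True))
print(traffic_share(layout, cross_zone=False))
```

Without cross-zone balancing, the two instances in the smaller AZ each carry four times the load of an instance in the larger one.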

SSL/TLS - Basics

An SSL Certificate allows traffic between your clients and your load balancer to be encrypted in transit (in-flight encryption)
SSL refers to Secure Sockets Layer, used to encrypt connections
TLS refers to Transport Layer Security, which is a newer version
Nowadays, TLS certificates are mainly used, but people still refer to them as SSL
Public SSL certificates are issued by Certificate Authorities (CA)
Comodo, Symantec, GoDaddy, GlobalSign, Digicert, Letsencrypt, etc…
SSL certificates have an expiration date (you set) and must be renewed

Load Balancer - SSL Certificates

The load balancer uses an X.509 certificate (SSL/TLS server certificate)
You can manage certificates using ACM (AWS Certificate Manager)
Alternatively, you can create and upload your own certificates
HTTPS listener:
- You must specify a default certificate
- You can add an optional list of certs to support multiple domains
- Clients can use SNI (Server Name Indication) to specify the hostname they reach
- Ability to specify a security policy to support older versions of SSL / TLS (legacy clients)

SSL – Server Name Indication (SNI)

SNI solves the problem of loading multiple SSL certificates onto one web server (to serve multiple websites)
It’s a “newer” protocol, and requires the client to indicate the hostname of the target server in the initial SSL handshake
The server will then find the correct certificate, or return the default one
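
The certificate lookup can be sketched as a dictionary keyed by hostname with a default fallback. The hostnames and certificate names are placeholders, and a real TLS stack also handles wildcards, which this sketch ignores.

```python
# Hypothetical certificates loaded on one listener.
CERTS = {
    "one.example.com": "cert-one",
    "other.example.com": "cert-other",
}
DEFAULT_CERT = "cert-default"

def select_certificate(sni_hostname):
    """Return the certificate for the hostname sent in the handshake.

    Clients that send no SNI hostname, or an unknown one, get the default.
    """
    if sni_hostname is None:
        return DEFAULT_CERT
    return CERTS.get(sni_hostname.lower(), DEFAULT_CERT)

print(select_certificate("one.example.com"))
print(select_certificate(None))
```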

Note:

Only works for ALB & NLB (newer generation), CloudFront
Does not work for CLB (older gen)

Elastic Load Balancers – SSL Certificates

Classic Load Balancer (v1)
- Support only one SSL certificate
- Must use multiple CLBs for multiple hostnames with multiple SSL certificates
Application Load Balancer (v2)
- Supports multiple listeners with multiple SSL certificates
- Uses Server Name Indication (SNI) to make it work
Network Load Balancer (v2)
- Supports multiple listeners with multiple SSL certificates
- Uses Server Name Indication (SNI) to make it work

ELB – Connection Draining [*]

Feature naming:
- CLB: Connection Draining
- Target Group: Deregistration Delay (for ALB & NLB)
Time to complete “in-flight requests” while the instance is de-registering or unhealthy
Stops sending new requests to the instance which is de-registering
Between 1 and 3600 seconds, default is 300 seconds
Can be disabled (set value to 0)
Set to a low value if your requests are short

What’s an Auto Scaling Group?

In real-life, the load on your websites and application can change

In the cloud, you can create and get rid of servers very quickly
The goal of an Auto Scaling Group (ASG) is to:
- Scale out (add EC2 instances) to match an increased load
- Scale in (remove EC2 instances) to match a decreased load
- Ensure we have a minimum and a maximum number of machines running
- Automatically Register new instances

ASGs have the following attributes

A launch configuration
- AMI + Instance Type
- EC2 User Data
- EBS Volumes
- Security Groups
- SSH Key Pair
Min Size / Max Size / Initial Capacity
Network + Subnets Information
Load Balancer Information
Scaling Policies

Auto Scaling Alarms

It is possible to scale an ASG based on CloudWatch alarms
An Alarm monitors a metric (such as Average CPU)
Metrics are computed for the overall ASG instances
Based on the alarm:
- We can create scale-out policies (increase the number of instances)
- We can create scale-in policies (decrease the number of instances)

Auto Scaling New Rules

It is now possible to define “better” auto scaling rules that are directly managed by EC2
- Target Average CPU Usage
- Number of requests on the ELB per instance
- Average Network In
- Average Network Out
These rules are easier to set up and can make more sense

Auto Scaling Custom Metric

We can auto scale based on a custom metric (ex: number of connected users)
1. Send custom metric from application on EC2 to CloudWatch (PutMetricData API)
2. Create CloudWatch alarm to react to low / high values
3. Use the CloudWatch alarm as the scaling policy for ASG

ASG Brain Dump

Scaling policies can be on CPU, Network… and can even be on custom metrics or based on a schedule (if you know your visitors patterns)
ASGs use Launch configurations or Launch Templates (newer)
To update an ASG, you must provide a new launch configuration / launch template
IAM roles attached to an ASG will get assigned to EC2 instances
ASG are free. You pay for the underlying resources being launched
Having instances under an ASG means that if they get terminated for whatever reason, the ASG will automatically create new ones as a replacement. Extra safety!
ASG can terminate instances marked as unhealthy by an LB (and hence replace them)

Auto Scaling Groups – Scaling Policies

Target Tracking Scaling
- Most simple and easy to set-up
- Example: I want the average ASG CPU to stay at around 40%
Simple / Step Scaling
- When a CloudWatch alarm is triggered (example CPU > 70%), then add 2 units
- When a CloudWatch alarm is triggered (example CPU < 30%), then remove 1
Scheduled Actions
- Anticipate a scaling based on known usage patterns
- Example: increase the min capacity to 10 at 5 pm on Fridays
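
The simple scaling example above (add 2 units above 70% CPU, remove 1 below 30%) can be sketched as follows. This only models the arithmetic; the function and parameter names are hypothetical, and the result is always clamped to the ASG min and max size.

```python
def apply_simple_scaling(current, cpu_percent, min_size, max_size,
                         scale_out_above=70, scale_in_below=30,
                         add_units=2, remove_units=1):
    """Return the new desired capacity after one alarm evaluation."""
    if cpu_percent > scale_out_above:
        desired = current + add_units       # scale-out alarm fired
    elif cpu_percent < scale_in_below:
        desired = current - remove_units    # scale-in alarm fired
    else:
        desired = current                   # no alarm
    # The ASG never goes below min_size or above max_size.
    return max(min_size, min(max_size, desired))


print(apply_simple_scaling(3, 80, min_size=1, max_size=10))  # 5
print(apply_simple_scaling(3, 20, min_size=1, max_size=10))  # 2
print(apply_simple_scaling(3, 80, min_size=1, max_size=3))   # 3 (already at max)
```

The last call shows why an ASG running at its maximum capacity does nothing when a scale-out alarm fires.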

Auto Scaling Groups - Scaling Cooldowns

The cooldown period helps to ensure that your Auto Scaling group doesn’t launch or terminate additional instances before the previous scaling activity takes effect.
In addition to default cooldown for Auto Scaling group, we can create cooldowns that apply to a specific simple scaling policy
A scaling-specific cooldown period overrides the default cooldown period.
One common use for scaling-specific cooldowns is with a scale-in policy—a policy that terminates instances based on a specific criteria or metric. Because this policy terminates instances, Amazon EC2 Auto Scaling needs less time to determine whether to terminate additional instances.
If the default cooldown period of 300 seconds is too long, you can reduce costs by applying a scaling-specific cooldown period of 180 seconds to the scale-in policy.
If your application is scaling up and down multiple times each hour, modify the Auto Scaling Groups cool-down timers and the CloudWatch Alarm Period that triggers the scale in
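
A rough sketch of how a policy-specific cooldown overrides the default one. Times are plain seconds and the function names are hypothetical; this models the rule stated above, not the actual Auto Scaling implementation.

```python
def effective_cooldown(default_cooldown=300, policy_cooldown=None):
    """A scaling-specific cooldown, when set, overrides the group default."""
    return default_cooldown if policy_cooldown is None else policy_cooldown


def can_scale(now, last_activity_end, default_cooldown=300, policy_cooldown=None):
    """True once the cooldown since the previous scaling activity has elapsed."""
    elapsed = now - last_activity_end
    return elapsed >= effective_cooldown(default_cooldown, policy_cooldown)


# 200 seconds after the previous scaling activity:
print(can_scale(now=200, last_activity_end=0))                       # False (default 300s)
print(can_scale(now=200, last_activity_end=0, policy_cooldown=180))  # True (180s scale-in cooldown)
```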

ASG for Solutions Architects

ASG Default Termination Policy (simplified version):

Find the AZ which has the most number of instances
If there are multiple instances in the AZ to choose from, delete the one with the oldest launch configuration

ASG tries to balance the number of instances across AZ by default
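
The simplified default termination policy can be sketched like this. The instance records and the `launch_config_created` field are hypothetical, and ties are broken arbitrarily here, whereas the real policy has further tie-breaking rules.

```python
from collections import Counter


def pick_instance_to_terminate(instances):
    """Pick an instance following the simplified default termination policy.

    instances: list of dicts with 'id', 'az' and 'launch_config_created'
    (an ISO date string, so older launch configurations sort first).
    """
    # Step 1: find the AZ with the most instances.
    counts = Counter(instance["az"] for instance in instances)
    busiest_az = max(counts, key=counts.get)
    # Step 2: within that AZ, choose the oldest launch configuration.
    candidates = [i for i in instances if i["az"] == busiest_az]
    return min(candidates, key=lambda i: i["launch_config_created"])["id"]


fleet = [
    {"id": "a1", "az": "az-a", "launch_config_created": "2023-01-10"},
    {"id": "a2", "az": "az-a", "launch_config_created": "2023-01-10"},
    {"id": "a3", "az": "az-a", "launch_config_created": "2023-01-10"},
    {"id": "b1", "az": "az-b", "launch_config_created": "2023-03-01"},
    {"id": "b2", "az": "az-b", "launch_config_created": "2023-01-10"},
    {"id": "b3", "az": "az-b", "launch_config_created": "2023-03-01"},
    {"id": "b4", "az": "az-b", "launch_config_created": "2023-03-01"},
]
print(pick_instance_to_terminate(fleet))  # b2
```

With 3 instances in one AZ and 4 in the other, the instance is taken from the AZ with 4, which brings the group back into balance.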

Lifecycle Hooks

By default as soon as an instance is launched in an ASG it’s in service.
You have the ability to perform extra steps before the instance goes in service (Pending state)
You have the ability to perform some actions before the instance is terminated (Terminating state)
See https://docs.aws.amazon.com/autoscaling/ec2/userguide/lifecycle-hooks.html

Launch Template vs Launch Configuration

Both:
- ID of the Amazon Machine Image (AMI), the instance type, a key pair, security groups, and the other parameters that you use to launch EC2 instances (tags, EC2 user-data…)
Launch Configuration (legacy):
- Must be re-created every time
Launch Template (newer):
- Can have multiple versions
- Create parameters subsets (partial configuration for re-use and inheritance)
- Provision using both On-Demand and Spot instances (or a mix)
- Can use T2 unlimited burst feature
- Recommended by AWS going forward

Question 1: Load Balancers provide a… A: Static DNS name we can use in our application. The reason being that AWS wants your load balancer to be accessible using a static endpoint, even if the underlying infrastructure that AWS manages changes

Question 2: You are running a website with a load balancer and 10 EC2 instances. Your users are complaining about the fact that your website always asks them to re-authenticate when they switch pages. You are puzzled, because it’s working just fine on your machine and in the dev environment with 1 server. What could be the reason? A: The Load Balancer does not have stickiness enabled

Stickiness ensures traffic is sent to the same backend instance for a client. This helps maintaining session data

Question 3: Your application is using an Application Load Balancer. It turns out your application only sees traffic coming from private IP which are in fact your load balancer’s. What should you do to find the true IP of the clients connected to your website? A: Look into the X-Forwarded-For header in the backend

This header is created by your load balancer and passed on to your backend application

Question 4: You quickly created an ELB and it turns out your users are complaining about the fact that sometimes, the servers just don’t work. You realise that indeed, your servers do crash from time to time. How to protect your users from seeing these crashes? A: Enable Health Checks

Health checks ensure your ELB won’t send traffic to unhealthy (crashed) instances

Question 5: You are designing a high performance application that will require millions of connections to be handled, as well as low latency. The best Load Balancer for this is A: Network Load Balancer. NLBs provide the highest performance if your application needs it

Question 6: Application Load Balancers handle all these protocols except A: TCP. Use an NLB (Network Load Balancer) to support TCP instead

Question 7: The Application Load Balancer can route to different target groups based on all these except… A: Geography. This was discussed in Lecture 40: Elastic Load Balancing (ELB) Overview. (X, incorrect: Source IP.)

Question 8: You are running at desired capacity of 3 and the maximum capacity of 3. You have alarms set at 60% CPU to scale out your application. Your application is now running at 80% capacity. What will happen? A: Nothing. The capacity of your ASG cannot go over the maximum capacity you have allocated during scale out events

Question 9: I have an ASG and an ALB, and I setup my ASG to get health status of instances thanks to my ALB. One instance has just been reported unhealthy. What will happen? A: The ASG will terminate the EC2 instance. Because the ASG has been configured to leverage the ALB health checks, unhealthy instances will be terminated

Question 10: Your boss wants to scale your ASG based on the number of requests per minute your application makes to your database. A: You create a CloudWatch custom metric and build an alarm on this to scale your ASG. The metric “requests per minute” is not an AWS metric, hence it needs to be a custom metric

Question 11: Scaling an instance from an r4.large to an r4.4xlarge is called A: Vertical

Question 12: Running an application on an auto scaling group that scales the number of instances in and out is called A: Horizontal Scalability

Question 13: You would like to expose a fixed static IP to your end-users for compliance purposes, so they can write firewall rules that will be stable and approved by regulators. Which Load Balancer should you use? A: Network Load Balancer. Network Load Balancers expose a public static IP, whereas an Application or Classic Load Balancer exposes a static DNS (URL)

Question 14: A web application hosted in EC2 is managed by an ASG. You are exposing this application through an Application Load Balancer. The ALB is deployed on the VPC with the following CIDR: 192.168.0.0/18. How do you configure the EC2 instance security group to ensure only the ALB can access the port 80? A: Open up the EC2 security group on port 80 to the ALB’s security group. This is the most secure way of ensuring only the ALB can access the EC2 instances. Referencing security groups in rules is an extremely powerful feature and many questions at the exam rely on it. Make sure you fully master the concepts behind it!

Question 15: Your application load balancer is hosting 3 target groups with hostnames being users.example.com, api.external.example.com and checkout.example.com. You would like to expose HTTPS traffic for each of these hostnames. How do you configure your ALB SSL certificates to make this work? A: Use SNI. SNI (Server Name Indication) is a feature allowing you to expose multiple SSL certs if the client supports it. Read more here: https://aws.amazon.com/blogs/aws/new-application-load-balancer-sni/

Question 16: An ASG spans 2 availability zones. AZ-A has 3 EC2 instances and AZ-B has 4 EC2 instances. The ASG is about to go into a scale-in event. What will happen? A: An instance in AZ-B will be terminated. Make sure you remember the Default Termination Policy for ASG. It tries to balance across AZ first, and then deletes based on the age of the launch configuration.

Question 17: The Application Load Balancers target groups can be all of these EXCEPT… A: Network Load Balancer

Question 18: You are running an application in 3 AZ, with an Auto Scaling Group and a Classic Load Balancer. It seems that the traffic is not evenly distributed amongst all the backend EC2 instances, with some AZ being overloaded. Which feature should help distribute the traffic across all the available EC2 instances? A: Cross Zone Load Balancing

Question 19: Your Application Load Balancer (ALB) currently is routing to two target groups, each of them is routed to based on hostname rules. You have been tasked with enabling HTTPS traffic for each hostname and have loaded the certificates onto the ALB. Which ALB feature will help it choose the right certificate for your clients? A: Server Name Indication (SNI)

Question 20: An application is deployed with an Application Load Balancer and an Auto Scaling Group. Currently, the scaling of the Auto Scaling Group is done manually and you would like to define a scaling policy that will ensure the average number of connections to your EC2 instances is averaging at around 1000. Which scaling policy should you use? A: Target Tracking

EBS & EFS

What’s an EBS Volume?

An EC2 machine loses its root volume (main drive) when it is manually terminated.
Unexpected terminations might happen from time to time (AWS would email you)
Sometimes, you need a way to store your instance data somewhere
An EBS (Elastic Block Store) Volume is a network drive you can attach to your instances while they run
It allows your instances to persist data

EBS Volume

It’s a network drive (i.e. not a physical drive)
- It uses the network to communicate with the instance, which means there might be a bit of latency
- It can be detached from an EC2 instance and attached to another one quickly.
It’s locked to an Availability Zone (AZ)
- An EBS Volume in us-east-1a cannot be attached to us-east-1b
- To move a volume across, you first need to snapshot it
Have a provisioned capacity (size in GBs, and IOPS)
- You get billed for all the provisioned capacity
- You can increase the capacity of the drive over time

EBS Volume Types

EBS Volumes come in 4 types
- GP2 (SSD): General purpose SSD volume that balances price and performance for a wide variety of workloads
- IO1 (SSD): Highest-performance SSD volume for mission-critical low-latency or high-throughput workloads
- ST1 (HDD): Low cost HDD volume designed for frequently accessed, throughput-intensive workloads
- SC1 (HDD): Lowest cost HDD volume designed for less frequently accessed workloads
EBS Volumes are characterized in Size | Throughput | IOPS (I/O Ops Per Sec)
When in doubt always consult the AWS documentation – it’s good!
Only GP2 and IO1 can be used as boot volumes

$ lsblk
$ sudo file -s /dev/xvdb
/dev/xvdb: data
$ sudo mkfs -t ext4 /dev/xvdb
mke2fs 1.42.9 (28-Dec-2013)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
131072 inodes, 524288 blocks
26214 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=536870912
16 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
    32768, 98304, 163840, 229376, 294912

Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done

[ec2-user@ip-172-31-15-70 ~]$ sudo mkdir /mnt/data
[ec2-user@ip-172-31-15-70 ~]$ sudo mount /dev/xvdb /mnt/data/
[ec2-user@ip-172-31-15-70 ~]$ lsblk
NAME    MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda    202:0    0   8G  0 disk
└─xvda1 202:1    0   8G  0 part /
xvdb    202:16   0   2G  0 disk /mnt/data

EBS Volume Types Use cases

GP2 (from AWS doc)

Recommended for most workloads
System boot volumes
Virtual desktops
Low-latency interactive apps
Development and test environments
1 GiB - 16 TiB
Small gp2 volumes can burst IOPS to 3000
Max IOPS is 16,000…
3 IOPS per GiB, which means the max IOPS (16,000) is reached at 5,334 GiB
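
The gp2 baseline figures above (3 IOPS per GiB, minimum 100, maximum 16,000) reduce to a one-line formula. A minimal sketch with a hypothetical function name; burst behaviour is not modelled.

```python
def gp2_baseline_iops(size_gib):
    """Baseline IOPS of a gp2 volume: 3 IOPS/GiB, floor 100, cap 16,000."""
    return min(16_000, max(100, 3 * size_gib))


for size in (10, 1_000, 5_333, 5_334, 8_192):
    print(size, "GiB ->", gp2_baseline_iops(size), "IOPS")
```

Any size from 5,334 GiB upward returns 16,000, so growing an 8 TiB gp2 volume further does not add IOPS.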

EBS Volume Types Use cases

IO1 (from AWS doc)

Critical business applications that require sustained IOPS performance, or more than 16,000 IOPS per volume (gp2 limit)
Large database workloads, such as:
MongoDB, Cassandra, Microsoft SQL Server, MySQL, PostgreSQL, Oracle
4 GiB - 16 TiB
IOPS is provisioned (PIOPS) – MIN 100 - MAX 64,000 (Nitro instances) else MAX 32,000 (other instances)
The maximum ratio of provisioned IOPS to requested volume size (in GiB) is 50:1
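
The io1 limits listed above can be checked together before requesting a volume. This is only a sketch of the numbers in these notes (4 GiB to 16 TiB, 100 to 64,000 or 32,000 IOPS, 50:1 ratio); the function name and error messages are hypothetical.

```python
def validate_io1(size_gib, iops, nitro=True):
    """Return a list of violated io1 constraints (empty list means valid)."""
    errors = []
    if not 4 <= size_gib <= 16 * 1024:
        errors.append("size must be between 4 GiB and 16 TiB")
    max_iops = 64_000 if nitro else 32_000
    if not 100 <= iops <= max_iops:
        errors.append(f"IOPS must be between 100 and {max_iops}")
    if iops > 50 * size_gib:
        errors.append("IOPS to size ratio exceeds 50:1")
    return errors


print(validate_io1(500, 4_000))                 # []
print(validate_io1(100, 6_000))                 # ratio exceeded (max 5,000 for 100 GiB)
print(validate_io1(2_000, 40_000, nitro=False)) # above the 32,000 non-Nitro limit
```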

EBS Volume Types Use cases

ST1 (from AWS doc)

Streaming workloads requiring consistent, fast throughput at a low price.
Big data, Data warehouses, Log processing
Apache Kafka
Cannot be a boot volume
500 GiB - 16 TiB
Max IOPS is 500
Max throughput of 500 MiB/s – can burst

EBS Volume Types Use cases

SC1 (from AWS doc)

Throughput-oriented storage for large volumes of data that is infrequently accessed
Scenarios where the lowest storage cost is important
Cannot be a boot volume
500 GiB - 16 TiB
Max IOPS is 250
Max throughput of 250 MiB/s – can burst

EBS – Volume Types Summary

gp2: General Purpose Volumes (cheap)
- 3 IOPS / GiB, minimum 100 IOPS, burst to 3000 IOPS, max 16000 IOPS
- 1 GiB – 16 TiB, +1 TB = +3000 IOPS
io1: Provisioned IOPS (expensive)
- Min 100 IOPS, Max 64000 IOPS (Nitro) or 32000 (other)
- 4 GiB – 16 TiB. Size of volume and IOPS are independent
st1: Throughput Optimized HDD
- 500 GiB – 16 TiB, 500 MiB/s throughput
sc1: Cold HDD, Infrequently accessed data
- 500 GiB – 16 TiB, 250 MiB/s throughput

EBS Snapshots

Incremental – only backup changed blocks [*]
EBS backups use IO and you shouldn’t run them while your application is handling a lot of traffic
Snapshots will be stored in S3 (but you won’t directly see them)
Not necessary to detach volume to do snapshot, but recommended
Max 100,000 snapshots
Can copy snapshots across AZ or Region
Can make Image (AMI) from Snapshot
EBS volumes restored by snapshots need to be pre-warmed (using fio or dd command to read the entire volume)
Snapshots can be automated using Amazon Data Lifecycle Manager

Tip: Use “Create Snapshot Lifecycle Policy” for backups

EBS Migration

EBS Volumes are only locked to a specific AZ
To migrate it to a different AZ (or region):
- Snapshot the volume
- (optional) Copy the snapshot to a different region
- Create a volume from the snapshot in the AZ of your choice

EBS Encryption

When you create an encrypted EBS volume, you get the following:
- Data at rest is encrypted inside the volume
- All the data in flight moving between the instance and the volume is encrypted
- All snapshots are encrypted
- All volumes created from the snapshot are encrypted
Encryption and decryption are handled transparently (you have nothing to do)
Encryption has a minimal impact on latency
EBS Encryption leverages keys from KMS (AES-256)
Copying an unencrypted snapshot allows encryption
Snapshots of encrypted volumes are encrypted

Encryption: encrypt an unencrypted EBS volume

Create an EBS snapshot of the volume
Encrypt the EBS snapshot (using copy)
Create a new EBS volume from the snapshot (the volume will also be encrypted)
Now you can attach the encrypted volume to the original instance
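
The steps above can be sketched in Python. The function below is hypothetical; it expects an EC2 client such as the one returned by `boto3.client("ec2")` and uses its `create_snapshot`, `copy_snapshot`, `create_volume` and waiter calls. Error handling, tagging and cleanup of the intermediate snapshots are left out.

```python
def encrypt_volume(ec2, volume_id, availability_zone, region, kms_key_id=None):
    """Return the ID of a new encrypted volume built from an unencrypted one.

    ec2: an EC2 client object (for example boto3.client("ec2")).
    """
    # 1. Snapshot the unencrypted volume and wait for completion.
    snapshot_id = ec2.create_snapshot(VolumeId=volume_id)["SnapshotId"]
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    # 2. Copy the snapshot with encryption turned on.
    copy_args = {
        "SourceSnapshotId": snapshot_id,
        "SourceRegion": region,
        "Encrypted": True,
    }
    if kms_key_id:
        copy_args["KmsKeyId"] = kms_key_id  # otherwise the default KMS key is used
    encrypted_snapshot_id = ec2.copy_snapshot(**copy_args)["SnapshotId"]
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[encrypted_snapshot_id])

    # 3. Create a volume from the encrypted snapshot (it is encrypted too).
    new_volume_id = ec2.create_volume(
        SnapshotId=encrypted_snapshot_id,
        AvailabilityZone=availability_zone,
    )["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[new_volume_id])
    return new_volume_id
```

The returned volume must be in the same AZ as the instance; attaching it (step 4) is left to the caller.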

EBS vs Instance Store

Some instances do not come with Root EBS volumes
Instead, they come with “Instance Store” (= ephemeral storage)
Instance store is physically attached to the machine (EBS is a network drive)
Pros:
- Better I/O performance (EBS gp2 has a max IOPS of 16000, io1 of 64000)
- Good for buffer / cache / scratch data / temporary content
- Data survives reboots
Cons:
- On stop or termination, the instance store is lost
- You can’t resize the instance store
- Backups must be operated by the user

Local EC2 Instance Store

Physical disk attached to the physical server where your EC2 is
Very High IOPS (because physical)
Disks up to 7.5 TiB (can change over time), striped to reach 30 TiB (can change over time…)
Block Storage (just like EBS)
Cannot be increased in size
Risk of data loss if hardware fails

EBS RAID Options

EBS is already redundant storage (replicated within an AZ)
But what if you want to increase IOPS to say 100 000 IOPS?
What if you want to mirror your EBS volumes?
You would mount volumes in parallel in RAID settings!
RAID is possible as long as your OS supports it
Some RAID options are:
- RAID 0
- RAID 1
- RAID 5 (not recommended for EBS – see documentation)
- RAID 6 (not recommended for EBS – see documentation)
We’ll explore RAID 0 and RAID 1

RAID 0 (increase performance) [*]

Combining 2 or more volumes and getting the total disk space and I/O
But if one disk fails, all the data is lost
Use cases would be:
- An application that needs a lot of IOPS and doesn’t need fault-tolerance
- A database that has replication already built-in
Using this, we can have a very big disk with a lot of IOPS
For example
- two 500 GiB Amazon EBS io1 volumes with 4,000 provisioned IOPS each will create a…
- 1000 GiB RAID 0 array with an available bandwidth of 8,000 IOPS and 1,000 MB/s of throughput

RAID 1 (increase fault tolerance)

RAID 1 = Mirroring a volume to another
If one disk fails, our logical volume is still working
We have to send the data to two EBS volumes at the same time (2x network)
Use case:
- Applications that need increased volume fault tolerance
- Application where you need to service disks
For example:
- two 500 GiB Amazon EBS io1 volumes with 4,000 provisioned IOPS each will create a…
- 500 GiB RAID 1 array with an available bandwidth of 4,000 IOPS and 500 MB/s of throughput
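
The RAID 0 and RAID 1 examples in these notes follow a simple rule: striping adds sizes and IOPS together, mirroring keeps those of a single volume. A small sketch with hypothetical function names (throughput and OS-level overhead are ignored):

```python
def raid0(volumes):
    """Striping: usable size and IOPS are the sums over all volumes.

    volumes: list of (size_gib, iops) tuples.
    """
    return sum(size for size, _ in volumes), sum(iops for _, iops in volumes)


def raid1(volumes):
    """Mirroring: usable size and IOPS are limited by the smallest volume."""
    return min(size for size, _ in volumes), min(iops for _, iops in volumes)


pair = [(500, 4_000), (500, 4_000)]  # two 500 GiB io1 volumes, 4,000 IOPS each
print(raid0(pair))  # (1000, 8000)
print(raid1(pair))  # (500, 4000)
```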

EFS – Elastic File System

Managed NFS (network file system) that can be mounted on many EC2
EFS works with EC2 instances in multi-AZ
Highly available, scalable, expensive (3x gp2), pay per use

EFS – Elastic File System

Use cases: content management, web serving, data sharing, Wordpress
Uses NFSv4.1 protocol
Uses security group to control access to EFS
Compatible with Linux based AMI (not Windows)
Encryption at rest using KMS
POSIX file system (~Linux) that has a standard file API
File system scales automatically, pay-per-use, no capacity planning!

EFS – Performance & Storage Classes [*]

EFS Scale
- 1000s of concurrent NFS clients, 10 GB+ /s throughput
- Grow to Petabyte-scale network file system, automatically
Performance mode (set at EFS creation time)
- General purpose (default): latency-sensitive use cases (web server, CMS, etc…)
- Max I/O – higher latency, higher throughput, highly parallel (big data, media processing)
Storage Tiers (lifecycle management feature – move file after N days)
- Standard: for frequently accessed files
- Infrequent access (EFS-IA): cost to retrieve files, lower price to store

To set up your EC2 instance:

1. Using the Amazon EC2 console, associate your EC2 instance with a VPC security group that enables access to your mount target. For example, if you assigned the “default” security group to your mount target, you should assign the “default” security group to your EC2 instance.
2. Open an SSH client and connect to your EC2 instance.
3. If you’re using an Amazon Linux EC2 instance, install the EFS mount helper with the following command (you can still use the EFS mount helper if you’re not using an Amazon Linux instance):
sudo yum install -y amazon-efs-utils
4. If you’re not using the EFS mount helper, install the NFS client on your EC2 instance:
- On a Red Hat Enterprise Linux or SUSE Linux instance, use this command: sudo yum install -y nfs-utils
- On an Ubuntu instance, use this command: sudo apt-get install nfs-common

Mounting your file system

1. Open an SSH client and connect to your EC2 instance.
2. Create a new directory on your EC2 instance, such as “efs”:
sudo mkdir efs
3. Mount your file system with one of the methods listed below. If you need encryption of data in transit, use the EFS mount helper and the TLS mount option.
- Using the EFS mount helper:
sudo mount -t efs fs-776c8b4f:/ efs
- Using the EFS mount helper and the TLS mount option:
sudo mount -t efs -o tls fs-776c8b4f:/ efs
- Using the NFS client:
sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-776c8b4f.efs.ap-southeast-2.amazonaws.com:/ efs

EBS vs EFS – Elastic Block Storage

EBS volumes…
- can be attached to only one instance at a time
- are locked at the Availability Zone (AZ) level
- gp2: IO increases if the disk size increases
- io1: can increase IO independently
To migrate an EBS volume across AZ
- Take a snapshot
- Restore the snapshot to another AZ
- EBS backups use IO and you shouldn’t run them while your application is handling a lot of traffic
Root EBS Volumes of instances get terminated by default if the EC2 instance gets terminated.

EBS vs EFS – Elastic File System

Mounting 100s of instances across AZ
EFS share website files (WordPress)
Only for Linux Instances (POSIX)
EFS has a higher price point than EBS
Can leverage EFS-IA for cost savings
Remember: EFS vs EBS vs Instance Store

EC2 Data Management - EBS & EFS Quiz

Question 1: Your instance in us-east-1a just got terminated, and the attached EBS volume is now available. Your colleague tells you he can’t seem to attach it to your instance in us-east-1b. A: EBS volumes are AZ locked. EBS Volumes are created for a specific AZ. It is possible to migrate them between different AZ through backup and restore

Question 2: You have provisioned an 8TB gp2 EBS volume and you are running out of IOPS. What is NOT a way to increase performance? A: Increase the EBS volume size. gp2 IOPS peaks at 16,000 IOPS, which is reached at 5,334 GB, so an 8TB volume is already at the maximum.

Question 3: You would like to leverage EBS volumes in parallel to linearly increase performance, while accepting greater failure risks. Which RAID mode helps you in achieving that? A: RAID 0. See https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/raid-config.html

Question 4: Although EBS is already a replicated solution, your company SysOps advised you to use a RAID mode that will mirror data and will allow your instance to not be affected if an EBS volume entirely fails. Which RAID mode did he recommend to you? A: RAID 1. See https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/raid-config.html

Question 5: You would like to have the same data being accessible as an NFS drive cross AZ on all your EC2 instances. What do you recommend? A: EFS. EFS is a network file system (NFS) and allows you to mount the same file system on EC2 instances that are in different AZ

Question 6: You would like to have a high-performance cache for your application that mustn’t be shared. You don’t mind losing the cache upon termination of your instance. Which storage mechanism do you recommend as a Solution Architect? A: Instance Store. Instance Store provides the best disk performance

Question 7: You are running a high-performance database that requires an IOPS of 210,000 for its underlying filesystem. What do you recommend? A: Use EC2 Instance Store. Is running a DB on EC2 instance store possible? It is possible to run a database on EC2. It is also possible to use instance store, but there are some considerations to have. The data will be lost if the instance is stopped, but it can be restarted without problems. One can also set up a replication mechanism on another EC2 instance with instance store to have a standby copy. One can also have back-up mechanisms. It’s all up to how you want to set up your architecture to validate your requirements. In this case, it’s around IOPS, and we build an architecture of replication and back up around it.

RDS, Aurora & ElastiCache

AWS RDS Overview

RDS stands for Relational Database Service
It’s a managed DB service for databases that use SQL as a query language.
It allows you to create databases in the cloud that are managed by AWS
- Postgres
- MySQL
- MariaDB
- Oracle
- Microsoft SQL Server
- Aurora (AWS Proprietary database)

Advantages of using RDS versus deploying a DB on EC2

RDS is a managed service:
- Automated provisioning, OS patching
- Continuous backups and restore to specific timestamp (Point in Time Restore)!
- Monitoring dashboards
- Read replicas for improved read performance
- Multi AZ setup for DR (Disaster Recovery)
- Maintenance windows for upgrades
- Scaling capability (vertical and horizontal)
- Storage backed by EBS (gp2 or io1)
BUT you can’t SSH into your instances

RDS Backups

Backups are automatically enabled in RDS
Automated backups:
- Daily full backup of the database (during the backup window)
- Transaction logs are backed-up by RDS every 5 minutes
- => ability to restore to any point in time (from oldest backup to 5 minutes ago)
- 7 days retention (can be increased to 35 days)
DB Snapshots:
- Manually triggered by the user
- Retention of backup for as long as you want

RDS Read Replicas for read scalability [*]

Up to 5 Read Replicas
Within AZ, Cross AZ or Cross Region
Replication is ASYNC, so reads are eventually consistent
Replicas can be promoted to their own DB
Applications must update the connection string to leverage read replicas

RDS Read Replicas – Use Cases [*]

You have a production database that is taking on normal load
You want to run a reporting application to run some analytics
You create a Read Replica to run the new workload there
The production application is unaffected
Read replicas are used for SELECT (=read) only kind of statements (not INSERT, UPDATE, DELETE)

RDS Read Replicas – Network Cost

In AWS there’s a network cost when data goes from one AZ to another.
To reduce the cost, you can have your Read Replicas in the same AZ

RDS Multi AZ (Disaster Recovery)

SYNC replication
One DNS name – automatic app failover to standby
Increase availability
Failover in case of loss of AZ, loss of network, instance or storage failure
No manual intervention in apps
Not used for scaling
Note: Read Replicas can be set up as Multi AZ for Disaster Recovery (DR) [*]

RDS Security - Encryption

At rest encryption
- Possibility to encrypt the master & read replicas with AWS KMS - AES-256 encryption
- Encryption has to be defined at launch time
- If the master is not encrypted, the read replicas cannot be encrypted [*]
- Transparent Data Encryption (TDE) available for Oracle and SQL Server
In-flight encryption
- SSL certificates to encrypt data to RDS in flight
- Provide SSL options with trust certificate when connecting to database
- To enforce SSL:
  - PostgreSQL: rds.force_ssl=1 in the AWS RDS Console (Parameter Groups)
  - MySQL: Within the DB: GRANT USAGE ON *.* TO 'mysqluser'@'%' REQUIRE SSL;

RDS Encryption Operations

Encrypting RDS backups
- Snapshots of un-encrypted RDS databases are un-encrypted
- Snapshots of encrypted RDS databases are encrypted
- Can copy a snapshot into an encrypted one
To encrypt an un-encrypted RDS database:
- Create a snapshot of the un-encrypted database
- Copy the snapshot and enable encryption for the snapshot
- Restore the database from the encrypted snapshot
- Migrate applications to the new database, and delete the old database

RDS Security – Network & IAM

Network Security
- RDS databases are usually deployed within a private subnet, not in a public one
- RDS security works by leveraging security groups (the same concept as for EC2 instances) – it controls which IP / security group can communicate with RDS
Access Management
- IAM policies help control who can manage AWS RDS (through the RDS API)
- Traditional Username and Password can be used to login into the database
- IAM-based authentication can be used to login into RDS MySQL & PostgreSQL

RDS - IAM Authentication

IAM database authentication works with MySQL and PostgreSQL
You don’t need a password, just an authentication token obtained through IAM & RDS API calls
Auth token has a lifetime of 15 minutes
Benefits:
- Network in/out must be encrypted using SSL
- IAM to centrally manage users instead of DB
- Can leverage IAM Roles and EC2 Instance profiles for easy integration

RDS Security – Summary

Encryption at rest:
- Is done only when you first create the DB instance
- or: unencrypted DB => snapshot => copy snapshot as encrypted => create DB from snapshot
Your responsibility:
- Check the ports / IP / security group inbound rules in DB’s SG
- In-database user creation and permissions or manage through IAM
- Creating a database with or without public access
- Ensure parameter groups or DB is configured to only allow SSL connections
AWS responsibility:
- No SSH access
- No manual DB patching
- No manual OS patching
- No way to audit the underlying instance

Amazon Aurora [*] (many exam questions)

Aurora is a proprietary technology from AWS (not open sourced)
Postgres and MySQL are both supported as Aurora DB (that means your drivers will work as if Aurora was a Postgres or MySQL database)
Aurora is “AWS cloud optimized” and claims 5x performance improvement over MySQL on RDS, over 3x the performance of Postgres on RDS
Aurora storage automatically grows in increments of 10GB, up to 64 TB.
Aurora can have 15 replicas while MySQL has 5, and the replication process is faster (sub 10 ms replica lag)
Failover in Aurora is instantaneous. It’s HA (High Availability) native.
Aurora costs more than RDS (20% more) – but is more efficient

Aurora High Availability and Read Scaling

6 copies of your data across 3 AZ:
- 4 copies out of 6 needed for writes
- 3 copies out of 6 needed for reads
- Self healing with peer-to-peer replication
- Storage is striped across 100s of volumes
One Aurora Instance takes writes (master)
Automated failover for master in less than 30 seconds
Master + up to 15 Aurora Read Replicas serve reads
Support for Cross Region Replication
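The quorum numbers above can be checked with a small sketch. This is only an illustration, not how Aurora is implemented: the function names are hypothetical, and the layout of 2 copies in each of the 3 AZs is an assumption that these notes do not state explicitly.

```python
TOTAL_COPIES = 6   # copies of the data, spread across 3 AZs
WRITE_QUORUM = 4   # copies needed for writes
READ_QUORUM = 3    # copies needed for reads


def can_write(healthy_copies: int) -> bool:
    """Writes succeed while at least 4 of the 6 copies are reachable."""
    return healthy_copies >= WRITE_QUORUM


def can_read(healthy_copies: int) -> bool:
    """Reads succeed while at least 3 of the 6 copies are reachable."""
    return healthy_copies >= READ_QUORUM


def copies_after_az_loss(lost_azs: int, copies_per_az: int = 2) -> int:
    """Copies left after losing whole AZs (assumes 2 copies per AZ)."""
    return max(TOTAL_COPIES - lost_azs * copies_per_az, 0)
```

Under these assumptions, losing one full AZ leaves 4 copies, so both reads and writes still work. Losing one more copy leaves 3, which is enough for reads only.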

Aurora DB Cluster

Diagram: a shared storage volume (replication + self healing + auto expanding from 10 GB to 64 TB) and a reader endpoint that load balances connections

Features of Aurora

Automatic fail-over
Backup and Recovery
Isolation and security
Industry compliance
Push-button scaling
Automated Patching with Zero Downtime
Advanced Monitoring
Routine Maintenance
Backtrack: restore data at any point of time without using backups

Aurora Security

Similar to RDS because it uses the same engines
Encryption at rest using KMS
Automated backups, snapshots and replicas are also encrypted
Encryption in flight using SSL (same process as MySQL or Postgres)
You are responsible for protecting the instance with security groups
You can’t SSH

Aurora Serverless

Automated database instantiation and auto-scaling based on actual usage
Good for infrequent, intermittent or unpredictable workloads
No capacity planning needed
Pay per second, can be more cost-effective

Global Aurora

Aurora Cross Region Read Replicas:
- Useful for disaster recovery
- Simple to put in place
Aurora Global Database (recommended):
- 1 Primary Region (read / write)
- Up to 5 secondary (read-only) regions, replication lag is less than 1 second
- Up to 16 Read Replicas per secondary region
- Helps for decreasing latency
- Promoting another region (for disaster recovery) has an RTO of < 1 minute

dev/test db.t2.small

Amazon ElastiCache Overview

The same way RDS is to get managed Relational Databases…
ElastiCache is to get managed Redis or Memcached
Caches are in-memory databases with really high performance, low latency
Helps reduce load off of databases for read intensive workloads
Helps make your application stateless
Write Scaling using sharding
Read Scaling using Read Replicas
Multi AZ with Failover Replicas.
AWS takes care of OS maintenance / patching, optimizations, setup, configuration, monitoring, failure recovery and backups
Using ElastiCache involves heavy application code changes

ElastiCache

Solution Architecture - DB Cache

Applications query ElastiCache first; on a cache miss, they get the data from RDS and store it in ElastiCache.
Helps relieve load on RDS
Cache must have an invalidation strategy to make sure only the most current data is used
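The DB cache flow above can be sketched in plain Python. This is a minimal sketch, not ElastiCache client code: a dict stands in for the cache, `load_from_db` is a hypothetical callback standing in for the RDS query, and the TTL-based expiry is one assumed invalidation strategy among several.

```python
import time


class LazyLoadingCache:
    """Cache-aside: read the cache first, fall back to the database on a miss."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db      # hypothetical stand-in for an RDS query
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}               # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > self._clock():
            self.hits += 1             # cache hit: the database is not touched
            return entry[0]
        self.misses += 1               # cache miss: read the DB, then store
        value = self._load(key)
        self._store[key] = (value, self._clock() + self._ttl)
        return value

    def invalidate(self, key):
        """Drop a key so the next read fetches fresh data from the database."""
        self._store.pop(key, None)
```

Until an entry expires or is invalidated, reads return the cached value even if the database row has changed, which is why an invalidation strategy is needed.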

Solution Architecture – User Session Store

User logs into any of the application instances
The application writes the session data into ElastiCache
The user hits another instance of our application
The instance retrieves the data and the user is already logged in

ElastiCache – Redis vs Memcached

REDIS

Multi AZ with Auto-Failover
Read Replicas to scale reads and have high availability
Data Durability using AOF persistence
Backup and restore features

MEMCACHED

Multi-node for partitioning of data (sharding)
Non persistent
No backup and restore
Multi-threaded architecture

Replication

ElastiCache – Cache Security

All caches in ElastiCache:
- Support SSL in flight encryption
- Do not support IAM authentication
- IAM policies on ElastiCache are only used for AWS API-level security
Redis AUTH
- You can set a “password/token” when you create a Redis cluster
- This is an extra level of security for your cache (on top of security groups)
Memcached
- Supports SASL-based authentication (advanced)

ElastiCache for Solutions Architects

Patterns for ElastiCache
- Lazy Loading: all the read data is cached, data can become stale in cache
- Write Through: Adds or update data in the cache when written to a DB (no stale data)
- Session Store: store temporary session data in a cache (using TTL features)
Quote: There are only two hard things in Computer Science: cache invalidation and naming things
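The Write Through pattern can be sketched as follows. This is a simplified illustration with two dicts standing in for the database and the cache; the class and attribute names are hypothetical.

```python
class WriteThroughStore:
    """Write Through: every write updates the database and the cache together."""

    def __init__(self):
        self.db = {}      # stand-in for the database
        self.cache = {}   # stand-in for the cache

    def write(self, key, value):
        self.db[key] = value
        self.cache[key] = value   # cache is refreshed on every write

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.db[key]      # raises KeyError if the key was never written
        self.cache[key] = value
        return value
```

Because the cache is updated on each write, a read after an update returns the new value, at the cost of a slower write path.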

RDS / Aurora / ElastiCache Quiz

Quiz 5

Question 1: My company would like to have a MySQL database internally that is going to be available even in case of a disaster in the AWS Cloud. I should setup A: Multi AZ

In this question, we consider a disaster to be an entire Availability Zone going down. In which case Multi-AZ will help. If we want to plan against an entire region going down, backups and replication across regions would help.

Question 2: Our RDS database struggles to keep up with the demand of the users from our website. Our million users mostly read news, and we don’t post news very often. Which solution is NOT adapted to this problem? A: Multi AZ

Be very careful with the way you read questions at the exam. Here, the question is asking which solution is NOT adapted to this problem. ElastiCache and RDS Read Replicas do indeed help with scaling reads.

Question 3: We have setup read replicas on our RDS database, but our users are complaining that upon updating their social media posts, they do not see the update right away A: Read Replicas have asynchronous replication and therefore it’s likely our users will only observe eventual consistency

Question 4: Which RDS Classic (not Aurora) feature does not require us to change our SQL connection string? A: Multi AZ

Multi AZ keeps the same connection string regardless of which database is up. Read Replicas imply we need to reference them individually in our application as each read replica will have its own DNS name

Question 5: You want to ensure your Redis cluster will always be available A: Enable Multi AZ

Multi AZ ensures high availability

Question 6: Your application functions on an ASG behind an ALB. Users have to constantly log back in and you’d rather not enable stickiness on your ALB as you fear it will overload some servers. What should you do? A: Store session data in ElastiCache.

Storing Session Data in ElastiCache is a common pattern to ensuring different instances can retrieve your user’s state if needed.

Question 7: One analytics application is currently performing its queries against your main production database. These queries slow down the database which impacts the main user experience. What should you do to improve the situation? A: Set up a Read Replica

Read Replicas will help as our analytics application can now perform queries against it, and these queries won’t impact the main production database.

Question 8: You have a requirement to use TDE (Transparent Data Encryption) on top of KMS. Which database technology does NOT support TDE on RDS? A: PostgreSQL

Question 9: Which RDS database technology does NOT support IAM authentication? A: Oracle

Question 10: You would like to ensure you have a database available in another region if a disaster happens to your main region. Which database do you recommend? A: Aurora Global Database

Global Databases allow you to have cross region replication

Question 11: How can you enhance the security of your Redis cache to force users to enter a password? A: Use Redis AUTH

Question 12: [SAA-C02] Your company has a production Node.js application that is using RDS MySQL 5.6 as its data backend. A new application programmed in Java will perform some heavy analytics workload to create a dashboard, on a regular hourly basis. You want the final solution to minimize costs and have minimal disruption on the production application, what should you do? A: Create a Read Replica in the same AZ and run the analytics workload on the replica database

this will minimize cost because the data won’t have to move across AZ

Question 13: [SAA-C02] You would like to create a disaster recovery strategy for your RDS PostgreSQL database so that in case of a regional outage, a database can be quickly made available for Read and Write workload in another region. The DR database must be highly available. What do you recommend? A: Create a Read Replica in a different region and enable Multi-AZ on the Read Replica.

Question 14: You are managing a PostgreSQL database and for security reasons, you would like to ensure users are authenticated using short-lived credentials. What do you suggest doing? A: Use PostgreSQL for RDS and authenticate using a token obtained through the RDS service.

In this case, IAM is leveraged to obtain the RDS service token, so this is the IAM authentication use case.

[SAA-C02] An application is running in production, using an Aurora database as its backend. Your development team would like to run a version of the application in a scaled-down application, but still, be able to perform some heavy workload on a need-basis. Most of the time, the application will be unused. Your CIO has tasked you with helping the team while minimizing costs. What do you suggest? A:

[SAA-C02] List of Ports to be familiar with

Here’s a list of standard ports you should see at least once. You don’t need to memorize them (the exam will not test you on that), but you should be able to differentiate between an Important port (HTTPS - port 443) and a database port (PostgreSQL - port 5432)

Important ports:

- FTP: 21
- SSH: 22
- SFTP: 22 (same as SSH)
- HTTP: 80
- HTTPS: 443

vs RDS Databases ports:

- PostgreSQL: 5432
- MySQL: 3306
- Oracle RDS: 1521
- MSSQL Server: 1433
- MariaDB: 3306 (same as MySQL)
- Aurora: 5432 (if PostgreSQL compatible) or 3306 (if MySQL compatible)

Don’t stress out on remembering those, just read that list once today and once before going into the exam and you should be all set :)

Remember, you should just be able to differentiate an “Important Port” vs an “RDS database Port”.

Route 53 Section

TTL
CNAME vs Alias
Health Checks
Routing Policies
- Simple
- Weighted
- Latency
- Failover
- Geolocation
- Multi Value
3rd party domains integration

AWS Route 53 Overview

Route53 is a Managed DNS (Domain Name System)
DNS is a collection of rules and records which helps clients understand how to reach a server through URLs.
In AWS, the most common records are:
- A: hostname to IPv4
- AAAA: hostname to IPv6
- CNAME: hostname to hostname
- Alias: hostname to AWS resource.

AWS Route 53 Overview

Route53 can use:
public domain names you own (or buy) application1.mypublicdomain.com
private domain names that can be resolved by your instances in your VPCs. application1.company.internal
Route53 has advanced features such as:
Load balancing (through DNS – also called client load balancing)
Health checks (although limited…)
Routing policy: simple, failover, geolocation, latency, weighted, multi value
You pay $0.50 per month per hosted zone

DNS Records TTL (Time to Live)

High TTL: (e.g. 24hr)
- Less traffic on DNS
- Possibly outdated records
Low TTL: (e.g. 60 s)
- More traffic on DNS
- Records are outdated for less time
- Easy to change records
TTL is mandatory for each DNS record
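The TTL trade-off above can be illustrated with a toy caching resolver. This is a sketch under stated assumptions: `CachingResolver` and its `authoritative` dict are hypothetical stand-ins for a client-side DNS cache and a Route 53 hosted zone, and the IP addresses come from documentation ranges.

```python
class CachingResolver:
    """Client-side DNS cache that honors each record's TTL."""

    def __init__(self, authoritative, clock):
        self._authoritative = authoritative   # name -> (value, ttl_seconds)
        self._clock = clock                   # callable returning the time in seconds
        self._cache = {}                      # name -> (value, expires_at)
        self.upstream_queries = 0

    def resolve(self, name):
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and cached[1] > now:
            return cached[0]                  # still within TTL: no DNS traffic
        value, ttl = self._authoritative[name]
        self.upstream_queries += 1            # TTL expired: query the DNS again
        self._cache[name] = (value, now + ttl)
        return value
```

With a TTL of 300 seconds, a client that resolved the name just before a record change keeps using the old value for up to 300 seconds. A lower TTL shortens that window but increases `upstream_queries`.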

CNAME vs Alias [*]

AWS Resources (Load Balancer, CloudFront…) expose an AWS hostname: lb1-1234.us-east-2.elb.amazonaws.com and you want myapp.mydomain.com
CNAME:
- Points a hostname to any other hostname. (app.mydomain.com => blabla.anything.com)
- ONLY FOR NON ROOT DOMAIN (aka. something.mydomain.com)
Alias:
- Points a hostname to an AWS Resource (app.mydomain.com => blabla.amazonaws.com)
- Works for ROOT DOMAIN and NON ROOT DOMAIN (aka mydomain.com)
- Free of charge
- Native health check
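The CNAME vs Alias rules above can be expressed as a small check. This is a simplified sketch: the function name is hypothetical, and treating any target ending in `.amazonaws.com` as an AWS resource is an assumption made for illustration (real Alias targets are not limited to that suffix).

```python
def allowed_record_types(record_name, zone_name, target):
    """Return which of CNAME / Alias could point record_name at target."""
    is_root = record_name.rstrip(".").lower() == zone_name.rstrip(".").lower()
    # Assumption for this sketch: AWS resources expose *.amazonaws.com hostnames.
    targets_aws = target.rstrip(".").lower().endswith(".amazonaws.com")

    types = []
    if not is_root:
        types.append("CNAME")    # CNAME only works for non root domains
    if targets_aws:
        types.append("Alias")    # Alias works for root and non root domains
    return types
```

For the root domain `mydomain.com`, only an Alias can point at the load balancer hostname, while `app.mydomain.com` can use either record type.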

Simple Routing Policy

Maps a hostname to another hostname
Use when you need to redirect to a single resource
You can’t attach health checks to simple routing policy
If multiple values are returned, a random one is chosen by the client

Weighted Routing Policy

Control the % of the requests that go to specific endpoint
Helpful to test 1% of traffic on new app version for example
Helpful to split traffic between two regions
Can be associated with Health Checks
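How weights translate into a share of requests can be sketched as below. The helper names and the endpoint labels are hypothetical; Route 53 does the selection on the DNS side, so this only mimics the arithmetic (each record's share is its weight divided by the sum of all weights).

```python
import random


def traffic_share(weights):
    """Fraction of requests each endpoint receives: weight / sum of weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {name: weight / total for name, weight in weights.items()}


def pick_endpoint(weights, rng=random):
    """Pick one endpoint at random, proportionally to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
```

For example, weights of 99 and 1 send roughly 1% of requests to the second endpoint, which matches the "test 1% of traffic on a new app version" use case.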

Latency Routing Policy

Redirect to the server that has the least latency close to us
Super helpful when latency of users is a priority
Latency is evaluated in terms of user to designated AWS Region
Germany may be directed to the US (if that’s the lowest latency)

Health Checks

After X health checks failed => unhealthy (default 3)
After X health checks passed => healthy (default 3)
Default Health Check Interval: 30s (can set to 10s – higher cost)
About 15 health checkers will check the endpoint health
=> one request every 2 seconds on average
Can have HTTP, TCP and HTTPS health checks (no SSL verification)
Possibility of integrating the health check with CloudWatch
Health checks can be linked to Route53 DNS queries!
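The threshold behavior above can be modeled as a small state machine. This is a simplified sketch with hypothetical names; it assumes the X checks must be consecutive and ignores the fact that several health checkers probe the endpoint in parallel.

```python
class HealthState:
    """Flip between healthy and unhealthy after X consecutive opposite results."""

    def __init__(self, failure_threshold=3, success_threshold=3, healthy=True):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.healthy = healthy
        self._streak = 0   # consecutive results that disagree with the current state

    def record(self, passed):
        """Record one health check result and return the resulting state."""
        if passed == self.healthy:
            self._streak = 0              # result agrees with the state: reset
        else:
            self._streak += 1
            threshold = self.success_threshold if passed else self.failure_threshold
            if self._streak >= threshold:
                self.healthy = passed     # threshold reached: flip the state
                self._streak = 0
        return self.healthy
```

With the defaults, a single failed check does not mark the endpoint unhealthy; three failures in a row do.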

Geo Location Routing Policy

Different from Latency based!
This is routing based on user location
Here we specify: traffic from the UK should go to this specific IP
Should create a “default” policy (in case there’s no match on location)

Multi Value Routing Policy

Use when routing traffic to multiple resources
Use when you want to associate Route 53 health checks with records
Up to 8 healthy records are returned for each Multi Value query
Multi Value is not a substitute for having an ELB
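The Multi Value behavior can be sketched as a filter over the records. The function name and the record layout (IP mapped to a health flag) are hypothetical, and the IPs come from documentation ranges; only the "up to 8 healthy records" rule is taken from the notes above.

```python
import random


def multi_value_answer(records, max_records=8, rng=random):
    """Return up to max_records healthy values for one Multi Value query."""
    healthy = [ip for ip, is_healthy in records.items() if is_healthy]
    if len(healthy) <= max_records:
        return healthy
    return rng.sample(healthy, max_records)   # more than 8 healthy: pick 8 at random
```

The client then picks one of the returned values itself, which is why this is not a substitute for an ELB.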

Route53 as a Registrar

A domain name registrar is an organization that manages the reservation of Internet domain names
Famous names:
GoDaddy
Google Domains
Etc…
And also… Route53 (i.e. AWS)!
Domain Registrar != DNS

3rd Party Registrar with AWS Route 53

If you buy your domain on 3rd party website, you can still use Route53.
1. Create a Hosted Zone in Route 53
2. Update NS Records on 3rd party website to use Route 53 name servers
Domain Registrar != DNS
(But each domain registrar usually comes with some DNS features)

Question 1: You have purchased “mycoolcompany.com” on the AWS registrar and would like for it to point to lb1-1234.us-east-2.elb.amazonaws.com . What sort of Route 53 record is NOT POSSIBLE to set up for this? A: CNAME

The DNS protocol does not allow you to create a CNAME record for the top node of a DNS namespace (mycoolcompany.com), also known as the zone apex

Question 2: You have deployed a new Elastic Beanstalk environment and would like to direct 5% of your production traffic to this new environment, in order to monitor CloudWatch metrics and ensure no bugs exist. What type of Route 53 record allows you to do so? A: Weighted

Weighted allows you to redirect a part of the traffic based on a weight (hence a percentage). It’s common to use to send a part of a traffic to a new application you’re deploying

Question 3: After updating a Route 53 record to point “myapp.mydomain.com” from an old Load Balancer to a new load balancer, it looks like the users are still not redirected to your new load balancer. You are wondering why… A: It’s because of the TTL

DNS records have a TTL (Time to Live) in order for clients to know for how long to cache these values and not overload the DNS with DNS requests. TTL should be set to strike a balance between how long the value should be cached vs how much pressure should go on the DNS.

Question 4: You want your users to get the best possible user experience and that means minimizing the response time from your servers to your users. Which routing policy will help? A: Latency

Latency will evaluate the latency results and help your users get a DNS response that will minimize their latency (i.e. response time)

Question 5: You have a legal requirement that people in any country but France should not be able to access your website. Which Route 53 record helps you in achieving this? A: Geolocation

Question 6: You have purchased a domain on GoDaddy and would like to use it with Route 53. What do you need to change to make this work? A: Create a public hosted zone and update the 3rd party registrar NS records

Private hosted zones are meant to be used for internal network queries and are not publicly accessible. Public Hosted Zones are meant to be used for people requesting your website through the public internet. Finally, NS records must be updated on the 3rd party registrar.

Classic Solutions Architecture

Section Introduction

These solutions architectures are the best part of this course
Let’s understand how all the technologies we’ve seen work together
This is a section you need to be 100% comfortable with
We’ll see the progression of a Solutions Architect’s mindset through many sample case studies:
- WhatIsTheTime.Com
- MyClothes.Com
- MyWordPress.Com
- Instantiating applications quickly
- Beanstalk

Stateless Web App: WhatIsTheTime.com

WhatIsTheTime.com allows people to know what time it is
We don’t need a database
We want to start small and can accept downtime
We want to fully scale vertically and horizontally, no downtime
Let’s go through the Solutions Architect journey for this app
Let’s see how we can proceed!

In this lecture we’ve discussed…

Public vs Private IP and EC2 instances
Elastic IP vs Route 53 vs Load Balancers
Route 53 TTL, A records and Alias Records
Maintaining EC2 instances manually vs Auto Scaling Groups
Multi AZ to survive disasters
ELB Health Checks
Security Group Rules
Reservation of capacity for cost savings when possible
We’re considering 5 pillars for a well architected application:
- costs,
- performance,
- reliability,
- security,
- operational excellence

Stateful Web App: MyClothes.com

MyClothes.com allows people to buy clothes online.
There’s a shopping cart
Our website is having hundreds of users at the same time
We need to scale, maintain horizontal scalability and keep our web application as stateless as possible
Users should not lose their shopping cart
Users should have their details (address, etc) in a database

In this lecture we’ve discussed…

3-tier architectures for web applications

ELB sticky sessions
Web clients for storing cookies and making our web app stateless
ElastiCache
- For storing sessions (alternative: DynamoDB)
- For caching data from RDS
- Multi AZ
RDS
- For storing user data
- Read replicas for scaling reads
- Multi AZ for disaster recovery
Tight Security with security groups referencing each other

Stateful Web App: MyWordPress.com

We are trying to create a fully scalable WordPress website
We want that website to access and correctly display picture uploads
Our user data, and the blog content should be stored in a MySQL database.

In this lecture we’ve discussed…

Aurora Database to have easy Multi-AZ and Read Replicas
Storing data in EBS (single instance application) vs storing data in EFS (distributed application)

Instantiating Applications quickly

When launching a full stack (EC2, EBS, RDS), it can take time to:
- Install applications
- Insert initial (or recovery) data
- Configure everything
- Launch the application
We can take advantage of the cloud to speed that up!

Instantiating Applications quickly

EC2 Instances:
- Use a Golden AMI: Install your applications, OS dependencies etc.. beforehand and launch your EC2 instance from the Golden AMI
- Bootstrap using User Data: For dynamic configuration, use User Data scripts
- Hybrid: mix Golden AMI and User Data (Elastic Beanstalk)
RDS Databases:
- Restore from a snapshot: the database will have schemas and data ready!
EBS Volumes:
- Restore from a snapshot: the disk will already be formatted and have data!

Developer problems on AWS

Managing infrastructure
Deploying Code
Configuring all the databases, load balancers, etc
Scaling concerns
Most web apps have the same architecture (ALB + ASG)
All the developers want is for their code to run!
Possibly, consistently across different applications and environments

AWS Elastic Beanstalk Overview

Elastic Beanstalk is a developer centric view of deploying an application on AWS
It uses all the components we’ve seen before: EC2, ASG, ELB, RDS, etc…
But it’s all in one view that’s easy to make sense of!
We still have full control over the configuration
Beanstalk is free but you pay for the underlying instances

Elastic Beanstalk

Managed service
- Instance configuration / OS is handled by Beanstalk
- Deployment strategy is configurable but performed by Elastic Beanstalk
Just the application code is the responsibility of the developer
Three architecture models:
- Single Instance deployment: good for dev
- LB + ASG: great for production or pre-production web applications
- ASG only: great for non-web apps in production (workers, etc..)

Elastic Beanstalk

Elastic Beanstalk has three components
Application
Application version: each deployment gets assigned a version
Environment name (dev, test, prod…): free naming
You deploy application versions to environments and can promote application versions to the next environment
Rollback feature to previous application version
Full control over lifecycle of environments

Elastic Beanstalk

Support for many platforms:
- Go
- Java SE
- Java with Tomcat
- .NET on Windows Server with IIS
- Node.js
- PHP
- Python
- Ruby
- Packer Builder
- Single Container Docker
- Multicontainer Docker
- Preconfigured Docker
If not supported, you can write your custom platform (advanced)

Question 1: You have an ASG that scales on demand based on the traffic going to your new website: TriangleSunglasses.Com. You would like to optimise for cost, so you have selected an ASG that scales based on demand going through your ELB. Still, you want your solution to be highly available so you have selected the minimum instances to 2. How can you further optimize the cost while respecting the requirements? A: Reserve two EC2 instances

This is the way to save further costs as we know we will run 2 EC2 instances no matter what.

Question 2: Which of the following will NOT help make our application tier stateless? A: Storing shared data on EBS volumes

EBS volumes are created for a specific AZ and can only be attached to one EC2 instance at a time. This will not help make our application stateless.

Question 3: You are looking to store shared software updates data across 100s of EC2 instances. The software updates should be dynamically loaded on the EC2 instances and shouldn’t require heavy operations. What do you suggest? A: Store the software updates on EFS and mount EFS as a network drive

EFS is a network file system (NFS) and allows mounting the same file system on 100s of EC2 instances. Publishing software updates there allows each EC2 instance to access them.

Question 4: As a solution architect managing a complex ERP software suite, you are orchestrating a migration to the AWS cloud. The software traditionally takes well over an hour to setup on a Linux machine, and you would like to make sure your application does leverage the ASG feature of auto scaling based on the demand. How do you recommend you speed up the installation process? A: Use a Golden AMI

Golden AMI are a standard in making sure you snapshot a state after an application installation so that future instances can boot up from that AMI quickly.

Question 5: I am creating an application and would like for it to be running with minimal cost in a development environment with Elastic Beanstalk. I should run it in A: Single Instance Mode

This will create one EC2 instance and one Elastic IP

Question 6: My deployments on Elastic Beanstalk have been painfully slow, and after looking at the logs, I realize this is due to the fact that my dependencies are resolved on each EC2 machine at deployment time. How can I speed up my deployment with the minimal impact? A: Create a Golden AMI that contains the dependencies and launch the EC2 instances from that.

Golden AMIs are a standard way of making sure you save the state after the installation or pulling of dependencies so that future instances can boot up from that AMI quickly.

S3 Storage and Data Management

Section introduction

Amazon S3 is one of the main building blocks of AWS
It’s advertised as ”infinitely scaling” storage
It’s widely popular and deserves its own section
Many websites use Amazon S3 as a backbone
Many AWS services use Amazon S3 as an integration as well
We’ll have a step-by-step approach to S3

Amazon S3 Overview - Buckets

Amazon S3 allows people to store objects (files) in “buckets” (directories)
Buckets must have a globally unique name
Buckets are defined at the region level
Naming convention:
- No uppercase
- No underscore
- 3-63 characters long
- Not an IP
- Must start with lowercase letter or number
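The naming rules above can be checked with a short validator. This is a sketch that covers only the rules listed in these notes; the allowed character set (lowercase letters, digits, dots and hyphens) is an assumption, and S3 enforces additional rules that are not modeled here.

```python
import ipaddress
import re

# Starts with a lowercase letter or digit, 3-63 characters in total.
# Allowed characters after the first one are an assumption of this sketch.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{2,62}")


def is_valid_bucket_name(name):
    """Check a bucket name against the rules listed in the notes."""
    if not _BUCKET_NAME.fullmatch(name):
        return False                     # uppercase, underscore, length, first char
    try:
        ipaddress.IPv4Address(name)
    except ValueError:
        return True                      # not formatted as an IP address
    return False                         # looks like an IP address: rejected
```

Note that a name can pass this check and still be rejected by S3 because bucket names must also be globally unique.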

Amazon S3 Overview - Objects

Objects (files) have a Key
The key is the FULL path:
- s3://my-bucket/my_file.txt
- s3://my-bucket/my_folder1/another_folder/my_file.txt
The key is composed of prefix + object name
- s3://my-bucket/my_folder1/another_folder/my_file.txt
There’s no concept of “directories” within buckets (although the UI will trick you to think otherwise)
Just keys with very long names that contain slashes (“/”)
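The prefix + object name split can be shown with plain string handling. This is a minimal sketch with a hypothetical helper name; it treats the key as everything after the bucket name in the `s3://` path.

```python
def split_s3_uri(uri):
    """Split an s3:// path into (bucket, key, prefix, object_name)."""
    scheme = "s3://"
    if not uri.startswith(scheme):
        raise ValueError("expected a path starting with s3://")
    bucket, _, key = uri[len(scheme):].partition("/")
    # The prefix is everything up to and including the last slash of the key.
    prefix, slash, object_name = key.rpartition("/")
    return bucket, key, prefix + slash, object_name
```

The "folders" are nothing more than the slash-separated prefix of the key; a key without slashes simply has an empty prefix.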

Amazon S3 Overview – Objects (continued)

Object values are the content of the body:
- Max Object Size is 5TB (5000GB)
- If uploading more than 5GB, must use “multi-part upload”
Metadata (list of text key / value pairs – system or user metadata)
Tags (Unicode key / value pair – up to 10) – useful for security / lifecycle
Version ID (if versioning is enabled)
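The 5 GB threshold for multi-part upload can be turned into a quick calculation. This is a sketch with hypothetical helper names; it assumes binary units (1 GB = 1024^3 bytes) and a part size chosen by the caller.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def needs_multipart(size_bytes, single_upload_limit=5 * GIB):
    """Objects larger than the 5 GB limit must use multi-part upload."""
    return size_bytes > single_upload_limit


def part_count(size_bytes, part_size_bytes):
    """Number of parts needed to upload size_bytes in chunks of part_size_bytes."""
    if part_size_bytes <= 0:
        raise ValueError("part size must be positive")
    return -(-size_bytes // part_size_bytes)   # ceiling division on integers
```

For example, a 25 GB file needs multi-part upload, and with 100 MB parts it is split into 256 parts.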

Amazon S3 - Versioning

You can version your files in Amazon S3
It is enabled at the bucket level
Same key overwrite will increment the “version”: 1, 2, 3….
It is best practice to version your buckets
- Protect against unintended deletes (ability to restore a version)
- Easy roll back to previous version
Notes:
- Any file that is not versioned prior to enabling versioning will have version “null”
- Suspending versioning does not delete the previous versions

S3 Encryption for Objects [*]

There are 4 methods of encrypting objects in S3
- SSE-S3: encrypts S3 objects using keys handled & managed by AWS
- SSE-KMS: leverage AWS Key Management Service to manage encryption keys
- SSE-C: when you want to manage your own encryption keys
- Client Side Encryption
It’s important to understand which ones are adapted to which situation for the exam

SSE-S3

SSE-S3: encryption using keys handled & managed by Amazon S3
Object is encrypted server side
AES-256 encryption type
Must set header: “x-amz-server-side-encryption”: “AES256”

SSE-KMS

SSE-KMS: encryption using keys handled & managed by KMS
KMS Advantages: user control + audit trail
Object is encrypted server side
Must set header: “x-amz-server-side-encryption”: ”aws:kms”

SSE-C

SSE-C: server-side encryption using data keys fully managed by the customer outside of AWS
Amazon S3 does not store the encryption key you provide
HTTPS must be used
Encryption key must be provided in HTTP headers, for every HTTP request made
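The request headers for the three server-side encryption modes can be sketched as plain dictionaries. The helper names are hypothetical and no request is actually sent; the SSE-C header names are the ones documented for the S3 REST API, and the key is assumed to be a 256-bit key generated and stored by the customer.

```python
import base64
import hashlib


def sse_s3_headers():
    """SSE-S3: keys handled and managed by Amazon S3."""
    return {"x-amz-server-side-encryption": "AES256"}


def sse_kms_headers():
    """SSE-KMS: keys handled and managed by KMS."""
    return {"x-amz-server-side-encryption": "aws:kms"}


def sse_c_headers(key):
    """SSE-C: the customer key travels with every request (HTTPS only)."""
    if len(key) != 32:
        raise ValueError("SSE-C expects a 256-bit (32-byte) key")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": base64.b64encode(key).decode("ascii"),
        # Base64-encoded MD5 digest of the key, used as an integrity check.
        "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(
            hashlib.md5(key).digest()
        ).decode("ascii"),
    }
```

Since S3 does not store the SSE-C key, the same headers have to be sent again to read the object back.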

Client Side Encryption

Client library such as the Amazon S3 Encryption Client
Clients must encrypt data themselves before sending to S3
Clients must decrypt data themselves when retrieving from S3
Customer fully manages the keys and encryption cycle

Encryption in transit (SSL/TLS)

Amazon S3 exposes:
- HTTP endpoint: non encrypted
- HTTPS endpoint: encryption in flight
You’re free to use the endpoint you want, but HTTPS is recommended
Most clients would use the HTTPS endpoint by default
HTTPS is mandatory for SSE-C
Encryption in flight is also called SSL / TLS

S3 Security

User based
- IAM policies - which API calls should be allowed for a specific user from IAM console
Resource Based
- Bucket Policies - bucket wide rules from the S3 console - allows cross account
- Object Access Control List (ACL) – finer grain
- Bucket Access Control List (ACL) – less common
Note: an IAM principal can access an S3 object if
the user IAM permissions allow it OR the resource policy ALLOWS it
AND there’s no explicit DENY
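The access rule in the note above is a plain boolean expression. The function name is hypothetical, and the three flags are assumed to be the already-evaluated results of the IAM policy, the resource policy and any explicit DENY.

```python
def s3_access_allowed(iam_allows, resource_policy_allows, explicit_deny):
    """(IAM allows OR resource policy allows) AND no explicit DENY."""
    return (iam_allows or resource_policy_allows) and not explicit_deny
```

For example, a bucket policy that allows the call is not enough if an attached IAM policy contains an explicit DENY.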

S3 Bucket Policies

JSON based policies
- Resources: buckets and objects
- Actions: Set of API to Allow or Deny
- Effect: Allow / Deny
- Principal: The account or user to apply the policy to
Use S3 bucket policy to:
- Grant public access to the bucket
- Force objects to be encrypted at upload
- Grant access to another account (Cross Account)
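A bucket policy that forces encryption at upload could look like the sketch below, built as a Python dict and serialized to JSON. The helper name, the `Sid` and the bucket name are hypothetical, and requiring SSE-S3 (`AES256`) specifically is an assumption; a real policy may need to accept other encryption modes.

```python
import json


def force_encryption_policy(bucket_name):
    """Deny PutObject requests that do not ask for SSE-S3 encryption."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedUploads",            # hypothetical label
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
                "Condition": {
                    "StringNotEquals": {
                        "s3:x-amz-server-side-encryption": "AES256"
                    }
                },
            }
        ],
    }


policy_json = json.dumps(force_encryption_policy("my-bucket"), indent=2)
```

The statement shows the four elements listed above: Resource, Action, Effect and Principal, plus a Condition on the encryption header.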

Bucket settings for Block Public Access

Block public access to buckets and objects granted through
- new access control lists (ACLs)
- any access control lists (ACLs)
- new public bucket or access point policies
Block public and cross-account access to buckets and objects through any public bucket or access point policies
These settings were created to prevent company data leaks
If you know your bucket should never be public, leave these on
Can be set at the account level

S3 Security - Other

Networking:
- Supports VPC Endpoints (for instances in VPC without www internet)
Logging and Audit:
- S3 Access Logs can be stored in other S3 bucket
- API calls can be logged in AWS CloudTrail
User Security:
- MFA Delete: MFA (multi factor authentication) can be required in versioned buckets to delete objects
- Pre-Signed URLs: URLs that are valid only for a limited time (ex: premium video service for logged in users)

S3 Websites

S3 can host static websites and have them accessible on the www
The website URL will be:
- {bucket-name}.s3-website-{aws-region}.amazonaws.com OR
- {bucket-name}.s3-website.{aws-region}.amazonaws.com
If you get a 403 (Forbidden) error, make sure the bucket policy allows public reads!

CORS - Explained

An origin is a scheme (protocol), host (domain) and port
- E.g.: https://www.example.com (implied port is 443 for HTTPS, 80 for HTTP)
CORS means Cross-Origin Resource Sharing
Web Browser based mechanism to allow requests to other origins while visiting the main origin
Same origin: http://example.com/app1 & http://example.com/app2
Different origins: http://www.example.com & http://other.example.com
The requests won’t be fulfilled unless the other origin allows for the requests, using CORS Headers (ex: Access-Control-Allow-Origin)
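The definition of an origin (scheme, host and port) can be checked with the standard library. This is a small sketch with hypothetical helper names; it only compares origins and does not implement the CORS header exchange itself.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that defines the origin of a URL."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)   # implied port if omitted
    return (parts.scheme, parts.hostname, port)


def same_origin(url_a, url_b):
    """Two URLs share an origin when scheme, host and port all match."""
    return origin(url_a) == origin(url_b)
```

The path is not part of the origin, so `/app1` and `/app2` on the same host are same-origin, while `www.example.com` and `other.example.com` are different origins.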

Access-Control-Allow-Origin: https://www.example.com
Access-Control-Allow-Methods: GET, PUT, DELETE

CORS – Diagram

S3 CORS

If a client does a cross-origin request on our S3 bucket, we need to enable the correct CORS headers
It’s a popular exam question [*]
You can allow for a specific origin or for * (all origins)
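A bucket CORS configuration and the way it is matched can be sketched as below. The rule keys follow the S3 CORS configuration format (`AllowedOrigins`, `AllowedMethods`, `AllowedHeaders`, `MaxAgeSeconds`), but the matching function is a hypothetical simplification that handles only exact origins and `*`.

```python
# Example rules: allow one specific origin to GET, PUT and DELETE objects.
CORS_RULES = [
    {
        "AllowedOrigins": ["https://www.example.com"],
        "AllowedMethods": ["GET", "PUT", "DELETE"],
        "AllowedHeaders": ["*"],
        "MaxAgeSeconds": 3000,
    }
]


def cors_allows(rules, request_origin, method):
    """True if any rule allows this origin (or all origins) and this method."""
    for rule in rules:
        origin_ok = "*" in rule["AllowedOrigins"] or request_origin in rule["AllowedOrigins"]
        if origin_ok and method in rule["AllowedMethods"]:
            return True
    return False
```

Replacing the origin list with `["*"]` would allow requests from all origins.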

Amazon S3 - Consistency Model

Strong read-after-write consistency for PUTS of new objects, PUTS of existing objects and DELETES (since December 2020)
- As soon as a new object is written, we can retrieve it ex: (PUT 200 => GET 200)
- If we read an object after updating it, we get the latest version ex: (PUT 200 => PUT 200 => GET 200 (latest version))
- If we delete an object, a following read no longer finds it ex: (DELETE 200 => GET 404)
- List operations are strongly consistent as well
Note: before December 2020, PUTS of existing objects and DELETES were only eventually consistent; strong consistency now applies automatically, at no extra cost

Question 1: You’re trying to upload a 25 GB file on S3 and it’s not working A: You should use multi-part upload when uploading files bigger than 5 GB

Question 2: I tried creating an S3 bucket named “dev” but it didn’t work. This is a new AWS Account and I have no buckets at all. What is the cause? A: Bucket names must be globally unique and “dev” is already taken

Question 3: You’ve added files in your bucket and then enabled versioning. The files you’ve already added will have which version? A: null

Question 4: Your client wants to make sure the encryption is happening in S3, but wants to fully manage the encryption keys and never store them in AWS. You recommend A: SSE-C

Question 5: Your company wants data to be encrypted in S3, and maintain control of the rotation policy for the encryption keys, but not know the encryption keys values. You recommend A: SSE-KMS

With SSE-KMS you let AWS manage the encryption keys but you have full control of the key rotation policy

Question 6: Your company does not trust S3 for encryption and wants it to happen on the application. You recommend A: Client Side Encryption

With Client Side Encryption you perform the encryption yourself and send the encrypted data to AWS directly. AWS does not know your encryption keys and cannot decrypt your data.

Question 7: The bucket policy allows our users to read/write files in the bucket, yet we were not able to perform a PutObject API call. A: The IAM user must have an explicit DENY in the attached IAM policy

Explicit DENY in an IAM policy will take precedence over a bucket policy permission

Question 8: You have a website that loads files from another S3 bucket. When you try the URL of the files directly in your Chrome browser it works, but when the website you’re visiting tries to load these files it doesn’t. What’s the problem? A: CORS is wrong

Cross-origin resource sharing (CORS) defines a way for client web applications that are loaded in one domain to interact with resources in a different domain. To learn more about CORS, go here: https://docs.aws.amazon.com/AmazonS3/latest/dev/cors.html

Developing on AWS

Section Introduction

So far, we’ve interacted with services manually and they exposed standard information for clients:
EC2 exposes a standard Linux machine we can use any way we want
RDS exposes a standard database we can connect to using a URL
ElastiCache exposes a cache we can connect to using a URL
ASG / ELB are automated and we don’t have to program against them
Route 53 was set up manually
Developing against AWS has two components:
How to perform interactions with AWS without using the Online Console?
How to interact with AWS Proprietary services? (S3, DynamoDB, etc…)

Section Introduction

Developing and performing AWS tasks against AWS can be done in several ways
Using the AWS CLI on our local computer
Using the AWS CLI on our EC2 machines
Using the AWS SDK on our local computer
Using the AWS SDK on our EC2 machines
Using the AWS Instance Metadata Service for EC2
In this section, we’ll learn:
How to do all of those
In the right & most secure way, adhering to best practices

AWS NETWORK

AWS CLI Configuration

Let’s learn how to properly configure the CLI
We’ll learn how to get our access credentials and protect them
Do not share your AWS Access Key and Secret key with anyone!

AWS CLI ON EC2… THE BAD WAY

We could run aws configure on EC2 just like we did (and it’ll work)
But… it’s SUPER INSECURE
NEVER EVER EVER PUT YOUR PERSONAL CREDENTIALS ON AN EC2
Your PERSONAL credentials are PERSONAL and only belong on your PERSONAL computer
If the EC2 is compromised, so is your personal account
If the EC2 is shared, other people may perform AWS actions while impersonating you
For EC2, there’s a better way… it’s called AWS IAM Roles

AWS CLI ON EC2… THE RIGHT WAY

IAM Roles can be attached to EC2 instances
IAM Roles can come with a policy authorizing exactly what the EC2 instance should be able to do
EC2 Instances can then use these profiles automatically without any additional configurations
This is the best practice on AWS and you should 100% do this.

AWS EC2 Instance Metadata

AWS EC2 Instance Metadata is powerful but one of the least known features to developers
It allows AWS EC2 instances to ”learn about themselves” without using an IAM Role for that purpose.
The URL is http://169.254.169.254/latest/meta-data
You can retrieve the IAM Role name from the metadata, but you CANNOT retrieve the IAM Policy.
Metadata = Info about the EC2 instance
Userdata = launch script of the EC2 instance
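
As a rough illustration of the metadata URL above, here is a minimal Python sketch. The helper names `metadata_url` and `fetch_metadata` are hypothetical, and the sketch assumes the instance still answers plain (IMDSv1-style) requests; instances that require session tokens (IMDSv2) need an extra token request first. The fetch only works from inside an EC2 instance.

```python
import urllib.request

# Link-local address of the instance metadata service (from the notes above).
METADATA_BASE = "http://169.254.169.254/latest/meta-data"


def metadata_url(path: str) -> str:
    """Build the full URL for a metadata path such as 'instance-id'."""
    return f"{METADATA_BASE}/{path.lstrip('/')}"


def fetch_metadata(path: str, timeout: float = 2.0) -> str:
    """Fetch one metadata value (hypothetical helper; EC2-only, IMDSv1-style)."""
    with urllib.request.urlopen(metadata_url(path), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Paths commonly queried: the instance id, and the name of the attached IAM role.
INSTANCE_ID_URL = metadata_url("instance-id")
ROLE_NAME_URL = metadata_url("iam/security-credentials/")
```

On an instance, `fetch_metadata("instance-id")` would return the instance id as text; the role path returns the role name, not the policy, which matches the note above.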

AWS SDK Overview

What if you want to perform actions on AWS directly from your application code (without using the CLI)?
You can use an SDK (software development kit)!
Official SDKs are…
- Java
- .NET
- Node.js
- PHP
- Python (named boto3 / botocore)
- Go
- Ruby
- C++

AWS SDK Overview

We have to use the AWS SDK when coding against AWS Services such as DynamoDB
Fun fact… the AWS CLI uses the Python SDK (boto3)
The exam expects you to know when you should use an SDK
We’ll practice the AWS SDK when we get to the Lambda functions
Good to know: if you don’t specify or configure a default region, then us-east-1 will be chosen by default

AWS SDK Credentials Security

It’s recommended to use the default credential provider chain
The default credential provider chain works seamlessly with:
- AWS credentials at ~/.aws/credentials (only on our computers or on premise)
- Instance Profile Credentials using IAM Roles (for EC2 machines, etc…)
- Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
Overall, NEVER EVER STORE AWS CREDENTIALS IN YOUR CODE.
Best practice is for credentials to be inherited from mechanisms above, and 100% IAM Roles if working from within AWS Services

Exponential Backoff

Any API that fails because of too many calls needs to be retried with Exponential Backoff
These apply to rate limited API
Retry mechanism included in SDK API calls
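
The SDKs already retry for you, but the idea can be sketched in a few lines of Python. This is an illustration only, not the SDK's actual implementation: `ThrottledError`, `backoff_delays` and `call_with_backoff` are hypothetical names, and the base delay, cap and "full jitter" choice are assumptions.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical error standing in for a 'too many requests' API failure."""


def backoff_delays(max_retries, base=0.1, cap=10.0):
    """Un-jittered delay before each retry: base * 2**attempt, capped at `cap`."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]


def call_with_backoff(operation, max_retries=5, base=0.1, cap=10.0, sleep=time.sleep):
    """Call operation(); on ThrottledError, wait and retry with growing delays."""
    for delay in backoff_delays(max_retries, base, cap):
        try:
            return operation()
        except ThrottledError:
            # Full jitter: sleep a random amount up to the current delay so that
            # many clients do not retry at exactly the same moment.
            sleep(random.uniform(0, delay))
    # Final attempt: any error now propagates to the caller.
    return operation()
```

The delay doubles after each failed attempt (0.1 s, 0.2 s, 0.4 s, ...), which is what gives a rate-limited API room to recover.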

Question 1: My EC2 Instance does not have the permissions to perform an API call PutObject on S3. What should I do? A: Create an IAM role with the required S3 permissions and attach it to the EC2 instance

IAM roles are the right way to provide credentials and permissions to an EC2 instance

Question 2: I have an on-premise personal server that I’d like to use to perform AWS API calls A: O I should run aws configure and put my credentials there. Invalidate them when I’m done

Even better would be to create a user specifically for that one on-premise server

A: X I should attach an EC2 IAM Role to my personal server

You can't attach EC2 IAM roles to on-premise servers

Question 3: I need my colleague’s help to debug my code. When he runs the application on his machine, it’s working fine, whereas I get API authorisation exceptions. What should I do? A: Compare his IAM policy and my IAM policy in the policy simulator to understand the differences

Question 4: To get the instance id of my EC2 machine from the EC2 machine, the best thing is to… A: Query the meta data at http://169.254.169.254/latest/meta-data

Advanced S3: S3, Glacier, Athena

S3 MFA-Delete

MFA (multi factor authentication) forces users to generate a code on a device (usually a mobile phone or hardware token) before doing important operations on S3
To use MFA-Delete, enable Versioning on the S3 bucket
You will need MFA to
permanently delete an object version
suspend versioning on the bucket
You won’t need MFA for
enabling versioning
listing deleted versions
Only the bucket owner (root account) can enable/disable MFA-Delete
MFA-Delete currently can only be enabled using the CLI

S3 Default Encryption vs Bucket Policies

The old way to enable default encryption was to use a bucket policy and refuse any HTTP command without the proper headers:
The new way is to use the “default encryption” option in S3
Note: Bucket Policies are evaluated before “default encryption”
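
For illustration, the "old way" can be sketched as a bucket policy that denies any PutObject request missing the server-side encryption header. The Python below only builds the policy document; the bucket name `examplebucket`, the Sid and the helper name are hypothetical, and this is a sketch rather than the exact policy from the course.

```python
import json


def deny_unencrypted_uploads_policy(bucket_name):
    """Build a bucket policy that refuses PutObject without the SSE header."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedObjectUploads",  # hypothetical label
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
                # "Null": "true" matches requests where the header is absent.
                "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
            }
        ],
    }


policy_json = json.dumps(deny_unencrypted_uploads_policy("examplebucket"), indent=2)
```

Because bucket policies are evaluated before default encryption, a policy like this still rejects uploads without the header even when default encryption is switched on.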

S3 Access Logs

For audit purpose, you may want to log all access to S3 buckets
Any request made to S3, from any account, authorized or denied, will be logged into another S3 bucket
That data can be analyzed using data analysis tools…
Or Amazon Athena as we’ll see later in this section!
The log format is at: https://docs.aws.amazon.com/AmazonS3/latest/dev/LogFormat.html

S3 Access Logs: Warning

Do not set your logging bucket to be the monitored bucket
It will create a logging loop, and your bucket will grow in size exponentially
Do not try this at home :)

S3 Replication (CRR & SRR)

Must enable versioning in source and destination
Cross Region Replication (CRR)
Same Region Replication (SRR)
Buckets can be in different accounts
Copying is asynchronous
Must give proper IAM permissions to S3
CRR - Use cases: compliance, lower latency access, replication across accounts
SRR - Use cases: log aggregation, live replication between production and test accounts

S3 Replication – Notes

After activating, only new objects are replicated (not retroactive)
For DELETE operations:
If you delete without a version ID, it adds a delete marker, not replicated
If you delete with a version ID, it deletes in the source, not replicated
There is no “chaining” of replication
If bucket 1 has replication into bucket 2, which has replication into bucket 3
Then objects created in bucket 1 are not replicated to bucket 3

S3 Pre-Signed URLs

Can generate pre-signed URLs using SDK or CLI
For downloads (easy, can use the CLI)
For uploads (harder, must use the SDK)
Valid for a default of 3600 seconds, can change timeout with --expires-in [TIME_BY_SECONDS] argument
Users given a pre-signed URL inherit the permissions of the person who generated the URL for GET / PUT
Examples :
Allow only logged-in users to download a premium video on your S3 bucket
Allow an ever changing list of users to download files by generating URLs dynamically
Allow temporarily a user to upload a file to a precise location in our bucket

aws --profile pablo-developer s3 presign s3://my-sample-bucket-monitored-pablo/beach.jpg --expires-in 300 --region ap-southeast-2
https://my-sample-bucket-monitored-pablo.s3.ap-southeast-2.amazonaws.com/beach.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAYZ33DWH4ML6QNG3%2F20200527%2Fap-southeast-2%2Fs3%2Faws4_request&X-Amz-Date=20200527T035233Z&X-Amz-Expires=300&X-Amz-SignedHeaders=host&X-Amz-Signature=d3574302ba10d7197de6b58cf605a1b73c49e3ccf064410acb1c251c42fbe38e

S3 Storage Classes [*]

Amazon S3 Standard - General Purpose
Amazon S3 Standard-Infrequent Access (IA)
Amazon S3 One Zone-Infrequent Access
Amazon S3 Intelligent Tiering
Amazon Glacier
Amazon Glacier Deep Archive
Amazon S3 Reduced Redundancy Storage (deprecated - omitted)

S3 Standard – General Purpose

High durability (99.999999999%) of objects across multiple AZs
If you store 10,000,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000 years
99.99% Availability over a given year
Sustain 2 concurrent facility failures
Use Cases: Big Data analytics, mobile & gaming applications, content distribution…
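
The "one object every 10,000 years" figure follows directly from the durability percentage. A small worked check in Python, assuming the 11-nines figure is read as the yearly survival probability of each object (the function names are illustrative):

```python
from fractions import Fraction

# 99.999999999% (11 nines), kept as an exact fraction to avoid float rounding.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(object_count):
    """Expected number of objects lost per year for a given object count."""
    return object_count * (1 - DURABILITY)


def years_per_lost_object(object_count):
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)
```

With 10,000,000 objects the expected loss is 10,000,000 x 10^-11 = 10^-4 objects per year, i.e. one object every 10,000 years on average.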

S3 Standard – Infrequent Access (IA)

Suitable for data that is less frequently accessed, but requires rapid access when needed
High durability (99.999999999%) of objects across multiple AZs
99.9% Availability
Low cost compared to Amazon S3 Standard
Sustain 2 concurrent facility failures
Use Cases: As a data store for disaster recovery, backups…

S3 One Zone - Infrequent Access (IA)

Same as IA but data is stored in a single AZ
High durability (99.999999999%) of objects in a single AZ; data lost when AZ is destroyed
99.5% Availability
Low latency and high throughput performance
Supports SSL for data in transit and encryption at rest
Low cost compared to IA (by 20%)
Use Cases: Storing secondary backup copies of on-premise data, or storing data you can recreate

S3 Intelligent Tiering

Same low latency and high throughput performance of S3 Standard
Small monthly monitoring and auto-tiering fee
Automatically moves objects between two access tiers based on changing access patterns
Designed for durability of 99.999999999% of objects across multiple Availability Zones
Resilient against events that impact an entire Availability Zone
Designed for 99.9% availability over a given year

Amazon Glacier

Low cost object storage meant for archiving / backup
Data is retained for the longer term (10s of years)
Alternative to on-premise magnetic tape storage
Average annual durability is 99.999999999%
Cost per storage per month ($0.004 / GB) + retrieval cost
Each item in Glacier is called “Archive” (up to 40TB)
Archives are stored in ”Vaults”

Amazon Glacier & Glacier Deep Archive [*]

Amazon Glacier – 3 retrieval options:
- Expedited (1 to 5 minutes)
- Standard (3 to 5 hours)
- Bulk (5 to 12 hours)
- Minimum storage duration of 90 days
Amazon Glacier Deep Archive – for long term storage – cheaper:
- Standard (12 hours)
- Bulk (48 hours)
- Minimum storage duration of 180 days

S3 Storage Classes Comparison

https://aws.amazon.com/s3/storage-classes/

[SAA-C02]

S3 – Moving between storage classes

You can transition objects between storage classes
For infrequently accessed object, move them to STANDARD_IA
For archive objects you don’t need in real-time, GLACIER or DEEP_ARCHIVE
Moving objects can be automated using a lifecycle configuration

S3 Lifecycle Rules

Transition actions: It defines when objects are transitioned to another storage class.
- Move objects to Standard IA class 60 days after creation
- Move to Glacier for archiving after 6 months
Expiration actions: configure objects to expire (delete) after some time
- Access log files can be set to delete after 365 days
- Can be used to delete old versions of files (if versioning is enabled)
- Can be used to delete incomplete multi-part uploads
Rules can be created for a certain prefix (ex - s3://mybucket/mp3/*)
Rules can be created for certain objects tags (ex - Department: Finance)

S3 Lifecycle Rules – Scenario 1

Your application on EC2 creates image thumbnails after profile photos are uploaded to Amazon S3. These thumbnails can be easily recreated, and only need to be kept for 45 days. The source images should be able to be immediately retrieved for these 45 days, and afterwards, the user can wait up to 6 hours. How would you design this?
S3 source images can be on STANDARD, with a lifecycle configuration to transition them to GLACIER after 45 days.
S3 thumbnails can be on ONEZONE_IA, with a lifecycle configuration to expire them (delete them) after 45 days.
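
Scenario 1 could be expressed roughly as the lifecycle configuration below, written as a Python dict in the shape S3 accepts for a bucket lifecycle configuration. The prefixes `source/` and `thumbnails/` and the rule IDs are assumptions made for the sketch; the scenario itself does not say how the two kinds of objects are separated.

```python
import json


def thumbnail_lifecycle(source_prefix="source/", thumbnail_prefix="thumbnails/", days=45):
    """Two rules: archive source images, expire thumbnails (prefixes are hypothetical)."""
    return {
        "Rules": [
            {
                "ID": "archive-source-images",
                "Filter": {"Prefix": source_prefix},
                "Status": "Enabled",
                # Transition action: move to Glacier once instant access is no longer needed.
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            },
            {
                "ID": "expire-thumbnails",
                "Filter": {"Prefix": thumbnail_prefix},
                "Status": "Enabled",
                # Expiration action: delete thumbnails, since they can be recreated.
                "Expiration": {"Days": days},
            },
        ]
    }


lifecycle_json = json.dumps(thumbnail_lifecycle(), indent=2)
```

The lifecycle rules do not put thumbnails in ONEZONE_IA; in this sketch the application is assumed to upload them with that storage class directly.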

S3 Lifecycle Rules – Scenario 2

A rule in your company states that you should be able to recover your deleted S3 objects immediately for 15 days, although this may happen rarely. After this time, and for up to 365 days, deleted objects should be recoverable within 48 hours.
You need to enable S3 versioning in order to have object versions, so that “deleted objects” are in fact hidden by a “delete marker” and can be recovered
You can transition these “noncurrent versions” of the object to STANDARD_IA
Afterwards, you can transition these “noncurrent versions” to DEEP_ARCHIVE

[SAA-C02]

S3 – Baseline Performance

Amazon S3 automatically scales to high request rates, latency 100-200 ms
Your application can achieve at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix in a bucket.
There are no limits to the number of prefixes in a bucket.
Example (object path => prefix):
- bucket/folder1/sub1/file => /folder1/sub1/
- bucket/folder1/sub2/file => /folder1/sub2/
- bucket/1/file => /1/
- bucket/2/file => /2/
If you spread reads across all four prefixes evenly, you can achieve 22,000 requests per second for GET and HEAD
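
The 22,000 figure is just the per-prefix limit multiplied by the number of distinct prefixes. A short Python sketch of that arithmetic, using the notes' own "object path => prefix" convention (the function names are illustrative):

```python
GET_PER_PREFIX = 5500  # GET/HEAD requests per second per prefix (from the notes)
PUT_PER_PREFIX = 3500  # PUT/COPY/POST/DELETE requests per second per prefix


def prefix_of(path):
    """'bucket/folder1/sub1/file' -> '/folder1/sub1/' (everything between bucket and file)."""
    middle = path.split("/")[1:-1]
    return "/" + "/".join(middle) + "/" if middle else "/"


def max_get_rate(paths):
    """Aggregate GET/HEAD rate if reads are spread evenly over all distinct prefixes."""
    return len({prefix_of(p) for p in paths}) * GET_PER_PREFIX


paths = [
    "bucket/folder1/sub1/file",
    "bucket/folder1/sub2/file",
    "bucket/1/file",
    "bucket/2/file",
]
```

Four distinct prefixes give 4 x 5,500 = 22,000 GET/HEAD requests per second.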

S3 – KMS Limitation

If you use SSE-KMS, you may be impacted by the KMS limits
When you upload, it calls the GenerateDataKey KMS API
When you download, it calls the Decrypt KMS API
Count towards the KMS quota per second (5500, 10000, 30000 req/s based on region)
You can request a quota increase using the Service Quotas console

S3 Performance

Multi-Part upload:
- recommended for files > 100MB, must use for files > 5GB
- Can help parallelize uploads (speed up transfers)
S3 Transfer Acceleration (upload only)
- Increase transfer speed by transferring file to an AWS edge location which will forward the data to the S3 bucket in the target region
- Compatible with multi-part upload

S3 Performance – S3 Byte-Range Fetches

Parallelize GETs by requesting specific byte ranges
Better resilience in case of failures
Can be used to speed up downloads
Can be used to retrieve only partial data (for example the head of a file)
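
To make the idea concrete, the sketch below splits an object into the HTTP `Range` header values that parallel GET requests would carry. It only computes the ranges; issuing the requests is left out, and the function name is illustrative.

```python
def byte_ranges(object_size, part_size):
    """Return Range header values covering an object of `object_size` bytes."""
    if object_size <= 0 or part_size <= 0:
        raise ValueError("object_size and part_size must be positive")
    ranges = []
    start = 0
    while start < object_size:
        # HTTP byte ranges are inclusive on both ends.
        end = min(start + part_size, object_size) - 1
        ranges.append(f"bytes={start}-{end}")
        start = end + 1
    return ranges
```

Fetching only the head of a file is the single-range case: `byte_ranges(250, 250)` yields one range covering the first 250 bytes. If one ranged request fails, only that range needs to be retried.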

S3 Select & Glacier Select

Retrieve less data using SQL by performing server side filtering
Can filter by rows & columns (simple SQL statements)
Less network transfer, less CPU cost client-side

https://aws.amazon.com/blogs/aws/s3-glacier-select/

S3 Event Notifications


S3:ObjectCreated, S3:ObjectRemoved, S3:ObjectRestore, S3:Replication…
Object name filtering possible (*.jpg)
Use case: generate thumbnails of images uploaded to S3
Can create as many “S3 events” as desired
S3 event notifications typically deliver events in seconds but can sometimes take a minute or longer
If two writes are made to a single non-versioned object at the same time, it is possible that only a single event notification will be sent
If you want to ensure that an event notification is sent for every successful write, you can enable versioning on your bucket.

AWS Athena

Serverless service to perform analytics directly against S3 files
Uses SQL language to query the files
Has a JDBC / ODBC driver
Charged per query and amount of data scanned
Supports CSV, JSON, ORC, Avro, and Parquet (built on Presto)
Use cases: Business intelligence / analytics / reporting, analyze & query VPC Flow Logs, ELB Logs, CloudTrail trails, etc…
Exam Tip: Analyze data directly on S3 => use Athena

S3 Object Lock & Glacier Vault Lock

S3 Object Lock
- Adopt a WORM (Write Once Read Many) model
- Block an object version deletion for a specified amount of time
Glacier Vault Lock
- Adopt a WORM (Write Once Read Many) model
- Lock the policy for future edits (can no longer be changed)
- Helpful for compliance and data retention

S3 Advanced & Athena - Quiz

Question 1: You have enabled versioning and want to be extra careful when it comes to deleting files on S3. What should you enable to prevent accidental permanent deletions? A: Enable MFA Delete

MFA Delete forces users to use MFA tokens before deleting objects. It’s an extra level of security to prevent accidental deletes

Question 2: You would like all your files in S3 to be encrypted by default. What is the optimal way of achieving this? A: Enable “Default Encryption” on S3

Question 3: You suspect some of your employees to try to access files in S3 that they don’t have access to. How can you verify this is indeed the case without them noticing? A: Enable S3 Access Logs and analyze them using Athena

S3 Access Logs log all the requests made to buckets, and Athena can then be used to run serverless analytics on top of the log files

Question 4: You are looking for your entire S3 bucket to be available fully in a different region so you can perform data analysis optimally at the lowest possible cost. Which feature should you use? A: S3 Cross Region Replication CRR

S3 CRR is used to replicate data from an S3 bucket to another one in a different region

Question 5: You are looking to provide temporary URLs to a growing list of federated users in order to allow them to perform a file upload on S3 to a specific location. What should you use? A: S3 Pre-Signed URL

Pre-Signed URLs are temporary and grant time-limited access to some actions in your S3 bucket.

Question 6: How can you automate the transition of S3 objects between their different tiers? A: Use S3 Lifecycle Rules

Question 7: Which of the following is NOT a Glacier retrieval mode? A: Instant (10 seconds)

Question 8: Which of the following is a Serverless data analysis service allowing you to query data in S3? A: Athena

Question 9: [SAA-C02] You are looking to build an index of your files in S3, using Amazon RDS PostgreSQL. To build this index, it is necessary to read the first 250 bytes of each object in S3, which contains some metadata about the content of the file itself. There are over 100,000 files in your S3 bucket, amounting to 50TB of data. How can you build this index efficiently? A: Create an application that will traverse the S3 bucket, issue a Byte Range Fetch for the first 250 bytes, and store that information in RDS.

AWS CloudFront

Content Delivery Network (CDN)
Improves read performance, content is cached at the edge
216 Points of Presence globally (edge locations)
DDoS protection, integration with Shield, AWS Web Application Firewall
Can expose external HTTPS and can talk to internal HTTPS backends

https://aws.amazon.com/cloudfront/features/?nc=sn&loc=2

CloudFront – Origins

S3 bucket
- For distributing files and caching them at the edge
- Enhanced security with CloudFront Origin Access Identity (OAI)
- CloudFront can be used as an ingress (to upload files to S3)
Custom Origin (HTTP)
- Application Load Balancer
- EC2 instance
- S3 website (must first enable the bucket as a static S3 website)
- Any HTTP backend you want

CloudFront Geo Restriction

You can restrict who can access your distribution
- Whitelist: Allow your users to access your content only if they’re in one of the countries on a list of approved countries.
- Blacklist: Prevent your users from accessing your content if they’re in one of the countries on a blacklist of banned countries.
The “country” is determined using a 3rd party Geo-IP database
Use case: Copyright Laws to control access to content

CloudFront vs S3 Cross Region Replication

CloudFront:
- Global Edge network
- Files are cached for a TTL (maybe a day)
- Great for static content that must be available everywhere
S3 Cross Region Replication:
- Must be setup for each region you want replication to happen
- Files are updated in near real-time
- Read only
- Great for dynamic content that needs to be available at low-latency in few regions

AWS CloudFront Hands On

We’ll create an S3 bucket
We’ll create a CloudFront distribution
We’ll create an Origin Access Identity
We’ll limit the S3 bucket to be accessed only using this identity

[SAA-C02]

CloudFront Signed URL / Signed Cookies

You want to distribute paid shared content to premium users all over the world
We can use CloudFront Signed URLs / Cookies. We attach a policy that includes:
- URL expiration
- IP ranges allowed to access the data
- Trusted signers (which AWS accounts can create signed URLs)
How long should the URL be valid for?
- Shared content (movie, music): make it short (a few minutes)
- Private content (private to the user): you can make it last for years
Signed URL = access to individual files (one signed URL per file)
Signed Cookies = access to multiple files (one signed cookie for many files)
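
The policy attached to a signed URL or cookie is a small JSON document. The Python sketch below builds one with an expiration and an optional IP range; the distribution URL and CIDR in the usage line are made up, and the signing step (done with the trusted signer's private key) is deliberately left out.

```python
import json


def custom_policy(resource_url, expires_epoch, source_ip_cidr=None):
    """Build a CloudFront custom policy document as a compact JSON string."""
    condition = {"DateLessThan": {"AWS:EpochTime": int(expires_epoch)}}
    if source_ip_cidr is not None:
        # Restrict access to requests coming from this IP range.
        condition["IpAddress"] = {"AWS:SourceIp": source_ip_cidr}
    statement = {"Resource": resource_url, "Condition": condition}
    return json.dumps({"Statement": [statement]}, separators=(",", ":"))


# Hypothetical values: a wildcard resource (useful with signed cookies) and an office IP range.
example = custom_policy("https://d111111abcdef8.cloudfront.net/premium/*", 1767225600, "203.0.113.0/24")
```

A wildcard in the resource is what lets one signed cookie cover many files, while a signed URL typically names a single file.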

[SAA-C02]

CloudFront Signed URL vs S3 Pre-Signed URL

CloudFront Signed URL:
- Allow access to a path, no matter the origin
- Account wide key-pair, only the root can manage it
- Can filter by IP, path, date, expiration
- Can leverage caching features
S3 Pre-Signed URL:
- Issue a request as the person who pre-signed the URL
- Uses the IAM key of the signing IAM principal
- Limited lifetime

https://aws.amazon.com/global-accelerator/pricing/

CloudFront & AWS Global Accelerator Quiz

Question 1: Which feature allows us to distribute paid content from S3 securely, globally, if the S3 bucket is secured to only exchange data with CloudFront? A: CloudFront Signed URL

CloudFront Signed URLs are commonly used to distribute paid content through dynamic CloudFront Signed URL generation.

Question 2: You are hosting highly dynamic content in Amazon S3 in us-east-1. Recently, there has been a need to make that data available with low latency in Singapore. What do you recommend using? A: S3 Cross Region Replication

S3 CRR allows you to replicate the data from one bucket in a region to another bucket in another region

Question 3: How can you ensure that only users who access our website through Canada are authorized in CloudFront? A: Use CloudFront Geo Restriction

Question 4: You would like to provide your users access to hundreds of private files in your CloudFront distribution, which is fronting an HTTP web server behind an application load balancer. What should you use? A: CloudFront Signed Cookies

Allows you to access many files

Question 5: [SAA-C02] You are creating an application that is going to expose an HTTP REST API. There is a need to provide request routing rules at the HTTP level. Due to security requirements, your application can only be exposed through the use of two static IPs. How can you create a solution that validates these requirements? A: Use Global Accelerator and an Application Load Balancer.

Global Accelerator will provide us with the two static IPs, and the ALB will provide us with the HTTP routing rules

What does this S3 bucket policy do?

{
  "Version": "2012-10-17",
  "Id": "Mystery policy",
  "Statement": [
    {
      "Sid": "What could it be?",
      "Effect": "Allow",
      "Principal": {"CanonicalUser": "CloudFront Origin Identity Canonical User ID"},
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::examplebucket/*"
    }
  ]
}

A: Only allows the S3 bucket content to be accessed from your CloudFront distribution origin identity

Section 13: AWS Storage Extras

AWS Storage Gateway Summary

Exam tip: Read the question well, it will hint at which gateway to use
On premise data to the cloud => Storage Gateway
File access / NFS => File Gateway (backed by S3)
Volumes / Block Storage / iSCSI => Volume gateway (backed by S3 with EBS snapshots)
VTL Tape solution / Backup with iSCSI => Tape Gateway (backed by S3 and Glacier)

Amazon FSx for Windows (File Server)

EFS is a shared POSIX system for Linux systems.
FSx for Windows is a fully managed Windows file system share drive
Supports SMB protocol & Windows NTFS
Microsoft Active Directory integration, ACLs, user quotas
Built on SSD, scale up to 10s of GB/s, millions of IOPS, 100s PB of data
Can be accessed from your on-premise infrastructure
Can be configured to be Multi-AZ (high availability)
Data is backed-up daily to S3

Amazon FSx for Lustre

Lustre is a type of parallel distributed file system, for large-scale computing
The name Lustre is derived from “Linux” and “cluster”
Machine Learning, High Performance Computing (HPC)
Video Processing, Financial Modeling, Electronic Design Automation
Scales up to 100s GB/s, millions of IOPS, sub-ms latencies
Seamless integration with S3
- Can “read S3” as a file system (through FSx)
- Can write the output of the computations back to S3 (through FSx)
Can be used from on-premise servers

Storage Comparison

S3: Object Storage
Glacier: Object Archival
EFS: Network File System for Linux instances, POSIX filesystem
FSx for Windows: Network File System for Windows servers
FSx for Lustre: High Performance Computing Linux file system
EBS volumes: Network storage for one EC2 instance at a time
Instance Storage: Physical storage for your EC2 instance (high IOPS)
Storage Gateway: File Gateway, Volume Gateway (cache & stored), Tape Gateway
Snowball / Snowmobile: to move large amount of data to the cloud, physically
Database: for specific workloads, usually with indexing and querying

AWS Storage Extras - Quiz

Question 1: You need to move hundreds of Terabytes into the cloud in S3, and after that pre-process it using many EC2 instances in order to clean the data. You have a 1 Gbit/s broadband and would like to optimize the process of moving the data and pre-processing it, in order to save time. What do you recommend? A: Use Snowball Edge

Snowball Edge is the right answer as it comes with computing capabilities and allows us to pre-process the data while it’s being moved in Snowball, so we save time on the pre-processing side as well.

Question 2: You want to expose a virtually infinite storage for your tape backups. You want to keep the same software as today and want an iSCSI compatible interface. What do you use? A: Tape Gateway

Question 3: Your EC2 Windows Servers need to share some data by having a Network File System mounted, that respects the Windows security mechanisms and has integration with Active Directory. What do you recommend putting in place as an NFS? A: FSx for Windows

Question 4: You would like to have a distributed POSIX compliant file system that will allow you to maximize the IOPS in order to perform some HPC and genomics computational research. That file system will have to scale easily to millions of IOPS. What do you recommend? A: FSx for Lustre

Section Introduction

Synchronous communication between applications can be problematic if there are sudden spikes of traffic
What if you need to suddenly encode 1000 videos but usually it’s 10?
In that case, it’s better to decouple your applications,
- using SQS: queue model
- using SNS: pub/sub model
- using Kinesis: real-time streaming model
These services can scale independently from our application!

AWS SQS – Standard Queue

Oldest offering (over 10 years old)
Fully managed
Scales from 1 message per second to 10,000s per second
Default retention of messages: 4 days, maximum of 14 days
No limit to how many messages can be in the queue
Low latency (<10 ms on publish and receive)
Horizontal scaling in terms of number of consumers
Can have duplicate messages (at least once delivery, occasionally)
Can have out of order messages (best effort ordering)
Limitation of 256KB per message sent

AWS SQS – Delay Queue

Delay a message (consumers don’t see it immediately) up to 15 minutes
Default is 0 seconds (message is available right away)
Can set a default at queue level
Can override the default using the DelaySeconds parameter

SQS – Consuming Messages

Consumers…
Poll SQS for messages (receive up to 10 messages at a time)
Process the message within the visibility timeout
Delete the message using the message ID & receipt handle

SQS –Visibility timeout

When a consumer polls a message from a queue, the message is “invisible” to other consumers for a defined period… the Visibility Timeout:
- Set between 0 seconds and 12 hours (default 30 seconds)
- If too high (15 minutes) and consumer fails to process the message, you must wait a long time before processing the message again
- If too low (30 seconds) and consumer needs time to process the message (2 minutes), another consumer will receive the message and the message will be processed more than once
ChangeMessageVisibility API to change the visibility while processing a message
DeleteMessage API to tell SQS the message was successfully processed

AWS SQS – Dead Letter Queue

If a consumer fails to process a message within the Visibility Timeout… the message goes back to the queue!
We can set a threshold of how many times a message can go back to the queue – it’s called a “redrive policy”
After the threshold is exceeded, the message goes into a dead letter queue (DLQ)
We have to create a DLQ first and then designate it as the dead letter queue
Make sure to process the messages in the DLQ before they expire!
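
The interplay between the visibility timeout and the redrive policy can be seen in a toy in-memory model. This is not the SQS API: `ToyQueue`, its method names and the injected `now` clock are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Message:
    body: str
    receive_count: int = 0
    invisible_until: float = 0.0


class ToyQueue:
    """Toy model of a queue with a visibility timeout and a dead letter queue."""

    def __init__(self, visibility_timeout=30.0, max_receive_count=3):
        self.visibility_timeout = visibility_timeout
        self.max_receive_count = max_receive_count  # the redrive threshold
        self.messages = []
        self.dead_letter_queue = []

    def send(self, body):
        self.messages.append(Message(body))

    def receive(self, now):
        """Return the first visible message, or None if nothing is visible."""
        for msg in list(self.messages):
            if msg.invisible_until > now:
                continue  # still being processed by another consumer
            if msg.receive_count >= self.max_receive_count:
                # Threshold exceeded: move to the DLQ instead of delivering again.
                self.messages.remove(msg)
                self.dead_letter_queue.append(msg)
                continue
            msg.receive_count += 1
            msg.invisible_until = now + self.visibility_timeout
            return msg
        return None

    def delete(self, msg):
        """Consumer signals successful processing."""
        self.messages.remove(msg)
```

A consumer that never calls `delete` sees the message come back after each timeout until the threshold is reached, at which point it lands in `dead_letter_queue`.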

AWS SQS - Long Polling

When a consumer requests messages from the queue, it can optionally “wait” for messages to arrive if there are none in the queue
This is called Long Polling
Long Polling decreases the number of API calls made to SQS while increasing the efficiency and reducing the latency of your application.
The wait time can be between 1 sec to 20 sec (20 sec preferable)
Long Polling is preferable to Short Polling
Long polling can be enabled at the queue level or at the API level using WaitTimeSeconds

AWS SQS – FIFO Queue

Newer offering (First In - First out) – not available in all regions!
Name of the queue must end in .fifo
Lower throughput (up to 3,000 per second with batching, 300/s without)
Messages are processed in order by the consumer
Messages are sent exactly once
No per message delay (only per queue delay)
Ability to do content based de-duplication
5-minute interval de-duplication using the “Deduplication ID” (MessageDeduplicationId)
Message Groups:
- Possibility to group messages for FIFO ordering using the “Message Group ID” (MessageGroupId)
- Only one worker can be assigned per message group so that messages are processed in order
- Message group is just an extra tag on the message!
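
The 5-minute de-duplication window can be modelled in a few lines. This is a toy sketch, not SQS itself: the class name and the injected `now` clock are assumptions, and content-based de-duplication is represented by a SHA-256 hash of the message body.

```python
import hashlib

DEDUP_WINDOW_SECONDS = 5 * 60  # the 5-minute interval from the notes


class ToyFifoDeduplicator:
    """Accept a message only if its dedup id was not seen in the last 5 minutes."""

    def __init__(self):
        self._seen = {}  # dedup id -> time the message was last accepted

    def accept(self, body, now, dedup_id=None):
        if dedup_id is None:
            # Content-based de-duplication: derive the id from the body.
            dedup_id = hashlib.sha256(body.encode("utf-8")).hexdigest()
        first_seen = self._seen.get(dedup_id)
        if first_seen is not None and now - first_seen < DEDUP_WINDOW_SECONDS:
            return False  # duplicate inside the window: dropped
        self._seen[dedup_id] = now
        return True
```

Sending the same body twice within the window results in a single accepted message; once the window has passed, the same body is accepted again.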

AWS SNS

The “event producer” only sends messages to one SNS topic
As many “event receivers” (subscriptions) as we want to listen to the SNS topic notifications
Each subscriber to the topic will get all the messages (note: new feature to filter messages)
Up to 10,000,000 subscriptions per topic
100,000 topics limit
Subscribers can be:
- SQS
- HTTP / HTTPS (with delivery retries – how many times)
- Lambda
- Emails
- SMS messages
- Mobile Notifications

Some services can send data directly to SNS for notifications
CloudWatch (for alarms)
Auto Scaling Groups notifications
Amazon S3 (on bucket events)
CloudFormation (upon state changes => failed to build, etc)
Etc…

AWS SNS – How to publish

Topic Publish (within your AWS Server – using the SDK)
- Create a topic
- Create a subscription (or many)
- Publish to the topic
Direct Publish (for mobile apps SDK)
- Create a platform application
- Create a platform endpoint
- Publish to the platform endpoint
- Works with Google GCM, Apple APNS, Amazon ADM…

Push once in SNS, receive in many SQS
Fully decoupled
No data loss
Ability to add receivers of data later
SQS allows for delayed processing
SQS allows for retries of work
May have many workers on one queue and one worker on the other queue
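
The fan-out idea can be illustrated with a small in-memory simulation. `TopicSketch` and the queue names are hypothetical stand-ins for an SNS topic and SQS queues; no AWS call is made.

```python
from collections import deque


class TopicSketch:
    """In-memory stand-in for an SNS topic with SQS queue subscribers."""

    def __init__(self) -> None:
        self.subscribers = []

    def subscribe(self, queue: deque) -> None:
        self.subscribers.append(queue)

    def publish(self, message: str) -> None:
        # Push once to the topic; every subscribed queue receives its own copy.
        for queue in self.subscribers:
            queue.append(message)


orders_topic = TopicSketch()
fraud_queue, shipping_queue = deque(), deque()
orders_topic.subscribe(fraud_queue)
orders_topic.subscribe(shipping_queue)
orders_topic.publish("order-created:42")

# A receiver added later only sees messages published after it subscribed.
analytics_queue = deque()
orders_topic.subscribe(analytics_queue)
orders_topic.publish("order-created:43")

print(list(fraud_queue), list(analytics_queue))
```

Each queue can then be drained by its own set of workers at its own pace, which is what keeps the producer and the consumers decoupled.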

AWS Kinesis Overview

Kinesis is a managed alternative to Apache Kafka.
Great for application logs, metrics, IoT, clickstreams
Great for “real-time” big data
Great for streaming processing frameworks (Spark, NiFi, etc…)
Data is automatically replicated to 3 AZ
Kinesis Streams: low latency streaming ingest at scale
Kinesis Analytics: perform real-time analytics on streams using SQL
Kinesis Firehose: load streams into S3, Redshift, ElasticSearch…

Kinesis Streams Shards

One stream is made of many different shards
1MB/s or 1000 messages/s at write PER SHARD
2MB/s at read PER SHARD
Billing is per shard provisioned, can have as many shards as you want
Batching available or per message calls.
The number of shards can evolve over time (reshard / merge)
Records are ordered per shard

SQS:

Consumer “pull data”
Data is deleted after being consumed
Can have as many workers (consumers) as we want
No need to provision throughput
No ordering guarantee (except FIFO queues)
Individual message delay capability

SNS:

Push data to many subscribers
Up to 10,000,000 subscribers
Data is not persisted (lost if not delivered)
Pub/Sub
Up to 100,000 topics
No need to provision throughput
Integrates with SQS for fan-out architecture pattern

Kinesis:

Consumers “pull data”
As many consumers as we want
Possibility to replay data
Meant for real-time big data, analytics and ETL
Ordering at the shard level
Data expires after X days
Must provision throughput

Amazon MQ

SQS, SNS are “cloud-native” services, and they’re using proprietary protocols from AWS.
Traditional applications running from on-premise may use open protocols such as: MQTT, AMQP, STOMP, Openwire, WSS
When migrating to the cloud, instead of re-engineering the application to use SQS and SNS, we can use Amazon MQ
Amazon MQ = managed Apache ActiveMQ
Amazon MQ doesn’t “scale” as much as SQS / SNS
Amazon MQ runs on a dedicated machine, can run in HA with failover
Amazon MQ has both queue features (~SQS) and topic features (~SNS)

Messaging and Integration Quiz

Question 1: You are preparing for the biggest day of sale of the year, where your traffic will increase by 100x. You have already setup SQS standard queue. What should you do? A: Do nothing, SQS scales automatically

Question 2: You would like messages to be processed by SQS consumers only after 5 minutes of being published to SQS. What should you do?

Question 4: Your SQS costs are extremely high. Upon closer look, you notice that your consumers are polling SQS too often and getting empty data as a result. What should you do? A: Enable Long Polling

Long polling helps reduce the cost of using Amazon SQS by reducing the number of empty responses (when there are no messages available for a ReceiveMessage request) and false empty responses (when messages are available but aren’t included in a response)

Question 5: You’d like your messages to be processed exactly once and in order. Which do you need? A: SQS FIFO Queue

FIFO (First-In-First-Out) queues are designed to enhance messaging between applications when the order of operations and events is critical, or where duplicates can’t be tolerated. FIFO queues also provide exactly-once processing but have a limited number of transactions per second (TPS).

Question 6: You’d like to send a message to 3 different applications all using SQS. You should A: Use SNS + SQS Fan Out pattern

This is a common pattern, as only one message is sent to SNS and then “fanned out” to multiple SQS queues

Question 7: You have a Kinesis stream usually receiving 5MB/s of data and sending out 8 MB/s of data. You have provisioned 6 shards. Some days, your traffic spikes up to 2 times and you get a throughput exception. You should A: Add more shards

Each shard allows for 1MB/s incoming and 2MB/s outgoing of data
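
The arithmetic behind Question 7 can be checked with a short sketch. It uses only the per-shard limits quoted above (1 MB/s in, 2 MB/s out); the function name `shards_needed` is made up for this example.

```python
import math

WRITE_MB_PER_SHARD = 1.0  # incoming limit per shard
READ_MB_PER_SHARD = 2.0   # outgoing limit per shard


def shards_needed(write_mb_per_s: float, read_mb_per_s: float) -> int:
    """Smallest shard count that satisfies both the write and the read limit."""
    return max(
        math.ceil(write_mb_per_s / WRITE_MB_PER_SHARD),
        math.ceil(read_mb_per_s / READ_MB_PER_SHARD),
    )


normal_day = shards_needed(5, 8)   # usual traffic from the question
spike_day = shards_needed(10, 16)  # traffic doubled

print(normal_day, spike_day)
```

With 6 shards provisioned, a normal day fits, but the doubled traffic needs more than 6 shards, hence the throughput exception.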

Question 8: You are sending a clickstream for your users navigating your website, all the way to Kinesis. It seems that the users data is not ordered in Kinesis, and the data for one individual user is spread across many shards. How to fix that problem? A: You should use a partition key that represents the identity of the user

By providing a partition key we ensure the data is ordered for our users
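
Why a user-based partition key keeps one user's records together can be sketched as follows. Kinesis maps the partition key to a 128-bit integer with MD5 and picks the shard whose hash key range contains it. The assumption here is that the shards split the hash space evenly; `shard_for` is a hypothetical helper, not an AWS API.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for(partition_key: str, shard_count: int) -> int:
    """Index of the shard that would receive this key, assuming evenly split ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hashed = int.from_bytes(digest, "big")
    return hashed * shard_count // HASH_SPACE


# Every click from the same user lands on the same shard, so it stays ordered.
clicks = [("user-123", "/home"), ("user-456", "/cart"), ("user-123", "/checkout")]
for user_id, page in clicks:
    print(user_id, page, "-> shard", shard_for(user_id, shard_count=6))
```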

Question 9: We’d like to perform real time analytics on streams of data. The most appropriate product will be A: Kinesis

Kinesis Analytics is the product to use, with Kinesis Streams as the underlying source of data

Question 10: We’d like for our big data to be loaded near real time to S3 or Redshift. We’d like to convert the data along the way. What should we use? A: Kinesis Streams + Kinesis Firehose

This is a perfect combo of technology for loading data near real-time in S3 and Redshift

Question 11: You want to send email notifications to your users. You should use A: SNS

Has that feature by default

Question 12: You have many microservices running on-premise and they currently communicate using a message broker that supports the MQTT protocol. You would like to migrate these applications and the message broker to the cloud without changing the application logic. Which technology allows you to get a managed message broker that supports the MQTT protocol? A: Amazon MQ

Supports JMS, NMS, AMQP, STOMP, MQTT, and WebSocket

Section 15: Serverless Overviews from a Solution Architect Perspective

What’s serverless?

Serverless is a new paradigm in which the developers don’t have to manage servers anymore…
They just deploy code
They just deploy… functions !
Initially… Serverless == FaaS (Function as a Service)
Serverless was pioneered by AWS Lambda but now also includes anything that’s managed: “databases, messaging, storage, etc.”
Serverless does not mean there are no servers… it means you just don’t manage / provision / see them

Serverless in AWS

AWS Lambda
DynamoDB
AWS Cognito
AWS API Gateway
Amazon S3
AWS SNS & SQS
AWS Kinesis Data Firehose
Aurora Serverless
Step Functions
Fargate

Why AWS Lambda

Amazon EC2

Virtual Servers in the Cloud
Limited by RAM and CPU
Continuously running
Scaling means intervention to add / remove servers

AWS Lambda

Virtual functions – no servers to manage!
Limited by time - short executions
Run on-demand
Scaling is automated!

Benefits of AWS Lambda

Easy Pricing:
- Pay per request and compute time
- Free tier of 1,000,000 AWS Lambda requests and 400,000 GB-seconds of compute time
Integrated with the whole AWS suite of services
Integrated with many programming languages
Easy monitoring through AWS CloudWatch
Easy to get more resources per function (up to 3GB of RAM!)
Increasing RAM will also improve CPU and network!

AWS Lambda language support

Node.js (JavaScript)
Python
Java (Java 8 compatible)
C# (.NET Core)
Golang
PowerShell
Ruby
Custom Runtime API (community supported, example Rust)
Important: Docker is not for AWS Lambda, it’s for ECS / Fargate

AWS Lambda Pricing: example

You can find overall pricing information here: https://aws.amazon.com/lambda/pricing/
Pay per calls:
- First 1,000,000 requests are free
- $0.20 per 1 million requests thereafter ($0.0000002 per request)
Pay per duration: (in increments of 1 ms)
- 400,000 GB-seconds of compute time per month is FREE
- == 400,000 seconds if function is 1GB RAM
- == 3,200,000 seconds if function is 128 MB RAM
- After that $1.00 for 600,000 GB-seconds
It is usually very cheap to run AWS Lambda so it’s very popular.
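
The pricing rules above can be turned into a small estimator. This is a sketch using only the figures quoted in this section; it ignores rounding to the billing increment, and `monthly_lambda_cost` is a made-up name.

```python
FREE_REQUESTS = 1_000_000
FREE_GB_SECONDS = 400_000
PRICE_PER_MILLION_REQUESTS = 0.20
PRICE_PER_GB_SECOND = 1.00 / 600_000


def monthly_lambda_cost(requests: int, avg_duration_s: float, memory_mb: int) -> float:
    """Estimated monthly bill in USD after the free tier."""
    gb_seconds = requests * avg_duration_s * (memory_mb / 1024)
    billable_requests = max(0, requests - FREE_REQUESTS)
    billable_gb_seconds = max(0.0, gb_seconds - FREE_GB_SECONDS)
    request_cost = billable_requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    compute_cost = billable_gb_seconds * PRICE_PER_GB_SECOND
    return request_cost + compute_cost


# 1M one-second invocations at 128 MB stay inside the free tier.
print(monthly_lambda_cost(1_000_000, 1.0, 128))
# 5M one-second invocations at 1 GB are billed for 4M requests and 4.6M GB-seconds.
print(round(monthly_lambda_cost(5_000_000, 1.0, 1024), 2))
```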

AWS Lambda Limits to Know - per region

Execution:
- Memory allocation: 128 MB – 3008 MB (64 MB increments)
- Maximum execution time: 900 seconds (15 minutes)
- Environment variables (4 KB)
- Disk capacity in the “function container” (in /tmp): 512 MB by default (configurable up to 10 GB)
- Concurrency executions: 1000 (can be increased)
Deployment:
- Lambda function deployment size (compressed .zip): 50 MB
- Size of uncompressed deployment (code + dependencies): 250 MB
- Can use the /tmp directory to load other files at startup
- Size of environment variables: 4 KB

Lambda@Edge

You have deployed a CDN using CloudFront
What if you wanted to run a global AWS Lambda alongside?
Or how to implement request filtering before reaching your application?
For this, you can use Lambda@Edge: deploy Lambda functions alongside your CloudFront CDN
- Build more responsive applications
- You don’t manage servers, Lambda is deployed globally
- Customize the CDN content
- Pay only for what you use

Lambda@Edge

You can use Lambda to change CloudFront requests and responses:
- After CloudFront receives a request from a viewer (viewer request)
- Before CloudFront forwards the request to the origin (origin request)
- After CloudFront receives the response from the origin (origin response)
- Before CloudFront forwards the response to the viewer (viewer response)
You can also generate responses to viewers without ever sending the request to the origin.
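
A minimal sketch of a viewer-request function that answers directly from the edge, without contacting the origin. The blocked path prefix is a hypothetical rule chosen for illustration; the event and response shapes follow the CloudFront Lambda@Edge event structure.

```python
BLOCKED_PREFIX = "/admin"  # hypothetical rule, for illustration only


def handler(event, context):
    # CloudFront passes the viewer request inside the first record.
    request = event["Records"][0]["cf"]["request"]

    if request["uri"].startswith(BLOCKED_PREFIX):
        # Returning a response object short-circuits the request:
        # CloudFront sends it to the viewer and never calls the origin.
        return {
            "status": "403",
            "statusDescription": "Forbidden",
            "headers": {
                "content-type": [{"key": "Content-Type", "value": "text/plain"}],
            },
            "body": "Forbidden",
        }

    # Returning the request lets CloudFront continue processing as usual.
    return request
```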

Lambda@Edge: Use Cases

Website Security and Privacy
Dynamic Web Application at the Edge
Search Engine Optimization (SEO)
Intelligently Route Across Origins and Data Centers
Bot Mitigation at the Edge
Real-time Image Transformation
A/B Testing
User Authentication and Authorization
User Prioritization
User Tracking and Analytics

AWS API Gateway

AWS Lambda + API Gateway: No infrastructure to manage
Support for the WebSocket Protocol
Handle API versioning (v1, v2…)
Handle different environments (dev, test, prod…)
Handle security (Authentication and Authorization)
Create API keys, handle request throttling
Swagger / Open API import to quickly define APIs
Transform and validate requests and responses
Generate SDK and API specifications
Cache API responses

API Gateway – Integrations High Level

Lambda Function
- Invoke Lambda function
- Easy way to expose REST API backed by AWS Lambda
HTTP
- Expose HTTP endpoints in the backend
- Example: internal HTTP API on premise, Application Load Balancer…
- Why? Add rate limiting, caching, user authentications, API keys, etc…
AWS Service
- Expose any AWS API through the API Gateway
- Example: start an AWS Step Function workflow, post a message to SQS
- Why? Add authentication, deploy publicly, rate control…
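
For the Lambda integration, a minimal handler might look like the sketch below. It assumes the Lambda proxy integration, where API Gateway passes the HTTP request in `event` and expects a dictionary with `statusCode`, `headers`, and `body`; the `name` query parameter is made up for the example.

```python
import json


def handler(event, context):
    # queryStringParameters is None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```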

API Gateway - Endpoint Types

Edge-Optimized (default): For global clients
- Requests are routed through the CloudFront Edge locations (improves latency)
- The API Gateway still lives in only one region
Regional:
- For clients within the same region
- Could manually combine with CloudFront (more control over the caching strategies and the distribution)
Private:
- Can only be accessed from your VPC using an interface VPC endpoint (ENI)
- Use a resource policy to define access

API Gateway – Security

IAM Permissions

Create an IAM policy authorization and attach to User / Role
API Gateway verifies IAM permissions passed by the calling application
Good to provide access within your own infrastructure
Leverages “Sig v4” capability where IAM credentials are in headers

API Gateway – Security

Lambda Authorizer (formerly Custom Authorizers)

Uses AWS Lambda to validate the token in header being passed
Option to cache result of authentication
Helps to use OAuth / SAML / 3rd party type of authentication
Lambda must return an IAM policy for the user
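
A sketch of a token-based Lambda authorizer returning an IAM policy. The `VALID_TOKENS` lookup is a hypothetical placeholder for real token validation (OAuth, SAML, or a third-party check); the `authorizationToken` and `methodArn` fields and the response shape follow the token authorizer format.

```python
# Hypothetical token store; a real authorizer would verify a signed token instead.
VALID_TOKENS = {"allow-me": "user-123"}


def handler(event, context):
    token = event.get("authorizationToken", "")
    principal = VALID_TOKENS.get(token)
    effect = "Allow" if principal else "Deny"

    # API Gateway evaluates this policy to decide whether the call may proceed.
    return {
        "principalId": principal or "anonymous",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": effect,
                    "Resource": event["methodArn"],
                }
            ],
        },
    }
```

Because the returned policy can be cached, the Lambda function does not have to run on every request.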

API Gateway – Security

Cognito User Pools

Cognito fully manages user lifecycle
API gateway verifies identity automatically from AWS Cognito
No custom implementation required
Cognito only helps with authentication, not authorization

API Gateway – Security – Summary

IAM:
- Great for users / roles already within your AWS account
- Handle authentication + authorization
- Leverages Sig v4
Custom Authorizer:
- Great for 3rd party tokens
- Very flexible in terms of what IAM policy is returned
- Handle Authentication + Authorization
- Pay per Lambda invocation
Cognito User Pool:
- You manage your own user pool (can be backed by Facebook, Google login etc…)
- No need to write any custom code
- Must implement authorization in the backend

AWS Cognito – Federated Identity Pools

Goal:
- Provide direct access to AWS Resources from the Client Side
How:
- Log in to federated identity provider – or remain anonymous
- Get temporary AWS credentials back from the Federated Identity Pool
- These credentials come with a pre-defined IAM policy stating their permissions
Example:
- provide (temporary) access to write to S3 bucket using Facebook Login

AWS SAM - Serverless Application Model

SAM = Serverless Application Model
Framework for developing and deploying serverless applications
All the configuration is YAML code
- Lambda Functions
- DynamoDB tables
- API Gateway
- Cognito User Pools
SAM can help you to run Lambda, API Gateway, DynamoDB locally
SAM can use CodeDeploy to deploy Lambda functions

Serverless Quiz

Question 1: You have a Lambda function that will process data for 25 minutes before successfully completing. The code is working fine in your machine, but in AWS Lambda it just fails with a “timeout” issue after 3 seconds. What should you do? A: Run your code somewhere else than Lambda - the maximum timeout is 15 minutes

Question 2: You’d like to have a dynamic DB_URL variable loaded in your Lambda code A: Place it in the environment variables

Question 3: We have to provision the instance type for our DynamoDB database A: False

DynamoDB is a serverless service and as such we don’t provision an instance type for our database. We just say how much RCU and WCU we require for our table (or auto scaling)

Question 4: A DynamoDB table has been provisioned with 10 RCU and 10 WCU. You would like to increase the RCU to sustain more read traffic. What is true about RCU and WCU? A: RCU and WCU are decoupled, so WCU can stay the same

Question 5: You are about to enter the Christmas sale and you know a few items in your website are very popular and will be read often. Last year you had a ProvisionedThroughputExceededException. What should you do this year? A: Create a DAX cluster

A DynamoDB Accelerator (DAX) cluster is a cache that fronts your DynamoDB tables and caches the most frequently read values. They help offload the heavy reads on hot keys off of DynamoDB itself, hence preventing the ProvisionedThroughputExceededException

Question 6: You would like to automate sending welcome emails to the users who subscribe to the Users table in DynamoDB. How can you achieve that? A: Enable DynamoDB Streams and have the Lambda function receive the events in real-time

Question 7: To make a serverless API, I should integrate API Gateway with A: Lambda

Question 8: You would like to provide a Facebook login before your users call your API hosted by API Gateway. You need seamless authentication integration. You will use A: Cognito User Pools

Cognito User Pools integrate directly with Facebook Login

Question 9: [SAA-C02] Your production application is leveraging DynamoDB as its backend and is experiencing smooth sustained usage. There is a need to make the application run in development as well, where it will experience unpredictable, sometimes high, sometimes low volume of requests. You would like to make sure you optimize for cost. What do you recommend? A: Provision WCU & RCU and enable auto-scaling for production, and use on-demand capacity for development

Serverless Architectures

Mobile application: MyTodoList

We want to create a mobile application with the following requirements
Expose as REST API with HTTPS
Serverless architecture
Users should be able to directly interact with their own folder in S3
Users should authenticate through a managed serverless service
The users can write and read to-dos, but they mostly read them
The database should scale, and have some high read throughput

[*] Store files on S3 from mobile, use Amazon Cognito to generate temp credentials via AWS STS.

[*] Improve high read throughput for static data: use a DAX caching layer and/or caching of responses at the API Gateway level

In this lecture

Serverless REST API: HTTPS, API Gateway, Lambda, DynamoDB
Using Cognito to generate temporary credentials with STS to access S3 bucket with restricted policy. App users can directly access AWS resources this way. Pattern can be applied to DynamoDB, Lambda…
Caching the reads on DynamoDB using DAX
Caching the REST requests at the API Gateway level
Security for authentication and authorization with Cognito, STS

Serverless hosted website: MyBlog.com

This website should scale globally
Blogs are rarely written, but often read
Some of the website is purely static files, the rest is a dynamic REST API
Caching must be implemented where possible
Any new user who subscribes should receive a welcome email
Any photo uploaded to the blog should have a thumbnail generated

[*] Provide security with CloudFront: OAI (Origin Access Identity) + bucket policy

client <--> CloudFront <--(OAI)--> S3 (bucket policy only authorizes requests from the OAI)

AWS Hosted Website Summary

We’ve seen static content being distributed using CloudFront with S3
The REST API was serverless, didn’t need Cognito because public
We leveraged a Global DynamoDB table to serve the data globally
(we could have used Aurora Global Database)
We enabled DynamoDB streams to trigger a Lambda function
The lambda function had an IAM role which could use SES
SES (Simple Email Service) was used to send emails in a serverless way
S3 can trigger SQS / SNS / Lambda to notify of events
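
The DynamoDB Streams to Lambda step can be sketched as below. The `email` attribute name is an assumption about the Users table, and `send_email` is a stand-in for the SES call, injected so the sketch runs without AWS access.

```python
def new_user_emails(event: dict) -> list:
    """Extract the email of every newly inserted item from a DynamoDB Streams event."""
    emails = []
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue  # ignore updates (MODIFY) and deletions (REMOVE)
        new_image = record["dynamodb"].get("NewImage", {})
        # Stream images use DynamoDB's typed format, e.g. {"S": "a@b.com"}.
        email = new_image.get("email", {}).get("S")
        if email:
            emails.append(email)
    return emails


def handler(event, context, send_email=print):
    # send_email is a placeholder; in the architecture above it would call SES.
    for email in new_user_emails(event):
        send_email(email)
```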

Micro Services architecture

We want to switch to a micro service architecture
Many services interact with each other directly using a REST API
Each architecture for each micro service may vary in form and shape
We want a micro-service architecture so we can have a leaner development lifecycle for each service

Discussions on Micro Services

You are free to design each micro-service the way you want
Synchronous patterns: API Gateway, Load Balancers
Asynchronous patterns: SQS, Kinesis, SNS, Lambda triggers (S3)
Challenges with micro-services:
- repeated overhead for creating each new microservice,
- issues with optimizing server density/utilization
- complexity of running multiple versions of multiple microservices simultaneously
- proliferation of client-side code requirements to integrate with many separate services.
Some of the challenges are solved by Serverless patterns:
- API Gateway, Lambda scale automatically and you pay per usage
- You can easily clone API, reproduce environments
- Generated client SDK through Swagger integration for the API Gateway

Distributing paid content

We sell videos online and users have to pay to buy videos
Each video can be bought by many different customers
We only want to distribute videos to users who are premium users
We have a database of premium users
Links we send to premium users should be short lived
Our application is global
We want to be fully serverless

Premium User Video service

We have implemented a fully serverless solution:
- Cognito for authentication
- DynamoDB for storing users that are premium
- 2 serverless applications
  - Premium User registration
  - CloudFront Signed URL generator
- Content is stored in S3 (serverless and scalable)
- Integrated with CloudFront with OAI for security (users can’t bypass)
- CloudFront can only be used using Signed URLs to prevent unauthorized users
- What about S3 Signed URL? They’re not efficient for global access

Software updates offloading

We have an application running on EC2, that distributes software updates once in a while
When a new software update is out, we get a lot of requests and the content is distributed en masse over the network. It’s very costly
We don’t want to change our application, but want to optimize our cost and CPU, how can we do it?

Why CloudFront?

No changes to architecture
Will cache software update files at the edge
Software update files are not dynamic, they’re static (never changing)
Our EC2 instances aren’t serverless
But CloudFront is, and will scale for us
Our ASG will not scale as much, and we’ll save tremendously in EC2
We’ll also save in availability, network bandwidth cost, etc
Easy way to make an existing application more scalable and cheaper!

Big Data Ingestion Pipeline

We want the ingestion pipeline to be fully serverless
We want to collect data in real time
We want to transform the data
We want to query the transformed data using SQL
The reports created using the queries should be in S3
We want to load that data into a warehouse and create dashboards

Serverless Architectures Quiz

Question 1: As a solutions architect, you have been tasked to implement a fully Serverless REST API. Which technology choices do you recommend? A: API Gateway + AWS Lambda

Question 2: Which technology does not have an out of the box caching feature? A: Lambda

Lambda does not have an out of the box caching feature (it’s often paired with API gateway for that)

Question 3: Which service allows to federate mobile users and generate temporary credentials so that they can access their own S3 bucket sub-folder? A: Cognito

in combination with STS

Question 4: You would like to distribute your static content which currently lives in Amazon S3 to multiple regions around the world, such as the US, France and Australia. What do you recommend? A: CloudFront

This is a perfect use case for CloudFront

Question 5: You have hosted a DynamoDB table in ap-northeast-1 and would like to make it available in eu-west-1. What must be enabled first to create a DynamoDB Global Table? A: DynamoDB Streams.

Streams enable DynamoDB to get a changelog and use that changelog to replicate data across regions

Question 6: A Lambda function is triggered by a DynamoDB stream and is meant to insert data into SQS for further long processing jobs. The Lambda function does seem able to read from the DynamoDB stream but isn’t able to store messages in SQS. What’s the problem? A: The Lambda IAM role is missing permissions

Question 7: You would like to create a micro service whose sole purpose is to encode video files with your specific algorithm from S3 back into S3. You would like to make that micro-service reliable and retry upon failure. Processing a video may take over 25 minutes. The service is asynchronous and it should be possible for the service to be stopped for a day and resume the next day from the videos that haven’t been encoded yet. Which of the following services would you recommend to implement this service? A: SQS + EC2

SQS allows you to retain messages for days and process them later, while we take down our EC2 instances

Question 8: You would like to distribute paid software installation files globally for your customers that have indeed purchased the content. The software may be purchased by different users, and you want to protect the download URL with security including IP restriction. Which solution do you recommend? A: CloudFront Signed URL

This will have security including IP restriction

Question 9: You are a photo hosting service and publish every month a master pack of beautiful mountains images, that are over 50 GB in size and downloaded from all around the world. The content is currently hosted on EFS and distributed by ELB and EC2 instances. You are experiencing high load each month and very high network costs. What can you recommend that won’t force an application refactor and reduce network costs and EC2 load dramatically? A: Create a CloudFront distribution

CloudFront can be used in front of an ELB

Question 10: You would like to deliver big data streams in real time to multiple consuming applications, with replay features. Which technology do you recommend? A: Kinesis Data Streams

Databases

Choosing the Right Database

We have a lot of managed databases on AWS to choose from
Questions to choose the right database based on your architecture:
- Read-heavy, write-heavy, or balanced workload? Throughput needs? Will it change, does it need to scale or fluctuate during the day?
- How much data to store and for how long? Will it grow? Average object size? How are they accessed?
- Data durability? Source of truth for the data ?
- Latency requirements? Concurrent users?
- Data model? How will you query the data? Joins? Structured? Semi-Structured?
- Strong schema? More flexibility? Reporting? Search? RDBMS / NoSQL?
- License costs? Switch to Cloud Native DB such as Aurora?

Database Types

RDBMS (= SQL / OLTP): RDS, Aurora – great for joins
NoSQL database: DynamoDB (~JSON), ElastiCache (key / value pairs), Neptune (graphs) – no joins, no SQL
Object Store: S3 (for big objects) / Glacier (for backups / archives)
Data Warehouse (= SQL Analytics / BI): Redshift (OLAP), Athena
Search: ElasticSearch (JSON) – free text, unstructured searches
Graphs: Neptune – displays relationships between data

RDS Overview

Managed PostgreSQL / MySQL / Oracle / SQL Server
Must provision an EC2 instance & EBS Volume type and size
Support for Read Replicas and Multi AZ
Security through IAM, Security Groups, KMS, SSL in transit
Backup / Snapshot / Point in time restore feature
Managed and Scheduled maintenance
Monitoring through CloudWatch
Use case: Store relational datasets (RDBMS / OLTP), perform SQL queries, transactional inserts / update / delete is available

RDS for Solutions Architect

Operations: small downtime when failover happens, when maintenance happens, scaling in read replicas / ec2 instance / restore EBS implies manual intervention, application changes
Security: AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, authorizing users in DB, using SSL
Reliability: Multi AZ feature, failover in case of failures
Performance: depends on EC2 instance type, EBS volume type, ability to add Read Replicas. Doesn’t auto-scale
Cost: Pay per hour based on provisioned EC2 and EBS

Aurora Overview

Compatible API for PostgreSQL / MySQL
Data is held in 6 replicas, across 3 AZ
Auto healing capability
Multi AZ, Auto Scaling Read Replicas
Read Replicas can be Global
Aurora database can be Global for DR or latency purposes
Auto scaling of storage from 10GB to 128 TB
Define EC2 instance type for aurora instances
Same security / monitoring / maintenance features as RDS
“Aurora Serverless” option
Use case: same as RDS, but with less maintenance / more flexibility / more performance

Aurora for Solutions Architect

Operations: less operations, auto scaling storage
Security: AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, authorizing users in DB, using SSL
Reliability: Multi AZ, highly available, possibly more than RDS, Aurora Serverless option.
Performance: 5x performance (according to AWS) due to architectural optimizations. Up to 15 Read Replicas (only 5 for RDS)
Cost: Pay per hour based on EC2 and storage usage. Possibly lower costs compared to Enterprise grade databases such as Oracle

ElastiCache Overview

Managed Redis / Memcached (similar offering as RDS, but for caches)
In-memory data store, sub-millisecond latency
Must provision an EC2 instance type
Support for Clustering (Redis) and Multi AZ, Read Replicas (sharding)
Security through IAM, Security Groups, KMS, Redis Auth
Backup / Snapshot / Point in time restore feature
Managed and Scheduled maintenance
Monitoring through CloudWatch
Use Case: Key/Value store, Frequent reads, less writes, cache results for DB queries, store session data for websites, cannot use SQL.

ElastiCache for Solutions Architect

Operations: same as RDS
Security: AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, users (Redis Auth), using SSL
Reliability: Clustering, Multi AZ
Performance: Sub-millisecond performance, in memory, read replicas for sharding, very popular cache option
Cost: Pay per hour based on EC2 and storage usage

DynamoDB Overview

AWS proprietary technology, managed NoSQL database
Serverless, provisioned capacity, auto scaling, on demand capacity (Nov 2018)
Can replace ElastiCache as a key/value store (storing session data for example)
Highly Available, Multi AZ by default, Read and Writes are decoupled, DAX for read cache
Reads can be eventually consistent or strongly consistent
Security, authentication and authorization is done through IAM
DynamoDB Streams to integrate with AWS Lambda
Backup / Restore feature, Global Table feature
Monitoring through CloudWatch
Can only query on primary key, sort key, or indexes
Use Case: Serverless applications development (small documents 100s KB), distributed serverless cache, doesn’t have SQL query language available, has transactions capability from Nov 2018

DynamoDB for Solutions Architect

Operations: no operations needed, auto scaling capability, serverless
Security: full security through IAM policies, KMS encryption, SSL in flight
Reliability: Multi AZ, Backups
Performance: single digit millisecond performance, DAX for caching reads, performance doesn’t degrade if your application scales
Cost: Pay per provisioned capacity and storage usage (no need to guess in advance any capacity – can use auto scaling)

S3 Overview

S3 is a… key / value store for objects
Great for big objects, not so great for small objects
Serverless, scales infinitely, max object size is 5 TB
Strong read-after-write consistency for all operations, including overwrites and deletes (since December 2020)
Tiers: S3 Standard, S3 IA, S3 One Zone IA, Glacier for backups
Features: Versioning, Encryption, Cross Region Replication, etc…
Security: IAM, Bucket Policies, ACL
Encryption: SSE-S3, SSE-KMS, SSE-C, client side encryption, SSL in transit
Use Case: static files, key value store for big files, website hosting

S3 for Solutions Architect

Operations: no operations needed
Security: IAM, Bucket Policies, ACL, Encryption (Server/Client), SSL
Reliability: 99.999999999% durability / 99.99% availability, Multi AZ, CRR
Performance: scales to thousands of read / writes per second, transfer acceleration / multi-part for big files
Cost: pay per storage usage, network cost, requests number

Athena Overview

Fully Serverless database with SQL capabilities
Used to query data in S3
Pay per query
Output results back to S3
Secured through IAM
Use Case: one time SQL queries, serverless queries on S3, log analytics

Athena for Solutions Architect

Operations: no operations needed, serverless
Security: IAM + S3 security
Reliability: managed service, uses Presto engine, highly available
Performance: queries scale based on data size
Cost: pay per query / per TB of data scanned, serverless

Redshift Overview

Redshift is based on PostgreSQL, but it’s not used for OLTP
It’s OLAP – online analytical processing (analytics and data warehousing)
10x better performance than other data warehouses, scale to PBs of data
Columnar storage of data (instead of row based)
Massively Parallel Query Execution (MPP), highly available
Pay as you go based on the instances provisioned
Has a SQL interface for performing the queries
BI tools such as AWS Quicksight or Tableau integrate with it

Redshift Continued…

Data is loaded from S3, DynamoDB, DMS, other DBs…
From 1 node to 128 nodes, up to 160 GB of space per node
Leader node: for query planning, results aggregation
Compute node: for performing the queries, send results to leader
Redshift Spectrum: perform queries directly against S3 (no need to load)
Backup & Restore, Security VPC / IAM / KMS, Monitoring
Redshift Enhanced VPC Routing: COPY / UNLOAD goes through VPC

Redshift – Snapshots & DR

Snapshots are point-in-time backups of a cluster, stored internally in S3
Snapshots are incremental (only what has changed is saved)
You can restore a snapshot into a new cluster
Automated: every 8 hours, every 5 GB, or on a schedule. Set retention
Manual: snapshot is retained until you delete it
You can configure Amazon Redshift to automatically copy snapshots (automated or manual) of a cluster to another AWS Region

Redshift Spectrum

Query data that is already in S3 without loading it
Must have a Redshift cluster available to start the query
The query is then submitted to thousands of Redshift Spectrum nodes

https://aws.amazon.com/blogs/big-data/amazon-redshift-spectrum-extends-data-warehousing-out-to-exabytes-no-loading-required

Redshift for Solutions Architect

Operations: similar to RDS
Security: IAM, VPC, KMS, SSL (similar to RDS)
Reliability: highly available, auto healing features
Performance: 10x performance vs other data warehousing, compression
Cost: pay per node provisioned, 1/10th of the cost vs other warehouses
Remember: Redshift = Analytics / BI / Data Warehouse

Neptune

Fully managed graph database
When do we use Graphs?
- High relationship data
- Social Networking: Users friends with Users, replied to comment on post of user and likes other comments.
- Knowledge graphs (Wikipedia)
Highly available across 3 AZ, with up to 15 read replicas
Point-in-time recovery, continuous backup to Amazon S3
Support for KMS encryption at rest + HTTPS

Neptune for Solutions Architect

Operations: similar to RDS
Security: IAM, VPC, KMS, SSL (similar to RDS) + IAM Authentication
Reliability: Multi-AZ, clustering
Performance: best suited for graphs, clustering to improve performance
Cost: pay per node provisioned (similar to RDS)
Remember: Neptune = Graphs

ElasticSearch

Example: In DynamoDB, you can only find by primary key or indexes.
With ElasticSearch, you can search any field, even with partial matches
It’s common to use ElasticSearch as a complement to another database
ElasticSearch also has some usage for Big Data applications
You can provision a cluster of instances
Built-in integrations: Amazon Kinesis Data Firehose, AWS IoT, and Amazon CloudWatch Logs for data ingestion
Security through Cognito & IAM, KMS encryption, SSL & VPC
Comes with Kibana (visualization) & Logstash (log ingestion) – ELK stack

ElasticSearch for Solutions Architect

Operations: similar to RDS
Security: Cognito, IAM, VPC, KMS, SSL
Reliability: Multi-AZ, clustering
Performance: based on ElasticSearch project (open source), petabyte scale
Cost: pay per node provisioned (similar to RDS)
Remember: ElasticSearch = Search / Indexing

Databases Quiz

Question 1: Which database helps you store data in a relational format, with SQL language compatibility and capability of processing transactions? A: RDS

Question 2: Which database do you suggest to have caching capability with a Redis compatible API? A: ElastiCache

ElastiCache can create a Redis cache or a Memcached cache

Question 3: You are looking to perform OLTP, and would like to have the underlying storage with the maximum amount of replication and auto-scaling capability. What do you recommend? A: Aurora

Question 4: As a solution architect, you plan on creating a social media website where users can be friends with each other, and like each other’s posts. You plan on performing some complicated queries such as “What are the number of likes on the posts that have been posted by the friends of Mike?”. What database do you suggest? A: Neptune

This is AWS’ managed graph database

Question 5: You would like to store big objects of 100 MB into a reliable and durable Key Value store. What do you recommend? A: S3

S3 is indeed a key value store! (where the key is the full path of the object in the bucket)

Question 6: You would like to have a database which is efficient at performing analytical queries on large sets of columnar data. You would like to connect that Data Warehouse to a reporting and dashboard tool such as Amazon Quicksight. Which technology do you recommend? A: Redshift

Question 7: Your log data is currently stored in S3 and you would like to perform a quick analysis if possible serverless to filter the logs and find a user which may have completed an unauthorized action. Which technology do you recommend? A: Athena

Question 8: Your gaming website is currently running on top of DynamoDB. Users have been asking for a search feature to find other gamers by name, with partial matches if possible. Which technology do you recommend to implement that feature? A: ElasticSearch

Anytime you see “search”, think ElasticSearch

AWS Monitoring, Audit and Performance

AWS CloudWatch Metrics

CloudWatch provides metrics for every service in AWS
Metric is a variable to monitor (CPUUtilization, NetworkIn…)
Metrics belong to namespaces
Dimension is an attribute of a metric (instance id, environment, etc…).
Up to 30 dimensions per metric
Metrics have timestamps
Can create CloudWatch dashboards of metrics

AWS CloudWatch EC2 Detailed monitoring

EC2 instance metrics are collected “every 5 minutes” by default
With detailed monitoring (for a cost), you get data “every 1 minute”
Use detailed monitoring if you want your ASG to scale more promptly!
The AWS Free Tier allows us to have 10 detailed monitoring metrics
Note: EC2 Memory usage is by default not pushed (must be pushed from inside the instance as a custom metric)

AWS CloudWatch Custom Metrics

Possibility to define and send your own custom metrics to CloudWatch
Ability to use dimensions (attributes) to segment metrics
- Instance.id
- Environment.name
Metric resolution (StorageResolution API parameter – two possible values):
- Standard: 1 minute (60 seconds)
- High Resolution: 1 second – Higher cost
Use API call PutMetricData
Use exponential back off in case of throttle errors
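
The throttling advice above can be sketched as a small retry helper. This is a minimal sketch, not the SDK's own retry logic: `ThrottledError`, `send_with_backoff` and the `send` callable are hypothetical stand-ins for a real PutMetricData call.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error returned by the API."""


def send_with_backoff(send, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call send() and retry with exponential back off when it is throttled."""
    for attempt in range(max_attempts):
        try:
            return send()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # Wait base_delay * 2^attempt, plus a little random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 10))
```

The wait doubles after each throttled attempt, and the jitter keeps many clients from retrying at exactly the same moment.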

CloudWatch Dashboards

Great way to set up dashboards for quick access to key metrics
Dashboards are global
Dashboards can include graphs from different regions
You can change the time zone & time range of the dashboards
You can set up automatic refresh (10s, 1m, 2m, 5m, 15m)
Pricing:
3 dashboards (up to 50 metrics) for free
$3/dashboard/month afterwards

AWS CloudWatch Logs

Applications can send logs to CloudWatch using the SDK
CloudWatch can collect logs from:
- Elastic Beanstalk: collection of logs from application
- ECS: collection from containers
- AWS Lambda: collection from function logs
- VPC Flow Logs: VPC specific logs
- API Gateway
- CloudTrail based on filter
- CloudWatch log agents: for example on EC2 machines
- Route53: Log DNS queries
CloudWatch Logs can go to:
- Batch exporter to S3 for archival
- Stream to ElasticSearch cluster for further analytics

AWS CloudWatch Logs

Logs storage architecture:
- Log groups: arbitrary name, usually representing an application
- Log stream: instances within application / log files / containers
Can define log expiration policies (never expire, 30 days, etc..)
Using the AWS CLI we can tail CloudWatch logs
To send logs to CloudWatch, make sure IAM permissions are correct!
Security: encryption of logs using KMS at the Group Level

CloudWatch Logs Metric Filter & Insights

CloudWatch Logs can use filter expressions
- For example, find a specific IP inside of a log
- Metric filters can be used to trigger alarms
CloudWatch Logs Insights (new – Nov 2018) can be used to query logs and add queries to CloudWatch Dashboards

AWS CloudWatch Alarms

Alarms are used to trigger notifications for any metric
Alarms can go to Auto Scaling, EC2 Actions, SNS notifications
Various options (sampling, %, max, min, etc…)
Alarm States:
- OK
- INSUFFICIENT_DATA (Missing data points)
- ALARM
Period:
- Length of time in seconds to evaluate the metric
- High resolution custom metrics: can only choose 10 sec or 30 sec

AWS CloudWatch Events

Source + Rule => Target
Schedule: Cron jobs
Event Pattern: Event rules to react to a service doing something
- Ex: CodePipeline state changes!
Triggers to Lambda functions, SQS/SNS/Kinesis Messages
CloudWatch Event creates a small JSON document to give information about the change
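
As an illustration of "Source + Rule => Target", here is a sketch of an event pattern for failed CodePipeline executions, with a deliberately simplified matcher. The `matches` function is a hypothetical helper that only covers exact-value matching (the real service supports more operators), and the pipeline name is made up.

```python
import json

# Event pattern for a rule that reacts to failed CodePipeline executions.
EVENT_PATTERN = {
    "source": ["aws.codepipeline"],
    "detail-type": ["CodePipeline Pipeline Execution State Change"],
    "detail": {"state": ["FAILED"]},
}


def matches(pattern, event):
    """Simplified matcher: every pattern key must exist in the event, and the
    event value must be one of the listed values (nested dicts recurse)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


# A trimmed-down example of the JSON document delivered to the target.
sample_event = {
    "source": "aws.codepipeline",
    "detail-type": "CodePipeline Pipeline Execution State Change",
    "detail": {"pipeline": "my-pipeline", "state": "FAILED"},  # hypothetical name
}

print(json.dumps(EVENT_PATTERN, indent=2))
print(matches(EVENT_PATTERN, sample_event))
```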

AWS CloudTrail

Provides governance, compliance and audit for your AWS Account
CloudTrail is enabled by default!
Get a history of events / API calls made within your AWS Account by:
- Console
- SDK
- CLI
- AWS Services
Can put logs from CloudTrail into CloudWatch Logs
If a resource is deleted in AWS, look into CloudTrail first!

AWS Config

Helps with auditing and recording compliance of your AWS resources
Helps record configurations and changes over time
Possibility of storing the configuration data into S3 (analyzed by Athena)
Questions that can be solved by AWS Config:
- Is there unrestricted SSH access to my security groups?
- Do my buckets have any public access?
- How has my ALB configuration changed over time?
You can receive alerts (SNS notifications) for any changes
AWS Config is a per-region service
Can be aggregated across regions and accounts

AWS Config Resource - View compliance of a resource over time

AWS Config Rules

Can use AWS managed config rules (over 75)
Can make custom config rules (must be defined in AWS Lambda)
- Evaluate if each EBS disk is of type gp2
- Evaluate if each EC2 instance is t2.micro
Rules can be evaluated / triggered:
- For each config change
- And / or: at regular time intervals
- Can trigger CloudWatch Events if the rule is non-compliant (and chain with Lambda)
Rules can have auto remediations:
- If a resource is not compliant, you can trigger an auto remediation
- Ex: stop instances with non-approved tags
AWS Config Rules does not prevent actions from happening (no deny)
Pricing: no free tier, $2 per active rule per region per month
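
The "each EBS disk is of type gp2" example can be sketched as the evaluation logic a custom rule's Lambda function might run. This is only a sketch: the field names `resourceType` and `configuration.volumeType` are assumptions about the recorded configuration item, and a real function would still have to report the result back to AWS Config (PutEvaluations API).

```python
def evaluate_volume(configuration_item, required_type="gp2"):
    """Return the compliance type for one recorded configuration item."""
    if configuration_item.get("resourceType") != "AWS::EC2::Volume":
        return "NOT_APPLICABLE"  # the rule only applies to EBS volumes
    # Assumed location of the volume type inside the configuration item.
    volume_type = configuration_item.get("configuration", {}).get("volumeType")
    return "COMPLIANT" if volume_type == required_type else "NON_COMPLIANT"


item = {"resourceType": "AWS::EC2::Volume", "configuration": {"volumeType": "io1"}}
print(evaluate_volume(item))
```

Note that the rule only reports the io1 volume as non-compliant. It does not prevent the volume from being created.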

CloudWatch vs CloudTrail vs Config

CloudWatch
- Performance monitoring (metrics, CPU, network, etc…) & dashboards
- Events & Alerting
- Log Aggregation & Analysis
CloudTrail
- Record API calls made within your Account by everyone
- Can define trails for specific resources
- Global Service
Config
- Record configuration changes
- Evaluate resources against compliance rules
- Get timeline of changes and compliance

For an Elastic Load Balancer

CloudWatch:
- Monitoring Incoming connections metric
- Visualize error codes as a % over time
- Make a dashboard to get an idea of your load balancer performance
Config:
- Track security group rules for the Load Balancer
- Track configuration changes for the Load Balancer
- Ensure an SSL certificate is always assigned to the Load Balancer (compliance)
CloudTrail:
- Track who made any changes to the Load Balancer with API calls.

Monitoring Quiz

Question 1: We’d like to have CloudWatch Metrics for EC2 at a 1 minute rate. What should we do? A: Enable Detailed Monitoring

This is a paid offering and gives you EC2 metrics at a 1 minute rate

Question 2: High Resolution Custom Metrics can have a minimum resolution of A: 1 second

Question 3: Your CloudWatch alarm is triggered and controls an ASG. The alarm should trigger 1 instance being deleted from your ASG, but your ASG has already 2 instances running and the minimum capacity is 2. What will happen? A: The alarm will remain in “ALARM” state but never decrease the number of instances in my ASG

The number of instances in an ASG cannot go below the minimum, even if the alarm would in theory trigger an instance termination

Question 4: An Alarm on a High Resolution Metric can be triggered as often as A: 10 seconds

Question 5: You have made a configuration change and would like to evaluate the impact of it on the performance of your application. Which service do you use? A: CloudWatch

CloudWatch is used to monitor the applications performance / metrics

Question 6: Someone has terminated an EC2 instance in your account last week, which was hosting a critical database. You would like to understand who did it and when, how can you achieve that? A: Look at CloudTrail

CloudTrail helps audit the API calls made within your account, so the database deletion API call will appear here (regardless if made from the console, the CLI, or an SDK)

Question 7: You would like to ensure that over time, none of your EC2 instances expose the port 84 as it is known to have vulnerabilities with the OS you are using. What can you do to monitor this? A: Setup Config Rules

Question 8: You would like to evaluate the compliance of your resource’s configurations over time. Which technology do you choose? A: Config

AWS STS – Security Token Service

Allows you to grant limited and temporary access to AWS resources.
Token is valid for up to one hour (must be refreshed)
AssumeRole
- Within your own account: for enhanced security
- Cross Account Access: assume role in target account to perform actions there
AssumeRoleWithSAML
- return credentials for users logged with SAML
AssumeRoleWithWebIdentity
- return creds for users logged with an IdP (Facebook Login, Google Login, OIDC compatible…)
- AWS recommends against using this, and using Cognito instead
GetSessionToken
- for MFA, from a user or AWS account root user

Using STS to Assume a Role

Define an IAM Role within your account or cross-account
Define which principals can access this IAM Role
Use AWS STS (Security Token Service) to retrieve credentials and impersonate the IAM Role you have access to (AssumeRole API)
Temporary credentials can be valid between 15 minutes to 1 hour
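
The step "define which principals can access this IAM Role" is expressed as a trust policy on the role. Below is a minimal sketch of such a document for cross-account access. The helper name `trust_policy` and the account id `111122223333` are placeholders, not values from these notes.

```python
import json


def trust_policy(trusted_account_id):
    """Trust policy letting principals of another account call sts:AssumeRole."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


print(json.dumps(trust_policy("111122223333"), indent=2))
```

The caller in the trusted account still needs an IAM policy of its own that allows `sts:AssumeRole` on the target role.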

https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_common-scenarios_aws-accounts.html

Identity Federation in AWS

Federation lets users outside of AWS assume a temporary role for accessing AWS resources.
These users assume an access role provided by the identity provider.
Federations can have many flavors:
- SAML 2.0
- Custom Identity Broker
- Web Identity Federation with Amazon Cognito
- Web Identity Federation without Amazon Cognito
- Single Sign On
- Non-SAML with AWS Microsoft AD
Using federation, you don’t need to create IAM users (user management is outside of AWS)

SAML 2.0 Federation

To integrate Active Directory / ADFS with AWS (or any SAML 2.0)
Provides access to AWS Console or CLI (through temporary creds)
No need to create an IAM user for each of your employees

AWS Directory Services

AWS Managed Microsoft AD
- Create your own AD in AWS, manage users locally, supports MFA
- Establish “trust” connections with your on-premise AD
AD Connector
- Directory Gateway (proxy) to redirect to on-premise AD
- Users are managed on the on-premise AD
Simple AD
- AD-compatible managed directory on AWS
- Cannot be joined with on-premise AD

AWS Organizations

Global service
Allows you to manage multiple AWS accounts
The main account is the master account – you can’t change it
Other accounts are member accounts
Member accounts can only be part of one organization
Consolidated Billing across all accounts - single payment method
Pricing benefits from aggregated usage (volume discount for EC2, S3…)
API is available to automate AWS account creation

Multi Account Strategies

Create accounts per department, per cost center, per dev / test / prod, based on regulatory restrictions (using SCP), for better resource isolation (ex: VPC), to have separate per-account service limits, isolated account for logging
Multi Account vs One Account Multi VPC
Use tagging standards for billing purposes
Enable CloudTrail on all accounts, send logs to central S3 account
Send CloudWatch Logs to central logging account
Establish Cross Account Roles for Admin purposes

IAM Conditions

aws:SourceIp: restrict the client IP from which the API calls are being made

aws:RequestedRegion: restrict the region the API calls are made to

Restrict based on tags

Force MFA
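
The condition keys above could be combined in a single policy, sketched here as a Python dictionary. The IP range, region and EC2 actions are illustrative assumptions only. The MFA statement uses the `aws:MultiFactorAuthPresent` key.

```python
import json

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Deny every call that does not come from the given client IP range
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {"NotIpAddress": {"aws:SourceIp": ["203.0.113.0/24"]}},
        },
        {   # Deny EC2 calls made to any region other than eu-west-1
            "Effect": "Deny",
            "Action": "ec2:*",
            "Resource": "*",
            "Condition": {"StringNotEquals": {"aws:RequestedRegion": "eu-west-1"}},
        },
        {   # Allow stopping / terminating instances only when MFA was used
            "Effect": "Allow",
            "Action": ["ec2:StopInstances", "ec2:TerminateInstances"],
            "Resource": "*",
            "Condition": {"Bool": {"aws:MultiFactorAuthPresent": "true"}},
        },
    ],
}

print(json.dumps(policy, indent=2))
```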

IAM for S3

ListBucket permission applies to arn:aws:s3:::test
=> bucket level permission
GetObject, PutObject, DeleteObject applies to arn:aws:s3:::test/*
=> object level permission
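
The bucket-level versus object-level distinction can be sketched as a policy with two statements. The helper name `s3_policy` is hypothetical. The bucket name `test` is the one used above.

```python
import json


def s3_policy(bucket):
    """Identity policy with one bucket-level and one object-level statement."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # bucket level permission: the resource is the bucket itself
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {   # object level permission: the resource is every key in the bucket
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


print(json.dumps(s3_policy("test"), indent=2))
```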

Identity and Access Management (IAM) - Advanced - Quiz

Question 1: We need to gain access to a Role in another AWS account. How is it done? A: We should use the STS service to gain temporary credentials

STS will allow us to get cross account access through the creation of a role in our account authorized to access a role in another account. See more here: https://docs.aws.amazon.com/IAM/latest/UserGuide/tutorial_cross-account-with-roles.html

Question 2: You have a mobile application and would like to give your users access to their own personal space in Amazon S3. How do you achieve that? A: Use Cognito Identity Federation

Cognito is made to federate mobile user accounts and provide them with their own IAM policy. As such, they should be able thanks to that policy to access their own personal space in Amazon S3.

X: Use SAML Identity Federation

SAML identity federation is used to integrate a service such as Active Directory with AWS. It does not work for mobile applications

Question 3: You have strong regulatory requirements to only allow fully internally audited AWS Services in production. You still want to allow your teams to experiment in development environments while services are being audited. How can you best set this up? A: Create an AWS Organization and create two Prod and Dev OU. Apply a SCP on Prod

Question 4: [SAA-C02] You have an on-premise active directory setup and would like to provide access for your on-premise users to the multiple accounts you have in AWS. The solution should scale to adding accounts in the future. What do you recommend? A: Setup AWS Single Sign-On

Question 5: Which AWS Directory Service allows you to proxy requests to your on-premise active directory? A: AD Connector

[*]

Section 20: AWS Security & Encryption: KMS, SSM Parameter Store, CloudHSM, Shield, WAF

AWS Security & Encryption

Why encryption?

Encryption in flight (SSL)

Data is encrypted before sending and decrypted after receiving
SSL certificates help with encryption (HTTPS)
Encryption in flight ensures no MITM (man in the middle attack) can happen

Why encryption?

Server side encryption at rest

Data is encrypted after being received by the server
Data is decrypted before being sent
It is stored in an encrypted form thanks to a key (usually a data key)
The encryption / decryption keys must be managed somewhere and the server must have access to it

Why encryption?

Client side encryption

Data is encrypted by the client and never decrypted by the server
Data will be decrypted by a receiving client
The server should not be able to decrypt the data
Could leverage Envelope Encryption

AWS KMS (Key Management Service)

Anytime you hear “encryption” for an AWS service, it’s most likely KMS
Easy way to control access to your data, AWS manages keys for us
Fully integrated with IAM for authorization
Seamlessly integrated into:
- Amazon EBS: encrypt volumes
- Amazon S3: Server side encryption of objects
- Amazon Redshift: encryption of data
- Amazon RDS: encryption of data
- Amazon SSM: Parameter store
- Etc…
But you can also use the CLI / SDK

KMS – Customer Master Key (CMK) Types

Symmetric (AES-256 keys)
- First offering of KMS, single encryption key that is used to Encrypt and Decrypt
- AWS services that are integrated with KMS use Symmetric CMKs
- Necessary for envelope encryption
- You never get access to the Key unencrypted (must call KMS API to use)
Asymmetric (RSA & ECC key pairs)
- Public (Encrypt) and Private Key (Decrypt) pair
- Used for Encrypt/Decrypt, or Sign/Verify operations
- The public key is downloadable, but you can’t access the Private Key unencrypted
- Use case: encryption outside of AWS by users who can’t call the KMS API

AWS KMS (Key Management Service)

Able to fully manage the keys & policies:
- Create
- Rotation policies
- Disable
- Enable
Able to audit key usage (using CloudTrail)
Three types of Customer Master Keys (CMK):
- AWS Managed Service Default CMK: free
- User Keys created in KMS: $1 / month
- User Keys imported (must be 256-bit symmetric key): $1 / month
- Plus pay for API calls to KMS ($0.03 / 10000 calls)

AWS KMS 101

Anytime you need to share sensitive information… use KMS
Database passwords
Credentials to external service
Private Key of SSL certificates
The value in KMS is that the CMK used to encrypt data can never be retrieved by the user, and the CMK can be rotated for extra security

AWS KMS 101

Never ever store your secrets in plaintext, especially in your code!
Encrypted secrets can be stored in the code / environment variables
KMS can only help in encrypting up to 4KB of data per call
If data > 4 KB, use envelope encryption
To give access to KMS to someone:
- Make sure the Key Policy allows the user
- Make sure the IAM Policy allows the API calls

KMS Key Policies

Control access to KMS keys, “similar” to S3 bucket policies
Difference: you cannot control access without them
Default KMS Key Policy:
- Created if you don’t provide a specific KMS Key Policy
- Complete access to the key to the root user = entire AWS account
- Gives access to the IAM policies to the KMS key
Custom KMS Key Policy:
- Define users, roles that can access the KMS key
- Define who can administer the key
- Useful for cross-account access of your KMS key

Copying Snapshots across accounts

Create a Snapshot, encrypted with your own CMK
Attach a KMS Key Policy to authorize cross-account access
Share the encrypted snapshot
(in target) Create a copy of the Snapshot, encrypt it with a KMS Key in your account
Create a volume from the snapshot

SSM Parameter Store

Secure storage for configuration and secrets
Optional Seamless Encryption using KMS
Serverless, scalable, durable, easy SDK
Version tracking of configurations / secrets
Configuration management using path & IAM
Notifications with CloudWatch Events
Integration with CloudFormation

SSM Parameter Store Hierarchy

AWS Secrets Manager

Newer service, meant for storing secrets
Capability to force rotation of secrets every X days
Automate generation of secrets on rotation (uses Lambda)
Integration with Amazon RDS (MySQL, PostgreSQL, Aurora)
Secrets are encrypted using KMS
Mostly meant for RDS integration

CloudHSM

KMS => AWS manages the software for encryption
CloudHSM => AWS provisions encryption hardware
Dedicated Hardware (HSM = Hardware Security Module)
You manage your own encryption keys entirely (not AWS)
HSM device is tamper resistant, FIPS 140-2 Level 3 compliance
CloudHSM clusters are spread across Multi AZ (HA) – must setup
Supports both symmetric and asymmetric encryption (SSL/TLS keys)
No free tier available
Must use the CloudHSM Client Software
Redshift supports CloudHSM for database encryption and key management
Good option to use with SSE-C encryption

AWS Shield

AWS Shield Standard:
- Free service that is activated for every AWS customer
- Provides protection from attacks such as SYN/UDP Floods, Reflection attacks and other layer 3/layer 4 attacks
AWS Shield Advanced:
- Optional DDoS mitigation service ($3,000 per month per organization)
- Protect against more sophisticated attack on Amazon EC2, Elastic Load Balancing (ELB), Amazon CloudFront, AWS Global Accelerator, and Route 53
- 24/7 access to AWS DDoS response team (DRT)
- Protect against higher fees during usage spikes due to DDoS

AWS WAF – Web Application Firewall

Protects your web applications from common web exploits (Layer 7)
Layer 7 is HTTP (vs Layer 4 is TCP)
Deploy on Application Load Balancer, API Gateway, CloudFront
Define Web ACL (Web Access Control List):
- Rules can include: IP addresses, HTTP headers, HTTP body, or URI strings
- Protects from common attack - SQL injection and Cross-Site Scripting (XSS)
- Size constraints, geo-match (block countries)
- Rate-based rules (to count occurrences of events) – for DDoS protection

AWS Firewall Manager

Manage rules in all accounts of an AWS Organization
Common set of security rules
WAF rules (Application Load Balancer, API Gateways, CloudFront)
AWS Shield Advanced (ALB, CLB, Elastic IP, CloudFront)
Security Groups for EC2 and ENI resources in VPC

AWS Shared Responsibility Model

AWS responsibility - Security of the Cloud
Protecting infrastructure (hardware, software, facilities, and networking) that runs all of the AWS services
Managed services like S3, DynamoDB, RDS etc
Customer responsibility - Security in the Cloud
For EC2 instance, customer is responsible for management of the guest OS (including security patches and updates), firewall & network configuration, IAM etc

Example, for RDS

AWS responsibility:
- Manage the underlying EC2 instance, disable SSH access
- Automated DB patching
- Automated OS patching
- Audit the underlying instance and disks & guarantee it functions
Your responsibility:
- Check the ports / IP / security group inbound rules in DB’s SG
- In-database user creation and permissions
- Creating a database with or without public access
- Ensure parameter groups or DB is configured to only allow SSL connections
- Database encryption setting

Example, for S3

AWS responsibility:
- Guarantee you get unlimited storage
- Guarantee you get encryption
- Ensure separation of the data between different customers
- Ensure AWS employees can’t access your data
Your responsibility:
- Bucket configuration
- Bucket policy / public setting
- IAM user and roles
- Enabling encryption

Shared Responsibility Model diagram

https://aws.amazon.com/compliance/shared-responsibility-model/

Security & Encryption Quiz

Question 1: To enable encryption in flight, we need to have A: an HTTPS endpoint with a SSL certificate

Encryption in flight = HTTPS, and HTTPS cannot be enabled without an SSL certificate

Question 2: Server side encryption means that the data is sent encrypted to the server first A: False

Server side encryption means the server will encrypt the data for us. We don’t need to encrypt it beforehand

Question 3: In server side encryption, only the encryption happens on the server. Where does the decryption happen? A: The server

In server side encryption, the decryption also happens on the server (in AWS, we wouldn’t be able to decrypt the data ourselves as we can’t have access to the corresponding encryption key)

Question 4: In client side encryption, the server must know our encryption scheme to accept the data A: False

With client side encryption, the server does not need to know any information about the encryption being used, as the server won’t perform any encryption or decryption tasks

Question 5: We need to create User Keys in KMS before using the encryption features for EBS, S3, etc… A: False

we can use the AWS Managed Service Keys in KMS, therefore we don’t need to create our own keys

Question 6: We’d like our Lambda function to have access to a database password. We should A: Have it as an encrypted environment variable and decrypt it at runtime

This is the most secure solution amongst the options

Question 7: We would like to audit the values of an encrypted value over time A: We should use SSM Parameter Store

SSM Parameter Store has versioning and audit of values built-in directly

Question 8: Under the shared responsibility model, what are you responsible for in RDS? A: Security Group Rules

These are configured by us and we’ve done that extensively in the course

Question 9: Your user-facing website is a high risk target for DDoS attack and you would like to get 24/7 support in case they happen, as well as AWS bill reimbursement for the incurred costs during the attacks. What service should you use? A: AWS Shield Advanced

Question 10: You need an encryption service that supports asymmetric encryption schemes, and you want to manage the security keys yourself. Which service could you use? A: CloudHSM

Networking - VPC

Understanding CIDR - IPv4

(Classless Inter-Domain Routing)

CIDR are used for Security Groups rules, or AWS networking in general
They help to define an IP address range
- We’ve seen WW.XX.YY.ZZ/32 == one IP
- We’ve seen 0.0.0.0/0 == all IPs
- But we can define for ex: 192.168.0.0/26: 192.168.0.0 – 192.168.0.63 (64 IP)

Understanding CIDR

A CIDR has two components:
- The base IP (XX.XX.XX.XX)
- The Subnet Mask (/26)
The base IP represents an IP contained in the range
The subnet mask defines how many bits can change in the IP
The subnet mask can take two forms. Examples:
- 255.255.255.0 <- less common
- /24 <- more common

Understanding CIDRs

Subnet Masks

The subnet mask basically allows part of the underlying IP to take additional values, counting up from the base IP
/32 allows for 1 IP = 2^0
/31 allows for 2 IP = 2^1
/30 allows for 4 IP = 2^2
/29 allows for 8 IP = 2^3
/28 allows for 16 IP = 2^4
/27 allows for 32 IP = 2^5
/26 allows for 64 IP = 2^6
/25 allows for 128 IP = 2^7
/24 allows for 256 IP = 2^8
/16 allows for 65,536 IP = 2^16
/0 allows for all IPs = 2^32
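
The table above follows one formula: a /N mask leaves 32 - N bits free, so the block holds 2^(32 - N) addresses. A short sketch using Python's standard `ipaddress` module to cross-check it (the helper name `ip_count` is ours):

```python
import ipaddress


def ip_count(prefix):
    """Number of IPv4 addresses in a block with the given prefix length."""
    return 2 ** (32 - prefix)


for prefix in (32, 31, 30, 29, 28, 27, 26, 25, 24, 16, 0):
    network = ipaddress.ip_network(f"0.0.0.0/{prefix}")
    # The standard library agrees with the formula for every prefix.
    assert network.num_addresses == ip_count(prefix)
    print(f"/{prefix} allows for {ip_count(prefix)} IP")
```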

Understanding CIDRs

Little exercise

192.168.0.0/24 = … ?
192.168.0.0 – 192.168.0.255 (256 IP)
192.168.0.0/16 = … ?
192.168.0.0 – 192.168.255.255 (65,536 IP)
134.56.78.123/32 = … ?
Just 134.56.78.123
0.0.0.0/0
All IP!
When in doubt, use this website: https://www.ipaddressguide.com/cidr
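
The exercise can also be checked locally. The sketch below uses the standard `ipaddress` module. The helper name `cidr_range` is hypothetical.

```python
import ipaddress


def cidr_range(cidr):
    """Return (first IP, last IP, number of IPs) for an IPv4 CIDR block."""
    network = ipaddress.ip_network(cidr)
    return str(network[0]), str(network[-1]), network.num_addresses


for cidr in ("192.168.0.0/24", "192.168.0.0/16", "134.56.78.123/32", "0.0.0.0/0"):
    print(cidr, cidr_range(cidr))
```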

Private vs Public IP (IPv4)

Allowed ranges

The Internet Assigned Numbers Authority (IANA) established certain blocks of IPV4 addresses for the use of private (LAN) and public (Internet) addresses.
Private IP can only allow certain values
- 10.0.0.0 – 10.255.255.255 (10.0.0.0/8) <= in big networks
- 172.16.0.0 – 172.31.255.255 (172.16.0.0/12) <= default AWS one
- 192.168.0.0 – 192.168.255.255 (192.168.0.0/16) <= example: home networks
All the rest of the IP on the internet are public IP
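
A quick way to test whether an address falls in one of the three private ranges listed above, sketched with the standard `ipaddress` module. The helper `is_private_ipv4` is ours and checks only these three blocks.

```python
import ipaddress

# The three private IPv4 blocks listed above.
PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_private_ipv4(address):
    """True if the address belongs to one of the three private blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PRIVATE_RANGES)


for address in ("10.1.2.3", "172.31.255.255", "172.32.0.1", "8.8.8.8"):
    print(address, is_private_ipv4(address))
```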

Default VPC Walkthrough

All new accounts have a default VPC
New instances are launched into default VPC if no subnet is specified
The default VPC has internet connectivity and all instances have a public IP
We also get a public and a private DNS name

VPC in AWS – IPv4

VPC = Virtual Private Cloud
You can have multiple VPCs in a region (max 5 per region – soft limit)
- Max CIDR per VPC is 5. For each CIDR:
- Min size is /28 = 16 IP Addresses
- Max size is /16 = 65536 IP Addresses
- Because VPC is private, only the Private IP ranges are allowed:
- 10.0.0.0 – 10.255.255.255 (10.0.0.0/8)
- 172.16.0.0 – 172.31.255.255 (172.16.0.0/12)
- 192.168.0.0 – 192.168.255.255 (192.168.0.0/16)
Your VPC CIDR should not overlap with your other networks (ex: corporate)
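
The size and range constraints above can be sketched as a small validator. It encodes only the rules as stated in these notes (prefix between /16 and /28, inside a private range). The helper name `is_valid_vpc_cidr` is hypothetical.

```python
import ipaddress

PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_valid_vpc_cidr(cidr):
    """Check a CIDR block against the VPC rules listed above."""
    network = ipaddress.ip_network(cidr)
    size_ok = 16 <= network.prefixlen <= 28  # between /16 (max) and /28 (min)
    private_ok = any(network.subnet_of(r) for r in PRIVATE_RANGES)
    return size_ok and private_ok


for cidr in ("10.0.0.0/16", "10.0.0.0/8", "192.168.1.0/29", "8.8.0.0/16"):
    print(cidr, is_valid_vpc_cidr(cidr))
```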

Subnets - IPv4

AWS reserves 5 IP addresses (first 4 and last 1 IP address) in each Subnet
These 5 IPs are not available for use and cannot be assigned to an instance
Ex, if CIDR block 10.0.0.0/24, reserved IP are:
- 10.0.0.0: Network address
- 10.0.0.1: Reserved by AWS for the VPC router
- 10.0.0.2: Reserved by AWS for mapping to Amazon-provided DNS
- 10.0.0.3: Reserved by AWS for future use
- 10.0.0.255: Network broadcast address. AWS does not support broadcast in a VPC, therefore the address is reserved
Exam Tip: [*]
- If you need 29 IP addresses for EC2 instances, you can’t choose a Subnet of size /27 (32 IP)
- You need at least 64 IP, Subnet size /26 (64-5 = 59 > 29, but 32-5 = 27 < 29)
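
The exam tip is simple arithmetic: usable IPs = 2^(32 - prefix) - 5. A minimal sketch, assuming subnets may be sized between /28 and /16 (the helper names are ours):

```python
AWS_RESERVED_PER_SUBNET = 5  # first 4 and last 1 address of every subnet


def usable_ips(prefix):
    """Addresses left for instances in a subnet of the given prefix length."""
    return 2 ** (32 - prefix) - AWS_RESERVED_PER_SUBNET


def smallest_subnet_prefix(needed):
    """Smallest subnet (between /28 and /16) with enough usable addresses."""
    for prefix in range(28, 15, -1):  # try /28 first, then grow the subnet
        if usable_ips(prefix) >= needed:
            return prefix
    raise ValueError("no subnet between /28 and /16 is large enough")


print(usable_ips(27), usable_ips(26), smallest_subnet_prefix(29))
```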

Internet Gateways

Internet gateways help our VPC instances connect with the internet
It scales horizontally and is highly available (HA) and redundant
Must be created separately from VPC
One VPC can only be attached to one IGW and vice versa [*]
Internet Gateway is also a NAT for the instances that have a public IPv4
Internet Gateways on their own do not allow internet access…
Route tables must also be edited!

NAT Instances – Network Address Translation

(outdated but still at the exam)

Allows instances in the private subnets to connect to the internet
Must be launched in a public subnet
Must disable EC2 flag: Source / Destination Check
Must have Elastic IP attached to it
Route table must be configured to route traffic from private subnets to the NAT instance

NAT Instances – Comments

Amazon Linux AMI pre-configured are available
Not highly available / resilient setup out of the box
=> Would need to create ASG in multi AZ + resilient user-data script
Internet traffic bandwidth depends on EC2 instance performance
Must manage security groups & rules:
- Inbound:
  - Allow HTTP / HTTPS Traffic coming from Private Subnets
  - Allow SSH from your home network (access is provided through Internet Gateway)
- Outbound:
  - Allow HTTP / HTTPS traffic to the internet

NAT Gateway

AWS managed NAT, higher bandwidth, better availability, no admin
Pay by the hour for usage and bandwidth
NAT is created in a specific AZ, uses an EIP
Cannot be used by an instance in that subnet (only from other subnets)
Requires an IGW (Private Subnet => NAT => IGW)
5 Gbps of bandwidth with automatic scaling up to 45 Gbps
No security group to manage / required

NAT Instance vs Gateway

See: https://docs.aws.amazon.com/vpc/latest/userguide/vpc-natcomparison.html

DNS Resolution in VPC

enableDnsSupport: (= DNS Resolution setting)
- Default True
- Helps decide if DNS resolution is supported for the VPC
- If True, queries the AWS DNS server at 169.254.169.253
enableDnsHostnames: (= DNS Hostname setting)
- False by default for newly created VPC, True by default for Default VPC
- Won’t do anything unless enableDnsSupport=true
- If True, assigns a public hostname to an EC2 instance if it has a public IPv4 address
If you use custom DNS domain names in a private zone in Route 53, you must set both these attributes to true

[*] Network ACLs & Security Group Incoming Request

Network ACLs

NACLs are like a firewall that controls traffic to and from a subnet
Default NACL allows everything outbound and everything inbound
One NACL per Subnet, new Subnets are assigned the Default NACL
Define NACL rules:
- Rules have a number (1-32766); the lower the number, the higher the precedence
- E.g. if you define #100 ALLOW and #200 DENY for the same IP, the IP will be allowed because rule #100 is evaluated first
- Last rule is an asterisk (*) and denies a request in case of no rule match
- AWS recommends adding rules by increment of 100
Newly created NACL will deny everything
NACL are a great way of blocking a specific IP at the subnet level
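
The rule ordering described above can be sketched as a small simulation. It is a simplified model, not how AWS implements NACLs: it matches on source CIDR only and ignores protocol and port ranges. The `NaclRule` class and `evaluate` function are hypothetical names introduced for illustration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int  # 1-32766, lower number = higher precedence
    cidr: str    # source CIDR the rule applies to
    action: str  # "ALLOW" or "DENY"


def evaluate(rules: list[NaclRule], source_ip: str) -> str:
    """Return the action of the first matching rule, lowest number first."""
    address = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if address in ipaddress.ip_network(rule.cidr):
            return rule.action
    return "DENY"  # the final "*" rule: no match means deny


if __name__ == "__main__":
    rules = [
        NaclRule(200, "0.0.0.0/0", "DENY"),
        NaclRule(100, "0.0.0.0/0", "ALLOW"),
        NaclRule(90, "198.51.100.7/32", "DENY"),  # block one specific IP
    ]
    print(evaluate(rules, "203.0.113.10"))  # matched by #100 before #200
    print(evaluate(rules, "198.51.100.7"))  # matched by #90 first
```

Placing the DENY for a single address at a lower number than the broad ALLOW is what makes a NACL useful for blocking one IP at the subnet level.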

Network ACLs vs Security Groups

https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Security.html#VPC_Security_Comparison

Example Network ACL with Ephemeral Ports

https://docs.aws.amazon.com/vpc/latest/userguide/vpc-network-acls.html

VPC Peering [*]

Connect two VPC, privately using AWS’ network
Make them behave as if they were in the same network
Must not have overlapping CIDR
VPC Peering connection is not transitive (must be established for each VPC that need to communicate with one another) [*]
You can do VPC peering with another AWS account
You must update route tables in each VPC’s subnets to ensure instances can communicate [*]

VPC Peering – Good to know

VPC peering can work inter-region, cross-account
You can reference a security group of a peered VPC (works cross account)

VPC Endpoints

Endpoints allow you to connect to AWS Services using a private network instead of the public www network
They scale horizontally and are redundant
They remove the need of IGW, NAT, etc… to access AWS Services
Interface: provisions an ENI (private IP address) as an entry point (must attach security group) – most AWS services
Gateway: provisions a target and must be used in a route table – S3 and DynamoDB
In case of issues:
- Check DNS Setting Resolution in your VPC
- Check Route Tables

Flow Logs

Capture information about IP traffic going into your interfaces:
- VPC Flow Logs
- Subnet Flow Logs
- Elastic Network Interface Flow Logs
Helps to monitor & troubleshoot connectivity issues
Flow logs data can go to S3 / CloudWatch Logs
Captures network information from AWS managed interfaces too: ELB, RDS, ElastiCache, Redshift, WorkSpaces

[*] To use private DNS names, ensure that the attributes ‘Enable DNS hostnames’ and ‘Enable DNS Support’ are set to ‘true’ for your VPC (vpc-f4858893).

Flow Log Syntax

Srcaddr, dstaddr help identify problematic IP
Srcport, dstport help identify problematic ports
Action: success or failure (ACCEPT / REJECT) of the request due to Security Group / NACL
Can be used for analytics on usage patterns, or malicious behavior
Flow logs example: https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html#flow-log-records
Query VPC flow logs using Athena on S3 or CloudWatch Logs Insights
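
As a rough illustration of how these fields can be used, the sketch below parses records in the default (version 2) flow log format and counts rejected flows per source address and destination port. The sample records are made up, and the function names are illustrative; custom log formats would need a different field list.

```python
from collections import Counter

# Field order of the default (version 2) flow log record format.
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()


def parse_record(line: str) -> dict[str, str]:
    """Split one space-separated flow log record into named fields."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def rejected_by_source(lines: list[str]) -> Counter:
    """Count REJECT records per (srcaddr, dstport) pair."""
    counts: Counter = Counter()
    for line in lines:
        record = parse_record(line)
        if record["action"] == "REJECT":
            counts[(record["srcaddr"], record["dstport"])] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        "2 123456789012 eni-0a1b2c3d 203.0.113.12 10.0.1.5 49152 22 6 3 180 1620000000 1620000060 REJECT OK",
        "2 123456789012 eni-0a1b2c3d 203.0.113.12 10.0.1.5 49153 22 6 2 120 1620000060 1620000120 REJECT OK",
        "2 123456789012 eni-0a1b2c3d 10.0.2.8 10.0.1.5 44321 443 6 10 5200 1620000000 1620000060 ACCEPT OK",
    ]
    for (src, port), hits in rejected_by_source(sample).most_common():
        print(f"{src} -> port {port}: {hits} rejected flows")
```

For large volumes the same aggregation is better done with Athena or CloudWatch Logs Insights, as noted above.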

Bastion Hosts

We can use a Bastion Host to SSH into our private instances
The bastion is in the public subnet which is then connected to all other private subnets
Bastion Host security group must be tightened
Exam Tip: Make sure the bastion host only has port 22 traffic from the IP you need, not from the security groups of your other instances [*]

Site to Site VPN

Virtual Private Gateway:
- VPN concentrator on the AWS side of the VPN connection
- VGW is created and attached to the VPC from which you want to create the Site-to-Site VPN connection
- Possibility to customize the ASN
Customer Gateway:
- Software application or physical device on customer side of the VPN connection
- https://docs.aws.amazon.com/vpc/latest/adminguide/Introduction.html#DevicesTested
- IP Address: [*]
  - Use static, internet-routable IP address for your customer gateway device.
  - If the CGW is behind a NAT (with NAT-T), use the public IP address of the NAT

Direct Connect

Provides a dedicated private connection from a remote network to your VPC
Dedicated connection must be set up between your DC and AWS Direct Connect locations
You need to set up a Virtual Private Gateway on your VPC
Access public resources (S3) and private (EC2) on same connection
Use Cases:
- Increase bandwidth throughput - working with large data sets – lower cost
- More consistent network experience - applications using real-time data feeds
- Hybrid Environments (on prem + cloud)
Supports both IPv4 and IPv6

Direct Connect Gateway

If you want to set up a Direct Connect to one or more VPCs in many different regions (same account), you must use a Direct Connect Gateway

Direct Connect – Connection Types

Dedicated Connections: 1Gbps and 10 Gbps capacity
Physical ethernet port dedicated to a customer
Request made to AWS first, then completed by AWS Direct Connect Partners
Hosted Connections: 50Mbps, 500 Mbps, to 10 Gbps
Connection requests are made via AWS Direct Connect Partners
Capacity can be added or removed on demand
1, 2, 5, 10 Gbps available at select AWS Direct Connect Partners
Lead times are often longer than 1 month to establish a new connection

Direct Connect – Encryption

Data in transit is not encrypted but is private
AWS Direct Connect + VPN provides an IPsec-encrypted private connection
Good for an extra level of security, but slightly more complex to put in place

Egress Only Internet Gateway

Egress only Internet Gateway is for IPv6 only
Similar function as a NAT, but a NAT is for IPv4
Good to know: all IPv6 addresses are public
Therefore all our instances with IPv6 are publicly accessible
Egress Only Internet Gateway gives our IPv6 instances access to the internet, but they won’t be directly reachable by the internet
After creating an Egress Only Internet Gateway, edit the route tables

[SAA-C02]

AWS PrivateLink (VPC Endpoint Services) [*]

Most secure & scalable way to expose a service to 1000s of VPC (own or other accounts)
Does not require VPC peering, internet gateway, NAT, route tables…
Requires a network load balancer (Service VPC) and ENI (Customer VPC)
If the NLB is in multiple AZ, and the ENI in multiple AZ, the solution is fault tolerant!

[SAA-C02]

EC2-Classic & AWS ClassicLink (deprecated)

EC2-Classic: instances run in a single network shared with other customers
Amazon VPC: your instances run logically isolated to your AWS account
ClassicLink allows you to link EC2-Classic instances to a VPC in your account
- Must associate a security group
- Enables communication using private IPv4 addresses
- Removes the need to make use of public IPv4 addresses or Elastic IP addresses
Likely to be distractors at the exam

[SAA-C02]

AWS VPN CloudHub

Provide secure communication between sites, if you have multiple VPN connections
Low cost hub-and-spoke model for primary or secondary network connectivity between locations
It’s a VPN connection so it goes over the public internet

[SAA-C02]

Transit Gateway

For having transitive peering between thousands of VPC and on-premises, hub-and-spoke (star) connection
Regional resource, can work cross-region
Share cross-account using Resource Access Manager (RAM)
You can peer Transit Gateways across regions
Route Tables: limit which VPC can talk with other VPC
Works with Direct Connect Gateway, VPN connections
Supports IP Multicast (not supported by any other AWS service)

VPC Section Summary (1/3)

CIDR: IP Range
VPC: Virtual Private Cloud => we define a list of IPv4 & IPv6 CIDR
Subnets: tied to an AZ, we define a CIDR
Internet Gateway: at the VPC level, provide IPv4 & IPv6 Internet Access
Route Tables: must be edited to add routes from subnets to the IGW, VPC Peering Connections, VPC Endpoints, etc…
NAT Instances: give internet access to instances in private subnets. Old, must be set up in a public subnet with the Source / Destination check flag disabled
NAT Gateway: managed by AWS, provides scalable internet access to private instances, IPv4 only
Private DNS + Route 53: enable DNS Resolution + DNS hostnames (VPC)
NACL: Stateless, subnet rules for inbound and outbound, don’t forget ephemeral ports
Security Groups: Stateful, operate at the EC2 instance level

VPC Section Summary (2/3)

VPC Peering: Connect two VPC with non overlapping CIDR, non transitive
VPC Endpoints: Provide private access to AWS Services (S3, DynamoDB, CloudFormation, SSM) within VPC
VPC Flow Logs: Can be set up at the VPC / Subnet / ENI Level, for ACCEPT and REJECT traffic, helps identify attacks, analyze using Athena or CloudWatch Logs Insights
Bastion Host: Public instance to SSH into, that has SSH connectivity to instances in private subnets
Site to Site VPN: set up a Customer Gateway on DC, a Virtual Private Gateway on VPC, and site-to-site VPN over public internet
Direct Connect: set up a Virtual Private Gateway on VPC, and establish a direct private connection to an AWS Direct Connect Location
Direct Connect Gateway: set up a Direct Connect to many VPC in different regions
Egress Only Internet Gateway: like a NAT Gateway, but for IPv6

VPC Section Summary (3/3)

Private Link / VPC Endpoint Services:
connect services privately from your service VPC to customers VPC
Doesn’t need VPC peering, public internet, NAT gateway, route tables
Must be used with Network Load Balancer & ENI
ClassicLink: connect EC2-Classic instances privately to your VPC
VPN CloudHub: hub-and-spoke VPN model to connect your sites
Transit Gateway: transitive peering connections for VPC, VPN & DX

VPC Quiz Question 1: What does this CIDR correspond to? 10.0.4.0/28 A: 10.0.4.0 TO 10.0.4.15

/28 means 16 IPs (=2^(32-28) = 2^4), means only the last digit can change.

Question 2: You have a corporate network of size 10.0.0.0/8 and a satellite office of size 192.168.0.0/16. Which CIDR is acceptable for your AWS VPC if you plan on connecting your networks later on? A: 172.16.0.0/16

CIDRs should not overlap, and the max CIDR size in AWS is /16
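
Both CIDR questions above can be verified with Python's standard `ipaddress` module. The sketch below is only a checking aid; `cidr_range` and `overlaps_any` are illustrative helper names, and the rejected candidate CIDRs in the example are made up.

```python
import ipaddress


def cidr_range(cidr: str) -> tuple[str, str]:
    """First and last address covered by a CIDR block."""
    network = ipaddress.ip_network(cidr)
    return str(network[0]), str(network[-1])


def overlaps_any(candidate: str, existing: list[str]) -> bool:
    """True if the candidate CIDR overlaps one of the existing networks."""
    candidate_net = ipaddress.ip_network(candidate)
    return any(candidate_net.overlaps(ipaddress.ip_network(e)) for e in existing)


if __name__ == "__main__":
    print(cidr_range("10.0.4.0/28"))

    on_premises = ["10.0.0.0/8", "192.168.0.0/16"]
    for candidate in ("10.0.16.0/20", "192.168.4.0/24", "172.16.0.0/16"):
        verdict = "overlaps" if overlaps_any(candidate, on_premises) else "is free"
        print(candidate, verdict)
```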

Question 3: You plan on creating a subnet and want it to have at least capacity for 28 EC2 instances. What’s the minimum size you need to have for your subnet? A: /26

perfect size (64 IP)

Question 4: You have set up an internet gateway in your VPC, but your EC2 instances still don’t have access to the internet. What is NOT a possible issue? A: The security group does not allow network in.

Security groups are stateful: if traffic can go out, then it can go back in

Question 5: You would like to provide internet access to your instances in private subnets with IPv4, while making sure this solution requires the least amount of administration and scales seamlessly. What should you use? A: NAT Gateway

Question 6: VPC Peering has been enabled between VPC A and VPC B, and the route tables have been updated for VPC A. Still, your instances cannot communicate. What is the likely issue? A: Check the route tables in VPC B

Route tables must be updated in both VPC that are peered

Question 7: You have set-up a direct connection between your Corporate Data Center and your VPC A. You need to access VPC B in another region from your Corporate Data Center as well. What should you do? A: Use a Direct Connect Gateway

This is the main use case of Direct Connect Gateways

Question 8: Which are the only two services that have a Gateway Endpoint instead of an Interface Endpoint as a VPC endpoint? A: Amazon S3 & DynamoDB

these two services have a Gateway endpoint (remember it), all the other ones have an interface endpoint (powered by Private Link - means a private IP)

Question 9: Your company has created a REST API that it will sell to hundreds of customers as a SaaS. Your customers are on AWS and are using their own VPC. You would like to allow your customers to access your SaaS without going through the public internet while ensuring your infrastructure is not left exposed to network attacks. What do you recommend? A: AWS PrivateLink

Question 10: Your company has several on-premise sites across the USA. These sites are currently linked using a private connection, but your private connection provider has been recently quite unstable, making your IT architecture partially offline. You would like to create a backup connection that will use the public internet to link your on-premise sites, that you can failover in case of issues with your provider. What do you recommend? A: VPN CloudHub

Networking Costs in AWS per GB - Simplified

Use Private IP instead of Public IP for good savings and better network performance
Use same AZ for maximum savings (at the cost of high availability)

Disaster Recovery Overview

Any event that has a negative impact on a company’s business continuity or finances is a disaster
Disaster recovery (DR) is about preparing for and recovering from a disaster
What kind of disaster recovery?
On-premise => On-premise: traditional DR, and very expensive
On-premise => AWS Cloud: hybrid recovery
AWS Cloud Region A => AWS Cloud Region B
Need to define two terms:
RPO: Recovery Point Objective
RTO: Recovery Time Objective

Disaster Recovery Strategies

Backup and Restore
Pilot Light
Warm Standby
Hot Site / Multi Site Approach

Disaster Recovery Tips

Backup
- EBS Snapshots, RDS automated backups / Snapshots, etc…
- Regular pushes to S3 / S3 IA / Glacier, Lifecycle Policy, Cross Region Replication
- From On-Premise: Snowball or Storage Gateway
High Availability
- Use Route53 to migrate DNS over from Region to Region
- RDS Multi-AZ, ElastiCache Multi-AZ, EFS, S3
- Site to Site VPN as a recovery from Direct Connect
Replication
- RDS Replication (Cross Region), AWS Aurora + Global Databases
- Database replication from on-premise to RDS
- Storage Gateway
Automation
- CloudFormation / Elastic Beanstalk to re-create a whole new environment
- Recover / Reboot EC2 instances with CloudWatch if alarms fail
- AWS Lambda functions for customized automations
Chaos
- Netflix has a “simian-army” randomly terminating EC2

DMS – Database Migration Service

Quickly and securely migrate databases to AWS, resilient, self healing
The source database remains available during the migration
Supports:
- Homogeneous migrations: ex Oracle to Oracle
- Heterogeneous migrations: ex Microsoft SQL Server to Aurora
Continuous Data Replication using CDC
You must create an EC2 instance to perform the replication tasks

DMS Sources and Targets

SOURCES:

On-Premise and EC2 instances databases: Oracle, MS SQL Server, MySQL, MariaDB, PostgreSQL, MongoDB, SAP, DB2
Azure: Azure SQL Database
Amazon RDS: all including Aurora
Amazon S3

TARGETS:

On-Premise and EC2 instances databases: Oracle, MS SQL Server, MySQL, MariaDB, PostgreSQL, SAP
Amazon RDS
Amazon Redshift
Amazon DynamoDB
Amazon S3
ElasticSearch Service
Kinesis Data Streams
DocumentDB

[SAA-C02]

AWS Schema Conversion Tool (SCT)

Convert your Database’s Schema from one engine to another
Example OLTP: (SQL Server or Oracle) to MySQL, PostgreSQL, Aurora
Example OLAP: (Teradata or Oracle) to Amazon Redshift
You do not need to use SCT if you are migrating the same DB engine
- Ex: On-Premise PostgreSQL => RDS PostgreSQL
- The DB engine is still PostgreSQL (RDS is the platform)

[SAA-C02]

On-Premise strategy with AWS

Ability to download Amazon Linux 2 AMI as a VM (.iso format)
- VMWare, KVM, VirtualBox (Oracle VM), Microsoft Hyper-V
VM Import / Export
- Migrate existing applications into EC2
- Create a DR repository strategy for your on-premise VMs
- Can export back the VMs from EC2 to on-premise
AWS Application Discovery Service
- Gather information about your on-premise servers to plan a migration
- Server utilization and dependency mappings
- Track with AWS Migration Hub
AWS Database Migration Service (DMS)
- Replicate On-premise => AWS, AWS => AWS, AWS => On-premise
- Works with various database technologies (Oracle, MySQL, DynamoDB, etc..)
AWS Server Migration Service (SMS)
- Incremental replication of on-premise live servers to AWS

[SAA-C02]

AWS DataSync

Move large amounts of data from on-premise to AWS
Can synchronize to: Amazon S3, Amazon EFS, Amazon FSx for Windows
Move data from your NAS or file system via NFS or SMB
Replication tasks can be scheduled hourly, daily, weekly
Leverage the DataSync agent to connect to your systems
https://docs.aws.amazon.com/datasync/latest/userguide/how-datasync-works.html
- NFS / SMB to AWS (S3, EFS, FSx for Windows)
- EFS to EFS

[SAA-C02]

Transferring large amount of data into AWS

Example: transfer 200 TB of data in the cloud. We have a 100 Mbps internet connection.
Over the internet / Site-to-Site VPN:
- Immediate to setup
- Will take 200(TB)*1000(GB)*1000(MB)*8(Mb)/100 Mbps = 16,000,000s = 185d
Over direct connect 1Gbps:
- Long for the one-time setup (over a month)
- Will take 200(TB)*1000(GB)*8(Gb)/1 Gbps = 1,600,000s = 18.5d
Over Snowball:
- Will take 2 to 3 snowballs in parallel
- Takes about 1 week for the end-to-end transfer
- Can be combined with DMS
For on-going replication / transfers: Site-to-Site VPN or DX with DMS or DataSync
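
The transfer-time arithmetic above can be reproduced with a short calculation. This sketch uses the same decimal units as the notes (1 TB = 1000 GB, 1 GB = 1000 MB, 1 byte = 8 bits) and assumes the link runs at its full rated speed the whole time, which real transfers rarely achieve. The function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def transfer_seconds(terabytes: float, link_mbps: float) -> float:
    """Seconds needed to move `terabytes` over a link of `link_mbps` megabits/s."""
    megabits = terabytes * 1000 * 1000 * 8  # TB -> GB -> MB -> Mb
    return megabits / link_mbps


def transfer_days(terabytes: float, link_mbps: float) -> float:
    """Same duration expressed in days."""
    return transfer_seconds(terabytes, link_mbps) / SECONDS_PER_DAY


if __name__ == "__main__":
    for label, mbps in (("100 Mbps internet / VPN", 100), ("1 Gbps Direct Connect", 1000)):
        seconds = transfer_seconds(200, mbps)
        print(f"{label}: {seconds:,.0f} s = {transfer_days(200, mbps):.1f} days")
```

The tenfold difference in bandwidth gives a tenfold difference in duration, which is the comparison that makes Snowball attractive for a one-time move of this size.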

Disaster Recovery Quiz

Question 1: As part of your disaster recovery strategy, you would like to have only the critical systems up and running in AWS. You don’t mind a longer RTO. Which DR strategy do you recommend? A: Pilot Light

If you’re interested in reading more about disaster recovery, the whitepaper is here: https://d1.awsstatic.com/asset-repository/products/CloudEndure/CloudEndure_Affordable_Enterprise-Grade_Disaster_Recovery_Using_AWS.pdf

Question 2: You would like to get the DR strategy with the lowest RTO and RPO, regardless of the cost, which one do you recommend? A: Multi Site

Question 3: Which of the following strategies has a potentially high RPO and RTO? A: Backup and Restore

Section 23: More Solution Architectures

Extra Solution Architecture discussions

[SAA-C02]

S3 Events

S3:ObjectCreated, S3:ObjectRemoved, S3:ObjectRestore, S3:Replication…
Object name filtering possible (*.jpg)
Use case: generate thumbnails of images uploaded to S3
Can create as many “S3 events” as desired
S3 event notifications typically deliver events in seconds but can sometimes take a minute or longer
If two writes are made to a single non-versioned object at the same time, it is possible that only a single event notification will be sent
If you want to ensure that an event notification is sent for every successful write, you can enable versioning on your bucket
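
For the thumbnail use case, a Lambda function subscribed to the bucket's notifications would typically start by pulling the bucket and key out of each record. The sketch below shows only that step; `image_keys` and `handler` are hypothetical names, the thumbnail generation itself is left out, and the suffix check mirrors the `*.jpg` filter mentioned above.

```python
from urllib.parse import unquote_plus


def image_keys(event: dict, suffix: str = ".jpg") -> list[tuple[str, str]]:
    """Return (bucket, key) pairs for newly created objects ending in `suffix`."""
    matches = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the notification payload.
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.lower().endswith(suffix):
            matches.append((bucket, key))
    return matches


def handler(event, context):
    """Hypothetical Lambda entry point: list the images that need a thumbnail."""
    return [f"s3://{bucket}/{key}" for bucket, key in image_keys(event)]


if __name__ == "__main__":
    # Trimmed, made-up event containing only the fields read above.
    sample_event = {
        "Records": [
            {
                "eventName": "ObjectCreated:Put",
                "s3": {"bucket": {"name": "my-photos"}, "object": {"key": "in/my+cat.jpg"}},
            },
            {
                "eventName": "ObjectRemoved:Delete",
                "s3": {"bucket": {"name": "my-photos"}, "object": {"key": "in/old.jpg"}},
            },
        ]
    }
    print(handler(sample_event, None))
```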

[SAA-C02]

High Performance Computing (HPC)

The cloud is the perfect place to perform HPC
You can create a very high number of resources in no time
You can speed up time to results by adding more resources
You can pay only for the systems you have used
Perform genomics, computational chemistry, financial risk modeling, weather prediction, machine learning, deep learning, autonomous driving
Which services help perform HPC?

Data Management & Transfer

AWS Direct Connect:
- Move GB/s of data to the cloud, over a private secure network
Snowball & Snowmobile
- Move PB of data to the cloud
AWS DataSync
- Move large amount of data between on-premise and S3, EFS, FSx for Windows

[SAA-C02][*]

Compute and Networking

EC2 Enhanced Networking (SR-IOV)
- Higher bandwidth, higher PPS (packet per second), lower latency
- Option 1: Elastic Network Adapter (ENA) up to 100 Gbps
- Option 2: Intel 82599 VF up to 10 Gbps – LEGACY
Elastic Fabric Adapter (EFA)
- Improved ENA for HPC, only works for Linux
- Great for inter-node communications, tightly coupled workloads
- Leverages Message Passing Interface (MPI) standard
- Bypasses the underlying Linux OS to provide low-latency, reliable transport

[SAA-C02]

Storage

Instance-attached storage:
- EBS: scale up to 64000 IOPS with io1 Provisioned IOPS
- Instance Store: scale to millions of IOPS, linked to EC2 instance, low latency
Network storage:
- Amazon S3: large blob, not a file system
- Amazon EFS: scale IOPS based on total size, or use provisioned IOPS
- Amazon FSx for Lustre:
  - HPC optimized distributed file system, millions of IOPS
  - Backed by S3

[SAA-C02]

Automation and Orchestration

AWS Batch
- AWS Batch supports multi-node parallel jobs, which enables you to run single jobs that span multiple EC2 instances.
- Easily schedule jobs and launch EC2 instances accordingly
AWS ParallelCluster
- Open source cluster management tool to deploy HPC on AWS
- Configure with text files
- Automate creation of VPC, Subnet, cluster type and instance types

More Solution Architectures - Quiz 22

Question 1: Your Lambda function is processing events coming through S3 events and distributed through an SNS topic. You have decided to ensure that events that can not be processed are sent to a DLQ. In which service should you set up the DLQ? A: Lambda function

the invocation is asynchronous (coming from the SNS topic) so the DLQ has to be set on the Lambda side

Question 2: You have created an architecture including CloudFront with WAF, Shield, an ALB, and EC2 instances. You would like to block an IP, where should you do it? A: WAF

Question 3: Your instances are deployed in an EC2 placement group of type cluster in order to perform HPC. You would like to maximize network performance between your instances. What should you use? A: Elastic Fabric Adapter

Section 24: Other Services

Continuous Integration

Developers push the code to a code repository often (GitHub / CodeCommit / Bitbucket / etc…)
A testing / build server checks the code as soon as it’s pushed (CodeBuild / Jenkins CI / etc…)
The developer gets feedback about the tests and checks that have passed / failed
Find bugs early, fix bugs
Deliver faster as the code is tested
Deploy often
Happier developers, as they’re unblocked

Continuous Delivery

Ensure that the software can be released reliably whenever needed.
Ensures deployments happen often and are quick
Shift away from “one release every 3 months” to “5 releases a day”
That usually means automated deployment
- CodeDeploy
- Jenkins CD
- Spinnaker
- Etc…

Infrastructure as Code

Currently, we have been doing a lot of manual work
All this manual work will be very tough to reproduce:
In another region
in another AWS account
Within the same region if everything was deleted
Wouldn’t it be great, if all our infrastructure was… code?
That code would be deployed and create / update / delete our infrastructure

What is CloudFormation

CloudFormation is a declarative way of outlining your AWS Infrastructure, for any resources (most of them are supported).
For example, within a CloudFormation template, you say:
- I want a security group
- I want two EC2 machines using this security group
- I want two Elastic IPs for these EC2 machines
- I want an S3 bucket
- I want a load balancer (ELB) in front of these machines
Then CloudFormation creates those for you, in the right order, with the exact configuration that you specify

Benefits of AWS CloudFormation (1/2)

Infrastructure as code
No resources are manually created, which is excellent for control
The code can be version controlled for example using git
Changes to the infrastructure are reviewed through code
Cost
Each resource within the stack is tagged with an identifier so you can easily see how much a stack costs you
You can estimate the costs of your resources using the CloudFormation template
Savings strategy: In Dev, you could automate deleting the stacks built from your templates at 5 PM and re-creating them at 8 AM, safely

Benefits of AWS CloudFormation (2/2)

Productivity
Ability to destroy and re-create an infrastructure on the cloud on the fly
Automated generation of Diagram for your templates!
Declarative programming (no need to figure out ordering and orchestration)
Separation of concern: create many stacks for many apps, and many layers. Ex:
VPC stacks
Network stacks
App stacks
Don’t re-invent the wheel
Leverage existing templates on the web!
Leverage the documentation

How CloudFormation Works

Templates have to be uploaded in S3 and then referenced in CloudFormation
To update a template, we can’t edit previous ones. We have to reupload a new version of the template to AWS
Stacks are identified by a name
Deleting a stack deletes every single artifact that was created by CloudFormation.

Deploying CloudFormation templates

Manual way:
Editing templates in the CloudFormation Designer
Using the console to input parameters, etc
Automated way:
Editing templates in a YAML file
Using the AWS CLI (Command Line Interface) to deploy the templates
Recommended way when you fully want to automate your flow

CloudFormation Building Blocks

Templates components

Resources: your AWS resources declared in the template (MANDATORY)
Parameters: the dynamic inputs for your template
Mappings: the static variables for your template
Outputs: References to what has been created
Conditionals: List of conditions to perform resource creation
Metadata
Templates helpers:
References
Functions
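
To make the building blocks concrete, here is a minimal sketch of a template expressed as a Python dictionary and printed as JSON (CloudFormation accepts JSON as well as YAML). The logical IDs, the `Environment` parameter, and the instance types in the mapping are assumptions chosen for illustration; the template has not been deployed as part of these notes.

```python
import json


def build_template() -> dict:
    """Small template touching Parameters, Mappings, Conditions, Resources, Outputs."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative stack: one instance behind a security group",
        "Parameters": {  # dynamic inputs
            "ImageId": {"Type": "AWS::EC2::Image::Id"},
            "Environment": {"Type": "String", "AllowedValues": ["dev", "prod"], "Default": "dev"},
        },
        "Mappings": {  # static variables
            "EnvToInstance": {"dev": {"Type": "t2.micro"}, "prod": {"Type": "t2.large"}},
        },
        "Conditions": {
            "IsProd": {"Fn::Equals": [{"Ref": "Environment"}, "prod"]},
        },
        "Resources": {  # the only mandatory section
            "AppSecurityGroup": {
                "Type": "AWS::EC2::SecurityGroup",
                "Properties": {"GroupDescription": "Security group for the app instance"},
            },
            "AppInstance": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": {"Ref": "ImageId"},
                    "InstanceType": {"Fn::FindInMap": ["EnvToInstance", {"Ref": "Environment"}, "Type"]},
                    "SecurityGroupIds": [{"Fn::GetAtt": ["AppSecurityGroup", "GroupId"]}],
                },
            },
            "AppElasticIp": {
                "Type": "AWS::EC2::EIP",
                "Condition": "IsProd",  # only created when Environment is prod
                "Properties": {"InstanceId": {"Ref": "AppInstance"}},
            },
        },
        "Outputs": {
            "InstanceId": {"Value": {"Ref": "AppInstance"}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_template(), indent=2))
```

Note that the instance refers to the security group through `Fn::GetAtt`, and the Elastic IP refers to the instance through `Ref`; these references are what let CloudFormation work out the creation order on its own.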

Note:

This is an introduction to CloudFormation

It can take over 3 hours to properly learn and master CloudFormation
This lecture is meant so you get a good idea of how it works
The exam expects you to understand how to read CloudFormation

[SAA-C02]

CloudFormation - StackSets

Create, update, or delete stacks across multiple accounts and regions with a single operation
Administrator account to create StackSets
Trusted accounts to create, update, delete stack instances from StackSets
When you update a stack set, all associated stack instances are updated throughout all accounts and regions.

[*]

AWS ECS – Elastic Container Service

ECS is a container orchestration service
ECS helps you run Docker containers on EC2 machines
ECS is complicated, and made of:
- “ECS Core”: Running ECS on user-provisioned EC2 instances
- Fargate: Running ECS tasks on AWS-provisioned compute (serverless)
- EKS: Running ECS on AWS-powered Kubernetes (running on EC2)
- ECR: Docker Container Registry hosted by AWS
ECS & Docker are very popular for microservices
For now, for the exam, only “ECS Core” & ECR are in scope
IAM security and roles at the ECS task level

What’s Docker?

Docker is a “container technology”
Run a containerized application on any machine with Docker installed
Containers allow our application to work the same way anywhere
Containers are isolated from each other
Control how much memory / CPU is allocated to your container
Ability to restrict network rules
More efficient than Virtual machines
Scale containers up and down very quickly (seconds)

AWS ECS – Use cases

Run microservices
Ability to run multiple docker containers on the same machine
Easy service discovery features to enhance communication
Direct integration with Application Load Balancers
Auto scaling capability
Run batch processing / scheduled tasks
Schedule ECS containers to run on On-demand / Reserved / Spot instances
Migrate applications to the cloud
Dockerize legacy applications running on premise
Move Docker containers to run on ECS

[*]

AWS ECS – ALB integration

Application Load Balancer (ALB) has a direct integration feature with ECS called “dynamic port mapping”
This allows you to run multiple instances of the same application on the same EC2 machine
Use cases:
- Increased resiliency even if running on one EC2 instance
- Maximize utilization of CPU / cores
- Ability to perform rolling upgrades without impacting application uptime

AWS ECS – ECS Setup & Config file

Run an EC2 instance, install the ECS agent with ECS config file
Or use an ECS-ready Linux AMI (still need to modify config file)
ECS Config file is at /etc/ecs/ecs.config
- ECS_CLUSTER=
- ECS_ENGINE_AUTH_DATA=
- ECS_AVAILABLE_LOGGING_DRIVERS=
- ECS_ENABLE_TASK_IAM_ROLE=true

[SAA-C02]

ECS - IAM Task Roles

The EC2 instance should have an IAM role allowing it to access the ECS service (for the ECS agent)
Each ECS task should have an ECS IAM task role to perform their API calls
Use the “taskRoleArn” parameter in a task definition
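
As a sketch of where `taskRoleArn` sits, the snippet below builds a minimal task definition as a Python dictionary and prints it as JSON. The family name, image, memory value, and role ARN are placeholders invented for the example. Setting `hostPort` to 0 asks ECS to pick a host port at run time, which is what lets several copies of the same task share one EC2 instance behind an ALB.

```python
import json


def task_definition(family: str, image: str, task_role_arn: str, container_port: int = 80) -> dict:
    """Minimal ECS task definition with a task-level IAM role."""
    return {
        "family": family,
        "taskRoleArn": task_role_arn,  # role assumed by the containers for their API calls
        "containerDefinitions": [
            {
                "name": family,
                "image": image,
                "memory": 256,
                "essential": True,
                "portMappings": [
                    # hostPort 0 = let ECS assign a free host port dynamically
                    {"containerPort": container_port, "hostPort": 0, "protocol": "tcp"}
                ],
            }
        ],
    }


if __name__ == "__main__":
    definition = task_definition(
        family="web-app",  # placeholder values below, for illustration only
        image="123456789012.dkr.ecr.us-east-1.amazonaws.com/web-app:latest",
        task_role_arn="arn:aws:iam::123456789012:role/WebAppTaskRole",
    )
    print(json.dumps(definition, indent=2))
```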

[SAA-C02]

Fargate

When launching an ECS Cluster, we have to create our EC2 instances
If we need to scale, we need to add EC2 instances
So we manage infrastructure…

With Fargate, it’s all Serverless!
We don’t provision EC2 instances
We just create task definitions, and AWS will run our containers for us
To scale, just increase the task number. Simple! No more EC2 :)

[SAA-C02]

Amazon EKS Overview

Amazon EKS stands for Amazon Elastic Kubernetes Service
It is a way to launch managed Kubernetes clusters on AWS
Kubernetes is an open-source system for automatic deployment, scaling and management of containerized (usually Docker) applications
It’s an alternative to ECS, similar goal but different API
EKS supports EC2 if you want to deploy worker nodes, or Fargate to deploy serverless containers
Use case: if your company is already using Kubernetes on-premises or in another cloud, and wants to migrate to AWS using Kubernetes

AWS Step Functions

Build serverless visual workflow to orchestrate your Lambda functions
Represent flow as a JSON state machine
Features: sequence, parallel, conditions, timeouts, error handling…
Can also integrate with EC2, ECS, On premise servers, API Gateway
Maximum execution time of 1 year
Possibility to implement human approval feature
Use cases:
- Order fulfillment
- Data processing
- Web applications
- Any workflow
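
To give an idea of what a JSON state machine looks like, the sketch below builds a small order-fulfillment workflow in Amazon States Language as a Python dictionary, with a retry policy, an error handler, and an overall timeout. The state names and Lambda ARNs are hypothetical; `dangling_targets` is an illustrative helper that checks every transition points at a state that exists.

```python
import json


def order_workflow(charge_arn: str, ship_arn: str) -> dict:
    """Two Lambda tasks in sequence, with retry, catch and a global timeout."""
    return {
        "Comment": "Illustrative order fulfillment workflow",
        "StartAt": "ChargeCard",
        "TimeoutSeconds": 3600,
        "States": {
            "ChargeCard": {
                "Type": "Task",
                "Resource": charge_arn,
                "Retry": [
                    {"ErrorEquals": ["States.TaskFailed"], "IntervalSeconds": 2,
                     "MaxAttempts": 3, "BackoffRate": 2.0}
                ],
                "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "OrderFailed"}],
                "Next": "ShipOrder",
            },
            "ShipOrder": {"Type": "Task", "Resource": ship_arn, "End": True},
            "OrderFailed": {"Type": "Fail", "Error": "OrderFailed", "Cause": "Payment step failed"},
        },
    }


def dangling_targets(machine: dict) -> set[str]:
    """Names referenced by StartAt / Next / Catch that are not defined states."""
    states = machine["States"]
    targets = {machine["StartAt"]}
    for state in states.values():
        if "Next" in state:
            targets.add(state["Next"])
        for catcher in state.get("Catch", []):
            targets.add(catcher["Next"])
    return targets - set(states)


if __name__ == "__main__":
    machine = order_workflow(
        "arn:aws:lambda:us-east-1:123456789012:function:charge-card",  # placeholder ARNs
        "arn:aws:lambda:us-east-1:123456789012:function:ship-order",
    )
    assert not dangling_targets(machine)
    print(json.dumps(machine, indent=2))
```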

Quick word on Chef / Puppet

They help with managing configuration as code
Helps in having consistent deployments
Works with Linux / Windows
Can automate: user accounts, cron, ntp, packages, services…
They leverage “Recipes” or “Manifests”
Chef / Puppet have similarities with SSM / Beanstalk / CloudFormation but they’re open-source tools that work cross-cloud

Other Services: Cheat Sheet

Here’s a quick cheat-sheet to remember all these services:

CodeCommit: service where you can store your code. Similar service is GitHub

CodeBuild: build and testing service in your CICD pipelines

CodeDeploy: deploy the packaged code onto EC2 and AWS Lambda

CodePipeline: orchestrate the actions of your CICD pipelines (build stages, manual approvals, many deploys, etc)

CloudFormation: Infrastructure as Code for AWS. Declarative way to manage, create and update resources.

ECS (Elastic Container Service): Docker container management system on AWS. Helps with creating micro-services.

ECR (Elastic Container Registry): Docker images repository on AWS. Docker Images can be pushed and pulled from there

Step Functions: Orchestrate / Coordinate Lambda functions and ECS containers into a workflow

SWF (Simple Workflow Service): Old way of orchestrating a big workflow.

EMR (Elastic Map Reduce): Big Data / Hadoop / Spark clusters on AWS, deployed on EC2 for you

Glue: ETL (Extract Transform Load) service on AWS

OpsWorks: managed Chef & Puppet on AWS

ElasticTranscoder: managed media (video, music) converter service into various optimized formats

Organizations: hierarchy and centralized management of multiple AWS accounts

Workspaces: Virtual Desktop on Demand in the Cloud. Replaces traditional on-premise VDI infrastructure

AppSync: GraphQL as a service on AWS

SSO (Single Sign On): One login managed by AWS to log in to various business SAML 2.0-compatible applications (office 365 etc)

Other Services: Quiz

Question 1: You are looking for a service to store docker images in AWS. Which one do you recommend? A: ECR

Amazon Elastic Container Registry (ECR) is a fully-managed Docker container registry that makes it easy for developers to store, manage, and deploy Docker containers

Question 2: You would like to find a managed service in AWS as an alternative to GitLab, in order to version control your code entirely in AWS. Which technology do you recommend? A: CodeCommit

CodeCommit is used to store and version control your code and as such, it’s an alternative to GitLab and GitHub

Question 3: As part of your disaster recovery strategy, you would like to make sure your entire infrastructure is code, so that you can easily re-deploy it in any region. Which service do you recommend? A: CloudFormation

CloudFormation is the de-facto service in AWS for infrastructure as code.

Question 4: You need to manage a fleet of Docker containers in the cloud, which service do you recommend? A: ECS

ECS is a container orchestrator service and the correct service to manage a fleet of Docker containers in the cloud

Question 5: You would like to orchestrate your CICD pipeline to deliver all the way to Elastic Beanstalk. Which service do you recommend? A: CodePipeline

CodePipeline is a CICD orchestration service, and has an integration with Elastic Beanstalk

Question 6: You need to deploy your code to a fleet of EC2 instances with a specific strategy. Which technology do you recommend? A: CodeDeploy

When deploying code directly onto EC2 instances or On Premise servers, CodeDeploy is the service to use. You can define the strategy (how fast the rollout of the new code should be)

Question 7: You have a Jenkins CI build server hosted on premise and you would like to de-commission it and replace it by a managed service on AWS. Which service do you recommend? A: CodeBuild

CodeBuild is an alternative to Jenkins

Question 8: You need to orchestrate a series of AWS Lambda functions into a workflow. Which service do you recommend? A: Step Functions
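
A Step Functions workflow is described in the Amazon States Language, a JSON document of named states. The sketch below builds a two-step definition that runs one Lambda function after another. The function name `build_state_machine`, the state names, and the ARNs are placeholders assumed for this example.

```python
import json


def build_state_machine(first_lambda_arn: str, second_lambda_arn: str) -> dict:
    """Return an Amazon States Language definition chaining two Lambda tasks."""
    return {
        "Comment": "Illustrative two-step workflow",
        "StartAt": "FirstStep",
        "States": {
            # Each Task state invokes the Lambda function given in "Resource".
            "FirstStep": {
                "Type": "Task",
                "Resource": first_lambda_arn,
                "Next": "SecondStep",
            },
            "SecondStep": {
                "Type": "Task",
                "Resource": second_lambda_arn,
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    # Placeholder ARNs; replace with real function ARNs before use.
    definition = build_state_machine(
        "arn:aws:lambda:us-east-1:123456789012:function:first-step",
        "arn:aws:lambda:us-east-1:123456789012:function:second-step",
    )
    print(json.dumps(definition, indent=2))
```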

Question 9: You are looking to create a Hadoop cluster to perform Big Data Analysis. Which service do you recommend using? A: EMR (Elastic MapReduce)

EMR is the AWS way of creating a Hadoop cluster with the tools of your choosing.

Question 10: You are looking to move data all around your AWS databases using a managed ETL service that has a metadata catalog feature. Which one do you recommend? A: Glue

Glue is an ETL service

Question 11: Your company is already using Chef recipes to manage its infrastructure. You would like to move to the AWS cloud and keep on using Chef. What service do you recommend? A: OpsWorks

Question 12: You work for a consulting company which has recently decided to create video training content for their clients. They would like to view the videos on different devices such as iPhone, iPad, Web browsers. Which service do you recommend to convert the videos? A: Elastic Transcoder

Question 13: Your organization would like to create various accounts to physically separate their dev, test and production environments. Your IT lead would still like to manage these environments centrally for billing purposes, in order for management to be simple. Which service do you recommend? A: Organizations

AWS Organizations allows you to create multiple AWS accounts and centralize them around a single organization for simplified and unified billing.

Question 14: You have a VDI (Virtual Desktop Infrastructure) on premise and as a solution architect, you would like to optimize maintenance and management cost by switching to virtual desktops on the AWS Cloud. Which service do you recommend? A: Workspaces

Amazon WorkSpaces is a managed, secure cloud desktop service. You can use Amazon WorkSpaces to provision either Windows or Linux desktops

Question 15: Your developers are creating a mobile application and would like to have a managed GraphQL backend. Which service do you recommend? A: AppSync

Question 16: You are deploying your application on an ECS cluster made of EC2 instances. The cluster is hosting one application that has been issuing API calls to DynamoDB successfully. Upon adding a second application, which issues API calls to S3, you are getting authorization issues. What should you do to resolve the problem and ensure proper security? A: Create an IAM task role for the new application
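
The idea behind the task role answer is that each ECS task definition gets its own IAM role, so the second application receives S3 permissions without widening the role shared by the EC2 instances. A minimal sketch of the two policy documents involved is shown below. The helper names and the bucket name are hypothetical, and the read-only S3 actions are an assumption, since the question does not say which S3 calls the application makes.

```python
import json


def task_role_trust_policy() -> dict:
    """Trust policy that lets ECS tasks assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ecs-tasks.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def s3_read_policy(bucket_name: str) -> dict:
    """Permissions policy granting read access to one bucket (assumed scope)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }


if __name__ == "__main__":
    # "example-app-bucket" is a placeholder bucket name.
    print(json.dumps(task_role_trust_policy(), indent=2))
    print(json.dumps(s3_read_policy("example-app-bucket"), indent=2))
```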

Section 25: WhitePapers and Architectures - AWS Certified Solutions Architect Associate

Well Architected Framework General Guiding Principles

Stop guessing your capacity needs
Test systems at production scale
Automate to make architectural experimentation easier
Allow for evolutionary architectures
- Design based on changing requirements
Drive architectures using data
Improve through game days
- Simulate applications for flash sale days

Well Architected Framework

6 Pillars
1) Operational Excellence
2) Security
3) Reliability
4) Performance Efficiency
5) Cost Optimization
6) Sustainability (added to the framework in late 2021)
They are not something to balance, or trade-offs; they’re a synergy

Well Architected Framework

It’s also questions!
Let’s look into the Well-Architected Tool
https://console.aws.amazon.com/wellarchitected

1) Operational Excellence

Includes the ability to run and monitor systems to deliver business value and to continually improve supporting processes and procedures
Design Principles
- Perform operations as code - Infrastructure as code
- Annotate documentation - Automate the creation of annotated documentation after every build
- Make frequent, small, reversible changes - So that in case of any failure, you can reverse it
- Refine operations procedures frequently - And ensure that team members are familiar with it
- Anticipate failure
- Learn from all operational failures

WhitePaper Quiz Quiz 24|1 question

Question 1: You would like to get AWS recommendations on potential cost savings, performance, and service limits, among other things. Which service do you recommend? A: Trusted Advisor

aws configure --profile

aws configure set default.s3.signature_version s3v4

Not working

Lecture 100

UTILS

AWS Policy Generator

https://awspolicygen.s3.amazonaws.com/policygen.html

CloudWatch

AWS CloudWatch Metrics

CloudWatch Custom Metrics

CloudWatch Dashboards

CloudWatch Logs Subscriptions

Alarms

Alarm Targets - 👀 EXAM

EC2 Instance Recovery

CloudWatch Synthetics

Reference

Amazon EventBridge (formerly CloudWatch Events)

Service Quotas CloudWatch Alarms

Alternative: Trusted Advisor CW Alarms

👀 For each production EC2 instance, create an Amazon CloudWatch alarm for Status Check Failed: System. Set the alarm action to recover the EC2 instance. Configure the alarm notification to be published to an Amazon Simple Notification Service (Amazon SNS) topic.
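
A minimal sketch of the alarm described above, expressed as the parameters one would pass to the CloudWatch `PutMetricAlarm` API (for example through boto3's `put_metric_alarm`). The helper name, the alarm name, and the period and evaluation values are assumptions chosen for illustration; the recover action uses the `arn:aws:automate:<region>:ec2:recover` form.

```python
def recovery_alarm_params(instance_id: str, region: str, sns_topic_arn: str) -> dict:
    """Build PutMetricAlarm parameters that recover an instance and notify SNS."""
    return {
        "AlarmName": f"recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,            # seconds; assumed value
        "EvaluationPeriods": 2,  # assumed value
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # First action recovers the instance, second publishes to the SNS topic.
        "AlarmActions": [
            f"arn:aws:automate:{region}:ec2:recover",
            sns_topic_arn,
        ],
    }


if __name__ == "__main__":
    # Placeholder instance ID and topic ARN.
    params = recovery_alarm_params(
        "i-0123456789abcdef0",
        "us-east-1",
        "arn:aws:sns:us-east-1:123456789012:ops-alerts",
    )
    for key, value in params.items():
        print(f"{key}: {value}")
```

With boto3 installed and credentials configured, these parameters could be passed as `cloudwatch.put_metric_alarm(**params)`, once per production instance.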

Status Check

Status Checks - CW Metrics & Recovery - 👀 EXAM

Determine which instances use the most bandwidth

Identify the processing power required

Number of users.

RAMUtilization is NOT available as an EC2 metric

5xx server errors

4xx

Events

EBS Snapshots

Filters - QUESTION

Agents

Install Agents to track the state of each of the instances

Publish custom metrics to CloudWatch.

Collect process metrics with the procstat plugin
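
As a hedged sketch, the CloudWatch agent configuration file enables `procstat` under `metrics.metrics_collected`. The snippet below builds such a fragment for one process; the helper name and the process name `nginx` are placeholders, and the two measurements are just an example selection.

```python
import json


def procstat_agent_config(process_name: str) -> dict:
    """Return a CloudWatch agent config fragment that monitors one process."""
    return {
        "metrics": {
            "metrics_collected": {
                "procstat": [
                    {
                        # "exe" selects processes whose name matches this string.
                        "exe": process_name,
                        "measurement": ["cpu_usage", "memory_rss"],
                    }
                ]
            }
        }
    }


if __name__ == "__main__":
    # "nginx" is a placeholder process name.
    print(json.dumps(procstat_agent_config("nginx"), indent=2))
```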

Dashboard Body Structure and Syntax - EXAM

CloudTrail

CloudTrail Insights

CloudTrail - Integration with EventBridge

CloudTrail - Organizations Trails

CloudTrail - Log File Integrity Validation

Q. To ensure that SysOps administrators can easily verify that the CloudTrail log files have not been deleted or changed, the following action should be taken:

👀 AWS Config

Config Rules

Config Rules - Remediations

AWS Config Auto Remediation

Config Rules - Notifications

AWS Config - Aggregators

CloudWatch vs CloudTrail vs Config

AWS Task Orchestrator and Executor (AWSTOE) - 👀 EXAM

AWS Artifact - 👀 EXAM

RDS

Advantage over using RDS versus deploying

RDS Read Replicas for read scalability

RDS Read Replicas - Network Cost

RDS Multi AZ (Disaster Recovery)

Lambda in VPC

RDS Proxy for AWS Lambda

DB Parameter Groups

RDS Events & Event Subscriptions

RDS with CloudWatch

RDS storage autoscaling - 👀 EXAM

Amazon Aurora DB

Aurora High Availability and Read Scaling

RDS & Aurora Security

Aurora for SysOps

Connect to Amazon Aurora DB cluster from outside a VPC

Aurora Replicas - TODO

Metrics to generate reports on the Aurora DB Cluster and its replicas

Aurora Reader Endpoint - 👀 EXAM

Amazon ElastiCache Overview

ElastiCache Replication (Redis): Cluster Mode Disabled

ElastiCache Replication: Cluster Mode Enabled

Memcached

Fix high Memcached evictions

VPC

Configuration

CIDR - IPv4

Public vs. Private IP (IPv4)

VPC in AWS - IPv4

VPC - Subnet (IPv4)

Internet Gateway (IGW)

Bastion Hosts

Use AWS CloudFormation StackSets for Multiple Accounts in an AWS Organization:

QUESTION: To launch the latest AMI.

UpdatePolicy attribute - 👀 EXAM

👀 DependsOn attribute

👀 QUESTION: For the production account, a SysOps administrator must ensure that all data is backed up daily for all current and future Amazon EC2 instances and Amazon Elastic File System (Amazon EFS) file systems. Backups must be retained for 30 days.

AWS Backup does not reboot EC2 instances - 👀 QUESTION
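
Assuming AWS Backup is the intended service for the daily, 30-day-retention requirement above, a backup plan and a resource selection could look roughly like the sketch below. The helper names, plan and rule names, the schedule time, and the wildcard ARN patterns are assumptions for illustration; the dictionaries follow the shape accepted by the AWS Backup `CreateBackupPlan` and `CreateBackupSelection` APIs.

```python
def daily_backup_plan(vault_name: str = "Default") -> dict:
    """Backup plan with one rule: run daily, delete recovery points after 30 days."""
    return {
        "BackupPlanName": "daily-30-day-retention",  # hypothetical name
        "Rules": [
            {
                "RuleName": "daily",
                "TargetBackupVaultName": vault_name,
                # Every day at 05:00 UTC; the time of day is an assumed value.
                "ScheduleExpression": "cron(0 5 ? * * *)",
                "Lifecycle": {"DeleteAfterDays": 30},
            }
        ],
    }


def all_ec2_and_efs_selection(iam_role_arn: str) -> dict:
    """Selection covering every EC2 instance and EFS file system in the account."""
    return {
        "SelectionName": "all-ec2-and-efs",  # hypothetical name
        "IamRoleArn": iam_role_arn,
        # Wildcards also match resources created after the plan exists.
        "Resources": [
            "arn:aws:ec2:*:*:instance/*",
            "arn:aws:elasticfilesystem:*:*:file-system/*",
        ],
    }


if __name__ == "__main__":
    # Placeholder role ARN; AWS Backup needs a role it is allowed to assume.
    print(daily_backup_plan())
    print(all_ec2_and_efs_selection(
        "arn:aws:iam::123456789012:role/example-backup-role"
    ))
```

Selecting by resource type rather than by individual ID is what lets the plan pick up future instances and file systems without further changes; tag-based selection is another common option.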