☁️ AWS Guide

Updated at 2018-12-02 20:42

This note is an overview of the most important aspects of Amazon Web Services. AWS is a collection of 40+ products (online services) offered by Amazon and most of them are billed by usage.

Product Overview

AWS products can be divided into three classes by how much control they give relative to how easy they are to use. Low-level products give more control, but are harder to maintain. Mid-level products offer a balance between control and ease of use, usually utilizing lower-level AWS products. High-level products offer less control, but are a lot easier to maintain, always utilizing lower-level AWS products.

Elastic Beanstalk = ELB + EC2 + CodeDeploy + RDS/DynamoDB

Low-level AWS products:

IAM: users, roles and permissions for developers and applications.
VPC: private networks for servers.
EC2: virtual servers, the meat of AWS.
S3: file storage for files that are accessed frequently.
Glacier: file storage for files that are rarely read, like backups and logs. Cheap to store, costly to read.
Route53: domain name registration, DNS and health checks.
SES: sending emails, good for app emails, bad for marketing.
SQS: storing messages to queues.
CloudWatch: monitoring, alerts and logging for AWS products.
CloudTrail: logging of AWS API calls.
Config: records and audits the configuration of other AWS resources.
API Gateway: proxy APIs through this for better management.

Mid-level AWS products:

CodeCommit: hosted Git repositories like GitHub or GitLab, but with a more limited web interface. Used only for automation, not collaboration.
CodeDeploy: automated code deployments from GitHub or CodeCommit, like mini-Heroku.
CodePipeline: run automated tests on your code, like mini-Jenkins.
EC2 Container Service: run Docker images on EC2 instances.
OpsWorks: handles running your application with things like auto-scaling. The closest thing to Heroku that AWS has. OpsWorks is harder to set up, but allows much more configuration.
CloudFormation: creating templates for AWS configurations like how many instances, types, auto scaling groups etc.
Lake Formation: creating templates for data lake services.
CloudFront: content delivery service (CDN). Serving and streaming files from multiple geolocations. Should be used by all web applications.
CloudSearch: allows querying data in S3 or in multiple databases.
Data Pipeline: extract and transform data from elsewhere in AWS to S3 or databases.
SNS: send mobile push notifications, emails and SMS messages, like Twilio. Targeted notifications by country or device are available.
RDS: provides hosted relational databases like PostgreSQL.
Aurora: provides hosted MySQL/PostgreSQL-compatible relational database.
Aurora Global Database: Aurora replicated over multiple AWS regions.
Neptune: provides hosted graph database.
DynamoDB: provides a hosted NoSQL database, similar to MongoDB.
ElastiCache: in-memory caches, similar to Redis or Memcached.
Redshift: data warehouses for big data.
Athena: allows making SQL queries against your S3 data.
Managed Blockchain: make blockchain networks (Hyperledger Fabric, Ethereum)
App Mesh: allows monitoring and managing microservices.

High-level AWS products:

Lightsail: simple way to start an open source web application on AWS e.g. WordPress, Drupal or Magento.
Lambda: very high-level backend for stateless APIs, usually Node.
Elastic Beanstalk: high-level backend for web applications, like Heroku; cheaper, but stripped down and harder to set up.
Landing Zone: helps to set up secure multi-account AWS environments.
Control Tower: automates setting up secure multi-account AWS environments.
Security Hub: comprehensive view of security alerts and compliance status across AWS accounts.
Cognito: high-level auth backend for mobile apps, OAuth as a Service with optional 20 MB of metadata for the user. Provides things like Facebook auth.
Device Farm: allows testing your app on various Android devices.
Mobile Analytics: tracks what your mobile app users are doing, basic integration provides a lot of metrics out of the box like MAU, DAU, New Users, Revenue and Retention. Can easily be piped to Redshift for analysis.
Directory Service: allows connecting to a Windows Active Directory with AWS.
WorkSpaces: virtual Windows desktops.
AppStream: high-level backend for applications with streamed data.
Timestream: time series database service for IoT and industrial applications.
Quantum Ledger Database: ledger database for immutable transaction logs.
Elastic Map Reduce: big data analysis, analyze massive text files from S3.
Elastic Transcoder: convert and package digital media to other formats.
Textract: extract data from scanned documents.
Comprehend: natural language processing service to find information and relationships in text.
Machine Learning: predict future behavior from existing data.
Forecast: time-series forecasting service.
Personalize: recommendation and personalization generation service.
Elastic Inference: attach GPUs to machines to boost deep learning inference.
Inferentia: machine learning inference chip, similar to Google TPUs.
SageMaker: Jupyter Notebook hosting service for data scientists. Also includes some workload management.
SageMaker Neo: compile a trained neural model to an optimized executable.
SageMaker Ground Truth: human labeler access, workflow and management.

Many of the services have "subservices."

EC2 => Security Group (SG), Elastic Load Balancer (ELB), Elastic IP (EIP).

Services are global, regional or zonal. Tied to account (global), tied to region (regional) or tied to availability zone (zonal). Global services are more fault-tolerant by default.

Global services:   IAM, Route 53, CloudFront
Regional services: S3, DynamoDB, VPC
Zonal services:    EC2, RDS

Many services are highly available by default. The resource recovers after a slight downtime.

RDS (with Multi-AZ, goes down with the zone)
VPC subnet (goes down with the zone)
EBS (goes down with the zone)
EC2 (with alarm recovery, goes down with the zone)
ELB (on one zone)

Some services are fault-tolerant by default. Resource is duplicated so that a single failure doesn't bring down the service.

ELB (on at least two zones), VPC, S3, DynamoDB, ASG, SQS

Architecture Design

Web Application Tiers

When designing your AWS architecture, it is good to divide your solution into these tiers and see if you have everything covered. Most of the tiers can be covered by a single service e.g. you can have all of the tiers covered by a single EC2 instance (not recommended though).

Load Balancing Tier: Divides traffic to multiple web servers, enabling horizontal scaling. Elastic Load Balancer or EC2 with nginx.
Web Server Tier: Serves files and routes traffic to right endpoint. EC2 with nginx.
App Tier: Handles the main computational process. EC2 with the application.
Cache Tier: Stores commonly requested data for faster access e.g. user session. EC2 with Redis or ElastiCache.
Database Tier: Stores all persistent data e.g. user profile. EC2 with PostgreSQL or RDS.
Other Tiers: Optional but recommended tiers e.g. single server to maintain secure connections to instances or to offer continuous integration for your application.

The most common AWS usage pattern is:

Route53
    CloudFront -> S3
    ELB        -> 2xEC2 -> 2xRDS+

If you are going to do load testing on your AWS resources, remember to notify AWS first:

http://aws.amazon.com/aup/
http://aws.amazon.com/security/penetration-testing/

Load testing tools:

siege (Siege), found in siege package
- siege -c50 -d10 -t3M http://example.com
ab (Apache Bench), found in httpd-tools package
- ab -n 10000 -c 2 http://example.com

Tagging

Use a consistent naming scheme for AWS resources:

S3 buckets and files are dash-delimited e.g. my-bucket, my-pic.jpg
Tag keys are colon- and dash-delimited e.g. ruksi:env or company-name:app
Tag values depend on the content, but I try to keep them PascalCase.
Everything else in PascalCase e.g. MainInstance, HulkRole, IAMUserSSHKeys vs AWSCodeCommitFullAccess
Keep abbreviations in uppercase e.g. VPNInstance, HTTPRouter, JenkinsSG

Pascal case is better because AWS uses dash as a delimiter, so sometimes you get situations where you are not sure where an identifier ends. Pascal case also aligns with AWS type notation like AWS::CloudFormation::Stack. S3 names should be dash-delimited because they are frequently accessed through HTTP and it's standard to use dash-delimited URLs.

# is this the Arcana stack's VPN Instance Profile or the Instance Profile of Arcana VPN?
arcana-vpn-instance-profile-V41XQFCSVPIK

# ah, got it, Instance Profile of Arcana VPN stack
ArcanaVPN-InstanceRole-V41XQFCSVPIK
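
The naming rules above can be checked mechanically before resources are created. This is a minimal sketch; the regular expressions are my reading of the conventions in this note, not rules enforced by AWS, and the function names are made up for illustration.

```python
import re

# One lowercase, dash-delimited segment, e.g. "company-name".
SEGMENT = r"[a-z0-9]+(?:-[a-z0-9]+)*"

DASHED = re.compile(rf"^{SEGMENT}$")                   # my-bucket
TAG_KEY = re.compile(rf"^{SEGMENT}(?::{SEGMENT})*$")   # ruksi:env, company-name:app
PASCAL = re.compile(r"^[A-Z][A-Za-z0-9]*$")            # MainInstance, VPNInstance


def is_bucket_name(name: str) -> bool:
    """Dash-delimited, lowercase, no dots."""
    return bool(DASHED.match(name))


def is_tag_key(key: str) -> bool:
    """Dash-delimited segments, optionally joined by colons."""
    return bool(TAG_KEY.match(key))


def is_pascal_case(name: str) -> bool:
    """Starts with an uppercase letter and has no delimiters."""
    return bool(PASCAL.match(name))


for candidate in ["MainInstance", "IAMUserSSHKeys", "arcana-vpn-instance-profile"]:
    print(candidate, is_pascal_case(candidate))
```

The Pascal case check only rules out delimiters; it cannot tell whether the word boundaries are capitalized sensibly.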

Always tag your resources. Without tagging, it will become impossible to find out where a resource came from. Especially remember to add tags to your CloudFormation templates. This becomes very important when trying to make sense of your bills.

Some examples:
Key         Value Example #1    Value Example #2            Value Example #3
system      eng-blog-proto      eng-blog-staging-alpha      eng-blog-production
infra       holystore-demo      holystore-staging-pluto     holystore-production
apparatus   gurgle-proto        gurgle-staging-hermes       gurgle-production

Hints:
- if possible, the first part e.g. `gurgle` should be the CF template name
- I personally use rarer words like apparatus as tag keys because AWS also adds some tags of its own
- don't name your staging just "staging", you might have multiple in the future,
  you can use Greek alphabet, planets, gods, animals, plants; take your pick.

Tagging allows creating resource groups. You can create resource groups through the header navigation AWS > Create Resource Group. Resources in the same resource group can be easily navigated using a single interface that will hide all unrelated resources. This is very important for debugging and understanding the infrastructure.

Have consistent resource group naming:
eng-blog-rg
holystore-rg
gurgle-rg
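
The tag values and resource group names above follow one pattern, so they can be generated instead of typed by hand. A small sketch; the helper names `tag_value` and `resource_group_name` are hypothetical.

```python
from typing import Optional


def tag_value(name: str, stage: str, codename: Optional[str] = None) -> str:
    """Join the template name, the stage and an optional codename with dashes."""
    parts = [name, stage]
    if codename:
        parts.append(codename)
    return "-".join(parts)


def resource_group_name(name: str) -> str:
    """Resource groups share the template name plus an "-rg" suffix."""
    return f"{name}-rg"


print(tag_value("gurgle", "staging", "hermes"))
print(tag_value("eng-blog", "production"))
print(resource_group_name("holystore"))
```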

Tips

Plan your systems for scale. Everything should be scalable by adding an extra instance, even if it requires some additional work.

Don't manage your infrastructure manually. Use CloudFormation templates to define your infrastructure. Updating a template and applying it will also change the running stack.

https://github.com/AWSinAction/code/blob/master/chapter2/template.json
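
To give a feel for what such a template contains, here is a minimal sketch that builds one as a Python dictionary and prints it as JSON. The logical name `MainInstance`, the `system` tag key and the AMI id are placeholders chosen for illustration; a real template needs a valid AMI for your region.

```python
import json


def build_template(system: str, image_id: str, instance_type: str = "t2.micro") -> dict:
    """Return a CloudFormation template with one tagged EC2 instance."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"Single tagged EC2 instance for {system}",
        "Resources": {
            "MainInstance": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": image_id,
                    "InstanceType": instance_type,
                    # Tag everything so the bill can be traced back to a stack.
                    "Tags": [{"Key": "system", "Value": system}],
                },
            }
        },
    }


# "ami-00000000" is a placeholder, not a real image id.
template = build_template("eng-blog-proto", "ami-00000000")
print(json.dumps(template, indent=2))
```

The printed JSON can be saved to a file and used to create or update a stack.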

Avoid elastic IPs. Use servers behind a load balancer and balance them between availability zones.

Always launch your EC2 instances in an auto scaling group. Even if it's a single instance. Auto scaling groups provide:

Health checks
Automatic termination and recreation if unhealthy
Warnings on metrics like CPU usage
Potential for auto scaling later
Logical grouping of virtual machines
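
A single-instance auto scaling group is simply a group whose minimum and maximum size are both 1. The sketch below builds the two CloudFormation resources involved; the logical names, AMI id and subnet ids are placeholders. It uses a launch configuration, which was the usual choice when this note was written; newer setups tend to use launch templates instead.

```python
import json
from typing import Dict, List


def single_instance_asg(image_id: str, instance_type: str, subnet_ids: List[str]) -> Dict[str, dict]:
    """Return CloudFormation resources for an auto scaling group of exactly one instance."""
    return {
        "LaunchConfig": {
            "Type": "AWS::AutoScaling::LaunchConfiguration",
            "Properties": {"ImageId": image_id, "InstanceType": instance_type},
        },
        "MainGroup": {
            "Type": "AWS::AutoScaling::AutoScalingGroup",
            "Properties": {
                "LaunchConfigurationName": {"Ref": "LaunchConfig"},
                # Min == Max == 1: no scaling yet, but an unhealthy
                # instance is terminated and recreated automatically.
                "MinSize": "1",
                "MaxSize": "1",
                # Subnets in different availability zones let the
                # replacement instance start in another zone.
                "VPCZoneIdentifier": subnet_ids,
                "HealthCheckType": "EC2",
            },
        },
    }


# Placeholder ids, not real resources.
resources = single_instance_asg("ami-00000000", "t2.micro", ["subnet-aaaa", "subnet-bbbb"])
print(json.dumps(resources, indent=2))
```

Raising `MaxSize` later is all it takes to start scaling out.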

Never save application state in virtual machines. Use a remote database and S3 for files. Killing and bringing up virtual servers is common.

Never save logs in virtual machines. Use syslog or similar to send logs to a remote logging server.

Logs should be extra verbose. Add process id, timestamp, message, instance id, region, availability zone and environment. After a crash, the instance should be cleared so there is no way to inspect the cause any further than the logs, which is totally fine.
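
One way to get those fields into every log line is a logging filter that stamps each record with the instance context. This is a sketch using only the Python standard library; the instance id, region, zone and environment values are hypothetical and would normally come from instance metadata or configuration.

```python
import logging
import os

FORMAT = (
    "%(asctime)s pid=%(process)d env=%(environment)s region=%(region)s "
    "az=%(zone)s instance=%(instance_id)s %(levelname)s %(message)s"
)


class ContextFilter(logging.Filter):
    """Attach the instance context to every record so the formatter can use it."""

    def __init__(self, instance_id: str, region: str, zone: str, environment: str):
        super().__init__()
        self.context = {
            "instance_id": instance_id,
            "region": region,
            "zone": zone,
            "environment": environment,
        }

    def filter(self, record: logging.LogRecord) -> bool:
        for key, value in self.context.items():
            setattr(record, key, value)
        return True  # never drop a record, only annotate it


def build_logger(name: str, stream=None, **context: str) -> logging.Logger:
    logger = logging.getLogger(name)
    # StreamHandler keeps the sketch runnable anywhere; on a real instance
    # a handler that ships logs to a remote server would go here instead.
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(FORMAT))
    handler.addFilter(ContextFilter(**context))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


log = build_logger(
    "gurgle",
    instance_id="i-0123456789",  # hypothetical values
    region="eu-west-1",
    zone="eu-west-1a",
    environment="staging-hermes",
)
log.info("worker started")
```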

Periodically check your CloudWatch reports. CPU usage is usually nice to keep track of. Allows reducing EC2 instance tiers if underutilized.

Monitor how your systems work for your end-users. Don't keep your eyes peeled on CPU/GPU/memory; focus on live recordings and logs of how users and applications are using your services.

Use Amazon SDKs. Easy, fast and secure way to integrate other Amazon services e.g. S3 with your EC2.

Your aim is that you shouldn't have to SSH to your instance servers. If you have to access them with SSH, your automation has failed. Even OS updates should be handled automatically.

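Set smarter billing alerts. Instead of a single monthly limit, set one alert per week at the month-to-date spend you expect by the end of that week:

1-week allowance alert: $1000 month-to-date
2-week allowance alert: $2000 month-to-date
3-week allowance alert: $3000 month-to-date
If your 2-week alarm goes off before the 15th, you know something is happening.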
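
The weekly allowance alerts above are just the monthly budget split into cumulative steps. A small sketch, assuming a monthly budget of $4000 (an assumption; the note only lists the resulting thresholds) and linear spending over the month:

```python
import calendar
from datetime import date
from typing import List


def weekly_thresholds(monthly_budget: float, weeks: int = 4) -> List[float]:
    """Cumulative spend expected at the end of week 1, 2, ... up to the last full step."""
    weekly = monthly_budget / weeks
    return [round(weekly * n, 2) for n in range(1, weeks)]


def is_ahead_of_budget(spent: float, today: date, monthly_budget: float) -> bool:
    """True when month-to-date spend exceeds a linear share of the budget."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    expected = monthly_budget * today.day / days_in_month
    return spent > expected


print(weekly_thresholds(4000))
print(is_ahead_of_budget(2000, date(2018, 12, 10), 4000))
```

With these numbers, reaching $2000 on the 10th counts as ahead of budget, which is the situation where the 2-week alarm fires early.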

Use - instead of . in S3 bucket names for SSL. Dots will give you certificate errors.
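
The reason is that a wildcard in a certificate name covers exactly one DNS label, and a dot in the bucket name adds an extra label to the bucket's hostname. The sketch below illustrates this with a simplified matcher; it assumes the bucket is addressed as `<bucket>.s3.amazonaws.com` and that the certificate is issued for `*.s3.amazonaws.com`.

```python
def virtual_hosted_host(bucket: str) -> str:
    """Hostname used when the bucket name is part of the domain."""
    return f"{bucket}.s3.amazonaws.com"


def matches_wildcard(host: str, pattern: str = "*.s3.amazonaws.com") -> bool:
    """Simplified certificate name check: '*' stands for exactly one label."""
    host_labels = host.split(".")
    pattern_labels = pattern.split(".")
    if len(host_labels) != len(pattern_labels):
        return False
    return all(p == "*" or p == h for p, h in zip(pattern_labels, host_labels))


for bucket in ["my-bucket", "my.bucket"]:
    host = virtual_hosted_host(bucket)
    print(host, matches_wildcard(host))
```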

Use termination protection for non-auto-scaling EC2 instances. Stops anyone from accidentally deleting the instance.

Utilize your free health checks. Route 53 also has health checks similar to Pingdom, but they are free and you won't get a single marketing email.

First 50 inter-AWS basic checks are free.
External checks cost $0.75 / month per endpoint.
Additional features like HTTPS check, robust checking and string search are +$2 / month.
Supports IP, port and URL with path checks.
The endpoint receives a check roughly every 3 seconds by default, because multiple health checkers each probe it on a 30-second interval.
You can set notifications. First 1,000 emails are free, then they are $2.00 per 100,000. First 100 SMS are free, then they are $0.75 per 100.

Use Trusted Advisor. TA finds obvious performance problems and excessive permissions.

Enable weekly emails from Trusted Advisor. Check and fix the issues it reports: cost optimizations, performance issues, security issues and best practices.

Sources

AWS Tips I Wish I'd Known Before I Started
A Comprehensive Guide to Building a Scalable Web App on Amazon Web Services - Part 1
AWS in Plain English
5 AWS mistakes you should avoid
AWS in Action, Michael Wittig and Andreas Wittig
AWS re:invent newsletters