ruk·si

☁️ AWS
CloudWatch

Updated at 2016-06-17 12:34

CloudWatch allows logging custom events. Compared to other logging services, CloudWatch is the cheapest and most customizable by far. The downside is that you need to set it up yourself. Paired with Sentry, which handles email error notifications, I've found it to be a really good match for most of my projects.

Set a retention time for less important logs: CloudWatch > Logs > click the value in the "Expire Events After" column. I use one week for development and staging logs.
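The same retention setting can also be applied from the CLI. This is a minimal sketch: set_log_retention is a hypothetical helper name and my-app-staging is a made-up log group, while put-retention-policy is the actual CLI call. It assumes your default region and credentials are already configured.

```shell
# Sketch: set how long a log group keeps its events.
# set_log_retention is a hypothetical wrapper, not part of the AWS CLI.
set_log_retention() {
    group="$1"   # log group name
    days="$2"    # CloudWatch only accepts certain values, e.g. 1, 3, 5, 7, 14, 30
    aws logs put-retention-policy \
        --log-group-name "$group" \
        --retention-in-days "$days"
}

# Usage (needs AWS credentials, so it is left commented out):
# set_log_retention my-app-staging 7
```

Wrapping the call in a function makes it easy to loop over all development and staging groups.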

You can read logs with the AWS CLI. For frequent use, a library is the better choice.

aws logs get-log-events \
    --region us-west-1 \
    --log-group-name my-app-and-environment \
    --log-stream-name i-e337c525_my-process_92133c94-0889-47e0-9b5e-2c5aa2a81a1e

# To get more events, use the token from the previous get
aws logs get-log-events \
    --region us-west-1 \
    --log-group-name my-app-and-environment \
    --log-stream-name i-e337c525_my-process_92133c94-0889-47e0-9b5e-2c5aa2a81a1e \
    --next-token f/32616107797712701555271248484522718302696966429774905344

# You can search the logs
aws logs filter-log-events \
    --region us-west-1 \
    --log-group-name my-app-and-environment \
    --filter-pattern "[level != INFO, ...]"

You can export logs to S3 using the AWS CLI.
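A rough sketch of such an export, using create-export-task. The helper names (to_millis, export_logs_to_s3) and the group and bucket names in the usage line are made up for illustration. Note that the time range is given in milliseconds since the epoch, and that the bucket needs a policy that lets CloudWatch Logs write to it.

```shell
# Convert a Unix timestamp in seconds to the milliseconds CloudWatch Logs expects.
to_millis() {
    echo $(( $1 * 1000 ))
}

# Sketch: export_logs_to_s3 GROUP BUCKET FROM_SECONDS TO_SECONDS
# Hypothetical wrapper around the real create-export-task call.
export_logs_to_s3() {
    aws logs create-export-task \
        --log-group-name "$1" \
        --destination "$2" \
        --from "$(to_millis "$3")" \
        --to "$(to_millis "$4")"
}

# Usage (needs AWS credentials, so it is left commented out):
# export_logs_to_s3 my-app-and-environment my-log-archive-bucket 1394793518 1394879918
```

The export runs as a background task, so the files do not appear in the bucket immediately.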

You can stream logs to Elasticsearch for faster global search. You need to create an interface for it though.

You can use logs as real-time data. Logs can be directed to Kinesis or Lambda by setting up a subscription. Kinesis can in turn deliver to S3 and Redshift.
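Setting up a subscription to a Kinesis stream could look roughly like this. subscribe_to_kinesis is a hypothetical helper and the ARNs in the usage line are placeholders. The role is one that allows CloudWatch Logs to put records into the stream.

```shell
# Sketch: subscribe_to_kinesis GROUP FILTER_NAME PATTERN STREAM_ARN ROLE_ARN
# Hypothetical wrapper around the real put-subscription-filter call.
subscribe_to_kinesis() {
    aws logs put-subscription-filter \
        --log-group-name "$1" \
        --filter-name "$2" \
        --filter-pattern "$3" \
        --destination-arn "$4" \
        --role-arn "$5"
}

# Usage (needs AWS credentials and real ARNs, so it is left commented out).
# An empty pattern forwards every log event.
# subscribe_to_kinesis my-app-and-environment AllEvents "" \
#     arn:aws:kinesis:us-west-1:123456789012:stream/my-stream \
#     arn:aws:iam::123456789012:role/my-cwl-to-kinesis-role
```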

You use filters to turn logs into CloudWatch metrics. You can create graphs and alarms from metrics, but not from logs.

You can create dashboards from a group of predefined graphs. This gives you an overview of your resources: CloudWatch > Dashboards.

Filters don't retroactively filter data. The filter match is checked when the log event is created. Log events that were recorded before a filter was created won't be shown through the filter.

A filter has:

  • Filter Pattern: How to interpret each log event, e.g. which part is the IP and which is the error level.
  • Metric Value: What to publish to the metric, e.g. 1 per occurrence for an error count, or the byte count taken from the log message for a bandwidth metric.
  • Metric Name: Which metric receives the filtered data, e.g. ErrorCount or CPUUtilization.
  • Metric Namespace: Grouping for metrics. For reference, AWS itself uses namespaces such as AWS/SNS, AWS/EC2 and AWS/AutoScaling.

Simple filter pattern syntax:

  • ERROR matches:
    • [ERROR] A fatal exception has occurred
    • Exiting with ERRORCODE: -1
  • ERROR Exception matches:
    • [ERROR] Caught IllegalArgumentException
    • [ERROR] Uncaught Exception
  • Failed to process the request matches:
    • [WARN] Failed to process the request
    • [ERROR] Unable to continue: Failed to process the request
  • Note that a term containing anything other than alphanumeric characters or underscores must be placed in double quotes (""), e.g. "[ERROR]".

Space-delimited filter pattern syntax:

  • [level != "INFO", ...] matches:
    • [WARN] Failed to process the request
    • [ERROR] Uncaught Exception
  • [..., bytes > 1000, ip = "127.0.0.1"] matches:
    • [WARN] Failed to process the request 2445 127.0.0.1
Consider using JSON as your logging format. If your log events usually carry more information than just a message, JSON makes them easier to search and filter.

In the simplest form:
{ "level": "info", "message": "Something happened" }
{ "level": "error", "message": "Something bad happened", "related": { "id": 10 } }

JSON filter pattern syntax:

If we have the following log event:

{
  "eventType": "UpdateTrail",
  "sourceIpAddress": "111.111.111.111",
  "arrayKey": [
        "value",
        "another value"
  ],
  "objectList": [
       {
         "name": "a",
         "id": 1
       },
       {
         "name": "b",
         "id": 2
       }
  ],
  "SomeObject": null,
  "ThisFlag": true
}

All of the following will match the event:

{ $.eventType = "UpdateTrail" }         # equality
{ $.sourceIpAddress != 123.123.* }      # wildcard
{ $.arrayKey[0] = "value" }             # array check, false if not array
{ $.objectList[1].id = 2 }              # object check, false if not object
{ $.objectList[1].id > 1 }              # numerical comparisons
{ $.SomeOtherObject NOT EXISTS }        # existence
{ $.SomeObject IS NULL }                # checks for non-string null
{ $.ThisFlag IS TRUE }                  # checks for non-string boolean
{ ($.eventType = "UpdateTrail") && ($.arrayKey[0] = "value") } # and
{ ($.eventType = "UpdateTrail") || ($.ThisFlag IS FALSE) }     # or

The same filter pattern syntaxes also work when searching CloudWatch Logs.

You can create metric filters through the web console or the CLI. The web console version is under CloudWatch > Logs > select groups > Create Metric Filter. The web interface is better as it has a testing phase where you can try out the filter, but here is how to do it through the CLI.

aws logs put-metric-filter \
  --log-group-name MyApp/message.log \
  --filter-name MyAppErrorCount \
  --filter-pattern 'Error' \
  --metric-transformations \
    metricName=EventCount,metricNamespace=YourNamespace,metricValue=1

# Post test events to the same log group so that the new filter sees them
aws logs put-log-events \
  --log-group-name MyApp/message.log --log-stream-name TestStream1 \
  --log-events \
    timestamp=1394793518000,message="This message contains an Error" \
    timestamp=1394793528000,message="This message also contains an Error"

You can turn numerical values in JSON to metric values.

{ "latency": 50, "requestType": "GET" }
aws logs put-metric-filter ... \
  --filter-pattern '{ $.latency = * }' \
  --metric-transformations metricName=...,metricNamespace=...,metricValue='$.latency'
=> Publishes a metric data point with the value of 50.
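Spelled out in full, such a filter might be created like this. It is only a sketch: put_latency_filter, the filter name MyAppLatency and the metric name Latency are made up, and YourNamespace is the same placeholder as earlier.

```shell
# Sketch: put_latency_filter GROUP
# Publishes the numeric "latency" field of each JSON log event as the metric value.
# The filter and metric names are hypothetical.
put_latency_filter() {
    aws logs put-metric-filter \
        --log-group-name "$1" \
        --filter-name MyAppLatency \
        --filter-pattern '{ $.latency = * }' \
        --metric-transformations \
            'metricName=Latency,metricNamespace=YourNamespace,metricValue=$.latency'
}

# Usage (needs AWS credentials, so it is left commented out):
# put_latency_filter MyApp/message.log
```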

A CloudWatch alarm consists of:

  • A metric that it monitors, e.g. CPU usage or a health check.
  • A rule defining when it triggers:
    • Period: length of one evaluation window in seconds, i.e. how frequently the metric is checked.
    • EvaluationPeriods: number of periods over which the data is compared to the threshold.
    • Statistic: function applied to the values within a period, e.g. Minimum.
    • ComparisonOperator: how the comparison is done, e.g. GreaterThanThreshold.
    • Threshold: the value the statistic is compared against.
  • Actions to execute when it triggers, e.g. recover an EC2 instance or send an email.
  • The current state: OK, INSUFFICIENT_DATA or ALARM.
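The parts listed above map fairly directly to put-metric-alarm options. Here is a sketch that alarms on the EventCount metric from the earlier metric filter example. alarm_on_errors is a hypothetical helper, the numbers are arbitrary, and the action is assumed to be an SNS topic ARN that you have already created.

```shell
# Sketch: alarm_on_errors ALARM_NAME SNS_TOPIC_ARN
# Triggers when at least one matching event is counted within a 5 minute period.
# Hypothetical wrapper around the real put-metric-alarm call.
alarm_on_errors() {
    aws cloudwatch put-metric-alarm \
        --alarm-name "$1" \
        --namespace YourNamespace \
        --metric-name EventCount \
        --statistic Sum \
        --period 300 \
        --evaluation-periods 1 \
        --comparison-operator GreaterThanOrEqualToThreshold \
        --threshold 1 \
        --alarm-actions "$2"
}

# Usage (needs AWS credentials and a real topic ARN, so it is left commented out):
# alarm_on_errors my-app-errors arn:aws:sns:us-west-1:123456789012:my-alerts
```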
