☁️ AWS - 🌀 DynamoDB
DynamoDB is essentially a fully managed, near-infinitely scalable key-value store. Amazon used DynamoDB internally before releasing it as a service. It has tables, rows, and columns, but it is very different from relational databases like PostgreSQL. Rows are called items and columns are called attributes.
DynamoDB works well if both of the following are true:
- You are retrieving individual records based on specific key lookups.
- Your data will reach billions of records or hundreds of gigabytes, or you are working on a personal or prototype project.
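In DynamoDB terms, a "specific key lookup" means fetching one item by its full primary key. A minimal sketch with boto3, assuming a hypothetical `myapp-users` table keyed by the string attribute `user_id`:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Fetch a single item by its primary key; "Item" is absent if no match.
response = dynamodb.get_item(
    TableName="myapp-users",
    Key={"user_id": {"S": "alice"}},
)
item = response.get("Item")
```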
Prefix your tables with the name of your application. All DynamoDB tables are in the same global namespace.
__You cannot change the table name or the primary key after the table has been created.__ You'll need to create a new table for that. The primary key is usually dubbed the key schema, which can contain one or more attributes that define the primary key.
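A sketch of how the key schema is fixed at creation time, reusing the hypothetical `myapp-users` table; changing `KeySchema` later is not possible, only creating a new table is:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# The key schema is defined once, at creation time.
dynamodb.create_table(
    TableName="myapp-users",
    AttributeDefinitions=[{"AttributeName": "user_id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "user_id", "KeyType": "HASH"}],
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
)
```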
DynamoDB has two kinds of indexes: hash, and hash + range. The hash is the basic identifier for a single item. Hash + range allows retrieving multiple items that are grouped by the hash key and differentiated by the range key.
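A hash + range sketch, assuming a hypothetical `myapp-messages` table keyed by `conversation_id` (hash) and `sent_at` (range); one query returns every item sharing the hash, narrowed and ordered by the range key:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# All messages in one conversation sent after a given timestamp,
# returned in range-key order.
response = dynamodb.query(
    TableName="myapp-messages",
    KeyConditionExpression="conversation_id = :c AND sent_at > :t",
    ExpressionAttributeValues={
        ":c": {"S": "conv-42"},
        ":t": {"N": "1404860400"},
    },
)
items = response["Items"]
```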
Secondary indexes are allowed. They improve query performance but increase the price because of the additional storage required.
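Querying a secondary index looks like querying the table itself, plus an `IndexName`. A sketch assuming a hypothetical global secondary index `email-index` on the users table:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Same query API as a table; IndexName selects the secondary index.
response = dynamodb.query(
    TableName="myapp-users",
    IndexName="email-index",
    KeyConditionExpression="email = :e",
    ExpressionAttributeValues={":e": {"S": "alice@example.com"}},
)
```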
Filter and scan operations allow looking up unindexed attributes. This is possible but should be avoided: it is slow because the process has to read each item individually. It's flexible but not efficient.
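A scan sketch against the same hypothetical table; the filter is applied only after every item has been read, which is why the operation is slow and consumes capacity for the whole table:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Scans read every item; FilterExpression only trims what is returned.
response = dynamodb.scan(
    TableName="myapp-users",
    FilterExpression="signup_year = :y",
    ExpressionAttributeValues={":y": {"N": "2014"}},
)
```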
DynamoDB is not interchangeable with other databases. You will always have to write code or use a DynamoDB-specific plugin, as the interaction is very different from its competitors.
Each DynamoDB partition holds up to 10GB. Beyond that, the data gets divided into chunks on different machines.
You can listen for changes with DynamoDB Streams, which allow a listener to act on any create, update, or deletion of items.
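A minimal stream-reading sketch, assuming streams are already enabled on the table, the stream ARN is known, and the stream has a single shard (real code has to iterate over shards and page through records):

```python
import boto3

streams = boto3.client("dynamodbstreams")

stream_arn = "arn:aws:dynamodb:..."  # LatestStreamArn from describe_table

# Read one shard from the oldest available record.
shard = streams.describe_stream(StreamArn=stream_arn)["StreamDescription"]["Shards"][0]
iterator = streams.get_shard_iterator(
    StreamArn=stream_arn,
    ShardId=shard["ShardId"],
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]

for record in streams.get_records(ShardIterator=iterator)["Records"]:
    print(record["eventName"])  # INSERT, MODIFY or REMOVE
```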
DynamoDB is eventually consistent. If you update an item, the next read might still return the old data. You can use `--consistent-read` to override that, but such reads cost twice the read capacity, halving your effective read throughput.
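The same option through the API, on the hypothetical users table; a strongly consistent read never returns stale data but consumes double the read capacity:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Strongly consistent read: always current, twice the capacity cost.
response = dynamodb.get_item(
    TableName="myapp-users",
    Key={"user_id": {"S": "alice"}},
    ConsistentRead=True,
)
```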
You specify throughput for a DynamoDB table. By defining read and write capacity units, you control how fast the database is. No operations are lost when you exceed your capacity; they just take longer. You can optimize your throughput by following the CloudWatch metrics.
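A sketch of adjusting provisioned throughput on the hypothetical users table, e.g. after CloudWatch shows throttled requests; the numbers are arbitrary:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Raise the table-level read and write capacity units.
dynamodb.update_table(
    TableName="myapp-users",
    ProvisionedThroughput={"ReadCapacityUnits": 50, "WriteCapacityUnits": 100},
)
```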
You can configure write and read capacity only at the table level, not per partition. Your capacity gets divided across the partitions, so your actual per-partition capacity depends on how big the table is.
You allocate 100 write capacity units to a table.
=> If it has 4 partitions, each partition will have 25 write units.
=> If your partitions have different load, you'll have to overprovision.
Dealing with hot (frequently read or written) keys can be a hassle. Because capacity is divided per partition, you need to be extra careful about how you partition the data. If most of your data goes to a single partition, scaling up the capacity will be very costly.
Partition Key:

| Partition key | Verdict |
| --- | --- |
| Status code where some codes are more common than others | Bad |
| Status code where all codes are equally common | Good |
| User ID where some users are more active than others | Bad |
| User ID where all users are equally active | Good |
| Device ID where some devices are read far more often than others | Bad |
| Device ID where each device is read at similar intervals | Good |
| Date, e.g. 2014-07-09 | Bad |
| Date with a random number appended, e.g. 2014-07-09.146 (see the sketch below) | Good |
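A write-sharding sketch for the date example in the last row: appending a random suffix spreads writes for one hot date across many partitions. The shard count here is an arbitrary choice, and readers must query every suffix and merge the results:

```python
import random

NUM_SHARDS = 150  # arbitrary; more shards = better spread, more read work

def sharded_key(date: str) -> str:
    # Append a random shard suffix to the hot date key.
    return f"{date}.{random.randint(1, NUM_SHARDS)}"

print(sharded_key("2014-07-09"))  # e.g. "2014-07-09.146"
```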
Sources
- Amazon Web Services in Action, Michael Wittig and Andreas Wittig
- Why Amazon DynamoDB isn’t for everyone