What should I consider when choosing a primary key for an Amazon DynamoDB table? Show
There are two types of primary keys in DynamoDB:
No matter what type of primary key you choose, the primary key must be unique for each item in the table. Failure to choose an appropriate primary key can lead to uneven data distribution and hot keys, which might cause throttling (ProvisionedThroughputExceededException).
Use one or more of the following strategies when choosing a primary key:
For detailed information about each of these strategies, see Choosing the right DynamoDB partition key. A primary key is a column or a combination of columns that uniquely defines each row in a table. When designing a database choosing a primary key is one of the most important steps. A table can contain one or more columns suitable to be a primary key and that column or columns are called a candidate key. Only one candidate key can be selected as a primary key There are a couple of rules to follow when choosing a primary key for a table: all records in a primary key must be unique, a primary key cannot contain NULL values, a composite primary key cannot exceed 16 columns and a key length of 900 bytes, primary key values shouldn’t be changed There are two types of a primary key – a natural key and a surrogate key and there is a lot of debating whether to choose one or another as a primary key A natural keyA natural key, otherwise called an intelligent or a domain key, is a key with a business value and logically related to a table, meaning that the data of a natural key exists in nature. If a more than one column is defined as a primary key on a table it is called a composite primary key. For example, the Customer table has a composite primary key combined from the FirstName, the LastName, and the Email columns:
Pros for using a natural key as a primary key
Cons for using a natural key as a primary key
In the following example a primary key is combined from three columns to achieve uniqueness of a primary key If the business logic changes, you’ll need to change all references to an existing primary key. Some examples are the change in ISBN (International Standard Book Number) from the 10 digit universal identifier into the 13 digit identifier. In the case of using SSN (Social Security Number) as a primary key you need to consider a possibility that SSNs may be reused after a person’s death and also that in cases of fraud, or an identity thefts an individual will get a new SSN.In the following example if a Customer’s SSN changes that change will have to be reflected in the two tables that reference the Customer table The record cannot be entered into a table until the value of a primary key is known. A surrogate keyA surrogate key is a unique number generated by SQL Server or a database itself, and has no business logic. In SQL Server it is most often used as an IDENTITY column or a Globally Unique Identifiers (GUID) column – a globally unique 128 bit long data type. Pros for using a surrogate key as a primary key
Cons for using a surrogate key as a primary key
Performance testThe CPU time needed to parse and compile the JOIN statement with varchar data type columns and a composite primary key:
The parse and compile time of the JOIN statement with an integer data type primary key:
Refactoring an existing primary keyIn a situation when one key is already implemented on a table, but another type of a primary key is more suitable you can implement database refactorings and introduce a surrogate key to replace an existing natural key with, or replace an existing surrogate key with a natural key Choosing a primary key by replacing a natural key with a surrogate key is a method mostly used to reduce coupling between a database schema and external applications in cases when an existing natural key may change. Also, a large natural key may reduce performance The opposite database refactoring method of introducing a surrogate key to a table is the replacing an existing surrogate key with a natural key. Motivation for using this refactoring is mostly to maintain a key strategy or to remove unnecessary keys – sometimes a surrogate key column is introduced to a table when it actually wasn’t needed Regardless of a chosen primary key type for your database tables you should choose a single strategy and be consistent in applying it throughout your database. By refactoring your database to consolidate key strategy you can achieve code consistency because having a variety of keys your code to access a database is also implemented in various ways which increases the maintenance of code having your developers to follow all different approaches. Also, if your company has a corporate standard for a preferred key strategy you may discover that your schema doesn’t comply with the rules, so you’ll need to refactor your primary keys and implement the consolidate key strategy refactoring Useful resources:SQL by Design: How to Choose a Primary Key |