Select whether the statement is true or false. a primary key is always made up of only one field.

What should I consider when choosing a primary key for an Amazon DynamoDB table?

There are two types of primary keys in DynamoDB:

  • Partition key: This is a simple primary key. If the table has only a partition key, then no two items can have the same partition key value.
  • Composite primary key: This is a combination of partition key and sort key. If the table has a composite primary key, then two items might have the same partition key value. However, those items must have different sort key values.

No matter what type of primary key you choose, the primary key must be unique for each item in the table. Failure to choose an appropriate primary key can lead to uneven data distribution and hot keys, which might cause throttling (ProvisionedThroughputExceededException).

Use one or more of the following strategies when choosing a primary key:

  • Use a high-cardinality attribute: The partition key should be an attribute that has unique values for each item, such as user ID, email, or phone number.
  • Use composite attributes: Combine multiple attributes to form a primary key. For example, you might have an "orders" table that combines the customer ID, product ID, and country code for the partition key, and then uses the order date as the sort key. This increases the likelihood that each item has a unique primary key.
  • Add random numbers to your partition key: For write-heavy use cases, consider adding random numbers from a predetermined range to your partition key. This increase the randomness of the partition key. For more information, see Using write sharding to distribute workloads evenly.

For detailed information about each of these strategies, see Choosing the right DynamoDB partition key.

A primary key is a column or a combination of columns that uniquely defines each row in a table. When designing a database choosing a primary key is one of the most important steps. A table can contain one or more columns suitable to be a primary key and that column or columns are called a candidate key. Only one candidate key can be selected as a primary key

There are a couple of rules to follow when choosing a primary key for a table: all records in a primary key must be unique, a primary key cannot contain NULL values, a composite primary key cannot exceed 16 columns and a key length of 900 bytes, primary key values shouldn’t be changed

There are two types of a primary key – a natural key and a surrogate key and there is a lot of debating whether to choose one or another as a primary key

A natural key

A natural key, otherwise called an intelligent or a domain key, is a key with a business value and logically related to a table, meaning that the data of a natural key exists in nature. If a more than one column is defined as a primary key on a table it is called a composite primary key. For example, the Customer table has a composite primary key combined from the FirstName, the LastName, and the Email columns:

FirstName varchar(20) NOT NULL,

LastName varchar(20) NOT NULL,

Email varchar(20) NOT NULL,

Pros for using a natural key as a primary key

  • A natural candidate key for a primary key already exists in a table – there is no need for adding additional column
  • A natural key can be used in a client’s code as a search criteria

Cons for using a natural key as a primary key

  • If a primary key is a combination of the several varchar columns it becomes large. SQL Server will automatically add a clustered index on a primary key, if a table already doesn’t have one. In this case an index also becomes big (much bigger on varchar data type columns than on an integer data type column) and the number of index pages which are used to store the index keys is increased. This increases the number of reads required to read the index and degrades overall index performance
  • If a primary key column(s) have varchar data type the JOIN statements are slower as compared to the integer data-type joins

In the following example a primary key is combined from three columns to achieve uniqueness of a primary key

If the business logic changes, you’ll need to change all references to an existing primary key. Some examples are the change in ISBN (International Standard Book Number) from the 10 digit universal identifier into the 13 digit identifier. In the case of using SSN (Social Security Number) as a primary key you need to consider a possibility that SSNs may be reused after a person’s death and also that in cases of fraud, or an identity thefts an individual will get a new SSN.In the following example if a Customer’s SSN changes that change will have to be reflected in the two tables that reference the Customer table

The record cannot be entered into a table until the value of a primary key is known.

A surrogate key

A surrogate key is a unique number generated by SQL Server or a database itself, and has no business logic. In SQL Server it is most often used as an IDENTITY column or a Globally Unique Identifiers (GUID) column – a globally unique 128 bit long data type.

Pros for using a surrogate key as a primary key

  • If the business logic changes a surrogate key will not change since it has no business value
  • Surrogate keys are typically integers, which only require 4 bytes to store, so the primary key index structure will be smaller
  • A naming system for a surrogate key is easier to create

Cons for using a surrogate key as a primary key

  • When a surrogate primary key is used an extra indexes may be required on the columns that used to be a part of a natural primary key. These indexes may be necessary to preserve uniqueness of those columns, and they may make an update of a table slower
  • Having a surrogate key may require an additional joins when searching a record. For example, the user can enter a known natural primary key (e.g. username) and retrieve wanted information (e.g. real name) based on a foreign key relationship with a natural key without accessing the primary key table. In a case of a surrogate key as a primary key the user would have to look up in the primary key table to retrieve information stored in a table with a foreign key relationship
  • A surrogate key cannot be used in a search

Performance test

The CPU time needed to parse and compile the JOIN statement with varchar data type columns and a composite primary key:

ON c.FirstName=i.CustomerFName

AND c.LastName=i.CustomerLName

SQL Server parse and compile time:

   CPU time = 4 ms, elapsed time = 4 ms.

SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 0 ms.

SQL Server parse and compile time:

   CPU time = 0 ms, elapsed time = 0 ms.

The parse and compile time of the JOIN statement with an integer data type primary key:

ON c.CustomerID = i.CustomerID

SQL Server parse and compile time:

   CPU time = 0 ms, elapsed time = 2 ms.

SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 0 ms.

SQL Server parse and compile time:

   CPU time = 0 ms, elapsed time = 0 ms.

Refactoring an existing primary key

In a situation when one key is already implemented on a table, but another type of a primary key is more suitable you can implement database refactorings and introduce a surrogate key to replace an existing natural key with, or replace an existing surrogate key with a natural key

Choosing a primary key by replacing a natural key with a surrogate key is a method mostly used to reduce coupling between a database schema and external applications in cases when an existing natural key may change. Also, a large natural key may reduce performance

The opposite database refactoring method of introducing a surrogate key to a table is the replacing an existing surrogate key with a natural key. Motivation for using this refactoring is mostly to maintain a key strategy or to remove unnecessary keys – sometimes a surrogate key column is introduced to a table when it actually wasn’t needed

Regardless of a chosen primary key type for your database tables you should choose a single strategy and be consistent in applying it throughout your database. By refactoring your database to consolidate key strategy you can achieve code consistency because having a variety of keys your code to access a database is also implemented in various ways which increases the maintenance of code having your developers to follow all different approaches. Also, if your company has a corporate standard for a preferred key strategy you may discover that your schema doesn’t comply with the rules, so you’ll need to refactor your primary keys and implement the consolidate key strategy refactoring

Useful resources:

SQL by Design: How to Choose a Primary Key
Surrogate vs Natural Primary Keys – Data Modeling Mistake 2 of 10
Primary Keys: IDs versus GUIDs