Requirements Clarifications

  1. IDs should be unique
  2. IDs should be sortable by time
  3. IDs should be 64-bit size.
  4. The system should be able to handle 10,000 IDs per second

Multi-master replication

This approach make use of the databases’ auto_increment feature. We increase the ID in each server by step size k, where k is the number of database servers.

As illustrated in the image, the IDs generated increase by 2.

This approach has several drawbacks:

  • Across multiple servers, IDs are not ordered by time
  • Does not work well when adding or removing servers

Centralised Auto Increments

If we can’t make auto_increments work across multiple databases, what if we just use one database? This is the approach by Flickr.

A Flickr ticket server is a dedicated database server, with a single database on it, and in that database there are tables like Tickets32 for 32-bit IDs, and Tickets64 for 64-bit IDs.

The Tickets64 schema looks like:

CREATE TABLE `Tickets64` (
`id` bigint(20) unsigned NOT NULL auto_increment,
`stub` char(1) NOT NULL default '',
PRIMARY KEY  (`id`),
UNIQUE KEY `stub` (`stub`)

SELECT * from Tickets64 returns a single row that looks something like:

| id                | stub |
| 72157623227190423 |    a |

When I need a new globally unique 64-bit ID I issue the following SQL:

REPLACE INTO Tickets64 (stub) VALUES ('a');

This approach has several drawbacks:

  • Single point of failure


String generateID() {
    return UUID.randomUUID().toString().replaceAll("-","");

UUIDs are 128-bit hexadecimal numbers. They are easy to use and has a very low probability of collision. The system can scale very easily because each server is responsible for their own ID generation.

This approach has several drawbacks:

  • Does not meet requirement of 64-bits size
  • IDs are not ordered by time

Twitter Snowflake

This approach divides the ID into different sections.

1 bit 41 bits 5 bits 5 bits 12 bits
0 timestamp data center id machine id sequence number

The IDs are made up of the following components:

  • Epoch timestamp - 41 bits which gives us 69 years for any custom epoch.
  • Data Center ID - 5 bits which gives us 32 data centers
  • Machine ID - 5 bits which gives us 32 machines per data centers
  • Sequence number - Sequence number is incremented by 1 for every ID generated on the machine. Resets to 0 every millisecond. In theory, we can support 2^12 = 4096 new IDs every millisecond.
  • The extra one reserved bit is to make the overall number positive.

This approach meets our requirements, however there are still several considerations:

  • Clock synchronization. In distributed systems, servers may not use the same clock to generate the timestamp. Popular solution to clock synchronization may be to use Network Time Protocol.

Similar Approaches Based On Snowflake