KSUID unique identifiers

Introduction

One of the first tasks when starting a new software project is to decide how to identify objects. While it may not seem important at first, the choice of the format used for identifiers is critical since it is very hard to change once the application has been deployed.

Properties

The most important property for an identifier scheme is their uniqueness: you want to make sure that a single object is associated with each identifier.

In addition, we would like identifiers not to leak any information, for example the total number of objects; they should also be hard to guess.

Finally, it is convenient for identifiers to be ordered by generation time.

Existing solutions

Sequential integers

One of the most common methods is to use a single incrementing integer for each collection of objects. SQL-based relational databases provide sequences which can be used to ensure that each object in a table have a different identifier.

Example:

Id Name
1 Bob
2 Alice
3 Eve

While simple, this method suffers from two major drawbacks:

  • These identifiers cannot be generated without database access: in a world where distributed systems are more and more present, having to coordinate with a central database just to create objects is a hindrance.
  • Using sequential values leak information: if someone sees that their user identifier is 250, they can infer that you probably have at least 250 users, and that other users can be found with identifiers 1 to 249. Given that identifiers are usually visible in web URI, leaking information about them increase the attack surface for your organization.

UUID

UUID identifiers are 128 bit values standardized in RFC 4122 . They are usually represented in hexadecimal form, e.g. f6309ea2-68fd-43d2-9e11-07c21d09e537.

The RFC document describes several UUID variants:

  • UUID v1, containing a timestamp and the MAC address of the host.
  • UUID v3, based on the MD5 hash of a name.
  • UUID v4, containing only random data.
  • UUID v5, based on the SHA1 hash of a name.

Example:

Id Name
267e4037-6f6e-48de-addc-bb1dcc67836d Bob
34eb46ca-e482-4200-bc3b-25e9c650e9a8 Alice
16d65bf1-2243-42cf-a232-d8e14362268d Eve

Unfortunately all variants suffer from various issues:

  • Variant 1 requires unique MAC addresses, which is inconvenient in many environments, and does not provide any randomness.
  • Variant 3 and 5 require unique names to be used for generation, defeating the very purpose of unique identifier generation.
  • Variant 4 seems better suited, but identifiers are not naturally ordered by generation time.

UUID values are also case insensitive, which is a common source of bugs in software, and are not ordered by generation time.

ULID

ULID identifiers are also 128 bit values and combine a 48 bit millisecond timestamp and 80 bit of random data.

Example:

Id Name
01FZJ4V2G5FE5QV93F78VR7XEC Bob
01FZJ4V7W7AYWWQZBTBZKRTWNM Alice
01FZJ4VFKF9PTCJ6PYSS4P4BC8 Eve

While ULID identifiers are a significant improvement upon more naive identifier schemes, providing reasonable randomness and ordering, they still have several drawbacks:

  • They only contain 80 bit of random data, compared to the 122 bit of UUID v4, which massively increases the chances of collision.
  • They are case insensitive.
  • Generation requires a state.

KSUID

KSUID identifiers were designed at Segment; they contain a 32 bit millisecond timestamp and 128 bit of random data.

Example:

Id Name
27BtF4TFs8SIc3SVNvlx3Fj2mz4 Bob
27BtFf7k36omyePOU7PJ1JiztU5 Alice
27BtGIM0Tm4pLccwqFK4E4kkHaA Eve

KSUID have multiple advantages:

  • They contain much more random data than other identifier schemes, improving resistance against collisions.
  • They are naturally ordered by generation time.
  • They are case sensitive.

We decided to use KSUID, and have not regretted it yet.

Implementation

Since Eventline is written in Go, we could have use the Segment library. But we ended up writing our own implementation in order to add missing methods.

You can also use erl-ksuid, the Erlang library which was used in the old SaaS version of Eventline.

Both our Go and Erlang library are available under an ISC open source license.