Design URL Shortener
Requirements Clarification
- Given a long URL, return a short URL
- Given a short URL, redirect to the long URL
- System should be able to handle 100 million new URLs per day
- Short URLs should be a combination of numbers (0-9) and characters (a-z, A-Z)
If you haven’t used a URL shortening service before, refer to TinyURL and try out some of the features.
Back-of-the-envelope Estimation
Since we have 100 million new URLs, we need to handle 100 million/24/3600 = 1160 requests per second on average.
Assuming that the system will run for 10 years, we need to support 100 million * 365 * 10 = 365 billion records.
If average URL length is 100, then storage requirements over 10 years is 365 billion * 100 bytes * 10 = 365TB.
API Definition
We can expose the functionality of the system via REST APIs.
POST api/v1/shorten-url?longUrl={long URL string}
response: short URL
GET api/v1/{short URL string}
If short URL exists, return 301 redirect to long URL else 404 not found.
Note that another http status code, 302 should be used if the redirect is temporary.
URL Shortening
In this section, we will discuss how to generate short and unique key for a given URL.
Encoding entire long URL
A hash function is used to hash a long URL to a short URL. The hash value consists of 62 possible characters (0-9, a-z, A-Z). With 7 characters, we can support approximately a total of 62^7 = 3.5 trillion URLs.
We could use well known hash function like MD5 or SHA-1. We then take the first 7 characters of the hash value. However, this would lead to hash collision. If a hash collision occur, we then append an increasing sequence number until it succeeds.
This approach may be expensive as it continuously inserts into database until successful. A technique call bloom filter may improve performance.
Generate Keys Offline
Another approach to shortening URL is to generate unique keys offline. We make use of the unique ID generator we discussed here. We generate IDs beforehand, convert them to base62 encoding and hands it out whenever a request for URL shortening comes in.
Other Points To Consider
- Rate limiter. Malicious users may send a huge amount of requests. Rate limiter helps to filter out requests base on IP addresses or other filtering rules.
- Analytics Data is important in making business decisions. Integrating an analytics will help to answer questions like how many clicks? When did the clicks occur?