In this blog article I will discuss various methods of generating unique keys, hashing, and encryption, examining their benefits and drawbacks. I thought of writing a separate article for each, but that would result in very short articles, and I don’t like that. I like long articles. OK. Here we go.
Unique Keys
What is a “unique key”? If you have an object, you want to identify it using a brief number that will never, ever point to another object. Guaranteed. That’s a unique key (or unique identifier).
I would further say that in order to have a truly global key (one that will never clash with another key no matter what the context), you need to have a globally unique identifier (GUID). This is a stronger form of unique key, in that it’s context-free. You can generate one anywhere, in any environment, and be guaranteed that it’s globally unique.
Let’s start with normal (not globally unique) keys. How many bits are sufficient? Let’s suppose you choose to generate the key based on a hash of the object data.
The birthday problem is the right kind of model to use here. It turns out, for a 32-bit hash, if you have more than about 10,000 objects chances are you’ll have a clash (two objects with the same key). At about 48 bits, clashes become extremely unlikely even for a million objects.
It turns out, with CRC64, you can generate a unique hash for up to 18 million objects (and probably more) without worrying about collisions. Thus, 64-bit keys in most instances will be sufficient.
If you want to use a randomly generated GUID, you can use a regular GUID generator, then take only the first 8 bytes (64 bits) of the GUID. This should give you a decent random 64-bit GUID. To be sure, however, I would recommend hashing (with CRC64) the full 128-bit GUID, although it may hinder performance slightly.
Summary: 64-bit GUIDs can exist and can be used in most applications. 128-bit GUIDs are kind of overkill.
Hashing Algorithms
There are many hashing algorithms out there, but only a few you should be aware of and know intimately.
1. CRC
CRC (Cyclic Redundancy Check) is a very basic hash function that generates a 32-bit hash (for CRC-32) or in rarer cases a 64-bit hash (for CRC-64).
Advantages: Speed. CRC is the fastest hashing algorithm out there.
Disadvantages: Lack of security. CRC is the least secure hashing algorithm. Very prone to attack.
Find out what secure (cryptographic) hashing is and how it differs from regular (checksum) hashing. CRC is considered a basic checksum hash.
2. MD5
MD5 is also a basic hash function. It generates a 128-bit hash.
Advantages: Speed. Fastest cryptographic hash function. Convenience would be a second advantage, as nearly every platform has a built-in MD5 hash function.
Disadvantages: Lack of security. MD5 can be broken relatively easily and is no longer suitable for use in secure systems. Use MD5 only as a checksum hash, like CRC. MD5 is also significantly slower than CRC.
3. SHA-1
SHA-1 is another common hash function. It’s relatively fast – as fast as MD5 – and generates a 160-bit hash. SHA-1 is the most basic hash that I would consider “secure.”
Advantages: Harder to attack than MD5 and just as fast. Faster (though only slightly) than some of the more secure hashes, like SHA-256.
Disadvantages: Slower than CRC. Slightly more vulnerable to attack than SHA-256, but in most applications it’s not likely to be a problem… yet.
4. SHA-256 and SHA-512
SHA-256 and SHA-512 are uber-secure hash functions used when security is critical (for example, for protecting passwords in a banking application). The hash size is 256 bits and 512 bits, respectively.
Advantages: Uber-secure. Safe from attack for probably a few decades, until Moore’s Law catches up.
Disadvantages: Slowest hash function. For a system with high transaction rate, these hash functions can take a significant toll on the CPU. Note: The difference in performance versus SHA-1 is not that great.
Summary
Use SHA-256 or SHA-512 where security is paramount. Don’t settle for some of the other less secure hashes. When security doesn’t matter, just use MD5. Use CRC if you need a more compact hash (< 128 bits) and don’t care about security but do care about performance.
Encryption Algorithms
Encryption algorithms are harder to summarize because there are so many of them. However, I’ll look at the most popular forms of private-key encryption (better known as symmetric key encryption) here. When it comes to symmetric-key encryption, you will see that the choice is clear.
1. Triple DES
Triple DES is a 168-bit encryption algorithm, but with only 80 bits of effective security. Triple DES is vulnerable to attacks and should no longer be used in building secure systems.
Advantages: None.
Disadvantages: Insecure. Slow.
2. AES
AES is a highly secure 128-bit / 192-bit / 256-bit encryption algorithm designed to replace DES. It’s significantly faster than Triple DES and is now being implemented in hardware.
Advantages: Secure. Extremely fast. Now implemented in mainstream hardware.
Disadvantages: Some implementations are vulnerable to cache attack.
Summary
AES is your best friend. It’s fast, secure, and implemented virtually everywhere. If you’ve got a Sandy Bridge CPU, it’s there, and its speed can reach 2 GB/s.