上QQ阅读APP看书，第一时间看更新

Understanding SHA-1's running state and compression function

In our browser window, let's pick Challenge 5 (gain access to /etc/passwd), change the algorithm to SHA-1, click save, and then click on test:

Well, I don't see much happening here. But that URL sure looks interesting. Check out the parameters visible to us (and, apparently, under our control): http://192.168.108.106/ctf/challenge5/index.php?algo=sha1&file=test&hash=dd03bd22af3a4a0253a66621bcb80631556b100e

Clearly, algo=sha1 is defining the algorithm we selected. But file=test and the hash field should be catching our attention, as it appears to be a message authentication code mechanism for authorizing access to the file called test. If I modify the hash right now, I get a File Not Found error. Let's do a quick review of how this works before we conduct the attack.

In our example, access to the test file is authenticated with the attached hash. One might think, what good is that? All the signature will tell me is that no one modified the name of the file – unless we attach a secret to the message, in which case we're hashing secret + message. Surely, based on what we know about hashes, only secret + message would produce the correct hash. Hash functions are one-way functions, so it's impossible to reverse and find the secret. We want to inject our own data: a directory traversal attack to obtain /etc/passwd; that is, request a file and provide a valid hash to validate the request. This seems impossible on the surface, but we're missing two crucial mechanisms built into the hashing algorithm: padding and initial hash values (also called registers).

SHA-1 is iterative. It takes a message and splits it into 512-bit blocks of data, and then applies a compression function with each block. There are two inputs to each round of the compression function: the 160-bit hash from the previous round, and the next 512-bit block of message data. I can hear you literally shouting at the book, so does that mean there's an initialization vector? Yes, there is. What's interesting about SHA algorithms is their IV – called the initial hash value – is standardized and fixed. In the case of SHA-1, the initial hash value is 67452301efcdab8998badcfe10325476c3d2e1f0. With 3.97 bits of entropy, it's a good random number (but of course, since it's standardized, it isn't really random – the entire world knows it). That initial hash value is actually split into five 32-bit chunks. During the hashing process, the five chunks are stored in registers (H₀ to H₄). These values are known as the running state. When the whole message has been processed and the final block's compression function has spit out the final 160-bit running state, that value is the actual SHA-1 hash for the whole message.

Put simply, whenever you see a SHA-1 hash, you're actually seeing the final running state for the final 512-bit block of message data. The compression function took the previous running state as one of the inputs, going back to the beginning of the message and the specification-defined initial hash value.

So why do we care about all these nifty details? The key to how the length extension attack works is the SHA-1 hash isn't just the output of the entire operation; it's the running state at that point in the hashing process. Suppose the hash process were to continue with another block of message data; the running state at the penultimate block would be exactly what we see here. That running state came from the output of the last compression function, which itself took in the previous running state, and so on – until we're back at the initial hash value as the 160-bit input and the first block of message data as the 512-bit input, which contains the unknown secret! We'll create a new message with the attacker's data on the end, plus whatever padding is needed to get us to a 512-bit block. We'll then take the original hash as the running state input to the compression function for the last block, ending up with a new hash that fundamentally derives from the first secret block. We never find out what the secret is, and we don't have to – its DNA is built into the numbers we do have:

I know what the hacker in you is saying at this point: since the final block will have padding, we don't know the length of the padding without knowing the length of the secret; therefore, we can't slip our data in without knowledge of the secret's length. True, but elementary, Watson! We will rely on one of the most powerful, dangerous, mind-blowing hacking techniques known to mankind: we'll just guess. The secret can't be just any length; it has to fit in the block. This limits the guessing, making this feasible. But let's make life a little easier by using Burp Suite to send the guesses.