# 01 Convert hex to base64
This is as easy as it gets. If you fail here, go improve your
programming skills first. If it's too easy for you, implement base64
instead of using the standard library implementation. I've used this
chance to create a `util.rb` which holds all reused code. The first
function I wrote is `assert` to document all implicit assumptions, I
recommend doing the same, especially in a language with a dynamic type
system. It doubles as a light-weight testing tool.
# 02 Fixed XOR
XOR is the bread-and-butter operator in cryptography. Expect to use
it a lot.
# 03 Single-byte XOR cipher
It took me some head-scratching to come up with a good scoring metric.
An obvious one is to give strings with unprintable characters a
failing score, however that's not enough to tell whether they resemble
English. I've looked at some statistics stuff, found an English
histogram and implemented the [Chi-squared test](https://en.wikipedia.org/wiki/Chi-squared_test) to compare the
encountered distribution with the ideal one. It's not hard, but
definitely no 9th grader mathematics.
# 04 Detect single-character XOR
Easy-peasy. If a string has been encrypted by single-character XOR,
it will have a good English score for a particular byte, otherwise
not.
# 05 Implement repeating-key XOR
The only way you can screw this up is by not taking into account that
the length of the plaintext you're encrypting doesn't have to be a
multiple of the key length. This gotcha will come up in a few more
exercises.
# 06 Break repeating-key XOR
To implement the Hamming distance it's useful to have a `popcount`
function. If you xor two bytes, the equal bits will become ones and
the unequal ones become zeroes. Count the ones with `popcount` and
you get the Hamming distance.
There's a few error-prone parts in this challenge that require
tweaking. For step #4 I went with four key-sized blocks. I also
ended up improving my English scoring function to figure out the
correct text. Instead of just comparing the given with the ideal
histogram, I added a check for unusual characters: If there's less
than there typically are, double the score. The reasoning here is
that an unusually high number of particular letters is still more
likely to look like English than an unusually high number of
non-letters.
# 07 AES in ECB mode
Oh noes, OpenSSL. The APIs are weird, so the greatest issue here is
figuring out how to use it.
# 08 Detect AES in ECB mode
Could this be as easy as finding duplicates as long as the block size?
Yes!
# 09 Implement PKCS#7 padding
PKCS#7 padding is simple, but has one caveat that's easy to overlook.
Assuming the length of the plaintext to be padded is already a
multiple of the block size, you still have to add padding as large as
the block size. Otherwise unpadding will fail.
# 10 Implement CBC mode
The big realization here is that ECB used on a single block can be
used to encrypt and combine multiple blocks independently. This
allows you to implement CBC (and later, CTR) in terms of it. Just
make sure that if you decrypt, OpenSSL doesn't try to remove any
PKCS#7 padding for you because that will fail horribly.
# 11 An ECB/CBC detection oracle
Oracles are great. In cryptography, an oracle refers to a function
that can be used by the attacker to gain a piece of information. This
one doesn't seem to be terribly useful as it only returns encrypted
text, however it still manages to leak information about the used
cipher mode. Just remember about challenge #08 and you'll be golden.
# 12 Byte-at-a-time ECB decryption (Simple)
The most involved part of this is detecting the block size of the
cipher. Other than that it's a straight-forward implementation of the
description.
# 13 ECB cut-and-paste
I lazied out initially and reused Ruby's support for `form-urlencoded`
data. This was a bad idea as it made the challenge impossible to
solve, not only will it touch the ampersand and equals characters, but
everything unprintable as well...
If you're wondering why, consider that even data encrypted with ECB
needs to be padded (how else would you encrypt arbitrarily long data
that may or may not match up with the cipher's block size?). If you
cut and paste at the block boundaries, the block going at the end
needs to resemble correct padding. This is going to be a problem if
the sanitizing function mangles binary data.
# 14 Byte-at-a-time ECB decryption (Harder)
It took me a while to figure this one out. You can mostly solve this
challenge like #12, but need to detect the length of the random prefix
first. Once that's been taken care of you can proceed with creating
the dictionary of bytes, with the caveat that you may need to generate
more padding to fill up the random prefix until the next block
boundary. The rest works the same.
# 15 PKCS#7 padding validation
Remember that there must be at least one byte of padding and that the
last byte tells how large the padding is.
# 16 CBC bitflipping attacks
I'm afraid there's no bitflipping operator, but XOR comes close
enough. Given an existing byte and a target byte, XORing them gives
you a number. XORing the existing byte by that number will yield the
target byte. Use this to craft an input string to bitflip into the
target string.
# 17 The CBC padding oracle
I found [the paper for this attack](https://infoscience.epfl.ch/record/52417/files/IC_TECH_REPORT_200150.pdf) hard to comprehend, fortunately
there are many explanations on the details available online. There's
much that can go wrong with your implementation. You'll have to work
one block each, but in reverse, need to keep the correct padding in
mind, piece together information from already guessed bytes and apply
the IV to the result (something most explanations of the attack I've
found gloss over). Still, the attack is conceptually elegant and
breaks cryptosystems that don't use signing, so it's definitely worth
doing.
# 18 Implement CTR, the stream cipher mode
CTR isn't nearly as confusing to get right as CBC. The only thing to
watch out for is how to encode a 8-byte number in little-endian.
You'll want to use `pack` if your language provides it.
# 19 Break fixed-nonce CTR mode using substitutions
Throws one back to the first challenges, huh. You'll want to look up
English trigrams and test for them first, then score by a metric like
number of intelligible words found in the decrypted text. Once you've
found a promising part of the keystream, you can guess more words in
the partially decrypted text. Bonus points if you manage creating a
visualization of the process.
# 20 Break fixed-nonce CTR statistically
This is challenge #06 all over again. It saddens me a bit that you
have to truncate the messages, but the full decrypted text can be
easily found online.
# 21 Implement the MT19937 Mersenne Twister RNG
I've made the mistake of looking at [the underlying theory](https://en.wikipedia.org/wiki/Semisimple_Lie_algebra) to
implement it from the original paper. Don't. The field
visualizations are pretty, but won't help you at all. Pseudocode from
Wikipedia is much better, you'll want to go for the 32-bit variant.
If you're having issues, grab their Python implementation, add logging
and compare the output of intermediate state with yours. This trick
works with other things as well, such as debugging hash functions.
# 22 Crack an MT19937 seed
The RNG implemented in #21 will always return the same sequence of
random numbers given the same seed. Considering that the seed is some
past point in time, you can establish a range of possible seeds and
try each until you find one leading to the RNG output you've captured
before.
# 23 Clone an MT19937 RNG from its output
This one is rather annoying to get right. Observing enough numbers to
get a tempered internal state is the easy part, the hard part is
untempering them. I eventually gave up figuring out the equivalent
transforms and looked them up online.
# 24 Create the MT19937 stream cipher and break it
The first challenge can be brute-forced. The prefix size can be
deducted from looking at the ciphertext size. For each possible seed
value fast-forward the RNG by the amount of prefix bytes, then check
for the remaining bytes whether XORing them with the keystream gives
you the known plaintext.
The second challenge is even easier. A password reset token's
randomness depends on the exact time it was created. Create one at
the same time and they'll be equal. If they aren't, well, the
captured reset token must have been created at a different time...
# 25 Break "random access read/write" AES CTR
This one sounds harder than it is. The key is to realize that if the
attacker replaces a piece of the ciphertext and the result is the same as
before, they must have chosen a piece equal to the plaintext. To
guess as little as possible, edit one byte at a time. I initially
implemented the `edit` function as stupidly as possible, but that made
guessing the plaintext very slow, so the hardest part of the challenge
was implementing a more efficient `edit` function that synthesized the
necessary part of the keystream, XORed it with the new plaintext and
returned the ciphertext with the result patched in.
edit: I found a way more efficient way. The `edit` API call gives you
a way to generate an attacker-controlled ciphertext using the same
nonce and key. Encryption combines the plaintext with a keystream
that is the same for both ciphertexts. Therefore, XORing both
ciphertexts will cancel the keystreams out and give you the
combination of the unknown plaintext and the known one. XOR the
result with the known plaintext and you've recovered the unknown one.
The lesson from this one is probably that you should use a new nonce
when re-encrypting.
# 26 CTR bitflipping
Same approach as with challenge #16 except that bitflipping will
affect the same position (as opposed to the next block).
# 27 Recover the key from CBC with IV=Key
Implement exactly what's described here. It will miraculously reveal
the key. Note that the attacker *must* receive an error message
containing the mogrified plaintext because that's the one you'll slice
apart in the final step.
# 28 Implement a SHA-1 keyed MAC
I looked up a pure Ruby implementation of SHA1 on Rosetta Code. Make
sure it can process bytes, just like all of your utility functions.
For the test I wrote a verification function receiving a buffer and
MAC which checks whether the MAC is the same as running `sha1_mac` on
the buffer and secret. To demonstrate that it can detect tampering I
wrote two assertions where either the buffer or mac had a byte
mutated.
# 29 Break a SHA-1 keyed MAC using length extension
First of all, get acquainted with your SHA1 implementation,
particularly with how it generates padding (which includes the length
of the message). You'll have to modify it to allow starting from a
different internal state and more importantly, to use a custom length
in the final padding bit. If you forget doing that, you'll end up
with a different padding than the server validating the MAC would.
Alternatively, consider splitting it up into `initialize`, `digest`
and `finalize`.
The actual attack requires guessing the length of the secret prefix,
so just repeat it with different lengths until you guess the right
one.
# 30 Break an MD4 keyed MAC using length extension
Same deal as with #29. MD4 uses a construction similar to SHA1, so no
surprises here.
# 31 Implement and break HMAC-SHA1 with an artificial timing leak
This one is fun. Implementing HMAC as described on Wikipedia is easy,
the hardest part here is writing that web service verifying HMACs.
For the actual attack, consider that `insecure_compare` will either
exit immediately (because the compared bytes don't match) or succeeds,
sleeps and continues comparing the next bytes. You can therefore tell
what the right byte is by trying all for a given position and picking
the one that took longest. Repeat for the next position, but with the
correctly guessed byte fixed. You'll eventually have tested the
complete HMAC and the web service will return status code 200 for it.
I recommend displaying the found bytes Hollywood-style, like [here](http://brause.cc/gifs/hollywoodhacking.gif).
# 32 Break HMAC-SHA1 with a slightly less artificial timing leak
This is a bit harder than #31 because it's not guaranteed that
measuring once is enough to tell the correct byte. You'll have to
come up with some way to smooth the measurement errors, like by
measuring multiple times for each candidate and averaging the times.
I went for guessing characters repeatedly (at least twice) until more
than 50% of my guesses resulted in the same character.
# 33 Implement Diffie-Hellman
Get acquainted with modular arithmetic and implement `modexp`.
Alternatively, use the OpenSSL version. Implementing DH with it is
trivial.
# 34 Implement a MITM key-fixing attack on Diffie-Hellman with parameter injection
Ah, protocols. I've used named pipes to allow communication between
processes, with one per recipient. For example if A wants to send
text to B, A writes into B's named pipe; at some point B will read
from its named pipe and thus receive the message. You could
alternatively use some variation on OOP and keep things in one
process, but I don't find it as neat.
The presented MITM attack has the effect of turning the shared key
into zero, no matter the user inputs. This means that Mallory can
decrypt all messages encrypted with that key.
# 35 Implement DH with negotiated groups, and break with malicious "g" parameters
This MITM attack is slightly less realistic because A and B won't
agree on the same shared secret and the communication will fail after
the first message has been exchanged. Mallory can still figure out
that message as the key is predictable again. Either do some basic
algebra to figure out the exact value or observe what values occur.
# 36 Implement Secure Remote Password (SRP)
The description for this one is misleading. SRP is all about the
client never sending a password to the server. Another difference is
that the protocol is split into a registration and authentication
phase. Other than that the description is valid.
# 37 Break SRP with a zero key
Do exactly as told. Have a client register on your server, then
simulate a rogue client with the suggested parameters. You'll find
that they result in the shared secret becoming zero.
# 38 Offline dictionary attack on simplified SRP
Flawed description again (why would the server need to crack the
password if it already knew it?). Use the parameters on the server
side to crack the client credentials by bruteforcing with a dictionary
and checking for every possible password what `v` would be. Once it
matches, you have a compatible password.
# 39 Implement RSA
OpenSSL provides both prime generation and `invmod`. They'll be
faster than your own implementations, but feel free to implement them
anyway for the educational effect. Everything else is easy.
# 40 Implement an E=3 RSA Broadcast attack
CRT stands for Chinese Remainder Theorem, you may want to watch a
YouTube video on how to apply it for solving an equation similar to
what's given in this exercise. Implement an integer cube root if you
don't have one in your language, it can be something as simple as
doing a binary search. Note that the description of the algorithm is
subtly flawed, you'll have to take the modulus of the whole sum, not
the last term...
# 41 Implement unpadded message recovery oracle
I found OOP useful to model the server setting, but there isn't really
a need to do it that way. Remember that `(modexp(S, E, N) * C) % N` is
equivalent to `(S * C) % N` and `(P_ * invmod(S, N)) % N` is to `(P_ /
S) % N`. Modular arithmetic is easy to get wrong by forgetting a
modulo at some point, so err on the side of caution.
# 42 Bleichenbacher's e=3 RSA Attack
A 1024-bit RSA signature refers to the modulus being 1024 bits long,
so you'll need to generate two 512 bit primes. For the
signature, [use MD5 as that produces the smallest ASN.1 padding](https://tools.ietf.org/html/rfc3447#section-9.2)
and leaves you more room for fooling around. The mentioned write-up
makes it very clear how exactly the forged signature should end up
looking, use that with your integer cube root function to calculate
the forged signature.
# 43 DSA key recovery from nonce
I found DSA far more annoying to get right than RSA, simply because
there's far more math and parameters
involved. [The original DSA paper](https://web.archive.org/web/20131226115544/http://csrc.nist.gov/publications/fips/fips1861.pdf) includes test data in the end
which greatly helps with debugging your implementation. Cracking the
key with the given formula is just a matter of brute force. One more
thing that might trip you up, the SHA1 hash given for verification has
to be applied to the hexadecimal string representation of the private
key...
# 44 DSA nonce recovery from repeated nonce
It's easy to find the messages with a repeated `k`, they will have the
same `r`. Pick two messages of that kind and apply the given
formula. If you want an actual challenge, puzzle the formula together
on your own. Hint: Start off with subtracting two signature equations
from each other.
# 45 DSA parameter tampering
Same story as in #34 and #35 except that you aren't told to simulate
networking. The magic signature thing is something I haven't figured
out yet, but it should be a matter of algebra.
# 46 RSA parity oracle
Implement the algorithm exactly as described. You'll want to use
decimals or rationals for the boundaries to guess the correct number,
otherwise the last byte will come out wrong. The Hollywood-style
display is pretty nice, see it [here](http://brause.cc/gifs/hollywoodhacking2.gif).
# 47 Bleichenbacher's PKCS 1.5 Padding Oracle (Simple Case)
I went way too many times over [the Bleichenbacher paper](http://archiv.infsec.ethz.ch/education/fs08/secsem/Bleichenbacher98.pdf) (and a few
other explanations of the attack) until I was sure I understood every
step of the algorithm. This would have been much easier to comprehend
with pseudocode. Anyway, you'll want to have a `floor` / `ceil`
function, be it to use on floats or for integer division. There is no
blinding needed whatsoever, so just use a `s0` of 1. The algorithm
boils down to starting with step 2a, then entering a loop of step 3
and 2c. The final number is a differently padded plaintext.
# 48 Bleichenbacher's PKCS 1.5 Padding Oracle (Complete Case)
Same as #47, but you'll need to flesh out your code to actually work
with multiple intervals. The loop consists of step 3 followed by
either step 2b or 2c, depending on whether step 3 gave you multiple
intervals or not.
# 49 CBC-MAC Message Forgery
For part 1 you'll want to think about how a manipulation to the first
block would give your account a million spacebucks. If you xor the
relevant bytes into that message, you'll have to xor the same bytes in
the IV for decryption to work.
Part 2 requires two messages, one captured from the sender of the
million spacebucks, one captured from an account you control. Xor the
MAC of the first message into the first block of the second message,
pad the first message and add the second message and its IV. The
resulting message should be valid, but may not be decoded correctly,
so retry with a different second message.
# 50 Hashing with CBC-MAC
It took me rather long to realize that you need the key so that you
can encrypt and decrypt blocks as needed to pull off the attack. The
idea is that you start encrypting blocks resembling the message to be
forged (plus a line comment starter to ignore the rest of the
message), then append two more blocks, one randomly chosen and one
resembling the MAC that should look like a padded block after
decrypting. To keep the amount of attempts low, go for the most
likely PKCS#7 padding, a single byte.
I completed the extra credit exercise as well. It uses a form for
uploading a local file and the `FileReader` API to process it.
Verification is done with the [aesjs](https://github.com/ricmoo/aes-js) library. The easiest way of
inserting something into the DOM is `document.write` which replaces
the whole document and runs it, so you'll see a Wu-Tang Clan alert
after uploading both the original and the forged script.
# 51 Compression Ratio Side-Channel Attacks
[This paper](https://www.iacr.org/cryptodb/archive/2002/FSE/3091/3091.pdf) looks relevant and is what enables the [CRIME](https://en.wikipedia.org/wiki/CRIME)
and [BREACH](https://en.wikipedia.org/wiki/BREACH) attacks. For the first CTR results in as many
encrypted as compressed bytes, so you can write an algorithm that
tries a concatenation of the secret prefix, the bytes guessed so far
and the guessed byte, the one yielding the smallest result is likely
to be the correct one. Adding a random suffix improves the chances
considerably. The algorithm terminates once there is no correct
looking candidate, however that fails early sometimes. There's two
possible solutions here, backtracking to guessing the previous byte
again or retrying the whole guess until there's a correct looking
one. I obviously went for the latter because I implemented the same
strategy in #32.
With CTR swapped out for CBC things get harder because there's no
longer an exact correlation between plaintext and ciphertext size.
Thanks to PKCS#7 padding the size of the ciphertext will increase by a
block once the plaintext reaches a multiple of the block size.
Similarly, the size of the ciphertext will decrease by a block once
the plaintext becomes smaller than a multiple of the block size.
Another fact helping us is that random bytes are (close to)
uncompressable by definition, so they'll come in useful as glue to
achieve padding to the nearest block size. The easiest way is to wrap
the earlier logic into a loop retrying with increasingly bigger random
blocks until reaching the block size.
# 52 Iterated Hash Function Multicollisions
[Relevant paper](http://math.boisestate.edu/~liljanab/Math509Spring10/JouxAttackSHA-1.pdf) explaining the idea in more detail. Note that if
you use AES-128 for your weak hash function you first pad the IV to
the full key's length, then truncate the output to the IV size.
Another important detail is that you don't employ your homebrew
hashing functions to find collision blocks (which works on an
arbitrary number of blocks and appends a padding block), but the
compression function (which works on a single block). Implement the
strategy of generating chained collision pairs, then create the
product of all pairs and concatenate the combinations into messages.
All these messages hash to the same value, even after padding (as they
share the same intermediate states and lengths).
If you're wondering why generating tons of messages and randomly
trying until you find a collision for an unrelated hashing function
works, it's the [birthday attack](https://en.wikipedia.org/wiki/Birthday_attack) at work. Basically, having
`2**(n/2)` colliding candidates (with n being the state size of the
stronger hash function in bits) makes it very likely that one pair
collides. It's only a matter of chance, hence why you're told to just
generate more candidates in case you're unlucky.