Python Forum - AES Decryption too slow

Pages: 1 2

Hi everybody

I’m still fairly new to python so please bear with me if I ask any daft questions.

I’m currently working on a crypto challenge to decode some message encoded with AES 256.

Within the challenge we know the first 24 bits of the password used for encryption. We also know that it’s only numeric.

I’ve written a script that tries decoding the message. Since I need to figure out the last 8 bits of code, I start with 00000000, add it to the end of the password, decrypt the message and write the result to a file. I then increment this figure by 1, and repeat the process. Eventually it will iterate every single possible combination and I will have the decoded output written to file.

The down side is that working it’s way through 100,000,000 attempts takes time. Included in the code is a counter so I can see the progress. I’ve had it running for many hours and not even hit the ten million mark.

The other problem isn’t that the output file size gets very large and causes the script to hang.

Anyone got any suggestions for a more efficient way of doing this?

Thanks!

If you only need to guess 1 byte that means you have 256 options (from 0x00 to 0xFF) and except if your teacher is cruel and is using the bible as plain text you shall be able to try all of them in seconds.

I think the problem is that you are iterating from 0 to 99999999... and that is 390000 times more effort.

Hi thanks for the reply.

Sorry for any confusion but with this being aes 256 the password is 32 bits in length. So we need to figure out the last 8 bits of the password which is why I’m iterating from 00000000 to 99999999.

(May-05-2018, 12:03 AM)Hairy_Ape Wrote: [ -> ]we need to figure out the last 8 bits of the password which is why I’m iterating from 00000000 to 99999999.

Eight bits wouldn't be "from 00000000 to 99999999". There is no bit that represents 9. You'd be counting from 00000000 to 11111111, is as has already been mentioned is 256.

Can you like to the "challenge"? Can you post your code? With more information, we can probably be a lot more helpful.

(May-05-2018, 12:03 AM)Hairy_Ape Wrote: [ -> ]Sorry for any confusion but with this being aes 256 the password is 32 bits in length. So we need to figure out the last 8 bits of the password which is why I’m iterating from 00000000 to 99999999.

You are mixing bits and bytes.
The key in a AES256 is... 256 bits that are exactly 32 bytes.
I have done some tests and a script that encrypt a text of ~1500 bytes and after that tries to decode it changing the last byte of the key in the 256 options takes less than a second and almost all the time is lost printing to the stdout.

To help you this is the script that encodes a secret and returns 3 values: the coded text, the nonce and the truncated key (is missing the last byte)
It also prints in the stdout the right solution.

#!/usr/bin/env python3

import os

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.backends import default_backend


def prepare_secret(msg: bytes):
    key = os.urandom(32)
    
    algo = algorithms.AES(key)
    # Block size in bytes... I expect nobody will create an algorithm with a
    # block size not multiplo of 8
    sz = algo.block_size // 8
    # Create the nonce
    iv = os.urandom(sz)
    cipher = Cipher(algo, modes.CBC(iv), backend=default_backend())

    # Pad the message with nulls up to the chipher block
    # This is essentially *WRONG* but for this demo...
    msg += bytes(sz - len(msg) % sz)

    encryptor = cipher.encryptor()
    ct = encryptor.update(msg) + encryptor.finalize()
    
    print(f"The answer is at {key[-1]}")
    return key[:-1], iv, ct


def hack(hint: bytes, iv: bytes, ct: bytes):
    """
    This is the answer to your homework... And I am a mean person and I do not give it to you :D
    """
    pass

hint, iv, ct = prepare_secret(b'This is the Plain Text message that you are trying to recover')

hack(hint, iv, ct)

You can use this to explore the problem and understand the different element sizes.
Also notice that the padding I am using to make the lenght of the plain text a multiplo of the cypher block size is WRONG... in reality you shall use something like PKCS7 or similar.

import base64
from Crypto.Cipher import AES


message = open ('aestest.txt', 'r')
results = open ('results.txt', 'w')


bloo = message.read()
decoded = base64.b64decode(bloo)

x = 0 

while x <= 99999999:

    y = "%08d" % (x)
    keyword = '546327364536211087236533' + y
    print 'Attempt number: ' + y

    
    decryptor = AES.new(keyword, AES.MODE_ECB)
    results.write(y + ' ' + decryptor.decrypt(decoded) + '\n')

    x +=1

print 'Check the results file!'

results.close()

Yes you are correct. I'm talking bytes not bits (sorry, very new to this still). so it's the last 8 bytes that are variable and only numeric.

I've posted my script above. The message to be decoded is base64 encoded to start with and it uses ECB mode for AES decryption as we know there's no IV used.

The output file size racks up quickly and that when i'm only in the 10,000 range of combinations. We're talking 200MB size.

killerrex - I'll look at your code in a minute and see if it works any better thanks.

Any other advice from anyone would be greatly appreciated.

Ok, now I see you are attacking not the key but the ECB+padding.

Remember that the ECB has a big weakness, all the blocks are coded using the same key, so you can attack just the last block of 16 bytes and and do not attempt to decode the rest until you have found the key.

I imagine the exercise comes with something like "we know the key is composed only of numbers and we know the first 24" because otherwise if key you have is '546327364536211087236533' and you read it as a base 10 number you only have 10 bytes of information.

Assuming that as you say you need to add 8 decimal digits to the key, if you want to store all the results of trying the 10**8 possibilities for just the last block you will need:

Output:
16 bytes * 10**8 = 1.490 Gb

Big, but not impossible... although I think you need to attack more the weakness of the ECB and keep only the promising results:
- If you know the used padding keep only the valid ones
- Once removed the padding, keep the ones that produces only printable characters.
And so on...

I have done a small test with my old computer (a laptop with a i5-3360M CPU @ 2.80GHz) trying the 10**8 options to decode a 16 bytes encoded text and it has taken just 15 minutes to run.
I was not writing to the disk the 1.5 Gb file, just the promising ones.

Ok that makes sense. Didn't realise that was how it worked in ECB mode.

I've changed my code to only attempt the last block but I'm not sure about how to only writing the promising ones. Will have to look into this.

I'm currently writing all results and it is a bit faster that my original code.

The bug problem is the program hanging when the output file gets to around 200mb.

I think this is where only writing the promising ones would help.

Ok I've changed my code to only attempt to decode the last 16 bytes, and to only write to file if the returned string is ascii.

My code now looks like this:

import base64
from Crypto.Cipher import AES


message = open ('aesInput.txt', 'r')
results = open ('aesOutput.txt', 'w')


bloo = message.read()
decoded = base64.b64decode(bloo)
short = decoded[-16:]

x = 0    

while x <= 99999999:

    y = "%08d" % (x)
    keyword = '333466657657652021001011' + y 
    print 'Attempt number: ' + y

    
    decryptor = AES.new(keyword, AES.MODE_ECB)
    shortout = decryptor.decrypt(short)

    try:
        shortout.decode('ascii')

    except UnicodeDecodeError:
        pass

    else:
        results.write(y + ' ' + shortout + '\n')

    x +=1

print 'Check the results file!'

results.close()

Even doing this, it took an hour hit the 1 million mark. I would have to leave it running for 4 days solid to get a result.

I'm guessing my code is just inefficient.

killerrex (or anyone else), if you have any suggestions or hints on how to improve this, I would be grateful.

It is quite possible that you are loosing all the time in the print function...
It is better to report just a few times, for wxample:

if x % 100000 == 0:
    print(f'{x} of {total} done so far...')

writes only one every 100000 times, like 1000 times less that your code. Remember that the stdout is quite slow compared to the memory.

Another aspect is the check you are doing decoding to ascii, the idea is not bad as I imagine that the plain text would contain just ascii chars. In my case I would rather go to something like all(w.isalnum() for w in shortcut.split()) or even a regexp like r'\s*(?:[\w.-]+`s+)*' but that is up to you.
In any casest you need to remove the padding, otherwise you have set of end characters that are not printable.
There are many schemas, one of the most basic ones is filling the n bytes missing to reach the code block lenght (16 bytes in this case) with the number n, so you know that the only valid enings for your message are b'\01', b'\02\02', b'\03\03\03' and so on until b'\10'*16 (as if you do not need to add padding, you add a whole extra block)

Pages: 1 2