Python Forum

Convert file of hex strings to binary file
Hello,
I have a file that has several hex string values, separated by the newline character. E.g. the file looks like:

dd5bda81
ae0ac495
b97a7664
...
I can easily parse this file and find e.g. the 10th string based on the line number.
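Looking a value up by line number in the text file means reading every line before it. A minimal sketch of that lookup using itertools.islice; nth_hex_value is a hypothetical helper name and n is 0-based:

```python
from itertools import islice


def nth_hex_value(path, n):
    """Return the n-th (0-based) hex string from a newline-separated text file.

    islice() must read and discard every line before index n, so the cost
    grows linearly with n.
    """
    with open(path) as fd:
        line = next(islice(fd, n, None), None)
    if line is None:
        raise IndexError(f"file has fewer than {n + 1} lines")
    return line.rstrip()  # drop the trailing newline
```

With the sample above, nth_hex_value(path, 2) would return "b97a7664". The linear scan is what makes a value near the end of a multi-gigabyte text file slow to reach.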

However, I want to convert the file to binary to save disk space. I was thinking of using binascii.unhexlify(), but I'm not sure what the best way is to keep the values in order: should I just concatenate the byte arrays? Note that the original file can be huge, possibly several gigabytes, and I'm not sure how efficiently I could then look up, say, the billionth value.
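Concatenating the decoded values does preserve their order, and if every value has the same width, the position of each one follows from its index alone. A minimal sketch, assuming every line holds exactly 8 hex digits (4 bytes) like the three sample values; pack, record and RECORD_SIZE are hypothetical names, and bytes.fromhex() is used in place of binascii.unhexlify() (both turn a hex string into bytes):

```python
RECORD_SIZE = 4  # assumption: every value is exactly 8 hex digits (4 bytes)


def pack(hex_values):
    """Decode each hex string and concatenate the bytes in their original order."""
    return b"".join(bytes.fromhex(h) for h in hex_values)


def record(data, i):
    """Return record i (0-based) as a hex string; fixed width means offset = i * RECORD_SIZE."""
    start = i * RECORD_SIZE
    return data[start:start + RECORD_SIZE].hex()


packed = pack(["dd5bda81", "ae0ac495", "b97a7664"])
print(len(packed))        # 12
print(record(packed, 1))  # ae0ac495
```

For the sample, the packed form is 12 bytes against 27 bytes of text (8 hex digits plus a newline per value).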
If the source file has one value per line, as in your example, you can process it line by line.
Here is a short example:

import binascii

with open("source.hex") as fd_in, open("destination.bin", "wb") as fd_out:
    for line in fd_in:
        # strip the trailing newline, then decode the hex digits to bytes
        chunk = binascii.unhexlify(line.rstrip())
        fd_out.write(chunk)
  • Open the source file in text read mode and the output file in binary write mode. Both can be opened in a single with statement, as the example shows.
  • Iterate over the lines: for line in fd_in
  • Strip trailing whitespace, including the newline: line.rstrip()
  • Convert the hex string into bytes (binary data) with binascii.unhexlify
  • Write the decoded bytes to fd_out
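Because the loop above writes the decoded values back to back in their original order, each value can be fetched from the binary file with a single seek() instead of a scan, provided every line held the same number of hex digits. A sketch assuming exactly 8 hex digits (4 bytes) per line; read_record, record_count and RECORD_SIZE are hypothetical helpers:

```python
import os

RECORD_SIZE = 4  # assumption: each hex line held exactly 8 hex digits


def read_record(path, index, record_size=RECORD_SIZE):
    """Return record `index` (0-based) from a packed binary file as a hex string.

    seek() jumps straight to index * record_size, so the cost does not
    depend on how far into the file the record is.
    """
    with open(path, "rb") as fd:
        fd.seek(index * record_size)
        data = fd.read(record_size)
    if len(data) != record_size:
        raise IndexError(f"record {index} is out of range")
    return data.hex()


def record_count(path, record_size=RECORD_SIZE):
    """Number of fixed-width records in the packed file."""
    return os.path.getsize(path) // record_size
```

For example, read_record("destination.bin", 999_999_999) would return the billionth value without reading the ones before it. If the lines have different lengths, this offset arithmetic no longer holds.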
Sometimes it makes more sense to write a C program. Am I going to get kicked off the forum now?
Hm, why?

Try your luck.

If you get it right, you can then turn it into a Python module written in C.
My test:

[deadeye@nexus ~]$ dd if=/dev/urandom of=random.bin bs=1M count=64
64+0 records in
64+0 records out
67108864 bytes (67 MB, 64 MiB) copied, 0.815292 s, 82.3 MB/s
[deadeye@nexus ~]$ python file2hex.py 
[deadeye@nexus ~]$ md5sum random.bin random2.bin 
929b3a89653f956721743a93955e2ec2  random.bin
929b3a89653f956721743a93955e2ec2  random2.bin
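The same round-trip check can be done from Python instead of md5sum. A small sketch with hashlib that reads the file in blocks, so a multi-gigabyte file is never loaded into memory at once; md5_of and block_size are hypothetical names:

```python
import hashlib


def md5_of(path, block_size=1 << 20):
    """MD5 hex digest of a file, read in blocks of block_size bytes."""
    h = hashlib.md5()
    with open(path, "rb") as fd:
        # := (Python 3.8+) ends the loop when read() returns b"" at EOF
        while block := fd.read(block_size):
            h.update(block)
    return h.hexdigest()
```

md5_of("random.bin") == md5_of("random2.bin") should then be True after a successful round trip. filecmp.cmp("random.bin", "random2.bin", shallow=False) is an alternative that compares the bytes directly.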
Code:
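from binascii import hexlify, unhexlify


def file2hex(input_file, output_file):
    """Write input_file as hex text, 20 bytes (40 hex digits) per line."""
    with open(input_file, "rb") as fd_in, open(output_file, "wb") as fd_out:
        # := (Python 3.8+) stops the loop when read() returns b"" at EOF
        while chunk := fd_in.read(20):
            fd_out.write(hexlify(chunk))
            fd_out.write(b"\n")


def hex2file(input_file, output_file):
    """Decode a newline-separated hex file back into raw bytes."""
    with open(input_file, "rb") as fd_in, open(output_file, "wb") as fd_out:
        for line in fd_in:
            # rstrip() removes the trailing newline; unhexlify accepts bytes
            fd_out.write(unhexlify(line.rstrip()))


if __name__ == "__main__":
    # round trip: random.bin -> random.hex -> random2.bin
    file2hex("random.bin", "random.hex")
    hex2file("random.hex", "random2.bin")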
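hex2file makes one unhexlify() call and one write() per line. A possible variant for very large inputs, not benchmarked here, decodes many lines per call: readlines(hint) returns whole lines totalling roughly hint characters, so a hex pair is never split between batches, and bytes.fromhex() skips ASCII whitespace (Python 3.7+), so the newlines can stay in the joined string as long as every line has an even number of hex digits. hex2file_batched and batch_chars are hypothetical names:

```python
def hex2file_batched(input_file, output_file, batch_chars=1 << 20):
    """Decode a newline-separated hex file into raw bytes, many lines at a time."""
    # text mode: readlines() yields str lines, which bytes.fromhex() expects
    with open(input_file) as fd_in, open(output_file, "wb") as fd_out:
        # readlines() returns [] at EOF, which ends the loop
        while lines := fd_in.readlines(batch_chars):
            # fromhex() ignores the embedded newlines between byte pairs
            fd_out.write(bytes.fromhex("".join(lines)))
```

Calling hex2file_batched("random.hex", "random3.bin") should produce the same bytes as hex2file.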