Python Forum
Convert file of hex strings to binary file
#1
Hello,
I have a file that has several hex string values, separated by the newline character. E.g. the file looks like:

dd5bda81
ae0ac495
b97a7664
...
I can easily parse this file and find e.g. the 10th string based on the line number.

However, I want to convert the file to binary to save disk space.
I was thinking of using
binascii.unhexlify()
but I'm not sure of the best way to handle the ordering of the values: should I just concatenate the byte arrays? Note that the original file can be huge, possibly several gigabytes, and I'm not sure how efficient it would be to look up, say, the billionth value.
#2
If the source file has line endings like in your example, you can process it line by line.
Here is a short example:

import binascii

with open("source.hex") as fd_in, open("destination.bin", "wb") as fd_out:
    for line in fd_in:
        chunk = binascii.unhexlify(line.rstrip())
        fd_out.write(chunk)
  • Open the source file in text read mode and the output file in binary write mode. The example opens both in a single with statement.
  • Iterate over the lines: for line in fd_in
  • Strip trailing whitespace, including the newline: line.rstrip()
  • Convert the hex string into bytes (binary data) with binascii.unhexlify
  • Write the converted data to fd_out
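On the question of ordering and looking up the billionth value: the steps above simply concatenate the converted bytes in line order. As a minimal sketch, assuming every line has the same length as in the example (8 hex digits, so 4 bytes per value), the Nth value can then be read back with a seek instead of scanning the file. The names RECORD_SIZE, hex_to_bin and read_record are made up for this illustration.

```python
import binascii

# Assumption: every line holds exactly 8 hex digits, i.e. 4 bytes per value.
RECORD_SIZE = 4


def hex_to_bin(src_path, dst_path):
    """Concatenate the values in line order, one fixed-size record per line."""
    with open(src_path) as fd_in, open(dst_path, "wb") as fd_out:
        for line in fd_in:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            chunk = binascii.unhexlify(line)
            if len(chunk) != RECORD_SIZE:
                raise ValueError(f"unexpected record length: {line!r}")
            fd_out.write(chunk)


def read_record(bin_path, index):
    """Return the value at the zero-based index as a hex string."""
    if index < 0:
        raise IndexError("index must not be negative")
    with open(bin_path, "rb") as fd:
        fd.seek(index * RECORD_SIZE)  # jump directly to the record
        chunk = fd.read(RECORD_SIZE)
    if len(chunk) != RECORD_SIZE:
        raise IndexError("index past the end of the file")
    return binascii.hexlify(chunk).decode("ascii")
```

After hex_to_bin("source.hex", "destination.bin"), read_record("destination.bin", 9) would return the 10th value. If the lines can differ in length, this fixed-offset lookup no longer works and some kind of index would be needed.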
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
#3
Sometimes it makes more sense to write a C program. Am I going to get kicked off the forum now?
#4
Hm, why?

Try your luck.

If you get it right, then turn it into a Python extension module written in C.
#5
My test:

[deadeye@nexus ~]$ dd if=/dev/urandom of=random.bin bs=1M count=64
64+0 records in
64+0 records out
67108864 bytes (67 MB, 64 MiB) copied, 0.815292 s, 82.3 MB/s
[deadeye@nexus ~]$ python file2hex.py 
[deadeye@nexus ~]$ md5sum random.bin random2.bin 
929b3a89653f956721743a93955e2ec2  random.bin
929b3a89653f956721743a93955e2ec2  random2.bin
Code:
from binascii import hexlify, unhexlify


def file2hex(input_file, output_file):
    # Write each 20-byte chunk of the input as one line of hex digits.
    with open(input_file, "rb") as fd_in, open(output_file, "wb") as fd_out:
        while chunk := fd_in.read(20):  # the := operator needs Python 3.8+
            fd_out.write(hexlify(chunk))
            fd_out.write(b"\n")


def hex2file(input_file, output_file):
    # Convert each line of hex digits back to bytes; rstrip() drops the newline.
    with open(input_file, "rb") as fd_in, open(output_file, "wb") as fd_out:
        for line in fd_in:
            fd_out.write(unhexlify(line.rstrip()))


# Round trip: random.bin -> random.hex -> random2.bin
file2hex("random.bin", "random.hex")
hex2file("random.hex", "random2.bin")
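The md5sum comparison from the shell session can also be done from Python. This is only a sketch using hashlib from the standard library; the helper name file_md5 and the 1 MiB block size are arbitrary choices for the example.

```python
import hashlib


def file_md5(path, block_size=1024 * 1024):
    """Return the MD5 hex digest of a file, read in blocks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fd:
        while block := fd.read(block_size):
            digest.update(block)
    return digest.hexdigest()
```

If the round trip worked, file_md5("random.bin") == file_md5("random2.bin") should be True, matching the identical md5sum lines.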