Python Forum

Full Version: Manipulating Binary Data
Hi,

I'm dealing with some Python at work and having a little trouble writing a good script.

I have a txt file with a bunch of hexadecimal data (thousands of lines), e.g. the 2 frames shown below:


AA08430022 AA08410234



The first 2 bytes of each 'frame' are the timestamp signal (in this case AA08). The rest of the frame (3 bytes) is the actual data.

I need to separate the data so that the timestamps can be in their own list and the data can be separated into 3 bins. This data needs to all be linked together at the end so that it is all sequential.

I need to get to a situation where this data is in a csv file in the following way:

COL                  BIN1            BIN2            BIN3
timestamp(frame1)    data(frame1)    data(frame1)    data(frame1)
timestamp(frame2)    data(frame2)    data(frame2)    data(frame2)


I'm not sure how to go about doing this. 
I'm not expecting a full solution, but advice on what direction to go in would be great, as I'm at a dead end.

Thanks











I would:
  • Read them in as strings
  • Split them into timestamp vs. data with string slicing
  • Create a dictionary, with the timestamps as the keys, and the values being lists of the data for that timestamp.*
  • Append each data point to the list for the appropriate timestamp
  • Once all the data is read, get the keys into a list and sort it.
  • Loop through the list, writing the keys and the data out to the file.
* You could maybe do this step as a list of lists, but I would only do that if you are sure the data you are reading in is in the order you want to output things in.
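The steps above could be sketched roughly as follows. This is only a minimal sketch: the function names `group_frames` and `write_grouped` are made up for illustration, the sample lines stand in for the real file, and it assumes every frame is exactly 4 hex digits of timestamp followed by 6 hex digits of data.

```python
import collections
import csv
import io


def group_frames(lines):
    """Group the data part of each frame under its timestamp prefix."""
    grouped = collections.defaultdict(list)
    for line in lines:
        # Frames on one line are separated by whitespace.
        for frame in line.split():
            timestamp = frame[:4]   # assumed: first 2 bytes = 4 hex digits
            datum = frame[4:]       # assumed: remaining 3 bytes = 6 hex digits
            grouped[timestamp].append(datum)
    return grouped


def write_grouped(grouped, out):
    """Write one row per timestamp, in sorted key order."""
    writer = csv.writer(out)
    for timestamp in sorted(grouped):
        writer.writerow([timestamp] + grouped[timestamp])


# Stand-in for lines read from the text file.
sample_lines = ["AA08430022 AA08410234", "AB81130138"]
grouped = group_frames(sample_lines)

buffer = io.StringIO()
write_grouped(grouped, buffer)
csv_text = buffer.getvalue()
```

With a real file you would pass the open file object to `group_frames` instead of `sample_lines`, and an output file opened with `newline=""` instead of the `StringIO` buffer.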
Thanks for the response. Yeah, my initial idea was to treat it all as 1 large string:

<python>with open('test_data.txt') as hexData:
    # Strip the newline from each line and join everything into one string
    data = "".join(line.rstrip() for line in hexData)
</python>


This now creates a string 'data' with all the frames in 1 long row:
AA08430022AA08410234


I know how to slice strings based on indices, but I'm not sure how to do it when there is a recurring pattern (the timestamps). Some sort of for loop over the data that extracts patterns based on the values?
Would I use regex?

Cheers
There is a nice and simple package just for this purpose - parsing binary data strings, see struct
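As a rough illustration of the struct route: struct works on bytes rather than hex text, so the hex string has to be converted first, for example with `bytes.fromhex`. The sketch below assumes a big-endian 16-bit timestamp and that each of the 3 data bytes is one bin; neither is confirmed anywhere in this thread.

```python
import struct

frame_hex = "AA08430022"

# Convert the 10 hex digits into 5 raw bytes.
raw = bytes.fromhex(frame_hex)

# ">H3B" = big-endian (assumed), one unsigned 16-bit value, three unsigned bytes.
timestamp, bin1, bin2, bin3 = struct.unpack(">H3B", raw)

# Format the values back as hex for display.
as_hex = [format(timestamp, "04X")] + [format(b, "02X") for b in (bin1, bin2, bin3)]
```

If the timestamp turns out to be little-endian, the format string would start with `<` instead of `>`.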
I am using regex to match all the values in the string that contain the timestamp and put it into a list:

match = re.findall(r'(AA08)', data)

I think I'll need to use this because some frames are a little corrupted, containing perhaps A0AA as a timestamp instead, so I'll need a way of identifying these corruptions and finding similar patterns.

I want to create a list of all the values that do NOT contain the timestamp, so that each index in 'match' will correlate to the same index in 'noMatch'... then I can go from there.

But it seems quite difficult to get any regex to work that returns a list of the string that does not contain AA08


Any idea how to get that?
Since the time stamps can be the same for a bunch of frames you may face some obstacles processing them. It depends on what you need. List of tuples could be more convenient.
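For what it's worth, a list of tuples can come straight out of `re.findall` when the pattern has two groups. This is a sketch only: it assumes every frame is still 10 uppercase hex digits long even when the timestamp is corrupted, and the name `frame_pattern` is just for illustration.

```python
import re

data = "AA08430022AA08410234"

# Group 1: 4 hex digits of timestamp. Group 2: 6 hex digits of data.
# Matching any 4 hex digits (not just AA08) keeps corrupted timestamps too.
frame_pattern = re.compile(r"([0-9A-F]{4})([0-9A-F]{6})")

# With two groups, findall returns a list of (timestamp, data) tuples.
frames = frame_pattern.findall(data)

# Frames whose timestamp is not the expected value can be picked out afterwards.
suspect = [frame for frame in frames if frame[0] != "AA08"]
```

Because each tuple keeps the timestamp next to its data, repeated timestamps and the original order are both preserved.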
You don't need to mess with regular expressions. Just chop them up and feed them into a defaultdict, and you're good to go.

import collections
data = collections.defaultdict(list)
for file_data in ('AA08430022', 'AA08410234', 'AB81130138'):
    timestamp = file_data[:4]
    datum = file_data[4:]
    data[timestamp].append(datum)
Output:
>>> dict(data)
{'AA08': ['430022', '410234'], 'AB81': ['130138']}
Split them into timestamp vs. data with string slicing


That bit there is proving to be tricky for me.

I have tried the split method, to turn the long string into a list and separate all the timestamps:

data.split('AA08')

This gives me ['AA08', data, 'AA08', data, etc..]

I want it to be [AA08+data, AA08+data]

It's probably a simple solution but I'm such a newbie with Python. I'm an embedded C engineer and don't often deal with manipulating these kinds of data structures or methods.. :(
(Apr-22-2017, 07:46 PM) arsenal88 wrote: I want it to be [AA08+data, AA08+data]
You could probably join the elements before they go into the list.
It would have been better if you had posted sample code that we can run.
You can of course also join elements in a list that has already been made.
>>> lst = ['AA08', 'data', 'AA08', 'data']
>>> it = iter(lst)
>>> [''.join(each) for each in zip(it, it)]
['AA08data', 'AA08data']
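Another option, if every frame really is the same length, is to skip split() altogether and cut the long string at fixed offsets. A small sketch, assuming 10 hex digits per frame (the constant `FRAME_WIDTH` is just an illustrative name):

```python
FRAME_WIDTH = 10  # assumed: 4 hex digits of timestamp + 6 hex digits of data

data = "AA08430022AA08410234"

# Step through the string FRAME_WIDTH characters at a time.
frames = [data[i:i + FRAME_WIDTH] for i in range(0, len(data), FRAME_WIDTH)]

# Each frame can then be sliced into (timestamp, data) by position.
pairs = [(frame[:4], frame[4:]) for frame in frames]
```

This does not depend on the timestamp value at all, but it will drift out of alignment if any frame in the file is shorter or longer than expected.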
*Create a dictionary, with the timestamps as the keys, and the values being lists of the data for that timestamp.*

Ok so I've got up to this step.. My dictionary contains 1 key ('AA08') and then a list of all the data [data1, data2, data3 etc..] corresponding to that 1 key.



I need to separate the data into 3 bins... i.e. create a list of 3 values within each data list value...
Then I need to somehow put this into a csv in the format I described in my original post.

I have done the following:

file = open("test_data.csv", "w")
# Header row: timestamp column followed by the three bin columns
file.write("Timestamp,")
for h in range(0, 3):
    file.write("Bin " + str(h + 1) + ",")
file.write("\n")

That's my headers sorted... but for looping through the dictionary, I'm stuck! I'm not sure how to extract the dictionary key value pairs into the csv format described originally.

Apologies if I'm asking seemingly trivial things.. tried for hours to get things working but I'm so unfamiliar with python and its data structures.
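One possible way to get from that dictionary to the CSV layout is sketched below using the standard csv module. It assumes each bin is one byte (2 hex digits) of the 6-digit data string, and the helper names `split_bins` and `write_csv` are made up for illustration.

```python
import collections
import csv
import io

# Rebuild a small dictionary like the one described above.
grouped = collections.defaultdict(list)
for frame in ("AA08430022", "AA08410234"):
    grouped[frame[:4]].append(frame[4:])


def split_bins(datum):
    """Split a 6-hex-digit data string into three 2-digit bins (assumed layout)."""
    return [datum[i:i + 2] for i in range(0, len(datum), 2)]


# One row per frame: the timestamp followed by its three bins.
rows = [
    [timestamp] + split_bins(datum)
    for timestamp, data_list in grouped.items()
    for datum in data_list
]


def write_csv(rows, csv_file):
    """Write the header and all rows to an already open file object."""
    writer = csv.writer(csv_file)
    writer.writerow(["Timestamp", "Bin 1", "Bin 2", "Bin 3"])
    writer.writerows(rows)


# Demonstrated with an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_csv(rows, buffer)
csv_text = buffer.getvalue()
```

To write the real file, `write_csv(rows, f)` could be called inside `with open("test_data.csv", "w", newline="") as f:`. Note that grouping by timestamp loses the original frame order across different timestamps, so a list of tuples may suit better if the order matters.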