Python Forum
Manipulating Binary Data - Printable Version



Manipulating Binary Data - arsenal88 - Apr-20-2017

Hi,

I'm dealing with some Python at work and having a little trouble writing a good script.

I have a txt file with a bunch of hexadecimal data (thousands of lines), e.g. the 2 frames shown below:


AA08430022 AA08410234



The first 2 bytes of each 'frame' are the timestamp signal (in this case AA08). The rest of the frame (3 bytes) is the actual data.

I need to separate the data so that the timestamps can be in their own list and the data can be separated into 3 bins. This data needs to all be linked together at the end so that it is all sequential.

I need to get to a situation where this data is in a csv file in the following way:

COL                  BIN1            BIN2            BIN3
timestamp(frame1)    data(frame1)    data(frame1)    data(frame1)
timestamp(frame2)    data(frame2)    data(frame2)    data(frame2)


I'm not sure how to go about doing this.
I'm not expecting a full solution, but advice on what direction to go in would be great, as I'm at a dead end.

Thanks



RE: Manipulating Binary Data - ichabod801 - Apr-21-2017

I would:
  • Read them in as strings
  • Split them into timestamp vs. data with string slicing
  • Create a dictionary, with the timestamps as the keys, and the values being lists of the data for that timestamp.*
  • Append each data point to the list for the appropriate timestamp
  • Once all the data is read, get the keys into a list and sort it.
  • Loop through the list, writing the keys and the data out to the file.
* You could maybe do this step as a list of lists, but I would only do that if you are sure the data you are reading in is in the order you want to output things in.
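
The steps above might look something like the following minimal sketch. It assumes every frame is exactly 10 hex characters (4 for the timestamp, 6 for the data) and that frames are separated by whitespace, as in the sample from the first post. The helper name `group_frames` and the third sample frame are made up for illustration.

```python
from collections import defaultdict


def group_frames(text):
    """Group the data part of each frame under its timestamp."""
    grouped = defaultdict(list)
    for frame in text.split():              # frames separated by whitespace
        timestamp, datum = frame[:4], frame[4:]
        grouped[timestamp].append(datum)
    return grouped


sample = "AA08430022 AA08410234 AB81130138"
grouped = group_frames(sample)

# Sort the keys, then walk them in order (here printing instead of writing).
rows = [(ts, grouped[ts]) for ts in sorted(grouped)]
for timestamp, data in rows:
    print(timestamp, data)
```

Sorting the keys only matters if the output should be ordered by timestamp rather than by the order the frames were read in.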


RE: Manipulating Binary Data - arsenal88 - Apr-21-2017

Thanks for the response. Yeah, my initial idea was to treat it all as 1 large string:

<python>with open('test_data.txt') as hexData:
    data = "".join(line.rstrip() for line in hexData)
</python>


This now creates a string 'data' with all the frames in 1 long row:
AA08430022AA08410234


I know how to slice strings based on indices, but I'm not sure what to do when there is a recurring pattern (the timestamps). Some sort of for loop over the data that extracts patterns based on the values?
Would I include regex?

Cheers
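
If every frame really is the same width, no pattern matching is needed: the long string can be cut into fixed-size pieces by stepping through the indices. This is only a sketch, assuming 10 hex characters per frame (2-byte timestamp plus 3 data bytes) and no corrupted or short frames; `FRAME_LEN`, `timestamps` and `payloads` are illustrative names.

```python
FRAME_LEN = 10  # assumed: 4 hex chars of timestamp + 6 hex chars of data

data = "AA08430022AA08410234"

# Step through the string FRAME_LEN characters at a time.
frames = [data[i:i + FRAME_LEN] for i in range(0, len(data), FRAME_LEN)]

# Slice each frame into its two parts; both lists share the same indices.
timestamps = [frame[:4] for frame in frames]
payloads = [frame[4:] for frame in frames]

print(frames)
print(timestamps)
print(payloads)
```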


RE: Manipulating Binary Data - volcano63 - Apr-21-2017

There is a nice and simple package just for this purpose - parsing binary data strings, see struct
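
Note that `struct` works on bytes rather than on hex text, so the hex string would first need converting, for example with `bytes.fromhex`. A rough sketch, assuming the timestamp is a big-endian 16-bit value followed by three single-byte values (the thread does not state the byte order):

```python
import struct

frame_hex = "AA08430022"
raw = bytes.fromhex(frame_hex)  # 5 raw bytes

# ">H3B": big-endian (assumed), one unsigned 16-bit value, three unsigned bytes.
timestamp, bin1, bin2, bin3 = struct.unpack(">H3B", raw)

print(hex(timestamp), bin1, bin2, bin3)
```

This gives integers instead of hex strings, which may or may not be what the csv should contain.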


RE: Manipulating Binary Data - arsenal88 - Apr-21-2017

I am using regex to match all the values in the string that contain the timestamp and put it into a list:

match = re.findall(r'(AA08)', data)

I think I'll need to use this because some frames are a little corrupted, containing perhaps A0AA as a timestamp instead, so I'll need a way of identifying these corruptions and finding similar patterns.

I want to create a list of all the values that do NOT contain the timestamp, so that each index in 'match' will correspond to the same index in 'noMatch'... then I can go from there.

But it seems quite difficult to get any regex to work that returns a list of the parts of the string that do not contain AA08.


Any idea how to get that?
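
One possible way to get both lists from a single regex is to capture the timestamp and the characters after it as two groups, so that `re.findall` returns them already paired. This is a sketch that assumes the data part is always exactly 6 hex characters; frames with a corrupted timestamp such as A0AA would simply not match this pattern and would need separate handling.

```python
import re

data = "AA08430022AA08410234"

# Group 1 is the timestamp, group 2 the six hex characters that follow it.
pairs = re.findall(r"(AA08)([0-9A-Fa-f]{6})", data)

# Split the pairs into two lists whose indices line up.
match = [timestamp for timestamp, _ in pairs]
noMatch = [payload for _, payload in pairs]

print(pairs)
print(match)
print(noMatch)
```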


RE: Manipulating Binary Data - wavic - Apr-21-2017

Since the timestamps can be the same for a bunch of frames, you may face some obstacles processing them. It depends on what you need. A list of tuples could be more convenient.
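
As a small illustration of the list-of-tuples idea: each frame becomes one `(timestamp, data)` pair, so the original order and any repeated timestamps are preserved. The third frame here is a made-up duplicate, and `records` is an illustrative name.

```python
frames = ["AA08430022", "AA08410234", "AA08430022"]

# One (timestamp, data) tuple per frame, in the order the frames were read.
records = [(frame[:4], frame[4:]) for frame in frames]

for timestamp, payload in records:
    print(timestamp, payload)
```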


RE: Manipulating Binary Data - ichabod801 - Apr-21-2017

You don't need to mess with regular expressions. Just chop them up and feed them in to a defaultdict, and you're good to go.

import collections

# Map each timestamp to the list of data strings seen with it.
data = collections.defaultdict(list)
for file_data in ('AA08430022', 'AA08410234', 'AB81130138'):
    timestamp = file_data[:4]   # first 4 hex characters
    datum = file_data[4:]       # remaining 6 hex characters
    data[timestamp].append(datum)
Output:
>>> dict(data)
{'AA08': ['430022', '410234'], 'AB81': ['130138']}



RE: Manipulating Binary Data - arsenal88 - Apr-22-2017

Split them into timestamp vs. data with string slicing


That bit there is proving to be tricky for me.

I have tried the split method, to turn the long string into a list and separate all the timestamps:

data.split('AA08')

This gives me ['AA08', data, 'AA08', data, etc..]

I want it to be [AA08+data, AA08+data]

It's probably a simple solution but I'm such a newbie with Python. I'm an embedded C engineer and don't often deal with these kinds of data structures or methods.. :(
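
For reference, a plain `str.split` drops the separator from its result, so the timestamp has to be put back onto each piece afterwards. A minimal sketch, assuming the text AA08 never appears inside the data bytes themselves (if it can, splitting on it would cut frames in the wrong place):

```python
data = "AA08430022AA08410234"

parts = data.split("AA08")
print(parts)  # the separator is gone and the first piece is empty

# Skip the empty piece and re-attach the timestamp to each remaining one.
frames = ["AA08" + part for part in parts if part]
print(frames)
```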


RE: Manipulating Binary Data - snippsat - Apr-22-2017

(Apr-22-2017, 07:46 PM)arsenal88 Wrote: I want it to be [AA08+data, AA08+data]
You could probably join the elements before they go into the list.
It would have been better if you had posted sample code that we can run.
You can of course also join the elements of a list that has already been made.
>>> lst = ['AA08', 'data', 'AA08', 'data']
>>> it = iter(lst)
>>> [''.join(each) for each in zip(it, it)]
['AA08data', 'AA08data']



RE: Manipulating Binary Data - arsenal88 - Apr-25-2017

*Create a dictionary, with the timestamps as the keys, and the values being lists of the data for that timestamp.*

Ok so I've got up to this step.. My dictionary contains 1 key ('AA08') and then a list of all the data [data1, data2, data3 etc..] corresponding to that 1 key.



I need to separate the data into 3 bins, i.e. create a list of 3 values from each data value in the list...
Then I need to somehow put this into a csv in the format I described in my original post.

I have done the following:

file = open("test_data.csv", "w")
file.write("Timestamp,")
for h in range(0, 3):
    file.write("Bin " + str(h + 1) + ",")
file.write("\n")

That's my headers sorted... but when it comes to looping through the dictionary, I'm stuck! I'm not sure how to get the dictionary's key/value pairs into the csv format described originally.

Apologies if I'm asking seemingly trivial things.. tried for hours to get things working but I'm so unfamiliar with python and its data structures.
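
One way this last step could go, sketched with the standard `csv` module: loop over the dictionary, cut each data string into three pieces, and write one row per frame. It assumes each bin is one byte (2 hex characters) of the 6-character data string; the variable names and the sample dictionary are illustrative.

```python
import csv

# Assumed shape: timestamp -> list of 6-character hex data strings.
frames_by_timestamp = {"AA08": ["430022", "410234"]}

rows = []
for timestamp in sorted(frames_by_timestamp):
    for payload in frames_by_timestamp[timestamp]:
        # Cut the data into three 2-character bins (one byte each, assumed).
        bins = [payload[i:i + 2] for i in range(0, 6, 2)]
        rows.append([timestamp] + bins)

# newline="" lets the csv module control line endings itself.
with open("test_data.csv", "w", newline="") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(["Timestamp", "Bin 1", "Bin 2", "Bin 3"])
    writer.writerows(rows)
```

Using `csv.writer` avoids building the commas and newlines by hand, and the `with` block closes the file when writing is done.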