I have a special file format with a header and some content. In other languages I wrote a file subclass for reading and writing this file type. The file class has a dictionary-like interface for the header and a streaming interface for reading/writing the other content. I am thinking about how to do this in Python. Could anyone point me to examples of subclassing io for reading/writing a special file format, or suggest better ways to do this kind of thing in Python?
Could you show a file sample with perhaps one complete record?
A custom file class can be written quite easily.
My file is a binary file with a text header. The header consists of multiple records all 128 bytes long. Each record has a 32 byte key followed by a 96 byte value. It looks something like this:
FILE_TYPE TIME_HISTORY_FILE
CREATION_DATE JANUARY 27, 2021
SAMPLE_PERIOD 0.001
NUMBER_OF_SAMPLES 4
NUMBER_OF_CHANNELS 2
CHANNEL_1_NAME X
CHANNEL_1_UNITS m
CHANNEL_2_NAME Y
CHANNEL_2_UNITS mm
END_OF_HEADER
The length of the header changes to allow different numbers of channels.
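The record layout described above (32-byte key, 96-byte value, 128 bytes per record) could be read into a plain dict roughly like this. This is only a sketch: the ASCII encoding, the assumption that END_OF_HEADER is itself a full space-padded 128-byte record, and the helper names `make_record` and `read_header` are all mine, not part of the file format's real API.

```python
import io

RECORD_SIZE = 128  # each header record is 128 bytes
KEY_SIZE = 32      # 32-byte key, followed by a 96-byte value


def make_record(key, value=""):
    """Build one space-padded 128-byte record (hypothetical helper for the demo)."""
    text = key.ljust(KEY_SIZE) + value.ljust(RECORD_SIZE - KEY_SIZE)
    return text.encode("ascii")  # assumption: header text is ASCII


def read_header(stream):
    """Read records from a binary stream until END_OF_HEADER; return a dict."""
    header = {}
    while True:
        record = stream.read(RECORD_SIZE)
        if len(record) < RECORD_SIZE:
            raise ValueError("file ended before END_OF_HEADER")
        key = record[:KEY_SIZE].decode("ascii").strip()
        if key == "END_OF_HEADER":
            return header
        header[key] = record[KEY_SIZE:].decode("ascii").strip()


# In-memory stand-in for a real file opened with mode "rb".
sample = io.BytesIO(
    make_record("FILE_TYPE", "TIME_HISTORY_FILE")
    + make_record("NUMBER_OF_CHANNELS", "2")
    + make_record("END_OF_HEADER")
)
header = read_header(sample)
print(header)
```

Because the loop stops at END_OF_HEADER, the stream is left positioned at the first time history value, whatever the number of channels.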
Following the header is the time history data.
x0,y0,x1,y1,x2,y2,x3,y3
Each time history value is 4 bytes long. The time history data holds NUMBER_OF_SAMPLES * NUMBER_OF_CHANNELS values.
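Fixed-size binary values like these are usually decoded with the standard `struct` module. The sketch below assumes the values are little-endian 32-bit floats, which the post does not state (it only says 4 bytes), so the format string would need changing for integers or big-endian data. The function names are made up for illustration.

```python
import struct

VALUE_SIZE = 4  # bytes per time history value (from the post)


def unpack_values(data, count):
    """Decode `count` values from raw bytes.

    Assumption: little-endian 32-bit floats ("<f"); adjust if the real
    format uses a different type or byte order.
    """
    return list(struct.unpack(f"<{count}f", data[:count * VALUE_SIZE]))


def split_channels(values, channels):
    """Separate interleaved samples (x0, y0, x1, y1, ...) into one list per channel."""
    return [values[i::channels] for i in range(channels)]


# Four samples of two channels, interleaved as in the post.
raw = struct.pack("<8f", 0.0, 10.0, 1.0, 11.0, 2.0, 12.0, 3.0, 13.0)
values = unpack_values(raw, 4 * 2)
x, y = split_channels(values, 2)
print(x, y)
```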
I cannot change the file organization. There are several legacy applications that work with this file format.
In other languages the API for this file type is:
open(filename) : Opens file and reads header
get(key) : Returns value associated with this header key
set(key, value) : Set value associated with this header key
read(count, buffer) : Read count time history values into buffer
save(filename) : Opens file for writing. Writes header to file.
write(count, buffer) : Write count time history values stored in buffer
close() : Close the file
I would like to look at ways that others have solved similar problems. I am currently studying the gzip library.
A couple of questions about the header:
- is the header record number actually part of the record?
- is the header tab delimited?
- is FILE_TYPE always TIME_HISTORY_FILE?
1: ? I do not understand the question
2: No. Key and value information is padded with spaces so the key is always 32 bytes long and value 96 bytes long.
3: No. There is also an extended time history type which has a slightly different header.
Quote: 1: ? I do not understand the question
sample from post 3:
Output:
1 FILE_TYPE TIME_HISTORY_FILE
Is the '1' at the start actually part of the record?
No. That is just an artifact of wrapping with Python tags. Guess I should have used something else.
How about reading data and converting into something like this:
record = {
    'FILE_TYPE': 'TIME_HISTORY_FILE',
    'CREATION_DATE': 'JANUARY 27, 2021',
    'SAMPLE_PERIOD': 0.001,
    'ch1': {
        'x': 'value',
        'y': 'value',
    },
    'ch2': {
        'x': 'value',
        'y': 'value',
    },
    'ch3': {
        'x': 'value',
        'y': 'value',
    },
    'ch4': {
        'x': 'value',
        'y': 'value',
    },
    ...
}
I really don't have a problem with how to represent the data. My questions have more to do with the mechanics of opening the file, reading from the file, and writing to the file.
For example, I would really like to do this:
def dump_file(filename):
    with timehistoryfile.open(filename) as file:
        for key, value in file.header.items():
            print(key.strip(), value.strip())
So I want to inherit or implement the things that support context management.
I want to read a bunch of the time history values all at once, so I need to implement read(count). How do I implement "read(count)"? Is it as simple as converting count to size and calling read(size) from the base class? If so, what is a good base class to use?
You don't need a base class to create a context manager, you just need to implement the __enter__/__exit__ interface.
>>> class TimeHistory:
...     def __init__(self, filename):
...         self.filename = filename
...         self.fobj = None
...     def __enter__(self):
...         self.fobj = open(self.filename, "r")
...         return self
...     def __exit__(self, *args):
...         if self.fobj:
...             self.fobj.close()
...             self.fobj = None
...
>>> with TimeHistory('test.txt') as file:
...     print(file)
...
<__main__.TimeHistory object at 0x000001AA1BB40AF0>
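Building on that, here is one way the same context manager might grow a `header` dict and a `read(count)` that simply converts a value count into a byte size, without any base class. Treat it as a sketch: the class name `TimeHistoryFile`, the ASCII header encoding, the padded END_OF_HEADER record, and the little-endian float32 value type are all assumptions on my part. The file is opened in "rb" mode because the data section is binary.

```python
import os
import struct
import tempfile

RECORD_SIZE, KEY_SIZE, VALUE_SIZE = 128, 32, 4


class TimeHistoryFile:
    """Hypothetical wrapper: dict-like header plus read(count) for values."""

    def __init__(self, filename):
        self.filename = filename
        self.fobj = None
        self.header = {}

    def __enter__(self):
        self.fobj = open(self.filename, "rb")
        try:
            self._read_header()
        except Exception:
            # __exit__ is not called if __enter__ raises, so close here.
            self.fobj.close()
            self.fobj = None
            raise
        return self

    def __exit__(self, *args):
        if self.fobj:
            self.fobj.close()
            self.fobj = None

    def _read_header(self):
        while True:
            record = self.fobj.read(RECORD_SIZE)
            if len(record) < RECORD_SIZE:
                raise ValueError("file ended before END_OF_HEADER")
            key = record[:KEY_SIZE].decode("ascii").strip()
            if key == "END_OF_HEADER":
                break
            self.header[key] = record[KEY_SIZE:].decode("ascii").strip()

    def read(self, count):
        """Read up to `count` values (assumed little-endian float32)."""
        data = self.fobj.read(count * VALUE_SIZE)  # count -> size in bytes
        n = len(data) // VALUE_SIZE                # fewer near end of file
        return list(struct.unpack(f"<{n}f", data[:n * VALUE_SIZE]))


def record(key, value=""):
    """Build one padded header record for the demo file."""
    return (key.ljust(KEY_SIZE) + value.ljust(RECORD_SIZE - KEY_SIZE)).encode("ascii")


# Write a small demo file, then read it back.
path = os.path.join(tempfile.mkdtemp(), "demo.dat")
with open(path, "wb") as out:
    out.write(record("NUMBER_OF_SAMPLES", "4") + record("NUMBER_OF_CHANNELS", "2"))
    out.write(record("END_OF_HEADER"))
    out.write(struct.pack("<8f", *(float(i) for i in range(8))))

with TimeHistoryFile(path) as file:
    total = int(file.header["NUMBER_OF_SAMPLES"]) * int(file.header["NUMBER_OF_CHANNELS"])
    first = file.read(2)
    rest = file.read(total)  # asks for more than remains; returns what is left
```

Wrapping an ordinary file object like this (composition) keeps the byte-level `read(size)` of the underlying file separate from the value-level `read(count)`, which avoids overriding a method whose meaning the `io` base classes already define.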