pickle or txt

DPaul · Aug-22-2022, 09:23 AM

Hi,
After an OCR session, I have very large files that people search for data (prayer cards).
I "dump" them both in text format (.txt) and in binary, using pickle.
So far, so good.
Now I need to read the data:
With pickle it is pickle.load(...) and I get the whole thing as a list that I can go through record by record; that is OK.
My traditional way of reading the .txt file is:

        
with open('sourcefile', 'r') as source:
    for idx, line in enumerate(source):
        ...  # code

In both cases, after scanning through all the data, I just close the file and continue with the results.
The pickle file is much smaller than my txt file, but:

Question: is the pickle load(...) method more taxing on the computer's memory than the enumerate(...) method,
even if I can empty the list after using it?
Any pros or cons?
thx,
Paul
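
One way to check the memory question empirically is to measure peak allocations with the standard-library tracemalloc module. The sketch below is not from the thread: the file names, the 50,000 made-up records and the `peak_bytes` helper are all assumptions for illustration, and the real numbers will depend on the actual OCR data.

```python
import pickle
import tempfile
import tracemalloc
from pathlib import Path


def peak_bytes(func):
    """Run func() and return the peak traced allocation in bytes (hypothetical helper)."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


# Made-up stand-in for the OCR records.
records = [f"record {i}: some OCR text" for i in range(50_000)]

tmp_dir = Path(tempfile.mkdtemp())
txt_path = tmp_dir / "records.txt"
pkl_path = tmp_dir / "records.pkl"

# Write the same data once as text lines and once as a single pickled list.
txt_path.write_text("\n".join(records) + "\n", encoding="utf-8")
with pkl_path.open("wb") as fh:
    pickle.dump(records, fh)


def load_whole_pickle():
    with pkl_path.open("rb") as fh:
        data = pickle.load(fh)  # the entire list exists in memory at once
    for idx, record in enumerate(data):
        pass


def scan_text_lines():
    with txt_path.open("r", encoding="utf-8") as source:
        for idx, line in enumerate(source):  # only one line is held at a time
            pass


pickle_peak = peak_bytes(load_whole_pickle)
text_peak = peak_bytes(scan_text_lines)
print(f"pickle.load peak:  {pickle_peak:,} bytes")
print(f"line-by-line peak: {text_peak:,} bytes")
```

Loading a single pickled list has to build every record before the loop starts, so its peak grows with the file, while iterating over the text file keeps roughly one line plus a read buffer in memory.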

**Gribouillis** · Aug-22-2022, 09:36 AM

Pickle can store several objects by successive calls to dump. These objects can be retrieved one by one, which solves your memory issue.

        
from pathlib import Path
import pickle
 
this_dir = Path(__file__).parent
 
idata = ['some spam', '4 slices of ham', '12 eggs']
 
filename = this_dir / 'idata.pkl'
 
# pickle a sequence of objects one by one
with filename.open('wb') as ofh:
    pkl = pickle.Pickler(ofh)
    for x in idata:
        pkl.dump(x)
 
# unpickle a sequence of objects one by one
with filename.open('rb') as ifh:
    pkl = pickle.Unpickler(ifh)
    try:
        while True:
            x = pkl.load()
            print(x)
    except EOFError:
        pass

Output:
λ python paillasse/pf/pick.py
some spam
4 slices of ham
12 eggs
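
The same record-by-record idea can be wrapped in a generator, so that reading the pickle file looks like the enumerate(...) loop over a text file. This is only a sketch: `iter_pickled`, the temporary file and the "ham" search are hypothetical names and data. It uses the module-level `pickle.dump` and `pickle.load` on one open file, on the assumption that a fresh memo per record is preferable for very large files, since a long-lived Pickler keeps its memo between dump calls unless `clear_memo()` is called.

```python
import pickle
import tempfile
from pathlib import Path


def iter_pickled(path):
    """Yield the objects stored in a file by successive dump calls (hypothetical helper)."""
    with open(path, "rb") as fh:
        while True:
            try:
                yield pickle.load(fh)  # reads exactly one pickled object
            except EOFError:
                return  # no more objects in the file


# Made-up records, written one by one.
cards = ['some spam', '4 slices of ham', '12 eggs']
path = Path(tempfile.mkdtemp()) / "idata.pkl"
with path.open("wb") as fh:
    for card in cards:
        pickle.dump(card, fh)

# Scan the records lazily, keeping only the matches.
matches = [
    (idx, card)
    for idx, card in enumerate(iter_pickled(path))
    if "ham" in card
]
print(matches)  # [(1, '4 slices of ham')]
```

Only the matching records stay in memory after the scan; everything else is discarded as soon as the loop moves on.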

DPaul · Aug-22-2022, 10:54 AM

OK, thanks, I'll try this method to understand what it does!
thx,
Paul

DPaul · Aug-22-2022, 02:26 PM

Python is wonderful!

thx,
Paul
