Aug-22-2022, 09:23 AM
Hi,
After an OCR-session, I have very large files for people to search for data (prayer cards).
I "dump" them both in text format (.txt) and in binary, using pickle.
So far, so good.
Now I need to read the data:
With pickle it is pickle.load(file) and I get the whole thing back as a list; I can go through it record by record, and that is OK.
My traditional way of reading the .txt file is:

    with open('sourcefile', 'r') as source:
        for idx, line in enumerate(source):
            ... code ...

In both cases, after scanning through all the data, I just close the file and continue with the results.
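For concreteness, the two reading paths can be sketched side by side (the file names and record contents below are made up for the example, not from the actual OCR output):

```python
import pickle

records = ["card 001 | Jan Jansen | 1921", "card 002 | Piet Peters | 1935"]

# Dump the same data both ways, as in the OCR step.
with open("cards.txt", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(rec + "\n")
with open("cards.pkl", "wb") as f:
    pickle.dump(records, f)

# Path 1: iterate the text file; only one line is held in memory at a time.
with open("cards.txt", "r", encoding="utf-8") as source:
    txt_records = [line.rstrip("\n") for idx, line in enumerate(source)]

# Path 2: pickle.load() materialises the whole list in memory at once.
with open("cards.pkl", "rb") as source:
    pkl_records = pickle.load(source)

print(txt_records == pkl_records)  # True
```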
The pickle file is much smaller than my .txt file, but:
Question: is the pickle load(...) method more taxing on the computer's memory than the enumerate(...) method?
Even if I can empty the list after using it?
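On the memory side, the difference is that pickle.load() builds the entire list at once, while the enumerate loop only ever holds one line. If that ever becomes a problem, one alternative (not what the dump above does, just a sketch) is to pickle each record as its own frame and read them back one at a time:

```python
import pickle

records = [{"card": i, "name": f"name {i}"} for i in range(5)]

# Dump each record as a separate pickle frame instead of one big list.
with open("cards_stream.pkl", "wb") as f:
    for rec in records:
        pickle.dump(rec, f)

# Read them back one frame at a time; repeated pickle.load() calls
# consume successive frames and raise EOFError at end of file.
loaded = []
with open("cards_stream.pkl", "rb") as f:
    while True:
        try:
            loaded.append(pickle.load(f))
        except EOFError:
            break

print(len(loaded))  # 5
```

This keeps the pickle file's size advantage while giving the same record-at-a-time memory profile as the text-file loop.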
Any pros or cons?
thx,
Paul