Python Forum
pickle or txt
#1
Hi,
After an OCR session, I have very large files that people search for data (prayer cards).
I dump them both in text format (.txt) and in binary, using pickle.
So far, so good.
Now I need to read the data:
With pickle it is pickle.load(...), and I get the whole thing as a list that I can go through record by record; that is OK.
My traditional way of reading the .txt file is:
with open('sourcefile', 'r') as source:
    for idx, line in enumerate(source):
        ...  # code
In both cases, after scanning through all the data, I just close the file and continue with the results.
The pickle file is much smaller than my txt file, but:

Question: is the pickle load(...) method more taxing on the computer's memory than the enumerate(...) method,
even if I can empty the list after using it?
Any pros or cons?
thx,
Paul
It is more important to do the right thing, than to do the thing right.(P.Drucker)
Better is the enemy of good. (Montesquieu) = French version for 'kiss'.
#2
Pickle can store several objects by successive calls to dump. These objects can be retrieved one by one, which solves your memory issue.
from pathlib import Path
import pickle

this_dir = Path(__file__).parent

idata = ['some spam', '4 slices of ham', '12 eggs']

filename = this_dir / 'idata.pkl'

# pickle a sequence of objects one by one
with filename.open('wb') as ofh:
    pkl = pickle.Pickler(ofh)
    for x in idata:
        pkl.dump(x)

# unpickle a sequence of objects one by one
with filename.open('rb') as ifh:
    pkl = pickle.Unpickler(ifh)
    try:
        while True:
            x = pkl.load()
            print(x)
    except EOFError:
        pass
Output:
λ python paillasse/pf/pick.py
some spam
4 slices of ham
12 eggs
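For what it's worth, here is a minimal sketch of how the one-by-one idea could be wrapped in a generator and its memory use checked with tracemalloc. The names iter_pickled and peak_bytes, the made-up records and the temporary files are all assumptions for illustration, not part of the code above. It uses the module-level pickle.dump and pickle.load for each record instead of a shared Pickler, since a reused Pickler keeps a memo of the objects it has already written (see Pickler.clear_memo in the docs).

```python
import pickle
import tempfile
import tracemalloc
from pathlib import Path


def iter_pickled(path):
    """Yield objects one at a time from a file written by successive dump() calls."""
    with open(path, 'rb') as ifh:
        while True:
            try:
                yield pickle.load(ifh)
            except EOFError:
                return  # end of file: no more pickled objects


def peak_bytes(func):
    """Return the peak memory traced by tracemalloc while func() runs."""
    tracemalloc.start()
    func()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


# Made-up records standing in for the OCR'd prayer cards.
records = [f'card {i}: ' + 'x' * 200 for i in range(20000)]

tmp_dir = Path(tempfile.mkdtemp())
whole_file = tmp_dir / 'whole.pkl'
stream_file = tmp_dir / 'stream.pkl'

# One dump of the whole list versus one independent dump per record.
with whole_file.open('wb') as ofh:
    pickle.dump(records, ofh)
with stream_file.open('wb') as ofh:
    for rec in records:
        pickle.dump(rec, ofh)


def load_whole():
    # The full list has to exist in memory before anything can be scanned.
    with whole_file.open('rb') as ifh:
        return len(pickle.load(ifh))


def load_stream():
    # Only one record is alive at a time; each is dropped after counting.
    return sum(1 for _ in iter_pickled(stream_file))


peak_whole = peak_bytes(load_whole)
peak_stream = peak_bytes(load_stream)
print(f'whole list: {peak_whole:,} bytes, one by one: {peak_stream:,} bytes')
```

The actual numbers depend on the Python version and on the data, so treat this as a way to measure your own files rather than as a benchmark. Reading the .txt file line by line should behave much like the one-by-one case, since only the current line is held in memory.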
#3
Gribouillis wrote:
Pickle can store several objects by successive calls to dump. These objects can be retrieved one by one, which solves your memory issue.

OK, thanks, I'll try this method to understand what it does!
thx,
Paul
#4
Gribouillis wrote:
These objects can be retrieved one by one, which solves your memory issue.

Python is wonderful!
thx,
Paul