Python Forum
Memory Use in array
#1
Hi all.
I have a loop that populates a list via a DB query of keys (short text strings). The raw data size is 1,580,990,412 bytes, i.e. ~1.5 GB.

    import sys

    mydocsarray = []
    for eachrow in cbs.my_query(q):  # cbs is the DB client, q the query
        mydocsarray.append(eachrow)

    print(len(mydocsarray))
    print(sys.getsizeof(mydocsarray))
When this runs, Task Manager on my Windows PC shows the process memory blowing up to over 14 GB.

Printing the list's length and size shows the following:
49381034
424076264


Is this a coding issue or a PyCharm issue?
#2
Quote:Is this a coding issue or a PyCharm issue?

If you use sys.getsizeof on a list, you'll see only how much the list object itself consumes in memory.
Each element in the list is just a reference to an object that lives somewhere else in memory.
Each of those objects also has a size, but sys.getsizeof is not recursive: you get only the memory
consumption of the list (its header and its array of references), not of the objects those references point to.
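A minimal sketch of the difference, assuming a 64-bit CPython 3.x build (exact byte counts vary by version):

    import sys

    words = ["alpha", "bravo", "charlie"]

    # Size of the list object itself: its header plus one 8-byte
    # reference per element.
    print(sys.getsizeof(words))

    # The referenced string objects are counted separately and are
    # not included in the number above.
    print(sum(sys.getsizeof(w) for w in words))

The 424,076,264 bytes you printed is just that pointer array: 424,076,264 / 49,381,034 is roughly 8.6 bytes per element, i.e. one 8-byte pointer plus the over-allocation that append() does. The strings themselves sit on top of that.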
#3
(Jan-22-2020, 07:48 PM)DeaD_EyE Wrote: ...sys.getsizeof is not recursive: you get only the memory consumption of the list itself, not of the objects it references.

Thanks.
But if the base dataset is ~1.5 GB, how do I get the list's footprint somewhere close to that, rather than nearly 10x larger? :)
How can I avoid this overhead?
#4
Anyone with any ideas or suggestions?

Thanks
#5
For what it's worth: using tuples (a list of tuples) instead of a list of lists should decrease memory consumption by ~15%.
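A quick illustration with hypothetical row values (byte counts vary across CPython builds):

    import sys

    row_list  = ["2018-09-19", "2c6dba80", "metaid"]
    row_tuple = ("2018-09-19", "2c6dba80", "metaid")

    # The tuple reports a smaller size than the equivalent list:
    # tuples are fixed-size and carry no over-allocation or resize
    # machinery.
    print(sys.getsizeof(row_list))
    print(sys.getsizeof(row_tuple))

Note this only trims the per-container overhead; the strings inside cost the same either way.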
#6
(Jan-22-2020, 04:31 PM)fakka Wrote: Is this a coding issue or a PyCharm issue?
Is there any difference when you run it from the command line rather than from PyCharm?
What libraries do you use? What DB?
#7
There is the built-in tracemalloc module, which can be used to trace memory allocations.
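A minimal sketch of how it can be used; the list comprehension here is just a stand-in for the DB loop:

    import tracemalloc

    tracemalloc.start()

    data = [str(i) for i in range(1_000_000)]  # stand-in for the DB rows

    # Report memory currently allocated by Python, and the peak since start().
    current, peak = tracemalloc.get_traced_memory()
    print(f"current: {current / 2**20:.1f} MiB, peak: {peak / 2**20:.1f} MiB")
    tracemalloc.stop()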
#8
(Jan-23-2020, 05:14 PM)perfringo Wrote: There is the built-in tracemalloc module, which can be used to trace memory allocations.

Thanks, I will google it.

Is there a suggestion for doing this a different way? These are just raw text fields... I assume people hit this every day when pulling larger datasets (~1.5 GB) from a DB?
#9
I did confirm it's not PyCharm; the same thing occurs with regular Python on Linux.

I dumped some data into a text file (~50 million lines, per the wc output below):

$ wc -l < /tmp/datadump.txt
49568121

$ du -sh /tmp/datadump.txt
2.2G /tmp/datadump.txt


{"metaid": "2018-09-19T18:22:12.577::2c6dba80-31b0-48ab-b40e-560822c46321"}
{"metaid": "addressbook:us:q:0000000002:steviejobs4"}
{"metaid": "addressbook:us:q:0000000002:steviejobs5"}
{"metaid": "addressbook:us:q:0000000007:itcinf1"}
{"metaid": "addressbook:us:q:0000000022:jj"}
{"metaid": "addressbook:us:q:0000000123:test4"}
{"metaid": "addressbook:us:q:0000000183:snake"}
{"metaid": "addressbook:us:q:0000000200:godofthunder3"}
{"metaid": "addressbook:us:q:0000000200:load test2"}
{"metaid": "addressbook:us:q:0000000430:delivery"}

I then ran this ....

if __name__ == '__main__':
    myarray = []
    # read every line of the dump into a list, mirroring the DB loop
    with open("/tmp/datadump.txt") as f:
        for eachrow in f:
            myarray.append(eachrow)
I monitored memory use at the OS level in another session:

MemFree: 6231624 kB
MemFree: 6231164 kB
MemFree: 6197600 kB
MemFree: 5962440 kB
MemFree: 5623356 kB
MemFree: 5265816 kB
MemFree: 4894304 kB
MemFree: 4530768 kB
MemFree: 4200192 kB
MemFree: 3876088 kB
MemFree: 3544328 kB
MemFree: 3169744 kB
MemFree: 2871492 kB
MemFree: 2595768 kB
MemFree: 2276404 kB
MemFree: 1986584 kB
MemFree: 1705416 kB
MemFree: 1471016 kB


So free memory went from 6.2 GB down to 1.5 GB: about 4.7 GB consumed for a 2.2 GB dataset.
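For what it's worth, 4.7 GB over ~49.5 million lines is roughly 95 bytes per line, against roughly 45 raw bytes per line in the file. That gap matches CPython's per-string overhead, which the snippet below illustrates (numbers assume a 64-bit CPython 3.x build):

    import sys

    line = '{"metaid": "addressbook:us:q:0000000002:steviejobs4"}\n'

    print(len(line))            # raw bytes of text in the line
    print(sys.getsizeof(line))  # ~49 bytes more: per-string object overhead
    # ...plus an 8-byte pointer per element inside the list itself.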
#10
Any ideas here?

Not sure how to take this any further.

