Python Forum
Memory Use in array
#1
Hi all.
I have a loop that populates a list via a DB query of keys (short text strings). The raw data size is 1,580,990,412 bytes, i.e. ~1.5 GB.

    import sys

    mydocsarray = []
    for eachrow in cbs.my_query(q):  # cbs is the DB client, q the query
        mydocsarray.append(eachrow)

    print(len(mydocsarray))
    print(sys.getsizeof(mydocsarray))
When this runs, Task Manager on my Windows PC shows the process memory blowing up to over 14 GB.

Printing the list's length and size shows the following:
49381034
424076264


Is this a coding issue or a PyCharm issue?
#2
Quote:Is this a coding issue or a PyCharm issue?

If you use sys.getsizeof on a list, you'll see only how much the list object itself consumes in memory.
Each element in the list is just a reference to an object that lives somewhere else in memory.
Each of those objects also has a size, but sys.getsizeof is not recursive: you get only the memory
consumption of the list (its header and its array of references), not of the objects those references point to.
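A minimal sketch of the difference, assuming a 64-bit CPython 3.x build (exact byte counts vary by version):

    import sys

    words = ["alpha", "bravo", "charlie"]

    # Size of the list object itself: its header plus one 8-byte
    # reference per element.
    print(sys.getsizeof(words))

    # The referenced string objects are counted separately and are
    # not included in the number above.
    print(sum(sys.getsizeof(w) for w in words))

The 424,076,264 bytes you printed is just that pointer array: 424,076,264 / 49,381,034 is roughly 8.6 bytes per element, i.e. one 8-byte pointer plus the over-allocation that append() does. The strings themselves sit on top of that.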
#3
(Jan-22-2020, 07:48 PM)DeaD_EyE Wrote: ...sys.getsizeof is not recursive: you get only the memory consumption of the list itself, not of the objects it references.

Thanks.
But if the base dataset is ~1.5 GB, how do I get the list's footprint somewhere close to that, rather than nearly 10x larger? :)
How can I avoid this overhead?
#4
Anyone with any ideas or suggestions?

Thanks
#5
For what it's worth: using tuples (a list of tuples) instead of a list of lists should decrease memory consumption by ~15%.
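A quick illustration with hypothetical row values (byte counts vary across CPython builds):

    import sys

    row_list  = ["2018-09-19", "2c6dba80", "metaid"]
    row_tuple = ("2018-09-19", "2c6dba80", "metaid")

    # The tuple reports a smaller size than the equivalent list:
    # tuples are fixed-size and carry no over-allocation or resize
    # machinery.
    print(sys.getsizeof(row_list))
    print(sys.getsizeof(row_tuple))

Note this only trims the per-container overhead; the strings inside cost the same either way.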
#6
(Jan-22-2020, 04:31 PM)fakka Wrote: Is this a coding issue or a PyCharm issue?
Is there any difference when you run it from the command line rather than from PyCharm?
What libraries do you use? What DB?
#7
There is the built-in tracemalloc module, which can be used to trace memory allocations.
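A minimal sketch of how it can be used; the list comprehension here is just a stand-in for the DB loop:

    import tracemalloc

    tracemalloc.start()

    data = [str(i) for i in range(1_000_000)]  # stand-in for the DB rows

    # Report memory currently allocated by Python, and the peak since start().
    current, peak = tracemalloc.get_traced_memory()
    print(f"current: {current / 2**20:.1f} MiB, peak: {peak / 2**20:.1f} MiB")
    tracemalloc.stop()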
#8
(Jan-23-2020, 05:14 PM)perfringo Wrote: There is the built-in tracemalloc module, which can be used to trace memory allocations.

Thanks, I will google it.

Is there a suggestion for doing this a different way? These are just raw text fields... I assume people hit this every day when pulling larger datasets (~1.5 GB) from a DB?
#9
I did confirm it's not PyCharm; the same thing occurs with regular Python on Linux.

I dumped some data into a text file (~50 million lines, per the wc output below):

$ wc -l < /tmp/datadump.txt
49568121

$ du -sh /tmp/datadump.txt
2.2G /tmp/datadump.txt


{"metaid": "2018-09-19T18:22:12.577::2c6dba80-31b0-48ab-b40e-560822c46321"}
{"metaid": "addressbook:us:q:0000000002:steviejobs4"}
{"metaid": "addressbook:us:q:0000000002:steviejobs5"}
{"metaid": "addressbook:us:q:0000000007:itcinf1"}
{"metaid": "addressbook:us:q:0000000022:jj"}
{"metaid": "addressbook:us:q:0000000123:test4"}
{"metaid": "addressbook:us:q:0000000183:snake"}
{"metaid": "addressbook:us:q:0000000200:godofthunder3"}
{"metaid": "addressbook:us:q:0000000200:load test2"}
{"metaid": "addressbook:us:q:0000000430:delivery"}

I then ran this ....

if __name__ == '__main__':
    myarray = []
    # read every line of the dump into a list, mirroring the DB loop
    with open("/tmp/datadump.txt") as f:
        for eachrow in f:
            myarray.append(eachrow)
I monitored memory use at the OS level in another session:

MemFree: 6231624 kB
MemFree: 6231164 kB
MemFree: 6197600 kB
MemFree: 5962440 kB
MemFree: 5623356 kB
MemFree: 5265816 kB
MemFree: 4894304 kB
MemFree: 4530768 kB
MemFree: 4200192 kB
MemFree: 3876088 kB
MemFree: 3544328 kB
MemFree: 3169744 kB
MemFree: 2871492 kB
MemFree: 2595768 kB
MemFree: 2276404 kB
MemFree: 1986584 kB
MemFree: 1705416 kB
MemFree: 1471016 kB


So free memory went from 6.2 GB down to 1.5 GB: about 4.7 GB consumed for a 2.2 GB dataset.
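For what it's worth, 4.7 GB over ~49.5 million lines is roughly 95 bytes per line, against roughly 45 raw bytes per line in the file. That gap matches CPython's per-string overhead, which the snippet below illustrates (numbers assume a 64-bit CPython 3.x build):

    import sys

    line = '{"metaid": "addressbook:us:q:0000000002:steviejobs4"}\n'

    print(len(line))            # raw bytes of text in the line
    print(sys.getsizeof(line))  # ~49 bytes more: per-string object overhead
    # ...plus an 8-byte pointer per element inside the list itself.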
#10
Any ideas here?

Not sure how to take this any further.

