Python Forum
Is Python Suitable?




Is Python Suitable? - summerleas - Feb-28-2017

I don't know if Python is a suitable language for what I want to do. In summary:

Read a file of about 6 GB into an array (a list, I presume). The array/list has 6,000,000 entries, each about 1,000 bytes, containing a mix of text, integers and real numbers.
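A minimal sketch of the loading step. The real file layout isn't given, so this assumes (hypothetically) one entry per line, with three tab-separated fields: text, integer, real number. `parse_record` and `load_records` are illustrative names, not anything from a library.

```python
from typing import List, Tuple

# Hypothetical record layout: name (text), count (integer), value (real number)
Record = Tuple[str, int, float]


def parse_record(line: str) -> Record:
    """Split one tab-separated line into typed fields (assumed format)."""
    name, count, value = line.rstrip("\n").split("\t")
    return name, int(count), float(value)


def load_records(path: str) -> List[Record]:
    """Read the whole file into a Python list, one tuple per entry."""
    with open(path, encoding="utf-8") as f:
        # Iterating over the file object reads it line by line, so the raw
        # text is never held in memory as one big string next to the list.
        return [parse_record(line) for line in f]
```

Keep in mind that the parsed list will take more RAM than the 6 GB on disk, because every Python str, int and float object carries some per-object overhead.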

Process the data. Loading it all into RAM is very desirable, because effectively random subsets of the data must be processed up to about 1,000 times. In any case, the subsets are very large.
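Once everything is in RAM, the repeated random subsets can be drawn by sampling indices rather than copying records. This is a sketch under two assumptions: a hypothetical (text, integer, real) tuple layout, and "processing" reduced to taking the mean of the real-number field. `process_random_subsets` is an illustrative name.

```python
import random
from statistics import fmean
from typing import List, Sequence, Tuple

# Hypothetical record layout: name (text), count (integer), value (real number)
Record = Tuple[str, int, float]


def process_random_subsets(records: Sequence[Record], subset_size: int,
                           n_rounds: int, seed: int = 0) -> List[float]:
    """Draw n_rounds random subsets (without replacement) and return the
    mean of the real-number field for each subset."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    results = []
    for _ in range(n_rounds):
        # sample() picks distinct indices; the records stay where they are
        idx = rng.sample(range(len(records)), subset_size)
        results.append(fmean(records[i][2] for i in idx))
    return results
```

Sampling from `range(len(records))` avoids building a second large list just to choose from.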

In my youth, about 50 years ago, I used Fortran, Algol and Pascal. None of these would have any major problems with this, but I was curious to learn something of Python, to see whether a claim made in a tutorial, that it can be used for anything, is correct.

Comments?


RE: Is Python Suitable? - Larz60+ - Feb-28-2017

My experience says try it.

http://stackoverflow.com/questions/855191/how-big-can-a-python-array-get
(In 64-bit Python.)
It looks like the limit is on the number of list entries, not the size of each entry, so I think you'll be OK.
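A small sketch, using only the standard `sys` module, for checking both points on your own machine: whether the interpreter is 64-bit, and how much memory the list object itself needs. The helper names are hypothetical. Note that `sys.getsizeof` here measures only the list's own array of references (one pointer per entry on CPython); the records it points to come on top of that.

```python
import sys


def is_64bit_python() -> bool:
    """sys.maxsize is 2**63 - 1 on 64-bit builds and 2**31 - 1 on 32-bit ones."""
    return sys.maxsize > 2**32


def list_pointer_bytes(n_entries: int) -> int:
    """Size in bytes of the list object itself (its references), not the
    objects it refers to; measured on a list of None placeholders."""
    return sys.getsizeof([None] * n_entries)
```

On 64-bit CPython, `list_pointer_bytes(6_000_000)` comes to roughly 48 MB, and 6,000,000 is far below `sys.maxsize`, which is an upper bound on a list's length.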


RE: Is Python Suitable? - Ofnuts - Feb-28-2017

It depends on what you do with each entry. Do you need to have everything in RAM at the same time, or are you processing things sequentially? In some cases you can work in two passes: a first one that extracts only the global information from each line (potentially using much less memory), and a second one that processes the entries sequentially.
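A rough sketch of the two-pass idea. It assumes a hypothetical tab-separated file with a real number in the third field, and uses min/max normalization as a stand-in for whatever "global info" is actually needed. Only the running minimum and maximum are kept in memory; the second pass re-reads the file and yields one result at a time.

```python
from typing import Iterator


def iter_values(path: str) -> Iterator[float]:
    """Stream the (assumed) third tab-separated field, one line at a time."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield float(line.rstrip("\n").split("\t")[2])


def two_pass_normalize(path: str) -> Iterator[float]:
    # Pass 1: collect only global information (min and max), O(1) memory
    lo, hi = float("inf"), float("-inf")
    for v in iter_values(path):
        lo = min(lo, v)
        hi = max(hi, v)
    span = (hi - lo) or 1.0  # avoid division by zero if all values are equal

    # Pass 2: re-read the file and process each entry using the global info
    for v in iter_values(path):
        yield (v - lo) / span
```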


RE: Is Python Suitable? - merlem - Feb-28-2017

I would also say, yes, it's surely worth a try.
Also, you should consider using NumPy from the start. There are some tutorials online, but I can't judge their quality. Those who already work with NumPy can probably say more about that.
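To illustrate the NumPy suggestion, here is a sketch using a structured array, which stores mixed text/integer/real records as one contiguous block of fixed-size rows instead of millions of separate Python objects. The field names and widths are hypothetical; the real layout would have to match the actual ~1,000-byte entries. It requires NumPy 1.17 or later for `default_rng`.

```python
import numpy as np

# Hypothetical record layout: 20-byte text, 64-bit integer, 64-bit float.
# "S20" stores raw bytes (1 byte per char); "U20" would use 4 bytes per char.
record_dtype = np.dtype([("name", "S20"), ("count", "i8"), ("value", "f8")])


def make_table(n_entries: int) -> np.ndarray:
    """Preallocate one contiguous block of fixed-size records."""
    return np.zeros(n_entries, dtype=record_dtype)


def subset_mean(table: np.ndarray, subset_size: int,
                rng: np.random.Generator) -> float:
    """Mean of the 'value' column over a random subset (no replacement)."""
    idx = rng.choice(len(table), size=subset_size, replace=False)
    return float(table["value"][idx].mean())
```

With this layout each record takes 36 bytes, so 6,000,000 entries would occupy about 216 MB; with fixed-width fields adding up to about 1,000 bytes, the table would be roughly the size of the file, with very little per-entry overhead.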

How much experience do you have with Python already? Are you familiar with importing modules and so on?