Python Forum
A small data sorting program - couple of general and hopefully easy questions




A small data sorting program - couple of general and hopefully easy questions - Ansifatcat - Jan-17-2018

Hi,

First post, don't know if there's somewhere for an intro but they always seem somewhat trite on internet forums (fora?) anyway. Hi anyway.

I have a small project that I'm undertaking. I already wrote this many years ago using MS Access and Visual Basic within MS Access, so it was effectively a self-contained MS Access application. It's not by any means complex, and I'm sure the methodology used before would work again, but a couple of pointers in respect of Python would be useful.

Briefly, what the program does is look for a postcode (zip code) in a CSV address file. It then uses the postcode to look up a routing code, which is appended to that record, and thereafter the file is sorted by the routing code.
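To make the steps above concrete, here is a minimal sketch using only the Python standard library. It is an illustration under assumptions, not the original Access logic: the postcode pattern assumes UK-style postcodes, the `ROUTING_CODES` table keyed by the first half of the postcode is hypothetical, and the sample records and the `ZZZZ` placeholder for unmatched records are made up.

```python
import csv
import io
import re

# Assumption: UK-style postcodes such as "SW1A 2AA"; adjust the pattern for other formats.
POSTCODE_RE = re.compile(r"^([A-Z]{1,2}[0-9][A-Z0-9]?) ?([0-9][A-Z]{2})$")

# Hypothetical lookup table keyed by the outward (first) half of the postcode.
ROUTING_CODES = {"SW1A": "R002", "M1": "R010", "EH1": "R001"}


def find_postcode(row):
    """Return (outward, inward) for the last field that looks like a postcode, else None."""
    for field in reversed(row):
        match = POSTCODE_RE.match(field.strip().upper())
        if match:
            return match.group(1), match.group(2)
    return None


def add_routing_codes(rows, unknown="ZZZZ"):
    """Append a routing code to each record; unmatched records get a placeholder."""
    routed = []
    for row in rows:
        postcode = find_postcode(row)
        code = ROUTING_CODES.get(postcode[0], unknown) if postcode else unknown
        routed.append(row + [code])
    return routed


# Made-up sample data standing in for the real address file.
sample = io.StringIO(
    "J Smith,1 High St,Manchester,M1 1AE\n"
    "A Jones,10 Downing St,London,SW1A 2AA,UK\n"
    "B Brown,5 Royal Mile,Edinburgh,EH1 1RE\n"
    "C Green,no postcode here,Leeds\n"
)
records = add_routing_codes(csv.reader(sample))
records.sort(key=lambda row: row[-1])  # sort by the appended routing code
for row in records:
    print(row)
```

Scanning the fields from the end reflects the guess that the postcode usually sits near the end of an address line; records with no recognisable postcode sort last because of the placeholder.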

The method I used previously was to make extensive use of SQL within MS Access, using Visual Basic for manipulation and repetitive tasks, like cycling through fields to find something resembling a postcode. I have MariaDB and a working Python connection to it, but am not too familiar with Python.

Within Access I imported the file to be sorted into a new table, though that doesn't seem too viable a proposition within MariaDB since the table needs to be defined in advance. Are there any more suitable strategies I could use with a MariaDB and Python combination? Does this sound like a viable methodology, or would there be a means of doing this solely within Python (data files can be very big, 40,000 records etc.)? Any thoughts are welcome before I expend too much time teaching myself Python/MariaDB.

Many thanks.

Rob


RE: A small data sorting program - couple of general and hopefully easy questions - Gribouillis - Jan-17-2018

If there are only 40,000 records, I'm not sure it can be called a big file. Did you try loading the whole file in memory to see if Python can manipulate the data as a whole? Also note that you may need to load only a few columns for the main work of sorting the lines.
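A rough way to try this out with the standard library alone is sketched below. The data is synthetic, and the column names (`name`, `address`, `routing_code`) are placeholders rather than the layout of the real file.

```python
import csv
import io
import random

# Build a synthetic 40,000-record CSV in memory (a stand-in for the real address file).
random.seed(0)
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "address", "routing_code"])  # hypothetical column names
for i in range(40_000):
    writer.writerow([f"Name {i}", f"{i} Some Street", f"R{random.randint(0, 999):03d}"])
buffer.seek(0)

# Load every record at once; each row becomes a dict keyed by column name.
rows = list(csv.DictReader(buffer))

# The sort itself only looks at the key column, whatever else each row carries.
rows.sort(key=lambda row: row["routing_code"])

print(len(rows), rows[0]["routing_code"], rows[-1]["routing_code"])
```

With a real file, `open(path, newline="")` would replace the in-memory buffer, and `csv.DictWriter` could write the sorted rows back out.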

Nowadays, everybody seems to be using the pandas library to handle tabular data. I don't know this library, but it is probably the first thing you could check: try to load your data with pandas.

There is also a classic and very mature library named PyTables that can manage the storage of very large amounts of data. It can also be a much more comfortable alternative to using an SQL database. It may have nice sorting capabilities too.


RE: A small data sorting program - couple of general and hopefully easy questions - Ansifatcat - Jan-25-2018

Much obliged. Pandas does indeed look to be a potential candidate. I did take a look at PyTables, but for some reason that escapes me for now, I had some difficulty getting my head round it. Pandas looks easier for a not very current programmer like me.

Thanks again