Python Forum

Full Version: A small data sorting program - couple of general and hopefully easy questions

First post. I don't know if there's somewhere for an intro, but they always seem somewhat trite on internet forums (fora?) anyway. Hi, in any case.

I have a small project that I'm undertaking. I wrote this many years ago using MS Access and Visual Basic within MS Access, so it was effectively a self-contained MS Access application. It's not by any means complex, and I'm sure the methodology used before would work again, but a couple of pointers in respect of Python would be useful.

Briefly, what the program does is look for a postcode (zip code) in a CSV address file. It then uses the postcode to look up a routing code, which is appended to that record, and thereafter the file is sorted by the routing code.

The method I used previously was to make extensive use of SQL within MS Access, with Visual Basic for manipulation and repetitive tasks, like cycling through fields to find something resembling a postcode. I have MariaDB and a working Python connection to it, but I am not too familiar with Python.

Within Access I imported the file to be sorted into a new table, though that doesn't seem too viable a proposition within MariaDB, since the table needs to be defined in advance. Are there any more suitable strategies I could use with a MariaDB and Python combination? Does this sound like a viable methodology, or would there be a means of doing this solely within Python (data files can be very big, 40,000 records etc.)? Any thoughts are welcome before I expend too much time teaching myself Python/MariaDB.

Many thanks.

If there are only 40,000 records, I'm not sure it can be called a big file. Did you try loading the whole file into memory to see if Python can manipulate the data as a whole? Also note that you may need to load only a few columns for the main work of sorting the lines.
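
For illustration, here is a minimal sketch of the in-memory approach using only the standard library: read every row, cycle through the fields looking for something resembling a postcode, look up a routing code, append it, and sort. Everything specific in it is an assumption and not taken from the thread: the loose UK-style postcode pattern, the idea that the routing code is keyed on the outward part of the postcode, the ROUTING table contents, and the "ZZZ" placeholder for unmatched rows.

```python
import csv
import re

# Loose UK-style postcode pattern (an assumption; tighten it for real data).
POSTCODE_RE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}$")

# Hypothetical lookup table: outward code -> routing code.
ROUTING = {"SW1A": "R02", "M1": "R07", "EH1": "R01"}


def find_postcode(row):
    """Return the first field that looks like a postcode, normalised, or None."""
    for field in row:
        candidate = field.strip().upper()
        if POSTCODE_RE.match(candidate):
            # Normalise to a single space before the final three characters.
            compact = "".join(candidate.split())
            return compact[:-3] + " " + compact[-3:]
    return None


def route_and_sort(lines, routing=ROUTING, unknown="ZZZ"):
    """Append a routing code to each CSV row and sort the rows by that code."""
    routed = []
    for row in csv.reader(lines):
        postcode = find_postcode(row)
        outward = postcode.split()[0] if postcode else None
        # Rows without a recognisable postcode get the placeholder code.
        routed.append(row + [routing.get(outward, unknown)])
    # list.sort is stable, so rows sharing a code keep their file order.
    routed.sort(key=lambda r: r[-1])
    return routed
```

With a real file, something like `with open("addresses.csv", newline="") as f: rows = route_and_sort(f)` would do (the filename is hypothetical), and `csv.writer` can write the result back out. A list of 40,000 short rows fits comfortably in memory on any current machine.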

Nowadays, everybody seems to be using the pandas library to handle tabular data. I don't know this library, but it is probably the first thing you could check: try to load your data with pandas.
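
As a rough idea of what that could look like, here is a small pandas sketch of the lookup-and-sort step. It assumes the postcode already sits in a known column and that the routing table is keyed on the outward part of the postcode. The column names and sample data are hypothetical, and the inline strings stand in for real files.

```python
import io

import pandas as pd

# Hypothetical sample data; in practice pass file paths to pd.read_csv.
ADDRESSES_CSV = """name,street,town,postcode
Alice Smith,10 Downing Street,London,SW1A 2AA
Bob Jones,1 Piccadilly,Manchester,M1 1AE
Carol White,5 High Street,Edinburgh,EH1 1YZ
"""
ROUTING_CSV = """outward,routing_code
SW1A,R02
M1,R07
EH1,R01
"""

# dtype=str keeps every column as text, so codes are never parsed as numbers.
addresses = pd.read_csv(io.StringIO(ADDRESSES_CSV), dtype=str)
routing = pd.read_csv(io.StringIO(ROUTING_CSV), dtype=str)

# Outward code = the part of the postcode before the space (an assumption).
addresses["outward"] = addresses["postcode"].str.split().str[0]

# A left join keeps every address, even those with no routing match.
merged = addresses.merge(routing, on="outward", how="left")

# Sort by routing code; unmatched rows (NaN) go to the end by default.
result = merged.sort_values("routing_code", kind="stable").drop(columns="outward")
```

Calling `result.to_csv("sorted.csv", index=False)` (hypothetical filename) would write the sorted file back out. The merge plays the role of the SQL join used in the Access version.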

There is also a classic and very mature library named PyTables that can manage the storage of very large amounts of data. It can also be a much more comfortable alternative to using an SQL database. It may have nice sorting capabilities too.

Much obliged, Pandas does indeed look to be a potential candidate. For some reason that escapes me for now, I did take a look at PyTables, but I had some difficulty getting my head round it. Pandas looks easier for a not very current programmer like me.

Thanks again