Python Forum
A small data sorting program - couple of general and hopefully easy questions
#1
Hi,

First post. I don't know if there's somewhere for introductions, but they always seem somewhat trite on internet forums (fora?). Hi anyway.

I have a small project that I'm undertaking. I wrote this many years ago using MS Access, with Visual Basic inside Access, so it was effectively a self-contained MS Access application. It's not complex by any means, and I'm sure the methodology I used before would work again, but a couple of pointers in respect of Python would be useful.

Briefly, the program looks for a postcode (zip code) in a CSV address file. It then uses the postcode to look up a routing code, which is appended to that record, and the file is then sorted by routing code.
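A minimal sketch of that workflow in plain Python (standard library only), assuming the address file has a header row with a column named `postcode` and that the postcode-to-routing-code table fits in a dictionary. The function name `add_routing_and_sort`, the `routing` column name and the `UNKNOWN` placeholder are all hypothetical:

```python
import csv
import io

def add_routing_and_sort(src, dst, lookup, postcode_field="postcode",
                         routing_field="routing", missing="UNKNOWN"):
    """Read records from src, append a routing code looked up from the
    postcode, and write them to dst sorted by that code (hypothetical helper)."""
    reader = csv.DictReader(src)
    rows = []
    for row in reader:
        # Normalise the postcode before looking it up
        postcode = (row.get(postcode_field) or "").strip().upper()
        row[routing_field] = lookup.get(postcode, missing)
        rows.append(row)
    # list.sort is stable: records with the same routing code keep file order
    rows.sort(key=lambda r: r[routing_field])
    fieldnames = list(reader.fieldnames or []) + [routing_field]
    writer = csv.DictWriter(dst, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)

# Hypothetical postcode -> routing code table and sample data
routing_table = {"AB1 2CD": "R01", "CD2 4XY": "R02"}
sample = io.StringIO("name,postcode\nAlice,CD2 4XY\nBob,ab1 2cd\nCarol,ZZ9 9ZZ\n")
out = io.StringIO()
add_routing_and_sort(sample, out, routing_table)
print(out.getvalue())
```

For real files, the same function can be given `open("addresses.csv", newline="")` and `open("sorted.csv", "w", newline="")` (file names are placeholders).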

The method I used previously made extensive use of SQL within MS Access, with Visual Basic for manipulation and repetitive tasks, like cycling through fields to find something resembling a postcode. I have MariaDB and a working Python connection to it, but I'm not very familiar with Python.
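Cycling through fields to find something resembling a postcode maps naturally onto a regular expression. This sketch assumes UK-style postcodes and uses a deliberately simplified pattern (it does not cover every special case of the official format); `find_postcode` is a hypothetical helper name:

```python
import re

# Simplified UK-style postcode pattern (an assumption, not the full official spec):
# 1-2 letters, a digit, an optional letter/digit, optional space, digit, 2 letters
POSTCODE_RE = re.compile(r"\b[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}\b")

def find_postcode(fields):
    """Return the first postcode-like text found in fields, normalised to
    upper case with one space before the last three characters, or None."""
    for field in fields:
        match = POSTCODE_RE.search(field.upper())
        if match:
            compact = match.group().replace(" ", "")
            return compact[:-3] + " " + compact[-3:]
    return None

print(find_postcode(["12 High Street", "London", "sw1a 1aa"]))
```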

Within Access I imported the file to be sorted into a new table, though that doesn't seem a very viable proposition in MariaDB, since the table needs to be defined in advance. Are there more suitable strategies I could use with a MariaDB and Python combination? Does this sound like a viable methodology, or is there a way of doing this solely within Python (data files can be very big, 40,000 records etc.)? Any thoughts are welcome before I spend too much time teaching myself Python/MariaDB.
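One possible way around defining the table in advance is to generate the `CREATE TABLE` statement from the CSV header at run time, then do the lookup and sort as a SQL join, much as in Access. This sketch uses Python's built-in `sqlite3` as a stand-in so it runs without a server; with a MariaDB driver the same DB-API calls should apply, but check the driver's `paramstyle` (MariaDB Connector/Python accepts `?`, some other drivers expect `%s`). Every column is created as `VARCHAR(255)` for simplicity, the `routing` table layout is hypothetical, and names taken from the file are not escaped, so only use trusted input:

```python
import csv
import io
import sqlite3

def load_csv_into_table(conn, table, src, placeholder="?"):
    """Create `table` from the CSV header in src and insert every data row."""
    reader = csv.reader(src)
    header = next(reader)
    # Backtick-quoted identifiers work in both MariaDB and SQLite
    columns = ", ".join(f"`{name}` VARCHAR(255)" for name in header)
    cur = conn.cursor()
    cur.execute(f"CREATE TABLE `{table}` ({columns})")
    marks = ", ".join([placeholder] * len(header))
    cur.executemany(f"INSERT INTO `{table}` VALUES ({marks})", list(reader))
    conn.commit()
    return header

def rows_by_routing(conn):
    """Join addresses to a hypothetical routing table and sort by routing code."""
    cur = conn.cursor()
    cur.execute(
        "SELECT a.name, a.postcode, r.code "
        "FROM addresses AS a LEFT JOIN routing AS r ON r.postcode = a.postcode "
        "ORDER BY r.code"
    )
    return cur.fetchall()

# sqlite3 stands in for MariaDB so the example runs without a server
conn = sqlite3.connect(":memory:")
load_csv_into_table(conn, "addresses",
                    io.StringIO("name,postcode\nAlice,CD2 4XY\nBob,AB1 2CD\n"))
load_csv_into_table(conn, "routing",
                    io.StringIO("postcode,code\nAB1 2CD,R01\nCD2 4XY,R02\n"))
print(rows_by_routing(conn))
```

For bulk loading directly into MariaDB, its `LOAD DATA LOCAL INFILE` statement is another option once the table exists.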

Many thanks.

Rob
#2
If there are only 40000 records, I'm not sure it can be called a big file. Did you try loading the whole file in memory to see if python can manipulate the data as a whole? Also note that you may need to load only a few columns for the main work of sorting the lines.
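To illustrate that 40,000 records is comfortably an in-memory job, here is a sketch that builds a synthetic 40,000-row CSV (made-up data, not the poster's file), reads the whole thing into memory keeping only the columns needed for sorting, and sorts it. `load_columns` is a hypothetical helper:

```python
import csv
import io

def load_columns(src, wanted):
    """Read an entire CSV into memory, keeping only the named columns."""
    return [{name: row[name] for name in wanted} for row in csv.DictReader(src)]

# Synthetic stand-in for a 40,000-record address file (made-up data)
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["id", "name", "street", "postcode"])
for i in range(40_000):
    writer.writerow([i, f"Customer {i}", f"{i} Example Road", f"AB{i % 50} {i % 10}CD"])
buf.seek(0)

# Keep just the key columns, then sort by postcode
rows = load_columns(buf, ["id", "postcode"])
rows.sort(key=lambda r: r["postcode"])
print(len(rows), rows[0])
```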

Nowadays, everybody seems to be using the pandas library to handle tabular data. I don't know this library, but it is probably the first thing you could check: try to load your data with pandas.
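A minimal pandas sketch of the same task, assuming the routing table is available as its own CSV with `postcode` and `routing` columns (both the layout and the sample data here are hypothetical). `usecols` loads only the needed columns, `merge(how="left")` plays the role of the SQL join, and `sort_values` does the final sort:

```python
import io

import pandas as pd

addresses_csv = "name,postcode,town\nAlice,CD2 4XY,Leeds\nBob,AB1 2CD,York\nCarol,ZZ9 9ZZ,Hull\n"
routing_csv = "postcode,routing\nAB1 2CD,R01\nCD2 4XY,R02\n"

# dtype=str keeps postcodes and codes as text; usecols skips unneeded columns
addresses = pd.read_csv(io.StringIO(addresses_csv), dtype=str, usecols=["name", "postcode"])
routing = pd.read_csv(io.StringIO(routing_csv), dtype=str)

# Left join keeps every address; unmatched postcodes get NaN as routing code
merged = addresses.merge(routing, on="postcode", how="left")
merged = merged.sort_values("routing", na_position="last", kind="stable")
print(merged)
```

Writing the result back out would then be `merged.to_csv("sorted.csv", index=False)` (file name is a placeholder).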

There is also a classic and very mature library named PyTables that can manage the storage of very large amounts of data. It can be a much more comfortable alternative to an SQL database, and it may have useful sorting capabilities too.
#3
Much obliged. Pandas does indeed look like a potential candidate. For some reason that escapes me for now, I did take a look at PyTables, but I had some difficulty getting my head round it. Pandas looks easier for a not very current programmer like me.

Thanks again