Python Forum
database decision....
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
database decision....
#1
Hello,
I have MariaDB up and running (see previous post), it's Miraculous!
Now, I have a decision to make on how to proceed. I'm afraid it's a lenghty intro for a simple question.
I am still swamped with genealogical legacy 'databases' (dbase, excel various versions, etc.)
Here is the problem: say I get 50 excel files , each with "birth" data in them, from various villages.
They were all created when the animals still could talk, they all have different fields, in different columns, often various formats.

I can take one of 2 roads:
A) Make an inventory of all the different fields that have been created in all 50 "databases", and make a table in MariaDb with those fields.
Of course some are common like name of the newborn, name of the father, but that's about it.
First names are sometimes in 1 field, sometimes in many. Dates as well.
Sometimes there is a mother, sometimes there is a remark, sometimes there is a hyperlink to a scanned page in an old register. etc...etc ...etc;
Then I have to write a mapping program for each case, in order to get the SQL INSERT right. A tremendous amount of work.
Anyway, I will have a massive number of redundant fields, and if tomorrow they give me another excel, I might have to add some extra fields.

B) Alternative :
I can make only one (text) field in Mariadb. And concatenate every excel record as one string, with a separator between the fields: eg.
Name:Jones#firstnames:John,Keith#Father=Eric#Village:London....etc In the order the columns are presented in each excel.
This way I have a searchable field using the SQL SELECT... FROM ... LIKE "%xxx%". Bingo.
No redundant fields, Possibility of multi-criteria search using sets and intersections....

Furthermore, we want Marriages, Deaths, etc.. in the same table , all come with their own fields, more redundancy...
So for a given name + criteria you could find all events across villages.

A or B or something else. Where to go?
Thanks,
Paul
It is more important to do the right thing, than to do the thing right.(P.Drucker)
Better is the enemy of good. (Montesquieu) = French version for 'kiss'.
Reply
#2
I may have been unclear in the previous post, so let me rephrase, as this is a big decision:
There are paper documents, zillions of them, that contain info such as names, dates and places.

a) Traditionally some volunteers read these, and type the contents into, excel, dbase ...etc
From there, these columns could go into eg. MariaDB. But this whole process takes an inordinate amount of time.

b) We can scan these documents, creating little jpg or pdf files.
I can use python to read these and write the result into a zillion little txt files or into one large txt file.
I can look for a given pattern eg. "Jones" in the large txt file, by reading the whole thing line by line.
That works. Gets more cumbersome if you are looking for 3 criteria simultaneously.

c) I could put each little scanned text into MariaDB as a string into one record. Now i have much better and faster select/wildcard possibilities
in a database environment.

Basically I propose to use MariaDB als a text repository, using the SQL select features as a bonus. only 2 fields (source_filename and text)
Using python and tkinter, you search, select results and see the original doc on a canvas. Lightning fast.

The question is: could there be an even better way to organise this? Some python text construction that doesn't need a database...?
Paul
It is more important to do the right thing, than to do the thing right.(P.Drucker)
Better is the enemy of good. (Montesquieu) = French version for 'kiss'.
Reply


Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020