Python Forum
database versus flat files
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
database versus flat files
#1
Hi,
I'm currently involved in transforming large legacy "databases" (excel, access, csv...)
into something organised and searchable by the public. We are talking millions of genealogy-type records.
The first reaction is to put it all in a few sqlite3 databases by type (Marriages, baptisms...etc)
Unfortunately the database creation , even on a fast SSD is very, very slow.
A few 100.000 records/hour, when not even indexed.
The alternative is to leave everything in smaller csv files, eg. by village. Speedwise it is lightning fast, compared.

So he question is basically, does one have to put up with slow database creation,
or is there something else to try ?
Paul
It is more important to do the right thing, than to do the thing right.(P.Drucker)
Better is the enemy of good. (Montesquieu) = French version for 'kiss'.
Reply
#2
(Apr-19-2022, 07:39 AM)DPaul Wrote: I'm currently involved in transforming large legacy "databases" (excel, access, csv...)
into something organised and searchable by the public.

Searchable by public means, that there is a web server with a database behind?
You should avoid the use of SQLite for this use case.

Try MariaDB or if you need more features, then try PostgreSQL.
Both are relational databases.
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
Reply
#3
DeaD_EyE, thanks for the advice.
MariaDB ?
Will try and let you know.
Paul
It is more important to do the right thing, than to do the thing right.(P.Drucker)
Better is the enemy of good. (Montesquieu) = French version for 'kiss'.
Reply
#4
I tried MariaDb and PostgreSQL.
PostgreSQL is much too feature-rich for my needs, although you can connect easily to the db.

MariaDB seems a lot more convenient, but connection to the DB via python seems to be a pain in the ...
The internet is full of advice of how to connect (try this, do that, pip install this...) but I cannot get it to connect.
Mariadb is running as a service, something is preventing access. (Cannot connect to Localhost or 127.0.0.1 etc...
Unless somebody has a magical potion, it's a lot of hassle. Huh
Paul
It is more important to do the right thing, than to do the thing right.(P.Drucker)
Better is the enemy of good. (Montesquieu) = French version for 'kiss'.
Reply
#5
Mariadb neophyte here, but why not use this pypi package https://pypi.org/project/mariadb/ ? The installation looks straightforward...
Reply
#6
Hi Gribouillis.

pip install MariaDB
was the first and obvious thing I did.
And the recommended codelines to make it work. To no avail.

It is striking how many problems are posted, connecting to MariaDB.
It's a bit like python and MSaccess.

thx,
Paul
It is more important to do the right thing, than to do the thing right.(P.Drucker)
Better is the enemy of good. (Montesquieu) = French version for 'kiss'.
Reply
#7
Can you connect the database from outside Python, for example from a terminal?
Reply
#8
Smile 
Yes I can, also via the sidekick SQL viewer package you get with the installation.
They call it "Heidi". Smile
I just uninstalled and reinstalled MariaDB for the 3rd time and tried the simplest possible connection for the 10th time.
Suddenly it worked.... Why is anybody's guess.

import mariadb
connection= mariadb.connect(user="root", host="localhost", password="123")
print(connection.character_set)
print(connection.database)

Now I need to test how fast 550.000 records can be inserted.
Thx,
Paul
It is more important to do the right thing, than to do the thing right.(P.Drucker)
Better is the enemy of good. (Montesquieu) = French version for 'kiss'.
Reply
#9
I sent you some magnetic fluid through this thread. The most temperemental software surrender to it.
Reply
#10
@ Gribouiilis: keep that Magnetic fluid coming !
I just tested the insert speed, and it is quite a change from SQLite3.

- 8 VARCHAR fields, no index, 550.000 records "committed" into a table
- SSD disk
Average speed = 10.000 records / 12 seconds, almost 10 minutes total.
I need to check but Heidi (!) tells me all the records are there, and shows them.
thx,
Paul
It is more important to do the right thing, than to do the thing right.(P.Drucker)
Better is the enemy of good. (Montesquieu) = French version for 'kiss'.
Reply


Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020