Python Forum
from-to search
#11
I would recommend having a look at the HDF5 format, using the h5py library and its useful viewer.

You can record, read, modify, etc. arrays, pictures (which are nothing more than arrays), and so on.

Paul
Reply
#12
Ah well, it's the original "bracketing" problem that I don't understand.

What's the problem?

I believe databases can store images. (I never had the need.)
Reply
#13
Hi Guys, I appreciate your help in trying to find a solution.
I have now at least 5 viable solutions, of which I implemented a few for test purposes.
They work beautifully for 100 records.

The thing is that they have given me 150,000 scanned images, for starters.
Any solution needs to take speed into account.
It takes ages to load these kinds of volumes into an SQL database,
and if you want an index you can wait until the cows come home.

I have even attempted to put the info text into the metadata of the scan (TIFF file), using pyexif and the like.
There is enough space in the EXIF comment section, but it looks a bit tentative.

I'll test until I find the fastest solution, whatever that may be.
Paul
It is more important to do the right thing, than to do the thing right.(P.Drucker)
Better is the enemy of good. (Montesquieu) = French version for 'kiss'.
Reply
#14
MySQL is very fast. It is built to handle very large amounts of data. If MySQL is slow, I think you are doing something wrong.

I don't have any very large tables, but this table has 86 columns and 826 rows, containing students' answers to online classwork.
The number of rows increases each week, as the students send in new answers.

I connect to my cloud server using phpMyAdmin.

This query executes in 0.0008 seconds.

Quote:SELECT * FROM allstudentsAnswers20BECW WHERE id = 826

Showing rows 0 - 0 (1 total, Query took 0.0008 seconds.)

I have a little webpage to display the correct answers. That fetches the question number, correct answers and the student's answers, creates an html table with three columns: question number, correct answer, student answer, runs a little javascript to mark wrong answers red. That also executes almost instantaneously.

If I had a million records, I imagine it might take a little longer, but not appreciably.

Quote:"It takes ages to update these kinds of volumes into an sql database,"

You said you have the data as Excel. Save it as csv and import it. If there is existing data, tick the box "Update data when duplicate keys found on import (add ON DUPLICATE KEY UPDATE)". The first time, of course, there is no existing data.

After that, you can use pymysql to add new data as it arrives. From a csv of new data to be added, use Python to create INSERT or UPDATE queries for each new row.
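
A minimal sketch of that last step, assuming a csv with a header row whose column names match the table's columns. The table name `scans`, the columns `id` and `description`, and the helper names `build_upsert` and `read_rows` are all made up for illustration. The pymysql part is left as comments because it needs a live server and credentials.

```python
import csv
import io


def build_upsert(table, columns, key="id"):
    """Build a parameterized INSERT ... ON DUPLICATE KEY UPDATE statement.

    Table and column names here are hypothetical; only the values are
    passed as parameters, so the names must come from a trusted source.
    """
    col_list = ", ".join(f"`{c}`" for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    updates = ", ".join(f"`{c}` = VALUES(`{c}`)" for c in columns if c != key)
    return (
        f"INSERT INTO `{table}` ({col_list}) VALUES ({placeholders}) "
        f"ON DUPLICATE KEY UPDATE {updates}"
    )


def read_rows(csv_file):
    """Return (column names, list of row tuples) from an open csv file."""
    reader = csv.DictReader(csv_file)
    columns = list(reader.fieldnames)
    rows = [tuple(record[c] for c in columns) for record in reader]
    return columns, rows


# Stand-in for open("new_data.csv", newline="") with invented sample rows.
sample = io.StringIO("id,description\n1,first scan\n2,second scan\n")
columns, rows = read_rows(sample)
sql = build_upsert("scans", columns)

# With a real server, the statement and rows could be sent in one batch:
#   import pymysql
#   connection = pymysql.connect(host=..., user=..., password=..., database=...)
#   with connection.cursor() as cursor:
#       cursor.executemany(sql, rows)
#   connection.commit()
```

Sending all rows through one `executemany` call and committing once is usually much faster than committing after every row.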
Reply
#15
Just a reminder, all this is not homework, and not for profit. Smile
It is a number of volunteers trying to save genealogical data for posterity.
I came late to this situation, and have to find an acceptable way to put these dormant data to use.

If I follow the SQL/MariaDB route (see iBreeden, Pedroski), that is a possibility, but it is always an extra step.
Excel lines are added because scans are added. Then the Excel file has to be converted into SQL.
However fast, it is extra work, and the original bracketing problem remains to be addressed.
Saving the scans into the SQL database is a non-starter,
although the use of a large SSD could be beneficial.
I did tests and I can insert 420,000 records per hour on an SSD (without an index field),
versus all night on a classic HDD. Still, it's not a solution.

The Excel file is small enough (500-1,000 lines as it stands) to read into memory and will not grow exponentially.
I can solve the bracketing issue in memory as well, with no need for an extra SQL step.
The many posts here prove to me that the original question of post #1,
"how can I simulate 1000 from-to switches", has no obvious answer.
My best attempt is the "vertical" solution, in memory.
thx,
Paul
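
One possible reading of the "vertical" in-memory idea, offered as a sketch rather than as Paul's actual code: expand every from-to bracket into one dictionary entry per scan number, so a lookup becomes a single dictionary access. The bracket data and the names `brackets` and `build_lookup` are invented for illustration, and the "to" value is assumed to be inclusive.

```python
# Hypothetical brackets as they might be read from the Excel file:
# (from, to, description), with "to" assumed inclusive.
brackets = [
    (1, 100, "Register A"),
    (101, 250, "Register B"),
    (300, 420, "Register C"),
]


def build_lookup(brackets):
    """Expand each from-to bracket into one dict entry per number."""
    lookup = {}
    for start, end, description in brackets:
        for number in range(start, end + 1):
            lookup[number] = description
    return lookup


lookup = build_lookup(brackets)

# A scan number outside every bracket simply has no entry.
description = lookup.get(175)   # "Register B"
missing = lookup.get(275)       # None
```

The cost is memory: one entry per scan number, which for 150,000 images is still small for a Python dict.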
Reply
#16
With openpyxl you can access spreadsheets directly.
PyPI: https://pypi.org/project/openpyxl/
docs: https://openpyxl.readthedocs.io/en/stable/
source: https://foss.heptapod.net/openpyxl/openpyxl

pandas is a very good option for conversion to another format
pypi: https://pypi.org/project/pandas/
docs: https://pandas.pydata.org/pandas-docs/stable/
source: https://github.com/pandas-dev/pandas
Reply
#17
Larz, using pandas is obvious.
I can extract all the from-to brackets easily.
But neither pandas nor anything else on PyPI will give me a substitute for switch/case.
From-to is always thought of as something horizontal: x >= y and x < z.
Vertical is easier in my case.
Not to worry, I'm about to test how my code will perform with 150,000 images.
Paul
Reply
#18
Well, I think you don't want a db, but for future reference, check out this link: how to store and retrieve images in a database: use BLOB

If you wish to make your images available to the general public, you will need a db.
Reply
#19
EPILOG.
(Jun-03-2022, 09:24 PM)Pedroski55 Wrote: If you wish to make your images available to the general public, you will need a db.
A Tkinter canvas does that job also, and very efficiently. Users think it's magic.

The only thing I had left to do was a (fast) link between image and description, the now famous from-to issue.
(Because the numbers between a "from"...and "to" have the same description)
You should never doubt Python. Last night I worded my searches for a solution differently,
and lo and behold, Python's standard library includes a module called "bisect".
It gives you the position at which a number would be inserted into a sorted list, in several flavours (bisect_left, bisect_right, etc.), and that position points straight at the nearest "from" value.
This is a full substitute for "from-to" switches: no if/then/else, no loops, just one line of code, and lightning fast because it is a binary search.
Consider it done.
Paul
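
A minimal sketch of how `bisect` can stand in for a chain of from-to tests, assuming the brackets are sorted by their "from" value and do not overlap. The bracket data and the function name `describe` are invented for illustration. The extra check against "to" only matters if there can be gaps between brackets.

```python
from bisect import bisect_right

# Hypothetical brackets: (from, to, description), sorted by "from",
# non-overlapping, with "to" assumed inclusive.
brackets = [
    (1, 100, "Register A"),
    (101, 250, "Register B"),
    (300, 420, "Register C"),
]

# bisect works on a plain sorted list, so keep the "from" values separately.
starts = [start for start, _end, _description in brackets]


def describe(scan_number):
    """Return the description of the bracket containing scan_number, or None."""
    # Index of the last bracket whose "from" is <= scan_number.
    i = bisect_right(starts, scan_number) - 1
    if i < 0:
        return None  # below the first "from"
    _start, end, description = brackets[i]
    # Guard against numbers that fall in a gap or above the last "to".
    return description if scan_number <= end else None
```

For example, `describe(175)` returns `"Register B"`, while `describe(275)` falls in the gap between brackets and returns `None`.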
Reply

