Python Forum
from-to search - Printable Version


from-to search - DPaul - May-30-2022

Hi,
A directory contains, say, 100.000 images.
They are filenamed: img000001.tif, img000002.tif,...... img099999.tif, etc.

An excel sheet tells me where an image fits:
A1, V1, img000001,(to), img000765 #any image between 1 and 765 belongs to cat A1, subcategory V1...
A1, V2, img007584, (to), img008679 ...
etc....(before you ask, it is more complicated, that is why this is not in the filename)

This is a dynamic environment, as imgs are added from time to time, new lines are added to the excel sheet.
When a user clicks an image, I need my program to tell him/her : that is from Category 77, Subcat 125
It would seem that the obvious way to do this is a switch statement.
Two problems:
1) I am not going to type 1.000+ switch cases
2) And if I do, next week I'll have to change my code and add 10 more.
If I use lists, there is no from-to available, or they would be 100.000+ items long. That seems impractical.
The same problem occurs if i use a dictionary.
How else could I implement an efficient and dynamic from-to search ?
thx,
Paul


RE: from-to search - Pedroski55 - May-30-2022

Maybe you could give a simple example, for those of us less familiar with your objective than yourself, say 9 images, 3 categories and 3 subcategories and how you want to add data.

That way, things would be clearer and help, well, nearer.

If this is a "How can I dynamically generate variables?" question, that issue seems to have been raised a lot recently. Python doesn't do that.

Advice: use PHP (in combination with MySQL)


RE: from-to search - DPaul - Jun-01-2022

OK, I did my best, but maybe it is unclear.
I feel that adding code makes it more complicated :)
The question is almost like "how can i generate variables", but
in reality it is "how can i generate 1000 "from/to" or "between" switches"
without typing them.
I see 2 solutions:
a) Simply write a template program that will generate the code and write it into
a python.py file. Quick and dirty .... works like a charm.
b) Use a 2D list (sorted) and not think horizontally "from-to", but vertically.
The from-to bracket in the excel file is on a horizontal line, but as
every next item starts with the ending of the previous one "+1",
a sorted list and some code might also do the trick.

My only question was to find out how somebody would go about
generating 1000 switches, without having to type them.
Paul
Update: no matter how many switch (from-to) cases you have, you don't need to code
them if your from-to is numeric, even if the brackets are not contiguous. Just put all the "froms"
in a sorted 2D list (the second element is the value of the bracket). Once sorted, the "from" of every
next element marks the end of the previous bracket. This is also dynamic, because the list & values can be imported.
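The sorted list of "froms" described in the update can be searched with a binary search instead of a chain of switch cases. The following is only a minimal sketch of that idea: the bracket values are taken from the first post, the names `brackets`, `starts` and `find_label` are made up for illustration, and a gap between two brackets is represented by an entry whose value is `None`.

```python
from bisect import bisect_right

# Hypothetical data: [first image number of the bracket, value of the bracket].
# An entry with value None marks the start of a gap (no category assigned).
# The list must be sorted on the first element.
brackets = [
    [1, "A1/V1"],      # img000001 .. img000765
    [766, None],       # gap
    [7584, "A1/V2"],   # img007584 .. img008679
    [8680, None],      # everything after the last bracket
]

# Separate list of the "froms" so bisect can search plain integers.
starts = [start for start, _ in brackets]


def find_label(image_number):
    """Return the value of the bracket containing image_number, or None."""
    pos = bisect_right(starts, image_number) - 1
    if pos < 0:
        return None  # below the first "from"
    return brackets[pos][1]


print(find_label(500))   # falls in the first bracket
print(find_label(1000))  # falls in a gap
```

The lookup cost grows with the logarithm of the number of brackets, not with the number of images, and adding a bracket only means adding rows to the imported list.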



RE: from-to search - Larz60+ - Jun-01-2022

it seems as though you have a category number (you mention A1) associated with each image.
Keep a separate index containing this category and the image name and path, and sort on that, allowing images to remain stationary. something like sorted_index = sorted(s, key = lambda x: (x[1], x[2]))

I think of city, state as an example where state could be equated to category and city to the image.

index:
Output:
State   City                       City relative data path
------  -------------------------  -------------------------
..      ...............            ./data/State/City
...

This will make it easy to create a catalog, and also to add, or remove, new images.

On massively large collections, you can use a hash code in the index, which will allow blazingly fast lookup.
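As a rough illustration of the separate index described above, the sketch below sorts a few index rows on category and subcategory and also builds a dictionary keyed on the image name. The row layout, the sample paths and the names `s` and `by_name` are assumptions made for this example; a Python `dict` is itself a hash table, so it gives a hashed lookup without extra code.

```python
# Hypothetical index rows: (image name, category, subcategory, relative path).
s = [
    ("img007584.tif", "A1", "V2", "./data/A1/V2"),
    ("img000001.tif", "A1", "V1", "./data/A1/V1"),
    ("img000002.tif", "A1", "V1", "./data/A1/V1"),
]

# Sort the index on category, then subcategory; the image files stay where they are.
sorted_index = sorted(s, key=lambda x: (x[1], x[2]))

# Hashed lookup by image name (one dictionary entry per image).
by_name = {row[0]: row for row in s}

for row in sorted_index:
    print(row)
print(by_name["img007584.tif"][1:3])
```

Note that this keeps one entry per image in memory, which is the trade-off raised in the next post.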


RE: from-to search - DPaul - Jun-01-2022

thx Larz,
It has always been a possibility to make a massive index list (i.e. one entry per image)
But I am somewhat afraid of keeping these large things in memory, and i'm trying to understand how hash codes would help me.
The from-to helps me to reduce the lookup list by a large factor. Each item between the "from" and the "to" has the
same category and subcategory.
Allow me to use a very simple example:
There are 100.000 images of 10.000 different actors in a directory.
Each image is integer numbered, but does not contain the name of the actor/actress
Fortunately brackets in an excel file (1-10 = "Stan Laurel", 11-22 = "Gloria Swanson", etc...) are available
The user clicks on an image, and the name pops up.
(In reality, the number of images can grow to a million, that is why the system needs to be dynamic)
That's the situation. :)
Paul
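For the actor example, the brackets can be kept as explicit (low, high, name) rows, so the list has one entry per bracket instead of one per image. This is a sketch under assumptions: the first two rows come from the post, the third row is invented to show a gap, and `ranges`, `lows` and `name_for_image` are illustrative names.

```python
from bisect import bisect_right

# (low, high, name) rows, sorted on low and assumed not to overlap.
ranges = [
    (1, 10, "Stan Laurel"),
    (11, 22, "Gloria Swanson"),
    (30, 35, "Buster Keaton"),  # invented row, leaves a gap at 23..29
]

lows = [low for low, _, _ in ranges]


def name_for_image(number):
    """Return the name for an image number, or None if no bracket covers it."""
    pos = bisect_right(lows, number) - 1
    if pos >= 0:
        low, high, name = ranges[pos]
        if number <= high:
            return name
    return None


print(name_for_image(15))
print(name_for_image(25))
```

With 10.000 brackets this holds 10.000 small tuples in memory, regardless of whether the directory contains 100.000 or a million images.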


RE: from-to search - Larz60+ - Jun-01-2022

I worked in call record processing for one of the largest telecommunications companies back in the 1990's.

Back then calls were broken into one minute segments, and each minute was rated not only on time of day, but by type of call as well (800, point to point, conference, etc.). To complicate matters, taxes had to be calculated for origination point as well as each destination.

We processed 80 million calls per day and did that all in just 20 minutes using a hashing algorithm that was created, and kept as a separate index file, for all of the call segments (as many as 80 million * (average of) 20 segments).

The part of the index that was hashed (the origination phone number) would be equivalent to your image name, not the category, which would be a small enough set that hashing would not be necessary.

I used a similar method to that in Aho's Compiler Principles ( https://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools ), with modifications that used a dynamic hash table size (by list linking collisions). Each hash code was reproducible when reusing the same key.

It was blazingly fast.

Not trivial to write, but was worth the effort.
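The implementation described in this post is not shown in the thread. Purely to illustrate the general idea of a hash table that links colliding keys in a list and grows when it fills up, here is a small sketch; the class name, the growth rule and the sample keys are all assumptions. In everyday Python the built-in `dict` already does this job.

```python
class ChainedHashTable:
    """Toy hash table using separate chaining; not the poster's implementation."""

    def __init__(self, size=8):
        self._buckets = [[] for _ in range(size)]
        self._count = 0

    def _bucket(self, key):
        # hash() gives the same code for the same key within one process.
        # (For str keys it can differ between runs unless PYTHONHASHSEED is set.)
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))       # collision: extend the chain
        self._count += 1
        if self._count > 2 * len(self._buckets):
            self._resize()

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default

    def _resize(self):
        # Double the number of buckets and redistribute every entry.
        items = [item for bucket in self._buckets for item in bucket]
        self._buckets = [[] for _ in range(2 * len(self._buckets))]
        for key, value in items:
            self._buckets[hash(key) % len(self._buckets)].append((key, value))


index = ChainedHashTable()
index.put("img000001.tif", ("A1", "V1"))
print(index.get("img000001.tif"))
```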


RE: from-to search - Pedroski55 - Jun-02-2022

Maybe I don't understand what you want correctly.

An Excel file is a form of database. All the data you need is in there, howsoever it got there, presumably added by hand.

All you gotta do is read it. Is that correct?

If you had a MySQL table with the columns: id, imagename, Category, Sub_Category, any other columns you need.

Isn't it very easy to pull out the data you want? Pass an image name via PHP to a SELECT query:

SELECT id, Category, Sub_Category FROM my_images WHERE imagename = 'an_image1.tiff'
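The same one-row-per-image query can be tried from Python. This sketch swaps in the standard-library `sqlite3` module with an in-memory database in place of MySQL and PHP, which is an assumption made only to keep the example self-contained; the table and column names follow the post, the sample rows and the function name `lookup` are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database for the example
conn.execute(
    """create table my_images
       (id integer primary key,
        imagename text unique,
        Category text,
        Sub_Category text)"""
)
conn.executemany(
    "insert into my_images (imagename, Category, Sub_Category) values (?, ?, ?)",
    [
        ("img000001.tif", "A1", "V1"),  # invented sample rows
        ("img007584.tif", "A1", "V2"),
    ],
)
conn.commit()


def lookup(imagename):
    """Return (id, Category, Sub_Category) for an image name, or None."""
    # The ? placeholder lets the driver quote the value safely.
    cursor = conn.execute(
        "select id, Category, Sub_Category from my_images where imagename = ?",
        (imagename,),
    )
    return cursor.fetchone()


print(lookup("img007584.tif"))
```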


RE: from-to search - DPaul - Jun-02-2022

@Larz: OK, I will look into "hashing", new territory for me.
@Pedro: yes excel, mariadb, sql, i can put those basic data anywhere,
and query them, but that does not solve the original "bracketing" problem.

The simplest solution is to create a database table with the filename of the image,
and the text in the second field. Now i can do a 1 on 1 query, piece of cake.
But this implies hundreds of thousands of identical text fields.

Come to think of it, I could write a one to many setup, one name, 56 picture filenames (join).

I need to go back to the drawing board.
thx,
Paul
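The one-to-many setup mentioned above could look roughly like the sketch below, where each name is stored once and every picture row only carries a reference to it. The table names `actors` and `photos`, the column names and the sample filenames are assumptions for illustration; `sqlite3` with an in-memory database is used so the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table actors (id integer primary key, name text unique)")
conn.execute(
    """create table photos
       (filename text primary key,
        actor_id integer references actors(id))"""
)
conn.executemany(
    "insert into actors (id, name) values (?, ?)",
    [(1, "Stan Laurel"), (2, "Gloria Swanson")],
)
conn.executemany(
    "insert into photos (filename, actor_id) values (?, ?)",
    [
        ("img000001.tif", 1),  # invented filenames
        ("img000002.tif", 1),
        ("img000011.tif", 2),
    ],
)
conn.commit()


def actor_name(filename):
    """Return the name joined to a picture filename, or None."""
    row = conn.execute(
        """select a.name
           from photos p
           join actors a on a.id = p.actor_id
           where p.filename = ?""",
        (filename,),
    ).fetchone()
    return row[0] if row else None


print(actor_name("img000002.tif"))
```

This removes the repeated text, but it still needs one `photos` row per image, unlike a from-to bracket.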


RE: from-to search - ibreeden - Jun-02-2022

Indeed a database would be much better than an Excel sheet. Of course a database table with the filename of the image and the text in the second field would be most efficient for searching but I understand your problem with the size of such a database.
Perhaps you can use the following principle, which strikes a balance between size and speed.
import sqlite3

create_statements = []
create_statements.append("""create table if not exists photos
        (low integer,
        high integer,
        name text)""")
# No need for sorting when we use indexes.
create_statements.append("""create index if not exists photo_idx01
        on photos(low)""")
create_statements.append("""create index if not exists photo_idx02
        on photos(high)""")

insert_values = []
insert_values.append((1, 10, "Stan Laurel"))
insert_values.append((11, 22, "Gloria Swanson"))

# Create or open a database (inventory.db)
connection = sqlite3.connect("inventory.db")
cursor = connection.cursor()

for statement in create_statements:
    cursor.execute(statement)

insert_statement = """insert into photos 
                      (low, high, name)
                      values
                      (?, ?, ?)"""

for row in insert_values:
    cursor.execute(insert_statement, row)

# Commit the changes
connection.commit()

select_statement = """select name from photos
                      where ? between low and high"""
for image_id in range(1, 22):
    cursor.execute(select_statement, (image_id,))
    print(cursor.fetchone()[0])

# Close the connection
connection.close()
Output:
Stan Laurel Stan Laurel Stan Laurel Stan Laurel Stan Laurel Stan Laurel Stan Laurel Stan Laurel Stan Laurel Stan Laurel Gloria Swanson Gloria Swanson Gloria Swanson Gloria Swanson Gloria Swanson Gloria Swanson Gloria Swanson Gloria Swanson Gloria Swanson Gloria Swanson Gloria Swanson



RE: from-to search - DPaul - Jun-02-2022

@ibreeden: yes, the sql BETWEEN statement is tempting, i don't know how fast it is, i'll try.
The downside is that this is a dynamic system, users enter extra lines & images (scans actually) from time to time.
Excel is their favorite tool.
Now we have an extra step to get the data into sql, or write a gui for the users.
thx,
paul
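One way to keep Excel as the users' tool is to reload the bracket table from an export of the sheet each time it changes. The sketch below assumes the sheet is saved as CSV with four columns (category, subcategory, first image, last image), which simplifies the layout shown in the first post; the table name `brackets` and the helpers `image_number` and `categorize` are hypothetical. Reading an .xlsx file directly would need a third-party library and is not shown.

```python
import csv
import io
import re
import sqlite3

# Stand-in for the exported sheet; in practice this would be an open file.
exported = io.StringIO(
    "A1,V1,img000001,img000765\n"
    "A1,V2,img007584,img008679\n"
)


def image_number(name):
    """Extract the integer from a name such as 'img000765' or 'img000765.tif'."""
    match = re.search(r"\d+", name)
    if match is None:
        raise ValueError(f"no image number in {name!r}")
    return int(match.group())


connection = sqlite3.connect(":memory:")
connection.execute(
    """create table brackets
       (low integer, high integer, category text, subcategory text)"""
)

rows = [
    (image_number(first), image_number(last), category, subcategory)
    for category, subcategory, first, last in csv.reader(exported)
]

# Replace the whole table on every import so edits in the sheet are picked up.
connection.execute("delete from brackets")
connection.executemany("insert into brackets values (?, ?, ?, ?)", rows)
connection.commit()


def categorize(filename):
    """Return (category, subcategory) for an image filename, or None."""
    return connection.execute(
        """select category, subcategory from brackets
           where ? between low and high""",
        (image_number(filename),),
    ).fetchone()


print(categorize("img000500.tif"))
```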