from-to search - Python Forum (https://python-forum.io)
Thread: from-to search (/thread-37348.html)

from-to search - DPaul - May-30-2022

Hi,

A directory contains, say, 100,000 images. Their filenames are img000001.tif, img000002.tif, ... img099999.tif, etc.
An Excel sheet tells me where an image fits:

A1, V1, img000001, (to), img000765   # any image between 1 and 765 belongs to category A1, subcategory V1
A1, V2, img007584, (to), img008679
... etc.

(Before you ask: it is more complicated than that, which is why this is not in the filename.)
This is a dynamic environment: images are added from time to time, and new lines are added to the Excel sheet.
When a user clicks an image, I need my program to tell him/her: that is from Category 77, Subcategory 125.

It would seem that the obvious way to do this is a switch statement. Two problems:
1) I am not going to type 1,000+ switch cases.
2) And if I do, next week I'll have to change my code and add 10 more.

If I use lists, there is no from-to available, or they would be 100,000+ items long, which seems impractical. The same problem occurs if I use a dictionary.
How else could I implement an efficient and dynamic from-to search?

thx,
Paul

RE: from-to search - Pedroski55 - May-30-2022

Maybe you could give a simple example, for those of us less familiar with your objective than yourself: say 9 images, 3 categories and 3 subcategories, and how you want to add data. That way, things would be clearer and help, well, nearer.

If this is a "How can I dynamically generate variables?" question: recently, it seems to me, that issue is raised a lot. Python doesn't do that.

Advice: use PHP (in combination with MySQL).

RE: from-to search - DPaul - Jun-01-2022

OK, I did my best, but maybe it is unclear. I feel that adding code makes it more complicated.
The question is almost like "how can I generate variables", but in reality it is "how can I generate 1,000 'from/to' or 'between' switches without typing them".
I see 2 solutions:
a) Simply write a template program that will generate the code and write it into a python.py file. Quick and dirty ... works like a charm.
b) Use a 2D list (sorted) and think not horizontally ("from-to") but vertically. The from-to bracket in the Excel file is on one horizontal line, but as every next item starts at the ending of the previous one "+1", a sorted list and some code might also do the trick.

My only question was to find out how somebody would go about generating 1,000 switches without having to type them.
Paul

Update: no matter how many switch (from-to) cases you have, you don't need to code them if your from-to values are numeric, though not necessarily contiguous. Just put all the "froms" in a 2D list (the second element is the value of the bracket). Once sorted, every next element is the "to" of the previous "from". This is also dynamic, because the list and values can be imported.

RE: from-to search - Larz60+ - Jun-01-2022

It seems as though you have a category number (you mention A1) associated with each image.
Keep a separate index containing this category and the image name and path, and sort on that, allowing the images to remain stationary.
Something like:

    sorted_index = sorted(s, key=lambda x: (x[1], x[2]))

I think of city, state as an example, where state could be equated to category and city to the image.
index: ...
This will make it easy to create a catalog, and also to add or remove images.
On massively large collections, you can use a hash code in the index, which will allow blazingly fast lookup.

RE: from-to search - DPaul - Jun-01-2022

thx Larz,
It has always been a possibility to make a massive index list (i.e. one entry per image).
But I am somewhat afraid of keeping these large things in memory, and I'm trying to understand how hash codes would help me.
The from-to helps me to reduce the lookup list by a large factor: each item between the "from" and the "to" has the same category and subcategory.

Allow me to use a very simple example:
There are 100,000 images of 10,000 different actors in a directory.
Each image is integer numbered, but does not contain the name of the actor/actress.
Fortunately, brackets in an Excel file (1-10 = "Stan Laurel", 11-22 = "Gloria Swanson", etc.) are available.
The user clicks on an image, and the name pops up.
(In reality, the number of images can grow to a million, which is why the system needs to be dynamic.)
That's the situation.
Paul

RE: from-to search - Larz60+ - Jun-01-2022

I worked in call record processing for one of the largest telecommunications companies back in the 1990s.
Back then calls were broken into one-minute segments, and each minute was rated not only on time of day, but by type of call as well (800, point to point, conference, etc.). To complicate matters, taxes had to be calculated for the origination point as well as each destination.

We processed 80 million calls per day and did that all in just 20 minutes using a hashing algorithm that was created, and kept as a separate index file, for all of the call segments (as many as 80 million * (average of) 20 segments).
The part of the index that was hashed (the origination phone number) would be equivalent to your image name, not the category, which would be a small enough set that hashing would not be necessary.

I used a method similar to that in Aho's Compiler Principles ( https://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools ), with modifications that used a dynamic hash table size (by list-linking collisions). Each hash code was reproducible when reusing the same key.

It was blazingly fast. Not trivial to write, but worth the effort.

RE: from-to search - Pedroski55 - Jun-02-2022

Maybe I don't understand what you want correctly.
An Excel file is a form of database. All the data you need is in there, howsoever it got there, presumably added by hand. All you gotta do is read it. Is that correct?

If you had a MySQL table with the columns: id, imagename, Category, Sub_Category, any other columns you need.
Isn't it very easy to pull out the data you want? Pass an image name via PHP to a SELECT query:

    SELECT id, Category, Sub_Category FROM my_images WHERE imagename = 'an_image1.tiff'

RE: from-to search - DPaul - Jun-02-2022

@Larz: OK, I will look into "hashing", new territory for me.
@Pedro: yes, Excel, MariaDB, SQL: I can put those basic data anywhere and query them, but that does not solve the original "bracketing" problem.

The simplest solution is to create a database table with the filename of the image in the first field and the text in the second field. Now I can do a one-on-one query, piece of cake. But this implies hundreds of thousands of identical text fields.
Come to think of it, I could write a one-to-many setup: one name, 56 picture filenames (join).
I need to go back to the drawing board.
thx,
Paul

RE: from-to search - ibreeden - Jun-02-2022

Indeed, a database would be much better than an Excel sheet. Of course a database table with the filename of the image and the text in the second field would be most efficient for searching, but I understand your problem with the size of such a database. Perhaps you can use the following principle, which is a middle ground between size and speed.

```python
import sqlite3

create_statements = []
create_statements.append("""create table if not exists photos
    (low integer, high integer, name text)""")
# No need for sorting when we use indexes.
create_statements.append("""create index if not exists photo_idx01 on photos(low)""")
create_statements.append("""create index if not exists photo_idx02 on photos(high)""")

# One row per bracket: first image number, last image number, name.
insert_values = []
insert_values.append((1, 10, "Stan Laurel"))
insert_values.append((11, 22, "Gloria Swanson"))

# Create or open a database (inventory.db)
connection = sqlite3.connect("inventory.db")
cursor = connection.cursor()
for statement in create_statements:
    cursor.execute(statement)

insert_statement = """insert into photos (low, high, name) values (?, ?, ?)"""
for row in insert_values:
    cursor.execute(insert_statement, row)
# Commit the changes
connection.commit()

# Look up the bracket that contains each image number (1..22 covers both brackets).
select_statement = """select name from photos where ? between low and high"""
for photo_id in range(1, 23):
    cursor.execute(select_statement, (photo_id,))
    print(cursor.fetchone()[0])

# Close the connection
connection.close()
```
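
A note on the lookup query above: when the brackets never overlap, the same lookup can also be phrased so that only the index on low is needed. The following is a minimal sketch, not part of ibreeden's post. It assumes non-overlapping brackets, uses an in-memory database instead of inventory.db, and the helper name find_name is made up for illustration.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind (assumption;
# the post above uses inventory.db).
connection = sqlite3.connect(":memory:")
connection.execute("create table photos (low integer, high integer, name text)")
connection.execute("create index photo_idx01 on photos(low)")
connection.executemany(
    "insert into photos (low, high, name) values (?, ?, ?)",
    [(1, 10, "Stan Laurel"), (11, 22, "Gloria Swanson")],
)
connection.commit()


def find_name(conn, number):
    """Return the bracket name for number, or None if no bracket contains it.

    Hypothetical helper: fetch the bracket with the largest low <= number,
    then check that number does not lie past that bracket's high.
    Only correct when brackets do not overlap.
    """
    row = conn.execute(
        "select high, name from photos where low <= ? order by low desc limit 1",
        (number,),
    ).fetchone()
    if row is None or number > row[0]:
        return None
    return row[1]


print(find_name(connection, 5))
print(find_name(connection, 22))
print(find_name(connection, 23))  # outside every bracket
```

Whether this is measurably faster than BETWEEN depends on the data and on how SQLite plans the query, so it is worth timing both on the real table.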

RE: from-to search - DPaul - Jun-02-2022

@ibreeden: yes, the SQL BETWEEN statement is tempting. I don't know how fast it is; I'll try.
The downside is that this is a dynamic system: users enter extra lines and images (scans, actually) from time to time. Excel is their favorite tool.
Now we have an extra step to get the data into SQL, or we have to write a GUI for the users.
thx,
Paul
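
One way to keep Excel as the users' tool without a database step is the sorted-list idea from DPaul's earlier update: load the brackets, sort them by their "from" value, and bisect. The sketch below is only an illustration under stated assumptions: the sheet is exported as CSV with the columns category, subcategory, first image, last image; the names image_number, load_brackets and lookup are hypothetical; and the two sample rows are the ones from the first post.

```python
import bisect
import csv
import io

# Stand-in for a CSV export of the Excel sheet (assumed layout:
# category, subcategory, first image, last image).
SHEET = """A1,V1,img000001,img000765
A1,V2,img007584,img008679
"""


def image_number(name):
    """Extract the integer from a name such as 'img007584' or 'img007584.tif'."""
    return int("".join(ch for ch in name if ch.isdigit()))


def load_brackets(text):
    """Return (starts, rows): the sorted 'from' numbers and the matching rows."""
    rows = []
    for category, subcategory, first, last in csv.reader(io.StringIO(text)):
        rows.append((image_number(first), image_number(last), category, subcategory))
    rows.sort()
    starts = [row[0] for row in rows]
    return starts, rows


def lookup(starts, rows, filename):
    """Return (category, subcategory) for filename, or None if it is in no bracket."""
    number = image_number(filename)
    # Index of the last bracket whose 'from' is <= number.
    i = bisect.bisect_right(starts, number) - 1
    if i < 0:
        return None
    first, last, category, subcategory = rows[i]
    # Checking 'last' handles gaps between non-contiguous brackets.
    return (category, subcategory) if number <= last else None


starts, rows = load_brackets(SHEET)
print(lookup(starts, rows, "img000500.tif"))  # ('A1', 'V1')
print(lookup(starts, rows, "img008000.tif"))  # ('A1', 'V2')
print(lookup(starts, rows, "img001000.tif"))  # None: falls in the gap
```

Memory use is one entry per bracket rather than one per image, and calling load_brackets again after the sheet changes picks up new lines without touching the code.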