Python Forum

Hello!

For a project I am in need of a specific data structure. I am not sure how to call this or where to find this so I hope someone can help me. In the project I detecting objects from a live stream. These detections have a id, location and a timestamp.

When the system detects an object in a new location it must create a new case with an ID. When it detects an object in a location it have seen before, it should just add a timestamp to this location. Example:

ID: 1 loc:xyxy 12:32
13:30
15:22

ID: 2 loc:xyxy 12:22
19:00

When a new detections comes in, its location is not in database, it should create a new ID and add the time stamp. When it sees that there is already a detection with that location, it should just add the time stamp.

I think it would be useful to store this in csv files but I do not know what the best way is to do this. Can someone help me out or help me to think in the right direction?

Thanks in advance!
Josef

why not sqlite database? It seems perfect for this use case and better than the csv and python support it right from the standard library - https://docs.python.org/3/library/sqlite3.html. It will allow to query if ID is already in the DB (or directly insert if not exists, add time stamps, etc. If in plain text file - csv, json (that would be better than csv), etc. you will need to keep everything in memory and save to disc regularly to prevent data loss.

In a sqlite db I see two tables - one for id and loc and one for id and timestamps

Mmh okay I had a look into this, but I am not really familiar with sql. How can I make the ID dependent on the location from the other table?

(May-16-2019, 07:46 AM)JosefFilosopio Wrote: [ -> ]Mmh okay I had a look into this, but I am not really familiar with sql. How can I make the ID dependent on the location from the other table?

I may have misunderstood your goal. id is linked to a location or object (i.e. each objects at same location have same id?). in other words what is primary - object or location. same object at different location will have what id?

I made a image of how i think it should work Image

In the second table the CaseID is dependent on the location of the detection
The N in the first table is dependent on the amount of records with the same CaseID if you understand what I mean

Here is very basic example. NOTE: it's not the best way to structure the code - a lot of repeating code (you should go OOP for example). It's just to give you idea

import random
import sqlite3

sqlite_file = 'sample.sqlite'

def setup(sqlite_file):
    conn = sqlite3.connect(sqlite_file)
    cur = conn.cursor()
    cur.execute('CREATE TABLE IF NOT EXISTS locations (loc TEXT, unique (loc))')
    cur.execute('CREATE TABLE IF NOT EXISTS timestamp (case_id INTEGER, datetime TEXT)')
    conn.commit()
    conn.close()
    
def add_record(sqlite_file, loc):
    conn = sqlite3.connect(sqlite_file)
    cur = conn.cursor()
    cur.execute('INSERT OR IGNORE INTO locations (loc) VALUES (?)', (loc,))
    cur.execute('INSERT INTO timestamp (case_id, datetime) VALUES ((SELECT rowid from locations WHERE loc=?), CURRENT_TIMESTAMP)', (loc,)) # NOTE timestamp would be GMT
    conn.commit()
    conn.close()
    
def get_num_detections(sqlite_file,loc):
    conn = sqlite3.connect(sqlite_file)
    cur = conn.cursor()
    cur.execute('SELECT COUNT(a.rowid) FROM timestamp as a JOIN locations as b ON a.case_id = b.rowid WHERE b.loc = ?', (loc, ))
    result = cur.fetchone()[0]
    conn.close()
    return result
    
def get_stat(sqlite_file):
    conn = sqlite3.connect(sqlite_file)
    cur = conn.cursor()
    cur.execute('SELECT b.loc, COUNT(a.rowid) FROM timestamp as a JOIN locations as b ON a.case_id = b.rowid GROUP BY b.loc ORDER BY b.loc ASC')
    result = cur.fetchall()
    conn.close()
    return result
    
def get_detections(sqlite_file,loc):
    conn = sqlite3.connect(sqlite_file)
    cur = conn.cursor()
    cur.execute('SELECT b.loc, datetime(datetime, "localtime") FROM timestamp as a JOIN locations as b ON a.case_id = b.rowid WHERE b.loc = ?', (loc, ))
    result = cur.fetchall()
    conn.close()
    return result
    
    
# initial setup of db tables if not exists
setup(sqlite_file)

#just 4 sample locations
locations = ['loc A', 'loc B', 'loc C', 'loc D']

#add some random observations, in production it would come from your live stream
for obj in range (100):
    loc = random.choice(locations)
    add_record(sqlite_file, loc)

# example of query DB    
print(get_num_detections(sqlite_file, 'loc A'))
for loc, num in get_stat(sqlite_file):
    print(f'{loc} --> {num}')

print()
for loc, tstamp in get_detections(sqlite_file, loc='loc A'):
    print(f'{loc} --> {tstamp}')

There are tools that you can use to view and manage the DB, e.g. https://sqlitestudio.pl/ but also many others

(May-16-2019, 08:55 AM)JosefFilosopio Wrote: [ -> ]he N in the first table is dependent on the amount of records with the same CaseID if you understand what I mean

You should not store number of detections in the table. These statistics should be derived in a query, so that it's not possible to have discrepancy between real number of detections stored in the DB and the count if stored in the DB

I just seen your other thread. before that I didn't really understand your example, e.g. 1,2,3,4. that's why i used sample location names.
given that these are 4 numbers it may be better if you have 4 columns xmin, ymin, xmax, ymax. You may keep loc column as descriptive name for location.
or you can join all 4 numbers in a single str with some separator and store this string in loc (instead of 'loc A' etc)

Hello Buran,

The last day I have been playing around with your code and managed to make it suitable for my project. I want to thank you a lot for helping me!

Josef

JosefFilosopio

buran

JosefFilosopio

buran

JosefFilosopio

buran

JosefFilosopio