Trying to sort a MYSQL Database

PierreSoulier · Jul-04-2018, 08:39 AM

Hello, (first of all sorry for my english, I'm french **biggrin** )

I have a database in MySQL with this table:

CREATE TABLE Tri(
id VARCHAR(200) PRIMARY KEY,
nom TEXT,
adresse TEXT,
CP TEXT,
ville TEXT,
details TEXT,
organisme_gest TEXT,
organisme_gest_associe TEXT,
tel TEXT,
web TEXT,
mail TEXT,
date_d TEXT,
horaire TEXT,
photo TEXT,
date_f TEXT,
handicap TEXT
);
This basicly is storing data for event in France. In details there is all the information about the event.
I actually use only the id and details in my program.

I want to sort the database using keywords. To do so i have an idea:

Creating the list of the keyword (already done):

#MC = ['route','autoroute','éclairage','public','parking','couvert','non','stationnement','rue','habitat',
#          'logement','transport','bus','metro','tramway','train','avion','piste','cyclable','aire','jeu','enfant',
#          'crêche','parentale','haltes','garderies','jardin','d\éveil','assistante maternelle',
#          'école maternelle','école élémentaire','collège','lycée professionel','lycée général','lycée technologi',
#          'lycée polyvalent','université','fac','campus','étudiant','centre','formation','sport','scolaire',
#          'loisir','école','centre','aéré','rythme','péri','extra','MDPH','CDAPH','IME','IEM','ITEP','SESSAD',
#          'FAM','SAVS','SAMSAH','ESAT','maladie','mentale','soin','infirmier','CLIC','aidant','EHPAD','alzheimer',
#          'pôle','accueil','familial','domicile','agé','CDD','senior','loisir','CHRS','hebergement','urgence','asile',
#          'femme','batue','addictologie','desintoxication','insertion','remobilisation','aide','alimentaire','colis',
#          'urgence','RSA','ASE','aide','socia','enfan','protection','enfance','CAF','allocation','famill','aide',
#          'devoir','soutien','MJC','maison','jeune','majeur','accès','soin','APL','résidence','universitaire',
#          'CLE','hebergement','scolaire','mission','locale','Crous','financ','bourse','pharmacie','médicament',
#          'santé','hôpita','hospital','clinique','ambulance','formation','sanitaire','toilette','douche',
#          'patrimoine','château','exposition','collection','musée','art','contemporain','cinéma','projection','film',
#          'télévision','reportage','salsa','tango','danse','hip-hop','bourrée','farandole','rock-and-roll','twist',
#          'valse','bal','musette','country','flamenco','concert','musique','hestiv\'oc','concert','musique','hestiv\'oc',
#          'opéra','théatre','cirque','artifice','illumination','pyrotechnie','pyromusical','carnaval','urbain','défilé',
#          'médiathèque','bibliothèque','livre','lecture','ludothèque','baseball','basket-ball','football',
#          'football américain','handball','hockey','pétanque','boulodrome','rugby','volley-ball','athlétisme',
#          'ball-trap','poing','billard','snooker','bowling','quille','echec','equitation','cheval','equestre',
#          'ecrime','fleuret','epée','gymnastique','musculation','haltérophilie','culturisme','javelot','pentathlon',
#          'roller','skate','roulette','pistolet','carabine','arc','triathlo','twirling','majorette','aikido','aikibudo',
#          'budo','boxe','ring','savate','full contact','judo','kendo','jujitsu','karaté','martiaux','kick','boxing',
#          'lutte','Muaythai','teakwondo','tai','chuan','gong','badminton','tambourin','paume','padel','pelote',
#          'pala','squash','tennis','pong','ping','aviron','canöe','kayak','char','voile','plongée','apnée','sous',
#          'marin','joute','sauvetage','nautique','motonautique','jetski','natation','nage','piscine','pêche','ski',
#          'nautique','surf','voile','aéromodélisme','aéronautique','aérostation','camarguaise','landaise','toromachie',
#          'cyclisme','cyclotourisme','giraviation','golf','escalade','motocyclisme','parachutisme','ULM','pédestre',
#          'sauvetage','secourisme','ski','spéléologie','automobile','glace','patinage','traineau','pulka','parapente',
#          'planeur','état','civil','fiscal','impôt','taxe','administation','numérique','démocratie','participative',
#          'participatif','assainissement','ATMO','déchêt','collecte','traitement','déchetterie','recyclage',
#          'gendarmerie','commisariat','police','municipale','sécurité','ASVP','SDIS','pompiers','parc','potager',
#          'lac','étang','rivière','marais','site','naturel','biodiversité','ENS','forêt','bois','point de vue',
#          'table d\'orientation','chemin','pédestre','propre','netto','énergie','chauffage','électricité',
#          'écologie','renouvable','durable','environnement','économie','innovation','commerc','artisan',
#          'fabication','tourisme','vin','gastronomie','emploi','chômage','job','recrute','entreprise','industrie',
#          'entreupreun','création','agricult','agricole','coopération','international']

And i want to count the occurence of every word in every details. Depending on the word that appears the most i will put the data in a new database(I can manage this part).

The only problem is how to do it efficiently? I've tried this program here:

QUERY="""SELECT details from Tri"""
cur.execute(QUERY)
#result est une liste contenant les détails
result_details=cur.fetchall()
QUERY="""SELECT id from Tri"""
cur.execute(QUERY)
#result est une liste contenant les id
result_id=cur.fetchall()
#On teste les mots clés un par un
MC = ['route',...,'international'] #as writen above
          
          
for i in result_details :
    #création d'un tableau: 1ère colonne: mot-clé 2ème colonne: nb d'occurence
    OCCUR=[0]*len(MC)
    for j in range(len(MC)-1):
        OCCUR[j] = i.count(MC[j])
#Distinction de tous les mots qui sont apparus un maximum de fois
    #Liste qui contient tous les mots apparus un maximum de fois
    MA=[]
    #Occurence du mot étant apparu un nombre maximal de fois
    occ=0
    for k in range(len(MC)-1):
        if OCCUR[k] > occ:
            occ = OCCUR[k]
            MA = [MC[k]]
        elif OCCUR[k] == occ:
            MA.append(MC[k])

But i tried to run the program for an entire night and it keep running....

In the end i don't know if the program is running properly and/or if I do it right!

Do you have any suggestions?

Thank you so much in advance :)

**Larz60+** · Jul-04-2018, 08:51 AM

You don't sort a database, that's one of the reasons you use it.
Instead, you add a ORDER BY clause to your Query which sorts the output of the query:
example:

for oby in MC:
    cur.execute("SELECT FROM Tri VALUES (%s, %s, %s, ... ORDER BY %)", (var1, var2, var3. ..., oby))
    ...

Something like above.

PierreSoulier · Jul-04-2018, 08:57 AM

I think you don't get the point, maybe because i'm having a hard time speaking english.

Details is a description, a list of word if you want.

For each data i'm counting how many time each word appear in the text. In the end i want to know which word has appeared most of the time.

**Larz60+** · (This post was last modified: Jul-04-2018, 09:20 AM by Larz60+.)

It would be easier to get a count of all words in result_details, using:

from collections import Counter
...
# If not already a list:
delimiter = '|' # Or whatever it is
data_list = result_details.strip().split(delimiter)
most_common_list = Counter(data_list).most_common()

And then find your list of words in most_common_list
If it's already a list, use directly. You will have to match ID to your list, but that's simple enough using index

PierreSoulier · Jul-04-2018, 11:35 AM

I want to count words that has appeared most of the time and that are in my MC list.
If I understand it well, your program only search for the word that has appeared most of the time, either or not it is in MC, right?

**Larz60+** · Jul-04-2018, 04:19 PM

No, it will provide a list of all words in most used order. You can match this to your list simply:

untested and unfinished code:

from collections import Counter
...
# If not already a list:
delimiter = '|' # Or whatever it is
data_list = result_details.strip().split(delimiter)
most_common_list = Counter(data_list).most_common()

mc_dict = dict((x, y) for x, y in most_common)

print(mc_dict)
matches = []

for idx, word in enumerate(MC):
    try:
        count = mc_dict[word]
        matches.append(result_id[idx], word, count)
    except:
        continue

print(matches)

matches will contain a list of all your MC entries that were found with corresponding id and a count of the number of times occurred.

PierreSoulier · (This post was last modified: Jul-05-2018, 08:18 AM by PierreSoulier.)

Thank you for the answer :) I think i get your idea.

However result_details is a tuple composed with string of word, each word separated with ' ' (space).
So i don't really understand what you are trying to do on line 5?

Thank you for taking your time to answer :)!

I've tried something like that based on your program:

for text in result_details:
    # If not already a list:
    delimiter = ' ' 
    text=''.join(text) #from tuple to string
    data_list = text.strip().split(delimiter)
    most_common_list = Counter(data_list).most_common(5)
     
    mc_dict = dict((x, y) for x, y in most_common_list)
     
    print(mc_dict)
    matches = []
     
    for idx, word in enumerate(MC):
        try:
            count = mc_dict[word]
            matches.append(result_id[idx], word, count)
        except:
            continue
     
    print(matches)

But this is what i get in return:

{'le': 3, '(jazz': 1, 'de': 6, '\xc3\xa0': 2, 'samedi': 2}
[]
{'vent': 1, '8h': 1, 'de': 4, 'Le': 1, 'plein': 1}
[]
{':': 19, 'Prix': 20, 'du': 12, 'de': 33, '-': 13}
[]
{'ville': 1, 'de': 2, ':': 5, 'pour': 2, 'la': 2}
[]
{'12h': 1, 'et': 3, 'de': 3, 'sur': 3, '\xc3\xa0': 2}
[]
{'\xc3\xa0': 3, 'et': 3, 'de': 6, 'pour': 3, 'les': 5}
[]
{'et': 4, 'de': 6, '\xc3\xa0': 7, 'du': 4, 'la': 7}
[]
{'tir\xc3\xa9': 1, 'port.': 1, 'Feu': 1, 'du': 1, "d'artifice": 1}
[]
{'': 1, 'et': 5, 'au': 2, '18': 2, 'la': 3}
[]
{'et': 2, 'des': 2, 'de': 7, 'le': 2, 'la': 2}
[]
{'60': 1, 'Loto': 1, 'des': 1, '56': 2, '06': 2}
[]

So it seems like the first part is working great but it can't stock data in matches
I've updated the MC list with very common words as 'le','la','et' etc... but i still get matches=[]

**Larz60+** · Jul-05-2018, 10:23 AM

you can reverse the order of the dictionary
example:

data = [('Genres','Action'), ('Adventure','Action'), ('Adventure', 'Animation'), ('Adventure', 'Animation'), ('Comedy', 'Drama'),  ('Action','Adventure'), ('Fantasy', 'Science Fiction')]

mc_dict = dict((y, x) for x, y in data)

print(mc_dict)

gives you:

Output:
{'Action': 'Adventure', 'Animation': 'Adventure', 'Drama': 'Comedy', 'Adventure': 'Action', 'Science Fiction': 'Fantasy'}

PierreSoulier · (This post was last modified: Jul-05-2018, 01:27 PM by PierreSoulier.)

Ok thank you for your help, i've with this working program:

 db=MySQLdb.connect(host='localhost',user='root',passwd='eisti0001',db='Citela')
cur=db.cursor()
QUERY="""SELECT details from Tri"""
cur.execute(QUERY)
#result est une liste contenant les détails
result_details=cur.fetchall()
QUERY="""SELECT id from Tri"""
cur.execute(QUERY)
#result est une liste contenant les id
result_id=cur.fetchall()
QUERY="""SELECT nom from Tri"""
cur.execute(QUERY)
#result est une liste contenant les titre
result_nom=cur.fetchall()
#On teste les mots clés un par un
MC = ['route','autoroute','éclairage','public','parking','couvert','stationnement','rue','habitat',
          'logement','transport','bus','metro','tramway','train','avion','piste','cyclable','aire','jeu','enfant','enfants'
          'crêche','parentale','haltes','garderies','jardin','d\éveil','assistante maternelle',
          'école maternelle','école élémentaire','collège','lycée professionel','lycée général','lycée technologi',
          'lycée polyvalent','université','fac','campus','étudiant','centre','formation','sport','scolaire',
          'loisir','école','centre','aéré','rythme','péri','extra','MDPH','CDAPH','IME','IEM','ITEP','SESSAD',
          'FAM','SAVS','SAMSAH','ESAT','maladie','mentale','soin','infirmier','CLIC','aidant','EHPAD','alzheimer',
          'pôle','accueil','familial','domicile','agé','CDD','senior','loisir','CHRS','hebergement','urgence','asile',
          'femme','batue','addictologie','desintoxication','insertion','remobilisation','aide','alimentaire','colis',
          'urgence','RSA','ASE','aide','socia','enfan','protection','enfance','CAF','allocation','famill','aide',
          'devoir','soutien','MJC','maison','jeune','majeur','accès','soin','APL','résidence','universitaire',
          'CLE','hebergement','scolaire','mission','locale','Crous','financ','bourse','pharmacie','médicament',
          'santé','hôpita','hospital','clinique','ambulance','formation','sanitaire','toilette','douche',
          'patrimoine','château','exposition','collection','musée','art','contemporain','cinéma','projection','film',
          'télévision','reportage','salsa','tango','danse','hip-hop','bourrée','farandole','rock-and-roll','twist',
          'valse','bal','musette','country','flamenco','concert','hestiv\'oc','musique','hestiv\'oc',
          'opéra','théatre','cirque','artifice','illumination','pyrotechnie','pyromusical','carnaval','urbain','défilé',
          'médiathèque','bibliothèque','livre','lecture','ludothèque','baseball','basket-ball','football',
          'football américain','handball','hockey','pétanque','boulodrome','rugby','volley-ball','athlétisme',
          'ball-trap','poing','billard','snooker','bowling','quille','echec','equitation','cheval','equestre',
          'ecrime','fleuret','epée','gymnastique','musculation','haltérophilie','culturisme','javelot','pentathlon',
          'roller','skate','roulette','pistolet','carabine','arc','triathlo','twirling','majorette','aikido','aikibudo',
          'budo','boxe','ring','savate','full contact','judo','kendo','jujitsu','karaté','martiaux','kick','boxing',
          'lutte','Muaythai','teakwondo','tai','chuan','gong','badminton','tambourin','paume','padel','pelote',
          'pala','squash','tennis','pong','ping','aviron','canöe','kayak','char','voile','plongée','apnée','sous',
          'marin','joute','sauvetage','nautique','motonautique','jetski','natation','nage','piscine','pêche','ski',
          'nautique','surf','voile','aéromodélisme','aéronautique','aérostation','camarguaise','landaise','toromachie',
          'cyclisme','cyclotourisme','giraviation','golf','escalade','motocyclisme','parachutisme','ULM','pédestre',
          'sauvetage','secourisme','ski','spéléologie','automobile','glace','patinage','traineau','pulka','parapente',
          'planeur','état','civil','fiscal','impôt','taxe','administation','numérique','démocratie','participative',
          'participatif','assainissement','ATMO','déchêt','collecte','traitement','déchetterie','recyclage',
          'gendarmerie','commisariat','police','municipale','sécurité','ASVP','SDIS','pompiers','parc','potager',
          'lac','étang','rivière','marais','site','naturel','biodiversité','ENS','forêt','bois','point de vue',
          'table d\'orientation','chemin','pédestre','propre','netto','énergie','chauffage','électricité',
          'écologie','renouvable','durable','environnement','économie','innovation','commerc','artisan',
          'fabication','tourisme','vin','gastronomie','emploi','chômage','job','recrute','entreprise','industrie',
          'entreupreun','création','agricult','agricole','coopération','international']
          
for i in range(len(result_details)):
    # If not already a list:
    text = (''.join(result_details[i])) + (''.join(result_nom[i]))
    delimiter = ' ' 
    #text=''.join(text) #from tuple to string
    text=text.lower() #avoid case
    data_list = text.strip().split()
    most_common_list = Counter(data_list).most_common()
    #print(most_common_list)
     
    mc_dict = dict((x, y) for x, y in most_common_list)
    #print('\n\n\n~~~~~~~~Mots et occurence des mots:')
    #print(mc_dict)
    matches = []
     
    for idx, word in enumerate(MC):
        try:
            count = mc_dict[word]
            matches.append((word, count))
        except:
            continue
    print('\n\n\nIdentifiant et mots clés trouvés')
    if not matches:
        print(result_id[i],':Aucun mot trouvé')
    else:     
        print(result_id[i],matches)

And here is what the result looks like:

Identifiant et mots clés trouvés
(('https://data.datatourisme.gouv.fr/20/d7b292c0-931d-301c-8c7b-4eae0b1bc8e1',), [('rue', 1), ('jardin', 3)])



Identifiant et mots clés trouvés
(('https://data.datatourisme.gouv.fr/20/d7c8ed18-4746-3a01-be47-17ce9534eb86',), [('musique', 2), ('sous', 1)])



Identifiant et mots clés trouvés
(('https://data.datatourisme.gouv.fr/20/d7e7bc31-1fce-3e7c-a7db-82bb3681b315',), ':Aucun mot trouv\xc3\xa9')



Identifiant et mots clés trouvés
(('https://data.datatourisme.gouv.fr/20/d8157157-94e4-3766-8ec0-caac8a0e4623',), [('sport', 1)])

With first the id then the word and its occurence
One last problem i have is that it display non encoded utf-8 character. I've tried many things but nothing seemed to work

**Larz60+** · Jul-05-2018, 03:38 PM

which version of python are you using?
Can you supply an example?

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Mysql and mysql.connector error	lostintime	2	683	Oct-03-2023, 10:25 PM Last Post: lostintime
	a.sort() == b.sort() all the time	3lnyn0	1	1,321	Apr-19-2022, 06:50 PM Last Post: Gribouillis
	Mysql error message: Lost connection to MySQL server during query	tomtom	6	16,089	Feb-09-2022, 09:55 AM Last Post: ibreeden
	mySQL Database error not resolving.	cybertooth	2	3,220	Aug-30-2021, 05:45 PM Last Post: ibreeden
	SaltStack: MySQL returner save less data into Database table columns	xtc14	2	2,168	Jul-02-2021, 02:19 PM Last Post: xtc14
	Problem updating value in MySQL database	dangermaus33	1	1,634	Nov-24-2020, 08:32 PM Last Post: dangermaus33
	chatterbot utf-8 errors with mysql database	isolatedastronaut	0	1,584	Nov-08-2020, 06:54 AM Last Post: isolatedastronaut
	MYSQL how to assign a table for each user in my username table database?	YoshikageKira	1	2,813	Dec-26-2019, 05:57 AM Last Post: buran
	How to return value from a stored procedure of a MySQL database?	UtiliseIT	0	2,335	Jul-22-2019, 10:14 AM Last Post: UtiliseIT
	Need help saving data into MySQL database	reezalaq	0	2,401	Jun-03-2018, 07:50 PM Last Post: reezalaq

Trying to sort a MYSQL Database

User Panel Messages

Announcements