Python Forum
Validating information from .csv file before executemany
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Validating information from .csv file before executemany
#1
Hi,

I'm currently having an issue with my code. I have an option for a user to choose a .csv file with names and surnames of their students and this will be entered into the database. The data goes into the database fine as long as there are 'Name' and 'Surname' fields in the .csv file. This is an issue as if there is an invalid name in the .csv file (e.g. something with a number in), it will be entered into the database anyway, even if it shouldn't be allowed. I can't figure out how to validate the information before it goes into the database. Here is my code:

file = askopenfilename(parent = root)
if file.endswith('.csv'):
    with open(file, 'r') as names:
        reading = csv.DictReader(names)
        toDB = [(i['Name'], i['Surname']) for i in reading]
accounts = sqlite3.connect("accounts.db")
c = accounts.cursor()
c.executemany('''INSERT INTO Students (FirstName, Surname) VALUES (?, ?)''', toDB)
Can anyone help, please? <3
Reply
#2
Well, in general you would have some validation functions that return True is the name is valid, or False if it isn't. Then you would test on those before entering into the database:

file = askopenfilename(parent = root)
if file.endswith('.csv'):
    with open(file, 'r') as names:
        toDB = []
        reading = csv.DictReader(names)
        for line in reading:
            if valid_name(line['Name']) and valid_surname(line['Surname']):
               toDB.append((i['Name'], i['Surname']))
accounts = sqlite3.connect("accounts.db")
c = accounts.cursor()
c.executemany('''INSERT INTO Students (FirstName, Surname) VALUES (?, ?)''', toDB)
What goes into the functions valid_name and valid_surname is going to depend on how you define what a valid name is.
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
Recommended Tutorials: BBCode, functions, classes, text adventures
Reply
#3
Thanks for the help, but now it's not reading anything from the file. I also needed to fix the issue that the program would stop working if a file was entered with the wrong headings, and my solution worked, but for some reason, now that I have both of the pieces together, csv.DictReader doesn't seem to read anything. It's probably something stupid, but I can't figure out why it won't work anymore. Any more help? Thank you.

    if file.endswith('.csv'):
        with open(file, 'r') as names:
            titles = csv.reader(names)
            titles = list(titles)
            titles = titles[0]

            if titles == ['Name', 'Surname']:
                toDB = []
                reading = csv.DictReader(names)
                for row in reading:
                    if (row['Name']).isalpha() and (row['Surname']).isalpha():
                        toDB.append((i['Name'], i['Surname']))
            else:
                root = tk.Tk()
                error = tk.Label(root, text = "Error: invalid titles. Must be 'Name' and 'Surname'.")
                error.pack()
                root.after(5000, lambda: root.destroy())
                root.mainloop()
                root = tk.Tk()
                root.geometry("0x0")
                dbChoice(root)
Reply
#4
When you say it's not reading anything, what exactly do you mean? Is it going to the error in the else clause no matter what, or is it just not outputting anything? Also, where did you move the sql code to when you added the if/else? And please post a few lines of your name file we can test with.
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
Recommended Tutorials: BBCode, functions, classes, text adventures
Reply
#5
Use functions to separate code.
There should be one function to read the data.
Another function can check the suffix.
A third function can call this and handle all bubbling up exceptions and print them.
Using pathlib gives you more abstraction.
The csv_reader is an Iterator. You can use next(csv_reader) to get the next values.
The first row is the header.

Then you can still use the same Iterator in a for-loop.
The Iterator knows it's own position and when it has to stop.

Example where I catch all Exceptions and show it in a messagebox:
import tkinter
import tkinter.messagebox
from pathlib import Path
import csv


def csv_reader(file: Path) -> list:
    """
    This function expects a path object.
    Return: a list with [[name, surname], ...]
    """
    result = []
    with file.open() as csv_fd:
        reader = csv.reader(csv_fd)
        titles = next(reader)
        if titles != ['Name', 'Surname']:
            raise ValueError(f'Wrong titles: {titles}')
        for row in reader:
            if not row: # row is empty, just continue
                print('Empty row')
                continue
            if len(row) != 2:
                raise ValueError(f'Wrong CSV-Format. Row has too many columns. 2 columns are needed: {row}')
            # if row has more or less then 2 elements, the following code is not
            # executed
            # you can decide if you raise an error or just continue,
            # if wrong number of elements is in one row
            name, surname = row
            if name.isalpha() and surname.isalpha():
                # you can append row or [name, surname] or (name, surname)
                print('append', row)
                result.append(row)
    return result


def collect_data(file):
    """
    This function checks if the extension is right.
    Then it returns the data for the databes
    """
    file = Path(file)
    if file.suffix.endswith('.csv'):
        data = csv_reader(file)
    else:
        raise ValueError('Only .csv file extension is allowed')
    return data


def caller(root):
    """
    A function which calls collect data and
    catches all Exceptions which bullbe up.
    """
    try:
        data = collect_data('test.csv')
    except Exception as e:
        print('Error:', e)
        tkinter.messagebox.showerror('Error', e)
    else:
        # kill all children from bottom frame
        [c.destroy() for c in tuple(root.children.values())]
        # during destroying the children (labels), the size of the dict is changing
        # this is the cause, why tuple is used
        for row, (name, surname) in enumerate(data):
            tkinter.Label(root, text=name).grid(row=row, column=0)
            tkinter.Label(root, text=surname).grid(row=row, column=1)


# the caller tries to get the data
# if an error happens, the general Exception block catches this error.
# There you can add error messages to your GUI



root = tkinter.Tk()
t_frame = tkinter.Frame(root)
b_frame = tkinter.Frame(root)
t_frame.pack()
b_frame.pack()
tkinter.Label(t_frame, text='Name').grid(row=1, column=0)
tkinter.Label(t_frame, text='Surname').grid(row=1, column=1)
tkinter.Button(t_frame, text='Exit', command=root.destroy).grid(row=0, column=0)
tkinter.Button(t_frame, text='Load', command=lambda: caller(b_frame)).grid(row=0, column=1)

root.mainloop()
PS: Pay attention about the column delimiter. The standard is a comma, but in other regions a semicolon is use. Just use the keyword-argument delimiter, to define the right delimiter: csv.reader(file_object, delimiter=';')
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
Reply
#6
(Apr-15-2019, 12:51 AM)ichabod801 Wrote: When you say it's not reading anything, what exactly do you mean? Is it going to the error in the else clause no matter what, or is it just not outputting anything? Also, where did you move the sql code to when you added the if/else? And please post a few lines of your name file we can test with.

Sorry - I didn't copy the entire code. It's not going to the error clause, it's just not putting anything into the database, and if I ask it to print what's in the file, it doesn't print anything. Here's the code:

file = askopenfilename(parent = root)
if file.endswith('.csv'):
    with open(file, 'r') as names:
        titles = csv.reader(names)
        titles = list(titles)
        titles = titles[0]

        if titles == ['Name', 'Surname']:
            toDB = []
            reading = csv.DictReader(names)
            for row in reading:
                if (row['Name']).isalpha() and (row['Surname']).isalpha():
                    toDB.append((i['Name'], i['Surname']))
        else:
            root = tk.Tk()
            error = tk.Label(root, text = "Error: invalid titles. Must be 'Name' and 'Surname'.")
            error.pack()
            root.after(5000, lambda: root.destroy())
            root.mainloop()
            root = tk.Tk()
            root.geometry("0x0")
            dbChoice(root)

else:
    root.destroy()
    root = tk.Tk()
    error = tk.Label(root, text = "Invalid file: can only be a .csv file")
    error.pack()
    root.after(5000, lambda: root.destroy())
    root.mainloop()
    root = tk.Tk()
    root.geometry("0x0")
    dbChoice(root)

accounts = sqlite3.connect("accounts.db")
c = accounts.cursor()
c.executemany('''INSERT INTO Students (FirstName, Surname) VALUES (?, ?)''', toDB)
Reply
#7
I think the problem is that you are trying to go through names twice. You can only iterate through a file object once. On line 5, when you do titles = list(titles), you pull the entire file into titles. Then when you try to do for row in reading: on line 11, there's nothing left in names (both titles and reading are csv readers based off names).

Since you already have everything in titles after line 5, and you know the titles are ['Name', 'Surname'], you can just iterate over titles instead of creating a new reader. Of course, on line 6 you delete everything you read to just save the actual titles, so you need to save the rest of the data. You also need to change from iterating over a dict to iterating over a list:

        data = csv.reader(names)
        data = list(titles)
        titles = data[0]
 
        if titles == ['Name', 'Surname']:
            toDB = []
            for row in data[1:]:                                # [1:] skips over the title line
                if (row[0]).isalpha() and (row[1]).isalpha():   # Use list indexes, not dict keys
                    toDB.append((row[0], row[1]))               # You still had i here, it should have been row.
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
Recommended Tutorials: BBCode, functions, classes, text adventures
Reply
#8
Thank you - it works now!
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  [split] formula for validating monetary values? kakos_k9 1 727 Dec-17-2022, 09:28 PM
Last Post: woooee
  Command error - cursor.executemany(comandoSQL,valoresInserir) TecInfo 2 1,303 Nov-18-2022, 01:57 PM
Last Post: TecInfo
  getting information from a text file Nickd12 8 3,157 Nov-17-2020, 01:29 AM
Last Post: bowlofred
  MERGE SQL in oracle executemany kaladin 0 3,046 Oct-12-2020, 11:48 AM
Last Post: kaladin
  Validating user input WJSwan 2 2,081 Jul-06-2020, 07:21 AM
Last Post: menator01
  Text file information retreval cel 4 2,465 Jun-04-2020, 02:21 AM
Last Post: cel
  Validating the functionality of the application rpalakodety 1 1,746 Dec-30-2019, 07:58 PM
Last Post: ndc85430
  Extracting information from a file lokhtar 6 2,851 Dec-09-2019, 09:44 PM
Last Post: snippsat
  Errors to get information of multiple files into a single file csv Clnprof 3 2,553 Aug-30-2019, 04:59 PM
Last Post: ThomasL
  Using executemany to import the data Sandy7771989 1 2,642 Jun-11-2019, 07:45 AM
Last Post: Sandy7771989

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020