Python Forum

Full Version: Validating information from .csv file before executemany
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Hi,

I'm currently having an issue with my code. I have an option for a user to choose a .csv file with names and surnames of their students and this will be entered into the database. The data goes into the database fine as long as there are 'Name' and 'Surname' fields in the .csv file. This is an issue as if there is an invalid name in the .csv file (e.g. something with a number in), it will be entered into the database anyway, even if it shouldn't be allowed. I can't figure out how to validate the information before it goes into the database. Here is my code:

file = askopenfilename(parent = root)
if file.endswith('.csv'):
    with open(file, 'r') as names:
        reading = csv.DictReader(names)
        toDB = [(i['Name'], i['Surname']) for i in reading]
accounts = sqlite3.connect("accounts.db")
c = accounts.cursor()
c.executemany('''INSERT INTO Students (FirstName, Surname) VALUES (?, ?)''', toDB)
Can anyone help, please? <3
Well, in general you would have some validation functions that return True is the name is valid, or False if it isn't. Then you would test on those before entering into the database:

file = askopenfilename(parent = root)
if file.endswith('.csv'):
    with open(file, 'r') as names:
        toDB = []
        reading = csv.DictReader(names)
        for line in reading:
            if valid_name(line['Name']) and valid_surname(line['Surname']):
               toDB.append((i['Name'], i['Surname']))
accounts = sqlite3.connect("accounts.db")
c = accounts.cursor()
c.executemany('''INSERT INTO Students (FirstName, Surname) VALUES (?, ?)''', toDB)
What goes into the functions valid_name and valid_surname is going to depend on how you define what a valid name is.
Thanks for the help, but now it's not reading anything from the file. I also needed to fix the issue that the program would stop working if a file was entered with the wrong headings, and my solution worked, but for some reason, now that I have both of the pieces together, csv.DictReader doesn't seem to read anything. It's probably something stupid, but I can't figure out why it won't work anymore. Any more help? Thank you.

    if file.endswith('.csv'):
        with open(file, 'r') as names:
            titles = csv.reader(names)
            titles = list(titles)
            titles = titles[0]

            if titles == ['Name', 'Surname']:
                toDB = []
                reading = csv.DictReader(names)
                for row in reading:
                    if (row['Name']).isalpha() and (row['Surname']).isalpha():
                        toDB.append((i['Name'], i['Surname']))
            else:
                root = tk.Tk()
                error = tk.Label(root, text = "Error: invalid titles. Must be 'Name' and 'Surname'.")
                error.pack()
                root.after(5000, lambda: root.destroy())
                root.mainloop()
                root = tk.Tk()
                root.geometry("0x0")
                dbChoice(root)
When you say it's not reading anything, what exactly do you mean? Is it going to the error in the else clause no matter what, or is it just not outputting anything? Also, where did you move the sql code to when you added the if/else? And please post a few lines of your name file we can test with.
Use functions to separate code.
There should be one function to read the data.
Another function can check the suffix.
A third function can call this and handle all bubbling up exceptions and print them.
Using pathlib gives you more abstraction.
The csv_reader is an Iterator. You can use next(csv_reader) to get the next values.
The first row is the header.

Then you can still use the same Iterator in a for-loop.
The Iterator knows it's own position and when it has to stop.

Example where I catch all Exceptions and show it in a messagebox:
import tkinter
import tkinter.messagebox
from pathlib import Path
import csv


def csv_reader(file: Path) -> list:
    """
    This function expects a path object.
    Return: a list with [[name, surname], ...]
    """
    result = []
    with file.open() as csv_fd:
        reader = csv.reader(csv_fd)
        titles = next(reader)
        if titles != ['Name', 'Surname']:
            raise ValueError(f'Wrong titles: {titles}')
        for row in reader:
            if not row: # row is empty, just continue
                print('Empty row')
                continue
            if len(row) != 2:
                raise ValueError(f'Wrong CSV-Format. Row has too many columns. 2 columns are needed: {row}')
            # if row has more or less then 2 elements, the following code is not
            # executed
            # you can decide if you raise an error or just continue,
            # if wrong number of elements is in one row
            name, surname = row
            if name.isalpha() and surname.isalpha():
                # you can append row or [name, surname] or (name, surname)
                print('append', row)
                result.append(row)
    return result


def collect_data(file):
    """
    This function checks if the extension is right.
    Then it returns the data for the databes
    """
    file = Path(file)
    if file.suffix.endswith('.csv'):
        data = csv_reader(file)
    else:
        raise ValueError('Only .csv file extension is allowed')
    return data


def caller(root):
    """
    A function which calls collect data and
    catches all Exceptions which bullbe up.
    """
    try:
        data = collect_data('test.csv')
    except Exception as e:
        print('Error:', e)
        tkinter.messagebox.showerror('Error', e)
    else:
        # kill all children from bottom frame
        [c.destroy() for c in tuple(root.children.values())]
        # during destroying the children (labels), the size of the dict is changing
        # this is the cause, why tuple is used
        for row, (name, surname) in enumerate(data):
            tkinter.Label(root, text=name).grid(row=row, column=0)
            tkinter.Label(root, text=surname).grid(row=row, column=1)


# the caller tries to get the data
# if an error happens, the general Exception block catches this error.
# There you can add error messages to your GUI



root = tkinter.Tk()
t_frame = tkinter.Frame(root)
b_frame = tkinter.Frame(root)
t_frame.pack()
b_frame.pack()
tkinter.Label(t_frame, text='Name').grid(row=1, column=0)
tkinter.Label(t_frame, text='Surname').grid(row=1, column=1)
tkinter.Button(t_frame, text='Exit', command=root.destroy).grid(row=0, column=0)
tkinter.Button(t_frame, text='Load', command=lambda: caller(b_frame)).grid(row=0, column=1)

root.mainloop()
PS: Pay attention about the column delimiter. The standard is a comma, but in other regions a semicolon is use. Just use the keyword-argument delimiter, to define the right delimiter: csv.reader(file_object, delimiter=';')
(Apr-15-2019, 12:51 AM)ichabod801 Wrote: [ -> ]When you say it's not reading anything, what exactly do you mean? Is it going to the error in the else clause no matter what, or is it just not outputting anything? Also, where did you move the sql code to when you added the if/else? And please post a few lines of your name file we can test with.

Sorry - I didn't copy the entire code. It's not going to the error clause, it's just not putting anything into the database, and if I ask it to print what's in the file, it doesn't print anything. Here's the code:

file = askopenfilename(parent = root)
if file.endswith('.csv'):
    with open(file, 'r') as names:
        titles = csv.reader(names)
        titles = list(titles)
        titles = titles[0]

        if titles == ['Name', 'Surname']:
            toDB = []
            reading = csv.DictReader(names)
            for row in reading:
                if (row['Name']).isalpha() and (row['Surname']).isalpha():
                    toDB.append((i['Name'], i['Surname']))
        else:
            root = tk.Tk()
            error = tk.Label(root, text = "Error: invalid titles. Must be 'Name' and 'Surname'.")
            error.pack()
            root.after(5000, lambda: root.destroy())
            root.mainloop()
            root = tk.Tk()
            root.geometry("0x0")
            dbChoice(root)

else:
    root.destroy()
    root = tk.Tk()
    error = tk.Label(root, text = "Invalid file: can only be a .csv file")
    error.pack()
    root.after(5000, lambda: root.destroy())
    root.mainloop()
    root = tk.Tk()
    root.geometry("0x0")
    dbChoice(root)

accounts = sqlite3.connect("accounts.db")
c = accounts.cursor()
c.executemany('''INSERT INTO Students (FirstName, Surname) VALUES (?, ?)''', toDB)
I think the problem is that you are trying to go through names twice. You can only iterate through a file object once. On line 5, when you do titles = list(titles), you pull the entire file into titles. Then when you try to do for row in reading: on line 11, there's nothing left in names (both titles and reading are csv readers based off names).

Since you already have everything in titles after line 5, and you know the titles are ['Name', 'Surname'], you can just iterate over titles instead of creating a new reader. Of course, on line 6 you delete everything you read to just save the actual titles, so you need to save the rest of the data. You also need to change from iterating over a dict to iterating over a list:

        data = csv.reader(names)
        data = list(titles)
        titles = data[0]
 
        if titles == ['Name', 'Surname']:
            toDB = []
            for row in data[1:]:                                # [1:] skips over the title line
                if (row[0]).isalpha() and (row[1]).isalpha():   # Use list indexes, not dict keys
                    toDB.append((row[0], row[1]))               # You still had i here, it should have been row.
Thank you - it works now!