Python Forum
Parsing CSVs...
#1
Hi All,

Hope you are all well!

I need a little nudge...

I have a CSV, within which a couple of lines look like this...

CompanyName LTD, "[email protected],[email protected]", "12234,56678"

How would I select the second email from the second column? Or print each of the numbers in the third column individually?

import csv


dataSetFile = "DemoDataSetForReports.csv"

with open(dataSetFile, 'r') as dataSet:
    csvReader = csv.reader(dataSet)

    for row in csvReader:

        print(row[2])
The above code prints: 12234,56678

I assume csv is the correct module for doing this?

Any help would be appreciated!
#2
Yes, the csv module is the right tool for the task at hand.

Some suggestions:
  • open the file with newline='' (from documentation: "If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use \r\n linendings on write an extra \r will be added. It should always be safe to specify newline='', since the csv module does its own (universal) newline handling.")
  • if the file has a space after each comma, pass skipinitialspace=True to csv.reader
  • remember that Python uses 0-based indexing

You need to take the second item in the row (index 1), split it at the comma, and take the second item of the result (again index 1).

So you can write:

import csv

with open("my_magic.csv", "r", newline="") as f:
    data = csv.reader(f, skipinitialspace=True)
    for row in data:
        print(row[1].split(',')[1])

# prints [email protected]
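
For the second half of the question (printing each number from the third column on its own), the same split should work on row[2]. A minimal sketch, assuming the data looks like the sample line in the question. The addresses are made-up placeholders, io.StringIO stands in for the real file so the snippet runs on its own, and split_cell is just a helper name picked for illustration:

```python
import csv
import io

# Stand-in for the real file. The addresses are placeholders (assumption),
# since the real ones are not shown in the thread.
sample = 'CompanyName LTD, "first@example.com,second@example.com", "12234,56678"\n'


def split_cell(cell):
    """Split a comma-separated cell into a list of stripped items."""
    return [item.strip() for item in cell.split(",")]


rows = list(csv.reader(io.StringIO(sample), skipinitialspace=True))

for row in rows:
    # Second email from the second column (index 1 in both places).
    print(split_cell(row[1])[1])
    # Each number from the third column, one per line.
    for number in split_cell(row[2]):
        print(number)
```

With a real file, replace io.StringIO(sample) with the file object from open(..., newline="").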
#3
You sir, are a genius!!

Thanks so much!
#4
skipinitialspace=True is required if you have spaces after your separator character
Output:
           separator
               v
CompanyName LTD, "[email protected],[email protected]"
                ^
            whitespace
If you don't skip the initial space(s), the csv reader misses the opening quote, which throws off the parsing of the rest of the line. Instead of three columns like this:
Output:
CompanyName LTD
[email protected],[email protected]
12234,56678
you end up with five:
Output:
CompanyName LTD
"[email protected]
[email protected]"
"12234
56678"
It is a subtle, easy-to-miss difference that has a big effect.
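
To see the difference for yourself, here is a small sketch that parses the same line both ways. The addresses are placeholders (an assumption, since the real ones are not shown), and io.StringIO is used in place of a file:

```python
import csv
import io

# Same shape as the line in the question; the addresses are placeholders.
line = 'CompanyName LTD, "a@example.com,b@example.com", "12234,56678"\n'

# The space after each comma is skipped, so the quotes are recognised.
with_skip = next(csv.reader(io.StringIO(line), skipinitialspace=True))

# The space becomes the first character of the field, so the quote that
# follows is treated as ordinary data and the inner comma splits the field.
without_skip = next(csv.reader(io.StringIO(line)))

print(len(with_skip), with_skip)
print(len(without_skip), without_skip)
```

The first reader should give three fields and the second five, with the stray quotes and leading spaces left inside the values.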
The newline='' in:
with open("my_magic.csv", "r", newline="") as f:
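does not remove newline characters from the lines that are read. It turns off Python's newline translation, so the csv reader sees the line endings exactly as they are stored in the file. It makes no difference for this particular file, but the documentation recommends it because it matters when a quoted field contains an embedded line break. When writing a csv file, newline='' prevents the extra \r that is otherwise added on platforms that use \r\n line endings, which shows up as a blank line after each row. As far as I know that only happens on Windows.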
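
A quick way to check what newline='' changes on the reading side is to read a quoted field that contains a \r\n. This is only a sketch: the temporary file and its one-line content are made up for the demonstration.

```python
import csv
import os
import tempfile

# Write raw bytes so the \r\n inside the quoted field is stored exactly.
raw = b'note,"line1\r\nline2"\r\n'
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # newline="": no translation, the embedded \r\n reaches the csv reader.
    with open(path, "r", newline="") as f:
        untranslated = next(csv.reader(f))
    # Default mode: \r\n is translated to \n before the csv reader sees it.
    with open(path, "r") as f:
        translated = next(csv.reader(f))
finally:
    os.remove(path)

print(repr(untranslated[1]))
print(repr(translated[1]))
```

Both reads return two fields, but only the newline="" version keeps the embedded line break as it was stored in the file.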

I would read the file like this:
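import csv

with open("test.csv", "r") as file:
    data = csv.reader(file, skipinitialspace=True)
    # Unpack the three columns of each row by name
    for company, emails, numbers in data:
        # The second and third columns each hold a comma-separated list
        emails = emails.split(",")
        numbers = numbers.split(",")
        print(company, emails, numbers)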