Python Forum
Multiple csv files
#1
I have a project where I have to collate info from a number of different CSV files into one file, as well as run some calculations on the data within them. The CSV files contain the results of various trials at a task for different individuals. Some of the data sits under column headings; other things, such as personal info like name, age, gender and IP address, do not. I am struggling with where to start: should I combine all of the CSV files into one file first and interrogate it from there, or extract the necessary data from each file and put it into a new file as I go? If the latter, is there a simple function to loop through multiple files in a folder? If the former, what's the best way to extract data which isn't 'multiple choice', such as name (which isn't assigned an identifier in the file)? I am not allowed to use any modules like pandas etc. and should only use what was covered in class: variables, choices, data structures and loops. Any help greatly appreciated!!
#2
I take it you're allowed to use built-in modules, like os and csv? For a simple folder full of files, os.listdir will give you a list of the file names, and you can just load those ending in .csv. For a more complicated file structure, with sub-folders to search through, I would look at os.walk. You can read csv files without the csv module, as long as they are basic files with no quoted fields that contain commas. But I would recommend using the csv module if you can.
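
As a rough sketch of the os.listdir suggestion, assuming the os module is allowed: the function name list_csv_files is made up for illustration, and the folder is whatever directory holds the CSV files.

```python
import os

def list_csv_files(folder):
    """Return the sorted paths of the .csv files directly inside folder."""
    paths = []
    for name in os.listdir(folder):
        # os.listdir returns bare names, so join them back onto the folder.
        if name.lower().endswith(".csv"):
            paths.append(os.path.join(folder, name))
    return sorted(paths)
```

Calling list_csv_files("results") would then give a list of paths to loop over. It does not look inside sub-folders; that is the case where os.walk would come in.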

How you proceed is going to depend on how the data in the files is structured. Does each file hold data on a different subject or task? Is there data on the same subject in different files, such that rows would need to be combined rather than just appended?

If it's a simple append of rows from one file to another, possibly with different column selections, I would just read the files one at a time, appending to the master file as you read them. If individual rows need information from more than one file, I would read them all in, do the data manipulation in Python, and then write out the master file.
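
A minimal sketch of the simple append case, assuming the csv module is allowed. The name append_rows and the idea of selecting columns by index are assumptions for illustration, not something given in the thread.

```python
import csv

def append_rows(source_path, master_path, columns):
    """Append the chosen column indexes of each row in source_path to master_path."""
    with open(source_path, newline="") as src, \
         open(master_path, "a", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        for row in reader:
            if not row:
                continue  # skip blank lines
            # Use an empty string when a row is shorter than expected.
            selected = [row[i] if i < len(row) else "" for i in columns]
            writer.writerow(selected)
```

Because the master file is opened in append mode, calling this once per input file builds the master up file by file. The csv reader also copes with quoted fields that contain commas, which a plain split(",") would break apart.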
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
Recommended Tutorials: BBCode, functions, classes, text adventures
#3
No, unfortunately we can't use any of the modules. I think your suggestion about reading one file at a time will make most sense.

At the moment I'm struggling to find a way to loop through the multiple files and take info from a specific cell. I've managed to do this for one of the files ok using the code below but can't work out how to adapt this to loop through all of the files I need:

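andrewfile = open("results/expAAndrew12062013164623.csv", "r")
alllines = andrewfile.readlines()
andrewfile.close()
# Split the first line into its comma-separated cells
all_lines = alllines[0].strip().split(",")
print(all_lines)

newfile = open("combinedfile.csv","a")
# Append the first cell of that line to the combined file
newfile.write("\n"+all_lines[0])
newfile.close()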

A lot of the other data has a restricted number of variants or sits under column headings so should be ok with those I think.
#4
If you don't have access to os, you are at least given a list of the file names, right? Otherwise I don't see how you can do it.

This is a way to read a file in a loop:

with open("results/expAAndrew12062013164623.csv", "r") as andrewfile:
    for line in andrewfile:
        # Drop the trailing newline, then split the line into its cells
        fields = line.strip().split(',')
You could put that in a for loop that cycles through the file names and reads them all, assuming they have the same format.
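
Here is one way that outer loop might look without importing anything, assuming the file names are known in advance and each file keeps the value of interest in the first cell of its first line. The names first_cell and collate_first_cells are made up for illustration.

```python
def first_cell(path):
    """Return the first comma-separated field on the first line of the file."""
    with open(path, "r") as infile:
        first_line = infile.readline()
    return first_line.strip().split(",")[0]

def collate_first_cells(file_names, combined_path):
    """Append the first cell of each listed file to combined_path, one per line."""
    with open(combined_path, "a") as outfile:
        for file_name in file_names:
            outfile.write(first_cell(file_name) + "\n")
```

You would call collate_first_cells with a list of paths, such as the results/expAAndrew12062013164623.csv file from the earlier post plus the other result files, and "combinedfile.csv" as the output. The plain split(",") only works while no field contains a quoted comma.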

Also, please post code in python tags. See the BBCode link in my signature below for instructions on how to do that.
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
Recommended Tutorials: BBCode, functions, classes, text adventures