Python Forum

Data manipulation code running but not functioning correctly
Hello! I'm quite new to Python, but I'm learning to use it to manipulate some survey data for my PhD project. The data has been exported as one big .csv file, with a column for every question and a row for each participant. Every question is number-coded as follows: 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.1, 7.2, 7.3...17.18, 17.19, 17.20.

I have tried to write some code that extracts all the ".1"s (i.e. 7.1, 8.1, 9.1 etc.) from every row and writes them to a separate .csv file, then does the same for all the ".2"s and so on. In every file, each row should also start with that participant's answers to the first six questions.

The code does run (so there are no syntax errors), and it does write all the individual files to the correct locations, but these files are all empty when opened. I'm not sure why the code isn't populating the files with any data, and I'd really appreciate some help! It's probably not the neatest way to do what I want, but it should be easy enough to follow my logic, and I'd be really grateful if anyone could point out where I'm going wrong. The code is posted below.

fn = 'File Location'
f = open(fn, 'r')
data = f.read()
outstr1 = ''
outstr2 = ''
outstr3 = ''
outstr4 = ''
outstr5 = ''
outstr6 = ''
outstr7 = ''
outstr8 = ''
outstr9 = ''
outstr10 = ''
outstr11 = ''
outstr12 = ''
outstr13 = ''
outstr14 = ''
outstr15 = ''
outstr16 = ''
outstr17 = ''
outstr18 = ''
outstr19 = ''
outstr20 = ''
data = data.split('\n')
headers = data[0]
data = data[1:]
for h in headers:
    h.split(',')

for d in data:
    row = d.split(',')
    row_dict = dict(zip(headers, row))
    for h in headers:
        if '1.' in h:
            outstr1 += row_dict[h] + ','
            outstr2 += row_dict[h] + ','
            outstr3 += row_dict[h] + ','
            outstr4 += row_dict[h] + ','
            outstr5 += row_dict[h] + ','
            outstr6 += row_dict[h] + ','
            outstr7 += row_dict[h] + ','
            outstr8 += row_dict[h] + ','
            outstr9 += row_dict[h] + ','
            outstr10 += row_dict[h] + ','
            outstr11 += row_dict[h] + ','
            outstr12 += row_dict[h] + ','
            outstr13 += row_dict[h] + ','
            outstr14 += row_dict[h] + ','
            outstr15 += row_dict[h] + ','
            outstr16 += row_dict[h] + ','
            outstr17 += row_dict[h] + ','
            outstr18 += row_dict[h] + ','
            outstr19 += row_dict[h] + ','
            outstr20 += row_dict[h] + ','
        elif '2.' in h:
            outstr1 += row_dict[h] + ','
            outstr2 += row_dict[h] + ','
            outstr3 += row_dict[h] + ','
            outstr4 += row_dict[h] + ','
            outstr5 += row_dict[h] + ','
            outstr6 += row_dict[h] + ','
            outstr7 += row_dict[h] + ','
            outstr8 += row_dict[h] + ','
            outstr9 += row_dict[h] + ','
            outstr10 += row_dict[h] + ','
            outstr11 += row_dict[h] + ','
            outstr12 += row_dict[h] + ','
            outstr13 += row_dict[h] + ','
            outstr14 += row_dict[h] + ','
            outstr15 += row_dict[h] + ','
            outstr16 += row_dict[h] + ','
            outstr17 += row_dict[h] + ','
            outstr18 += row_dict[h] + ','
            outstr19 += row_dict[h] + ','
            outstr20 += row_dict[h] + ','
        elif '3.' in h:
            outstr1 += row_dict[h] + ','
            outstr2 += row_dict[h] + ','
            outstr3 += row_dict[h] + ','
            outstr4 += row_dict[h] + ','
            outstr5 += row_dict[h] + ','
            outstr6 += row_dict[h] + ','
            outstr7 += row_dict[h] + ','
            outstr8 += row_dict[h] + ','
            outstr9 += row_dict[h] + ','
            outstr10 += row_dict[h] + ','
            outstr11 += row_dict[h] + ','
            outstr12 += row_dict[h] + ','
            outstr13 += row_dict[h] + ','
            outstr14 += row_dict[h] + ','
            outstr15 += row_dict[h] + ','
            outstr16 += row_dict[h] + ','
            outstr17 += row_dict[h] + ','
            outstr18 += row_dict[h] + ','
            outstr19 += row_dict[h] + ','
            outstr20 += row_dict[h] + ','
        elif '4.' in h:
            outstr1 += row_dict[h] + ','
            outstr2 += row_dict[h] + ','
            outstr3 += row_dict[h] + ','
            outstr4 += row_dict[h] + ','
            outstr5 += row_dict[h] + ','
            outstr6 += row_dict[h] + ','
            outstr7 += row_dict[h] + ','
            outstr8 += row_dict[h] + ','
            outstr9 += row_dict[h] + ','
            outstr10 += row_dict[h] + ','
            outstr11 += row_dict[h] + ','
            outstr12 += row_dict[h] + ','
            outstr13 += row_dict[h] + ','
            outstr14 += row_dict[h] + ','
            outstr15 += row_dict[h] + ','
            outstr16 += row_dict[h] + ','
            outstr17 += row_dict[h] + ','
            outstr18 += row_dict[h] + ','
            outstr19 += row_dict[h] + ','
            outstr20 += row_dict[h] + ','
        elif '5.' in h:
            outstr1 += row_dict[h] + ','
            outstr2 += row_dict[h] + ','
            outstr3 += row_dict[h] + ','
            outstr4 += row_dict[h] + ','
            outstr5 += row_dict[h] + ','
            outstr6 += row_dict[h] + ','
            outstr7 += row_dict[h] + ','
            outstr8 += row_dict[h] + ','
            outstr9 += row_dict[h] + ','
            outstr10 += row_dict[h] + ','
            outstr11 += row_dict[h] + ','
            outstr12 += row_dict[h] + ','
            outstr13 += row_dict[h] + ','
            outstr14 += row_dict[h] + ','
            outstr15 += row_dict[h] + ','
            outstr16 += row_dict[h] + ','
            outstr17 += row_dict[h] + ','
            outstr18 += row_dict[h] + ','
            outstr19 += row_dict[h] + ','
            outstr20 += row_dict[h] + ','
        elif '6.' in h:
            outstr1 += row_dict[h] + ','
            outstr2 += row_dict[h] + ','
            outstr3 += row_dict[h] + ','
            outstr4 += row_dict[h] + ','
            outstr5 += row_dict[h] + ','
            outstr6 += row_dict[h] + ','
            outstr7 += row_dict[h] + ','
            outstr8 += row_dict[h] + ','
            outstr9 += row_dict[h] + ','
            outstr10 += row_dict[h] + ','
            outstr11 += row_dict[h] + ','
            outstr12 += row_dict[h] + ','
            outstr13 += row_dict[h] + ','
            outstr14 += row_dict[h] + ','
            outstr15 += row_dict[h] + ','
            outstr16 += row_dict[h] + ','
            outstr17 += row_dict[h] + ','
            outstr18 += row_dict[h] + ','
            outstr19 += row_dict[h] + ','
            outstr20 += row_dict[h] + ','
        elif '.1' in h:
            outstr1 += row_dict[h] + ','
        elif '.2' in h:
            outstr2 += row_dict[h] + ','
        elif '.3' in h:
            outstr3 += row_dict[h] + ','
        elif '.4' in h:
            outstr4 += row_dict[h] + ','
        elif '.5' in h:
            outstr5 += row_dict[h] + ','
        elif '.6' in h:
            outstr6 += row_dict[h] + ','
        elif '.7' in h:
            outstr7 += row_dict[h] + ','
        elif '.8' in h:
            outstr8 += row_dict[h] + ','
        elif '.9' in h:
            outstr9 += row_dict[h] + ','
        elif '.10' in h:
            outstr10 += row_dict[h] + ','
        elif '.11' in h:
            outstr11 += row_dict[h] + ','
        elif '.12' in h:
            outstr12 += row_dict[h] + ','
        elif '.13' in h:
            outstr13 += row_dict[h] + ','
        elif '.14' in h:
            outstr14 += row_dict[h] + ','
        elif '.15' in h:
            outstr15 += row_dict[h] + ','
        elif '.16' in h:
            outstr16 += row_dict[h] + ','
        elif '.17' in h:
            outstr17 += row_dict[h] + ','
        elif '.18' in h:
            outstr18 += row_dict[h] + ','
        elif '.19' in h:
            outstr19 += row_dict[h] + ','
        elif '.20' in h:
            outstr20 += row_dict[h] + ','
    outstr1 += '\n'
    outstr2 += '\n'
    outstr3 += '\n'
    outstr4 += '\n'
    outstr5 += '\n'
    outstr6 += '\n'
    outstr7 += '\n'
    outstr8 += '\n'
    outstr9 += '\n'
    outstr10 += '\n'
    outstr11 += '\n'
    outstr12 += '\n'
    outstr13 += '\n'
    outstr14 += '\n'
    outstr15 += '\n'
    outstr16 += '\n'
    outstr17 += '\n'
    outstr18 += '\n'
    outstr19 += '\n'
    outstr20 += '\n'

fn1 = 'New File1 Location'
fn2 = 'New File2 Location'
fn3 = 'New File3 Location'
fn4 = 'New File4 Location'
fn5 = 'New File5 Location'
fn6 = 'New File6 Location'
fn7 = 'New File7 Location'
fn8 = 'New File8 Location'
fn9 = 'New File9 Location'
fn10 = 'New File10 Location'
fn11 = 'New File11 Location'
fn12 = 'New File12 Location'
fn13 = 'New File13 Location'
fn14 = 'New File14 Location'
fn15 = 'New File15 Location'
fn16 = 'New File16 Location'
fn17 = 'New File17 Location'
fn18 = 'New File18 Location'
fn19 = 'New File19 Location'
fn20 = 'New File20 Location'
f1 = open(fn1, 'w')
f1.write(outstr1)
f1.close()
f2 = open(fn2, 'w')
f2.write(outstr2)
f2.close()
f3 = open(fn3, 'w')
f3.write(outstr3)
f3.close()
f4 = open(fn4, 'w')
f4.write(outstr4)
f4.close()
f5 = open(fn5, 'w')
f5.write(outstr5)
f5.close()
f6 = open(fn6, 'w')
f6.write(outstr6)
f6.close()
f7 = open(fn7, 'w')
f7.write(outstr7)
f7.close()
f8 = open(fn8, 'w')
f8.write(outstr8)
f8.close()
f9 = open(fn9, 'w')
f9.write(outstr9)
f9.close()
f10 = open(fn10, 'w')
f10.write(outstr10)
f10.close()
f11 = open(fn11, 'w')
f11.write(outstr11)
f11.close()
f12 = open(fn12, 'w')
f12.write(outstr12)
f12.close()
f13 = open(fn13, 'w')
f13.write(outstr13)
f13.close()
f14 = open(fn14, 'w')
f14.write(outstr14)
f14.close()
f15 = open(fn15, 'w')
f15.write(outstr15)
f15.close()
f16 = open(fn16, 'w')
f16.write(outstr16)
f16.close()
f17 = open(fn17, 'w')
f17.write(outstr17)
f17.close()
f18 = open(fn18, 'w')
f18.write(outstr18)
f18.close()
f19 = open(fn19, 'w')
f19.write(outstr19)
f19.close()
f20 = open(fn20, 'w')
f20.write(outstr20)
f20.close()
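
A note on why the files come out looking empty: as far as I can tell, headers is still one long string when the main loop runs. The h.split(',') call in the first loop returns a new list that is immediately thrown away, so "for h in headers" walks over single characters, and a one-character string can never contain '1.' or '.1'. The only thing that ever gets appended is the '\n' at the end of each row. Here is a minimal sketch of that behaviour, using a made-up two-line sample in place of the real file (the sample and the names chars, matches and fixed_headers are mine, not from the post):

```python
# Made-up sample standing in for the real survey export (an assumption).
sample = "1.0,2.0,7.1,7.2\na,b,c,d"

lines = sample.split('\n')
headers = lines[0]            # still one string: '1.0,2.0,7.1,7.2'

for h in headers:
    h.split(',')              # result is discarded; headers is unchanged

chars = list(headers)         # what `for h in headers` actually iterates over
matches = [h for h in headers if '1.' in h or '.1' in h]

fixed_headers = lines[0].split(',')   # keep the result of split() instead

print(chars[:4])        # ['1', '.', '0', ',']
print(matches)          # []
print(fixed_headers)    # ['1.0', '2.0', '7.1', '7.2']
```

Even with the headers split properly, the substring tests would still misfire: '1.' in h is also true for a header like 11.3, and '.1' in h is true for 7.10, so an exact check on the part after the dot is safer.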
Your code is really unclear and hard to follow. Use more descriptive variable names, and use comments to clarify what you are doing and why you are doing it.

Based on your description, I think you want something more like this:

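# Assumes headers is a list of column names (e.g. headers = headers.split(','))
# and data is the list of remaining lines, as in your code.
first_six = []
point_ones = []
point_twos = []
...
for row in data:
    fields = row.split(',')
    # the answers to the first six questions go at the start of every output row
    first_six.append(fields[:6])
    row_ones = []
    row_twos = []
    ...
    for name, field in zip(headers, fields):
        # endswith() matches only the exact suffix, so '.1' won't catch '.10'
        if name.endswith('.1'):
            row_ones.append(field)
        if name.endswith('.2'):
            row_twos.append(field)
        ...
    point_ones.append(row_ones)
    point_twos.append(row_twos)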
That will give you the first six fields for every row, and split the .1, .2, and so on fields out into the point_ones, point_twos, etc. lists. The contents of each file would then be built with '\n'.join([','.join(six + extra) for six, extra in zip(first_six, point_ones)]), swapping in point_twos and so on for the other files.
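
If writing out twenty nearly identical lists gets tedious, the same idea can be expressed as a loop over the suffixes. The following is only a rough sketch, assuming the headers are exactly the question codes (1.0, 7.1, 17.20 and so on) and that the first six columns are the common ones. The names split_by_suffix, to_csv_text, n_common and suffixes are hypothetical, and the sample data is made up. It uses the standard csv module instead of splitting on commas by hand, which also copes with quoted fields.

```python
import csv
import io

def split_by_suffix(csv_text, n_common=6, suffixes=range(1, 21)):
    """Group columns by the part of the header after the dot.

    Returns {suffix: rows}, where every row starts with the first
    `n_common` fields followed by the fields whose header ends in
    '.<suffix>'. Function and parameter names are hypothetical.
    """
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader)                      # a real list of column names
    groups = {s: [] for s in suffixes}
    for fields in reader:
        if not fields:                          # skip blank lines
            continue
        common = fields[:n_common]
        for s in suffixes:
            # Compare the text after the dot exactly, so '.1' does not
            # also pick up '.10' to '.19'.
            extra = [f for name, f in zip(headers[n_common:], fields[n_common:])
                     if name.rpartition('.')[2] == str(s)]
            groups[s].append(common + extra)
    return groups

def to_csv_text(rows):
    """Serialise a list of rows back to CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator='\n').writerows(rows)
    return buf.getvalue()

# Tiny made-up example: two common columns, suffixes 1 and 2 only.
sample = "1.0,2.0,7.1,7.2,8.1,8.2\na,b,c,d,e,f\ng,h,i,j,k,l\n"
groups = split_by_suffix(sample, n_common=2, suffixes=range(1, 3))
print(groups[1])   # [['a', 'b', 'c', 'e'], ['g', 'h', 'i', 'k']]
print(to_csv_text(groups[2]))
```

For the real data you would read the text from the exported file and write to_csv_text(groups[s]) to one output file per suffix, with whatever file names suit you.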