Python Forum
How to split file by same values from column from imported CSV file?
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
How to split file by same values from column from imported CSV file?
#1
I am really new into Python and I struggle with this problem. I have data like this in CSV file:

111_0,3005,QWE
111_0,3006,SDE
111_0,3006,LFR
111_1,3005,QWE
111_1,5345,JTR
112_0,3103,JPP
112_0,3343,PDK
113_0,2137,TRE
113_0,2137,OMG

What I want to do is to format and insert this data into another file in specified line, but it should split into as many files as many different values I have in the first column. So for example, there would be one file named 'file_111_0.txt' where would be only data from rows 1-3, another file named 'file_111_1.txt' with data from rows 4 and 5 and so on.

import csv
from csv import reader

data = []

with open('outputfile.txt', 'r') as output_file:
    lines = output_file.readlines()
    last = lines[-1]
with open('datafile.csv', 'r') as input_file:
    csv_reader = reader(input_file)
    for row in csv_reader:
        data.append(row)
     with open('output_file', 'w') as output_file:
        for i, line in enumerate(lines,1):
            if i == 10:
                output_file.writelines('something ' + data[row][1] + ' something2 ' + data[row][2])
            else:
                output_file.writelines(line)
This code above is basically what I made and it gives me this output:

line1
line2
line3
line4
line5
line6
line7
line8
line9
something 3005 something2 QWE
something 3006 something2 SDE
something 3006 something2 LFR
something 3005 something2 QWE
something 5345 something2 JTR
something 3103 something2 JPP
something 3343 something2 PDK
something 2137 something2 TRE
something 2137 something2 OMG

But what should I do, and what I don't know how to do, is to create another file for every different value from first column, so file 'file_111_0.txt' would look like this:

line1
line2
line3
line4
line5
line6
line7
line8
line9
something 3005 something2 QWE
something 3006 something2 SDE
something 3006 something2 LFR

I have been thinking about using pandas or dictionaries but I can't get this results
Reply
#2
to separate string on last ',' use (line 8) line.rsplit(',', 1)
this will give you a list with values split.

'111_1,5345,JTR'
>>> line.rsplit(',', 1)
['111_1,5345', 'JTR']
Reply
#3
You should articulate your plan in natural language and then translate into Python. In most general terms it's: 'read data from file to suitable data structure and write data from said structure into files'.

According to description filename should be first value before comma, second and third must be combined with some additional strings to form a line. Every file should start with 9 empty line.

What could be suitable data structure? It should be mutable as every row read from file mutates data. It can be list or dictionary. In case of former we access data using indices and in latte using names.

For example we can design dictionary where key is first value before comma and this will be out file name as well. Value of this key will be list of lists.

In Python it can look something like:

with open('data_source.csv', 'r') as f:
    out_data = dict()
    for line in f:
        name, *data = line.strip('\n').split(',')
        try:
            out_data[name].append(data)
        except KeyError:
            out_data[name] = [data]

for fname, record in out_data.items():
    with open(f'file_{fname}.txt', 'w') as out:
        print(9*'\n', file=out)
        for row in record:
            print(f'num {row[0]} code {row[1]}', file=out)
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Reply
#4
Here is an answer to your question on StackOverflow

"You can write different files by iterating over the unique data from the first field of your file. Look the example I just did:""


>>> raw_data = """111_0,3005,QWE
... 111_0,3006,SDE
... 111_0,3006,LFR
... 111_1,3005,QWE
... 111_1,5345,JTR
... 112_0,3103,JPP
... 112_0,3343,PDK 
... 113_0,2137,TRE
... 113_0,2137,OMG"""
>>> uniqs_from_field_one = set([ row.strip().split(',')[0] for row in raw_data.splitlines() ])
>>> for uniq in uniqs_from_field_one:
...     filename = 'file_{}.txt'.format(uniq)
...     with open(filename, 'w') as output_file:
...         print('writing the file {}'.format(filename))
... 
writing the file file_112_0.txt
writing the file file_113_0.txt
writing the file file_111_1.txt
writing the file file_111_0.txt
>>> 
$ ls file*txt
file_111_0.txt  file_111_1.txt  file_112_0.txt  file_113_0.txt
Reply
#5
(Mar-24-2022, 12:20 PM)codinglearner Wrote: Here is an answer to your question on StackOverflow

"You can write different files by iterating over the unique data from the first field of your file. Look the example I just did:""

This doesn't address OP problem as described. No data is written to files. Also this approach as it stands right now is not efficient - one iteration is needed to grab filenames, then another to get data to write into files.
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Reply
#6
(Mar-24-2022, 11:11 AM)perfringo Wrote: You should articulate your plan in natural language and then translate into Python. In most general terms it's: 'read data from file to suitable data structure and write data from said structure into files'.

According to description filename should be first value before comma, second and third must be combined with some additional strings to form a line. Every file should start with 9 empty line.

What could be suitable data structure? It should be mutable as every row read from file mutates data. It can be list or dictionary. In case of former we access data using indices and in latte using names.

For example we can design dictionary where key is first value before comma and this will be out file name as well. Value of this key will be list of lists.

In Python it can look something like:

with open('data_source.csv', 'r') as f:
    out_data = dict()
    for line in f:
        name, *data = line.strip('\n').split(',')
        try:
            out_data[name].append(data)
        except KeyError:
            out_data[name] = [data]

for fname, record in out_data.items():
    with open(f'file_{fname}.txt', 'w') as out:
        print(9*'\n', file=out)
        for row in record:
            print(f'num {row[0]} code {row[1]}', file=out)

I am sorry for such a poor explanation of my problem, it's a new field for me and I am being a little confused with all of it. Your code works almost exatly as I wanted to, but what I meant by starting inserting values to specifield line in text file is to open another file, what I called "output_file.txt" (and it is not empty) and there in specified line (for example line = 10) insert this data and save this file for every value in first columny of csv file as You did. And with opening it and spliting by these value is what I have the most problems.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Get an average of the unique values of a column with group by condition and assign it klllmmm 0 269 Feb-17-2024, 05:53 PM
Last Post: klllmmm
  Help copying a column from a csv to another file with some extras g0nz0uk 3 456 Feb-01-2024, 03:12 PM
Last Post: DeaD_EyE
  Converting column of values into muliple columns of counts highland44 0 252 Feb-01-2024, 12:48 AM
Last Post: highland44
  file open "file not found error" shanoger 8 1,087 Dec-14-2023, 08:03 AM
Last Post: shanoger
  Search Excel File with a list of values huzzug 4 1,216 Nov-03-2023, 05:35 PM
Last Post: huzzug
  Need to replace a string with a file (HTML file) tester_V 1 761 Aug-30-2023, 03:42 AM
Last Post: Larz60+
  How can I change the uuid name of a file to his original file? MaddoxMB 2 921 Jul-17-2023, 10:15 PM
Last Post: Pedroski55
  save values permanently in python (perhaps not in a text file)? flash77 8 1,205 Jul-07-2023, 05:44 PM
Last Post: flash77
  How to "tee" (=split) output to screen and into file? pstein 6 1,371 Jun-24-2023, 08:00 AM
Last Post: Gribouillis
  Loop through json file and reset values [SOLVED] AlphaInc 2 2,097 Apr-06-2023, 11:15 AM
Last Post: AlphaInc

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020