How to split file by same values from column from imported CSV file?

Paqqno · Mar-23-2022, 05:33 PM

I am really new into Python and I struggle with this problem. I have data like this in CSV file:

111_0,3005,QWE
111_0,3006,SDE
111_0,3006,LFR
111_1,3005,QWE
111_1,5345,JTR
112_0,3103,JPP
112_0,3343,PDK
113_0,2137,TRE
113_0,2137,OMG

What I want to do is to format and insert this data into another file in specified line, but it should split into as many files as many different values I have in the first column. So for example, there would be one file named 'file_111_0.txt' where would be only data from rows 1-3, another file named 'file_111_1.txt' with data from rows 4 and 5 and so on.

import csv
from csv import reader

data = []

with open('outputfile.txt', 'r') as output_file:
    lines = output_file.readlines()
    last = lines[-1]
with open('datafile.csv', 'r') as input_file:
    csv_reader = reader(input_file)
    for row in csv_reader:
        data.append(row)
     with open('output_file', 'w') as output_file:
        for i, line in enumerate(lines,1):
            if i == 10:
                output_file.writelines('something ' + data[row][1] + ' something2 ' + data[row][2])
            else:
                output_file.writelines(line)

This code above is basically what I made and it gives me this output:

line1
line2
line3
line4
line5
line6
line7
line8
line9
something 3005 something2 QWE
something 3006 something2 SDE
something 3006 something2 LFR
something 3005 something2 QWE
something 5345 something2 JTR
something 3103 something2 JPP
something 3343 something2 PDK
something 2137 something2 TRE
something 2137 something2 OMG

But what should I do, and what I don't know how to do, is to create another file for every different value from first column, so file 'file_111_0.txt' would look like this:

line1
line2
line3
line4
line5
line6
line7
line8
line9
something 3005 something2 QWE
something 3006 something2 SDE
something 3006 something2 LFR

I have been thinking about using pandas or dictionaries but I can't get this results

**Larz60+** · (This post was last modified: Mar-23-2022, 08:16 PM by Larz60+.)

to separate string on last ',' use (line 8) line.rsplit(',', 1)
this will give you a list with values split.

'111_1,5345,JTR'
>>> line.rsplit(',', 1)
['111_1,5345', 'JTR']

**perfringo** · (This post was last modified: Mar-24-2022, 12:45 PM by perfringo.)

You should articulate your plan in natural language and then translate into Python. In most general terms it's: 'read data from file to suitable data structure and write data from said structure into files'.

According to description filename should be first value before comma, second and third must be combined with some additional strings to form a line. Every file should start with 9 empty line.

What could be suitable data structure? It should be mutable as every row read from file mutates data. It can be list or dictionary. In case of former we access data using indices and in latte using names.

For example we can design dictionary where key is first value before comma and this will be out file name as well. Value of this key will be list of lists.

In Python it can look something like:

with open('data_source.csv', 'r') as f:
    out_data = dict()
    for line in f:
        name, *data = line.strip('\n').split(',')
        try:
            out_data[name].append(data)
        except KeyError:
            out_data[name] = [data]

for fname, record in out_data.items():
    with open(f'file_{fname}.txt', 'w') as out:
        print(9*'\n', file=out)
        for row in record:
            print(f'num {row[0]} code {row[1]}', file=out)

codinglearner · Mar-24-2022, 12:20 PM

Here is an answer to your question on StackOverflow

"You can write different files by iterating over the unique data from the first field of your file. Look the example I just did:""

>>> raw_data = """111_0,3005,QWE
... 111_0,3006,SDE
... 111_0,3006,LFR
... 111_1,3005,QWE
... 111_1,5345,JTR
... 112_0,3103,JPP
... 112_0,3343,PDK 
... 113_0,2137,TRE
... 113_0,2137,OMG"""
>>> uniqs_from_field_one = set([ row.strip().split(',')[0] for row in raw_data.splitlines() ])
>>> for uniq in uniqs_from_field_one:
...     filename = 'file_{}.txt'.format(uniq)
...     with open(filename, 'w') as output_file:
...         print('writing the file {}'.format(filename))
... 
writing the file file_112_0.txt
writing the file file_113_0.txt
writing the file file_111_1.txt
writing the file file_111_0.txt
>>> 
$ ls file*txt
file_111_0.txt  file_111_1.txt  file_112_0.txt  file_113_0.txt

**perfringo** · Mar-24-2022, 12:44 PM

(Mar-24-2022, 12:20 PM)codinglearner Wrote: Here is an answer to your question on StackOverflow

"You can write different files by iterating over the unique data from the first field of your file. Look the example I just did:""

This doesn't address OP problem as described. No data is written to files. Also this approach as it stands right now is not efficient - one iteration is needed to grab filenames, then another to get data to write into files.

Paqqno · (This post was last modified: Mar-24-2022, 05:25 PM by Paqqno.)

(Mar-24-2022, 11:11 AM)perfringo Wrote: You should articulate your plan in natural language and then translate into Python. In most general terms it's: 'read data from file to suitable data structure and write data from said structure into files'.

According to description filename should be first value before comma, second and third must be combined with some additional strings to form a line. Every file should start with 9 empty line.

What could be suitable data structure? It should be mutable as every row read from file mutates data. It can be list or dictionary. In case of former we access data using indices and in latte using names.

For example we can design dictionary where key is first value before comma and this will be out file name as well. Value of this key will be list of lists.

In Python it can look something like:
with open('data_source.csv', 'r') as f:
    out_data = dict()
    for line in f:
        name, *data = line.strip('\n').split(',')
        try:
            out_data[name].append(data)
        except KeyError:
            out_data[name] = [data]

for fname, record in out_data.items():
    with open(f'file_{fname}.txt', 'w') as out:
        print(9*'\n', file=out)
        for row in record:
            print(f'num {row[0]} code {row[1]}', file=out)

I am sorry for such a poor explanation of my problem, it's a new field for me and I am being a little confused with all of it. Your code works almost exatly as I wanted to, but what I meant by starting inserting values to specifield line in text file is to open another file, what I called "output_file.txt" (and it is not empty) and there in specified line (for example line = 10) insert this data and save this file for every value in first columny of csv file as You did. And with opening it and spliting by these value is what I have the most problems.

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Replace values in Yaml file with value in dictionary	PelleH	1	2,188	Feb-11-2025, 09:51 AM Last Post: alexjordan
	How to write variable in a python file then import it in another python file?	tatahuft	4	963	Jan-01-2025, 12:18 AM Last Post: Skaperen
	JSON File - extract only the data in a nested array for CSV file	shwfgd	2	1,092	Aug-26-2024, 10:14 PM Last Post: shwfgd
	FileNotFoundError: [Errno 2] No such file or directory although the file exists	Arnibandyo	0	1,005	Aug-12-2024, 09:11 AM Last Post: Arnibandyo
	"[Errno 2] No such file or directory" (.py file)	IbrahimBennani	13	6,505	Jun-17-2024, 12:26 AM Last Post: AdamHensley
	Get an average of the unique values of a column with group by condition and assign it	klllmmm	0	2,011	Feb-17-2024, 05:53 PM Last Post: klllmmm
	Help copying a column from a csv to another file with some extras	g0nz0uk	3	1,765	Feb-01-2024, 03:12 PM Last Post: DeaD_EyE
	Converting column of values into muliple columns of counts	highland44	0	908	Feb-01-2024, 12:48 AM Last Post: highland44
	file open "file not found error"	shanoger	8	7,930	Dec-14-2023, 08:03 AM Last Post: shanoger
	Search Excel File with a list of values	huzzug	4	2,845	Nov-03-2023, 05:35 PM Last Post: huzzug

How to split file by same values from column from imported CSV file?

User Panel Messages

Announcements