Saving data from each row into separate txt

kiton · May-12-2017, 09:23 PM

Hello dear forum members,

I am seeking your help with the following task. I have a data frame, in which column -fips- contains a 5 digit number and column -text- contains some text. My goal is to save the contents of each -text- row in to a separate TXT file, such that the name of each TXT is the 5-digit number from the -fips- field.

E.g.:
fips text
01234 My text

The expected output should be 01234.txt contacting "My text"

I tried using the following code. But it does the job incorrectly: (a) names the TXT with index number, and (b) saves contents of the -fips- row. I realize something is wrong with "[1][1]" part but cannot figure out what.

import pandas as pd

for x in test.iterrows():
    pd.DataFrame([x[1][1]]).to_csv(str(x[1][0])+".txt", header=False, index=False)

Thank you in advance for your help.

wavic · (This post was last modified: May-13-2017, 06:50 AM by wavic.)

Why Pandas?

#pass all lines untils 'fips       txt'
with open('input_file.txt, 'r') as in_file:
    while in_file.readline().split()[0] != 'fips':
        pass
    
    # write the rest in separed files
    for line in in_file.readline():
        with open('{}.txt'.format(line.slit()[0]), 'w') as out_file:
            out_file.write(line)

kiton · Jul-04-2017, 06:45 AM

Dear wavic, I am thankful for the code you provided above. It worked fine for me during tests in the following form:

with open('file.csv', 'r') as in_file: #note it's a CSV
   while in_file.readline().split()[0] != 'FIPS':
       pass
    
   # write the rest in separed files
   for line in in_file.readline():
       with open('{}.txt'.format(line.slit()[0]), 'w') as out_file:
           out_file.write(line)

However, for some reason today it resulted in the following error:

Error:---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-61-11e3906ccb41> in <module>()
     1 #pass all lines untils 'fips       txt'
     2 with open('TW_13.csv', 'r') as in_file:
----> 3     while in_file.readline().split()[0] != 'FIPS':
     4         pass
     5 

IndexError: list index out of range

The csv looks like this:

FIPS text
0 1001 I screwed that last tweet upAnd im ready to get...
1 1003 Like, please, we're all dying to know how many...
2 1005 ": å???: I haven't felt this good in a while ?...
3 1007 I always do ya favored. That's romance no hope...
4 1009 ???????????????? never heard it put that way...
... ... ...
3224 72153 Good morning gangstass :O å??å« Just is... PE...
3225 78010 lol, I know but I hear it has a dance??She do...
3226 78020 it's on a camera other than an iPhoneeeee14, ...
3227 78030 I'm gonna meet BeyoncÌ?å© after the show or im...
3228 rows × 2 columns

Could you please help me out with this. Thankfully, kiton

***zivoni*** · Jul-04-2017, 08:43 AM

It seems that your file contains either empty line or line containing only whitespaces. For such line split() returns empty list that leads to index out of a range error.

You can save your split result as a variable and use if condition to ensure that its not empty (or use try/except).

DeaD_EyE · Jul-04-2017, 10:04 AM

Tested with your provided example (without the dots).

import os

with open('file.csv') as in_file:
    next(in_file) # throw away the head
    # this does work, because in_file is also an interator
    # calling next or iterating with a for loop gives you line by line
    # newline is not stripped of
    for line in in_file:
        if not line:
            continue
        first, *rest = line.split() # pay attention, here the newline is stripped off
        # you'll also loose all whitespaces when you're using split()
        # out_filename = first + '.txt'
        # out_filename = '{}.txt'.format(first)
        out_filename = f'{first}.txt' # requires Python 3.6 // format strings
        with open(out_filename, 'w') as out_file:
            # mode w overwrites files
            # if the first number is not unique, you'll lose data
            #
            # rest is a list, making a string with join
            # or you write the whole line including the
            # first part of filename, depends what you want
            to_write = ' '.join(rest) # joins the list with each one space between the elements
            out_file.write(to_write)
            out_file.write(os.linesep) # write a newline

kiton · (This post was last modified: Jul-04-2017, 06:15 PM by kiton.)

zivoni, DeaD_Eye, thank you for your feedback. So, I am trying the solution suggested by Dead_Eye and it seems to do the job. However, the issue is that the file name of each newly created txt appears to be as follows, e.g.:

1001,"I.txt
1003,"Like,.txt
1005,"""/.txt
1007,"I.txt

In other words, the first value (i.e., FIPS code) is correct, then there is a comma, then a quote, and then the very first word of the text. The content of the txt file is what is supposed to be except for that first word. What am I doing wrong? Wall

***zivoni*** · Jul-04-2017, 09:12 PM

I would guess it means that you have posted different "csv" file than you are actually processing...

Your posted output was probably pandas dataframe output (with line numbers and fields unquoted and separated by whitespaces), while your actual file seems to be real csv file with comma separator and quoted fields:

Output:"FIPS","text"
1001,"I screwed that last tweet upAnd im ready to get..."
1003,"Like, please, we're all dying to know how many..."
...

If you process such file with DeaD_Eye's script, you naturally would get strange file names (anything to first whitespace would be in file name) and shortened text (and Dead_Eye's script would not survive file with line consisting just of whitespaces, but that should not be problem in "well" formatted csv).

As it seems that you are working with real csv file, use csv module - something like

import csv

with open('file.csv') as csvfile:
   reader = csv.reader(csvfile)
   next(reader) # throw away first line
   for row in reader: # row is already list of splitted fields
       if len(row) == 2: # write out only lines that have exactly two fields - id and tweet?
           with open(row[0] + ".txt", "w") as outfile:
               outfile.write(row[1] + "\n")

If your file can contain more than two fields and you want them all, concatenate row[1:] into your output file (or use csv.writer and write csv file). And if you want to write out even "empty" tweets, then add check for len(row) == 1 writing empty file...

kiton · (This post was last modified: Jul-05-2017, 03:11 AM by kiton.)

zivoni, you are a mega mind -- thank you so much for your help. Your code initially resulted in the

Error:
Error: field larger than field limit (131072)

But I found a quick solution to that by running the following code

csv.field_size_limit(500 * 1024 * 1024)

and it worked perfectly. Whoohoo :)

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Saving data into xlxs colomwise	erdemath	0	1,402	Oct-02-2021, 09:01 AM Last Post: erdemath
	Data saving structure	JosefFilosopio	0	2,121	May-04-2019, 10:48 AM Last Post: JosefFilosopio
	Problem with saving data and loading data to mysql	kirito85	4	3,913	Feb-08-2019, 10:53 AM Last Post: kirito85
	Saving data into .mat (Matlab Fiile)	Tulo_Stone	0	2,387	Feb-28-2018, 12:57 AM Last Post: Tulo_Stone

Saving data from each row into separate txt

User Panel Messages

Announcements