Python Forum
Saving data from each row into separate txt
Thread Rating:
  • 1 Vote(s) - 2 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Saving data from each row into separate txt
#1
Hello dear forum members,

I am seeking your help with the following task. I have a data frame, in which column -fips- contains a 5 digit number and column -text- contains some text. My goal is to save the contents of each -text- row in to a separate TXT file, such that the name of each TXT is the 5-digit number from the -fips- field. 

E.g.:
fips          text
01234      My text

The expected output should be 01234.txt contacting "My text"

I tried using the following code. But it does the job incorrectly:  (a) names the TXT with index number, and (b) saves contents of the -fips- row. I realize something is wrong with "[1][1]" part but cannot figure out what.
import pandas as pd

for x in test.iterrows():
    pd.DataFrame([x[1][1]]).to_csv(str(x[1][0])+".txt", header=False, index=False)
Thank you in advance for your help.
Reply
#2
Why Pandas?

#pass all lines untils 'fips       txt'
with open('input_file.txt, 'r') as in_file:
    while in_file.readline().split()[0] != 'fips':
        pass
    
    # write the rest in separed files
    for line in in_file.readline():
        with open('{}.txt'.format(line.slit()[0]), 'w') as out_file:
            out_file.write(line)
        
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
Reply
#3
Dear wavic, I am thankful for the code you provided above. It worked fine for me during tests in the following form:

with open('file.csv', 'r') as in_file: #note it's a CSV
   while in_file.readline().split()[0] != 'FIPS':
       pass
    
   # write the rest in separed files
   for line in in_file.readline():
       with open('{}.txt'.format(line.slit()[0]), 'w') as out_file:
           out_file.write(line)
However, for some reason today it resulted in the following error:

Error:
--------------------------------------------------------------------------- IndexError                                Traceback (most recent call last) <ipython-input-61-11e3906ccb41> in <module>()      1 #pass all lines untils 'fips       txt'      2 with open('TW_13.csv', 'r') as in_file: ----> 3     while in_file.readline().split()[0] != 'FIPS':      4         pass      5 IndexError: list index out of range
The csv looks like this:

FIPS text
0 1001 I screwed that last tweet upAnd im ready to get...
1 1003 Like, please, we're all dying to know how many...
2 1005 ": å???: I haven't felt this good in a while ?...
3 1007 I always do ya favored. That's romance no hope...
4 1009 ???????????????? never heard it put that way...
... ... ...
3224 72153 Good morning gangstass :O å??å« Just is... PE...
3225 78010 lol, I know but I hear it has a dance??She do...
3226 78020 it's on a camera other than an iPhoneeeee14, ...
3227 78030 I'm gonna meet BeyoncÌ?å© after the show or im...
3228 rows × 2 columns
Could you please help me out with this. Thankfully, kiton
Reply
#4
It seems that your file contains either empty line or line containing only whitespaces. For such line split() returns empty list that leads to index out of a range error.

You can save your split result as a variable and use if condition to ensure that its not empty (or use try/except).
Reply
#5
Tested with your provided example (without the dots).

import os

with open('file.csv') as in_file:
    next(in_file) # throw away the head
    # this does work, because in_file is also an interator
    # calling next or iterating with a for loop gives you line by line
    # newline is not stripped of
    for line in in_file:
        if not line:
            continue
        first, *rest = line.split() # pay attention, here the newline is stripped off
        # you'll also loose all whitespaces when you're using split()
        # out_filename = first + '.txt'
        # out_filename = '{}.txt'.format(first)
        out_filename = f'{first}.txt' # requires Python 3.6 // format strings
        with open(out_filename, 'w') as out_file:
            # mode w overwrites files
            # if the first number is not unique, you'll lose data
            #
            # rest is a list, making a string with join
            # or you write the whole line including the
            # first part of filename, depends what you want
            to_write = ' '.join(rest) # joins the list with each one space between the elements
            out_file.write(to_write)
            out_file.write(os.linesep) # write a newline
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
Reply
#6
zivoni, DeaD_Eye, thank you for your feedback. So, I am trying the solution suggested by Dead_Eye and it seems to do the job. However, the issue is that the file name of each newly created txt appears to be as follows, e.g.:

1001,"I.txt
1003,"Like,.txt
1005,"""/.txt
1007,"I.txt

In other words, the first value (i.e., FIPS code) is correct, then there is a comma, then a quote, and then the very first word of the text. The content of the txt file is what is supposed to be except for that first word. What am I doing wrong? Wall
Reply
#7
I would guess it means that you have posted different "csv" file than you are actually processing...

Your posted output was probably pandas dataframe output (with line numbers and fields unquoted and separated by whitespaces), while your actual file seems to be real csv file with comma separator and quoted fields:
Output:
"FIPS","text" 1001,"I screwed that last tweet upAnd im ready to get..." 1003,"Like, please, we're all dying to know how many..." ...
If you process such file with DeaD_Eye's script, you naturally would get strange file names (anything to first whitespace would be in file name) and shortened text (and Dead_Eye's script would not survive file with line consisting just of whitespaces, but that should not be problem in "well" formatted csv).

As it seems that you are working with real csv file, use csv module - something like
import csv

with open('file.csv') as csvfile:
   reader = csv.reader(csvfile)
   next(reader) # throw away first line
   for row in reader: # row is already list of splitted fields
       if len(row) == 2: # write out only lines that have exactly two fields - id and tweet?
           with open(row[0] + ".txt", "w") as outfile:
               outfile.write(row[1] + "\n") 
If your file can contain more than two fields and you want them all, concatenate row[1:] into your output file (or use csv.writer and write csv file). And if you want to write out even "empty" tweets, then add check for len(row) == 1 writing empty file...
Reply
#8
zivoni, you are a mega mind -- thank you so much for your help. Your code initially resulted in the
Error:
Error: field larger than field limit (131072)
But I found a quick solution to that by running the following code
csv.field_size_limit(500 * 1024 * 1024)
and it worked perfectly. Whoohoo :)
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Saving data into xlxs colomwise erdemath 0 1,402 Oct-02-2021, 09:01 AM
Last Post: erdemath
  Data saving structure JosefFilosopio 0 2,121 May-04-2019, 10:48 AM
Last Post: JosefFilosopio
  Problem with saving data and loading data to mysql kirito85 4 3,913 Feb-08-2019, 10:53 AM
Last Post: kirito85
  Saving data into .mat (Matlab Fiile) Tulo_Stone 0 2,387 Feb-28-2018, 12:57 AM
Last Post: Tulo_Stone

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020