Converting .txt to .csv file

palladium · Jan-01-2021, 07:56 AM

(Jan-01-2021, 07:48 AM)SunWers Wrote:
(Jan-01-2021, 07:41 AM)buran Wrote: can you upload sample input file? Looking at the screenshot again, it is more likely a fixed-width file.

Sorry, crazy question.... how do I upload a file?

Just copy and paste a small section of the text file, using the output tags.

SunWers · (This post was last modified: Jan-01-2021, 08:04 AM by SunWers.)

(Jan-01-2021, 07:53 AM)palladium Wrote:
(Jan-01-2021, 07:31 AM)SunWers Wrote: Hi Palladium,

I just tried adding "sep = None" and this is the error message I get...

1
2
3

runfile('C:/Users/vhubb/Documents/Python Scripts/temp.py', wdir='C:/Users/vhubb/Documents/Python Scripts')
C:\Users\vhubb\Documents\Python Scripts\temp.py:37: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
df = pd.read_csv(in_filename, sep="None")

It's not an error message. Its a warning to let you know that the pandas engine is using a non default method to parse your file (the default is C engine which does not guessing delimiters; pandas is telling you that it has switched to the python engine which I understand is slower but has some built in support for guessing delimiters). If you add the keyword argument engine = 'python' as per the warning message, the message should disappear.

An error message will tell you what type of error it is (e.g. "TypeError" or "ValueError") followed by a message.

If you don't see anything else it means the process is successful. Print out the resulting dataframe to verify.

As buran pointed out you can use pandas' built in to_excel() method, which dumps the dataframe into an xlsx file format. It is customisable to some extent,b ut as (s)he astutely pointed out, given your complex needs openpyxl may be a better bet.

I checked the output file and it is still combining all columns into one. I also just uploaded the text file above.

**buran** · (This post was last modified: Jan-01-2021, 08:39 AM by buran.)

yes, it's fixed-width file.
If you want to read it with pandas - look at pandas.read_fwf

or just parse it your self (i.e. there is no much use of pandas if you just want to read it and then write to excel, without doing calcutions).
Also note that there is extra header line (the underlines of the header) which you will need to deal with in pandas.

        
              import pandas as pd
 
df = pd.read_fwf('mytxt.txt')
df.drop(index=0, inplace=True) # drop the underlines
print(df.head())

Output:      Cutfile Name            Start Time              End Time Total Time Status  ... Dry Run.1 Cut Speed Throughput Feed Rate Operator
1    BWW34580401 B  11/2/2020 6:39:37 AM  11/2/2020 6:40:05 AM    0:00:28      2  ...     0.000     0.000      0.000     0.000  Unknown
2    BWW34580401 B  11/2/2020 6:40:20 AM  11/2/2020 7:11:45 AM    0:31:25      3  ...     0.000   394.513    168.167    40.000  Unknown
3       PRECUT.$$$  11/2/2020 7:21:07 AM  11/2/2020 7:21:22 AM    0:00:15      3  ...     0.000   354.574    173.032    40.000  Unknown
4  LFRF0106 30D A2  11/2/2020 7:32:59 AM  11/2/2020 8:00:49 AM    0:27:50      3  ...     0.000   430.553    200.613    40.000  Unknown
5  LFRF0106 30D A1  11/2/2020 8:19:21 AM  11/2/2020 9:08:27 AM    0:49:06      3  ...     0.000   430.338    200.812    40.000  Unknown

[5 rows x 27 columns]

stevenroger · Jan-01-2021, 09:55 AM

why aren't you using the csv module to do this? your code seems to pick the first letter of the name to put it in the second column.

**buran** · (This post was last modified: Jan-01-2021, 03:05 PM by buran.)

(Jan-01-2021, 09:55 AM)stevenroger Wrote: why aren't you using the csv module to do this?

csv module will not do. It's a fixed-width file, not delimited (comma-, tab- or whatever).

Of course, as I mentioned, they can parse it themselves.

SunWers · Jan-02-2021, 01:22 AM

Thank You! I think what I will do is parse them myself and then go from there. I'm thinking there is a way to pull the files as .csv format. But I won't know for sure until I get back to work Monday. I could set the script up to be more automatic if the files were .csv format, correct?

Thanks again for everyone's help!

Hope everyone is having a Happy Holiday

-Veronica

(Jan-01-2021, 08:39 AM)buran Wrote: yes, it's fixed-width file.
If you want to read it with pandas - look at pandas.read_fwf

or just parse it your self (i.e. there is no much use of pandas if you just want to read it and then write to excel, without doing calcutions).
Also note that there is extra header line (the underlines of the header) which you will need to deal with in pandas.

1
2
3
4
5

import pandas as pd

df = pd.read_fwf('mytxt.txt')
df.drop(index=0, inplace=True) # drop the underlines
print(df.head())
Output:      Cutfile Name            Start Time              End Time Total Time Status  ... Dry Run.1 Cut Speed Throughput Feed Rate Operator
1    BWW34580401 B  11/2/2020 6:39:37 AM  11/2/2020 6:40:05 AM    0:00:28      2  ...     0.000     0.000      0.000     0.000  Unknown
2    BWW34580401 B  11/2/2020 6:40:20 AM  11/2/2020 7:11:45 AM    0:31:25      3  ...     0.000   394.513    168.167    40.000  Unknown
3       PRECUT.$$$  11/2/2020 7:21:07 AM  11/2/2020 7:21:22 AM    0:00:15      3  ...     0.000   354.574    173.032    40.000  Unknown
4  LFRF0106 30D A2  11/2/2020 7:32:59 AM  11/2/2020 8:00:49 AM    0:27:50      3  ...     0.000   430.553    200.613    40.000  Unknown
5  LFRF0106 30D A1  11/2/2020 8:19:21 AM  11/2/2020 9:08:27 AM    0:49:06      3  ...     0.000   430.338    200.812    40.000  Unknown

[5 rows x 27 columns]

**buran** · (This post was last modified: Jan-02-2021, 03:24 AM by buran.)

(Jan-02-2021, 01:22 AM)SunWers Wrote: I could set the script up to be more automatic if the files were .csv format, correct?

It will be "automatic" all the same. Just the code will be slightly different, e.g. pandas.read_fwf vs pandas.read_csv or if you parse yourself it will allow to use csv, but in any case it will not do big difference.

Given that pandas did good job reading the file, concentrate on writing to excel. You have 2 options - write to excel from pandas using to_excel, then format the file to look like what you want using other package like openpyxl or write each column from the dataframe in the template you created one by one. In any case you need to handle start and end time because date and time are one column, while you want them separate columns.

Pedroski55 · Jan-02-2021, 05:03 AM

A .csv file is a text file.

Something has to tell the csv reader what belongs in what column.

Below, the separator is a comma.

The comma was put there when I saved a Libre Office spreadsheet as .csv

Here is a peek:

Quote:Photo,phone,number,name,attendance,attn%,Final Score,points%,points,hwW16,cwW16,Week 16,hwW15,cwW15,Week 15,hwW14,cwW14,Week 14,hwW13,cwW13,Week 13,hwW12,cwW12,Week 12,hwW11,cwW11,Week 11,hwW10,cwW10,Week 10,hwW9,cwW9,Week 9
,,,,max Attendance,,,Max Score,425,,,,50,25,,50,25,,50,25,,50,25,,50,25,,0,25,,0,25,
,,,23名,8,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Photo,phone,学号,姓名,出勤,出勤率,期末分数,得分率,得分,HW,CW,Week 16,HW,CW,Week 15,HW,CW,Week 14,HW,CW,Week 13,HW,CW,Week 12,HW,CW,Week 11,HW,CW,Week 10,HW,CW,Week 9
,,1921030146,聂林,0,0,0,0,0,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,,1921090147,郜阳杰,0,0,0,0,0,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,,1922190144,汪毓韬,0,0,0,0,0,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,,1923010312,张长乐,7,100,100,100,425,,,1,50,25,1,50,25,1,50,25,1,50,25,1,50,25,1,0,25,1,0,25,1

When I open this .csv file with Libre Office, LO looks for the separator. Can be tab, comma, dunno what.

This reads the above file and also can save the data I want, better than csv.reader() (I think):

        
              import pandas as pd
 
# Read the file
 
file1 = '/home/pedro/winter2020/20PY/dumpFiles/attendanceWinter2020_20PYlatestPyMade_test1.csv'
 
dg = pd.read_csv(file1, usecols=['number', 'name', 'Final Score'])
 
# write the file
 
file2 = '/home/pedro/winter2020/20PY/dumpFiles/panda_bear.csv'
 
dg.to_csv(file2, encoding='utf8', index=False, sep=',')

caleb_cruze · Jan-04-2021, 01:10 PM

How to Convert a TXT file to CSV

Open Excel and create a new spreadsheet.
Select the Data tab.
On the far right, click “Get External Data”, then select the “From Text” option.
Find the TXT file on your computer and click “Open”.
In the first step of the Import Wizard, select “Delimited”
Make sure to select that your data has headers (check the box) then click next.
In the second step, select “Tab” if you’re working with the Amazon All Products Report, or “Comma” if you’re using another file. The best way to tell is to take a look at the contents of the file and see how they’re divided. If there is just a bunch of space between one field and another, then Tab is the right choice. If you look at the file and there are commas after each entry, then Comma is the right choice.

Regards,
Caleb

**buran** · Jan-04-2021, 01:12 PM

@caleb_cruze, what does this have to do with OP question on how to automate report creation using python. Based on discussion so far she is perfectly good at doing the report manually :)

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Converting a json file to a dataframe with rows and columns	eyavuz21	13	14,616	Jan-29-2023, 03:59 PM Last Post: eyavuz21
	[split] Help- converting file with pyton script	eltomassito	6	4,726	Jul-02-2021, 05:29 PM Last Post: snippsat
	Help- converting file with pyton script	grinleon	3	3,445	Sep-23-2020, 11:48 AM Last Post: grinleon

Converting .txt to .csv file

User Panel Messages

Announcements