Python Forum
Can't read text file with pandas
#1
Hi.
I would like help with two things:
1. I need to know why I cannot read a text file with pandas.
2. If the file can be read, I want to get all rows that have the C100 value and insert them into a new xls file.
The file size is 12 MB.
Link to my file:
https://drive.google.com/file/d/1MfkVJtb...sp=sharing

import pandas as pd

df = pd.read_csv("01Sped.txt", sep = "|", header=None)
print(df)
Error:
Traceback (most recent call last):
  File "C:\Users\user\Downloads\readTxtFile.py", line 3, in <module>
    df = pd.read_csv("01Sped.txt", sep = "|", header=None)
  ....................
  File "pandas\_libs\parsers.pyx", line 899, in pandas._libs.parsers.TextReader.read
  File "pandas\_libs\parsers.pyx", line 914, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas\_libs\parsers.pyx", line 968, in pandas._libs.parsers.TextReader._read_rows
  File "pandas\_libs\parsers.pyx", line 955, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas\_libs\parsers.pyx", line 2172, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 17 fields in line 12001, saw 31
Thank you!!
#2
The error tells you exactly why you can't read the file. Line 12001 of the file has too many fields in it. You need to look at that line of the file and see what the issue is. If you post it here, and maybe the first few lines of the file for reference on what it's expecting, maybe we could help you figure that out.
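One way to look at the file without opening 12 MB in an editor is to count the fields on each line and report where the count changes. The following is only a minimal sketch: the helper names `split_record` and `field_count_changes` are made up for illustration, and it assumes every record starts and ends with a pipe, as in the lines quoted in this thread.

```python
def split_record(line, sep="|"):
    """Split one record into fields, ignoring the single leading and trailing separator."""
    text = line.rstrip("\r\n")
    if text.startswith(sep):
        text = text[len(sep):]
    if text.endswith(sep):
        text = text[:-len(sep)]
    return text.split(sep)


def field_count_changes(lines, sep="|"):
    """Yield (line_number, field_count) whenever the count differs from the previous line."""
    previous = None
    for number, line in enumerate(lines, start=1):
        count = len(split_record(line, sep))
        if count != previous:
            yield number, count
            previous = count


# Made-up miniature sample in the same layout as the real file.
sample = [
    "|0990|11999|\n",
    "|C001|0|\n",
    "|C100|0|1|V0000015516|55|00|1|\n",
    "|C110|000001||\n",
]
for number, count in field_count_changes(sample):
    print(f"line {number}: {count} fields")
```

On the real file, the same generator could be fed an open file object, for example `open("01Sped.txt", encoding="ISO-8859-1")`; the encoding is an assumption taken from a later post in this thread.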
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
Recommended Tutorials: BBCode, functions, classes, text adventures
#3
Hi

Before line 12001, the lines in the file have 8 fields; from line 12001 on, they have 29 fields.
The first C100 record starts at line 12001:
|C100|0|1|V0000015516|55|00|1|

Output:
|0500|16052000|03|A|00005|3120100001|DEVOLUÇÕES DE VENDA|
|0500|31082011|01|A|00005|1131700030|ICMS EM TRANSITO|
|0500|11122002|03|A|00006|3320400002|COMBUSTÍVEIS E LUBRIFICANTES|
|0500|25032003|03|A|00006|3320600010|MANUTENÇAO E REPARO DE IMÓVEIS|
|0500|11122002|03|A|00006|3321100014|REFEIÇÕES E LANCHES|
|0500|11122002|03|A|00006|3321100013|MANUTENÇÃO DE EQUIPAMENTOS DIVERSOS|
|0500|11122002|03|A|00006|3321100017|MATERIAIS DE EMBALAGENS E ACONDICIONAMENTO|
|0500|11122002|03|A|00006|3321100001|IMPRESSOS E MATERIAIS DE ESCRITÓRIO|
|0500|11012013|01|A|00005|1322100010|COMPRAS ENTREGA FUTURA|
|0500|09032006|02|A|00005|2160800020|PROVISÃO DE DESP. C/ TELECOMUNICAÇÕES|
|0500|16082011|03|A|00006|3321004003|Despesas com telephone|
|0990|11999|
|C001|0|
|C100|0|1|V0000015516|55|00|1|000093676|35141233078528000132550010000936761540175846|16122014|02012015|20512,32|1|0,00|0,00|20512,32|0|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|
|C110|000001||
|C170|001|0105282|AZUKON MR 30MG CPR 1X30|960|CX|8377,19|0,00|0|260|2403|2403/AA|0,00|0,00|0,00|0,00|0,00|0,00|0|||0,00|0,00|0,00||0,00|0,0000|0,000|0,0000|0,00||0,00|0,0000|0,000|0,0000|0,00|1170199999|
|C170|002|0106788|BETACARD PLUS 50MG CPR 1X30|48|CX|465,02|0,00|0|260|2403|2403/AA|0,00|0,00|0,00|0,00|0,00|0,00|0|||0,00|0,00|0,00||0,00|0,0000|0,000|0,0000|0,00||0,00|0,0000|0,000|0,0000|0,00|1170199999|
|C170|003|0104839|INDAPEN SR 1,5MG CPR 1X30|1008|BLT|8919,01|0,00|0|260|2403|2403/AA|0,00|0,00|0,00|0,00|0,00|0,00|0|||0,00|0,00|0,00||0,00|0,0000|0,000|0,0000|0,00||0,00|0,0000|0,000|0,0000|0,00|1170199999|
|C170|004|0110578|METTA SR 500MG CPR 1X30|36|CX|291,67|0,00|0|260|2403|2403/AA|0,00|0,00|0,00|0,00|0,00|0,00|0|||0,00|0,00|0,00||0,00|0,0000|0,000|0,0000|0,00||0,00|0,0000|0,000|0,0000|0,00|1170199999|
|C170|005|0114043|PIOGLIT 30MG CPR 1X30|24|BLT|917,12|0,00|0|260|2403|2403/AA|0,00|0,00|0,00|0,00|0,00|0,00|0|||0,00|0,00|0,00||0,00|0,0000|0,000|0,0000|0,00||0,00|0,0000|0,000|0,0000|0,00|1170199999|
|C170|006|0116178|TORLOS H 50+12,5MG CPR REV 1X30|120|UN|1542,31|0,00|0|260|2403|2403/AA|0,00|0,00|0,00|0,00|0,00|0,00|0|||0,00|0,00|0,00||0,00|0,0000|0,000|0,0000|0,00||0,00|0,0000|0,000|0,0000|0,00|1170199999|
|C190|260|2403|0,00|20512,32|0,00|0,00|0,00|0,00|0,00|0,00||
|C100|0|1|V0000015872|55|00|1|000006404|35141207768134000368550010000064041653776796|19122014|02012015|3628,30|1|1034,72|0,00|4663,02|0|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|
|C110|000001||
|C170|001|0120782|OMNIC OCAS 0,4MG CPR 1X60|50|CX|4663,02|1034,72|0|260|2403|2403/AA|0,00|0,00|0,00|0,00|0,00|0,00|0|||0,00|0,00|0,00||0,00|0,0000|0,000|0,0000|0,00||0,00|0,0000|0,000|0,0000|0,00|1170199999|
|C190|260|2403|0,00|3628,30|0,00|0,00|0,00|0,00|0,00|0,00||
|C100|0|1|V0000015095|55|00|1|000721941|35141257507378000365550010007219411052247922|18122014|02012015|1209,16|1|1500,44|0,00|2709,60|0|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|
#4
Pandas isn't set up for that sort of thing, because you can't have a different number of columns in different rows of a dataframe. Perhaps someone more familiar with pandas.read_csv can correct me, but I don't see a way to assume extra columns and fill them with dummy values. Therefore you would need to use skiprows and nrows (see the pandas.read_csv docs) to load different sections of the file into different dataframes. If you need them to be one dataframe, you can combine them after they are loaded.
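As a rough illustration of the skiprows/nrows idea, here is a minimal sketch on a made-up four-line sample rather than the real file. The helper name `read_section` and the sample data are assumptions for illustration; `skiprows`, `nrows`, `header`, `sep` and `dtype` are regular `pandas.read_csv` parameters.

```python
import io

import pandas as pd

# Made-up miniature of the file: two short records followed by two longer ones.
sample = (
    "|0990|11999|\n"
    "|C001|0|\n"
    "|C100|0|1|V0000015516|55|\n"
    "|C100|0|1|V0000015872|55|\n"
)


def read_section(source, skiprows, nrows):
    """Read nrows lines after skipping skiprows lines, keeping every value as text."""
    return pd.read_csv(
        source, sep="|", header=None, skiprows=skiprows, nrows=nrows, dtype=str
    )


short_part = read_section(io.StringIO(sample), skiprows=0, nrows=2)
long_part = read_section(io.StringIO(sample), skiprows=2, nrows=2)

# Stacking the sections aligns them by column position; the shorter
# section gets NaN in the columns it does not have.
combined = pd.concat([short_part, long_part], ignore_index=True)
print(combined.shape)
```

Because each line starts and ends with a pipe, the first and last columns of each dataframe come out empty. Note also that the sample posted in #3 shows record types of different widths (C100, C110, C170, C190) alternating after line 12001, so a few fixed sections may not be enough for the real file.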
#5
I did it a different way. The problem is that I have 12 files to read.
After that, I need to split each line on the | (pipe) character.

Can someone help me read the 12 files?

with open('01Sped.txt', 'r') as f:
    dados = f.read()

with open('02Sped.txt', 'r') as f:
    dados2 = f.read()

output = dados+dados2
with open("spedFiscal.csv", "a") as myfile:
    myfile.write(output)

print("Finished!")
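The same idea can be extended to any number of files with a loop over `glob.glob`. This is only a sketch under a few assumptions: the function name `concatenate_files` is made up, the `*Sped.txt` pattern is guessed from the two file names above, and ISO-8859-1 is assumed as the encoding. Unlike the snippet above, it opens the output in "w" mode, so running it twice does not append the data a second time.

```python
import glob


def concatenate_files(pattern, out_path, encoding="ISO-8859-1"):
    """Write the contents of every file matching pattern to out_path; return the paths used."""
    # Sorting keeps 01Sped.txt, 02Sped.txt, ... in a predictable order.
    paths = sorted(glob.glob(pattern))
    with open(out_path, "w", encoding=encoding, newline="") as out:
        for path in paths:
            with open(path, "r", encoding=encoding, newline="") as f:
                text = f.read()
            out.write(text)
            # Make sure the next file starts on a fresh line.
            if text and not text.endswith("\n"):
                out.write("\n")
    return paths
```

A call such as `concatenate_files("*Sped.txt", "spedFiscal.csv")` would then replace the repeated `with open(...)` blocks. The output name should not match the input pattern, otherwise a later run would read its own earlier output.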
#6
Hi
I came back here because I am still looking for a solution.

Why can Excel do this but pandas cannot?
If pandas can't do that, is there another way to read a file like this in Python 3?
Output:
|1|04122018|23122020|1|0|21,24
|0|1|1-500341|57|00|002||22926|32190114436310000168570020000229261000170629|02012019|08012019|0||2570,50|0|0|2570,50|2570,50|308,46|0|||3205002|3304557
|0|1|1-500341|57|00|002||22933|32190114436310000168570020000229331000180858|02012019|08012019|0||2570,50|0|0|2570,50|2570,50|308,46|0|||3205002|3304557
|0|1|1-500260|57|00|001||113344|35190101695000000116570010001133441000921178|03012019|16012019|0||3848|0|0|3848|3848|461,76|0|||3205002|3304557
|0|1|1-500145|57|00|001||2023317|32190105593147000156570010020233171051174677|03012019|09012019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3205309
|0|1|1-500145|57|00|001||2023318|32190105593147000156570010020233181051174682|03012019|09012019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3204609
|0|1|1-500145|57|00|001||2023319|32190105593147000156570010020233191051174701|03012019|09012019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3200607
|0|1|1-500145|57|00|001||2023321|32190105593147000156570010020233211051174726|03012019|09012019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3202801
|0|1|1-500145|57|00|001||2023322|32190105593147000156570010020233221051174731|03012019|09012019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3201902
I tried this, but data is lost afterwards:
import pandas as pd
import glob

allfiles = glob.glob(r'C:\Users\Me\Desktop\Projet_Txt\*.txt')
df = pd.concat((pd.read_csv(f, sep="|", header=None, encoding='ISO-8859-1', engine='python', error_bad_lines=False) for f in allfiles))

df.to_csv('resultado.csv', index=False, header=False)
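A likely reason for the lost data is that `error_bad_lines=False` makes pandas skip every line that has more fields than it expects, which here would include the long C100 lines (newer pandas versions replace this option with `on_bad_lines`). A file like this can also be read without pandas by going line by line and keeping only the wanted record type. The sketch below is an illustration under assumptions: the function names are made up, every record is assumed to start and end with a pipe, and the semicolon delimiter is chosen because the values use a decimal comma.

```python
import csv
import glob


def split_record(line, sep="|"):
    """Split one record into fields, ignoring the single leading and trailing separator."""
    text = line.rstrip("\r\n")
    if text.startswith(sep):
        text = text[len(sep):]
    if text.endswith(sep):
        text = text[:-len(sep)]
    return text.split(sep)


def extract_records(pattern, out_path, record_type="C100", encoding="ISO-8859-1"):
    """Copy every record of record_type from the matching files to a CSV; return the row count."""
    written = 0
    paths = sorted(glob.glob(pattern))
    with open(out_path, "w", encoding=encoding, newline="") as out:
        writer = csv.writer(out, delimiter=";")
        for path in paths:
            with open(path, "r", encoding=encoding) as f:
                for line in f:
                    fields = split_record(line)
                    # The first field holds the record type (0500, C100, C170, ...).
                    if fields[0] == record_type:
                        writer.writerow(fields)
                        written += 1
    return written
```

For example, `extract_records(r'C:\Users\Me\Desktop\Projet_Txt\*.txt', 'resultado.csv')` would collect the C100 rows from all files in that folder. Since all C100 rows have the same number of fields, the resulting file can be opened in Excel, or loaded afterwards with `pd.read_csv(..., sep=";", header=None)`.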
#7
You can use read_csv() to read a text file: call pd.read_csv(file) with the path of the text file as file, and it returns a DataFrame. See read_csv() in the pandas documentation.