Python Forum
Can't read text file with pandas
#1
Hi.
I would like help with two things:
1- I need to know why I cannot read a text file with pandas.
2- If it is possible to read this file, I want to get all rows that have the C100 value and insert them into a new xls file.
The file size is 12 MB.
Link to my file:
https://drive.google.com/file/d/1MfkVJtb...sp=sharing

import pandas as pd

df = pd.read_csv("01Sped.txt", sep = "|", header=None)
print(df)
Error:
Traceback (most recent call last):
  File "C:\Users\user\Downloads\readTxtFile.py", line 3, in <module>
    df = pd.read_csv("01Sped.txt", sep = "|", header=None)
  ....................
  File "pandas\_libs\parsers.pyx", line 899, in pandas._libs.parsers.TextReader.read
  File "pandas\_libs\parsers.pyx", line 914, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas\_libs\parsers.pyx", line 968, in pandas._libs.parsers.TextReader._read_rows
  File "pandas\_libs\parsers.pyx", line 955, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas\_libs\parsers.pyx", line 2172, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 17 fields in line 12001, saw 31
Thank you!!
#2
The error tells you exactly why you can't read the file. Line 12001 of the file has too many fields in it. You need to look at that line of the file and see what the issue is. If you post it here, and maybe the first few lines of the file for reference on what it's expecting, maybe we could help you figure that out.
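One way to look at the problem line without opening a 12MB file in an editor is to count the pipe-separated fields on every line. This is only a sketch: the three sample lines below stand in for the real file, and the helper name field_counts is made up for illustration.

```python
import io

# Stand-in for the real file (assumption: the real lines look like these).
sample = io.StringIO(
    "|0500|16052000|03|A|\n"
    "|0990|11999|\n"
    "|C100|0|1|V0000015516|55|00|1|\n"
)


def field_counts(lines, sep="|"):
    """Return (line_number, number_of_fields) pairs, numbering lines from 1."""
    return [
        (number, len(line.rstrip("\r\n").split(sep)))
        for number, line in enumerate(lines, start=1)
    ]


counts = field_counts(sample)

# Treat the first line's width as the expected one and list every wider line.
expected = counts[0][1]
too_wide = [(number, width) for number, width in counts if width > expected]

for number, width in too_wide:
    print(f"line {number} has {width} fields, expected {expected}")
```

With the real file you would pass an open file object (for example open("01Sped.txt", encoding="ISO-8859-1"), where the encoding is a guess) instead of the StringIO sample.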
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
Recommended Tutorials: BBCode, functions, classes, text adventures
#3
Hi

Before line 12001, this part of the file has 8 fields per row; from line 12001 onward, it has 29 fields.
The first C100 record starts at line 12001.
|C100|0|1|V0000015516|55|00|1|

Output:
|0500|16052000|03|A|00005|3120100001|DEVOLUÇÕES DE VENDA|
|0500|31082011|01|A|00005|1131700030|ICMS EM TRANSITO|
|0500|11122002|03|A|00006|3320400002|COMBUSTÍVEIS E LUBRIFICANTES|
|0500|25032003|03|A|00006|3320600010|MANUTENÇAO E REPARO DE IMÓVEIS|
|0500|11122002|03|A|00006|3321100014|REFEIÇÕES E LANCHES|
|0500|11122002|03|A|00006|3321100013|MANUTENÇÃO DE EQUIPAMENTOS DIVERSOS|
|0500|11122002|03|A|00006|3321100017|MATERIAIS DE EMBALAGENS E ACONDICIONAMENTO|
|0500|11122002|03|A|00006|3321100001|IMPRESSOS E MATERIAIS DE ESCRITÓRIO|
|0500|11012013|01|A|00005|1322100010|COMPRAS ENTREGA FUTURA|
|0500|09032006|02|A|00005|2160800020|PROVISÃO DE DESP. C/ TELECOMUNICAÇÕES|
|0500|16082011|03|A|00006|3321004003|Despesas com telephone|
|0990|11999|
|C001|0|
|C100|0|1|V0000015516|55|00|1|000093676|35141233078528000132550010000936761540175846|16122014|02012015|20512,32|1|0,00|0,00|20512,32|0|0,00|0,00|0,00 |0 ,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|
|C110|000001||
|C170|001|0105282|AZUKON MR 30MG CPR 1X30|960|CX|8377,19|0,00|0|260|2403|2403/AA|0,00|0,00|0,00|0,00|0,00|0,00|0|||0,00|0,00|0,00||0,00|0,0000|0,000| 0 0000|0,00||0,00|0,0000|0,000|0,0000|0,00|1170199999|
|C170|002|0106788|BETACARD PLUS 50MG CPR 1X30|48|CX|465,02|0,00|0|260|2403|2403/AA|0,00|0,00|0,00|0,00|0,00|0,00|0|||0,00|0,00|0,00||0,00|0,0000|0,00 0| 0,0000|0,00||0,00|0,0000|0,000|0,0000|0,00|1170199999|
|C170|003|0104839|INDAPEN SR 1,5MG CPR 1X30|1008|BLT|8919,01|0,00|0|260|2403|2403/AA|0,00|0,00|0,00|0,00|0,00|0,00|0|||0,00|0,00|0,00||0,00|0,0000|0, 0 0|0,0000|0,00||0,00|0,0000|0,000|0,0000|0,00|1170199999|
|C170|004|0110578|METTA SR 500MG CPR 1X30|36|CX|291,67|0,00|0|260|2403|2403/AA|0,00|0,00|0,00|0,00|0,00|0,00|0|||0,00|0,00|0,00||0,00|0,0000|0,000|0, 0 00|0,00||0,00|0,0000|0,000|0,0000|0,00|1170199999|
|C170|005|0114043|PIOGLIT 30MG CPR 1X30|24|BLT|917,12|0,00|0|260|2403|2403/AA|0,00|0,00|0,00|0,00|0,00|0,00|0|||0,00|0,00|0,00||0,00|0,0000|0,000|0,0 0 0|0,00||0,00|0,0000|0,000|0,0000|0,00|1170199999|
|C170|006|0116178|TORLOS H 50+12,5MG CPR REV 1X30|120|UN|1542,31|0,00|0|260|2403|2403/AA|0,00|0,00|0,00|0,00|0,00|0,00|0|||0,00|0,00|0,00||0,00|0,000 0| 0,000|0,0000|0,00||0,00|0,0000|0,000|0,0000|0,00|1170199999|
|C190|260|2403|0,00|20512,32|0,00|0,00|0,00|0,00|0,00|0,00||
|C100|0|1|V0000015872|55|00|1|000006404|35141207768134000368550010000064041653776796|19122014|02012015|3628,30|1|1034,72|0,00|4663,02|0|0,00|0,00|0,0 0| 0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|
|C110|000001||
|C170|001|0120782|OMNIC OCAS 0,4MG CPR 1X60|50|CX|4663,02|1034,72|0|260|2403|2403/AA|0,00|0,00|0,00|0,00|0,00|0,00|0|||0,00|0,00|0,00||0,00|0,0000|0, 0 0|0,0000|0,00||0,00|0,0000|0,000|0,0000|0,00|1170199999|
|C190|260|2403|0,00|3628,30|0,00|0,00|0,00|0,00|0,00|0,00||
|C100|0|1|V0000015095|55|00|1|000721941|35141257507378000365550010007219411052247922|18122014|02012015|1209,16|1|1500,44|0,00|2709,60|0|0,00|0,00|0,0 0| 0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|
#4
Pandas isn't set up for that sort of thing, because you can't have a different number of columns in different rows of a dataframe. Perhaps someone more familiar with pandas.read_csv can correct me, but I don't see a way to assume extra columns and fill them with dummy values. Therefore you would need to use skiprows and nrows (see the pandas.read_csv docs) to load different sections of the file into different dataframes. If you need them to be one dataframe, you can combine them after they are loaded.
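As a rough illustration of the skiprows/nrows idea, the sketch below loads a narrow section and a wide section separately and then combines them. The five-line sample and the value of first_wide_row are assumptions made for the example; in the real file the wide rows are reported to start at line 12001.

```python
import io

import pandas as pd

# Hypothetical miniature file: three narrow rows followed by two wide rows.
text = (
    "|0500|A|\n"
    "|0500|B|\n"
    "|0990|3|\n"
    "|C100|0|1|X|55|\n"
    "|C100|0|1|Y|55|\n"
)

first_wide_row = 3  # zero-based index of the first wide row (assumed known)

# Read only the narrow section: stop before the first wide row.
narrow = pd.read_csv(
    io.StringIO(text), sep="|", header=None, nrows=first_wide_row, dtype=str
)

# Read only the wide section: skip everything before it.
wide = pd.read_csv(
    io.StringIO(text), sep="|", header=None, skiprows=first_wide_row, dtype=str
)

# Combine afterwards; cells the narrow rows do not have become NaN.
combined = pd.concat([narrow, wide], ignore_index=True)
print(combined.shape)
```

Because every line starts and ends with a pipe, the first and last columns of each dataframe are empty. This only works cleanly when each section has a consistent width, so a file that mixes record types row by row would need many such sections.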
#5
I did it a different way. The problem is that I have 12 files to read.
After that I need to split the lines on the | (pipe) character.

Can someone help me read 12 files?

with open('01Sped.txt', 'r') as f:
    dados = f.read()

with open('02Sped.txt', 'r') as f:
    dados2 = f.read()

output = dados+dados2
with open("spedFiscal.csv", "a") as myfile:
    myfile.write(output)

print("Finished!")
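A loop over a file pattern avoids one with-block per file. The sketch below uses only the standard library, keeps just the C100 lines, and splits them on the pipe. The function names, the *Sped.txt pattern, and the ISO-8859-1 encoding are assumptions, and the two tiny files are created in a temporary folder only so the example can run on its own.

```python
import csv
import glob
import os
import tempfile


def split_record(line):
    """Split one |A|B|C| line into ['A', 'B', 'C'], dropping only the outer pipes."""
    line = line.rstrip("\r\n")
    if line.startswith("|"):
        line = line[1:]
    if line.endswith("|"):
        line = line[:-1]
    return line.split("|")


def collect_records(pattern, record_type="C100", encoding="ISO-8859-1"):
    """Return the split fields of every matching line in every matching file."""
    rows = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "r", encoding=encoding) as f:
            for line in f:
                fields = split_record(line)
                if fields[0] == record_type:
                    rows.append(fields)
    return rows


# Demo with two stand-in files instead of the real 12.
with tempfile.TemporaryDirectory() as folder:
    samples = {
        "01Sped.txt": "|0990|11999|\n|C100|0|1|V0000015516|55|00|1|\n",
        "02Sped.txt": "|C110|000001||\n|C100|0|1|V0000015872|55|00|1|\n",
    }
    for name, content in samples.items():
        with open(os.path.join(folder, name), "w", encoding="ISO-8859-1") as f:
            f.write(content)

    rows = collect_records(os.path.join(folder, "*Sped.txt"))

    # Semicolon delimiter, because the values themselves use decimal commas.
    out_path = os.path.join(folder, "spedFiscal.csv")
    with open(out_path, "w", newline="", encoding="ISO-8859-1") as out:
        csv.writer(out, delimiter=";").writerows(rows)

    with open(out_path, encoding="ISO-8859-1") as f:
        written = f.read().splitlines()

print(written)
```

For the real data, pointing collect_records at the folder that holds the 12 files should be enough. The output is opened with "w" rather than "a" so that running the script twice does not duplicate the rows.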
#6
Hi
I came back here because I am still looking for a solution.

Why can Excel do this but pandas cannot?
If pandas can't do that, is there another way to read a file like this in Python 3?
Output:
|1|04122018|23122020|1|0|21,24
|0|1|1-500341|57|00|002||22926|32190114436310000168570020000229261000170629|02012019|08012019|0||2570,50|0|0|2570,50|2570,50|308,46|0|||3205002|3304557
|0|1|1-500341|57|00|002||22933|32190114436310000168570020000229331000180858|02012019|08012019|0||2570,50|0|0|2570,50|2570,50|308,46|0|||3205002|3304557
|0|1|1-500260|57|00|001||113344|35190101695000000116570010001133441000921178|03012019|16012019|0||3848|0|0|3848|3848|461,76|0|||3205002|3304557
|0|1|1-500145|57|00|001||2023317|32190105593147000156570010020233171051174677|03012019|09012019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3205309
|0|1|1-500145|57|00|001||2023318|32190105593147000156570010020233181051174682|03012019|09012019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3204609
|0|1|1-500145|57|00|001||2023319|32190105593147000156570010020233191051174701|03012019|09012019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3200607
|0|1|1-500145|57|00|001||2023321|32190105593147000156570010020233211051174726|03012019|09012019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3202801
|0|1|1-500145|57|00|001||2023322|32190105593147000156570010020233221051174731|03012019|09012019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3201902
I tried this, but some data is lost afterwards:
import pandas as pd
import glob

allfiles = glob.glob(r'C:\Users\Me\Desktop\Projet_Txt\*.txt')
df = pd.concat((pd.read_csv(f, sep="|", header=None, encoding='ISO-8859-1', engine='python', error_bad_lines=False) for f in allfiles))

df.to_csv('resultado.csv', index=False, header=False)
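The lost data is most likely the effect of error_bad_lines=False, which tells pandas to drop every line that has more fields than expected (newer pandas versions replace that option with on_bad_lines). A possible alternative, sketched here under the assumption that the widest row can be found in a first pass, is to give read_csv enough column names up front so that shorter rows are padded with NaN instead of wider rows being discarded. The four sample lines are made up for the example.

```python
import io

import pandas as pd

# Hypothetical ragged sample: rows of 4, 4, 7 and 5 pipe-separated fields.
text = (
    "|0990|3|\n"
    "|C001|0|\n"
    "|C100|0|1|X|55|\n"
    "|C110|000001||\n"
)

# First pass: find the widest row so pandas can allocate enough columns.
max_fields = max(len(line.split("|")) for line in text.splitlines())

# Second pass: explicit column names make pandas pad the shorter rows with NaN.
df = pd.read_csv(
    io.StringIO(text),
    sep="|",
    header=None,
    names=list(range(max_fields)),
    dtype=str,
)

# Column 0 is the empty field before the leading pipe; column 1 is the record type.
c100 = df[df[1] == "C100"]
print(c100)
```

From there c100.to_csv(...) works as in the code above. Writing an Excel file with c100.to_excel(...) should also be possible, but it needs an Excel writer package such as openpyxl installed.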
#7
You can use read_csv() to read a text file. Call pd.read_csv(file) with the path name of a text file as file to return a DataFrame. See pd.read_csv() in the pandas documentation.