Python Forum
Can't read text file with pandas - Printable Version




Can't read text file with pandas - zinho - May-23-2019

Hi.
I would like help to:
1- Understand why I cannot read a text file with pandas.
2- If possible, read this file, get all rows that have the C100 value and insert them into a new xls file.
The file size is 12 MB.
Link to my file:
https://drive.google.com/file/d/1MfkVJtbrNhwEdXRluAZNnixqRlOSVgQc/view?usp=sharing

import pandas as pd

df = pd.read_csv("01Sped.txt", sep = "|", header=None)
print(df)
Error:
Traceback (most recent call last):
  File "C:\Users\user\Downloads\readTxtFile.py", line 3, in <module>
    df = pd.read_csv("01Sped.txt", sep = "|", header=None)
....................
  File "pandas\_libs\parsers.pyx", line 899, in pandas._libs.parsers.TextReader.read
  File "pandas\_libs\parsers.pyx", line 914, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas\_libs\parsers.pyx", line 968, in pandas._libs.parsers.TextReader._read_rows
  File "pandas\_libs\parsers.pyx", line 955, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas\_libs\parsers.pyx", line 2172, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 17 fields in line 12001, saw 31
Thank you!!
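For point 2, a minimal sketch that needs no pandas: csv.reader splits each line on the pipe, the lines whose record type (assumed to be the first field, as in the sample posted further down) is C100 are kept, and the result is written as CSV, which Excel can open. extract_records and write_csv are hypothetical helpers, and the demo uses a few lines copied from that sample.

```python
import csv
import io


def extract_records(src, record_type="C100"):
    """Return the fields of every |TYPE|...| line whose TYPE equals record_type.

    csv.reader with delimiter="|" turns "|C100|0|1|" into
    ["", "C100", "0", "1", ""]; the empty first and last items come from the
    outer pipes and are dropped. QUOTE_NONE treats '"' as ordinary text.
    """
    rows = []
    for row in csv.reader(src, delimiter="|", quoting=csv.QUOTE_NONE):
        if len(row) > 2 and row[1] == record_type:
            rows.append(row[1:-1])
    return rows


def write_csv(rows, dst):
    """Write the rows to a text file object as comma-separated values."""
    csv.writer(dst).writerows(rows)


# Small in-memory demo with lines taken from the sample in this thread
sample = io.StringIO(
    "|C001|0|\n"
    "|C100|0|1|V0000015516|55|00|1|\n"
    "|C110|000001||\n"
    "|C100|0|1|V0000015872|55|00|1|\n"
)
c100 = extract_records(sample)
out = io.StringIO()
write_csv(c100, out)
print(out.getvalue())
```

For the real file, something like open('01Sped.txt', encoding='ISO-8859-1', newline='') could be passed as src (the encoding is a guess based on a later post in this thread) and open('c100.csv', 'w', newline='') as dst. If an actual Excel workbook is required, pandas' DataFrame.to_excel can write one, but it needs an engine such as openpyxl installed.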


RE: Can't read text file with pandas - ichabod801 - May-23-2019

The error tells you exactly why you can't read the file: line 12001 of the file has too many fields. You need to look at that line of the file and see what the issue is. If you post it here, along with the first few lines of the file for reference on what pandas is expecting, maybe we can help you figure it out.
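To locate the offending lines without opening a 12 MB file in an editor, a short sketch can count the fields on each line as pipes + 1, which is how a "|"-separated line splits as long as no field is quoted (an assumption). With header=None pandas sizes the table from the first line and pads shorter lines with NaN, so only lines wider than the first one are reported. find_wide_lines is a hypothetical helper, not part of pandas.

```python
def find_wide_lines(lines, sep="|", limit=5):
    """Return (expected, wide).

    expected is the field count of line 1; wide lists (line_number,
    field_count, text) for up to `limit` later lines with more fields.
    Fields are counted as separators + 1, assuming no quoted fields.
    """
    expected = None
    wide = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        count = text.count(sep) + 1
        if expected is None:
            expected = count  # the width taken from the first line
        elif count > expected:
            wide.append((number, count, text))
            if len(wide) == limit:
                break
    return expected, wide


sample = [
    "|0000|a|b|\n",                      # hypothetical first line: 5 fields
    "|0990|11999|\n",                    # 4 fields: pandas would pad with NaN
    "|C100|0|1|V0000015516|55|00|1|\n",  # 9 fields: this kind of line raises
]
expected, wide = find_wide_lines(sample)
print(expected, wide)
```

On the real file, find_wide_lines(open('01Sped.txt', encoding='ISO-8859-1')) would be expected to report line 12001 first, although line numbers can differ from pandas' if the file contains blank lines.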


RE: Can't read text file with pandas - zinho - May-23-2019

Hi

Before line 12001, this part of the file has 8 fields; from line 12001 onward, it has 29 fields.
The first C100 record starts at line 12001:
|C100|0|1|V0000015516|55|00|1|

Output:
|0500|16052000|03|A|00005|3120100001|DEVOLUÇÕES DE VENDA|
|0500|31082011|01|A|00005|1131700030|ICMS EM TRANSITO|
|0500|11122002|03|A|00006|3320400002|COMBUSTÍVEIS E LUBRIFICANTES|
|0500|25032003|03|A|00006|3320600010|MANUTENÇAO E REPARO DE IMÓVEIS|
|0500|11122002|03|A|00006|3321100014|REFEIÇÕES E LANCHES|
|0500|11122002|03|A|00006|3321100013|MANUTENÇÃO DE EQUIPAMENTOS DIVERSOS|
|0500|11122002|03|A|00006|3321100017|MATERIAIS DE EMBALAGENS E ACONDICIONAMENTO|
|0500|11122002|03|A|00006|3321100001|IMPRESSOS E MATERIAIS DE ESCRITÓRIO|
|0500|11012013|01|A|00005|1322100010|COMPRAS ENTREGA FUTURA|
|0500|09032006|02|A|00005|2160800020|PROVISÃO DE DESP. C/ TELECOMUNICAÇÕES|
|0500|16082011|03|A|00006|3321004003|Despesas com telephone|
|0990|11999|
|C001|0|
|C100|0|1|V0000015516|55|00|1|000093676|35141233078528000132550010000936761540175846|16122014|02012015|20512,32|1|0,00|0,00|20512,32|0|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|
|C110|000001||
|C170|001|0105282|AZUKON MR 30MG CPR 1X30|960|CX|8377,19|0,00|0|260|2403|2403/AA|0,00|0,00|0,00|0,00|0,00|0,00|0|||0,00|0,00|0,00||0,00|0,0000|0,000|0 0000|0,00||0,00|0,0000|0,000|0,0000|0,00|1170199999|
|C170|002|0106788|BETACARD PLUS 50MG CPR 1X30|48|CX|465,02|0,00|0|260|2403|2403/AA|0,00|0,00|0,00|0,00|0,00|0,00|0|||0,00|0,00|0,00||0,00|0,0000|0,000|0,0000|0,00||0,00|0,0000|0,000|0,0000|0,00|1170199999|
|C170|003|0104839|INDAPEN SR 1,5MG CPR 1X30|1008|BLT|8919,01|0,00|0|260|2403|2403/AA|0,00|0,00|0,00|0,00|0,00|0,00|0|||0,00|0,00|0,00||0,00|0,0000|0,00|0,0000|0,00||0,00|0,0000|0,000|0,0000|0,00|1170199999|
|C170|004|0110578|METTA SR 500MG CPR 1X30|36|CX|291,67|0,00|0|260|2403|2403/AA|0,00|0,00|0,00|0,00|0,00|0,00|0|||0,00|0,00|0,00||0,00|0,0000|0,000|0,000|0,00||0,00|0,0000|0,000|0,0000|0,00|1170199999|
|C170|005|0114043|PIOGLIT 30MG CPR 1X30|24|BLT|917,12|0,00|0|260|2403|2403/AA|0,00|0,00|0,00|0,00|0,00|0,00|0|||0,00|0,00|0,00||0,00|0,0000|0,000|0,000|0,00||0,00|0,0000|0,000|0,0000|0,00|1170199999|
|C170|006|0116178|TORLOS H 50+12,5MG CPR REV 1X30|120|UN|1542,31|0,00|0|260|2403|2403/AA|0,00|0,00|0,00|0,00|0,00|0,00|0|||0,00|0,00|0,00||0,00|0,0000|0,000|0,0000|0,00||0,00|0,0000|0,000|0,0000|0,00|1170199999|
|C190|260|2403|0,00|20512,32|0,00|0,00|0,00|0,00|0,00|0,00||
|C100|0|1|V0000015872|55|00|1|000006404|35141207768134000368550010000064041653776796|19122014|02012015|3628,30|1|1034,72|0,00|4663,02|0|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|
|C110|000001||
|C170|001|0120782|OMNIC OCAS 0,4MG CPR 1X60|50|CX|4663,02|1034,72|0|260|2403|2403/AA|0,00|0,00|0,00|0,00|0,00|0,00|0|||0,00|0,00|0,00||0,00|0,0000|0,00|0,0000|0,00||0,00|0,0000|0,000|0,0000|0,00|1170199999|
|C190|260|2403|0,00|3628,30|0,00|0,00|0,00|0,00|0,00|0,00||
|C100|0|1|V0000015095|55|00|1|000721941|35141257507378000365550010007219411052247922|18122014|02012015|1209,16|1|1500,44|0,00|2709,60|0|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|0,00|
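The sample suggests why the width changes: the first field of each line appears to be a record type (0500, 0990, C001, C100, C110, C170, C190), and each type has its own layout. One way around the one-width-per-table limit is to group lines by that first field and build one table per type. A sketch under that assumption; split_record and group_by_record_type are hypothetical helpers, and the demo lines are shortened copies of lines from the sample above.

```python
from collections import defaultdict


def split_record(line):
    """Split '|A|B||' into ['A', 'B', '']: drop one outer pipe per side, keep inner empties."""
    line = line.rstrip("\r\n")
    if line.startswith("|"):
        line = line[1:]
    if line.endswith("|"):
        line = line[:-1]
    return line.split("|")


def group_by_record_type(lines):
    """Map each record type (first field, e.g. '0500', 'C100') to the list of its rows."""
    groups = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        fields = split_record(line)
        groups[fields[0]].append(fields)
    return dict(groups)


sample = [
    "|0990|11999|\n",
    "|C001|0|\n",
    "|C100|0|1|V0000015516|55|00|1|\n",
    "|C110|000001||\n",
    "|C170|001|0105282|AZUKON MR 30MG CPR 1X30|960|CX|\n",
    "|C170|002|0106788|BETACARD PLUS 50MG CPR 1X30|48|CX|\n",
]
groups = group_by_record_type(sample)
for record_type, rows in groups.items():
    print(record_type, len(rows), "rows,", len(rows[0]), "fields in the first row")
```

With pandas installed, pd.DataFrame(groups['C100']) would then give a table of only the C100 rows, and each group could be saved to its own sheet or CSV file.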



RE: Can't read text file with pandas - ichabod801 - May-23-2019

Pandas isn't set up for that sort of thing, because you can't have a different number of columns in different rows of a dataframe. Perhaps someone more familiar with pandas.read_csv can correct me, but I don't see a way to assume extra columns and fill them with dummy values. Therefore, you would need to use skiprows and nrows (see the pandas.read_csv docs) to load the different sections of the file into different dataframes. If you need them in one dataframe, you can combine them after they are loaded.
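A minimal sketch of the skiprows/nrows approach on a small in-memory sample; read_section is a hypothetical helper. It only works when each section has a single width and you know where it starts and how long it is; in the SPED sample, C100, C110 and C170 lines are interleaved after line 12001, so sections there would be short. dtype=str keeps values such as 00005 and 20512,32 as text, and the always-empty columns produced by the leading and trailing pipes are dropped.

```python
import io

import pandas as pd


def read_section(source, first_line, n_lines, sep="|"):
    """Read n_lines lines after skipping the first `first_line` lines of source."""
    df = pd.read_csv(
        source,
        sep=sep,
        header=None,
        skiprows=first_line,  # skip the lines before this section
        nrows=n_lines,        # stop before the next, differently sized section
        dtype=str,            # keep codes and comma decimals as text
    )
    # Leading/trailing pipes give columns that are empty on every row
    return df.dropna(axis=1, how="all")


text = (
    "|0500|16052000|03|\n"
    "|0500|31082011|01|\n"
    "|C100|0|1|V0000015516|55|\n"
    "|C100|0|1|V0000015872|55|\n"
)
# A fresh buffer per call, because read_csv consumes it
top = read_section(io.StringIO(text), 0, 2)
bottom = read_section(io.StringIO(text), 2, 2)
# Combining afterwards: missing columns are filled with NaN
combined = pd.concat([top, bottom], ignore_index=True)
print(combined)
```

With a real file, the path can be passed as source directly, since read_csv reopens a path on every call.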


RE: Can't read text file with pandas - zinho - May-23-2019

I tried a different way; the problem is that I have 12 files to read.
After that I need to split on the | (pipe) character.

Can someone help me read all 12 files?

with open('01Sped.txt', 'r') as f:
    dados = f.read()

with open('02Sped.txt', 'r') as f:
    dados2 = f.read()

output = dados+dados2
with open("spedFiscal.csv", "a") as myfile:
    myfile.write(output)

print("Finished!")
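A sketch for merging all twelve files in one loop instead of one variable per file. It assumes the files are named like 01Sped.txt, 02Sped.txt, ... (pattern *Sped.txt) and are ISO-8859-1 encoded, as in a later post in this thread; find_sped_files and merge_text_files are hypothetical helpers. Opening the output with "w" instead of "a" avoids appending a second copy each time the script runs, and a newline is added when a file does not end with one, so two records never run together.

```python
import glob
import os


def find_sped_files(folder, pattern="*Sped.txt"):
    """Return the files in folder that match pattern, sorted so 01 comes before 02."""
    return sorted(glob.glob(os.path.join(folder, pattern)))


def merge_text_files(paths, out_path, encoding="ISO-8859-1"):
    """Write the lines of every file in paths, in order, into out_path.

    Returns the number of files merged.
    """
    merged = 0
    # "w" (not "a") so re-running the script does not append a second copy
    with open(out_path, "w", encoding=encoding) as out:
        for path in paths:
            with open(path, encoding=encoding) as src:
                for line in src:
                    # Keep the last line of one file from running into
                    # the first line of the next
                    out.write(line if line.endswith("\n") else line + "\n")
            merged += 1
    return merged
```

Usage would be along the lines of merge_text_files(find_sped_files("."), "spedFiscal.txt"); each line of the merged file can then be split with line.split("|").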



RE: Can't read text file with pandas - zinho - May-15-2020

Hi
I came back here because I am still looking for a solution.

Why can Excel do this but pandas can't?
If pandas can't do that, is there a way to read a file like this in Python 3?
Output:
|1|04122018|23122020|1|0|21,24
|0|1|1-500341|57|00|002||22926|32190114436310000168570020000229261000170629|02012019|08012019|0||2570,50|0|0|2570,50|2570,50|308,46|0|||3205002|3304557
|0|1|1-500341|57|00|002||22933|32190114436310000168570020000229331000180858|02012019|08012019|0||2570,50|0|0|2570,50|2570,50|308,46|0|||3205002|3304557
|0|1|1-500260|57|00|001||113344|35190101695000000116570010001133441000921178|03012019|16012019|0||3848|0|0|3848|3848|461,76|0|||3205002|3304557
|0|1|1-500145|57|00|001||2023317|32190105593147000156570010020233171051174677|03012019|09012019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3205309
|0|1|1-500145|57|00|001||2023318|32190105593147000156570010020233181051174682|03012019|09012019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3204609
|0|1|1-500145|57|00|001||2023319|32190105593147000156570010020233191051174701|03012019|09012019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3200607
|0|1|1-500145|57|00|001||2023321|32190105593147000156570010020233211051174726|03012019|09012019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3202801
|0|1|1-500145|57|00|001||2023322|32190105593147000156570010020233221051174731|03012019|09012019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3201902
I tried this, but some data was lost:
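import pandas as pd
import glob

# Every .txt file in the project folder
allfiles = glob.glob(r'C:\Users\Me\Desktop\Projet_Txt\*.txt')
# error_bad_lines=False makes pandas skip (instead of raising on) every line
# with more fields than the first line, so those rows never reach the DataFrame.
# In pandas >= 1.3 this option is on_bad_lines='skip'; error_bad_lines was removed in 2.0.
df = pd.concat((pd.read_csv(f, sep="|", header=None, encoding='ISO-8859-1', engine='python', error_bad_lines=False) for f in allfiles))

# Write all rows to one CSV without the index or a header row
df.to_csv('resultado.csv', index=False, header=False)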
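In the sample above the first line has only 7 fields and the following lines have many more, so with header=None every one of them would count as a "bad line" and be skipped, which would explain the lost data. Excel's text import does not appear to fix the column count from the first row; it simply makes the sheet as wide as the longest line. pandas can be made to behave the same way by telling read_csv how many columns to expect through names=, after which shorter lines are padded with NaN instead of longer ones being dropped. A sketch, assuming ISO-8859-1 encoding as in the post and no quoted fields; max_fields, read_ragged and read_all are hypothetical helpers.

```python
import csv

import pandas as pd


def max_fields(path, sep="|", encoding="ISO-8859-1"):
    """Return the largest number of sep-separated fields on any line of path."""
    with open(path, encoding=encoding) as f:
        return max((line.rstrip("\r\n").count(sep) + 1 for line in f), default=1)


def read_ragged(path, sep="|", encoding="ISO-8859-1"):
    """Read a file whose lines have different field counts without dropping any.

    names= gets as many columns as the widest line, so no line is "too long";
    shorter lines are padded with NaN.
    """
    width = max_fields(path, sep, encoding)
    return pd.read_csv(
        path,
        sep=sep,
        header=None,
        names=list(range(width)),
        encoding=encoding,
        dtype=str,               # keep codes like 04122018 and 21,24 as text
        quoting=csv.QUOTE_NONE,  # treat '"' as ordinary text, matching max_fields
    )


def read_all(paths, **kwargs):
    """Read every file with read_ragged and stack the results into one DataFrame."""
    return pd.concat((read_ragged(p, **kwargs) for p in paths), ignore_index=True)
```

For the folder in the post, read_all(glob.glob(r'C:\Users\Me\Desktop\Projet_Txt\*.txt')).to_csv('resultado.csv', index=False, header=False) would then keep every line, with empty cells where a line is shorter than the widest one. Each file is read twice (once to measure, once to parse), which should be acceptable at around 12 MB.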



RE: Can't read text file with pandas - azajali43 - May-24-2020

You can use pd.read_csv() to read a text file: call pd.read_csv(file) with the path of the text file as file, and it returns a pd.DataFrame. See read_csv() in the pandas documentation.