Python Forum
Read multiple text files and get specific lines based on criteria
#1
How can I read multiple text files, get specific lines based on a criterion, and save them into a master CSV file?

I tried this:
import glob
import pandas as pd

lista = []

try:
    for f_name in glob.glob(r'C:\Users\zinho\Desktop\Projet_Txt'):
        if f_name.endswith('.txt'):
            with open(f_name, 'r') as f:
                for i in f:
                    if i[:6] == '|D100|':
                        lista.append(i)

except UnicodeDecodeError:
    pass

df = pd.DataFrame(lista)
df.to_csv('Master.csv', index=False, header=False)
Note: I can't use pandas to read the files, because everything I tried failed.
Reply
#2
Look into os.walk and readlines and split
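A minimal sketch of what that suggestion could look like, assuming the rows are pipe-delimited and the record tag is the first field after the leading pipe. The function name `find_lines` and the `errors="replace"` setting are illustrative assumptions, not something posted in this thread.

```python
import os


def find_lines(root_dir, tag="D100"):
    """Walk root_dir and return lines from .txt files whose first field equals tag."""
    matches = []
    for folder, _dirs, files in os.walk(root_dir):
        for name in files:
            if not name.lower().endswith(".txt"):
                continue
            path = os.path.join(folder, name)
            # Assumption: errors="replace" keeps reading past undecodable bytes
            # instead of raising UnicodeDecodeError.
            with open(path, "r", errors="replace") as handle:
                for line in handle.readlines():
                    # "|D100|0|1|".split("|") gives ["", "D100", "0", "1", ""]
                    fields = line.split("|")
                    if len(fields) > 1 and fields[1] == tag:
                        matches.append(line.rstrip("\n"))
    return matches
```

Calling `find_lines(".")` would scan the current folder and every subfolder below it.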
I welcome all feedback.
The only dumb question, is one that doesn't get asked.
My Github
How to post code using bbtags


Reply
#3
I don't understand. At home I found a way to solve my problem, but when I test this code at work it doesn't work.

After running this code (with the file paths changed for my work computer), it doesn't show the D100 lines; it shows nothing.

import glob
import pandas as pd

filepaths = glob.glob("/home/zinho/Downloads/*.txt")
lista = []

'''
Reads several txt files
Copies the lines matching the |D100| criterion and saves them to a csv

'''

try:
    for fp in filepaths:
        with open(fp, 'r') as f:
            lin = f.readlines()
            for cnt in lin:
                if cnt[:6] == '|D100|':
                    lista.append(cnt)

# This line handles the end of the file with strange characters
except UnicodeDecodeError:
    pass

df = pd.DataFrame(lista)
df.to_csv('/home/zinho/Downloads/Master.csv', index=False, header=False)
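Two things in the code above can produce an empty result without any message: `glob.glob` returns an empty list when the pattern matches nothing, and because the `try` wraps the whole loop, the first `UnicodeDecodeError` ends the loop and skips every remaining file. One possible way to see which case applies is to report per file. This is only a sketch; `scan_files` and its report format are hypothetical names.

```python
import glob


def scan_files(pattern, prefix="|D100|"):
    """Return (matching_lines, report).

    report maps each path to the number of matching lines,
    or to an error string if the file could not be read.
    """
    matches = []
    report = {}
    for path in glob.glob(pattern):
        try:
            with open(path, "r") as handle:
                found = [line for line in handle if line.startswith(prefix)]
        except (UnicodeDecodeError, OSError) as exc:
            # Only this file is skipped; the loop continues with the next one.
            report[path] = "error: " + type(exc).__name__
            continue
        matches.extend(found)
        report[path] = len(found)
    return matches, report
```

An empty report means the pattern matched no files at all, which points at the path rather than at the file contents.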
Reply
#4
This works for me
#!/usr/bin/env python3

import os

for root, dirs, files in os.walk('./', topdown=True):

    for name in files:
        if name.endswith('.txt'):
            # Join with root so files inside subfolders can be opened too
            with open(os.path.join(root, name), 'r') as lines:
                for line in lines:
                    for word in line.split():
                        if '|D100|' in word:
                            print(line)
Output:
this is some text with |D100|
text with |D100|
still more |D100|
one more text with |D100|
Directory structure
Output:
├── another.txt
├── my.txt
└── walk.py

Contents of my.txt
this is some text with |D100|
this text does not have it
more text
text with |D100|

Contents of another.txt
This is another file text
More here
still more |D100|
one more text with |D100|
Reply
#5
I tried your code, but the result is the same as mine: nothing is shown.

My text file has about 70K rows; here is a piece of it as an example.
Output:
|C590|090|1253|0|710,44|0|0|0|0|0||
|C500|0|1|1-600281|06|00||||25878154|23042019|25042019|15309,16|0|15309,16|0|0|0|0|0|0|0||252,60|1163,50|||
|C590|090|1253|0|15309,16|0|0|0|0|0||
|C990|885509|
|D001|0|
|D100|0|1|1-500145|57|00|001||2097819|32190405593147000156570010020978191053226735|01042019|08042019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3205309|
|D190|000|1353|12|313,46|313,46|37,62|0||
|D100|0|1|1-500145|57|00|001||2097820|32190405593147000156570010020978201053226752|01042019|08042019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3204609|
|D190|000|1353|12|313,46|313,46|37,62|0||
|D100|0|1|1-500145|57|00|001||2097821|32190405593147000156570010020978211053226768|01042019|08042019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3200607|
|D190|000|1353|12|313,46|313,46|37,62|0||
|D100|0|1|1-500145|57|00|001||2097822|32190405593147000156570010020978221053226773|01042019|08042019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3202801|
|D190|000|1353|12|313,46|313,46|37,62|0||
|D100|0|1|1-500145|57|00|001||2097823|32190405593147000156570010020978231053226789|01042019|08042019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3201902|
|D190|000|1353|12|313,46|313,46|37,62|0||
|D100|0|1|1-500145|57|00|001||2097824|32190405593147000156570010020978241053226794|01042019|08042019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3202405|
|D190|000|1353|12|313,46|313,46|37,62|0||
|D100|0|1|1-500145|57|00|001||2097825|32190405593147000156570010020978251053226805|01042019|08042019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3205200|
|D190|000|1353|12|313,46|313,46|37,62|0||
|D100|0|1|1-500145|57|00|001||2097826|32190405593147000156570010020978261053226810|01042019|08042019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3201308|
|D190|000|1353|12|313,46|313,46|37,62|0||
|D100|0|1|1-500145|57|00|001||2097827|32190405593147000156570010020978271053226826|01042019|08042019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3201407|
|D190|000|1353|12|313,46|313,46|37,62|0||
|D100|0|1|1-500145|57|00|001||2097828|32190405593147000156570010020978281053226831|01042019|08042019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3202207|


After running your code:
Output:
Python 3.8.2 (tags/v3.8.2:7b3ab59, Feb 25 2020, 22:45:29) [MSC v.1916 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>>
= RESTART: C:/Users/zinho/Documents/Projetos_Python/gettext_V2.py
>>>
My original file is here
https://santacruzdistribuidora-my.sharep...g?e=GWdqxr
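Going by the sample rows in this post, each record starts and ends with a pipe, and the first field is the record tag (C590, D100, D190 and so on). A minimal sketch of splitting one row into its tag and fields follows. The helper names are hypothetical, and treating values such as 313,46 as decimal-comma numbers is an assumption based only on how they look.

```python
def parse_record(line):
    """Split one pipe-delimited row into (record_tag, fields)."""
    parts = line.strip().split("|")
    # Drop the empty strings produced by the leading and trailing pipe.
    if parts and parts[0] == "":
        parts = parts[1:]
    if parts and parts[-1] == "":
        parts = parts[:-1]
    if not parts:
        return None, []
    return parts[0], parts[1:]


def to_number(text):
    """Convert a decimal-comma string such as '313,46' to a float (assumed format)."""
    return float(text.replace(",", "."))
```

Empty fields inside a row are kept as empty strings, so field positions stay aligned from one record to the next.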
Reply
#6
Hi,

I finally solved this.
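import errno
import glob
import pandas as pd

path = 'C:\\Users\\zinho\\Desktop\\Projet_Txt\\*.txt'
f_names = glob.glob(path)
lista = []


for file in f_names:
    try:
        with open(file, 'r') as f:

            try:
                # Keep only the lines that start with the |D100| record tag
                for line in f:
                    if line[:6] == '|D100|':
                        lista.append(line)

            # Stop reading this file at the first undecodable byte,
            # keeping the lines collected so far
            except UnicodeDecodeError:
                pass

    except IOError as exc:
        # Ignore "is a directory" errors; re-raise any other I/O error
        if exc.errno != errno.EISDIR:
            raise


df = pd.DataFrame(lista)
df.to_csv('C:\\Users\\zinho\\Downloads\\Master.csv', index=False, header=False)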
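With the solution above, a file stops being read at the first undecodable byte, so any D100 lines after that point are lost. A possible variation, sketched here under assumptions, opens each file with an explicit encoding and writes one CSV column per field with the standard `csv` module. The name `export_records` and the `latin-1` default are guesses, not something confirmed in this thread.

```python
import csv
import glob


def export_records(pattern, out_path, prefix="|D100|", encoding="latin-1"):
    """Write every line starting with prefix to out_path, one CSV column per field.

    Returns the number of rows written.
    """
    count = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in sorted(glob.glob(pattern)):
            # latin-1 maps every byte to a character, so decoding cannot fail;
            # errors="replace" only matters if another encoding is passed in.
            with open(path, "r", encoding=encoding, errors="replace") as handle:
                for line in handle:
                    if not line.startswith(prefix):
                        continue
                    # Drop the empty string before the leading pipe.
                    fields = line.strip().split("|")[1:]
                    # Drop the empty string after the trailing pipe, if present.
                    if fields and fields[-1] == "":
                        fields.pop()
                    writer.writerow(fields)
                    count += 1
    return count
```

The return value gives a quick check that something was found, for example `export_records(path, 'Master.csv')` using the same `path` pattern as above.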
Reply

