Python Forum
Read multiple text files and get specific lines based on criteria
#1
How can I read multiple text files, pick out specific lines by a criterion, and save them into a master CSV file?

I tried this:
import glob
import pandas as pd

lista = []

try:
    for f_name in glob.glob(r'C:\Users\zinho\Desktop\Projet_Txt'):
        if f_name.endswith('.txt'):
            with open(f_name, 'r') as f:
                for i in f:
                    if i[:6] == '|D100|':
                        lista.append(i)

except UnicodeDecodeError:
    pass

df = pd.DataFrame(lista)
df.to_csv('Master.csv', index=False, header=False)
Note: I can't use pandas to read the files, because everything I tried failed.
#2
Look into os.walk, readlines, and split.
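A minimal sketch of how those three pieces could fit together. The function name `find_records`, the `tag` parameter, and the `latin-1` encoding are assumptions made for illustration; they do not come from the thread.

```python
import os


def find_records(top, tag="D100"):
    """Yield lines under `top` whose first pipe-delimited field equals `tag`."""
    for root, _dirs, files in os.walk(top):
        for name in files:
            if not name.lower().endswith(".txt"):
                continue
            # os.walk yields bare file names, so join them with root
            path = os.path.join(root, name)
            # Assumption: latin-1 maps every byte to a character,
            # so reading never raises UnicodeDecodeError
            with open(path, "r", encoding="latin-1") as handle:
                for line in handle.readlines():
                    fields = line.split("|")
                    # "|D100|0|1|\n" splits into ["", "D100", "0", "1", "\n"]
                    if len(fields) > 1 and fields[1] == tag:
                        yield line.rstrip("\n")
```

Calling `list(find_records("."))` would collect the matching lines from the current directory and all of its subdirectories.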
#3
I don't understand. At home I found a way to solve my problem, but when I test this code at work it doesn't work.

After running this code (with the file paths changed for my work computer), no D100 lines are shown; nothing is output at all.

import glob
import pandas as pd

filepaths = glob.glob("/home/zinho/Downloads/*.txt")
lista = []

'''
Reads several txt files.
Copies the lines that match the |D100| criterion and saves them to a CSV.
'''

try:
    for fp in filepaths:
        with open(fp, 'r') as f:
            lin = f.readlines()
            for cnt in lin:
                if cnt[:6] == '|D100|':
                    lista.append(cnt)

# This line handles the end of the file, which has strange characters
except UnicodeDecodeError:
    pass

df = pd.DataFrame(lista)
df.to_csv('/home/zinho/Downloads/Master.csv', index=False, header=False)
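When a script like this produces nothing, one possible way to narrow it down is to count what each step actually saw. The sketch below is only a diagnostic aid; the name `diagnose` and its return shape are hypothetical. It relies on two known behaviours: `glob.glob` returns an empty list when nothing matches, and a `UnicodeDecodeError` raised inside the loop stops reading that file.

```python
import glob


def diagnose(pattern, prefix="|D100|"):
    """Map each matched path to (lines read, lines matching prefix, error text)."""
    report = {}
    for path in glob.glob(pattern):
        total = matched = 0
        error = None
        try:
            with open(path, "r") as handle:
                for line in handle:
                    total += 1
                    if line.startswith(prefix):
                        matched += 1
        except UnicodeDecodeError as exc:
            # Record the error instead of silently passing over it
            error = str(exc)
        report[path] = (total, matched, error)
    return report
```

An empty result means the glob pattern matched no files at all. A non-empty error entry means reading that file stopped early at a byte that could not be decoded.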
#4
This works for me
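#!/usr/bin/env python3

import os

# Walk the current directory tree and print every line containing |D100|
for root, dirs, files in os.walk('./', topdown=True):
    for name in files:
        if name.endswith('.txt'):
            # Join with root so files in subdirectories open correctly
            with open(os.path.join(root, name), 'r') as lines:
                for line in lines:
                    if '|D100|' in line:
                        print(line)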
Output:
this is some text with |D100|
text with |D100|
still more |D100|
one more text with |D100|
Directory structure
Output:
├── another.txt
├── my.txt
└── walk.py

Contents of my.txt
this is some text with |D100|
this text does not have it
more text
text with |D100|

Contents of another.txt
This is another file text
More here
still more |D100|
one more text with |D100|
#5
I tried your code, but the result is the same as mine: nothing is shown.

My text file, as an example, has 70K rows. Here is a piece of it:
Output:
|C590|090|1253|0|710,44|0|0|0|0|0||
|C500|0|1|1-600281|06|00||||25878154|23042019|25042019|15309,16|0|15309,16|0|0|0|0|0|0|0||252,60|1163,50|||
|C590|090|1253|0|15309,16|0|0|0|0|0||
|C990|885509|
|D001|0|
|D100|0|1|1-500145|57|00|001||2097819|32190405593147000156570010020978191053226735|01042019|08042019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3205309|
|D190|000|1353|12|313,46|313,46|37,62|0||
|D100|0|1|1-500145|57|00|001||2097820|32190405593147000156570010020978201053226752|01042019|08042019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3204609|
|D190|000|1353|12|313,46|313,46|37,62|0||
|D100|0|1|1-500145|57|00|001||2097821|32190405593147000156570010020978211053226768|01042019|08042019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3200607|
|D190|000|1353|12|313,46|313,46|37,62|0||
|D100|0|1|1-500145|57|00|001||2097822|32190405593147000156570010020978221053226773|01042019|08042019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3202801|
|D190|000|1353|12|313,46|313,46|37,62|0||
|D100|0|1|1-500145|57|00|001||2097823|32190405593147000156570010020978231053226789|01042019|08042019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3201902|
|D190|000|1353|12|313,46|313,46|37,62|0||
|D100|0|1|1-500145|57|00|001||2097824|32190405593147000156570010020978241053226794|01042019|08042019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3202405|
|D190|000|1353|12|313,46|313,46|37,62|0||
|D100|0|1|1-500145|57|00|001||2097825|32190405593147000156570010020978251053226805|01042019|08042019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3205200|
|D190|000|1353|12|313,46|313,46|37,62|0||
|D100|0|1|1-500145|57|00|001||2097826|32190405593147000156570010020978261053226810|01042019|08042019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3201308|
|D190|000|1353|12|313,46|313,46|37,62|0||
|D100|0|1|1-500145|57|00|001||2097827|32190405593147000156570010020978271053226826|01042019|08042019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3201407|
|D190|000|1353|12|313,46|313,46|37,62|0||
|D100|0|1|1-500145|57|00|001||2097828|32190405593147000156570010020978281053226831|01042019|08042019|0||313,46|0|0|313,46|313,46|37,62|0|||3205002|3202207|


After running your code:
Output:
Python 3.8.2 (tags/v3.8.2:7b3ab59, Feb 25 2020, 22:45:29) [MSC v.1916 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>>
= RESTART: C:/Users/zinho/Documents/Projetos_Python/gettext_V2.py
>>>
My original file is here:
https://santacruzdistribuidora-my.sharep...g?e=GWdqxr
#6
Hi,

I finally solved this.
import errno  # needed for the errno.EISDIR check below
import glob
import pandas as pd

path = 'C:\\Users\\zinho\\Desktop\\Projet_Txt\\*.txt'
f_names = glob.glob(path)
lista = []


for file in f_names:
    try:
        with open(file, 'r') as f:
            
            try:
                for line in f:
                    if line[:6] == '|D100|':
                        lista.append(line)

            except UnicodeDecodeError:
                pass

    except IOError as exc:
        if exc.errno != errno.EISDIR:
            raise


df = pd.DataFrame(lista)
df.to_csv('C:\\Users\\zinho\\Downloads\\Master.csv', index=False, header=False)
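A possible variation, sketched under stated assumptions rather than taken from the thread: the `try`/`except UnicodeDecodeError` above stops reading a file at the first undecodable byte, so any D100 lines after that point are lost. Opening the file with an explicit encoding and `errors="replace"` keeps reading instead. The function name `collect_records`, the `latin-1` default, and the `;` delimiter (chosen because the sample values use decimal commas such as `313,46`) are all hypothetical choices.

```python
import csv
import glob


def collect_records(pattern, out_path, prefix="|D100|", encoding="latin-1"):
    """Write every line starting with `prefix` to out_path, one field per column."""
    rows = []
    for path in sorted(glob.glob(pattern)):
        # errors="replace" substitutes undecodable bytes instead of raising
        with open(path, "r", encoding=encoding, errors="replace") as src:
            for line in src:
                if line.startswith(prefix):
                    # Assumes each record both starts and ends with a pipe,
                    # so the first and last split results are empty strings
                    rows.append(line.strip().split("|")[1:-1])
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst, delimiter=";").writerows(rows)
    return len(rows)
```

The return value is the number of matching lines written, which makes an empty result easy to spot.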