Extracting information from a file

lokhtar · Dec-06-2019, 03:39 PM

Hello guys,

I have a directory with a lot of text files. I need to loop through them and extract a certain section from each one. The text files all follow a standard format, like this:

Quote:
ABC: this is a text

SECTION 2: this is more text

ANOTHER SECTION: blah blah blah. This is another section

SECTION 4:
SECTION 5:
* A list
* Another list. I need this

YET ANOTHER SECTION: A bunch. of sentences. exist here.

OTHER FINDINGS: None.

FINAL
THIS IS NOT IMPORTANT

What I need to do is extract the "SECTION 5" portion of the text. I know the split method, but splitting the file by ":" and then further splitting by "*" doesn't quite seem right:

import glob

#list of all the text files
path = "reports/*.txt"


file_id=0

#loop through files, one at a time
for file_name in glob.glob(path):
    file_id += 1
    
    with open (file_name, 'rt') as myfile:
        current_file = myfile.read()
        
    section_list = current_file.split(':')
    for list_section in section_list:
        further_split = list_section.split('*')
        for x in further_split:
            print("An item in list: " + x)

Is there a more elegant/better way to get to what I need? What I am really after is that within the section that I care about, I want to loop through each of the subsections, which are delineated by "*" and work with those strings.

I would appreciate any help!

**Larz60+** · Dec-06-2019, 05:40 PM

Please attach a small test file.

lokhtar · Dec-06-2019, 06:07 PM

It won't let me upload files on this forum for some reason, so I uploaded it on the website: http://s000.tinyupload.com/download.php?...4983370453

**Larz60+** · (This post was last modified: Dec-06-2019, 09:13 PM by Larz60+.)

You need 5 posts before you can upload files, so you should be able to do so soon. The same applies to editing your posts.
I used your link this time.

I'll get back as soon as I get a chance to examine code.

**Larz60+** · Dec-06-2019, 09:41 PM

This will get you started. I switched from glob to pathlib because it is object-oriented (the code below needs Python 3.6 or newer, since it uses f-strings),
and for now it just prints each line without modification.

You can add your parsing back in from here.

import os
from pathlib import Path


def read_files():
    # assure that starting path is script path
    os.chdir(os.path.abspath(os.path.dirname(__file__)))

    file_id = 0

    #list of all the text files
    scriptpath = Path('.')
    reportpath = scriptpath / 'reports'
    # print(f"working directory: {reportpath.resolve()}")

    # Get list of text files
    textfiles = [filename for filename in reportpath.iterdir()
                 if filename.is_file() and filename.suffix == '.txt']

    print()

    # loop through files, one at a time
    # (replaces: for file_name in glob.glob(path):)
    for filename in textfiles:
        file_id += 1

        with filename.open() as myfile:
            for line in myfile:
                line = line.strip()
                print(line)
        # section_list = current_file.split(':')
        # for list_section in section_list:
        #     further_split = list_section.split('*')
        #     for x in further_split:
        #         print("An item in list :" + str(further_split))

if __name__ == '__main__':
    read_files()

Output:
ABC: this is a text file.  I need to do stuff with it.

SECTION 2: this is more text

ANOTHER SECTION: blah blah blah. I have lots of information here.

SECTION 4:
SECTION 5:
* A list
* Another list. I need this

YET ANOTHER SECTION: A bunch. of sentences. exist here.
* Another list. I don't need this

OTHER FINDINGS: None.

FINAL
THIS IS NOT IMPORTANT
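
As a side note, pathlib can also do the pattern matching itself through Path.glob, which stays close to the original "reports/*.txt" idea. Here is a minimal sketch, assuming the reports folder sits in the current working directory; iter_reports is only an illustrative name, not something from the posts above:

```python
from pathlib import Path


def iter_reports(folder="reports", pattern="*.txt"):
    """Yield (path, text) for every file in folder matching pattern."""
    # sorted() gives a stable order; glob() itself makes no ordering promise
    for path in sorted(Path(folder).glob(pattern)):
        if path.is_file():
            yield path, path.read_text()


if __name__ == "__main__":
    # enumerate() replaces the manual file_id counter
    for file_id, (path, text) in enumerate(iter_reports(), start=1):
        print(f"{file_id}: {path.name} ({len(text.splitlines())} lines)")
```

Reading the whole file with read_text() is fine for small reports; for large files, iterating line by line as in the code above is probably the safer choice.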

lokhtar · Dec-09-2019, 05:57 PM

Thank you!

***snippsat*** · (This post was last modified: Dec-09-2019, 09:44 PM by snippsat.)

Output:
ABC: this is a text file.  I need to do stuff with it.

SECTION 2: this is more text

ANOTHER SECTION: blah blah blah. I have lots of information here.

SECTION 4:
SECTION 5:
* A list
* Another list. I need this

SECTION 6:
* A bunch. of sentences. exist here.

Here is an example showing one way to parse the lines in SECTION 5.

in_section = False
with open('sample_file.txt') as f:
    for line in f:
        if line.startswith('SECTION 5:'):
            in_section = True   # start capturing after this header
            # next(f)  # would also skip the first line in SECTION 5
            continue            # do not print the header itself
        if line.startswith('SECTION 6:'):
            in_section = False  # the next header ends the capture
        if in_section:
            print(line.strip())

Output:
* A list
* Another list. I need this
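
The loop above hard-codes SECTION 6 as the end marker, while in the original sample the header that follows the list is YET ANOTHER SECTION:. Below is a sketch that closes the section at whatever header comes next and returns the "*" items as a list of strings to work with. It assumes that a header is an upper-case label at the start of a line followed by a colon; HEADER and section_items are made-up names for illustration:

```python
import re

# Assumption: a header is an upper-case label at line start, ending in ':'
HEADER = re.compile(r"^[A-Z][A-Z0-9 ]*:")


def section_items(lines, wanted="SECTION 5:"):
    """Return the '*' bullet texts found under the wanted header."""
    items = []
    in_section = False
    for line in lines:
        line = line.strip()
        if line.startswith(wanted):
            in_section = True
        elif HEADER.match(line):
            in_section = False  # any other header closes the section
        elif in_section and line.startswith("*"):
            items.append(line.lstrip("* ").strip())
    return items


sample = """SECTION 4:
SECTION 5:
* A list
* Another list. I need this

YET ANOTHER SECTION: A bunch. of sentences. exist here.
* Another list. I don't need this
"""
print(section_items(sample.splitlines()))
# prints ['A list', 'Another list. I need this']
```

Because the function accepts any iterable of lines, an open file object can be passed in directly, for example `with open(file_name) as f: items = section_items(f)`.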
