Python Forum
Extracting information from a file
#1
Hello guys,

I have a directory with a lot of text files. I need to loop through them and extract a certain section from each one. The text files all follow the same standard format:

Quote:
ABC: this is a text
 
SECTION 2: this is more text
 
ANOTHER SECTION: blah blah blah. This is another section
 
SECTION 4:
SECTION 5:
* A list
* Another list. I need this
 
YET ANOTHER SECTION: A bunch. of sentences. exist here.
 
OTHER FINDINGS: None.
 
FINAL
THIS IS NOT IMPORTANT

What I need to do is extract the "SECTION 5" portion of the text. I know the split method, but splitting the file by ":" and then splitting further by "*" doesn't quite seem right:

import glob

#list of all the text files
path = "reports/*.txt"


file_id=0

#loop through files, one at a time
for file_name in glob.glob(path):
    file_id += 1
    
    with open (file_name, 'rt') as myfile:
        current_file = myfile.read()
        
    section_list = current_file.split(':')
    for list_section in section_list:
        further_split = list_section.split('*')
        for x in further_split:
            # print the individual piece, not the whole list each time
            print("An item in list: " + x)
Is there a more elegant/better way to get to what I need? What I am really after is that within the section that I care about, I want to loop through each of the subsections, which are delineated by "*" and work with those strings.

I would appreciate any help!
#2
Please attach a small test file.
#3
It won't let me upload files on this forum for some reason, so I uploaded it on the website: http://s000.tinyupload.com/download.php?...4983370453
#4
You need 5 posts then you will be able to upload, so should be able to do so soon, same applies to editing your post.
I used your link this time.

I'll get back as soon as I get a chance to examine code.
#5
This will get you started. I switched from glob to pathlib because it is object-oriented (the script needs Python 3.6 or newer for the f-strings), and it just prints each line without modification.

You can add your parsing back in from here:
import os
from pathlib import Path


def read_files():
    # assure that starting path is script path
    os.chdir(os.path.abspath(os.path.dirname(__file__)))

    file_id = 0

    #list of all the text files
    scriptpath = Path('.')
    reportpath = scriptpath / 'reports'
    # print(f"working directory: {reportpath.resolve()}")

    # Get list of text files
    textfiles = [filename for filename in reportpath.iterdir()
                 if filename.is_file() and filename.suffix == '.txt']

    print()

    # loop through files, one at a time
    # (replaces: for file_name in glob.glob(path):)
    for filename in textfiles:
        file_id += 1
        
        with filename.open() as myfile:
            for line in myfile:
                line = line.strip()
                print(f"{line}")
        # section_list = current_file.split(':')
        # for list_section in section_list:
        #     further_split = list_section.split('*')
        #     for x in further_split:
        #         print("An item in list :" + str(further_split))

if __name__ == '__main__':
    read_files()
Output:
ABC: this is a text file. I need to do stuff with it.
SECTION 2: this is more text
ANOTHER SECTION: blah blah blah. I have lots of information here.
SECTION 4:
SECTION 5:
* A list
* Another list. I need this
YET ANOTHER SECTION: A bunch. of sentences. exist here.
* Another list. I don't need this
OTHER FINDINGS: None.
FINAL
THIS IS NOT IMPORTANT
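
To add the parsing back in, one option is a small line-based helper that waits for the header of the wanted section and then collects the "*" lines that follow it. This is only a sketch under assumptions that are not confirmed by the real reports: the names `extract_items` and `items_per_file` are made up for illustration, each list item is assumed to fit on one line, and the first non-blank line that does not start with "*" is assumed to end the section.

```python
from pathlib import Path


def extract_items(text, header="SECTION 5:"):
    """Return the '*' items that follow `header`, without the leading '*'."""
    items = []
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if not in_section:
            # Wait for the header line of the section we care about
            in_section = line.startswith(header)
            continue
        if line.startswith("*"):
            items.append(line.lstrip("*").strip())
        elif line:
            # Assumption: the first non-blank, non-bullet line ends the section
            break
    return items


def items_per_file(report_dir="reports"):
    """Map each .txt file name in `report_dir` to its extracted items."""
    return {
        path.name: extract_items(path.read_text())
        for path in sorted(Path(report_dir).glob("*.txt"))
    }
```

Calling `items_per_file()` from the script directory would then give one list of strings per report, so the later list under "YET ANOTHER SECTION" is not picked up.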
#6
Thank you!
#7
Output:
ABC: this is a text file. I need to do stuff with it.
SECTION 2: this is more text
ANOTHER SECTION: blah blah blah. I have lots of information here.
SECTION 4:
SECTION 5:
* A list
* Another list. I need this
SECTION 6: * A bunch. of sentences. exist here.
An example of one way to parse the lines in SECTION 5:
in_section = False
with open('sample_file.txt') as f:
    for line in f:
        if line.startswith('SECTION 5:'):
            # Start printing after the SECTION 5 header line
            in_section = True
            continue
        if line.startswith('SECTION 6:'):
            # The next section header ends the block
            in_section = False
        if in_section:
            print(line.strip())
Output:
* A list
* Another list. I need this
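
If the whole file is already read into a string, the same extraction could also be done with the `re` module instead of a flag. This is a sketch rather than a drop-in solution: `section_items` is a hypothetical name, and the pattern assumes the header sits alone at the start of a line and is followed directly by consecutive lines that each start with "*".

```python
import re


def section_items(text, header="SECTION 5:"):
    """Return the '*' items that directly follow `header` (sketch)."""
    # Header at the start of a line, then one or more lines beginning with '*'
    pattern = r"^" + re.escape(header) + r"[ \t]*\n((?:[ \t]*\*.*(?:\n|$))+)"
    match = re.search(pattern, text, re.MULTILINE)
    if match is None:
        return []
    # Drop the leading '*' and surrounding whitespace from every item
    return [line.lstrip(" \t*").strip() for line in match.group(1).splitlines()]
```

Each returned string is one subsection, so looping over the result gives the individual items to work with.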

