Python Forum
How to parse and group hierarchical list items from an unindented string in Python?
#2
Using just Input1 as the text, I tried the code below. First I saved Input1 as a text file (unformatted_text1.txt in the code).

For better formatting, maybe you should use the Python module docx (installed as python-docx), which has more options.

I doubt all your text files have the same style, so you may need more regular expressions. That depends on what is in your other texts.

import re

# you can do the following with Python but do it like this for now for testing
# copy Input1 text from sample_data_strings.txt and paste it into a text editor
# remove " at both ends, just ' remains
# put [ at the beginning and ] at the end
# now you have a list.
# call the list l
# Paste the list into your Python IDE: l = ['Opening: ... players.']
# save the list l
# savepath = '/home/pedro/myPython/re/text_files/unformatted_text1.txt'
# with open(savepath, 'w') as output: output.writelines(l)
# now you have your text as lines of text, open with readlines()

# path to the unformatted lines of text from above
Input1 = '/home/pedro/myPython/re/text_files/unformatted_text1.txt'
# get the text as a list
with open(Input1, 'r') as infile:
    text_list = infile.readlines()

# have a look (each line from readlines() already ends with a newline)
for line in text_list:
    print(line, end='')

# patterns to look for
p = re.compile(r'([A-Za-z]+:)') # letters followed by a colon, like: Opening:
endline = re.compile(r'(:)$') # a colon at the end of the line
num = re.compile(r'(\d+\.)') # one or more digits followed by a period, like: 1.
numsub = re.compile(r'(\w\.)') # one word character followed by a period, like: a.
subsub = re.compile(r'(- )') # the dash marker for sub-subpoints

if __name__ == "__main__":
    # match only looks at the beginning of each line
    # search looks through the whole line
    # re.compile(r'(:)$') finds : at the end of a line, if : is there
    for i in range(len(text_list)):
        res = p.match(text_list[i]) 
        end = endline.search(text_list[i])
        number = num.match(text_list[i])
        numsubp = numsub.match(text_list[i])
        subp = subsub.match(text_list[i])  # the pattern is named subsub; sub is undefined
        if res and end:
            text_list[i] = '  ' + text_list[i]
        elif res and not end:
            text_list[i] = '  ' + text_list[i]
        elif number:
            text_list[i] = '\t' + text_list[i]
        elif numsubp:
            text_list[i] = '\t  ' + text_list[i]
        elif subp:
            text_list[i] = '\t\t ' + text_list[i]
        elif i == len(text_list) - 1:
            text_list[i] = '\n\t' + text_list[i]

    savepath = '/home/pedro/myPython/re/text_files/formatted_text1.txt'       
    with open(savepath, 'w') as output:
        output.writelines(text_list)
You can tweak and change the elifs to get what you want. Add more regular expressions for different parts of the text!
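If the list of patterns keeps growing, one way to avoid a long elif chain is a table of (pattern, prefix) pairs that is checked in order. The sketch below is only an illustration of that idea: the names RULES, indent_line and indent_text are made up for this example, the sample text is invented (it is not the original Input1), and the special handling of the last line in the code above is left out. It works on a string in memory, so no file paths are needed.

```python
import re

# Hypothetical rule table: (compiled pattern, prefix to prepend).
# Order matters: the first pattern that matches the start of a line wins.
RULES = [
    (re.compile(r'[A-Za-z]+:'), '  '),   # heading such as "Opening:"
    (re.compile(r'\d+\.'), '\t'),        # numbered point such as "1."
    (re.compile(r'\w\.'), '\t  '),       # lettered subpoint such as "a."
    (re.compile(r'- '), '\t\t '),        # dash item such as "- text"
]


def indent_line(line):
    """Return line with the prefix of the first rule that matches its start."""
    for pattern, prefix in RULES:
        # match() only looks at the beginning of the line
        if pattern.match(line):
            return prefix + line
    return line  # no rule matched: leave the line unchanged


def indent_text(text):
    """Indent every line of text and return the result as one string."""
    lines = text.splitlines(keepends=True)
    return ''.join(indent_line(line) for line in lines)


# Invented sample, only for trying out the rules
sample = "Opening:\n1. First point\na. Sub point\n- detail\nplain text\n"
print(indent_text(sample))
```

To handle a new kind of line, add one more tuple to RULES in the position where it should be tried. Put more specific patterns before more general ones, because \w also matches digits and would otherwise catch lines meant for a later rule.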


Messages In This Thread
RE: How to parse and group hierarchical list items from an unindented string in Python? - by Pedroski55 - May-23-2024, 05:39 AM
