Python Forum
Extracting data from multiple txt files
#1
Hello, is there any way to extract certain lines from multiple txt files that
contain dialogues? I need to extract lines based on the speaker's name, e.g. all of Rich's lines:

Rich: I'm going to attend a concert on Saturday.
Do you have any special plans?
Peter: No, I'm going to relax. What did you do last weekend?
Rich: Last weekend, I went to visit my friends in San Francisco. What did you do?
Peter: I played soccer with some friends.
#2
Yes. You probably would want a list of the roles. Then you could split the string at the colon and check if it's the right role. You would need a tracking variable to note that you are on the right role to get the following lines (like the second one in your example).

Note that we're not going to write your code for you, but we'll be happy to help you fix your code if you show it to us and clearly tell us what's wrong. But what I said above should get you started.
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
#3
ok, thank you
#4
There is a mistake in the code below: lines that follow a named line but have no name of their own are not added to the result.

You can use a regex to detect whether there is a single word in front of the colon.
If there is, it's a name. What about names that consist of two words?
In that case, you have to change the regex to: ^([\w ]+): (.*)

import re
import io


def parse(text):
    last_name = ''
    result = []
    with_name = re.compile(r'^(\w+): (.*)')
    for line in text:
        match = with_name.search(line)

        if match:
            last_name = match.group(1)
            msg = match.group(2)
        else:
            msg += ' ' + line
            continue

        if last_name:
            result.append((last_name, msg))
    return result


def parse_files(files):
    result = []
    for file in files:
        with open(file) as fd:
            result += parse(fd)
    return result



def printer(parsed_data):
    for name, message in parsed_data:
        print(name, '->', message)


text = """Rich: I'm going to attend a concert on Saturday.
Do you have any special plans?
Peter: No, I'm going to relax. What did you do last weekend?
Rich: Last weekend, I went to visit my friends in San Francisco. What did you do?
Peter: I played soccer with some friends."""


out = parse(io.StringIO(text)) # using the string as a file
printer(out)
Output:
Rich -> I'm going to attend a concert on Saturday.
Peter -> No, I'm going to relax. What did you do last weekend?
Rich -> Last weekend, I went to visit my friends in San Francisco. What did you do?
Peter -> I played soccer with some friends.

I'll look into it later. Maybe you'll find a solution in the meantime.
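One possible way to handle the missing continuation lines is sketched here. It is not the poster's own fix, only a minimal variant under stated assumptions: a speaker name is one or more words followed by a colon and a space (the two-word pattern mentioned above), and any line without such a prefix belongs to the previous speaker. Instead of appending the message as soon as the name is seen, the sketch appends a new entry per speaker line and then extends the last entry when an unnamed line follows.

```python
import io
import re

# Assumption: a name is one or more words followed by ": ".
WITH_NAME = re.compile(r'^([\w ]+): (.*)')


def parse(lines):
    """Return (name, message) pairs; unnamed lines extend the previous message."""
    result = []
    for line in lines:
        line = line.rstrip('\n')
        match = WITH_NAME.match(line)
        if match:
            # A named line starts a new entry.
            result.append([match.group(1), match.group(2)])
        elif result and line.strip():
            # Continuation line: glue it onto the most recent entry.
            result[-1][1] += ' ' + line.strip()
    return [tuple(entry) for entry in result]


text = """Rich: I'm going to attend a concert on Saturday.
Do you have any special plans?
Peter: No, I'm going to relax. What did you do last weekend?"""

for name, message in parse(io.StringIO(text)):
    print(name, '->', message)
```

With this version, "Do you have any special plans?" ends up attached to Rich's first message. Unnamed lines that appear before any speaker are silently dropped, which may or may not be what you want.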
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
#5
I came back, no one fixed my broken code :-/
#6
Thank you for the answer. It is actually a bit more complicated than I expected: there are more than 200 people in the files, each has a first and a last name, and each has more than one line. I need to extract what each person says into a separate file and name each file after that person.
For example:

One of multiple input txt files
1.Rich Smith: I'm going to attend a concert on Saturday.
Do you have any special plans?
2.Peter Aderson : No, I'm going to relax. What did you do last weekend?
3.Rich Smith: Last weekend, I went to visit my friends in San Francisco. What did you do?
4.Peter Aderson: I played soccer with some friends.
5.Mary Sarah: Hello
6.John Daisa: hi

The output I need:
Rich.txt
Rich Smith: I'm going to attend a concert on Saturday
Rich Smith: Last weekend, I went to visit my friends in San Francisco. What did you do?

Peter.txt
Peter Aderson: No, I'm going to relax. What did you do last weekend?
Peter Aderson: I played soccer with some friends.

Mary.txt
Mary Sarah: Hello

John.txt
John Daisa: hi

I am thinking of keeping a separate file with the list of names and somehow checking it against the dialogue files, but that still seems complicated.
I am a beginner in Python, and this looks impossible to me.
#7
Are there newlines in dialogue text?
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
#8
Yes, there are two more, but I don't think it makes a difference,
because in the real files there are more than 200. This is why I
think the solution is a bit complicated: how do I check all of them
every time in order to extract the right text for everyone?
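A separate list of names may not be needed. A rough sketch of one alternative: collect the lines in a dictionary keyed by whatever name appears before the colon, so 200 speakers need no more code than two. Everything below is an assumption rather than something confirmed in this thread: the optional "1." numbering and the optional space before the colon are taken from the sample input, unnamed lines are attached to the previous speaker, and files are named after the first name as in the desired output (two people sharing a first name would therefore share a file). The names `group_by_speaker` and `write_speaker_files` are made up for illustration.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumptions: optional "1." style numbering, a name made of words and
# spaces, and an optional space before the colon.
SPEAKER = re.compile(r'^\s*(?:\d+\.)?\s*([A-Za-z][\w ]*?)\s*:\s*(.*)$')


def group_by_speaker(lines):
    """Map each full name to the list of things that person says."""
    spoken = defaultdict(list)
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        match = SPEAKER.match(line)
        if match:
            current = match.group(1)
            spoken[current].append(match.group(2))
        elif current is not None:
            # Unnamed line: treat it as a continuation of the last speaker.
            spoken[current][-1] += ' ' + line
    return spoken


def write_speaker_files(spoken, out_dir):
    """Write one <first name>.txt per first name into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    per_file = defaultdict(list)
    for name, messages in spoken.items():
        per_file[name.split()[0]].extend(f'{name}: {m}' for m in messages)
    for first_name, lines in per_file.items():
        path = out / f'{first_name}.txt'
        path.write_text('\n'.join(lines) + '\n', encoding='utf-8')


sample = """1.Rich Smith: I'm going to attend a concert on Saturday.
Do you have any special plans?
2.Peter Aderson : No, I'm going to relax. What did you do last weekend?
3.Rich Smith: Last weekend, I went to visit my friends in San Francisco. What did you do?
5.Mary Sarah: Hello"""

spoken = group_by_speaker(sample.splitlines())
for name, messages in spoken.items():
    print(name, len(messages))
```

To process real files, you could read each one with `open(path)` and pass its lines to `group_by_speaker`, then call `write_speaker_files(spoken, 'some_output_folder')` once at the end. A dialogue line that happens to begin with a few words followed by a colon would be mistaken for a speaker, so the pattern may need tightening for real data.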