Extracting data from multiple txt files - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: Extracting data from multiple txt files (/thread-20646.html)
Extracting data from multiple txt files - Emmanouil - Aug-23-2019

Hello, is there any way to extract certain lines from multiple txt files that contain dialogues? I need to extract lines based on a person's name, e.g. Rich’s lines:

Rich: I'm going to attend a concert on Saturday.
Do you have any special plans?
Peter: No, I'm going to relax. What did you do last weekend?
Rich: Last weekend, I went to visit my friends in San Francisco. What did you do?
Peter: I played soccer with some friends.

RE: Extracting data from multiple txt files - ichabod801 - Aug-23-2019

Yes. You would probably want a list of the roles. Then you could split each line at the colon and check whether it's the right role. You would need a tracking variable to note that you are on the right role, so that you also get the following lines (like the second one in your example).

Note that we're not going to write your code for you, but we'll be happy to help you fix your code if you show it to us and clearly tell us what's wrong. But what I said above should get you started.

RE: Extracting data from multiple txt files - Emmanouil - Aug-23-2019

OK, thank you.

RE: Extracting data from multiple txt files - DeaD_EyE - Aug-24-2019

There is a mistake in the code: the following lines without a name are not added. You can use a regex to detect whether there is a single word in front of the colon. If so, it's a name. What about names that consist of two words?
If that's the case, you have to change the regex to: ^([\w ]+): (.*)

    import re
    import io


    def parse(text):
        last_name = ''
        result = []
        # One word, then a colon and a space, then the message.
        with_name = re.compile(r'^(\w+): (.*)')
        for line in text:
            match = with_name.search(line)
            if match:
                last_name = match.group(1)
                msg = match.group(2)
            else:
                # Known bug: the pair for this speaker was already appended
                # without this continuation line, and `continue` skips the
                # append below, so the extra text is lost.
                msg += ' ' + line
                continue
            if last_name:
                result.append((last_name, msg))
        return result


    def parse_files(files):
        # Parse every file and collect all (name, message) pairs in one list.
        result = []
        for file in files:
            with open(file) as fd:
                result += parse(fd)
        return result


    def printer(parsed_data):
        for name, message in parsed_data:
            print(name, '->', message)


    text = """Rich: I'm going to attend a concert on Saturday.
    Do you have any special plans?
    Peter: No, I'm going to relax. What did you do last weekend?
    Rich: Last weekend, I went to visit my friends in San Francisco. What did you do?
    Peter: I played soccer with some friends."""

    out = parse(io.StringIO(text))  # using the string as a file
    printer(out)
I'll look into it later. Maybe you'll find a solution.

RE: Extracting data from multiple txt files - DeaD_EyE - Aug-24-2019

I came back, no one fixed my broken code :-/

RE: Extracting data from multiple txt files - Emmanouil - Aug-24-2019

Thank you for the answer. Actually it is a bit more complicated than I expected, because there are more than 200 persons in the files, each with a first and last name, and more than one line for each of them. I need to extract what each person says into a separate file and give each file the corresponding name. For example:

One of multiple input txt files:

1.Rich Smith: I'm going to attend a concert on Saturday. Do you have any special plans?
2.Peter Aderson : No, I'm going to relax. What did you do last weekend?
3.Rich Smith: Last weekend, I went to visit my friends in San Francisco. What did you do?
4.Peter Aderson: I played soccer with some friends.
5.Mary Sarah: Hello
6.John Daisa: hi

I need to have for output:

Rich.txt
Rich Smith: I'm going to attend a concert on Saturday
Rich Smith: Last weekend, I went to visit my friends in San Francisco. What did you do?

Peter.txt
Peter Aderson: No, I'm going to relax. What did you do last weekend?
Peter Aderson: I played soccer with some friends.

Mary.txt
Mary Sarah: Hello

John.txt
John Daisa: hi

I am thinking of having another file with the list of names and somehow checking it against the dialogue files, but it is still complicated. I am a beginner in Python, and this looks impossible to me.

RE: Extracting data from multiple txt files - perfringo - Aug-25-2019

Are there newlines in the dialogue text?

RE: Extracting data from multiple txt files - Emmanouil - Aug-25-2019

Yes, there are two more, but I think it doesn't make a difference because in the real file there are more than 200. This is why I think the solution is a bit complicated: how do I check all of them every time in order to extract the right text for everyone?
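
For readers landing on this thread: the pieces discussed above (a regex for the speaker prefix, continuation lines without a name, an optional list of known names, one output file per person) could be combined roughly as follows. This is only a sketch under assumptions, not code posted by anyone in the thread. The names `SPEAKER`, `group_by_first_name`, `write_person_files` and the `known_names` parameter are made up for illustration, and the pattern assumes names consist of word characters and spaces only, optionally preceded by numbering such as `1.`, as in the sample input.

```python
import io
import re
from collections import defaultdict
from pathlib import Path

# Assumed line shape: optional "1." numbering, a name made of word characters
# and spaces, optional spaces before the colon, then the message.
SPEAKER = re.compile(r'^\s*(?:\d+\.\s*)?(\w[\w ]*?)\s*:\s*(.*)$')


def parse(lines, known_names=None):
    """Return (full_name, message) pairs, one per utterance."""
    result = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = SPEAKER.match(line)
        if match and (known_names is None or match.group(1) in known_names):
            result.append([match.group(1), match.group(2)])
        elif result:
            # No speaker prefix: treat the line as a continuation of the
            # previous utterance instead of dropping it.
            result[-1][1] += ' ' + line
    return [tuple(item) for item in result]


def parse_files(paths, known_names=None):
    """Parse several dialogue files into one list of pairs."""
    parsed = []
    for path in paths:
        with open(path, encoding='utf-8') as fd:
            parsed += parse(fd, known_names)
    return parsed


def group_by_first_name(parsed):
    """Map first name -> list of 'Full Name: message' lines."""
    groups = defaultdict(list)
    for name, message in parsed:
        groups[name.split()[0]].append(f'{name}: {message}')
    return dict(groups)


def write_person_files(groups, out_dir):
    """Write one '<FirstName>.txt' file per key in groups."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for first_name, lines in groups.items():
        (out / f'{first_name}.txt').write_text('\n'.join(lines) + '\n',
                                               encoding='utf-8')


if __name__ == '__main__':
    sample = """1.Rich Smith: I'm going to attend a concert on Saturday.
Do you have any special plans?
2.Peter Aderson : No, I'm going to relax. What did you do last weekend?
5.Mary Sarah: Hello"""
    for first_name, lines in group_by_first_name(parse(io.StringIO(sample))).items():
        print(first_name, lines)
```

Two caveats. Without `known_names`, any continuation line that happens to look like `words: text` is taken as a new speaker, so passing a set of full names (for example read from a separate names file, as suggested above) is the safer option. And grouping by first name follows the file names in the example output (`Rich.txt`, `Peter.txt`); with more than 200 people two speakers may share a first name, in which case keying on the full name may be the better choice.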