Extracting Text - Printable Version

+- Python Forum (https://python-forum.io)

Extracting Text - standenman - Nov-01-2021

I am trying to write something that will harvest text data from a text file. The text file is basically a series of questions and answers, so I have a lot of this:

Does the patient abuse alcohol?
YES

Does the patient see the doctor as directed?
YES

Does the patient have suicidal ideation?
NO

These questions are not numbered, and some have a ":" after the question while others do not. Ideally I would like to be able to "flag" answers that are significant. So an answer of "NO" for alcohol abuse is not significant to my analysis, but a "NO" to the second question (failing to see the doctor as directed) is significant. In other words, each question has a default answer that basically means "this is not significant". Any ideas? Thanks.

RE: Extracting Text - Gribouillis - Nov-01-2021

You showed us the input, what should the output look like?

RE: Extracting Text - standenman - Nov-01-2021

Perhaps that is part of my problem! I want to use this data to prepare a Word document, a report. As I said, some of the answers are very significant, others not so much. So if the individual answered "YES" to suicidal ideation, I would like that noted in the report.

RE: Extracting Text - Gribouillis - Nov-01-2021

Why not use

filtered = [(question, answer) for question, answer in series if is_significant(question, answer)]

RE: Extracting Text - standenman - Nov-01-2021

Sorry, I am not following how that would play out specifically.

RE: Extracting Text - Gribouillis - Nov-01-2021

I mean:

1. Write a function that reads the file and produces a sequence of (question, answer) pairs.
2. Write a function is_significant(question, answer) that returns True or False depending on whether the answer is significant for that question.
3. Use the list comprehension above to drop the superfluous answers and keep only the significant ones.
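
To make that concrete, here is a minimal sketch of the three steps. It rests on a few assumptions that are not in the original posts: every question sits on its own line with the answer on the next non-blank line, and the "not significant" answers are kept in a hand-written table (DEFAULT_ANSWERS, a made-up name, as are normalize and read_pairs). Adjust all of that to match the real file.

```python
# Hypothetical table: the default ("not significant") answer per question.
# Keys are normalized question text: lowercase, no trailing ":" or "?".
DEFAULT_ANSWERS = {
    "does the patient abuse alcohol": "NO",
    "does the patient see the doctor as directed": "YES",
    "does the patient have suicidal ideation": "NO",
}


def normalize(question):
    """Lowercase the question and drop any trailing ':' or '?'."""
    return question.strip().rstrip(":?").strip().lower()


def read_pairs(lines):
    """Yield (question, answer) pairs from an iterable of text lines.

    Assumes each question line is followed by its answer line;
    blank lines are ignored.
    """
    content = [line.strip() for line in lines if line.strip()]
    # Pair consecutive non-blank lines: (0, 1), (2, 3), ...
    for question, answer in zip(content[0::2], content[1::2]):
        yield question, answer.upper()


def is_significant(question, answer):
    """Return True when the answer differs from the question's default."""
    default = DEFAULT_ANSWERS.get(normalize(question))
    if default is None:
        # Unknown question: flag it so nothing is silently dropped.
        return True
    return answer.strip().upper() != default


# Sample input in the shape shown in the first post.
sample = """\
Does the patient abuse alcohol?
YES

Does the patient see the doctor as directed:
YES

Does the patient have suicidal ideation?
NO
"""

series = read_pairs(sample.splitlines())
filtered = [(question, answer) for question, answer in series
            if is_significant(question, answer)]
print(filtered)
```

With this sample, only the alcohol question should survive the filter, because "YES" differs from its default of "NO". To read a real file, pass the open file object to read_pairs instead of sample.splitlines(); the filtered list is then what you would write into the report.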