Python Forum
Extract specific sentences from text file
#1
Hi all,
I am a beginner in Python. How can I extract sentences that contain a specific set of words (or a combination of words) from a text file? For example, suppose the text file contains the following text.

"Extracorporeal therapies have been used to remove toxins from the body for over 50 years and have a greater role than ever before in the treatment of poisonings. Improvements in technology have resulted in increased efficacy of removing drugs and other toxins with hemodialysis, and newer extracorporeal therapy modalities have expanded the role of extracorporeal supportive care of poisoned patients. However, despite these changes, for at least the past three decades the most frequently dialyzed poisons remain salicylates, toxic alcohols, and lithium; in addition, the extracorporeal treatment of choice for therapeutic removal of nearly all poisonings remains intermittent hemodialysis. For the clinician, consideration of extracorporeal therapy in the treatment of a poisoning depends upon the characteristics of toxins amenable to extracorporeal removal (e.g., molecular mass, volume of distribution, protein binding), choice of extracorporeal treatment modality for a given poisoning, and when the benefit of the procedure justifies additive risk. Given the relative rarity of poisonings treated with extracorporeal therapies, the level of evidence for extracorporeal treatment of poisoning is not robust; however, extracorporeal treatment of a number of individual toxins have been systematically reviewed within the current decade by the Extracorporeal Treatment in Poisoning workgroup, which has published treatment recommendations with an improved evidence base. Some of these recommendations are discussed, as well as management of a small number of relevant poisonings where extracorporeal therapy use may be considered."

Task: I want to extract the sentences that contain all three of these words: extracorporeal, therapy/therapies, treatment

Output: Below are the three sentences that contain all three words:

Extracorporeal therapies have been used to remove toxins from the body for over 50 years and have a greater role than ever before in the treatment of poisonings.

For the clinician, consideration of extracorporeal therapy in the treatment of a poisoning depends upon the characteristics of toxins amenable to extracorporeal removal (e.g., molecular mass, volume of distribution, protein binding), choice of extracorporeal treatment modality for a given poisoning, and when the benefit of the procedure justifies additive risk.

Given the relative rarity of poisonings treated with extracorporeal therapies, the level of evidence for extracorporeal treatment of poisoning is not robust; however, extracorporeal treatment of a number of individual toxins have been systematically reviewed within the current decade by the Extracorporeal Treatment in Poisoning workgroup, which has published treatment recommendations with an improved evidence base.
#2
What have you tried? Show your code.
#3
(May-31-2021, 03:29 PM)Larz60+ Wrote: What have you tried? show code.

Hi Larz60,

I have absolutely no idea how to do this or which library to use. I only know how to do this for a single word, not for a combination of words. I would greatly appreciate it if you could give me some ideas; then I can try and come back with the code I have tried.

Appreciate your help.
#4
Start by splitting the file into sentences.

For example:
mydoc = "On the other hand, we denounce with righteous indignation and " \
    "dislike men who are so beguiled and demoralized by the charms of " \
    "pleasure of the moment, so blinded by desire, that they cannot foresee " \
    "the pain and trouble that are bound to ensue; and equal blame belongs to " \
    "those who fail in their duty through weakness of will, which is the same " \
    "as saying through shrinking from toil and pain. These cases are " \
    "perfectly simple and easy to distinguish. In a free hour, when our " \
    "power of choice is untrammelled and when nothing prevents our being " \
    "able to do what we like best, every pleasure is to be welcomed and " \
    "every pain avoided. But in certain circumstances and owing to the claims " \
    "of duty or the obligations of business it will frequently occur that " \
    "pleasures have to be repudiated and annoyances accepted. The wise man " \
    "therefore always holds in these matters to this principle of selection: " \
    "he rejects pleasures to secure other greater pleasures, or else he " \
    "endures pains to avoid worse pains."

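# Split on full stops; note this also splits inside abbreviations such as "e.g."
sentences = mydoc.strip().split('.')

for n, sentence in enumerate(sentences):
    sentence = sentence.strip()
    # Skip the empty string left after the final full stop
    if sentence:
        print(f"\nsentence {n}: {sentence}")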
This produces:
Output:

sentence 0: On the other hand, we denounce with righteous indignation and dislike men who are so beguiled and demoralized by the charms of pleasure of the moment, so blinded by desire, that they cannot foresee the pain and trouble that are bound to ensue; and equal blame belongs to those who fail in their duty through weakness of will, which is the same as saying through shrinking from toil and pain

sentence 1: These cases are perfectly simple and easy to distinguish

sentence 2: In a free hour, when our power of choice is untrammelled and when nothing prevents our being able to do what we like best, every pleasure is to be welcomed and every pain avoided

sentence 3: But in certain circumstances and owing to the claims of duty or the obligations of business it will frequently occur that pleasures have to be repudiated and annoyances accepted

sentence 4: The wise man therefore always holds in these matters to this principle of selection: he rejects pleasures to secure other greater pleasures, or else he endures pains to avoid worse pains

Next, search each sentence for all three words, and that's it.
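The search step could be sketched roughly as follows. This is only one possible approach, not the only way to do it. The helper names `split_sentences` and `contains_all`, the `REQUIRED` list, and the shortened `sample` text are made up for illustration. It assumes a sentence ends at '.', '!' or '?' followed by whitespace, which avoids breaking inside "e.g.," but will still split wrongly after abbreviations such as "Dr. Smith". Words are compared as whole words and case-insensitively, so "therapeutic" does not count as "therapy".

```python
import re


def split_sentences(text):
    """Split text at '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def contains_all(sentence, word_groups):
    """Return True if the sentence has at least one whole word from every group."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return all(words & set(group) for group in word_groups)


# Each inner tuple lists the acceptable alternatives for one required word.
REQUIRED = [("extracorporeal",), ("therapy", "therapies"), ("treatment",)]

# Shortened stand-in for the text in the first post (illustrative only).
sample = (
    "Extracorporeal therapies have a role in the treatment of poisonings. "
    "Newer extracorporeal therapy modalities expand supportive care. "
    "The extracorporeal treatment of choice for therapeutic removal "
    "(e.g., of lithium) remains hemodialysis."
)

matches = [s for s in split_sentences(sample) if contains_all(s, REQUIRED)]
for s in matches:
    print(s)
```

To run this on a real file, read its contents into a string first (for example with `open(...).read()`) and pass that string to `split_sentences` in place of `sample`.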