Python Forum

regex pattern to extract relevant sentences
Hi All,

I am looking to extract sentences that contain a combination of given words separated by up to "n" other words. When I run the code below, the output contains only two sentences, although I expect four sentences to match the regex pattern.

What I observed is that when the regex pattern matches two or more times in a given sentence, that sentence is missing from the output. I am not sure how to fix this. Can anyone please tell me how to get the desired output shown below?

import re
txt = "The present disclosure is directed to an electrosurgical pencil with integrated ligasure tweezers. In accordance with one aspect of the present disclosure the electrosurgical pencils includes an elongated housing having an open distal end and including an actuator operatively associated therewith. First and second jaws members extend distally through the open distal end of the elongated housing and are transitionable between a closed position and an open position upon actuation of an actuator. One or both of the jaw members are configured to treat tissue with monopolar energy and both jaw members are configured to treat tissue with bipolar energy. One or more switches are operably coupled to a controller disposed in the housing and configured to activate the first and second jaw members to treat tissue with monopolar and bipolar energy. FIGS. 17-22 show an inner shaft 2310 that includes one or more levers 2316 attached at a fulcrum point 2318 for assisting in opening jaw members 2330, 2340. Lever 2316 may extend through housing 2200 and may be pivotably mounted to housing 2200 at a pivot point 2315 such that when a physician actuates lever 2316 at an end 2319, lever 2316 pivots about pivot point 2315 and applies force to inner shaft 2310 at fulcrum point 2318 for opening and closing jaw members 2330 and 2340. Lever 2316 allows a physician to generate additional force at fulcrum point 2318 for opening jaw members 2330, 2340."
sentences = txt.strip().split('.')
# Compile once, outside the loop: a first keyword, 1 to 15 intervening words, then a second keyword.
reg_compiler = re.compile(r'\b(jaw[a-z]+|electrosurgical|pencil[a-z]+)(?:\W+\w+){1,15}?\W+(monopolar|bipolar|open[a-z]+|hous[a-z]+)\b')
for sentence in sentences:
    sentence = sentence.strip()
    if sentence:
        rel_sent = reg_compiler.search(sentence)
        if rel_sent:
            print(f"\n{sentence}")
Code Output

Output:
In accordance with one aspect of the present disclosure the electrosurgical pencils includes an elongated housing having an open distal end and including an actuator operatively associated therewith

First and second jaws members extend distally through the open distal end of the elongated housing and are transitionable between a closed position and an open position upon actuation of an actuator
Desired output:

Output:
In accordance with one aspect of the present disclosure the electrosurgical pencils includes an elongated housing having an open distal end and including an actuator operatively associated therewith

First and second jaws members extend distally through the open distal end of the elongated housing and are transitionable between a closed position and an open position upon actuation of an actuator

One or both of the jaw members are configured to treat tissue with monopolar energy and both jaw members are configured to treat tissue with bipolar energy

One or more switches are operably coupled to a controller disposed in the housing and configured to activate the first and second jaw members to treat tissue with monopolar and bipolar energy
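I replaced jaw[a-z]+ with jaw[a-z]* and it seems to work better. The + quantifier requires at least one letter after "jaw", so it matches "jaws" but not the bare word "jaw" in "jaw members"; * also accepts zero letters.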
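A minimal sketch of that change, not necessarily the exact code that was run. It assumes a shortened sample text built from three sentences of the original post; the names PATTERN, relevant_sentences and sample are illustrative and do not appear in the question.

```python
import re

# Same pattern as in the question, except that jaw[a-z]+ became jaw[a-z]*,
# so the bare word "jaw" is accepted as well as "jaws".
PATTERN = re.compile(
    r'\b(jaw[a-z]*|electrosurgical|pencil[a-z]+)'   # first keyword
    r'(?:\W+\w+){1,15}?\W+'                         # 1 to 15 intervening words
    r'(monopolar|bipolar|open[a-z]+|hous[a-z]+)\b'  # second keyword
)


def relevant_sentences(text):
    """Return the sentences of `text` in which PATTERN matches (hypothetical helper)."""
    sentences = (s.strip() for s in text.split('.'))
    return [s for s in sentences if s and PATTERN.search(s)]


# Shortened sample (assumption: for illustration only, taken from the posted text).
sample = (
    "First and second jaws members extend distally through the open distal end "
    "of the elongated housing. "
    "One or both of the jaw members are configured to treat tissue with monopolar energy. "
    "Lever 2316 allows a physician to generate additional force at fulcrum point 2318 "
    "for opening jaw members 2330, 2340."
)

for sentence in relevant_sentences(sample):
    print(f"\n{sentence}")
```

With this sample, the first two sentences are printed and the third is not, because no second keyword follows "jaw" in it. Note that open[a-z]+ and pencil[a-z]+ have the same restriction as the original jaw[a-z]+: they will not match the bare words "open" or "pencil".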
(Jul-05-2021, 08:00 PM) Gribouillis wrote: I replaced jaw[a-z]+ with jaw[a-z]* and it seems to work better.

Thank you so much @Gribouillis