Identifying keywords in text - Printable Version

Identifying keywords in text - drchips - Mar-28-2022

Hi,

I'm a teacher and I'd like to create a program with my students which analyses a text file to find two specific words and then outputs all the text between these words.

Could anyone point me in the direction of any articles or guidance that might help us achieve this please?

Or any advice would be much appreciated, thanks.

Jon

RE: Identifying keywords in text - menator01 - Mar-28-2022

The re module (regular expressions) is one way that comes to mind.

RE: Identifying keywords in text - Pedroski55 - Mar-29-2022

Haha funny request!

menator01 is right, re is probably best, but I find re hard. I need to study it more.

This is just an example using only simple tools.

# I suppose the word order matters
firstword = 'peck'
secondword = 'peppers'
mystring = 'Peter Piper peppers picked a peck of pickled 某个东西 peppers. Where\'s peppers the peck of pickled 四川 peppers Peter Piper picked?'

# split the string on firstword, which gives a list
mylist = mystring.split(firstword)
# mylist[0] is the text before the first firstword, so skip it
# keep only pieces that start with a space and contain secondword
# split on secondword and save the first element of the new list
myphrases = []
for phrase in mylist[1:]:
    if phrase.startswith(' ') and secondword in phrase:
        newlist = phrase.split(secondword)
        # get rid of leading and trailing whitespace
        result = newlist[0].strip()
        myphrases.append(result)

print('found text between', '"' + firstword + '"','and ', '"' + secondword + '"', len(myphrases), 'times')
for p in myphrases:
    print(p)
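
A variation on the same idea, as a rough sketch rather than a definitive solution: wrapping the search in a function makes it easier to reuse on text read from a file. The function name `text_between` and the sample sentence are made up for illustration. It uses `str.find` so that a missing keyword simply ends the search, and it matches plain substrings, so "peck" would also match inside "pecking".

```python
def text_between(text, first, second):
    """Return every stripped chunk of text found between first and second."""
    found = []
    start = text.find(first)
    while start != -1:
        begin = start + len(first)
        end = text.find(second, begin)
        if end == -1:
            break  # no closing word after this point, so stop
        found.append(text[begin:end].strip())
        # continue searching after the closing word
        start = text.find(first, end + len(second))
    return found


sample = ("Peter Piper picked a peck of pickled peppers. "
          "Where's the peck of pickled peppers Peter Piper picked?")
phrases = text_between(sample, "peck", "peppers")
print(phrases)
```

To run it on a file, the text could be loaded first with something like `Path("story.txt").read_text(encoding="utf-8")` from `pathlib`, where `story.txt` is only a placeholder filename.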

RE: Identifying keywords in text - drchips - Mar-29-2022

Thank you for your help.

RE: Identifying keywords in text - Pedroski55 - Mar-29-2022

Look here for re help.

It's probably best to use re; I just find it hard to grasp!

RE: Identifying keywords in text - menator01 - Mar-29-2022

Example of using re

import re

mystring = 'Peter Piper peppers picked a peck of pickled peppers. Where\'s peppers the peck of dozens of pickled peppers Peter Piper picked?'

findit = re.search(r'peck(.*?)peppers', mystring).group(1)

print(f'One occurrence  -> {findit}')

findit = re.findall(r'(?:peck)(.*?)(?:peppers)', mystring)

print(f'Multiple occurrences -> {findit}')

Output:
One occurrence  ->  of pickled 
Multiple occurrences -> [' of pickled ', ' of dozens of pickled ']
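
Two things may be worth guarding against when the text comes from a real file: `re.search` returns `None` when nothing matches, and `.` does not match newlines by default. The following is a minimal sketch under those assumptions; the helper name `find_between` and the sample text are made up for illustration.

```python
import re


def find_between(text, first, second):
    # re.escape protects keywords that contain regex special characters
    # re.DOTALL lets "." match newlines too, so a match can span lines
    pattern = re.compile(re.escape(first) + r"(.*?)" + re.escape(second), re.DOTALL)
    return pattern.findall(text)


text = "a peck of pickled\npeppers and a peck of sweet peppers"
print(find_between(text, "peck", "peppers"))
print(find_between(text, "apple", "pear"))

# re.search returns None when there is no match, so check before calling .group()
match = re.search(r"apple(.*?)pear", text)
print(match.group(1) if match else "no match")
```

`findall` returns an empty list when nothing is found, so only the `search` call needs the explicit `None` check.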

RE: Identifying keywords in text - snippsat - Mar-29-2022

Pedroski55's code works fine.
One piece of advice is to look into f-strings🧐, as your line 19 (the print call) is not nice.
It's also easy to make mistakes with that approach, as you did with one space too many.

print('found text between', '"' + firstword + '"','and ', '"' + secondword + '"', len(myphrases), 'times')
# With f-string
print(f'found text between "{firstword}" and "{secondword}" {len(myphrases)} times')

Output:
found text between "brown" and  "lazy" 1 times
found text between "brown" and "lazy" 1 times

The regex works fine, menator01.
You could extend the regex to also remove the whitespace, but just calling strip() is an easier fix.

>>> import re
>>> 
>>> text = 'The quick brown fox jumps over the lazy dog'
>>> result = re.search(r'quick(.*?)jumps', text)
>>> result.group(1)
' brown fox '
>>> # Fix whitespace
>>> result.group(1).strip()
'brown fox'
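
For comparison, here is one way the regex itself could drop the whitespace, as a small sketch rather than a recommendation over strip(). Placing `\s*` outside the capturing group makes the pattern consume the spaces around the captured text; the variable name `trimmed` is only illustrative.

```python
import re

text = 'The quick brown fox jumps over the lazy dog'
# \s* sits outside the group, so the surrounding whitespace is not captured
result = re.search(r'quick\s*(.*?)\s*jumps', text)
# guard against no match, since re.search returns None in that case
trimmed = result.group(1) if result else None
print(repr(trimmed))
```

The pattern is a little harder to read than the strip() version, which is probably why strip() is the easier fix for beginners.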