Python Forum
Identifying keywords in text
#1
Hi,

I'm a teacher and I'd like to create a program with my students which analyses a text file to find two specific words and then outputs all the text between these words.

Could anyone point me in the direction of any articles or guidance that might help us achieve this please?

Or any advice would be much appreciated, thanks.

Jon
#2
re is one way that comes to mind.
I welcome all feedback.
The only dumb question, is one that doesn't get asked.
#3
Haha funny request!

Menator is right, re is probably best, but I find re hard. Need to study more.

This is just an example using only simple tools.

# I suppose the word order matters
firstword = 'peck'
secondword = 'peppers'
mystring = 'Peter Piper peppers picked a peck of pickled 某个东西 peppers. Where\'s peppers the peck of pickled 四川 peppers Peter Piper picked?'

# split the string on firstword, which gives a list
mylist = mystring.split(firstword)
# after splitting on firstword, the phrases we are interested in begin with a space
# now find phrases which begin with a space and contain secondword
# split on secondword and save the first element of the new list
myphrases = []
for phrase in mylist:
    if phrase.startswith(' ') and secondword in phrase:
        newlist = phrase.split(secondword)
        # get rid of leading and trailing whitespace
        result = newlist[0].strip()
        myphrases.append(result)

print('found text between', '"' + firstword + '"','and ', '"' + secondword + '"', len(myphrases), 'times')
for p in myphrases:
    print(p)
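
For classroom use, the same "simple tools" idea can also be wrapped in a function. What follows is only a rough sketch, not the code from this post: the name find_between is made up for illustration, and it uses str.find instead of split. It matches plain substrings, so 'peck' would also be found inside a longer word such as 'pecking'.

```python
def find_between(text, first, second):
    """Collect every stretch of text between `first` and the next `second`."""
    results = []
    pos = 0
    while True:
        # look for the next occurrence of the first keyword
        start = text.find(first, pos)
        if start == -1:
            break
        start += len(first)
        # look for the second keyword after the first one
        end = text.find(second, start)
        if end == -1:
            break
        results.append(text[start:end].strip())
        # carry on searching after the second keyword
        pos = end + len(second)
    return results


mystring = "Peter Piper peppers picked a peck of pickled peppers. Where's peppers the peck of pickled peppers Peter Piper picked?"
print(find_between(mystring, 'peck', 'peppers'))
```

If either keyword is missing, the function simply returns an empty list.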
#4
Thank you for your help.
#5
Look here for re help.

Probably best to use re, just, I find it hard to grasp!
#6
Example of using re

import re

mystring = 'Peter Piper peppers picked a peck of pickled peppers. Where\'s peppers the peck of dozens of pickled peppers Peter Piper picked?'

findit = re.search(r'peck(.*?)peppers', mystring).group(1)

print(f'One occurrence -> {findit}')

findit = re.findall(r'(?:peck)(.*?)(?:peppers)', mystring)

print(f'Multiple occurrences -> {findit}')
Output:
One occurrence ->  of pickled
Multiple occurrences -> [' of pickled ', ' of dozens of pickled ']
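
Since the original question was about a text file, here is a rough sketch of how the same regex might be applied to a file's contents. The file name story.txt, its sample contents and the helper name text_between are all made up for illustration. re.escape is used in case the keywords contain characters that mean something special in a regex, and re.DOTALL is an assumption that matches should be allowed to run across line breaks.

```python
import re
import tempfile
from pathlib import Path


def text_between(text, first, second):
    # re.escape keeps characters such as '.' or '?' in the keywords literal;
    # re.DOTALL lets '.' match newlines, so a match may span several lines
    pattern = re.escape(first) + r'(.*?)' + re.escape(second)
    return [match.strip() for match in re.findall(pattern, text, flags=re.DOTALL)]


# write a small sample file first so the sketch runs on its own
with tempfile.TemporaryDirectory() as folder:
    sample = Path(folder) / 'story.txt'  # hypothetical file name
    sample.write_text(
        "Peter Piper picked a peck of pickled\n"
        "peppers. Where's the peck of dozens of pickled peppers?",
        encoding='utf-8',
    )

    # read the whole file into one string and search it
    text = sample.read_text(encoding='utf-8')
    found = text_between(text, 'peck', 'peppers')

print(found)
```

With a real file, only the read_text line and the call to text_between are needed.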
#7
Pedroski55's code works fine.
One piece of advice is to look into f-strings🧐, as your line 19 is not nice.
It's also easy to make mistakes with that approach, as you did with one whitespace too many.
print('found text between', '"' + firstword + '"','and ', '"' + secondword + '"', len(myphrases), 'times')
# With f-string
print(f'found text between "{firstword}" and "{secondword}" {len(myphrases)} times')
Output:
found text between "brown" and  "lazy" 1 times
found text between "brown" and "lazy" 1 times
The regex works fine, menator01.
You could extend the regex to also remove the whitespace, but just calling strip() fixes it more easily.
>>> import re
>>> 
>>> text = 'The quick brown fox jumps over the lazy dog'
>>> result = re.search(r'quick(.*?)jumps', text)
>>> result.group(1)
' brown fox '
>>> # Fix whitespace
>>> result.group(1).strip()
'brown fox'
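
For comparison, one possible way to let the regex itself drop the whitespace is to put \s* on either side of the group. This is only a sketch of that idea; the None check is an extra precaution that the examples above do not have.

```python
import re

text = 'The quick brown fox jumps over the lazy dog'

# \s* on either side of the group swallows the surrounding whitespace,
# so the captured text needs no strip() afterwards
result = re.search(r'quick\s*(.*?)\s*jumps', text)

# re.search returns None when there is no match, so check before using it
if result:
    print(repr(result.group(1)))
```

Calling strip() is arguably still easier to read for beginners.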