Identifying keywords in text - Python Forum (https://python-forum.io) - General Coding Help (https://python-forum.io/forum-8.html) - Thread: /thread-36772.html
Identifying keywords in text - drchips - Mar-28-2022

Hi, I'm a teacher and I'd like to create a program with my students that analyses a text file, finds two specific words and then outputs all the text between them. Could anyone point us towards articles or guidance that might help us do this? Any advice would be much appreciated, thanks.

Jon

RE: Identifying keywords in text - menator01 - Mar-28-2022

The re (regular expression) module is one way that comes to mind.

RE: Identifying keywords in text - Pedroski55 - Mar-29-2022

Haha, funny request! Menator is right, re is probably best, but I find re hard. I need to study it more. This is just an example using only simple tools (str.split() and str.strip()).

```python
# I suppose the word order matters
firstword = 'peck'
secondword = 'peppers'
# the string includes some Chinese text: '某个东西' means "something", '四川' is Sichuan
mystring = 'Peter Piper peppers picked a peck of pickled 某个东西 peppers. Where\'s peppers the peck of pickled 四川 peppers Peter Piper picked?'

# split the string on firstword, which gives you a list
mylist = mystring.split(firstword)

# after splitting on firstword, the phrases we are interested in begin with a space
# now find phrases which begin with a space and contain secondword,
# split them on secondword and save the first element of the new list
myphrases = []
for phrase in mylist:
    # startswith() avoids an IndexError on an empty piece (string starting with firstword)
    if phrase.startswith(' ') and secondword in phrase:
        newlist = phrase.split(secondword)
        # get rid of leading and trailing whitespace
        result = newlist[0].strip()
        myphrases.append(result)

print('found text between', '"' + firstword + '"','and ', '"' + secondword + '"', len(myphrases), 'times')
for p in myphrases:
    print(p)
```

RE: Identifying keywords in text - drchips - Mar-29-2022

Thank you for your help.

RE: Identifying keywords in text - Pedroski55 - Mar-29-2022

Look here for re help. It's probably best to use re, it's just that I find it hard to grasp!

RE: Identifying keywords in text - menator01 - Mar-29-2022

Example of using re:

```python
import re

mystring = 'Peter Piper peppers picked a peck of pickled peppers. Where\'s peppers the peck of dozens of pickled peppers Peter Piper picked?'

# search() stops at the first match; group(1) is the text captured by the lazy (.*?)
findit = re.search(r'peck(.*?)peppers', mystring).group(1)
print(f'One occurrence -> {findit}')

# findall() returns the captured text for every non-overlapping match;
# (?:...) groups without capturing, so only (.*?) ends up in the result
findit = re.findall(r'(?:peck)(.*?)(?:peppers)', mystring)
print(f'Multiple occurrences -> {findit}')
```
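The question mentions analysing a text file, while the examples in this thread work on one-line strings. Here is a minimal sketch, assuming a UTF-8 file small enough to read in one go; `find_between` and `find_between_in_file` are hypothetical helper names, not something posted in the thread. It uses the same lazy `(.*?)` idea as menator01's example but adds `re.DOTALL`, because by default `.` does not match a newline, so a phrase broken across two lines of a file would otherwise be missed. `re.escape()` makes keywords containing characters such as `.` or `?` match literally.

```python
import re
from pathlib import Path


def find_between(text, first, second, ignore_case=False):
    """Return the text between each `first` and the next `second` (hypothetical helper)."""
    # re.DOTALL lets '.' match newlines, so a phrase can span several lines
    flags = re.DOTALL
    if ignore_case:
        flags |= re.IGNORECASE
    # re.escape() treats characters such as '.' or '?' in the keywords literally;
    # the lazy (.*?) stops at the first `second` after each `first`
    pattern = re.escape(first) + r'(.*?)' + re.escape(second)
    return re.findall(pattern, text, flags)


def find_between_in_file(path, first, second, ignore_case=False):
    """Read a whole UTF-8 text file (assumed small) and search it."""
    text = Path(path).read_text(encoding='utf-8')
    return find_between(text, first, second, ignore_case)


sample = 'Peter Piper picked a peck\nof pickled peppers.'
print(find_between(sample, 'peck', 'peppers'))  # → ['\nof pickled ']
```

For example, `find_between_in_file('poem.txt', 'peck', 'peppers', ignore_case=True)` would search a hypothetical file `poem.txt` without caring about capital letters. Reading the whole file keeps the logic identical to the string examples; a very large file would need a different, line-by-line approach.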
RE: Identifying keywords in text - snippsat - Mar-29-2022

Pedroski55's code works fine. One piece of advice is to look into f-strings🧐, as your line 19 (the first `print()` call) is not nice. It's also easy to make mistakes with that approach, as you did with one whitespace too many: print() already puts a space between its arguments, so `'and '` produces two spaces before the second word.

```python
print('found text between', '"' + firstword + '"','and ', '"' + secondword + '"', len(myphrases), 'times')
# With f-string
print(f'found text between "{firstword}" and "{secondword}" {len(myphrases)} times')
```

The regex works fine, menator01. You could extend the regex to also remove the whitespace, but just calling strip() fixes it more easily.

```python
>>> import re
>>>
>>> text = 'The quick brown fox jumps over the lazy dog'
>>> result = re.search(r'quick(.*?)jumps', text)
>>> result.group(1)
' brown fox '
>>> # Fix whitespace
>>> result.group(1).strip()
'brown fox'
```
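Putting snippsat's two suggestions together with menator01's `findall()` example gives a short sketch that strips every match and reports the count with an f-string. It also shows the regex-only alternative snippsat mentions, using `\s*` on both sides of the capture group; this particular pattern is an illustration, not one posted in the thread.

```python
import re

mystring = ("Peter Piper peppers picked a peck of pickled peppers. "
            "Where's peppers the peck of dozens of pickled peppers Peter Piper picked?")
firstword = 'peck'
secondword = 'peppers'

# Option 1: capture everything between the words, then strip() each match
pattern = re.escape(firstword) + r'(.*?)' + re.escape(secondword)
stripped = [m.strip() for m in re.findall(pattern, mystring)]

# Option 2: let the regex skip the whitespace itself with \s*
# (the lazy .*? ends as soon as \s* plus secondword can match,
# so the trailing spaces are consumed by \s* instead of the group)
pattern_ws = re.escape(firstword) + r'\s*(.*?)\s*' + re.escape(secondword)
in_regex = re.findall(pattern_ws, mystring)

print(f'found text between "{firstword}" and "{secondword}" {len(stripped)} times')
for p in stripped:
    print(p)
```

Both options give the same two phrases for this string, "of pickled" and "of dozens of pickled"; strip() is the easier one to read, which is snippsat's point.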