ATTENTION!!! THIS IS A COMPLETELY NEW AND DIFFERENT POST FROM THE PREVIOUS ONE HERE, BECAUSE I INADVERTENTLY ERASED THE PREVIOUS ONE WHILE TRYING TO ADD SOME COMMENTS!!!
I'll try to rewrite more or less what I had posted before, with some modifications...
I'm going to explain once again, step by step, how I installed BeautifulSoup. It is better to use pip:
1) I went to:
https://pypi.org/project/beautifulsoup4/
where, at the top of the page, you can see beautifulsoup4 4.8.0 and, underneath that, pip install beautifulsoup4. Beside that there is an icon that looks like two sheets of paper, one on top of the other. If you click this icon, the command pip install beautifulsoup4 is copied to your clipboard. (I prefer to do it this way, just in case I mistype something.)
2) Now, at the Windows cmd prompt, paste pip install beautifulsoup4 (or type it directly if you prefer), press the ENTER (or RETURN) key, and wait until the installation of beautifulsoup4 is completed.
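To confirm that the installation worked, you could run a short check from Python itself. This is a minimal sketch of my own, not part of the original steps; it assumes Python 3.8 or later (for importlib.metadata), and the helper name bs4_version is just illustrative:

```python
# Minimal sketch: check whether beautifulsoup4 is installed (assumes Python 3.8+).
import importlib.util
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def bs4_version() -> Optional[str]:
    """Return the installed beautifulsoup4 version, or None if it is missing.

    bs4_version is a hypothetical helper name used only for this example.
    """
    # The package is installed as "beautifulsoup4" but imported as "bs4".
    if importlib.util.find_spec("bs4") is None:
        return None
    try:
        return version("beautifulsoup4")
    except PackageNotFoundError:
        # bs4 is importable, but pip left no metadata for it (unusual setups).
        return None


if __name__ == "__main__":
    installed = bs4_version()
    if installed is None:
        print("beautifulsoup4 is not installed; run: pip install beautifulsoup4")
    else:
        print("beautifulsoup4 {0} is installed".format(installed))
```

find_spec only looks the module up without importing it, so the check does not crash with an ImportError when bs4 is missing. If it reports "not installed" right after pip succeeded, pip probably installed into a different Python; running python -m pip install beautifulsoup4 makes pip match the interpreter you start with python.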
3) Since the previous program I had here didn't seem to find the desired word, I looked for more information, and on this site:
https://stackoverflow.com/questions/3339...n/33397525
I found an example program written in Python 2. I adapted it to Python 3 and also changed some small bits that produced errors before. Now no warnings or errors are produced. Forget about the previous program I had here and use this one instead. Copy and save this program, modified from the original one on that site:
# beautifulSoup_test_03.py
#
from bs4 import BeautifulSoup
import re

data = '''
<html>
<body>
<div>today is a sunny day</div>
<div>I love when it's sunny outside</div>
Call me sunny
<div>sunny is a cool word sunny</div>
</body>
</html>
'''

searched_word = 'sunny'
soup = BeautifulSoup(data, 'html.parser')
# Collect every text node inside <body> that contains the searched word
# (the regex also matches longer words such as "sunnyside")
results = soup.body.find_all(string=re.compile('.*{0}.*'.format(searched_word)), recursive=True)
# len(results) counts matching text nodes, not individual occurrences of the word
print('Found the word "{0}" {1} times\n'.format(searched_word, len(results)))
for content in results:
    words = content.split()
    for index, word in enumerate(words):
        # If the content contains the search word twice or more this will fire for each occurrence
        if word == searched_word:
            print('Whole content: "{0}"'.format(content))
            before = None
            after = None
            # Check if it's a first word
            if index != 0:
                before = words[index-1]
            # Check if it's a last word
            if index != len(words)-1:
                after = words[index+1]
            print('\tWord before: "{0}", word after: "{1}"'.format(before, after))
4) Now, when I execute it, it produces the following output:
Output:
Found the word "sunny" 4 times

Whole content: "today is a sunny day"
	Word before: "a", word after: "day"
Whole content: "I love when it's sunny outside"
	Word before: "it's", word after: "outside"
Whole content: "
Call me sunny
"
	Word before: "me", word after: "None"
Whole content: "sunny is a cool word sunny"
	Word before: "None", word after: "is"
Whole content: "sunny is a cool word sunny"
	Word before: "word", word after: "None"
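The "Call me sunny" entry spans three lines because that text sits directly inside <body> rather than inside a <div>, so the text node that find_all returns also carries the newlines around it. If you prefer it on one line, one option is to format content.strip() instead of content. A minimal sketch (format_content is a hypothetical helper name, not from the original program):

```python
def format_content(content: str) -> str:
    """Build the 'Whole content' line without the surrounding newlines.

    format_content is a hypothetical helper used only for this example.
    """
    # str.strip() removes leading and trailing whitespace, including "\n".
    return 'Whole content: "{0}"'.format(content.strip())


print(format_content("\nCall me sunny\n"))  # Whole content: "Call me sunny"
```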
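Note that the program reports len(results), which is the number of text nodes containing "sunny" (4), while the word itself appears 5 times. Also, because it compares the output of split() exactly, a word followed by punctuation, such as "sunny.", would not be reported. Below is a minimal sketch of one way to handle both, assuming you want whole-word, case-sensitive matches. find_word_contexts is a hypothetical helper of my own, not part of BeautifulSoup; it works on plain strings, so with BeautifulSoup you could pass it soup.body.find_all(string=True):

```python
import re
from typing import Iterable, List, Optional, Tuple

# A word is a run of letters/digits/underscores, optionally joined by
# apostrophes, so "it's" stays one word and "sunny." yields "sunny".
WORD_RE = re.compile(r"\w+(?:'\w+)*")

# (text fragment, word before, word after)
Context = Tuple[str, Optional[str], Optional[str]]


def find_word_contexts(fragments: Iterable[str], searched_word: str) -> List[Context]:
    """Return one (fragment, before, after) tuple per occurrence of searched_word."""
    contexts: List[Context] = []
    for fragment in fragments:
        words = WORD_RE.findall(fragment)
        for index, word in enumerate(words):
            # Exact, case-sensitive comparison: "sunnyside" is not a match.
            if word == searched_word:
                before = words[index - 1] if index > 0 else None
                after = words[index + 1] if index < len(words) - 1 else None
                contexts.append((fragment, before, after))
    return contexts


if __name__ == "__main__":
    # The four text nodes found in the sample HTML, plus one extra
    # (hypothetical) sentence that ends with punctuation.
    fragments = [
        "today is a sunny day",
        "I love when it's sunny outside",
        "\nCall me sunny\n",
        "sunny is a cool word sunny",
        "Tomorrow will be sunny.",
    ]
    contexts = find_word_contexts(fragments, "sunny")
    print('Found the word "sunny" {0} times\n'.format(len(contexts)))
    for fragment, before, after in contexts:
        print('Whole content: "{0}"'.format(fragment))
        print('\tWord before: "{0}", word after: "{1}"'.format(before, after))
```

With these fragments it reports 6 occurrences: the 5 in the original HTML text plus the one in the extra sentence, where the word before is "be" and there is no word after.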
I hope it helps.
All the best,