Python Forum

Full Version: question: finding multiple strings within string
I am trying to figure out how to do the following. I have a string "djfk83dhfdog83748djfk83dhfcat83748djfk83dhfmonkey83748djfk83dhfhuman83748" and I want to be able to pull out the words dog, cat, monkey and human into four separate strings. Each of these words is surrounded by the characters "djfk83dhf" on the left and "83748" on the right. How would I do this with Python 3?
You could use a regex.
>>> s = "djfk83dhfdog83748djfk83dhfcat83748djfk83dhfmonkey83748djfk83dhfhuman83748"
>>> import re
>>> re.findall(r"djfk83dhf(.+?)83748", s)
['dog', 'cat', 'monkey', 'human']
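The `?` after `.+` is what makes this work: it turns the greedy `.+` into a lazy match that stops at the first "83748" it can reach. A small sketch comparing the two, using the same string as above (the variable names `lazy` and `greedy` are just for illustration):

```python
import re

s = "djfk83dhfdog83748djfk83dhfcat83748djfk83dhfmonkey83748djfk83dhfhuman83748"

# Lazy: (.+?) stops at the first "83748" it can reach,
# so each word is captured separately.
lazy = re.findall(r"djfk83dhf(.+?)83748", s)

# Greedy: (.+) runs on to the last "83748" in the string,
# so everything in between ends up in a single match.
greedy = re.findall(r"djfk83dhf(.+)83748", s)

print(lazy)         # four separate words
print(len(greedy))  # only one long match
```

With the greedy version you get one long string that starts with "dog" and ends with "human", which is usually not what you want here.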
I have another question. Let's say you have a slightly different setup. For example, let's say the first set of characters is instead <a href="/quote/ and the last character is ?, so that the string now reads <a href="/quote/dog?<a href="/quote/cat?<a href="/quote/monkey?<a href="/quote/human?. How would you get the words dog, cat, monkey, and human out of this? I tried the code provided and it doesn't work as I expected. The code I tested is:

import re
g2='<a href="/quote/dog?<a href="/quote/cat?<a href="/quote/monkey?<a href="/quote/human?'
words=re.findall("<a href=\"/quote/(.+?)?",g2)
And it returns words=['d', 'c', 'm', 'h']. Why isn't this working in this case?
>>> words=re.findall(r'<a href="/quote/([^?]+)\?',g2)
>>> words
['dog', 'cat', 'monkey', 'human']
>>>
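As for why the original attempt fails: an unescaped ? in a regex is a quantifier, not a literal question mark. In (.+?)? the trailing ? makes the whole group optional, and the lazy .+? inside it is satisfied after a single character, so each match captures only the first letter. Escaping it as \? (or letting re.escape do it) fixes that. A rough sketch of the second option; note that `extract_between` is a made-up helper name for illustration, not something from the re module:

```python
import re

def extract_between(text, start, end):
    """Return every shortest piece of text found between start and end.

    extract_between is a hypothetical helper, for illustration only.
    re.escape makes characters such as ? match literally.
    """
    pattern = re.escape(start) + r"(.+?)" + re.escape(end)
    return re.findall(pattern, text)

g2 = '<a href="/quote/dog?<a href="/quote/cat?<a href="/quote/monkey?<a href="/quote/human?'

# Unescaped: the trailing ? makes the group optional,
# so a single captured character is enough for a match.
broken = re.findall('<a href="/quote/(.+?)?', g2)

# Escaped via the helper: ? is now a literal question mark.
fixed = extract_between(g2, '<a href="/quote/', '?')

print(broken)
print(fixed)
```

The same helper should also work for the first string in this thread, with "djfk83dhf" and "83748" as the delimiters.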
Also, if you have real HTML, and not a mess like this with some href thrown in, then you should use a parser.
from bs4 import BeautifulSoup

html = '''\
<div class='animals'>
  <a href="https://en.wikipedia.org/wiki/Dog">dog</a>
  <a href="https://en.wikipedia.org/wiki/Cat">cat</a>
</div>'''

soup = BeautifulSoup(html, 'lxml')
print([tag.text for tag in soup.find_all('a')])
Output:
['dog', 'cat']
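If installing bs4 and lxml is not an option, the standard library's html.parser can do roughly the same job with a bit more code. A minimal sketch, assuming you want both the link text and the href; `LinkCollector` is an illustrative class name, not part of any library:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag (illustrative name)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._in_link = False  # True while we are inside an <a>...</a>
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link.
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.links.append((self._href, "".join(self._text).strip()))
            self._in_link = False

html = '''\
<div class='animals'>
  <a href="https://en.wikipedia.org/wiki/Dog">dog</a>
  <a href="https://en.wikipedia.org/wiki/Cat">cat</a>
</div>'''

collector = LinkCollector()
collector.feed(html)
print([text for href, text in collector.links])
print([href for href, text in collector.links])
```

For anything beyond simple cases like this, BeautifulSoup is likely to be less work.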