Python Forum

I am trying to figure out how to do the following. I have a string "djfk83dhfdog83748djfk83dhfcat83748djfk83dhfmonkey83748djfk83dhfhuman83748" and I want to be able to pull out the words dog, cat, monkey, and human into four separate strings. Each of these words is surrounded on either side by the characters "djfk83dhf" and "83748". How would I do this with Python 3?

You could use a regex.

>>> s = "djfk83dhfdog83748djfk83dhfcat83748djfk83dhfmonkey83748djfk83dhfhuman83748"
>>> import re
>>> re.findall(r"djfk83dhf(.+?)83748", s)
['dog', 'cat', 'monkey', 'human']
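
For anyone wondering why the pattern uses .+? rather than .+, here is a small sketch comparing the two on the same string. The variable names greedy and lazy are only for illustration. The trailing ? makes the quantifier lazy, so each match stops at the first "83748" it reaches instead of running on to the last one.

```python
import re

s = "djfk83dhfdog83748djfk83dhfcat83748djfk83dhfmonkey83748djfk83dhfhuman83748"

# Greedy: .+ grabs as much as it can, so everything up to the LAST "83748"
# ends up in a single match.
greedy = re.findall(r"djfk83dhf(.+)83748", s)

# Lazy: .+? grabs as little as it can, so each match ends at the NEXT "83748".
lazy = re.findall(r"djfk83dhf(.+?)83748", s)

print(len(greedy))  # 1
print(lazy)         # ['dog', 'cat', 'monkey', 'human']
```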

I have another question. Let's say you have a slightly different setup. For example, let's say the first set of characters is instead <a href="/quote/ and the last character is ? such that the string would now read <a href="/quote/dog?<a href="/quote/cat?<a href="/quote/monkey?<a href="/quote/human?. How would you get the words dog, cat, monkey, and human out of this? I tried with the code provided and it doesn't work as I expected. The code I tested is:

import re
g2='<a href="/quote/dog?<a href="/quote/cat?<a href="/quote/monkey?<a href="/quote/human?'
words=re.findall("<a href=\"/quote/(.+?)?",g2)

And it returns words=['d', 'c', 'm', 'h']. Why isn't this working in this case?

>>> words=re.findall(r'<a href="/quote/([^?]+)\?',g2)
>>> words
['dog', 'cat', 'monkey', 'human']
>>>
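
The original attempt returns single letters because an unescaped ? is a regex quantifier. In (.+?)? the final ? makes the whole group optional rather than matching a literal question mark, so the lazy .+? stops after one character. If the delimiters are plain text, one way to sidestep this is to let re.escape do the escaping. A minimal sketch, where the helper name between is made up for illustration:

```python
import re

def between(text, start, end):
    """Return every shortest run of text found between start and end."""
    # re.escape turns metacharacters such as ? into literal matches.
    pattern = re.escape(start) + r"(.+?)" + re.escape(end)
    return re.findall(pattern, text)

g2 = '<a href="/quote/dog?<a href="/quote/cat?<a href="/quote/monkey?<a href="/quote/human?'
print(between(g2, '<a href="/quote/', '?'))
# ['dog', 'cat', 'monkey', 'human']
```

The same helper also handles the first string in this thread: between(s, "djfk83dhf", "83748").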

Also, if you have real HTML and not a mess like this with some href thrown in, then you should use a parser.

from bs4 import BeautifulSoup

html = '''\
<div class='animals'>
  <a href="https://en.wikipedia.org/wiki/Dog">dog</a>
  <a href="https://en.wikipedia.org/wiki/Cat">cat</a>
</div>'''

soup = BeautifulSoup(html, 'lxml')
print([tag.text for tag in soup.find_all('a')])

Output:
['dog', 'cat']
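
If installing bs4 and lxml is not an option, the standard library's html.parser can do roughly the same job. This is only a sketch under that assumption. The class name LinkTextParser is made up here, and it only collects the text inside <a> tags.

```python
from html.parser import HTMLParser

class LinkTextParser(HTMLParser):
    """Collect the text inside every <a> element."""

    def __init__(self):
        super().__init__()
        self.in_link = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Start a new, empty entry for this link.
            self.in_link = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        # Only keep text that appears while we are inside an <a> tag.
        if self.in_link:
            self.texts[-1] += data

html = '''\
<div class='animals'>
  <a href="https://en.wikipedia.org/wiki/Dog">dog</a>
  <a href="https://en.wikipedia.org/wiki/Cat">cat</a>
</div>'''

parser = LinkTextParser()
parser.feed(html)
parser.close()
print(parser.texts)
# ['dog', 'cat']
```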

djf123

bowlofred

djf123

anbu23

snippsat