Thread: question: finding multiple strings within string (Python Forum, General Coding Help, /thread-26852.html)
question: finding multiple strings within string - djf123 - May-15-2020

I am trying to figure out how to do the following. I have a string "djfk83dhfdog83748djfk83dhfcat83748djfk83dhfmonkey83748djfk83dhfhuman83748" and I want to pull out the words dog, cat, monkey and human into four separate strings. Each of these words is surrounded on either side by the characters "djfk83dhf" and "83748". How would I do this with Python 3?

RE: question: finding multiple strings within string - bowlofred - May-15-2020

You could use a regex.

>>> s = "djfk83dhfdog83748djfk83dhfcat83748djfk83dhfmonkey83748djfk83dhfhuman83748"
>>> import re
>>> re.findall(r"djfk83dhf(.+?)83748", s)
['dog', 'cat', 'monkey', 'human']

RE: question: finding multiple strings within string - djf123 - May-16-2020

I have another question. Let's say you have a slightly different setup. For example, let's say the first set of characters is instead <a href="/quote/ and the last character is ?, so that the string now reads <a href="/quote/dog?<a href="/quote/cat?<a href="/quote/monkey?<a href="/quote/human?. How would you get the words dog, cat, monkey, and human out of this? I tried the code provided and it doesn't work as I expected. The code I tested is:

import re
g2='<a href="/quote/dog?<a href="/quote/cat?<a href="/quote/monkey?<a href="/quote/human?'
words=re.findall("<a href=\"/quote/(.+?)?",g2)

And it returns words=['d', 'c', 'm', 'h']. Why isn't this working in this case?

RE: question: finding multiple strings within string - anbu23 - May-16-2020

>>> words=re.findall(r'<a href="/quote/([^?]+)\?',g2)
>>> words
['dog', 'cat', 'monkey', 'human']

RE: question: finding multiple strings within string - snippsat - May-16-2020

Also, if you have real HTML and not a mess like this with some href thrown in, then you should use a parser.
from bs4 import BeautifulSoup

html = '''\
<div class='animals'>
  <a href="https://en.wikipedia.org/wiki/Dog">dog</a>
  <a href="https://en.wikipedia.org/wiki/Cat">cat</a>
</div>'''

soup = BeautifulSoup(html, 'lxml')
print([tag.text for tag in soup.find_all('a')])
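
If installing bs4 and lxml is not an option, the same idea can probably be sketched with the standard library's html.parser module. This is not snippsat's method, only a rough stand-in for it; the class name LinkTextParser and its attributes are made up for illustration. It collects the text found inside each <a> tag of the same sample HTML.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Hypothetical helper: collects the text inside each <a> tag."""

    def __init__(self):
        super().__init__()
        self.in_link = False   # True while we are between <a> and </a>
        self.texts = []        # one entry per <a> element seen

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.in_link = True
            self.texts.append('')

    def handle_endtag(self, tag):
        if tag == 'a':
            self.in_link = False

    def handle_data(self, data):
        # Only keep text that appears inside a link.
        if self.in_link:
            self.texts[-1] += data


html = '''\
<div class='animals'>
  <a href="https://en.wikipedia.org/wiki/Dog">dog</a>
  <a href="https://en.wikipedia.org/wiki/Cat">cat</a>
</div>'''

parser = LinkTextParser()
parser.feed(html)
parser.close()
link_texts = parser.texts
print(link_texts)  # ['dog', 'cat']
```

For anything beyond a small, well-formed snippet, a full parser such as the one shown above with BeautifulSoup is likely the more comfortable choice.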
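
On the question of why the second pattern returned single letters: in a regex, an unescaped ? after a group is a quantifier meaning "zero or one of the preceding item", not a literal question mark. With nothing left to match after it, the lazy .+? inside the group is satisfied by a single character. The sketch below, which reuses g2 from the thread (the names broken and fixed are only for illustration), compares the unescaped and escaped versions.

```python
import re

g2 = '<a href="/quote/dog?<a href="/quote/cat?<a href="/quote/monkey?<a href="/quote/human?'

# Unescaped trailing ?: it makes the group optional, and the lazy .+?
# stops after one character because nothing else has to match.
broken = re.findall(r'<a href="/quote/(.+?)?', g2)

# Escaped \?: the lazy group must now keep growing until it reaches
# a literal question mark.
fixed = re.findall(r'<a href="/quote/(.+?)\?', g2)

print(broken)  # ['d', 'c', 'm', 'h']
print(fixed)   # ['dog', 'cat', 'monkey', 'human']
```

anbu23's pattern with [^?]+ reaches the same result by matching "anything that is not a question mark" instead of relying on lazy matching.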