Python Forum

using regex wildcard Beautiful Soup
I'm trying to match a div tag that may look like:
<div id="" name="">
or it may look like
<a name="diddlysquat"></a>
or
<a name="12358"></a>
Note that the second one doesn't have an id, so I can't match on that.

I don't want to match the other 1000 divs on the page.
I know how to do it with Selenium using XPath, but since I'm almost done with this app and have done everything so far with BeautifulSoup, I would like to do this with soup as well.
I think I'm close, but no cigar yet.

The problem is matching an empty name attribute (and I want the tag with the empty name to be part of the match).
I have tried:

divs = base_anchor.find_all("div", {"name" : re.compile('*')})
# or
divs = base_anchor.find_all('div', name_=lambda s:s.startswith(''))
# or
divs = base_anchor.find_all('div', {'name': ''})
All of these fail.
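For what it's worth, the first attempt fails before BeautifulSoup is even involved: `*` on its own is not a valid regular expression, because there is nothing in front of it to repeat, whereas `.*` compiles fine and also matches an empty string. A quick check with the standard `re` module (`pattern_is_valid` is just a hypothetical helper name for illustration):

```python
import re


def pattern_is_valid(pattern):
    """Hypothetical helper: True if `pattern` compiles as a regular expression."""
    try:
        re.compile(pattern)
    except re.error:
        # '*' alone raises re.error ("nothing to repeat")
        return False
    return True


print(pattern_is_valid('*'))   # False
print(pattern_is_valid('.*'))  # True

# '.*' happily matches an empty string, so a name="" value would pass
print(re.compile('.*').search('') is not None)  # True
```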
I don't know why anyone would want to go through the mess that is the BS API, but according to the docs, this should work:
soup.find_all('div', 'name')
EDIT:
I installed BS to test, and it turns out that doesn't work (a plain string as the second argument is treated as a CSS class filter, so it looks for class="name"), but all of these do:
soup.find_all('div', {'name': True})
soup.find_all('div', {'name': re.compile('.*')})
soup.find_all('div', attrs={'name': lambda x: x is not None})
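Here is a small self-contained sketch comparing these filters side by side on some made-up sample HTML, assuming bs4 is installed. It uses Python's built-in 'html.parser' so lxml isn't needed, and the `texts` helper is hypothetical. It also shows why the plain-string version returns the wrong thing, and that `name_` is not a special alias the way `class_` is, so an always-True callable passed as `name_` ends up matching every div:

```python
import re

from bs4 import BeautifulSoup

# Made-up sample: one div per case we care about
html = '''
<div id="" name="">empty</div>
<div name="diddlysquat">named</div>
<div class="name">classed</div>
<div>plain</div>
'''

# 'html.parser' ships with Python, so lxml isn't needed here
soup = BeautifulSoup(html, 'html.parser')


def texts(tags):
    """Hypothetical helper: the text of each matched tag, in document order."""
    return [tag.get_text() for tag in tags]


# True means "attribute is present", so name="" counts
print(texts(soup.find_all('div', {'name': True})))  # ['empty', 'named']

# '.*' matches "", but divs without a name attribute are still skipped
print(texts(soup.find_all('div', {'name': re.compile('.*')})))  # ['empty', 'named']

# A plain string as the second argument filters on CSS class, not attribute name
print(texts(soup.find_all('div', 'name')))  # ['classed']

# name_ is not an alias (only class_ is), so an always-True callable matches every div
print(texts(soup.find_all('div', name_=lambda value: True)))

# A callable inside attrs gets the attribute value, or None if it's missing
print(texts(soup.find_all('div', attrs={'name': lambda value: value is not None})))
```

If you'd rather use a callable, checking `value is not None` inside `attrs` keeps the div with `name=""` and drops the ones without the attribute.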
(Sep-27-2018, 07:55 PM) stranac wrote: soup.find_all('div', {'name': True})
I didn't know that this would work. ;) I've always used the re.compile method.
I'm lucky enough never to have actually had to use BS, but I'm a black belt at google-fu...
stranac,

Thanks so much! This is one I didn't try, and it works like a charm!
I owe you a cigar!
It's easy to forget that CSS selectors also work.
from bs4 import BeautifulSoup

html = '''\
<div id="" name="" >something</div>
<a name="run">fast</a>
<a foo="12358"></a>
<p id="" name="poo">Super poo</p>
<div id="" bar="">something</div>
<div id="" name="power">something</div>'''

soup = BeautifulSoup(html, 'lxml')
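# Every tag that has a name attribute, even an empty one
print(soup.select('[name]'))
# Only <div> tags that have a name attribute
#print(soup.select('div[name]'))
# Tags whose name attribute starts with "p"
#print(soup.select('[name^="p"]'))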
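A variant of the select() example that runs without lxml, assuming a recent bs4 (4.7 and later hand CSS selectors to the soupsieve package). `[name]` only tests that the attribute exists, so the empty name="" still counts, `div[name]` limits that to divs, and `[name=""]` narrows it down to the tags whose name is empty:

```python
from bs4 import BeautifulSoup

html = '''\
<div id="" name="">something</div>
<a name="run">fast</a>
<a foo="12358"></a>
<p id="" name="poo">Super poo</p>
<div id="" bar="">something</div>
<div id="" name="power">something</div>'''

# Built-in parser instead of lxml, so nothing extra to install
soup = BeautifulSoup(html, 'html.parser')

# [name] only checks that the attribute exists, so name="" is included
with_name = soup.select('[name]')
# div[name] applies the same test to <div> tags only
divs_with_name = soup.select('div[name]')
# [name=""] keeps only the tags whose name attribute is empty
empty_name = soup.select('[name=""]')

print([tag.name for tag in with_name])              # ['div', 'a', 'p', 'div']
print([tag.get('name') for tag in divs_with_name])  # ['', 'power']
print(len(empty_name))                              # 1
```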
Another one I didn't try.
This is a good example of the saying 'Sometimes you can't see the forest for the trees!'