Python Forum
using regex wildcard Beautiful Soup
#1
I'm trying to match a div tag that may look like:
<div id="" name="">
or it may look like:
<a name="diddlysquat"></a>
or:
<a name="12358"></a>
Note that the second one doesn't have an id, so I can't use that.

I don't want to match the other 1000 divs on the page.
I know how to do it with Selenium using XPath, but since I'm almost done with this app, and so far have done it all with BeautifulSoup, I would
like to do this with soup as well.
I think I'm close, but no cigar yet.

The problem is matching an empty name field (and I want that empty field to be part of the match).
I have tried:

divs = base_anchor.find_all("div", {"name" : re.compile('*')})
# or
divs = base_anchor.find_all('div', name_=lambda s:s.startswith(''))
# or
divs = base_anchor.find_all('div', {'name: ''})
all fail
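For later readers, here is a hedged sketch of why the first two attempts probably fail. As far as I can tell, `'*'` on its own is not a valid regular expression (there is nothing for it to repeat), and a function filter is also called with `None` for tags that lack the attribute, so `s.startswith('')` blows up. Also, `name_` is looked up literally as an attribute called `name_`; only `class_` gets special treatment. The sample HTML and the helper name `starts_with_anything` are made up for illustration, and `html.parser` is used just to avoid an extra dependency.

```python
import re
from bs4 import BeautifulSoup

# Made-up sample markup, not the page from the question.
html = '<div id="" name="">a</div><div id="x">b</div><div name="power">c</div>'
soup = BeautifulSoup(html, "html.parser")

# Attempt 1: a lone '*' has nothing to repeat, so compiling it raises re.error.
try:
    re.compile('*')
    pattern_error = None
except re.error as exc:
    pattern_error = exc

# Attempt 2: the filter function receives None when the attribute is missing,
# so it has to guard against None before calling string methods.
def starts_with_anything(value):
    return value is not None and value.startswith('')

# Pass the filter through attrs, because 'name' as a keyword would be taken
# as the tag name argument of find_all().
matched = soup.find_all('div', attrs={'name': starts_with_anything})
print(pattern_error)
print([d.get('name') for d in matched])
```

With the `None` guard in place, the empty `name=""` div is included in the result along with the non-empty one.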
#2
I don't know why anyone would want to go through the mess that is the BS API, but according to the docs, this should work:
soup.find_all('div', 'name')
EDIT:
Installed BS to test, and it turns out that doesn't work (a bare string as the second argument is matched against the CSS class, not the attribute name), but all of these do:
soup.find_all('div', {'name': True})
soup.find_all('div', {'name': re.compile('.*')})
soup.find_all('div', {'name': lambda x: x is not None})
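A small self-contained check of the attribute-presence filters, in case it helps. This is only a sketch: the sample HTML is invented, and `html.parser` is used so nothing beyond bs4 is needed. It also shows what the plain-string form actually matches.

```python
import re
from bs4 import BeautifulSoup

# Invented sample markup for illustration.
html = '''\
<div id="" name="">empty name</div>
<div id="" bar="">no name</div>
<div class="name">class only</div>
<div id="" name="power">non-empty name</div>'''

soup = BeautifulSoup(html, 'html.parser')

# True means "the attribute exists", whatever its value (including "").
with_name = soup.find_all('div', {'name': True})

# '.*' matches any value, including the empty string.
with_name_re = soup.find_all('div', {'name': re.compile('.*')})

# A bare string in the second position is treated as a CSS class filter.
by_class = soup.find_all('div', 'name')

print([d.get_text() for d in with_name])
print([d.get_text() for d in by_class])
```

The dict form is needed here because `name` is already the first parameter of `find_all()`, so it cannot be passed as a keyword attribute filter.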
#3
(Sep-27-2018, 07:55 PM)stranac Wrote: soup.find_all('div', {'name': True})
I didn't know that this would work. I always used the re.compile method.
#4
I'm lucky enough never to have actually had to use BS, but I'm a black belt at google-fu...
#5
stranac,

Thanks so much. This is one I didn't try, and it works like a charm!
I owe you a cigar!
#6
It's easy to forget that CSS selectors also work.
from bs4 import BeautifulSoup

html = '''\
<div id="" name="" >something</div>
<a name="run">fast</a>
<a foo="12358"></a>
<p id="" name="poo">Super poo</p>
<div id="" bar="">something</div>
<div id="" name="power">something</div>'''

soup = BeautifulSoup(html, 'lxml')
print(soup.select('[name]'))
#print(soup.select('div[name]'))
#print(soup.select('[name^="p"]'))
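If only the divs are wanted, and the empty `name` needs to be told apart from a missing one, something like the following might do. It is a sketch on invented markup, using `html.parser` instead of lxml to keep it dependency-free; the variable names are just for illustration.

```python
from bs4 import BeautifulSoup

# Invented sample markup for illustration.
html = '''\
<div id="" name="">something</div>
<a name="run">fast</a>
<div id="" bar="">something</div>
<div id="" name="power">something</div>'''

soup = BeautifulSoup(html, 'html.parser')

# div[name] selects divs that carry a name attribute, empty or not.
named_divs = soup.select('div[name]')

# Narrow down to the ones whose name is the empty string.
empty_named = [tag for tag in named_divs if tag['name'] == '']

print([tag['name'] for tag in named_divs])
print(len(empty_named))
```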
#7
Another one I didn't try.
This is a good example of the saying 'Sometimes you can't see the forest for the trees!'