Python Forum

Strange BS4 Scraping Issue
from bs4 import BeautifulSoup
import re
import requests

url = "http://www.foxnews.com/politics/2018/01/14/trump-says-newspaper-misquoted-him-on-kim-jung-un-points-to-audio-tape.html"
scrape = requests.get(url, headers={"user-agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36"})
html = scrape.content
soup = BeautifulSoup(html, 'html.parser')

z = soup.find_all(text=re.compile(r"livefyre_comment_stream", re.IGNORECASE))

print(z)
I'm having a strange problem where BS4 isn't returning any results.

If you open the URL manually and check the source code, you can find "livefyre_comment_stream".

Can someone explain where I'm going wrong?

Solved. I had to use id=re.compile instead of text=re.compile.
What is "livefyre_comment_stream"? A class, an id?

The text argument only searches text nodes, i.e. the text between tags (what you see in the browser). It does not look at tag names or at attribute values such as class or id.
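To see the difference without hitting the network, here is a minimal sketch. The sample_html markup is made up for illustration and is only assumed to resemble the relevant part of the page. It uses string=, which the Beautiful Soup docs give as the current name for text=.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the article page (an assumption,
# not the real Fox News source).
sample_html = """
<div id="livefyre_comment_stream" class="comments">
  <p>Join the discussion</p>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
pattern = re.compile(r"livefyre_comment_stream", re.IGNORECASE)

# string= (called text= in older versions) is matched against text nodes only,
# so an id value is never seen by this search.
text_matches = soup.find_all(string=pattern)

# id= is matched against each tag's id attribute.
id_matches = soup.find_all(id=pattern)

print(text_matches)
print([tag.name for tag in id_matches])
```

With this sample, the text search comes back empty while the id search finds the div, which matches what the original script was running into.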

If "livefyre_comment_stream" is a class, for example:
z = soup.select(".livefyre_comment_stream")
Also, z is not a descriptive variable name.
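For reference, a rough sketch of both CSS selector forms with more descriptive names. The markup and variable names here are invented for the example. Use "." when the value is a class and "#" when it is an id.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one tag carries the value as an id, the other as a class.
sample_html = """
<div id="livefyre_comment_stream"></div>
<div class="livefyre_comment_stream"></div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# ".name" selects by class, "#name" selects by id.
comment_streams_by_class = soup.select(".livefyre_comment_stream")
comment_streams_by_id = soup.select("#livefyre_comment_stream")

print(comment_streams_by_class)
print(comment_streams_by_id)
```

select() always returns a list, so check that it is non-empty before indexing into it.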