Python Forum
Strange BS4 Scraping Issue
#1
from bs4 import BeautifulSoup
import re
import requests

url = "http://www.foxnews.com/politics/2018/01/14/trump-says-newspaper-misquoted-him-on-kim-jung-un-points-to-audio-tape.html"
scrape = requests.get(url, headers={"user-agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36"})
html = scrape.content
soup = BeautifulSoup(html, 'html.parser')

z = soup.find_all(text=re.compile(r"livefyre_comment_stream", re.IGNORECASE))

print(z)
I'm having a strange problem where BS4 isn't returning any results.

If you open the URL manually and check the page source, you can find "livefyre_comment_stream".

Can someone explain where I'm going wrong?

Solved. The fix was to pass the pattern as id=re.compile(...) rather than text=re.compile(...).
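For anyone who finds this later, here is a minimal sketch of the difference, run against an inline snippet rather than the live page. The HTML below is a made-up stand-in (I'm assuming the string appears as an id on a div, which the real page may or may not match), and string= is the current name for the text= argument in Beautiful Soup 4.4.0 and later.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page source; the real markup may differ.
html = """
<html><body>
  <p>Article text goes here.</p>
  <div id="livefyre_comment_stream" class="comments"></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
pattern = re.compile(r"livefyre_comment_stream", re.IGNORECASE)

# string= (formerly text=) only inspects the strings between tags,
# so a value that lives in an id attribute is never matched.
by_text = soup.find_all(string=pattern)

# id= tests the pattern against each tag's id attribute instead.
by_id = soup.find_all(id=pattern)

print(len(by_text), "match(es) via string=")
print(len(by_id), "match(es) via id=")
```

With this snippet the string= search comes back empty while the id= search finds the div, which is consistent with what I was seeing.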
#2
What is "livefyre_comment_stream"? A class, an id?

The text argument searches the text content of the page (what you see in the browser), not attribute values such as a class or an id.

If "livefyre_comment_stream" is a class, for example:
z = soup.select(".livefyre_comment_stream")
Also, z is not a descriptive variable name.
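As a rough sketch of both cases with CSS selectors, using more descriptive names: the markup below is invented for illustration (one element with the string as an id, one with it as a class), so adjust the selector to whatever the real page uses.

```python
from bs4 import BeautifulSoup

# Invented markup for illustration only: one id and one class with the same name.
html = (
    '<div id="livefyre_comment_stream">by id</div>'
    '<div class="livefyre_comment_stream">by class</div>'
)
soup = BeautifulSoup(html, "html.parser")

# A leading "." selects by class, a leading "#" selects by id.
class_matches = soup.select(".livefyre_comment_stream")
id_matches = soup.select("#livefyre_comment_stream")

print([tag.get_text() for tag in class_matches])
print([tag.get_text() for tag in id_matches])
```

select() always returns a list, so an empty result means the selector did not match anything in the parsed HTML.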