Python Forum
Beautiful Soup (suddenly) doesn't get full webpage html
#5
(Jul-11-2020, 11:28 AM)j.crater Wrote: Your code returns all the HTML contents of the page, if I print the soup. Is the main factor here 2 seconds sleep, which allows the Javascript to execute completely before parsing the HTML? However,
The 2-second sleep has nothing to do with this. It's just there for safety (to make sure the whole page has loaded); you can comment it out and it still works.
Selenium is the important part here.
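A minimal sketch of the Selenium approach follows. It assumes Selenium 4 with Chrome and a matching driver, plus `bs4`, are installed. The helper names `fetch_rendered_html` and `video_titles` are hypothetical. Instead of a fixed `time.sleep`, it uses an explicit `WebDriverWait` that returns as soon as an element with id `video-title` is present. That id comes from the tag discussed in this post; another page may need a different one.

```python
# Sketch: let a real browser (Selenium) execute the page's JavaScript,
# then hand the rendered HTML to BeautifulSoup for parsing.
from bs4 import BeautifulSoup


def fetch_rendered_html(url, wait_for_id="video-title", timeout=10):
    """Load `url` in headless Chrome and return the HTML after JavaScript has run.

    Hypothetical helper; assumes Selenium 4 and Chrome/chromedriver are available.
    """
    # Imported inside the function so the parsing helper below works without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Explicit wait: continue as soon as the element exists, up to `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.ID, wait_for_id))
        )
        # page_source is the DOM *after* the JavaScript has executed.
        return driver.page_source
    finally:
        driver.quit()


def video_titles(html):
    """Return the text of every <a id="video-title"> in already-rendered HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [a.get_text(strip=True) for a in soup.find_all("a", id="video-title")]
```

Usage would be `video_titles(fetch_rendered_html(url))`, where `url` is the page you are scraping. Keeping parsing in its own function means it can be checked against saved HTML without launching a browser.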
In the linked tutorial, Web-scraping part-2:
snippsat Wrote: JavaScript is used all over the web because of its unique position: it runs in the browser (client side).
This can make parsing more difficult,
because Requests/bs4/lxml cannot get everything that is executed/rendered by JavaScript.

There are ways to overcome this; here we are going to use Selenium.

When you parse with just Requests and BS, you will not get the executed JavaScript, only the raw content.
Then you will not find, for example, this tag: soup.find('a', id="video-title"),
because what you get back is the raw JavaScript.
It will be in a script tag. Here is a cleaned-up version (with a lot deleted) showing where the title is.
<script>
    window["ytInitialData"] .... = "title":{"runs":[{"text":"Learn Python - Full Course for Beginners [Tutorial]"}],"accessibility":{"accessibilityData":{"label":"Learn Python "viewCountText":{"simpleText":"Sett 16 184 859 ganger"},.....
    window["ytInitialPlayerResponse"] = null;
    if (window.ytcsi) {window.ytcsi.tick("pdr", null, '');}
</script>
Parsing this raw JavaScript is almost impossible; that's why we use Selenium to get the executed JavaScript back.
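To see the difference concretely, here is a small sketch using a hypothetical, heavily trimmed stand-in for the raw HTML that Requests would download. It is modeled on the cleaned-up script above and is not a real YouTube response. Because the title only exists as text inside the `<script>` tag, `soup.find('a', id="video-title")` finds nothing.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a raw (not rendered) page: the title lives only
# inside a <script> tag, not in an <a id="video-title"> element.
RAW_HTML = """
<html><body>
<script>
window["ytInitialData"] = {"title": {"runs": [{"text": "Learn Python - Full Course for Beginners [Tutorial]"}]}};
</script>
</body></html>
"""

soup = BeautifulSoup(RAW_HTML, "html.parser")

# No such tag exists until a browser has run the JavaScript.
link = soup.find("a", id="video-title")
print(link)  # None

# The title is only present as a substring of the script's text.
script = next(s for s in soup.find_all("script") if s.string and "ytInitialData" in s.string)
print("Learn Python" in script.string)  # True
```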


Messages In This Thread
RE: Beautiful Soup (suddenly) doesn't get full webpage html - by snippsat - Jul-11-2020, 12:28 PM

Possibly Related Threads…
Thread Author Replies Views Last Post
  Selenium suddenly fails to find element Pavel_47 3 6,338 Sep-04-2022, 11:06 AM
Last Post: Pavel_47
  Python Obstacles | Kung-Fu | Full File HTML Document Scrape and Store it in MariaDB BrandonKastning 5 2,924 Dec-29-2021, 02:26 AM
Last Post: BrandonKastning
  Beautiful Soup - access a rating value in a class KatMac 1 3,479 Apr-16-2021, 01:27 PM
Last Post: snippsat
  HTML multi select HTML listbox with Flask/Python rfeyer 0 4,653 Mar-14-2021, 12:23 PM
Last Post: rfeyer
  *Beginner* web scraping/Beautiful Soup help 7ken8 2 2,627 Jan-28-2021, 04:26 PM
Last Post: 7ken8
  Help: Beautiful Soup - Parsing HTML table ironfelix717 2 2,703 Oct-01-2020, 02:19 PM
Last Post: snippsat
  Requests-HTML vs Beautiful Soup - How to Choose? robin73 0 3,833 Jun-23-2020, 02:53 PM
Last Post: robin73
  Python3 + BeautifulSoup4 + lxml (HTML -> CSV) - How to loop to next HTML/new CSV Row BrandonKastning 0 2,379 Mar-22-2020, 06:10 AM
Last Post: BrandonKastning
  looking for direction - scrappy, crawler, beautiful soup Sly_Corn 2 2,469 Mar-17-2020, 03:17 PM
Last Post: Sly_Corn
  Beautiful soup truncates results jonesjoz 4 3,897 Mar-09-2020, 06:04 PM
Last Post: jonesjoz
