(Jul-11-2020, 11:28 AM)j.crater Wrote: Your code returns all the HTML contents of the page, if I print the soup. Is the main factor here the 2-second sleep, which allows the JavaScript to execute completely before parsing the HTML?
No, the 2-second sleep has nothing to do with this; it's just there for safety (to make sure the whole page has loaded). You can comment it out and it will still work.
It's Selenium that's the important part here.
From the link Web-scraping part-2:
snippsat Wrote: JavaScript is used all over the web because of its unique position: it runs in the browser (client side).
This can make parsing more difficult,
because Requests/bs4/lxml cannot see what is executed/rendered by JavaScript.
There are ways to overcome this; we're going to use Selenium.
When you just parse with Requests and BS, you will not get the executed JavaScript, only the raw content.
Then you will not find, for example, this tag:
soup.find('a', id="video-title")
because you are getting raw JavaScript back.
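A minimal sketch of what happens, using a short hypothetical snippet of raw, unexecuted page source instead of a live YouTube page: BeautifulSoup only sees the <script> payload, so the lookup comes back empty.

```python
from bs4 import BeautifulSoup

# Hypothetical raw page source: the <a id="video-title"> link would only
# exist after the JavaScript in the <script> tag has run in a browser.
raw_html = """
<html><body>
<script>window["ytInitialData"] = {"title": "Learn Python - Full Course for Beginners [Tutorial]"};</script>
</body></html>
"""

soup = BeautifulSoup(raw_html, "html.parser")
tag = soup.find('a', id="video-title")
print(tag)  # None -- the tag is never present in the raw content
```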
It will be in a script tag. Here is a cleaned-up version (with a lot deleted) showing where the title is:
<script> window["ytInitialData"] .... = "title":{"runs":[{"text":"Learn Python - Full Course for Beginners [Tutorial]"}],"accessibility":{"accessibilityData":{"label":"Learn Python "viewCountText":{"simpleText":"Sett 16 184 859 ganger"},..... window["ytInitialPlayerResponse"] = null; if (window.ytcsi) {window.ytcsi.tick("pdr", null, '');} </script>
Parsing this raw JavaScript is almost impossible; that's why we use Selenium to get the executed JavaScript back.
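If you really had to, you could try to dig the title out of the raw blob with a regular expression, but this is fragile and breaks whenever YouTube changes the key layout, which is exactly why Selenium is the better tool here. A rough sketch, using a shortened hypothetical fragment of the payload above:

```python
import json
import re

# Hypothetical, shortened fragment of the raw <script> payload shown above.
raw_js = '"title":{"runs":[{"text":"Learn Python - Full Course for Beginners [Tutorial]"}]}'

# Fragile: depends on the exact key layout YouTube happens to use today.
match = re.search(r'"title":\{"runs":\[\{"text":("(?:[^"\\]|\\.)*")', raw_js)
title = json.loads(match.group(1)) if match else None
print(title)  # Learn Python - Full Course for Beginners [Tutorial]
```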