Python Forum

Full Version: urllib.request
I am trying to count the number of characters that appear on the screen as the user would see them. My code gives me a little bit of information, but how do I get everything off the page?

import requests
from bs4 import BeautifulSoup

doc = "https://youtu.be/jrPII7KfYx0"
res = requests.get(doc)
soup = BeautifulSoup(res.content, "html.parser")
tag = soup.body
for string in tag.strings:
    print(string)
I'm guessing there is JavaScript on the page that YouTube runs in the browser, so the content it generates won't be in the response you get from requests. That being said, try sending a browser-like User-Agent header so that YouTube doesn't redirect your connection or block your request altogether.

import requests
from bs4 import BeautifulSoup

# Present the request as coming from a desktop browser.
headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"}
doc = "https://youtu.be/jrPII7KfYx0"
res = requests.get(doc, headers=headers)
soup = BeautifulSoup(res.content, "html.parser")
tag = soup.body
# Print every text node found inside <body>.
for string in tag.strings:
    print(string)
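Note that iterating over every string in the body also picks up the contents of script and style tags, which the user never sees. To get closer to a count of on-screen characters from the HTML that requests returns, something like the following might work. It is only a rough sketch: it uses the standard library's html.parser instead of BeautifulSoup so it runs without extra installs, the names VisibleTextCounter and count_visible_chars are made up for this example, and it assumes that "visible" simply means "not inside script, style, head and similar tags" (it knows nothing about CSS hiding or JavaScript-generated content).

```python
from html.parser import HTMLParser


class VisibleTextCounter(HTMLParser):
    """Collect text that is not inside tags a browser never displays."""

    # Assumption: text inside these tags is treated as not visible.
    HIDDEN_TAGS = {"script", "style", "noscript", "template", "head", "title"}

    def __init__(self):
        super().__init__()
        self._hidden_depth = 0  # > 0 while inside one or more hidden tags
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN_TAGS:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN_TAGS and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if self._hidden_depth == 0:
            self.chunks.append(data)


def count_visible_chars(html):
    """Return the length of the visible text, with whitespace runs collapsed."""
    parser = VisibleTextCounter()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())
    return len(text)
```

With the code above you would call count_visible_chars(res.text). With BeautifulSoup, a comparable approach is to call decompose() on each script and style tag and then take len(soup.body.get_text()).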
Use Selenium when the page needs JavaScript to be interpreted.
See:
web scraping part 1
web scraping part 2
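
For what it's worth, a Selenium version of the character count might look roughly like this. Treat it as a sketch under a few assumptions: Selenium 4 is installed, Chrome is available on the machine, and waiting for the body element is enough for the page in question (a site like YouTube may need a wait on a more specific element). The function names rendered_body_text and count_chars are invented for this example.

```python
def rendered_body_text(url, wait_seconds=10):
    """Load url in headless Chrome and return the text of <body> as rendered."""
    # Imported here so count_chars below can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Assumption: the page is usable once <body> is present in the DOM.
        body = WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_element_located((By.TAG_NAME, "body"))
        )
        # WebElement.text returns only the text the browser actually displays.
        return body.text
    finally:
        driver.quit()


def count_chars(text):
    """Count characters after collapsing runs of whitespace to single spaces."""
    return len(" ".join(text.split()))
```

Usage would be something like count_chars(rendered_body_text("https://youtu.be/jrPII7KfYx0")). Whether whitespace should be collapsed before counting depends on what you mean by a character on screen, so adjust count_chars to taste.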