Python Forum
urllib.request
#1
I am trying to count the number of characters that appear on the screen, as the user would see them. My code gives me a little bit of the text, but how do I get everything off the page?

import requests
from bs4 import BeautifulSoup

doc = "https://youtu.be/jrPII7KfYx0"
res = requests.get(doc)
soup = BeautifulSoup(res.content, "html.parser")
tag = soup.body
# Print every text node found under <body>
for string in tag.strings:
    print(string)
buran wrote Dec-16-2020, 06:24 AM:
Please use proper tags when posting code, tracebacks, output, etc. This time I have added the tags for you.
See BBcode help for more info.
#2
I'm guessing that YouTube builds most of the page with JavaScript, and requests does not run JavaScript, so you won't see that content in the response. That being said, try sending a browser-like User-Agent header so that YouTube doesn't redirect your connection or block your request altogether.

import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent so the request looks like it comes from Chrome
headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"}
doc = "https://youtu.be/jrPII7KfYx0"
res = requests.get(doc, headers=headers)
soup = BeautifulSoup(res.content, "html.parser")
tag = soup.body
# Print every text node found under <body>
for string in tag.strings:
    print(string)
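
Note that tag.strings also yields the contents of script and style elements, which the user never sees, so printing or counting it directly overstates the visible text. Here is a rough sketch of one way to count only the characters in the static HTML that are likely to be visible. It uses the standard library's html.parser rather than BeautifulSoup so it runs without extra installs. The names VisibleTextParser and count_visible_characters are made up for this example, and joining text chunks with a single space is an approximation of what a browser would render. It still cannot see anything that JavaScript adds later.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, ignoring elements whose content is never displayed."""

    # Assumption: these tags cover the common "invisible" containers.
    SKIP = {"script", "style", "noscript", "template", "title"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entities like &amp;
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if not self._skip_depth:
            self.chunks.append(data)


def count_visible_characters(html):
    """Return the number of characters of (approximately) visible text."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    # Join chunks with a space, then collapse runs of whitespace to one space.
    text = " ".join(" ".join(parser.chunks).split())
    return len(text)


sample = (
    "<html><head><title>T</title><style>p{}</style></head>"
    "<body><p>Hello</p><script>var x=1;</script><p>big  world</p></body></html>"
)
print(count_visible_characters(sample))  # counts "Hello big world"
```

With requests you would pass res.text to count_visible_characters instead of the sample string.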
#3
Use Selenium when the page relies on JavaScript that has to be executed before the content appears.
See:
Web scraping part 1
Web scraping part 2
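
A minimal sketch of what that could look like for counting the characters a user would see, assuming Selenium 4 and a local Chrome install. The function names normalise_whitespace and count_rendered_characters and the wait_seconds parameter are made up for this example, and waiting for document.readyState is only a rough signal that the page has rendered; a site like YouTube may keep loading content after that, so treat the result as an estimate.

```python
def normalise_whitespace(text):
    """Collapse runs of whitespace (including newlines) to single spaces."""
    return " ".join(text.split())


def count_rendered_characters(url, wait_seconds=10):
    """Load url in headless Chrome and count the characters of rendered body text."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")  # run Chrome without a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the browser reports that the document finished loading.
        WebDriverWait(driver, wait_seconds).until(
            lambda d: d.execute_script("return document.readyState") == "complete"
        )
        # .text returns the text of the element as rendered, after JavaScript ran.
        body_text = driver.find_element(By.TAG_NAME, "body").text
    finally:
        driver.quit()
    return len(normalise_whitespace(body_text))


# Example call (needs Selenium, Chrome and network access):
# print(count_rendered_characters("https://youtu.be/jrPII7KfYx0"))
```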

