Python Forum
urllib.request
#1
I am trying to count the number of characters that appear on the screen, as the user would see them. My code gives me some of the text, but how do I get everything on the page?

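import requests
from bs4 import BeautifulSoup

res = requests.get(doc)  # doc holds the page URL (its value is not shown in the post)
soup = BeautifulSoup(res.content, "html.parser")
tag = soup.body
# Print every text node found inside <body>
for string in tag.strings:
    print(string)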
buran wrote Dec-16-2020, 06:24 AM:
Please use proper tags when posting code, tracebacks, output, etc. This time I have added the tags for you.
See BBcode help for more info.
#2
I'm guessing the page loads much of its content with JavaScript, which is common on YouTube, and you won't see that content in the response from requests. That said, try sending a browser-like User-Agent header so that YouTube doesn't redirect your connection or block your request altogether.

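import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent so the server treats the request like a normal browser visit
headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"}
res = requests.get(doc, headers=headers)  # doc holds the page URL, as in the original post
soup = BeautifulSoup(res.content, "html.parser")
tag = soup.body
# Print every text node found inside <body>
for string in tag.strings:
    print(string)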
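
Since the original question was about counting characters rather than printing them, here is a minimal sketch of how the count could be taken once the HTML has been fetched. The helper names `visible_text` and `count_visible_characters` and the `sample` string are made up for illustration, and the choice to join text nodes with a single space is an assumption about what "as the user would see it" means. It only covers text present in the downloaded HTML, not anything added later by JavaScript.

```python
from bs4 import BeautifulSoup


def visible_text(html):
    """Return the page text, leaving out content that is never rendered."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove elements whose contents do not show up as text on the page.
    for element in soup(["script", "style", "noscript", "template"]):
        element.decompose()
    # Fall back to the whole document if there is no <body> tag.
    root = soup.body or soup
    # stripped_strings yields each text node with surrounding whitespace removed.
    return " ".join(root.stripped_strings)


def count_visible_characters(html):
    """Count the characters in the visible text, spaces included."""
    return len(visible_text(html))


# Hypothetical sample document, used here instead of a live request.
sample = (
    "<html><head><title>t</title></head><body>"
    "<p>Hello <b>world</b></p><script>var x = 1;</script>"
    "</body></html>"
)
print(count_visible_characters(sample))
```

With a live page, `count_visible_characters(res.text)` would take the place of the loop that prints each string.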
#3
Use Selenium when the page relies on JavaScript that has to be executed before the content appears.
See:
web scraping part 1
web scraping part 2
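
As a rough sketch of the Selenium approach (not taken from the linked tutorials), the page can be loaded in a headless browser and the rendered text of the body read back. The function names `rendered_body_text` and `count_characters` are hypothetical. The sketch assumes Selenium 4 with a recent Chrome installed, and the `--headless=new` flag assumes a Chrome version that supports it.

```python
def rendered_body_text(url):
    """Load url in headless Chrome and return the rendered text of <body>."""
    # Imported here so the counting helper below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # assumption: a recent Chrome build
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # .text returns the text of the element as the browser renders it.
        return driver.find_element(By.TAG_NAME, "body").text
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()


def count_characters(text, include_whitespace=True):
    """Count characters in text, optionally skipping whitespace."""
    if include_whitespace:
        return len(text)
    return sum(1 for ch in text if not ch.isspace())
```

Calling `count_characters(rendered_body_text(doc))` would then give the count for the page. Content that loads after a delay may need an explicit wait before the text is read.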