Python Forum

Full Version: Simple newbie Q
I'm new to Python, so I hope this is really simple.

import bs4 as bs
import urllib.request

source = urllib.request.urlopen("").read()
soup = bs.BeautifulSoup(source, 'lxml')


I thought this returns all the text on the page. However, it only returns the text in the header and footer of the page, which is very different from what I got when following the video with a different page. What should I be doing differently to get the text in the main part of the page?

Many thanks
Here's a super simple example using requests (use it instead of urllib):

To get the requests package:
pip install requests

import requests
from bs4 import BeautifulSoup

response = requests.get('')
if response.status_code == 200:
    # Download succeeded: parse the returned HTML
    soup = BeautifulSoup(response.text, 'lxml')
else:
    print('Problem downloading, status code: {}'.format(response.status_code))
The page content is probably generated by JavaScript, so urllib can't do anything here: it only downloads the HTML the server sends and never runs scripts. Test the script with a webpage that is static. This one, for example:
Thanks. The code didn't do the trick, so I will explore the link provided.
I tried it myself. It's difficult because the page is rendered by JavaScript, and the CSS selector I tried did not return any results.
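
To illustrate why a plain download only shows the header and footer, here is a minimal sketch. The PAGE string is a made-up stand-in for the real site (not its actual markup), and the VisibleText class and visible_text helper are hypothetical names. It uses only the standard library's html.parser so it runs without installing anything; BeautifulSoup would see the same thing, because neither of them executes the script.

```python
from html.parser import HTMLParser

# Hypothetical page: the header and footer are in the HTML itself, while the
# main text would only be inserted by the script once a browser runs it.
PAGE = """
<html><body>
  <header>Site header</header>
  <div id="main"></div>
  <footer>Site footer</footer>
  <script>
    document.getElementById("main").textContent = "Main article text";
  </script>
</body></html>
"""


class VisibleText(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text that is not inside a script or style element.
        if text and not self._skip_depth:
            self.chunks.append(text)


def visible_text(html):
    """Return the text a static parser can see in the given HTML string."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return parser.chunks


print(visible_text(PAGE))  # ['Site header', 'Site footer']
```

The "Main article text" string never shows up as page text, since it only exists inside the script source. Getting it would need something that actually runs the JavaScript, such as a browser.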

There is a lib which uses BeautifulSoup, but with a better API:
There you have the method html.render(), which should start a hidden Chromium instance in the background to render the JavaScript.