Python Forum

I'm having a hard time deciphering the requests module documentation, so is there a way to do this in one statement:

# assume I've imported requests and defined a variable url holding a string
# that is a valid URL

raw_object = requests.get(url)
extracted_text = raw_object.text

Also, when I use dir() on a requests.Response object, I get a list of attributes and methods. Does that list vary depending on the data retrieved from the webpage accessed by requests.get(), or on the operating system Python is running on, provided that the version of Python is the same?

The object returned by get() is an instance of a class (requests.Response) with its own methods and attributes.

This is why you have to use its .text or .content attribute to get the source of the page.

One line:

page = requests.get('http://python-forum.io').content
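Chaining the attribute onto the call throws the Response object away, so the status code can no longer be checked afterwards. If that matters, a small helper keeps the call site to one statement. This is only a sketch: the name fetch_text and the 10-second default timeout are made up for illustration and are not part of Requests.

```python
import requests


def fetch_text(url, timeout=10):
    """Return the decoded body of url, raising on HTTP error statuses.

    fetch_text and the default timeout are illustrative choices,
    not part of the Requests API.
    """
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.text
```

With that defined, the original two lines become extracted_text = fetch_text(url).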

The .content attribute is the response body as bytes, while .text is the same body decoded to a Unicode string. The other methods and attributes you see when you call dir() on the response object have fairly descriptive names, so you get the picture.
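To see the bytes/str difference without touching the network, here is a minimal sketch that builds a Response by hand. Assigning to the private _content attribute is a shortcut assumed for this demo only; a real response gets its body from the server.

```python
import requests

# Build a Response by hand so the sketch runs without network access.
# Assigning to the private _content attribute is a testing shortcut,
# not something the Requests documentation promises to support.
response = requests.Response()
response._content = "Café".encode("utf-8")
response.encoding = "utf-8"

raw_bytes = response.content  # bytes, exactly as stored in the response
decoded = response.text       # str, decoded using response.encoding

print(type(raw_bytes).__name__, len(raw_bytes))
print(type(decoded).__name__, len(decoded))
```

The byte string is one item longer than the text (5 against 4) because é takes two bytes in UTF-8.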

(Jan-28-2018, 08:43 AM) league55 wrote: Does that list vary depending on the data retrieved from the webpage accessed by requests.get()

No, the list is always the same for a given version of Requests; it's built and shipped as a package wheel.
The list does not change based on the OS or the external source.
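A quick way to convince yourself is to compare dir() on two responses with different bodies. This is a rough sketch: public_names is a hypothetical helper, and the responses are built by hand (via the private _content attribute) instead of being fetched.

```python
import requests


def public_names(obj):
    """Return the names from dir(obj) that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


# Two hand-built responses with different bodies; no network involved.
# Setting the private _content attribute is a shortcut for this sketch only.
html_response = requests.Response()
html_response._content = b"<html><title>hi</title></html>"

json_response = requests.Response()
json_response._content = b'{"answer": 42}'

same_listing = public_names(html_response) == public_names(json_response)
print(same_listing)
```

The names come from the Response class itself, so the body of the page has no influence on them.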

A typical example.
Requests usually gets the encoding that the web site uses right, since it reads it from the HTTP headers.
So sending .content rather than .text to BS is okay (no need to decode twice), as BS will detect that it's utf-8 through its use of Unicode, Dammit.

from bs4 import BeautifulSoup
import requests

url = 'https://www.python.org/'
url_get = requests.get(url)
print(url_get.encoding)
soup = BeautifulSoup(url_get.content, 'lxml')
print(soup.select('head > title')[0].text)

Output:
utf-8
Welcome to Python.org
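As far as I know, the value printed by .encoding comes from the Content-Type header, and Requests falls back to ISO-8859-1 for text responses that declare no charset. The sketch below uses requests.utils.get_encoding_from_headers with made-up header dicts to show why handing the raw bytes to BS is the safer habit; the sample body is an assumption for illustration.

```python
from requests.utils import get_encoding_from_headers

# With an explicit charset in the header, that charset is used.
declared = get_encoding_from_headers({"content-type": "text/html; charset=utf-8"})

# Without one, text/* responses fall back to ISO-8859-1.
fallback = get_encoding_from_headers({"content-type": "text/html"})

body = "Café".encode("utf-8")    # what a UTF-8 site would actually send
garbled = body.decode(fallback)  # roughly what .text would give with the fallback

print(declared, fallback, garbled)
```

When the header is missing or wrong, .text can come out garbled like this, while .content still holds the original bytes for BS to sniff.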

league55

wavic

snippsat