Get html body of URL - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html)
+--- Thread: Get html body of URL (/thread-28765.html)

Get html body of URL - rama27 - Aug-02-2020

Hi,
I have the following issue: I would like to get the HTML body of a web page. I am just starting with Python, so to be clear, I need the same output as I get in the Google Chrome console using the jQuery command: $("body").html()

I tried:

import requests
url = "myurl"
r = requests.get(url)
r.text

but it gave me something different. Thanks for any help!

RE: Get html body of URL - Larz60+ - Aug-02-2020

You did use an actual URL, not "myurl", correct?
You should use meaningful names, not single letters.
You should also check the status code to make sure you actually downloaded the page:

import requests


url = "https://google.com"
response = requests.get(url)
if response.status_code == 200:
    print(response.text)
else:
    print(f"Could not find url: {url}")

RE: Get html body of URL - buran - Aug-03-2020

Code that ends with a bare r.text will give you nothing (unless you run it in interactive mode, i.e. line by line); in a script you need print(r.text).
In any case, the source you get may be different if the page uses JavaScript to populate its content.
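
For a page that does not depend on JavaScript, the closest equivalent of $("body").html() is the inner HTML of the <body> element in the downloaded source. Below is a minimal sketch, assuming BeautifulSoup (bs4) is installed; the helper name body_inner_html is made up for illustration.

```python
from bs4 import BeautifulSoup


def body_inner_html(html: str) -> str:
    """Return the markup inside <body>, or "" if there is no <body> tag."""
    # "html.parser" is the parser bundled with Python, so lxml is not needed.
    soup = BeautifulSoup(html, "html.parser")
    if soup.body is None:
        return ""
    # decode_contents() serializes the children of the tag,
    # without the <body> tag itself.
    return soup.body.decode_contents()


sample = "<html><head><title>Demo</title></head><body><h1>Hello</h1><p>World</p></body></html>"
print(body_inner_html(sample))
```

With Requests this would be called as body_inner_html(response.text). For a JavaScript-driven page the result will still differ from the Chrome console, because Requests does not execute scripts.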

RE: Get html body of URL - rama27 - Aug-03-2020

Hi both, thanks for your replies!

@Larz60+ I checked it, and the status code is really 200.

import requests
url = "https://www.sreality.cz/hledani/pronajem/byty/praha?velikost=1%2B1,2%2Bkk&stavba=cihlova&patro-od=2&patro-do=100&razeni=nejlevnejsi"

r = requests.get(url)

r.status_code   #yes, status code == 200
r.text

@buran - I am not sure what you mean. How can I get the HTML body if the page uses JS?

RE: Get html body of URL - Larz60+ - Aug-03-2020

You need to check the status code in the code itself; why have a computer otherwise!
Include an if/else statement as I showed in post 2.

RE: Get html body of URL - buran - Aug-03-2020

(Aug-03-2020, 09:00 AM)rama27 Wrote: @buran - I am not sure what you mean. How can I get the HTML body if the page uses JS?

The page does use JavaScript.
One way is to use a tool like Selenium.
The other option is to examine the requests the page makes. For example, there is this link:
https://www.sreality.cz/api/cs/v2/estates?building_type_search=2&category_main_cb=1&category_sub_cb=3%7C4&category_type_cb=2&floor_number=2%7C100&locality_region_id=10&per_page=20&sort=1&tms=1596450616863
It returns JSON for the first 20 properties, but you need to research what information it contains and how to retrieve the next batch.
For example, the next page is at this address:
https://www.sreality.cz/api/cs/v2/estates?category_main_cb=1&category_sub_cb=3&category_type_cb=2&locality_region_id=10&page=2&per_page=20&tms=1596451160636
The idea in this case is to replicate the requests made by the page and parse the JSON you get.
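
The request replication described here might look like the following sketch. The query parameters are copied from the second URL; treating tms as a millisecond timestamp is a guess based on its value, and the helper names estates_params and fetch_estates are hypothetical. The structure of the returned JSON is not known from this thread, so the code only returns the parsed object for inspection.

```python
import time

API_URL = "https://www.sreality.cz/api/cs/v2/estates"


def estates_params(page, per_page=20, tms=None):
    """Build the query parameters seen in the second URL above."""
    if tms is None:
        # Assumption: tms is the current time in milliseconds.
        tms = int(time.time() * 1000)
    return {
        "category_main_cb": 1,
        "category_sub_cb": 3,
        "category_type_cb": 2,
        "locality_region_id": 10,
        "page": page,
        "per_page": per_page,
        "tms": tms,
    }


def fetch_estates(page, per_page=20):
    """Download one batch of results and return the parsed JSON."""
    import requests  # imported here so estates_params works without it

    response = requests.get(
        API_URL, params=estates_params(page, per_page), timeout=10
    )
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.json()


# Show the parameters for the second batch without touching the network.
print(estates_params(page=2, tms=1596451160636))
```

Calling fetch_estates(1), fetch_estates(2), and so on would then walk through the batches; printing the keys of the result first is a reasonable way to see what it contains.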

RE: Get html body of URL - snippsat - Aug-03-2020

Here is an example with Selenium; this is not the easiest page to start with if you are new to this.
If you can find the info in the JSON response, as @buran shows,
then that is a fine and fast way, as it only requires Requests: a get call, then .json() on the response.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

#--| Setup
options = Options()
#options.add_argument("--headless")
#options.add_argument("--window-size=1980,1020")
# Selenium 4 style: the chromedriver path is passed through a Service object
service = Service(executable_path=r'C:\cmder\bin\chromedriver.exe')
browser = webdriver.Chrome(service=service, options=options)
#--| Parse or automation
url = "https://www.sreality.cz/hledani/pronajem/byty/praha?velikost=1%2B1"
browser.get(url)
time.sleep(3)  # give the page's JavaScript time to render
# Use BeautifulSoup
soup = BeautifulSoup(browser.page_source, 'lxml')
title = soup.find('h1', class_="page-title list-title ng-binding")
print(title.text)
print('-' * 40)
# Use Selenium (find_elements_by_xpath was removed in Selenium 4)
info = browser.find_elements(By.XPATH, "//div[@class='dir-property-list']//div[1]//div[1]//div[1]")
print(info[0].text)
browser.quit()

Output:
Byty 1+1 k pronájmu Praha
----------------------------------------
Pronájem bytu 1+kk 35 m²
Praha 5 - Smíchov
14 000 Kč za měsíc
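
Once the page is rendered in a real browser, the jQuery result asked for in the first post can also be read directly: $("body").html() corresponds to document.body.innerHTML, which Selenium can return through execute_script. Below is a small sketch; the function name rendered_body_html and the fixed wait are assumptions, and driver is any Selenium WebDriver created as in the Selenium example in this thread.

```python
import time


def rendered_body_html(driver, url, wait_seconds=3.0):
    """Load url in the browser and return the inner HTML of <body>."""
    driver.get(url)
    # Crude fixed pause so the page's JavaScript can fill in the content,
    # mirroring the time.sleep(3) used in the Selenium example.
    time.sleep(wait_seconds)
    # Same value that $("body").html() gives in the browser console.
    return driver.execute_script("return document.body.innerHTML;")
```

It would be called as rendered_body_html(browser, url), followed by browser.quit() when done. A fixed sleep is the simplest option; how long the page really needs is not something this sketch can know.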