Python Forum
Get html body of URL
#1
Hi,
I have the following issue: I would like to get the HTML body of a webpage. I am just starting with Python, so to be clear, I need the same output as I get in the Google Chrome console with the jQuery command: $("body").html()

I tried:
import requests
url = "myurl"
r = requests.get(url)
r.text
but it gave me something different. Thanks for your help!
#2
You did use an actual URL, not "myurl", correct?
You should use meaningful names, not single letters.
You should also check the status code to make sure you actually downloaded the page:
import requests


url = "https://google.com"
response = requests.get(url)
if response.status_code == 200:
    print(response.text)
else:
    print(f"Could not find url: {url}")
#3
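That code will output nothing when run as a script (unless you run it in interactive mode, i.e. line by line), because a bare expression such as r.text is evaluated but never printed.
In any case, the source you get may be different from what the browser shows if the page uses JavaScript to populate its content.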
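For a page whose markup is already present in the downloaded source, a rough equivalent of $("body").html() might look like the sketch below. It assumes BeautifulSoup (bs4) is installed and uses the standard-library "html.parser" backend; the helper name body_html is made up for illustration. The result can still differ from the browser console, since the browser works on the DOM after scripts have run.

```python
from bs4 import BeautifulSoup


def body_html(html: str) -> str:
    """Return the inner HTML of <body>, or "" if the document has none.

    body_html is a hypothetical helper name used only for this sketch.
    """
    soup = BeautifulSoup(html, "html.parser")
    if soup.body is None:
        return ""
    # decode_contents() renders the tag's children without the tag itself
    return soup.body.decode_contents()


sample = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"
print(body_html(sample))
```

With Requests, the same helper would be called as body_html(response.text) after checking the status code.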
#4
Hi both, thanks for your replies!

@Larz60+ I checked it, and the status code is really 200.

import requests
url = "https://www.sreality.cz/hledani/pronajem/byty/praha?velikost=1%2B1,2%2Bkk&stavba=cihlova&patro-od=2&patro-do=100&razeni=nejlevnejsi"

r = requests.get(url)

r.status_code   #yes, status code == 200
r.text
@buran - I am not sure what you mean. How can I get the HTML body if the page uses JS?
#5
You need to check the status code in the code itself; why have a computer otherwise!
Include an if/else statement as I showed in post #2.
#6
(Aug-03-2020, 09:00 AM)rama27 Wrote: @buran - I am not sure what you mean. How can I get the HTML body if the page uses JS?
It does use JavaScript.
One way is to use a tool like Selenium.
The other option is to examine the requests the page makes. For example, there is this link:
https://www.sreality.cz/api/cs/v2/estate...6450616863
It returns JSON for the first 20 properties, but you need to research further what information it contains and how to retrieve the next batch.
For example, the next page is at this address:
https://www.sreality.cz/api/cs/v2/estate...6451160636
The idea in this case is to replicate the requests made by the page and parse the JSON you get.
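The "replicate the requests and parse the JSON" idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the query parameter names ("page", "per_page"), the "items" key, and the helper names fetch_json and iter_items are all hypothetical and would have to be replaced with whatever the browser's network tab actually shows for the site.

```python
from typing import Any, Callable, Dict, Iterator


def fetch_json(url: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Fetch one batch with Requests and decode the JSON body."""
    import requests  # imported lazily so the paging logic below runs without it

    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.json()


def iter_items(
    url: str,
    get_json: Callable[[str, Dict[str, Any]], Dict[str, Any]] = fetch_json,
    items_key: str = "items",  # hypothetical key holding the list of results
    per_page: int = 20,
    max_pages: int = 5,
) -> Iterator[Dict[str, Any]]:
    """Yield items batch by batch until a batch is empty or max_pages is reached."""
    for page in range(1, max_pages + 1):
        # "page" and "per_page" are assumed names, not the site's real parameters
        data = get_json(url, {"page": page, "per_page": per_page})
        items = data.get(items_key, [])
        if not items:
            break
        yield from items
```

Passing the fetch function in as a parameter keeps the paging loop separate from the network call, so the loop can be tried against canned data before it is pointed at a real endpoint.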
#7
Here is an example with Selenium. It's not the easiest page to start with if you are new to this.
If you can find the info in the JSON returned, as @buran shows,
then that is a fine and fast way, as it only requires Requests with a get call and calling .json() on the response.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

#--| Setup
options = Options()
#options.add_argument("--headless")
#options.add_argument("--window-size=1980,1020")
# Selenium 4 takes the driver path through a Service object
service = Service(executable_path=r'C:\cmder\bin\chromedriver.exe')
browser = webdriver.Chrome(service=service, options=options)
#--| Parse or automation
url = "https://www.sreality.cz/hledani/pronajem/byty/praha?velikost=1%2B1"
browser.get(url)
time.sleep(3)  # give the page's JavaScript time to render the content
# Use BeautifulSoup
soup = BeautifulSoup(browser.page_source, 'lxml')
title = soup.find('h1', class_="page-title list-title ng-binding")
print(title.text)
print('-' * 40)
# Use Selenium (find_elements_by_xpath was removed in Selenium 4)
info = browser.find_elements(By.XPATH, "//div[@class='dir-property-list']//div[1]//div[1]//div[1]")
print(info[0].text)
browser.quit()  # close the browser when done
Output:
Byty 1+1 k pronájmu Praha
----------------------------------------
Pronájem bytu 1+kk 35 m² Praha 5 - Smíchov 14 000 Kč za měsíc