Python Forum
get hotel info from hotelscombined
#1
Hi guys,

I am developing a program to extract hotel details from hotelscombined. In my trials, all of the text (hotel names, locations, services, etc.) can be extracted from <div class='hc_sr_summary'>. However, how can I get just the hotel names?
In my trial, news2 = browser.find_element_by_xpath("//div[@id='u1']").text extracts only the first hotel name; the rest are not extracted.

I guess it is a loop issue, because the structure is as shown below. Please help me with this question. Thank you.

<div class='hc_sr_summary'> <= master level
<div id='uniquehotelID1' class='hc-searchresultitem'> <== child level
<div class="hc-searchresultitem__hotelsummary">
<H3 class="hc-searchresultitem__hotelname">
<a id="searchResultHeading2679577" class="hc-searchresultitem__hotelnamelink" ....>Hotel Midtown Richardson</a>

<div id='uniquehotelID2' class='hc-searchresultitem'> <== child level
<div id='uniquehotelID3' class='hc-searchresultitem'> <== child level
<div id='uniquehotelID4' class='hc-searchresultitem'> <== child level

************************
from selenium import webdriver

browser = webdriver.Firefox()
browser.get('https://www.hotelscombined.hk/Hotels/Search?destination=place%3ATaipei&checkin=2019-05-05&checkout=2019-05-06&Rooms=1&adults_1=2&languageCode=HK&currencyCode=HKD#destination=place:Taipei&radius=0km&checkin=2019-05-05&checkout=2019-05-06&Rooms=1&adults_1=2&pageSize=15&pageIndex=1&sort=Popularity-desc&showSoldOut=false&scroll=432&HotelID=&mapState=expanded%3D0')

# Get all text inside <div class='hc_sr_summary'>
news = browser.find_element_by_xpath("//div[@class='hc_sr_summary']").text
print(news)

# This returns only the first <h3> under <div class='hc_sr_summary'>, not all of them
news2 = browser.find_element_by_xpath("//h3[@class='hc-searchresultitem__hotelname']").text
print(news2)

#browser.close()
#2
The find_element_by_* methods return only the first matching element.
You need the find_elements_by_* methods, which return a list of all matching elements.

Then iterate over the elements in the list and read each one's .text property.

Locating elements docs
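
The first-match versus all-matches difference is easy to try offline. Below is a minimal sketch that uses the standard library's xml.etree.ElementTree as a stand-in for Selenium (it is not the Selenium API): find() returns only the first match and findall() returns a list, which mirrors find_element_by_* versus find_elements_by_*. The markup is a simplified, well-formed copy of the structure in post #1, and the second hotel name is a placeholder, not real data.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed copy of the structure from post #1.
# "Placeholder Hotel B" is an invented name used only for illustration.
HTML = """
<div class="hc_sr_summary">
  <div id="uniquehotelID1" class="hc-searchresultitem">
    <div class="hc-searchresultitem__hotelsummary">
      <h3 class="hc-searchresultitem__hotelname">
        <a class="hc-searchresultitem__hotelnamelink">Hotel Midtown Richardson</a>
      </h3>
    </div>
  </div>
  <div id="uniquehotelID2" class="hc-searchresultitem">
    <div class="hc-searchresultitem__hotelsummary">
      <h3 class="hc-searchresultitem__hotelname">
        <a class="hc-searchresultitem__hotelnamelink">Placeholder Hotel B</a>
      </h3>
    </div>
  </div>
</div>
"""

root = ET.fromstring(HTML)
LINK = ".//h3[@class='hc-searchresultitem__hotelname']/a"

first = root.find(LINK)     # a single element: the first match only
every = root.findall(LINK)  # a list containing every match

first_name = first.text
all_names = [a.text for a in every]  # .text is read per element, not on the list

print(first_name)
print(all_names)
```

The same pattern applies in Selenium: the plural call gives a list, and the loop (or list comprehension) reads .text from each item.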
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

#3
This is how I'd do it:
Note: I changed the language to English to make it easier for myself; you can change it back.
from selenium import webdriver
from bs4 import BeautifulSoup


browser = webdriver.Firefox()
# Changed the language code to English (languageCode=EN)
browser.get('https://www.hotelscombined.hk/Hotels/Search?destination=place%3ATaipei&checkin=2019-05-05&checkout=2019-05-06&Rooms=1&adults_1=2&languageCode=EN&currencyCode=HKD#destination=place:Taipei&radius=0km&checkin=2019-05-05&checkout=2019-05-06&Rooms=1&adults_1=2&pageSize=15&pageIndex=1&sort=Popularity-desc&showSoldOut=false&scroll=432&HotelID=&mapState=expanded%3D0')

# Hand the rendered page source over to BeautifulSoup
src = browser.page_source
soup = BeautifulSoup(src, "lxml")
hotels = soup.find('div', {'class': 'hc_sr_summary'})

# Read the 'fn' attribute of each search-result div
hotel_names = hotels.find_all('div', {'class': 'hc-searchresultitem'})
for hname in hotel_names:
    name = hname.get('fn')
    print(f'Name: {name}')

browser.close()
results:
Output:
Name: San_Want_Residences
Name: Ximen_Taipei_DreamHouse
Name: San_Want_Hotel_Taipei
Name: Urtrip_Hotel
Name: Backpackers_Hostel_Taipei_Changchun
Name: Taipei_M_Hotel_Main_Station
Name: Diary_of_Taipei_Hotel_Main_Station
Name: Go_Sleep_Hotel_Hankou
Name: Park_Taipei_Hotel
Name: FX_Hotel_Taipei_Nanjing_East_Road_Branch
Name: Space_Inn
Name: Sunworld_Dynasty_Hotel_Taipei
Name: Green_World_Hotel_Zhonghua
Name: Just_Sleep_Ximending
Name: ECFA_Hotel_Wan_Nian
#4
(Apr-22-2019, 12:03 PM)buran Wrote:
find_element_by_ methods will return just one/first element
You need to use find_elements_by_ methods that will return list of multiple elements
then you will iterate over the elements in the list and extract .text property
Locating elements docs

Hi buran, thank you for your answer. Larz60+ has also shown another way, using BS.

I tried your method and changed "element" to "elements".
Case 1: news3 = browser.find_elements_by_xpath("//div[@class='hc_sr_summary']/div/div/h3/a").text
However, this raised AttributeError: 'list' object has no attribute 'text'

Case 2: I removed .text, so news3 = browser.find_elements_by_xpath("//div[@class='hc_sr_summary']/div/div/h3/a")
However, the printed result (shown below) is a list of element objects rather than the text that appears on the web page.

Would you share the proper way of using find_elements_by_xpath in this case? Thanks.

[<selenium.webdriver.firefox.webelement.FirefoxWebElement (session="c8f60e25-62c6-438d-98dc-25a7e9779656", element="af4e5737-265e-4920-a2a5-cbfaa97646fa")>, <selenium.webdriver.firefox.webelement.FirefoxWebElement (session="c8f60e25-62c6-438d-98dc-25a7e9779656", element="1cfb5db4-0f5f-4a20-8ddb-3fa93903b60c")>, <selenium.webdriver.firefox.webelement.FirefoxWebElement (session="c8f60e25-62c6-438d-98dc-25a7e9779656", element="61e02d61-36c6-470c-9dad-b0e053979172")>,
#5
I use a combination of Selenium and BeautifulSoup for a couple of reasons.

In this particular instance, once the JavaScript has run and the page is fully rendered, there's no need for Selenium anymore.

Beautiful Soup is the best way to traverse the DOM and scrape the data, so after all the JavaScript has run, I use Beautiful Soup to grab the desired data. This speeds up the process (the time saved can be considerable in some instances) and makes it easier to grab other data later if needed.
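
To illustrate that split (render once in the browser, then parse the HTML as a plain string), here is a rough sketch. It uses the standard library's html.parser in place of Beautiful Soup so that it runs without extra packages. HotelNameParser and extract_hotel_names are hypothetical names, and the sample string is hand-written to stand in for whatever browser.page_source would return.

```python
from html.parser import HTMLParser

NAME_CLASS = "hc-searchresultitem__hotelname"  # class of the <h3> from post #1


class HotelNameParser(HTMLParser):
    """Collect the text found inside every <h3> carrying NAME_CLASS."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        # An attribute without a value is reported as None, hence the `or ""`.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h3" and NAME_CLASS in classes:
            self._in_name = True
            self._buf = []

    def handle_data(self, data):
        if self._in_name:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "h3" and self._in_name:
            self.names.append("".join(self._buf).strip())
            self._in_name = False


def extract_hotel_names(page_source):
    """Parse an HTML string; no browser is involved at this stage."""
    parser = HotelNameParser()
    parser.feed(page_source)
    parser.close()
    return parser.names


# Hand-written stand-in for browser.page_source
sample = (
    '<div class="hc_sr_summary">'
    '<h3 class="hc-searchresultitem__hotelname">'
    '<a href="#">Hotel Midtown Richardson</a></h3>'
    '<h3 class="other">ignored</h3>'
    '</div>'
)
names = extract_hotel_names(sample)
print(names)
```

Because the parsing step only needs a string, the browser can be closed as soon as page_source has been read.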
#6
Like Larz, I switched the language to English:

hotel_names = browser.find_elements_by_xpath("//h3[@class='hc-searchresultitem__hotelname']")
for hotel_name in hotel_names:
    print(hotel_name.text)
Output:
Hotel Midtown Richardson Cosmos Hotel Taipei Energy Inn Taipei City FN Hotel Taipei M Hotel - Main Station Palais De Chine Diary of Taipei Hotel Main Station Go Sleep Hotel - Hankou Taipei Triple Tiger Inn Cho Hotel Yomi Hotel - ShuangLian Park Taipei Hotel FX Hotel Taipei Nanjing East Road Branch Space Inn Mr Lobster Secret Den design hostel
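
A note for anyone trying this with a newer Selenium: the find_elements_by_xpath helper was removed in Selenium 4.3, and the equivalent call is find_elements(By.XPATH, ...), where By.XPATH is simply the string "xpath". The sketch below wraps that call in a small helper. texts_by_xpath, FakeElement and FakeDriver are hypothetical names; the fake classes exist only so the example runs without a browser, and the second hotel name is a placeholder. It also shows why .text failed in post #4: the attribute belongs to each element, not to the list.

```python
def texts_by_xpath(driver, xpath):
    """Return the .text of every element matching `xpath`."""
    # Selenium 4 form of find_elements_by_xpath; "xpath" is the value of By.XPATH.
    elements = driver.find_elements("xpath", xpath)
    # The result is a plain list, so read .text from each item in turn.
    return [element.text for element in elements]


# Hypothetical stand-ins so the sketch runs without a real browser.
class FakeElement:
    def __init__(self, text):
        self.text = text


class FakeDriver:
    def __init__(self, texts):
        self._elements = [FakeElement(t) for t in texts]

    def find_elements(self, by, value):
        # A real driver would evaluate `value` against the page;
        # this stub just returns its canned elements.
        return list(self._elements)


driver = FakeDriver(["Hotel Midtown Richardson", "Placeholder Hotel B"])
names = texts_by_xpath(driver, "//h3[@class='hc-searchresultitem__hotelname']")
print(names)
```

With a real session, pass the webdriver instance (for example the browser object from post #1) instead of FakeDriver.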