Scrape webpage that has text - JavaScript?
#1
Hi, all Python web scraping experts!

I would like to extract data from this page:
https://www.brisbane.qld.gov.au/clean-an...collection

In the past, I used to be able to right-click, view the source, copy the HTML and just do some text splitting (Text to Columns) in Excel.

However, it looks like they have updated the website: the suburbs (area locations) are still in the source code, but the dates aren't.

I'm just wondering if someone can point me in the right direction. Are there any tutorials, or existing code someone could share with me?

I could do this manually, but there are a lot of entries and I will need to do it once every six months. I'm hoping a simple Python script can do the trick.
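For a job that repeats every six months, the Excel "Text to Columns" step can be skipped by having Python write a CSV file that Excel opens directly. This is a minimal sketch using only the standard library `csv` module; the column names and the `(suburb, date)` row shape are assumptions for illustration, and the rows would come from whatever scraping step ends up being used.

```python
import csv
import io
from datetime import date
from typing import Iterable, Tuple


def rows_to_csv(rows: Iterable[Tuple[str, date]]) -> str:
    """Return CSV text: a header line, then one 'suburb,date' line per row.

    Assumption: each row is a (suburb name, collection week start date) pair.
    """
    buffer = io.StringIO()
    # "\n" line endings keep the output predictable; Excel reads them fine.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["suburb", "collection_week_commencing"])
    for suburb, week_start in rows:
        # ISO dates (YYYY-MM-DD) are recognised by Excel without any splitting.
        writer.writerow([suburb, week_start.isoformat()])
    return buffer.getvalue()
```

For example, `rows_to_csv([("Suburb A", date(2019, 8, 12))])` (with a made-up suburb name) returns the header line followed by `Suburb A,2019-08-12`; the text can then be saved with something like `pathlib.Path("collections.csv").write_text(text, encoding="utf-8")`.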
#2
(Aug-07-2019, 04:01 AM) lonelygirl Wrote: However, it looks like they updated the website and now the suburbs (area location) is in the source code however the date isn't.
Websites often change their code, and scrapers have to change theirs in response. Most of the time it's just a class or XPath change, but sometimes you might have to rewrite a whole section if they added JavaScript. So I would first check whether they changed the tag you are getting the date from before venturing into Selenium. That happens to me a lot.

If the content is generated by JavaScript, then you would have to use Selenium to get the data. It's a little extra code if you are used to just requests and BS4, but with Selenium you could drop those and do everything in Selenium alone. Or just use Selenium to get the fully rendered HTML and then pass it over to BS4 (or whatever parser you are using).
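One quick way to run that check before reaching for Selenium is to download the page the way "view source" sees it and look for dates in the raw HTML. This sketch uses the standard library's `urllib.request` instead of `requests` (`requests.get(url).text` would do the same job); the helper names `fetch_html` and `dates_in_html` and the "D Month YYYY" pattern are assumptions, based on the date format that appears on the page.

```python
import re
import urllib.request

# Assumed date format, e.g. "12 August 2019".
DATE_PATTERN = re.compile(
    r"\b\d{1,2} (?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{4}\b"
)


def fetch_html(url: str) -> str:
    """Download the raw HTML without running any JavaScript (what "view source" shows)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def dates_in_html(html: str) -> list:
    """Return every 'D Month YYYY' date found anywhere in the HTML text."""
    return DATE_PATTERN.findall(html)
```

If `dates_in_html(fetch_html(url))` returns an empty list while the browser clearly shows dates, the dates are most likely being inserted by JavaScript after the page loads.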
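A minimal sketch of the "render with Selenium, then hand the HTML to a parser" approach, assuming Selenium 4 with headless Chrome (Selenium 4.6+ can locate a matching browser driver by itself). The field name `collection_starts_week_commencing` is taken from the page's HTML; `rendered_html` is a hypothetical helper name, and if the page only fills that field after a suburb is chosen, that selection would have to happen before the wait.

```python
def rendered_html(url, field_name="collection_starts_week_commencing", timeout=15):
    """Load url in headless Chrome, wait for the computed field, return the rendered HTML."""
    # Imported inside the function so the rest of a script still works without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run Chrome without opening a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the JavaScript-generated input exists (presence, not visibility,
        # because the input is type="hidden").
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.NAME, field_name))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if the wait times out
```

The returned string can then be passed to `BeautifulSoup(html, "html.parser")` or any other parser, exactly as if it had come from requests.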

We have a basic tutorial here.

EDIT:
That section is definitely using JavaScript. Even the class names are prefixed with js (JavaScript):
Quote:
<div class="js-webform-computed-wrapper" id="webform-computed-collection_starts_week_commencing-wrapper" data-webform-announce="Collection starts week commencing: is
12 August 2019
">
<div class="js-form-item form__item js-form-type-item form__item--item js-form-item-collection-starts-week-commencing form__item--collection-starts-week-commencing">
<label for="edit-collection-starts-week-commencing--UdYk6bJgUiI">
Collection starts week commencing:
</label>
<br>
12 August 2019
<input data-drupal-selector="edit-collection-starts-week-commencing" type="hidden" name="collection_starts_week_commencing" value="<br>
12 August 2019
">
<input class="js-hide js-webform-novalidate js-webform-computed-submit button js-form-submit form-submit" data-drupal-selector="edit-collection-starts-week-commencing-update" type="submit" id="edit-collection-starts-week-commencing-update--MouuJHidxPs" name="webform-computed-collection_starts_week_commencing-button" value="Update">

</div>
</div>
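Once the rendered HTML is available, the hidden input in that markup is the easiest place to read the date from, because its `value` holds only the date (plus a stray `<br>`). This sketch uses the standard library's `html.parser`; with BS4, `soup.find("input", {"name": ...})` would do the same lookup. It assumes the field name and the "D Month YYYY" format stay as they appear in the quoted HTML; `HiddenInputParser` and `collection_week` are hypothetical names.

```python
from datetime import datetime
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect the value of every <input> element, keyed by its name attribute."""

    def __init__(self):
        super().__init__()
        self.inputs = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attributes = dict(attrs)
            name = attributes.get("name")
            if name is not None:
                self.inputs[name] = attributes.get("value") or ""


def collection_week(html, field_name="collection_starts_week_commencing"):
    """Return the 'week commencing' date from the hidden input, or None if it is absent."""
    parser = HiddenInputParser()
    parser.feed(html)
    parser.close()
    raw = parser.inputs.get(field_name)
    if raw is None:
        return None
    # The value looks like '<br>\n12 August 2019\n': drop the <br> and surrounding whitespace.
    text = raw.replace("<br>", "").strip()
    return datetime.strptime(text, "%d %B %Y").date()
```

Fed the markup quoted above, `collection_week` returns `datetime.date(2019, 8, 12)`.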

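To select an option from a drop-down menu with Selenium 4 (the older find_element_by_* methods were removed in Selenium 4.3), you can do:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.select import Select
# load the page in Selenium first, e.g. driver.get(url)
select_fr = Select(driver.find_element(By.ID, "fruits01"))
select_fr.select_by_index(0)  # picks the first <option>
or by CSS selector:

driver.find_element(By.CSS_SELECTOR, "#fruits01 [value='1']").click()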
#3
Thanks for your response.

Before, I just right-clicked, viewed the source, copied and pasted that block of text into Excel, and used Excel's replace and Text to Columns features (I didn't use Python at all).

I use requests and BeautifulSoup often but hadn't heard of Selenium. Thanks for linking to the tutorial; I'll give it a go. Cheers.