Scrape webpage that has text - javascript?
#1
Hi all Python web scraping experts,

I would like to extract data from this page:
https://www.brisbane.qld.gov.au/clean-an...collection

In the past, I used to be able to right-click and view source, copy the HTML, and just do some text splitting (text to columns) in Excel.

However, it looks like they updated the website: the suburbs (area locations) are still in the source code, but the dates aren't.

Just wondering if someone can point me in the right direction? Any tutorials, or existing code someone can share with me?

I could do this manually, but there are a lot of entries and I will need to do it once every 6 months. Hoping a simple Python script can do the trick?
#2
(Aug-07-2019, 04:01 AM)lonelygirl Wrote: However, it looks like they updated the website and now the suburbs (area location) is in the source code however the date isn't.
Websites often change their markup, and scrapers have to change their code in response. Most of the time it's just a class or XPath change, but sometimes you might have to rewrite a whole section if they added JavaScript. So I would first check whether they simply changed the tag that you were getting the date from before venturing into Selenium. That happens to me a lot.

If the data is filled in by JavaScript, then you would have to use Selenium to get it. It's a little bit of extra code if you are currently just using requests and BS4, but with Selenium you could actually get rid of those and do it solely with Selenium. Or just use Selenium to get the HTML as it looks after the JavaScript has run and then pass it over to BS4 (or whatever parser you are using).
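As a rough sketch of that second approach (let the browser run the JavaScript, then hand the HTML to a parser), something like the following could work. The function names, the `marker` argument, and the timeout values are my own assumptions, not anything from the site. It polls `driver.page_source` by hand instead of using Selenium's own wait helpers, only so that the helper has no imports beyond the standard library.

```python
import time


def wait_for_rendered_html(driver, url, marker, timeout=15.0, poll=0.5):
    """Open url and return the page source once `marker` appears in it.

    `driver` is any Selenium WebDriver. `marker` is a piece of text that
    should only be present after the page's JavaScript has run
    (hypothetical: pick it by inspecting the page in your browser).
    """
    driver.get(url)
    deadline = time.monotonic() + timeout
    while True:
        html = driver.page_source  # the page as the browser currently has it
        if marker in html:
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{marker!r} did not appear within {timeout}s")
        time.sleep(poll)


def scrape(url, marker):
    """Start Chrome, grab the rendered HTML, and always close the browser."""
    # Imported here so the helper above can be reused without Selenium
    # installed. Assumes `pip install selenium` and a local Chrome.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        return wait_for_rendered_html(driver, url, marker)
    finally:
        driver.quit()
```

The string returned by `scrape(...)` can then go straight into `BeautifulSoup(html, "html.parser")` or any other parser, exactly as you would with the text from requests.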

We have a basic tutorial here.

EDIT:
That section is definitely using JavaScript. The classes are even prefixed with js- (JavaScript).
Quote:<div class="js-webform-computed-wrapper" id="webform-computed-collection_starts_week_commencing-wrapper" data-webform-announce="Collection starts week commencing: is
12 August 2019
">
<div class="js-form-item form__item js-form-type-item form__item--item js-form-item-collection-starts-week-commencing form__item--collection-starts-week-commencing">
<label for="edit-collection-starts-week-commencing--UdYk6bJgUiI">
Collection starts week commencing:
</label>
<br>
12 August 2019
<input data-drupal-selector="edit-collection-starts-week-commencing" type="hidden" name="collection_starts_week_commencing" value="<br>
12 August 2019
">
<input class="js-hide js-webform-novalidate js-webform-computed-submit button js-form-submit form-submit" data-drupal-selector="edit-collection-starts-week-commencing-update" type="submit" id="edit-collection-starts-week-commencing-update--MouuJHidxPs" name="webform-computed-collection_starts_week_commencing-button" value="Update">

</div>
</div>
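Once you have the rendered HTML as a string (for example from Selenium's `driver.page_source`), the date sits in the hidden input named `collection_starts_week_commencing` in the markup quoted above. Here is a minimal sketch of pulling it out. It uses the standard library's `html.parser` rather than BS4 so it runs without extra packages; the class and function names are made up for illustration, and it assumes the date keeps the "12 August 2019" format.

```python
import re
from datetime import date, datetime
from html.parser import HTMLParser
from typing import Optional


class HiddenInputFinder(HTMLParser):
    """Remember the value attribute of the first <input> with a given name."""

    def __init__(self, field_name: str) -> None:
        super().__init__()
        self.field_name = field_name
        self.value: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if (
            self.value is None
            and tag == "input"
            and attributes.get("name") == self.field_name
        ):
            self.value = attributes.get("value")


def extract_collection_date(html: str) -> Optional[date]:
    """Return the collection date found in the rendered HTML, or None."""
    finder = HiddenInputFinder("collection_starts_week_commencing")
    finder.feed(html)
    if finder.value is None:
        return None
    # The value contains a stray <br> and newlines, so drop tags and whitespace.
    text = re.sub(r"<[^>]+>", "", finder.value).strip()
    # Assumes an English day-month-year format such as "12 August 2019".
    return datetime.strptime(text, "%d %B %Y").date()


sample = (
    '<input data-drupal-selector="edit-collection-starts-week-commencing" '
    'type="hidden" name="collection_starts_week_commencing" '
    'value="<br>\n12 August 2019\n">'
)
print(extract_collection_date(sample))  # prints 2019-08-12
```

If the site changes the date format, `strptime` will raise a `ValueError`, which is probably what you want for a script that only runs every six months: it fails loudly instead of writing bad dates.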

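To select an option from a drop-down menu, you can do:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.select import Select

# Load the page in Selenium first, so that `driver` is on the form.
# "fruits01" is the id of the <select> element in this example.
select_fr = Select(driver.find_element(By.ID, "fruits01"))
select_fr.select_by_index(0)  # pick the first option

Or by CSS selector:

# Selenium 4 removed find_element_by_*; use find_element(By.<kind>, value).
driver.find_element(By.CSS_SELECTOR, "#fruits01 [value='1']").click()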
#3
Thanks for your response.

Before, I just right-clicked, used view source, copied and pasted that block of text into Excel, and used Excel's replace-text and text-to-columns features (I didn't use Python at all).

I use requests and BeautifulSoup often but hadn't heard of Selenium. Thanks for linking to the tutorial. I'll give it a go. Cheers.