Python Forum

Full Version: Click a button to get next page
I have worked up to the following code and get the page with the button. I need to click it to go to the next page. I would prefer to use Requests or BeautifulSoup. I also tried to install mechanize, but it failed with the error "mechanize only works on python 2.x". I use Python 3.6.2.

from bs4 import BeautifulSoup

soup = BeautifulSoup(page.content, "html.parser")
btn = soup.find("button")  # searches the whole tree, no need to index soup.children
print(btn)
<button class="actionbtn" id="getlinks" type="submit"><table class="gl1"><tr><td class="gl2"><img alt="Get download links" src="img/links.png"/></td><td class="gl3">Get download links!<br/></td></tr></table></button>
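If the button submits an ordinary HTML form (no JavaScript involved), one option is to read the form's action and fields with BeautifulSoup and replay the submission with Requests. A minimal sketch, assuming the button sits inside a <form>; the HTML below is a made-up stand-in for the real page, since the actual form is not shown in this thread:

```python
from bs4 import BeautifulSoup

# Stand-in for page.content; the real page's form is unknown
html = """
<form action="/getlinks" method="post">
  <input type="hidden" name="file_id" value="abc123"/>
  <button class="actionbtn" id="getlinks" type="submit">Get download links!</button>
</form>
"""

soup = BeautifulSoup(html, "html.parser")
form = soup.find("button", id="getlinks").find_parent("form")

# Collect every named input so the POST matches what a browser would send
data = {inp["name"]: inp.get("value", "") for inp in form.find_all("input")
        if inp.get("name")}

print(form["action"], form.get("method"))
print(data)
# Then: requests.post(urljoin(page.url, form["action"]), data=data)
```

This only works when the button does a plain form submit; if JavaScript builds the request, you need a real browser (see below).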
(Apr-14-2018, 03:32 PM)ian Wrote: [ -> ]I need to click it to go next page. I prefer to use Requests or BeautifulSoup.
That is not a job these libraries are well suited for.

If you need to interact with a web page (click buttons, scroll, etc.), you need a tool that drives a real browser, like Selenium.
I write more about this in Web-scraping part-2.
This button's type is 'submit'. I'm wondering if I can use requests.Session().post().
(Apr-14-2018, 05:05 PM)ian Wrote: [ -> ]This button's type is 'submit'. I'm wondering if I can use requests.Session().post().
Submitting a form is possible with Requests.
Example with this form Pen
With Requests:
import requests
 
headers = {'Content-type': 'application/x-www-form-urlencoded'}
data = {"email": "[email protected]"}
response = requests.post('http://127.0.0.1:5000/email', headers=headers, data=data)
I have run this example before; it sends data to a Flask server, which catches it with email = request.form['email'].
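For completeness, the receiving side could look like this minimal Flask sketch. This app is my assumption (only the route /email and the request.form['email'] line come from the thread):

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/email", methods=["POST"])
def email():
    # Form-encoded fields sent by requests.post(..., data=...) land in request.form
    email = request.form["email"]
    return f"got {email}"

# To serve it for the requests.post example above:
# app.run(port=5000)
```

Note that Requests sets the application/x-www-form-urlencoded content type automatically when you pass a dict as data=, so the explicit headers line above is optional.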
I will try Selenium. It asks for a webdriver. I tried to install one, but it hangs. I tried downloading the webdriver for Edge, IE, Firefox and Chrome, all with the same result. I use Windows 10.
(Apr-14-2018, 08:57 PM)ian Wrote: [ -> ]I tried to install but hanging.
I don't know what you mean by hanging.

As a first test, do pip install -U selenium.
C:\1_py\scrape
λ pip install -U selenium
Collecting selenium
  Downloading selenium-3.11.0-py2.py3-none-any.whl (943kB)
    100% |████████████████████████████████| 952kB 484kB/s
Installing collected packages: selenium
  Found existing installation: selenium 3.9.0
    Uninstalling selenium-3.9.0:
      Successfully uninstalled selenium-3.9.0
Successfully installed selenium-3.11.0
Download chromedriver and unzip it to a folder, e.g. C:\scrape.
In the same folder you have the script below.
C:\scrape
  |-- get_doc.py
  |-- chromedriver.exe
# get_doc.py
from selenium import webdriver
import time

browser = webdriver.Chrome()  # finds chromedriver.exe in the same folder or on PATH
browser.get("http://www.python.org")
time.sleep(5)
# first match for the "Docs" link in the top navigation bar
doc = browser.find_elements_by_xpath('//*[@id="top"]/nav/ul/li[3]/a')[0]
doc.click()
time.sleep(5)
browser.quit()
Now to run get_doc.py, do this from the command line.
C:\1_py\scrape
λ python get_doc.py
It should start the browser, and after 5 seconds it clicks on Docs, so you land in the Python 3.6.5 documentation.
Works great! Thank you very much.
I thought I needed to manually install webdrivers, because a command-line window popped up and hung there.

Now I can find buttons/links and click to go to the next page.
Is there a way to replace time.sleep in your sample with something that lets me search for elements as soon as the next page has ALL loaded? I tried 'Wait.until' but cannot figure out how to wait until ALL elements are loaded. Similar to Microsoft PowerShell: While ($ie.busy) { Start-Sleep -Seconds 1 }. Thanks.
(Apr-15-2018, 06:42 PM)ian Wrote: [ -> ]Is there a way to replace time.sleep in your sample with something that lets me search for elements as soon as the next page has ALL loaded? I tried 'Wait.until' but cannot figure out how to wait until ALL elements are loaded. Similar to Microsoft PowerShell: While ($ie.busy) { Start-Sleep -Seconds 1 }. Thanks.
There are two kinds of waits (doc): explicit waits and implicit waits.
time.sleep(...) is a forced wait, no matter how fast the site loads elements;
it can be used as a first test, but then it is better to look at those waits.
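An explicit wait polls a condition until it holds (or a timeout expires) instead of sleeping a fixed time. There is no general "everything is loaded" signal, so in practice you wait for the specific element you want to interact with. A sketch of the get_doc.py example above rewritten with WebDriverWait:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Chrome()
browser.get("http://www.python.org")

# Wait up to 10 seconds for the Docs link to be clickable, then click it;
# returns as soon as the condition is met instead of always sleeping 5 seconds
wait = WebDriverWait(browser, 10)
doc = wait.until(EC.element_to_be_clickable(
    (By.XPATH, '//*[@id="top"]/nav/ul/li[3]/a')))
doc.click()
browser.quit()
```

If the element never appears, wait.until raises TimeoutException after the 10 seconds, which is usually a clearer failure than a stale time.sleep.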