Searching yahoo with selenium
#11
Thank you. Then I'll first have to learn XPath, I assume. I guess it's often used with Selenium.

P.S. Now running snippsat's code, it only opens the Yahoo page.
#12
Quote:Then I'll first have to learn XPath, I assume.
You can get cheat sheets here: http://scraping.pro/5-best-xpath-cheat-s...eferences/
Also, an important thing to know is that you can get the XPath of any element from the Firefox or Chrome browser:
  • Highlight the data you want to find
  • Right click and select Inspect Element
  • Right click on the html in the inspector
  • Choose Copy
  • Select XPath
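Once copied, you can paste the XPath straight into Selenium. A minimal sketch (the id in the XPath string is just a placeholder for whatever your browser copies out):
from selenium import webdriver

browser = webdriver.Chrome()
browser.get('http://www.yahoo.com')
# paste the XPath the browser copied for you (placeholder id below):
elem = browser.find_element_by_xpath('//*[@id="some-element-id"]')
print(elem.tag_name)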
#13
I'm actually horrible at it, but I use it often because you only have to select the element's XPath to use it. That's much simpler on the dev side: you just bring up the dev tools in your browser, select the element's XPath, and copy it over. You don't technically need to know much other than the fact that it pinpoints that exact element in the page.
#14
(Oct-12-2018, 09:50 PM)Truman Wrote: P.S. Now running snippsat's code, it only opens the Yahoo page.
No, it also searches (it's your code; I only added the Agree button push). If I add some more lines, I can get the links from the search.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import bs4
import time

browser = webdriver.Chrome()
browser.get('http://www.yahoo.com')
assert 'Yahoo' in browser.title

agree = browser.find_element_by_xpath('/html/body/div[1]/div[2]/div[4]/div/div[2]/form[1]/div/input')
agree.click()  # push the consent button shown on first visit
elem = browser.find_element_by_name('p')  # find the search box
res = elem.send_keys('seleniumhq' + Keys.RETURN)  # type the query and press Enter
#print(repr(res))
time.sleep(5)  # give the result page time to load
soup = bs4.BeautifulSoup(browser.page_source, "lxml")  # parse the rendered page
link_a = soup.find_all('a', class_=" ac-algo fz-l ac-21th lh-24")  # result links
for link in link_a:
    print(link.text)
Output:
Selenium - Web Browser Automation
Downloads - Selenium
Selenium (software) - Wikipedia
Selenium (@SeleniumHQ) | Twitter
Selenium WebDriver - docs.seleniumhq.org
SeleniumHQ/selenium - GitHub
Selenium Projects
Selenium Sponsors
Selenium IDE
Selenium Documentation — Selenium Documentation
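The URLs are on the same <a> tags as the text. A small sketch extending the loop above (continuing from the soup object in the code, using BeautifulSoup's Tag.get):
for link in soup.find_all('a', class_=" ac-algo fz-l ac-21th lh-24"):
    print(link.text, '->', link.get('href'))  # get() returns None if href is missing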
#15
(Oct-12-2018, 11:00 PM)Larz60+ Wrote:
Quote:Then I'll first have to learn XPath, I assume.
You can get cheat sheets here: http://scraping.pro/5-best-xpath-cheat-s...eferences/
Also, an important thing to know is that you can get the XPath of any element from the Firefox or Chrome browser:
  • Highlight the data you want to find
  • Right click and select Inspect Element
  • Right click on the html in the inspector
  • Choose Copy
  • Select XPath

Thank you, this is priceless.

And so far I've been avoiding cheat sheets: firstly because I want to get into these topics more thoroughly by reading the documentation and available articles, and secondly because I hate those small pictures that I have to zoom in on to be able to read the content.

snippsat, for some reason your code doesn't work on my end.
I had to make some changes (Firefox in line 6 instead of Chrome, and 'html.parser' instead of 'lxml' in line 16), but that shouldn't have any impact.
#16
Firefox may need some setup, as they changed DesiredCapabilities, marionette, etc. in their driver.
Now I run with geckodriver.exe in the same folder as the script.
Or:
browser = webdriver.Firefox(capabilities=caps, executable_path=r"path to geckodriver")
Setup for headless mode is in Web-scraping part-2.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import bs4
import time

caps = webdriver.DesiredCapabilities().FIREFOX
caps["marionette"] = True  # use the geckodriver/marionette protocol
browser = webdriver.Firefox(capabilities=caps)
browser.get('http://www.yahoo.com')
assert 'Yahoo' in browser.title

agree = browser.find_element_by_xpath('/html/body/div[1]/div[2]/div[4]/div/div[2]/form[1]/div/input')
agree.click()  # push the consent button shown on first visit
elem = browser.find_element_by_name('p')  # find the search box
res = elem.send_keys('seleniumhq' + Keys.RETURN)  # type the query and press Enter
#print(repr(res))
time.sleep(5)  # give the result page time to load
soup = bs4.BeautifulSoup(browser.page_source, "html.parser")  # parse the rendered page
link_a = soup.find_all('a', class_=" ac-algo fz-l ac-21th lh-24")  # result links
for link in link_a:
    print(link.text)
Output:
Selenium - Web Browser Automation
Downloads - Selenium
Selenium (software) - Wikipedia
Selenium (@SeleniumHQ) | Twitter
Selenium WebDriver - docs.seleniumhq.org
SeleniumHQ/selenium - GitHub
Selenium Projects
Selenium Sponsors
Selenium IDE
Selenium Documentation — Selenium Documentation
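Side note: instead of a fixed time.sleep(5), you can wait until the result links actually show up. A sketch using Selenium's explicit waits (assuming the 'ac-algo' class from the soup call above marks the result links):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait up to 10 seconds for the first result link to appear
WebDriverWait(browser, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, 'a.ac-algo'))
)
soup = bs4.BeautifulSoup(browser.page_source, "html.parser")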
#17
It still only opens the Yahoo page.
I don't want to bother you with this anymore. I'll come back to this problem later and try to solve it.

By the way, on what element of the page did you click to get that exact XPath? It seems that I can't find it.
#18
Truman Wrote: By the way, on what element of the page did you click to get that exact XPath? It seems that I can't find it.
It's the same in Chrome and Firefox: right click --> Inspect, select the html tag, right click --> Copy.
Output:
#pid_60555
//*[@id="pid_60555"]
[Image: OFiO0I.jpg]
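Both copied selectors point at the same element, so either one works in Selenium; e.g. (a sketch, assuming a browser already on a page containing that id):
elem_css = browser.find_element_by_css_selector('#pid_60555')
elem_xpath = browser.find_element_by_xpath('//*[@id="pid_60555"]')
assert elem_css.id == elem_xpath.id  # same underlying element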
#19
I know the procedure, Larz60+ explained it; I just can't find that exact path ('/html/body/div[1]/div[2]/div[4]/div/div[2]/form[1]/div/input') on the web page.
#20
The OK button (what I got the XPath from) will only be there the first time you visit Yahoo; after that, cookies get saved.
So you will not see the OK button in a browser where you have visited Yahoo before.

As Selenium starts a new session every time and has no saved cookies,
it has to push the OK button every time.
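If you want the same script to work in both cases, you can make the click optional; a sketch using Selenium's NoSuchElementException:
from selenium.common.exceptions import NoSuchElementException

try:
    agree = browser.find_element_by_xpath(
        '/html/body/div[1]/div[2]/div[4]/div/div[2]/form[1]/div/input')
    agree.click()  # consent page shown (first visit / fresh session)
except NoSuchElementException:
    pass  # no consent page this time; go straight to the search box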

