Searching yahoo with selenium

Truman · (This post was last modified: Oct-12-2018, 09:50 PM by Truman.)

Thank you. Then I assume I'll first have to learn XPath. I guess it's often used with Selenium.

P.S. I'm now running snippsat's code, and it only opens the Yahoo page.

**Larz60+** · (This post was last modified: Oct-12-2018, 11:00 PM by Larz60+.)

Quote:Then I'll first have to learn xpath I assume.

You can get cheat sheets here: http://scraping.pro/5-best-xpath-cheat-s...eferences/
Also, an important thing to know is that you can get the XPath of any element from the Firefox or Chrome browser:

Highlight the data you want to find
Right-click and select Inspect Element
Right-click on the HTML in the inspector
Choose Copy
Select XPath
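
A copied XPath is just a path from the document root down to one element. Here is a minimal sketch of that idea, using Python's standard-library xml.etree.ElementTree (which supports only a subset of XPath) on a made-up, well-formed page. The markup and the names PAGE, agree and search_box are assumptions for illustration, not Yahoo's real page:

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page used only for illustration (not Yahoo's real markup).
PAGE = """
<html>
  <body>
    <div>
      <p>Banner</p>
    </div>
    <div>
      <form>
        <div>
          <input name="agree" value="OK" />
        </div>
      </form>
      <form>
        <input name="p" />
      </form>
    </div>
  </body>
</html>
"""

root = ET.fromstring(PAGE)  # root is the <html> element

# Browser dev tools would report this element as /html/body/div[2]/form[1]/div/input.
# ElementTree paths are relative to the root element, so the leading "/html" becomes ".".
agree = root.find("./body/div[2]/form[1]/div/input")
print(agree.get("name"), agree.get("value"))

# An attribute-based path finds an element without spelling out every ancestor.
search_box = root.find(".//input[@name='p']")
print(search_box.get("name"))
```

The positional path stops matching as soon as a div is added or removed above the element, while the attribute-based one keeps working as long as the name stays the same.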

***metulburr*** · Oct-12-2018, 11:02 PM

I'm actually horrible at it, but I use it often because you only have to select the element's XPath to use it. That is much simpler on the dev side, as you just bring up the dev tools in your browser, select the element's XPath and copy it over. You don't technically need to know much other than the fact that it pinpoints that exact element in the page.

***snippsat*** · Oct-12-2018, 11:09 PM

(Oct-12-2018, 09:50 PM)Truman Wrote: p.s. now running snippsat's code, it only opens yahoo page.

Nope, it also searches (it's your code, I only added the click on the agree button). If I add some more lines, I can get the links from the search.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import bs4
import time

browser = webdriver.Chrome()
browser.get('http://www.yahoo.com')
assert 'Yahoo' in browser.title

agree = browser.find_element_by_xpath('/html/body/div[1]/div[2]/div[4]/div/div[2]/form[1]/div/input')
agree.click()
elem = browser.find_element_by_name('p') # find the search box
res = elem.send_keys('seleniumhq' + Keys.RETURN)
#print(repr(res))
time.sleep(5)
soup = bs4.BeautifulSoup(browser.page_source, "lxml")
link_a = soup.find_all('a', class_=" ac-algo fz-l ac-21th lh-24")
for link in link_a:
    print(link.text)

Output:
Selenium - Web Browser Automation
Downloads - Selenium
Selenium (software) - Wikipedia
Selenium (@SeleniumHQ) | Twitter
Selenium WebDriver - docs.seleniumhq.org
SeleniumHQ/selenium - GitHub
Selenium Projects
Selenium Sponsors
Selenium IDE
Selenium Documentation — Selenium Documentation
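
The lookup above depends on one exact class string, which is likely to break if the page changes its styling classes. Here is a dependency-free sketch of the same link extraction that matches a single class and also keeps each href. It swaps BeautifulSoup for the standard library's html.parser. The ResultLinks class, the SAMPLE markup and the example.com URLs are made up for illustration, and "ac-algo" is only assumed to be the class that marks a result link:

```python
from html.parser import HTMLParser


class ResultLinks(HTMLParser):
    """Collect (text, href) for every <a> whose class list contains wanted_class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.links = []        # finished (text, href) pairs
        self._inside = False   # True while a matching <a> is open
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self.wanted_class in classes:
            self._inside = True
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._inside:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._inside:
            self.links.append(("".join(self._text).strip(), self._href))
            self._inside = False


# Hypothetical markup; with Selenium you would feed browser.page_source instead.
SAMPLE = """
<a class=" ac-algo fz-l ac-21th lh-24" href="https://example.com/a">Selenium - Web Browser Automation</a>
<a class="other" href="https://example.com/ad">Sponsored</a>
<a class="fz-l ac-algo" href="https://example.com/b">Downloads - Selenium</a>
"""

parser = ResultLinks("ac-algo")
parser.feed(SAMPLE)
for text, href in parser.links:
    print(text, "->", href)
```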

Truman · (This post was last modified: Oct-12-2018, 11:59 PM by Truman.)

(Oct-12-2018, 11:00 PM)Larz60+ Wrote:
Quote:Then I'll first have to learn xpath I assume.
You can get cheat sheets here: http://scraping.pro/5-best-xpath-cheat-s...eferences/
Also, an important thing to know is that you can get the XPath of any element from the Firefox or Chrome browser:
Highlight the data you want to find
Right-click and select Inspect Element
Right-click on the HTML in the inspector
Choose Copy
Select XPath

Thank you, this is priceless.

So far I've been avoiding cheat sheets. Firstly, because I want to get into these topics more thoroughly by reading the documentation and available articles, and secondly, because I hate those small pictures that I have to zoom in on to be able to read the content.

snippsat, for some reason your code doesn't work on my end.
I had to make some changes (Firefox in line 6 instead of Chrome and 'html.parser' instead of 'lxml' in line 16), but that shouldn't have any impact.

***snippsat*** · Oct-13-2018, 12:50 AM

Firefox may need some setup, as they messed with DesiredCapabilities, marionette, etc. in their driver.
I now run with geckodriver.exe in the same folder as the script.
Or:

browser = webdriver.Firefox(capabilities=caps, executable_path=r"path to geckodriver")

Setup for headless mode is covered in Web-scraping part-2.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import bs4
import time

caps = webdriver.DesiredCapabilities().FIREFOX
caps["marionette"] = True
browser = webdriver.Firefox(capabilities=caps)
browser.get('http://www.yahoo.com')
assert 'Yahoo' in browser.title

agree = browser.find_element_by_xpath('/html/body/div[1]/div[2]/div[4]/div/div[2]/form[1]/div/input')
agree.click()
elem = browser.find_element_by_name('p') # find the search box
res = elem.send_keys('seleniumhq' + Keys.RETURN)
#print(repr(res))
time.sleep(5)
soup = bs4.BeautifulSoup(browser.page_source, "html.parser")
link_a = soup.find_all('a', class_=" ac-algo fz-l ac-21th lh-24")
for link in link_a:
    print(link.text)

Output:
Selenium - Web Browser Automation
Downloads - Selenium
Selenium (software) - Wikipedia
Selenium (@SeleniumHQ) | Twitter
Selenium WebDriver - docs.seleniumhq.org
SeleniumHQ/selenium - GitHub
Selenium Projects
Selenium Sponsors
Selenium IDE
Selenium Documentation — Selenium Documentation

Truman · (This post was last modified: Oct-13-2018, 09:45 PM by Truman.)

It still only opens the Yahoo page.
I don't want to bother you with this anymore. I'll come back to this problem later and try to solve it.

By the way, which element of the page did you click on to get that exact XPath? It seems that I can't find it.

***snippsat*** · Oct-13-2018, 10:44 PM

Truman Wrote:By the way, on what element of the page did you click to get that exact xpath? It seems that I can't find it.

It's the same in Chrome and Firefox: right-click --> Inspect, select the HTML tag, right-click --> Copy.

Output:
#pid_60555
//*[@id="pid_60555"]

Truman · Oct-13-2018, 10:59 PM

I know the procedure, Larz explained it. I just can't find that exact path ('/html/body/div[1]/div[2]/div[4]/div/div[2]/form[1]/div/input') on the web page.

***snippsat*** · (This post was last modified: Oct-13-2018, 11:56 PM by snippsat.)

The OK button (which I got the XPath from) is only there the first time you visit Yahoo; after that, cookies get saved.
So you will not see the OK button in a browser where you have visited Yahoo before.

As Selenium starts a new session every time and has no saved cookies,
you have to push the OK button every time.
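
Since the button may or may not be on the page, one defensive option is to click it only when it is found. A minimal sketch, assuming a driver object that offers find_elements(by, value) the way Selenium's WebDriver does; click_if_present is a hypothetical helper name, not part of Selenium:

```python
def click_if_present(browser, xpath):
    """Click the first element matching xpath; return True if one was found."""
    # find_elements (plural) returns an empty list when nothing matches,
    # so no exception handling is needed for a missing button.
    # "xpath" is the locator strategy string that Selenium's By.XPATH stands for.
    matches = browser.find_elements("xpath", xpath)
    if not matches:
        return False
    matches[0].click()
    return True


# Usage with a real driver (assumes selenium and a browser driver are installed):
# from selenium import webdriver
# browser = webdriver.Chrome()
# browser.get('http://www.yahoo.com')
# click_if_present(browser, '/html/body/div[1]/div[2]/div[4]/div/div[2]/form[1]/div/input')
```

With this, the same script can run whether or not the consent page shows up, instead of failing on the find_element call.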
