Searching yahoo with selenium
Thank you. Then I'll first have to learn xpath, I assume. I guess it's often used with selenium.
p.s. now running snippsat's code, it only opens the yahoo page.

Quote: Then I'll first have to learn xpath I assume.
You can get cheat sheets here: http://scraping.pro/5-best-xpath-cheat-s...eferences/
Also, an important thing to know is that you can get the xpath of any element from the Firefox or Chrome browser.
Oct-12-2018, 11:02 PM
I'm actually horrible at it, but I use it often because you only have to select the element's xpath to use it. That is much simpler on the dev side: you just bring up the dev tools in your browser, select the element, and copy its xpath over. You don't technically need to know much other than the fact that it pinpoints that exact element in the page.
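To make "pinpoints that exact element" concrete, here is a minimal sketch using only the standard library. The page fragment is made up for illustration (it is not Yahoo's real markup), and `xml.etree.ElementTree` supports only a subset of XPath, so treat it as a way to try out path expressions rather than a substitute for what the browser or selenium evaluates.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment (NOT Yahoo's real markup).
PAGE = """
<html>
  <body>
    <div id="header">Header</div>
    <div id="consent">
      <form>
        <div>
          <input type="submit" name="agree" value="OK" />
        </div>
      </form>
    </div>
  </body>
</html>
""".strip()

root = ET.fromstring(PAGE)  # root is the <html> element

# Positional path, similar to what a browser's "Copy XPath" gives you.
# It stops matching as soon as a div is added or removed above the target.
by_position = root.find("./body/div[2]/form/div/input")

# Attribute-based path: finds the same element wherever it sits in the tree.
by_attribute = root.find(".//input[@name='agree']")

print(by_position is by_attribute)
print(by_attribute.get("value"))
```

Both expressions land on the same `input` element here; the attribute-based one is usually less fragile when the page layout changes.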
Oct-12-2018, 11:09 PM
(Oct-12-2018, 09:50 PM)Truman Wrote: p.s. now running snippsat's code, it only opens yahoo page.

Nope, it also searches (it's your code, I only added the click on the agree button). If I add a few more lines I can get the links from the search.

```python
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import bs4
import time

browser = webdriver.Chrome()
browser.get('http://www.yahoo.com')
assert 'Yahoo' in browser.title
agree = browser.find_element_by_xpath('/html/body/div[1]/div[2]/div[4]/div/div[2]/form[1]/div/input')
agree.click()
elem = browser.find_element_by_name('p') # find the search box
res = elem.send_keys('seleniumhq' + Keys.RETURN)
#print(repr(res))
time.sleep(5)

soup = bs4.BeautifulSoup(browser.page_source, "lxml")
link_a = soup.find_all('a', class_=" ac-algo fz-l ac-21th lh-24")
for link in link_a:
    print(link.text)
```
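The last lines of the script pick out result links by their exact class string, which only matches while Yahoo keeps that precise combination of classes. As a rough sketch of a looser match on a single class token, here is a standard-library version using `html.parser` instead of BeautifulSoup (a swapped-in technique, so it can be tried without selenium or bs4). The sample HTML, `LinkTextCollector`, and `result_titles` are made up for illustration; only the `ac-algo` token comes from the code above.

```python
from html.parser import HTMLParser


class LinkTextCollector(HTMLParser):
    """Collect the text of <a> tags whose class list contains one token."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.links = []      # finished link texts, in document order
        self._parts = None   # text pieces of the link being read, if any

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self._parts is not None:
            return
        # Split the class attribute on whitespace so extra classes
        # and stray leading spaces do not matter.
        classes = (dict(attrs).get("class") or "").split()
        if self.wanted_class in classes:
            self._parts = []

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._parts is not None:
            self.links.append("".join(self._parts).strip())
            self._parts = None


# Hypothetical stand-in for browser.page_source (not real Yahoo output).
SAMPLE_HTML = """
<div>
  <a class=" ac-algo fz-l ac-21th lh-24" href="https://www.seleniumhq.org/">Selenium - Web Browser Automation</a>
  <a class="ad fz-l" href="https://example.com/">Sponsored link</a>
  <a class="ac-algo fz-l" href="https://example.org/">Another <b>result</b></a>
</div>
"""


def result_titles(html, wanted_class="ac-algo"):
    """Return the text of every link that carries wanted_class."""
    parser = LinkTextCollector(wanted_class)
    parser.feed(html)
    return parser.links


for title in result_titles(SAMPLE_HTML):
    print(title)
```

With selenium in the loop, `browser.page_source` would take the place of `SAMPLE_HTML`.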
(Oct-12-2018, 11:00 PM)Larz60+ Wrote:
Quote: Then I'll first have to learn xpath I assume.
you can get cheat sheets here: http://scraping.pro/5-best-xpath-cheat-s...eferences/

Thank you, this is priceless. So far I've been avoiding cheat sheets, firstly because I want to engage with these topics more thoroughly by reading the documentation and available articles, and secondly because I hate those small pictures that I have to zoom in on to be able to read.

snippsat, for some reason your code doesn't work on my end. I had to make some changes (Firefox in line 6 instead of Chrome, and 'html.parser' instead of 'lxml' in line 16), but that shouldn't have any impact.
Oct-13-2018, 12:50 AM
Firefox may need some setup, as they messed with DesiredCapabilities, marionette, etc. in their driver.
Now I run with geckodriver.exe in the same folder as the script. Or:
browser = webdriver.Firefox(capabilities=caps, executable_path=r"path to geckodriver")
Setup for headless is in Web-scraping part-2.

```python
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import bs4
import time

caps = webdriver.DesiredCapabilities().FIREFOX
caps["marionette"] = True
browser = webdriver.Firefox(capabilities=caps)
browser.get('http://www.yahoo.com')
assert 'Yahoo' in browser.title
agree = browser.find_element_by_xpath('/html/body/div[1]/div[2]/div[4]/div/div[2]/form[1]/div/input')
agree.click()
elem = browser.find_element_by_name('p') # find the search box
res = elem.send_keys('seleniumhq' + Keys.RETURN)
#print(repr(res))
time.sleep(5)

soup = bs4.BeautifulSoup(browser.page_source, "html.parser")
link_a = soup.find_all('a', class_=" ac-algo fz-l ac-21th lh-24")
for link in link_a:
    print(link.text)
```
Still only opens yahoo page.
I don't want to bother you with this anymore. Will come back later to this problem and try to solve it. By the way, on what element of the page did you click to get that exact xpath? It seems that I can't find it.
Oct-13-2018, 10:44 PM
Oct-13-2018, 10:59 PM
I know the procedure, larz explained it. I just can't find that exact path ('/html/body/div[1]/div[2]/div[4]/div/div[2]/form[1]/div/input') on the web page.
The OK button (what I got the XPath from) is only there the first time you visit Yahoo; after that the cookies are saved. So you will not see the OK button in a browser where you have visited Yahoo before. As Selenium starts a new session every time and has no saved cookies, it has to push the OK button every time.
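Since the OK button is only sometimes there, one option is to click it only when it is found instead of assuming it exists. The sketch below leans on the fact that selenium's plural `find_elements` returns an empty list rather than raising when nothing matches. `click_if_present` is a hypothetical helper, and `FakeBrowser` / `FakeElement` are stand-ins so the idea can be tried without a real browser; with selenium you would pass the real `browser` object.

```python
def click_if_present(browser, xpath):
    """Click the first element matching xpath, if any.

    Returns True if something was clicked, False otherwise.
    """
    # "xpath" is the locator strategy string (the value of By.XPATH).
    matches = browser.find_elements("xpath", xpath)
    if matches:
        matches[0].click()
        return True
    return False


# --- Stand-ins for illustration only (not part of selenium) ---
class FakeElement:
    def __init__(self):
        self.clicked = False

    def click(self):
        self.clicked = True


class FakeBrowser:
    def __init__(self, elements):
        self._elements = elements

    def find_elements(self, by, value):
        # A real driver would evaluate the locator; here we just
        # return whatever elements the fake page was given.
        return list(self._elements)


AGREE_XPATH = '/html/body/div[1]/div[2]/div[4]/div/div[2]/form[1]/div/input'

ok_button = FakeElement()
first_visit = FakeBrowser([ok_button])   # consent page shown
later_visit = FakeBrowser([])            # cookies saved, no OK button

print(click_if_present(first_visit, AGREE_XPATH))
print(click_if_present(later_visit, AGREE_XPATH))
```

In the script above, `click_if_present(browser, AGREE_XPATH)` would take the place of the two `agree` lines, so the run continues to the search box whether or not the consent page appears.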