Need some help with Selenium
Source: Python Forum (https://python-forum.io), Web Scraping & Web Development (https://python-forum.io/forum-13.html), thread /thread-34624.html
Need some help with Selenium - WallieA - Aug-14-2021

I have written the code below to scrape the data from the bet monitor. When I run the code, I get the text 'Today 13 Aug 20:00' 18 times. I would like to get all the different dates instead of the same text 18 times. So the result I want is:

    Today 13 Aug 20:00
    Saturday 14 Aug 16:30
    etc.

Can somebody help me get all the dates instead of the same date many times? Thanks!

    from selenium import webdriver

    url = 'https://www.betmonitor.com/odds-comparison/football/netherlands-eredivisie/10000060'
    driver = webdriver.Chrome()
    driver.get(url)
    header = driver.find_element_by_id('content')
    event = header.find_elements_by_class_name('league-event-new')
    for details in event:
        datum = details.find_element_by_xpath('//div[@class="evtime"]').text
        print(datum)

RE: Need some help with Selenium - snippsat - Aug-14-2021

Like this, and look at the setup. It can e.g. run headless or with other options.

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from time import sleep

    #--| Setup
    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(executable_path=r'C:\cmder\bin\chromedriver.exe', options=options)
    #--| Parse or automation
    url = "https://www.betmonitor.com/odds-comparison/football/netherlands-eredivisie/10000060"
    driver.get(url)
    time_play = driver.find_elements_by_css_selector("div.evtime")
    for index, time_event in enumerate(time_play):
        print(time_play[index].text)
        print('-' * 10)
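A note on why the first script printed the same text 18 times: an XPath that starts with `//` is evaluated against the whole document even when it is called on a sub-element, so every `details.find_element_by_xpath(...)` call returned the first `div.evtime` on the page. Prefixing the expression with a dot (`.//`) keeps the search inside the current element. Below is a minimal sketch of that "little modification", written against the `find_element(By..., ...)` style that newer Selenium 4 releases expect now that the `find_element_by_*` helpers are gone. The function names `relative_xpath` and `event_times` are made up for illustration and are not part of the thread or of Selenium.

```python
def relative_xpath(xpath):
    """Anchor a document-wide XPath ('//...') to the current element ('.//...')."""
    return "." + xpath if xpath.startswith("/") else xpath


def event_times(driver):
    """Return the text of the div.evtime found inside each div.league-event-new.

    `driver` is assumed to be a Selenium WebDriver that has already loaded the page.
    """
    # Imported lazily so relative_xpath() can be used without Selenium installed.
    from selenium.webdriver.common.by import By

    events = driver.find_elements(By.CLASS_NAME, "league-event-new")
    # './/div[...]' searches below each event only, not the whole document.
    xpath = relative_xpath('//div[@class="evtime"]')
    return [event.find_element(By.XPATH, xpath).text for event in events]
```

With a driver set up as in the reply above, `for t in event_times(driver): print(t)` should print one date per event instead of repeating the first one.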
RE: Need some help with Selenium - WallieA - Aug-15-2021

That works for me! Thank you very much. I was hoping that it would be just a little modification to my script...

RE: Need some help with Selenium - snippsat - Aug-16-2021

To answer your code question from PM here, as we want the knowledge to be available on the forum for all.

WallieA Wrote: Is it easy to also add other lines which I want to scrape?

It's not hard, but you have to look at the source code and try to understand the structure in e.g. Chrome/Firefox DevTools. Try to get the info for one event first. If you look at the source, it is all in class="league-event-new". So this line will get all events, and the first element looks like this.

    event = driver.find_elements_by_css_selector("div.league-event-new")

Test.

    >>> event
    [<selenium.webdriver.remote.webelement.WebElement (session="4a5e97dcfedde6786f8a1605ae80a197", element="a147ce8a-4e93-4abc-9544-9e71c18d4389")>,
     <selenium.webdriver.remote.webelement.WebElement (session="4a5e97dcfedde6786f8a1605ae80a197", element="2c8db369-a8ff-4a90-a258-7b5ec678e5da")>,
     <selenium.webdriver.remote.webelement.WebElement (session="4a5e97dcfedde6786f8a1605ae80a197", element="48482681-b645-4570-8d64-e1ff9e12a089")>,
     <selenium.webdriver.remote.webelement.WebElement (session="4a5e97dcfedde6786f8a1605ae80a197", element="5c229435-dca3-41e7-8e2b-215e4d0e64a1")>,
     <selenium.webdriver.remote.webelement.WebElement (session="4a5e97dcfedde6786f8a1605ae80a197", element="2213fa4d-c397-4422-a5d1-7ac3932d5d2c")>,
     <selenium.webdriver.remote.webelement.WebElement (session="4a5e97dcfedde6786f8a1605ae80a197", element="8ebb073c-f525-4fe4-a93d-5c487680f302")>,
     <selenium.webdriver.remote.webelement.WebElement (session="4a5e97dcfedde6786f8a1605ae80a197", element="1dafeb95-5553-4bbf-a0e9-f5ae3fa79c9f")>,
     <selenium.webdriver.remote.webelement.WebElement (session="4a5e97dcfedde6786f8a1605ae80a197", element="91e0b62a-685c-4946-9cf6-819b6cce6150")>,
     <selenium.webdriver.remote.webelement.WebElement (session="4a5e97dcfedde6786f8a1605ae80a197", element="6a4c75c8-7ff9-4b90-b50a-2cdf2303454f")>,
     <selenium.webdriver.remote.webelement.WebElement (session="4a5e97dcfedde6786f8a1605ae80a197", element="7ac9212d-e42b-4df8-ad83-b0a5a5dd43e6")>,
     <selenium.webdriver.remote.webelement.WebElement (session="4a5e97dcfedde6786f8a1605ae80a197", element="eae95338-75f4-4d65-9a26-577c486ed46e")>,
     <selenium.webdriver.remote.webelement.WebElement (session="4a5e97dcfedde6786f8a1605ae80a197", element="ee561496-420b-4b4d-a7ff-f2ec9359d189")>,
     <selenium.webdriver.remote.webelement.WebElement (session="4a5e97dcfedde6786f8a1605ae80a197", element="ec6a9491-a801-4cce-8cff-3f216001ccb4")>,
     <selenium.webdriver.remote.webelement.WebElement (session="4a5e97dcfedde6786f8a1605ae80a197", element="bcbe06c3-8672-4be3-a654-197546b2f7a7")>,
     <selenium.webdriver.remote.webelement.WebElement (session="4a5e97dcfedde6786f8a1605ae80a197", element="bc350602-d530-4276-aa97-9cf11e3a2077")>,
     <selenium.webdriver.remote.webelement.WebElement (session="4a5e97dcfedde6786f8a1605ae80a197", element="f4e0306d-34df-4074-a70d-b09b5cf137f9")>,
     <selenium.webdriver.remote.webelement.WebElement (session="4a5e97dcfedde6786f8a1605ae80a197", element="19b5afb6-3206-4e80-beae-149d245cd105")>,
     <selenium.webdriver.remote.webelement.WebElement (session="4a5e97dcfedde6786f8a1605ae80a197", element="313861bb-50fd-4bd1-a5b7-6eeb1ba894d6")>,
     <selenium.webdriver.remote.webelement.WebElement (session="4a5e97dcfedde6786f8a1605ae80a197", element="ed793e19-a5b7-4f29-aa6c-a93075f261ae")>,
     <selenium.webdriver.remote.webelement.WebElement (session="4a5e97dcfedde6786f8a1605ae80a197", element="5762d704-7c0c-4170-9bbd-40a3d897273f")>,
     <selenium.webdriver.remote.webelement.WebElement (session="4a5e97dcfedde6786f8a1605ae80a197", element="9c26cfe8-697d-4da2-876a-baa6c0cbdb8f")>,
     <selenium.webdriver.remote.webelement.WebElement (session="4a5e97dcfedde6786f8a1605ae80a197", element="190d0053-bec2-47f7-9213-5b56bf2cf332")>,
     <selenium.webdriver.remote.webelement.WebElement (session="4a5e97dcfedde6786f8a1605ae80a197", element="571ac12c-d4e1-4b78-82da-45712723f312")>]
    >>> print(event[0].text)
    Friday
    20 Aug
    20:00
    Football · Netherlands · Netherlands Eredivisie
    NEC - ZWOLLE
    58 Bookmakers, 3905 odds
    1 2.52
    X 3.20
    2 2.81
    O 1.76
    U 2.01
    O/U 2.5

So it's all there: date, names, teams, odds, etc. You can also copy a CSS selector/XPath from DevTools (right-click the tag and copy) to get the exact value for a tag on the site.

    odds = driver.find_elements_by_css_selector("#content > div:nth-child(5) > div:nth-child(4)")

Test.

    >>> odds
    [<selenium.webdriver.remote.webelement.WebElement (session="4a5e97dcfedde6786f8a1605ae80a197", element="f6008c3c-6c39-4a82-8522-8d8ad49514e3")>]
    >>> odds[0].text
    '1 2.52\nX 3.20\n2 2.81'
    >>> print(odds[0].text)
    1 2.52
    X 3.20
    2 2.81

RE: Need some help with Selenium - WallieA - Aug-16-2021

Thanks for your replies. I am a little bit further now, but not yet at the point I want to be. I will try to get closer to my goal with your help.

RE: Need some help with Selenium - WallieA - Aug-18-2021

Excuse me, but I have another question. I have been trying for at least 10 hours to scrape the data from betmonitor, but every time I think I've got a good Python script, it doesn't work. With the code below, I try to get an Excel file with 5 columns: date, league, match, 1x2odd (3 separate columns) and ouodd (2 separate columns). Can somebody please tell me what is wrong in the code below and why I don't get any data? Thanks a lot!
    from selenium import webdriver
    import time
    import pandas as pd

    url = 'https://www.betmonitor.com/odds-comparison/football/germany-bundesliga/10000090'
    driver = webdriver.Chrome(executable_path='C:/webdrivers/chromedriver.exe')
    evt_details = []
    driver.get(url)
    time.sleep(5)
    evt_list = driver.find_elements_by_css_selector('div.league-event-new')
    for evt_match in evt_list:
        # Getting match info
        evt_date = evt_match.find_elements_by_xpath('//div[@class="evtime"]')[0]
        evt_league = evt_match.find_elements_by_xpath('//div[@class="league"]')[0]
        evt_teams = evt_match.find_elements_by_xpath('//div[@class="teams"]')[0]
        for x in range(1, 20)
            evt_1x2odds = evt_match.find_elements_by_xpath('//*[@id="content"]/div[{x}]/div[4]"]')[0]
            evt_OUodds = evt_match.find_elements_by_xpath('//*[@id="content"]/div[{x}]/div[5]"]')[0]
        # Saving match info
        match_info = [evt_date.text, evt_league.text, evt_teams.text, evt_1x2odds.text, evt_OUodds.text]
        # Saving into evt details
        evt_details.append(match_info)
    driver.quit()
    evt_details_df = pd.DataFrame(evt_details)
    evt_details_df.columns = ['date', 'league', 'teams', 'odds 1x2', 'odds OU2.5']
    evt_details_df.to_csv('evt_details.csv', index=False)

RE: Need some help with Selenium - snippsat - Aug-18-2021

Starting from this line, I would run some tests my own way rather than using your code. Do not loop before you have figured out the basics.
    event = driver.find_elements_by_css_selector("div.league-event-new")

    >>> data = event[0].text
    >>> data = data.split('\n')
    >>> data
    ['Friday',
     '20 Aug',
     '20:00',
     'Football · Netherlands · Netherlands Eredivisie',
     'NEC - ZWOLLE',
     '61 Bookmakers, 4329 odds',
     '1 2.50',
     'X 3.22',
     '2 2.78',
     'O 1.76',
     'U 2.00',
     'O/U 2.5']
    >>>
    >>> data = list(zip(*[data[i::3] for i in range(3)]))
    >>> data
    [('Friday', '20 Aug', '20:00'),
     ('Football · Netherlands · Netherlands Eredivisie', 'NEC - ZWOLLE', '61 Bookmakers, 4329 odds'),
     ('1 2.50', 'X 3.22', '2 2.78'),
     ('O 1.76', 'U 2.00', 'O/U 2.5')]
    >>>
    >>> df = pd.DataFrame(data)
    >>> df = df.transpose()
    >>> df
            0                                                1       2        3
    0  Friday  Football · Netherlands · Netherlands Eredivisie  1 2.50   O 1.76
    1  20 Aug                                     NEC - ZWOLLE  X 3.22   U 2.00
    2   20:00                         61 Bookmakers, 4329 odds  2 2.78  O/U 2.5
    >>>
    >>> df.rename(columns={0: "Date", 1: "Event", 2: "Odds_1", 3: "Odds_2"}, inplace=True)
    >>> df
         Date                                            Event  Odds_1   Odds_2
    0  Friday  Football · Netherlands · Netherlands Eredivisie  1 2.50   O 1.76
    1  20 Aug                                     NEC - ZWOLLE  X 3.22   U 2.00
    2   20:00                         61 Bookmakers, 4329 odds  2 2.78  O/U 2.5

So now we have the structure of the first event, with column names added, laid out the same way as shown on the website.
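Once the structure of one event is understood, a possible next step towards the one-row-per-match file asked for earlier is to turn each event's text into a flat record. The sketch below is only one way to do it and assumes every event has the same 12-line layout as the `event[0].text` sample in this thread (day, date, time, league, teams, bookmaker count, then the 1/X/2 and O/U odds). The function name `parse_event`, the `SAMPLE` constant and the column names are invented for illustration.

```python
# Sample text copied from the event[0].text output shown in this thread.
SAMPLE = (
    "Friday\n20 Aug\n20:00\n"
    "Football · Netherlands · Netherlands Eredivisie\n"
    "NEC - ZWOLLE\n"
    "61 Bookmakers, 4329 odds\n"
    "1 2.50\nX 3.22\n2 2.78\nO 1.76\nU 2.00\nO/U 2.5"
)


def parse_event(text):
    """Convert the text of one div.league-event-new into a flat dict.

    Assumption: the text has the same line layout as SAMPLE; anything
    shorter is rejected rather than guessed at.
    """
    lines = [line.strip() for line in text.split("\n") if line.strip()]
    if len(lines) < 11:
        raise ValueError(f"unexpected event layout: {lines!r}")
    day, date, kickoff, league, teams = lines[:5]
    # Lines 6..10 look like '1 2.50', 'X 3.22', '2 2.78', 'O 1.76', 'U 2.00'.
    odds = dict(line.rsplit(" ", 1) for line in lines[6:11])
    return {
        "date": f"{day} {date} {kickoff}",
        "league": league,
        "teams": teams,
        "odds_1": float(odds["1"]),
        "odds_X": float(odds["X"]),
        "odds_2": float(odds["2"]),
        "odds_over": float(odds["O"]),
        "odds_under": float(odds["U"]),
    }


row = parse_event(SAMPLE)
```

With the `event` list from above, something like `pd.DataFrame([parse_event(e.text) for e in event]).to_csv('evt_details.csv', index=False)` should then write one row per match, with the 1X2 and over/under odds in separate columns.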