Jan-11-2019, 09:38 AM
Hi all.
I'm experiencing a problem while scraping information from this URL (https://www.myfxbook.com/forex-economic-calendar).
The problem is that the times in the HTML source retrieved by mechanize don't match the ones shown on the site: every time is one hour behind. I suspect it depends on some local configuration on my system (I live in Italy, and the site may use a different time zone).
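If the time-zone guess is right, one possible explanation is that the page serves its times in UTC. This is an assumption, not something confirmed about myfxbook: in January Italy is on CET (UTC+1), which would match the one-hour gap. In that case, adding a fixed hour is fragile, because under Italian summer time (CEST, UTC+2) the gap becomes two hours. A more robust approach is to treat each time as UTC and convert it to the `Europe/Rome` zone. Below is a minimal sketch using the standard-library `zoneinfo` module (Python 3.9+). The function name `utc_to_rome` is hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library, Python 3.9+

ROME = ZoneInfo("Europe/Rome")

def utc_to_rome(naive_utc: datetime) -> datetime:
    """Assume a naive datetime is in UTC (hypothesis) and convert it to Italian local time."""
    return naive_utc.replace(tzinfo=timezone.utc).astimezone(ROME)

# Winter: Italy is on CET (UTC+1), so 08:30 UTC becomes 09:30 local time.
print(utc_to_rome(datetime(2019, 1, 11, 8, 30)).strftime("%Y-%m-%d %H:%M %Z"))  # 2019-01-11 09:30 CET
# Summer: Italy is on CEST (UTC+2), so the same 08:30 UTC becomes 10:30 local time.
print(utc_to_rome(datetime(2019, 7, 11, 8, 30)).strftime("%Y-%m-%d %H:%M %Z"))  # 2019-07-11 10:30 CEST
```

Letting the time-zone database handle the offset avoids hard-coding "+1 hour" and getting it wrong after the daylight-saving switch.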
That said, I haven't been able to solve the problem, so I'm asking for some help :)
Here is a brief working extract of my code:
```python
from __future__ import print_function
from bs4 import BeautifulSoup
import regex as re
import mechanize
from datetime import datetime

URL_PAGE = 'https://www.myfxbook.com/forex-economic-calendar'

# retrieve html code
br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
html_content = br.open(URL_PAGE).read()

# soup
soup = BeautifulSoup(html_content, "html.parser")

# regex for extraction
cal_row_re = re.compile(r'^calRow.*')  # <-- name
date_re = re.compile(r'\w+\s?\d+:\d+')  # <-- date

# extracting events
CalEvents = soup.find_all(id=cal_row_re)
for singleEvent in CalEvents:
    date = singleEvent.find(text=date_re).strip()
    eventName = singleEvent.find(class_='noUnderline').get_text().strip()
    print(date, eventName, sep=';')
```

Thank you in advance
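To apply such a conversion in the loop above, the time first has to be pulled out of the scraped `date` text. The sketch below makes three assumptions: the text node contains an `HH:MM` time in UTC, the event's calendar day is known separately, and that day comes from wherever the page actually states the event date. The name `scraped_time_to_rome`, its `event_day` parameter, and the sample string `"Jan 11, 08:30"` are all hypothetical. Check what `date` really contains before relying on this. The sketch uses the standard `re` module instead of `regex`, which is enough for this pattern.

```python
import re
from datetime import date, datetime, timezone
from zoneinfo import ZoneInfo  # standard library, Python 3.9+

ROME = ZoneInfo("Europe/Rome")
TIME_RE = re.compile(r"(\d{1,2}):(\d{2})")  # first HH:MM in the text

def scraped_time_to_rome(raw_text, event_day):
    """Hypothetical helper: extract HH:MM from raw_text, treat it as UTC on
    event_day (an assumption), and return the Italian local datetime."""
    match = TIME_RE.search(raw_text)
    if match is None:
        return None  # the text contains no HH:MM time
    hour, minute = int(match.group(1)), int(match.group(2))
    utc_dt = datetime(event_day.year, event_day.month, event_day.day,
                      hour, minute, tzinfo=timezone.utc)
    # astimezone also moves the date forward when the shift crosses midnight
    return utc_dt.astimezone(ROME)

print(scraped_time_to_rome("Jan 11, 08:30", date(2019, 1, 11)))  # 2019-01-11 09:30:00+01:00
```

Because the conversion works on full datetimes rather than bare hours, a late event such as `"Jan 11, 23:30"` correctly becomes 00:30 on 12 January in Rome time.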