Jun-29-2021, 10:22 PM
I am trying to scrape all the events from https://www.onthisday.com/events/february/5 and I am getting all the events from the first page. How can I get the events from the second page as well and merge everything into one list?
So far I have tried to grab the next-page link and parse that page, but it didn't work: I am still getting the results from the first page.
Here is my code:
from typing import List

import requests as _requests
import bs4 as _bs4


def _generate_url(month: str, day: int) -> str:
    url = f'https://www.onthisday.com/events/{month}/{day}'
    return url


def _get_page(url: str) -> _bs4.BeautifulSoup:
    _page = _requests.get(url)
    soup = _bs4.BeautifulSoup(_page.content, 'html.parser')
    return soup


def events_of_the_day(month: str, day: int) -> List[str]:
    """
    Return the events of a given day
    """
    url = _generate_url(month, day)
    page = _get_page(url)
    next_link = page.select_one("a.pag__next")
    raw_events = [event.text for event in page.select("li.event")]
    if next_link:
        next_url = 'https://www.onthisday.com/events'+next_link['href']
        page_next = _get_page(next_url)
        for eve in page_next.select("li.event"):
            print(eve.text)
    #print(raw_events)


events_of_the_day("february", 5)

Note:
Some pages have a next page and some don't, so I am looking to handle both situations.
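
For reference, a minimal sketch of one way to handle both situations: keep following the "next" link until a page has none, and collect everything into a single list. It reuses the `li.event` and `a.pag__next` selectors from the code above. The names `fetch_page`, `fetch`, and `max_pages` are hypothetical, introduced only for illustration. The form of the `href` on the site has not been checked here, so the sketch resolves it with `urljoin` instead of prepending a fixed string; if the `href` already starts with `/events/...`, string concatenation could produce a malformed URL, which may be why the first page keeps coming back.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin

# (events found on one page, href of its "next" link or None)
Page = Tuple[List[str], Optional[str]]


def fetch_page(url: str) -> Page:
    """Download one page and extract its events and next-page href."""
    # Imported here so the pagination loop can be tried without network access.
    import requests
    import bs4

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = bs4.BeautifulSoup(response.content, "html.parser")
    events = [li.get_text(strip=True) for li in soup.select("li.event")]
    next_link = soup.select_one("a.pag__next")  # selector from the question
    href = next_link.get("href") if next_link else None
    return events, href


def events_of_the_day(month: str, day: int,
                      fetch: Callable[[str], Page] = fetch_page,
                      max_pages: int = 20) -> List[str]:
    """Return the events from the first page and every following page."""
    url: Optional[str] = f"https://www.onthisday.com/events/{month}/{day}"
    seen = set()  # guards against a "next" link that loops back
    all_events: List[str] = []
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        events, href = fetch(url)
        all_events.extend(events)
        # urljoin copes with relative, root-relative and absolute hrefs
        url = urljoin(url, href) if href else None
    return all_events
```

Calling `events_of_the_day("february", 5)` would then return one merged list. A page without a next link simply ends the loop after the first fetch, so both cases go through the same code path.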