Jul-01-2024, 07:40 AM
(This post was last modified: Sep-26-2024, 10:28 AM by JamesWilson.)
Hello everyone. I'm having trouble with web scraping and can't determine why it's failing. I'm using XPath and BeautifulSoup to extract the next URL, but it doesn't seem to work. What could I be doing wrong?
import requests
from lxml import etree
import html5lib
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import time, re
import csv
import time

start = time.time()
print('Starting Program')

base = "https://pokiesman.net/"
url = "https://pokiesman.net/real-money-pokies/"

while True:
    request = requests.get(urljoin(base, url))  # Get URL server status
    soup = BeautifulSoup(request.content, 'html5lib')  # Pass URL content to Soup
    dom = etree.HTML(str(soup))  # Initialize etree
    url = dom.xpath('//a[@class="next-page-link"]/@href')  # Find Next Page URL
    url2 = urljoin(base, url)
    urltest2 = soup.find_all("span", class_="game-title")  # Find next URL
    print('Test First URL', url2, ' Test number 2 ', urltest2)
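
A few things in the code above look like likely causes. `dom.xpath(...)` returns a list of matches (possibly empty), not a string, so passing it straight to `urljoin` and back into `requests.get` will not behave as intended. The XPath test `@class="next-page-link"` also only matches when the class attribute is exactly that string, and `while True` never exits once there is no next page. Below is a minimal sketch of one way to follow the next-page link with BeautifulSoup alone. The names `find_next_url`, `crawl`, `fetch`, and `max_pages` are hypothetical, and it assumes the site really marks its pagination link with the class `next-page-link`, which I have not verified.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def find_next_url(html, page_url):
    """Return the absolute URL of the next page, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: the pagination link carries the class "next-page-link",
    # as in the XPath from the post. class_ also matches when the tag has
    # additional classes, unlike an exact @class="..." comparison.
    link = soup.find("a", class_="next-page-link")
    if link is None or not link.get("href"):
        return None
    # Resolve relative hrefs against the page they were found on.
    return urljoin(page_url, link["href"])


def crawl(start_url, fetch, max_pages=50):
    """Follow next-page links from start_url.

    fetch(url) is a caller-supplied function that returns the page HTML as text.
    Stops when there is no next link, a URL repeats, or max_pages is reached.
    """
    visited = []
    url = start_url
    while url and url not in visited and len(visited) < max_pages:
        visited.append(url)
        url = find_next_url(fetch(url), url)
    return visited
```

With requests, `fetch` could be something like `lambda u: requests.get(u, timeout=10).text`, and the call would be `crawl("https://pokiesman.net/real-money-pokies/", fetch)`. If the link is inserted by JavaScript after the page loads, neither XPath nor BeautifulSoup will see it in the downloaded HTML, so it is worth printing the raw response and searching it for `next-page-link` first.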