Website Scraping Problems

JamesWilson · (This post was last modified: Sep-26-2024, 10:28 AM by JamesWilson.)

Hello everyone. I'm having trouble with web scraping and can't determine why it's failing. I'm using XPath and BeautifulSoup to extract the next URL, but it doesn't seem to work. What could I be doing wrong?

import requests
from lxml import etree
import html5lib
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import time, re
import csv
import time

start = time.time()

print('Starting Program')
base = "https://pokiesman.net/"
url = "https://pokiesman.net/real-money-pokies/"

while True:
    request = requests.get(urljoin(base, url)) # Get URL server status
    soup = BeautifulSoup(request.content, 'html5lib') # Pass URL content to Soup

    dom = etree.HTML(str(soup)) # Initialize etree
    url = dom.xpath('//a[@class="next-page-link"]/@href') # Find Next Page URL
    url2 = urljoin(base, url)

    urltest2 = soup.find_all("span", class_="game-title") # Find next URL
    print('Test First URL', url2, ' Test number 2 ', urltest2)

**Larz60+** · Jul-01-2024, 09:46 AM

The following code will get you all of the links:

from bs4 import BeautifulSoup
import requests

url = "https://pokiesman.com/real-money-pokies/"
response = requests.get(url)

soup = BeautifulSoup(response.content, 'lxml')
links = soup.find_all('a')
for link in links:
    print(link)

returns:

Output:
<a href="/"> <picture class="render-image flex flex-align-center flex-justify-center picture-image-logo"> <source srcset="https://pokiesman.com/wp-content/themes/pokiesman-com/img/logo.svg" type="image/svg+xml"/> <img alt="" class="no-lazy logo" height="57" loading="lazy" src="https://pokiesman.com/wp-content/themes/pokiesman-com/img/logo.svg" width="67"/> </picture> </a>
<a href="https://pokiesman.com/free-pokies/">Free Pokies</a>
<a href="#">Software</a>
<a href="https://pokiesman.com/aristocrat/">Aristocrat Pokies</a>
<a href="https://pokiesman.com/ainsworth/">Ainsworth</a>
<a href="https://pokiesman.com/pragmatic-play/">Pragmatic Play</a>
<a href="https://pokiesman.com/bally/">Bally</a>
<a href="https://pokiesman.com/igt/">IGT</a>
<a href="https://pokiesman.com/konami/">Konami</a>
<a href="https://pokiesman.com/playtech/">Playtech</a>
<a href="https://pokiesman.com/microgaming/">Microgaming</a>
<a href="https://pokiesman.com/wms/">WMS</a>
<a href="https://pokiesman.com/online-casinos/">Online Casinos Australia</a>
<a href="#">Other Pokies</a>
<a href="https://pokiesman.com/no-deposit-free-spins-pokies/">No Deposit Free Spins</a>
<a href="https://pokiesman.com/mobile-pokies/">Mobile Pokies</a>
<a href="https://pokiesman.com/new-pokies/">New Pokies</a>
<a href="https://pokiesman.com/offline-pokies/">Offline Pokies</a>
<a href="/" itemprop="item"> <span itemprop="name">Home</span> </a>
<a class="btn btn-orange" href="https://pokiesman.com/go/richard-casino/" rel="nofollow" target="_blank"> <span>PLAY NOW</span> </a>
<a class="btn btn-orange" href="https://pokiesman.com/go/wanted-win/" rel="nofollow" target="_blank"> <span>PLAY NOW</span> </a>
<a class="btn btn-orange" href="/go/staycasino/" rel="nofollow" target="_blank"> <span>PLAY NOW</span> </a>
<a class="btn btn-orange" href="https://pokiesman.com/go/neospin/ " rel="nofollow" target="_blank"> <span>PLAY NOW</span> </a>
<a class="btn btn-orange" href="https://pokiesman.com/go/dundeeslots/" rel="nofollow" target="_blank"> <span>PLAY NOW</span> </a>
<a class="btn btn-orange" href="https://pokiesman.com/go/casinonic/" rel="nofollow" target="_blank"> <span>PLAY NOW</span> </a>
<a class="btn btn-orange" href="https://pokiesman.com/go/lunubet/" rel="nofollow" target="_blank"> <span>PLAY NOW</span> </a>
<a href="https://pokiesman.com/">best Australian online pokies reviews</a>
<a href="https://pokiesman.com/free-pokies/">free play pokies</a>
<a href="https://pokiesman.com/mobile-pokies/">free pokies for mobiles.</a>
<a href="https://pokiesman.com/aristocrat/">Aristocrat free pokies.</a>
<a href="https://pokiesman.com/new-pokies/">new online pokies with real money</a>
<a href="https://pokiesman.com/no-deposit-free-spins-pokies/">No deposit free spins pokies:</a>
<a href="https://pokiesman.com/free-pokies/lightning-link/">Lightning Link</a>
<a href="https://pokiesman.com/free-pokies/lightning-link/">Lightning Link</a>
<a href="https://pokiesman.com/free-pokies/wheres-the-gold/">Where’s the Gold</a>
<a href="https://pokiesman.com/free-pokies/dragon-link/">Dragon Link</a>
<a href="https://pokiesman.com/free-pokies/5-dragons/">5 Dragons</a>
<a href="https://pokiesman.com/free-pokies/big-red/">Big Red</a>
<a href="https://pokiesman.com/responsible-gambling/">Responsible Gambling</a>
<a href="https://pokiesman.com/privacy-policy/">Privacy Policy</a>
<a href="https://pokiesman.com/contact-us/">Contact Us</a>
<a href="https://pokiesman.com/our-team/">Our Team</a>
<a href="https://pokiesman.com/sitemap/">Sitemap</a>

I don't see any link with a class named 'next-page-link'.
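
Two other things in the original loop would likely bite even if the class did exist: dom.xpath(...) returns a list rather than a string, so it can't be passed straight to urljoin, and the while True loop has no exit condition. Below is a rough sketch of how that could be handled, not something I've run against the live site. The class name next-page-link is taken from the original post, and since it doesn't appear in the output above you would need to inspect the page for the real selector. The names find_next_page, crawl, fetch and max_pages are made up for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def find_next_page(html, page_url, css_class="next-page-link"):
    """Return the absolute URL of the next page, or None if no such link exists.

    css_class is an assumption carried over from the original post.
    """
    soup = BeautifulSoup(html, "html.parser")
    # find() returns a single Tag or None, so there is no list to unpack
    link = soup.find("a", class_=css_class, href=True)
    if link is None:
        return None
    # strip() guards against stray whitespace inside the href attribute
    return urljoin(page_url, link["href"].strip())


def crawl(start_url, fetch, max_pages=50):
    """Follow next-page links from start_url and return the URLs visited.

    fetch is any callable that takes a URL and returns that page's HTML.
    The loop stops when there is no next link, when a URL repeats,
    or when max_pages is reached.
    """
    visited = []
    url = start_url
    while url and url not in visited and len(visited) < max_pages:
        visited.append(url)
        url = find_next_page(fetch(url), url)
    return visited
```

To try it against a real site you could pass something like lambda u: requests.get(u, timeout=10).text as fetch. Keeping the download step as a parameter also means the link logic can be checked on a saved HTML string without touching the network.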
