Python Forum
Website Scraping Problems
#1
Hello everyone. I'm having trouble with web scraping and can't determine why it's failing. I'm using XPath and BeautifulSoup to extract the next URL, but it doesn't seem to work. What could I be doing wrong?

import requests
from lxml import etree
import html5lib
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import re
import csv
import time

start = time.time()

print('Starting Program')
base = "https://pokiesman.com/"
url = "https://pokiesman.com/real-money-pokies/"

while True:
    request = requests.get(urljoin(base, url)) # Fetch the current page
    soup = BeautifulSoup(request.content, 'html5lib') # Parse the response body with html5lib

    dom = etree.HTML(str(soup)) # Build an lxml tree for XPath queries
    url = dom.xpath('//a[@class="next-page-link"]/@href') # Look up the next-page href
    url2 = urljoin(base, url)

    urltest2 = soup.find_all("span", class_="game-title") # Collect the game-title spans
    print('Test First URL', url2, ' Test number 2 ', urltest2)
#2
The following code will get you all of the links:
from bs4 import BeautifulSoup
import requests

url = "https://pokiesman.com/real-money-pokies/"
response = requests.get(url)

soup = BeautifulSoup(response.content, 'lxml')
links = soup.find_all('a')
for link in links:
    print(link)
returns:
Output:
<a href="/"> <picture class="render-image flex flex-align-center flex-justify-center picture-image-logo"> <source srcset="https://pokiesman.com/wp-content/themes/pokiesman-com/img/logo.svg" type="image/svg+xml"/> <img alt="" class="no-lazy logo" height="57" loading="lazy" src="https://pokiesman.com/wp-content/themes/pokiesman-com/img/logo.svg" width="67"/> </picture> </a>
<a href="https://pokiesman.com/free-pokies/">Free Pokies</a>
<a href="#">Software</a>
<a href="https://pokiesman.com/aristocrat/">Aristocrat Pokies</a>
<a href="https://pokiesman.com/ainsworth/">Ainsworth</a>
<a href="https://pokiesman.com/pragmatic-play/">Pragmatic Play</a>
<a href="https://pokiesman.com/bally/">Bally</a>
<a href="https://pokiesman.com/igt/">IGT</a>
<a href="https://pokiesman.com/konami/">Konami</a>
<a href="https://pokiesman.com/playtech/">Playtech</a>
<a href="https://pokiesman.com/microgaming/">Microgaming</a>
<a href="https://pokiesman.com/wms/">WMS</a>
<a href="https://pokiesman.com/online-casinos/">Online Casinos Australia</a>
<a href="#">Other Pokies</a>
<a href="https://pokiesman.com/no-deposit-free-spins-pokies/">No Deposit Free Spins</a>
<a href="https://pokiesman.com/mobile-pokies/">Mobile Pokies</a>
<a href="https://pokiesman.com/new-pokies/">New Pokies</a>
<a href="https://pokiesman.com/offline-pokies/">Offline Pokies</a>
<a href="/" itemprop="item"> <span itemprop="name">Home</span> </a>
<a class="btn btn-orange" href="https://pokiesman.com/go/richard-casino/" rel="nofollow" target="_blank"> <span>PLAY NOW</span> </a>
<a class="btn btn-orange" href="https://pokiesman.com/go/wanted-win/" rel="nofollow" target="_blank"> <span>PLAY NOW</span> </a>
<a class="btn btn-orange" href="/go/staycasino/" rel="nofollow" target="_blank"> <span>PLAY NOW</span> </a>
<a class="btn btn-orange" href="https://pokiesman.com/go/neospin/ " rel="nofollow" target="_blank"> <span>PLAY NOW</span> </a>
<a class="btn btn-orange" href="https://pokiesman.com/go/dundeeslots/" rel="nofollow" target="_blank"> <span>PLAY NOW</span> </a>
<a class="btn btn-orange" href="https://pokiesman.com/go/casinonic/" rel="nofollow" target="_blank"> <span>PLAY NOW</span> </a>
<a class="btn btn-orange" href="https://pokiesman.com/go/lunubet/" rel="nofollow" target="_blank"> <span>PLAY NOW</span> </a>
<a href="https://pokiesman.com/">best Australian online pokies reviews</a>
<a href="https://pokiesman.com/free-pokies/">free play pokies</a>
<a href="https://pokiesman.com/mobile-pokies/">free pokies for mobiles.</a>
<a href="https://pokiesman.com/aristocrat/">Aristocrat free pokies.</a>
<a href="https://pokiesman.com/new-pokies/">new online pokies with real money</a>
<a href="https://pokiesman.com/no-deposit-free-spins-pokies/">No deposit free spins pokies:</a>
<a href="https://pokiesman.com/free-pokies/lightning-link/">Lightning Link</a>
<a href="https://pokiesman.com/free-pokies/lightning-link/">Lightning Link</a>
<a href="https://pokiesman.com/free-pokies/wheres-the-gold/">Where’s the Gold</a>
<a href="https://pokiesman.com/free-pokies/dragon-link/">Dragon Link</a>
<a href="https://pokiesman.com/free-pokies/5-dragons/">5 Dragons</a>
<a href="https://pokiesman.com/free-pokies/big-red/">Big Red</a>
<a href="https://pokiesman.com/responsible-gambling/">Responsible Gambling</a>
<a href="https://pokiesman.com/privacy-policy/">Privacy Policy</a>
<a href="https://pokiesman.com/contact-us/">Contact Us</a>
<a href="https://pokiesman.com/our-team/">Our Team</a>
<a href="https://pokiesman.com/sitemap/">Sitemap</a>
I don't see any link with a class named 'next-page-link', so the XPath query in your loop has nothing to match.
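One more thing worth guarding against, whatever selector ends up being right: lxml's xpath() gives back a list of matches rather than a single string, and the loop in post #1 never stops when that list is empty. Here is a rough sketch of one way to handle it, using only the standard library so it runs without the site. The names resolve_next_url, crawl and fake_pages are made up for illustration, and the "/real-money-pokies/page/2/" path is a hypothetical next link, not something found on the page.

```python
from urllib.parse import urljoin

BASE = "https://pokiesman.com/"


def resolve_next_url(base, hrefs):
    """Turn an XPath result list into one absolute URL, or None if it is empty.

    `hrefs` stands in for the list returned by dom.xpath('//a[...]/@href').
    """
    if not hrefs:  # no matching link on this page
        return None
    return urljoin(base, hrefs[0].strip())


def crawl(start_url, get_hrefs, max_pages=50):
    """Follow next links until none is found.

    `get_hrefs(url)` is a hypothetical callable that fetches `url` and
    returns the list of next-page hrefs found on it.
    """
    visited = []
    url = start_url
    # Stop on a missing link, a repeated page, or the page limit.
    while url is not None and url not in visited and len(visited) < max_pages:
        visited.append(url)
        url = resolve_next_url(BASE, get_hrefs(url))
    return visited


# Stand-in for real HTTP requests: page 1 links to page 2, page 2 has no next link.
fake_pages = {
    "https://pokiesman.com/real-money-pokies/": ["/real-money-pokies/page/2/"],
    "https://pokiesman.com/real-money-pokies/page/2/": [],
}

pages = crawl(
    "https://pokiesman.com/real-money-pokies/",
    lambda u: fake_pages.get(u, []),
)
for page in pages:
    print(page)
```

In the real script, get_hrefs would wrap the requests.get and dom.xpath calls from post #1. The max_pages cap and the visited check are just safety nets so a wrong selector cannot turn into an endless loop.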