Feb-19-2022, 01:52 PM
Hi everyone, I am facing an issue. I want to cycle through webpages, printing each upcoming URL, and stop at a specific URL. However, when the script runs, only the second URL is printed, over and over. I used information from this url and this one
What modifications can I make to solve this issue? Thanks
from lxml import etree
import html5lib
import requests
from bs4 import BeautifulSoup

url = "https://www.startpage.com"

while True:
    request = requests.get(url) #Get URL server status
    soup = BeautifulSoup(request.content, 'html5lib') #Pass url content to Soup
    dom = etree.HTML(str(soup)) #Ini etree
    pages = dom.xpath('//*[@id="content-column"]/div[3]/div/div[6]/div/div/a[2]')[0].get("href") #Find Next Page URL
    print('pages',pages)
    nextpage = requests.get(pages) #Get New URL server status
    nextsoup = BeautifulSoup(nextpage.content, 'html5lib') #Pass New url content to NextSoup
    print(nextsoup) #Check to see if next page content is being viewed
    endpage = dom.xpath('//*[@id="content-column"]/div[3]/div/div[6]/div/div/a[2]')[0].get("href")
    print(dom.xpath('//*[@id="content-column"]/div[3]/div/div[6]/div/div/a[2]')[0].get("href")) #Print the link Pages
    if endpage is 'https://www.endpage.com': #Page to Stop
        break #Break out of loop
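
For reference, two things in the code above would explain the behaviour described: `url` is never reassigned inside the loop, so every pass fetches the start page again and finds the same link, and `is` compares object identity rather than string contents, so the stop check may never succeed. Below is a minimal sketch of one possible restructuring, not a tested fix for the actual site. The names `next_url_from_page`, `follow_pages`, `get_next_url` and `max_pages` are made up for illustration, the XPath is copied unchanged from the post, and the use of `urljoin` assumes the site may return relative links.

```python
from urllib.parse import urljoin

# XPath copied from the post; it is specific to the poster's site.
NEXT_LINK_XPATH = '//*[@id="content-column"]/div[3]/div/div[6]/div/div/a[2]'


def next_url_from_page(url):
    """Fetch `url` and return the absolute URL of its next-page link, or None."""
    # Imported here so the loop below can also be tried with a stand-in function.
    import requests
    from bs4 import BeautifulSoup
    from lxml import etree

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop early on HTTP errors
    soup = BeautifulSoup(response.content, "html5lib")
    dom = etree.HTML(str(soup))
    links = dom.xpath(NEXT_LINK_XPATH)
    if not links:
        return None  # no next-page link on this page
    href = links[0].get("href")
    return urljoin(url, href) if href else None


def follow_pages(start_url, stop_url, get_next_url=next_url_from_page, max_pages=50):
    """Follow next-page links from start_url and return each next URL found."""
    visited = []
    url = start_url
    for _ in range(max_pages):  # hypothetical safety cap against endless loops
        next_url = get_next_url(url)
        if next_url is None:
            break
        print("next page:", next_url)
        visited.append(next_url)
        if next_url == stop_url:  # compare values with ==, not identity with `is`
            break
        url = next_url  # advance, otherwise the same page is fetched every time
    return visited
```

With the URLs from the post this would be called as `follow_pages("https://www.startpage.com", "https://www.endpage.com")`. Passing a different `get_next_url` (for example a dictionary's `.get`) lets the loop be checked without any network access.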