Python Forum

Looping through multiple pages with changing url
Hi, I am new to web scraping and have just managed to write my first working script. However, it is only able to extract data from the first page, and I have not been able to apply the solutions offered online successfully. I would be mighty glad if someone could help me write a complete script that extracts data from all pages. Below is my current working script:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.merchantcircle.com/search?q=self-storage&qn='

#opens connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

#page parser
page_soup = soup(page_html, "html.parser")

businesses = page_soup.findAll("div",{"class":"hInfo vcard"})


filename = "storage.csv"
f = open(filename, "w")

#headers = "brand, product_name, price, shipping\n"
headers = "biz_name, biz_address, biz_phone_num\n"

f.write(headers)

for business in businesses:
	#grabs business name
	biz_name = business.h2.a.text.strip()

	#grabs business address
	address = business.find("a",{"class":"directions"})
	biz_address = address.text.strip()	


	#grabs phone number
	phone_num = business.find("a",{"class":"phone digits tel"})
	biz_phone_num = phone_num.text.strip()



	print("biz_name: " + biz_name)
	print("biz_address: " + biz_address)
	print("biz_phone_num: " + biz_phone_num)

	f.write(biz_name + "," + biz_address.replace(",", "|") + "," + biz_phone_num + "\n")

f.close()
I'm not sure which links you are interested in, but using your first result from 'businesses' on line 14 (for example),
you can pull the link to kabbage by adding after line 14
next_url = businesses[0].h2.a.get('href')
and the review link for the same result using review_url = businesses[0].div.a.get('href')
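For example, a minimal sketch that collects that link from every card on the page, reusing page_soup and businesses from your script and assuming the h2 > a element in each card carries the href:

#sketch: pull the detail-page link from each business card
for business in businesses:
	detail_url = business.h2.a.get('href')  #link behind the business name
	print(detail_url)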
I'd also use requests rather than urllib.
See examples in the following two threads:
web scraping part1
web scraping part2
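For example, the fetch at the top of your script would look something like this with requests (just a sketch; you'd need to pip install requests):

import requests
from bs4 import BeautifulSoup

my_url = 'https://www.merchantcircle.com/search?q=self-storage&qn='

#fetch the page and parse it
response = requests.get(my_url)
response.raise_for_status()  #stop early on a bad HTTP status
page_soup = BeautifulSoup(response.text, "html.parser")

businesses = page_soup.find_all("div", {"class": "hInfo vcard"})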
Oh, sorry for the omission. I'm interested in the page links so that I can extract all 65,000+ records. Currently I'm only able to extract the 21 records on page one.
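Something along these lines is what I imagine it would look like, though I haven't tested it and I'm only guessing that the qn= parameter in the search URL is the page number:

import requests
from bs4 import BeautifulSoup

#assumption: qn= selects the page number; check the site's "next page" links to confirm
base_url = 'https://www.merchantcircle.com/search?q=self-storage&qn={}'

with open("storage.csv", "w") as f:
	f.write("biz_name, biz_address, biz_phone_num\n")

	page = 1
	while True:
		html = requests.get(base_url.format(page)).text
		page_soup = BeautifulSoup(html, "html.parser")
		businesses = page_soup.find_all("div", {"class": "hInfo vcard"})

		#no results on this page -> assume we ran past the last page
		if not businesses:
			break

		for business in businesses:
			biz_name = business.h2.a.text.strip()
			biz_address = business.find("a", {"class": "directions"}).text.strip()
			biz_phone_num = business.find("a", {"class": "phone digits tel"}).text.strip()
			f.write(biz_name + "," + biz_address.replace(",", "|") + "," + biz_phone_num + "\n")

		page += 1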