Jan-16-2020, 09:11 PM
Hi, I am new to web scraping and have just managed to write my first working script. However, it only extracts data from the first page of results, and I have not been able to apply the solutions offered online successfully. I would be mighty glad if someone could help me write a complete script that extracts data from all pages. Below is my current working script:
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.merchantcircle.com/search?q=self-storage&qn='

#opens connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

#page parser
page_soup = soup(page_html, "html.parser")
businesses = page_soup.findAll("div",{"class":"hInfo vcard"})

filename = "storage.csv"
f = open(filename, "w")

#headers = "brand, product_name, price, shipping\n"
headers = "biz_name, biz_address, biz_phone_num\n"
f.write(headers)

for business in businesses:
    #grabs business name
    biz_name = business.h2.a.text.strip()

    #grabs business address
    address = business.find("a",{"class":"directions"})
    biz_address = address.text.strip()

    #grabs phone number
    phone_num = business.find("a",{"class":"phone digits tel"})
    biz_phone_num = phone_num.text.strip()

    print("biz_name: " + biz_name)
    print("biz_address: " + biz_address)
    print("biz_phone_num: " + biz_phone_num)

    f.write(biz_name + "," + biz_address.replace(",", "|") + "," + biz_phone_num + "\n")

f.close()
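To make the goal concrete, here is a rough sketch of the kind of loop I think is needed. It is not a tested solution. The helper names `build_page_url` and `scrape_all_pages` are made up for illustration, and the page parameter name `p` is only a guess: the real name has to be read off the site's "next page" links. The fetching and parsing steps are passed in as functions, so the script above could be wrapped into `fetch_html` (the `uReq` part) and `parse_page` (the `findAll` part, returning one row per business).

```python
from urllib.parse import urlencode

BASE_URL = "https://www.merchantcircle.com/search"


def build_page_url(page, query="self-storage", page_param="p"):
    """Build the search URL for one results page.

    ASSUMPTION: the site selects a page through a query parameter.
    The name "p" is a guess, not something confirmed for this site.
    """
    params = {"q": query, "qn": "", page_param: page}
    return BASE_URL + "?" + urlencode(params)


def scrape_all_pages(fetch_html, parse_page, max_pages=50):
    """Collect rows page by page until a page yields no rows.

    fetch_html(url) -> HTML text of that page (hypothetical callable)
    parse_page(html) -> list of rows found on it (hypothetical callable)
    max_pages is a safety cap in case the site never returns an empty page.
    """
    rows = []
    for page in range(1, max_pages + 1):
        html = fetch_html(build_page_url(page))
        page_rows = parse_page(html)
        if not page_rows:
            # No businesses found: assume we ran past the last page.
            break
        rows.extend(page_rows)
    return rows
```

Stopping on the first empty page is itself an assumption. If the site repeats the last page instead of returning an empty one, the loop would need to compare each page with the previous one, which is why `max_pages` is there as a fallback.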