Dec-10-2021, 08:49 AM
Fantastic!
This does the job. The next thing I need to do is grab the city, state and category for each listing. They only appear in two places: the URL and the top of the page. I plan to pull them out with a regex and add them to the dictionary.
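For the URL route, here is a rough sketch of what I have in mind. It assumes the listing URLs look like `.../c.<Category>.<City>.<ST>.-<id>.html`; that shape, the example URL, and the names `URL_PATTERN` and `parse_listing_url` are my own placeholders, so the pattern would need adjusting to the real URLs.

```python
import re

# Assumed URL shape (hypothetical): .../c.<Category>.<City>.<ST>.-<id>.html
URL_PATTERN = re.compile(
    r'/c\.(?P<category>[^.]+)\.(?P<city>[^.]+)\.(?P<state>[A-Z]{2})\.'
)


def parse_listing_url(url):
    """Return a dict with category, city and state, or None if the URL does not match."""
    match = URL_PATTERN.search(url)
    if match is None:
        return None
    return match.groupdict()


# Example with a made-up URL in the assumed shape.
info = parse_listing_url('https://www.example.com/c.Plumbing.Atlanta.GA.-12058.html')

# Merge the parsed fields into a listing dictionary.
listing = {'company': 'Acme Plumbing', 'phone': '555-0100', 'rating': '4.8'}
if info is not None:
    listing.update(info)
```

Returning `None` for a URL that does not match lets the caller skip or log it instead of crashing halfway through a run.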
PAGE = 0
while True:
    html = get_html(session, BASE_URL, PAGE)
    listings = get_listings(html)
    for listing in listings:
        print(listing['company'], listing['phone'], listing['rating'], end='\n')
    # Raw string so the backslashes that escape '@' in the CSS selector are kept literally.
    button = html.select(r'button.page-next.\@px-1.\@ml-1')
    if button[0].attrs.get('disabled') == 'disabled':
        break
    PAGE += 25

I also have to read each URL from a file instead of hard-coding it. In the following code, I am appending each state > city > category > listings, then writing the rows to a CSV file. One question: how do I write the column names only once?
def save_csv(listings, filename):
    filename = 'home-advisor-data-{}.csv'.format(state)
    with open(filename, 'a', encoding='utf-8', newline='') as file:
        writer = csv.writer(file, delimiter=',')
        writer.writerow(['Company', 'Phone Number', 'Rating'])
        # While paginating through each page of results, it will write these literal columns.
        # How do I avoid this? I only want these at the top, once.
        for listing in listings:
            writer.writerow(
                [listing['company'], listing['Phone_Number'], listing['Rating']])
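One approach I am considering, sketched below: check whether the file is missing or empty before opening it in append mode, and write the header row only in that case. This is a minimal sketch, not tested against the real scraper. It assumes the caller passes in the final filename, and that the listing keys are `company`, `phone` and `rating` as in the pagination loop.

```python
import csv
import os


def save_csv(listings, filename):
    """Append listings to filename, writing the header row only for a new or empty file."""
    # Decide before opening: append mode would create the file if it is missing.
    write_header = not os.path.exists(filename) or os.path.getsize(filename) == 0
    with open(filename, 'a', encoding='utf-8', newline='') as file:
        writer = csv.writer(file, delimiter=',')
        if write_header:
            writer.writerow(['Company', 'Phone Number', 'Rating'])
        for listing in listings:
            # Keys assumed to match the pagination loop above.
            writer.writerow(
                [listing['company'], listing['phone'], listing['rating']])
```

Calling `save_csv` once per page of results then appends rows without repeating the column names. An alternative would be to open the file once outside the pagination loop and write the header there.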