Aug-25-2019, 12:57 PM
I'm a beginner in Python and web scraping. My goal was to scrape 30 reviews from a TripAdvisor restaurant page. But when I open the file I have 301 reviews; the 30 reviews are repeated more than five times. Could you tell me what is wrong? What am I missing? This is my code:

import requests
from bs4 import BeautifulSoup as bs

# w is assumed to be a csv.writer created earlier (not shown in the post)
with requests.Session() as s:
    for offset in range(10,40):
        url = f'https://www.tripadvisor.fr/Restaurant_Review-g187147-d947475-Reviews-or{offset}-Le_Bouclard-Paris_Ile_de_France.html'
        r = s.get(url)
        soup = bs(r.content, 'lxml')
        reviews = soup.select('.reviewSelector')
        ids = [review.get('data-reviewid') for review in reviews]
        # request the expanded (full-text) version of the reviews on this page
        r = s.post(
            'https://www.tripadvisor.fr/OverlayWidgetAjax?Mode=EXPANDED_HOTEL_REVIEWS_RESP&metaReferer=',
            data={'reviews': ','.join(ids), 'contextChoice': 'DETAIL'},
            headers={'referer': r.url}
        )
        soup = bs(r.content, 'lxml')
        if not offset:
            inf_rest_name = soup.select_one('.heading').text.replace("\n","").strip()
            rest_eclf = soup.select_one('.header_links a').text.strip()
        for review in soup.select('.reviewSelector'):
            name_client = review.select_one('.info_text > div:first-child').text.strip()
            date_rev_cl = review.select_one('.ratingDate')['title'].strip()
            titre_rev_cl = review.select_one('.noQuotes').text.strip()
            opinion_cl = review.select_one('.partial_entry').text.replace("\n","").strip()
            row = [f"{inf_rest_name}", f"{rest_eclf}", f"{name_client}", f"{date_rev_cl}", f"{titre_rev_cl}", f"{opinion_cl}"]
            w.writerow(row)

I tried renaming the variable review to opinion_cl, because I thought that was the error, but I still get the same 301 reviews. I would appreciate your help.
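
One possible explanation, sketched here under assumptions that are not confirmed in the post: if the `or{offset}` part of the URL is a review offset and each page lists 10 reviews, then `range(10,40)` requests 30 overlapping pages instead of 3 distinct ones. That would give about 300 rows, which together with one header row would match the 301 lines in the file. The helper `page_offsets` and the constant `REVIEWS_PER_PAGE` below are hypothetical names used only for illustration.

```python
REVIEWS_PER_PAGE = 10  # assumption: each results page lists 10 reviews


def page_offsets(total_reviews, per_page=REVIEWS_PER_PAGE, start=0):
    """Return the offsets to request so that each page is fetched once."""
    return list(range(start, start + total_reviews, per_page))


# The loop in the post: one request for every integer from 10 to 39.
posted = list(range(10, 40))

# Stepping by the page size instead: one request per page.
stepped = page_offsets(30)

print(len(posted), "requests ->", len(posted) * REVIEWS_PER_PAGE, "rows")
print(len(stepped), "requests ->", len(stepped) * REVIEWS_PER_PAGE, "rows", stepped)
```

If that assumption holds, writing the loop as `for offset in range(0, 30, 10):` would fetch each page once. Starting at 0 also matters for the `if not offset:` branch, which only runs when `offset` is 0; with a range that starts at 10 it never runs, so `inf_rest_name` and `rest_eclf` would not be set inside the loop.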