Python Forum
Code scrape more than one time information
#1
I'm a beginner in Python and web scraping. My objective was to scrape 30 reviews of a restaurant on TripAdvisor, but when I open the file I have 301 reviews: the 30 reviews are repeated more than five times. Could you tell me what is wrong? What am I missing? This is my code:
import csv

import requests
from bs4 import BeautifulSoup as bs

# 'reviews.csv' is a placeholder name; the original post does not show how the writer `w` was created
with open('reviews.csv', 'w', newline='', encoding='utf-8') as f, requests.Session() as s:
    w = csv.writer(f)
    for offset in range(10, 40):
        url = f'https://www.tripadvisor.fr/Restaurant_Review-g187147-d947475-Reviews-or{offset}-Le_Bouclard-Paris_Ile_de_France.html'
        r = s.get(url)
        soup = bs(r.content, 'lxml')
        # collect the IDs of the reviews listed on this page
        reviews = soup.select('.reviewSelector')
        ids = [review.get('data-reviewid') for review in reviews]
        # request the expanded (full-text) version of those reviews
        r = s.post(
            'https://www.tripadvisor.fr/OverlayWidgetAjax?Mode=EXPANDED_HOTEL_REVIEWS_RESP&metaReferer=',
            data={'reviews': ','.join(ids), 'contextChoice': 'DETAIL'},
            headers={'referer': r.url},
        )
        soup = bs(r.content, 'lxml')
        # `not offset` is only True when offset == 0, so with range(10, 40) this block never runs
        if not offset:
            inf_rest_name = soup.select_one('.heading').text.replace("\n", "").strip()
            rest_eclf = soup.select_one('.header_links a').text.strip()

        # write one CSV row per expanded review
        for review in soup.select('.reviewSelector'):
            name_client = review.select_one('.info_text > div:first-child').text.strip()
            date_rev_cl = review.select_one('.ratingDate')['title'].strip()
            titre_rev_cl = review.select_one('.noQuotes').text.strip()
            opinion_cl = review.select_one('.partial_entry').text.replace("\n", "").strip()
            w.writerow([inf_rest_name, rest_eclf, name_client, date_rev_cl, titre_rev_cl, opinion_cl])
I tried changing the variable review to opinion_cl because I thought that was the error, but I still get the same 301 reviews. I would appreciate your help.
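One way to confirm the symptom described above is to count how often each row occurs in the output file. This is a minimal sketch; find_duplicate_rows is a hypothetical helper name, and it accepts any iterable of rows, such as a csv.reader over the file the scraper wrote.

```python
from collections import Counter


def find_duplicate_rows(rows):
    """Return {row: count} for every row that appears more than once.

    `rows` can be any iterable of lists, e.g. csv.reader(open_file).
    Rows are converted to tuples so they can be counted.
    """
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}
```

For example, opening the (hypothetical) reviews.csv with newline='' and passing csv.reader(f) to find_duplicate_rows should report each of the 30 reviews with a count above one if the loop is fetching the same pages repeatedly.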
#2
Your loop runs 30 times, once for each number from 10 to 39 (range(10, 40) does not include 40).

Every offset from 10 to 19 gets redirected to 10, 20-29 get redirected to 20, and 30-39 get redirected to 30.
This means you scrape each of those pages 10 times, getting 10 copies of each review.

Maybe you meant for your loop to be for offset in range(10, 40, 10): instead?
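The redirect behaviour explained above can be simulated offline. This sketch assumes, as the reply says, that the site rounds any offset down to the nearest multiple of 10 (10 reviews per page); page_start and pages_fetched are hypothetical helper names, not part of any library.

```python
from collections import Counter


def page_start(offset, page_size=10):
    """Offset of the page actually served, ASSUMING the site rounds down to a multiple of page_size."""
    return offset - offset % page_size


def pages_fetched(offsets, page_size=10):
    """Count how many times each real page would be downloaded for the given offsets."""
    return Counter(page_start(o, page_size) for o in offsets)


# range(10, 40) hits pages 10, 20 and 30 ten times each, so every review is written 10 times
print(pages_fetched(range(10, 40)))
# range(10, 40, 10) hits each of those pages exactly once
print(pages_fetched(range(10, 40, 10)))
```

With 10 reviews per page, the first loop writes 30 x 10 = 300 rows for only 30 distinct reviews.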
#3
Thank you so much! It works perfectly. So does that mean that if I want to scrape reviews 220 to 890, I have to write "for offset in range(220, 890, 220)"? Is that right?
#4
No, the third argument to range() is the step, which you want to be 10 (every tenth number), so for reviews 220 to 890 it would be range(220, 890, 10).
#5
Great! Thank you again!
#6
I have another question. I need to scrape another page that has at least 1000 reviews. I started the code at 10:40, but it still hasn't shown any scraped information, and when I tried to run the code again it seemed to be blocked: it doesn't respond. Is that normal? What can I do to unblock the code, and to get the information faster?
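The thread does not say why the script stopped responding. One common cause is a request that never returns: requests has no timeout by default, so s.get(url) can wait indefinitely, and passing timeout= to s.get and s.post avoids that. Another is the site throttling many rapid requests. A minimal sketch, assuming throttling is the problem, is to fetch each page with a timeout and wait progressively longer after each failure. It uses only the standard library so it runs without extra packages; polite_get and backoff_delays are hypothetical names, and the delay values are arbitrary choices.

```python
import time
import urllib.request


def backoff_delays(retries, base=2.0, cap=60.0):
    """Seconds to wait before each retry: base, 2*base, 4*base, ... never more than cap."""
    return [min(cap, base * 2 ** i) for i in range(retries)]


def polite_get(url, timeout=15, retries=4, pause=2.0):
    """Fetch `url` with a timeout, retrying with exponential backoff.

    Makes 1 + retries attempts. OSError covers network errors, timeouts
    and HTTP error responses (urllib.error.HTTPError, e.g. 429 Too Many
    Requests). Re-raises the last error if every attempt fails.
    """
    last_error = None
    for delay in [0.0] + backoff_delays(retries, base=pause):
        time.sleep(delay)  # no wait before the first attempt
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except OSError as exc:
            last_error = exc
    raise last_error
```

Sleeping a second or two between pages (time.sleep) usually reduces the chance of being blocked, at the cost of a slower run. Backing off does not make the scrape faster; it only keeps the script from hanging or hammering the server.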