Python Forum
missing append in a loop?




missing append in a loop? - zarize - Sep-05-2019

Hi guys,

I got stuck on the last step of my first basic scraper...

The script runs in a loop, but it only gets data from one page instead of the ~50 pages. Why is that? Am I missing an .append somewhere?

from bs4 import BeautifulSoup
import requests
import pandas as pd

headers = {
    'Sec-Fetch-Mode': 'cors',
    'Referer': 'https://www.pararius.com/apartments/amsterdam',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36',
    'Content-Type': 'text/plain',
}

data = '{"tags":[{"sizes":[{"width":728,"height":90},{"width":970,"height":250}],"primary_size":{"width":728,"height":90},"ad_types":["banner"],"uuid":"5f5a2718d3aa6d","id":11247563,"allow_smaller_sizes":false,"use_pmt_rule":false,"prebid":true,"disable_psa":true},{"sizes":[{"width":728,"height":90},{"width":970,"height":250}],"primary_size":{"width":728,"height":90},"ad_types":["banner"],"uuid":"66526a063a1a8c","id":11247564,"allow_smaller_sizes":false,"use_pmt_rule":false,"prebid":true,"disable_psa":true}],"sdk":{"source":"pbjs","version":"2.19.0-pre"},"gdpr_consent":{"consent_string":"BOmDsv2OmDsv2BQABBENCN-AAAAmd7_______9______5uz_Ov_v_f__33e8__9v_l_7_-___u_-3zd4-_1vf99yfm1-7etr3tp_87ues2_Xur__59__3z3_9phPrsk89ryw","consent_required":true},"referrer_detection":{"rd_ref":"https%3A%2F%2Fwww.pararius.com%2Fapartments%2Famsterdam","rd_top":true,"rd_ifs":1,"rd_stk":"https%3A%2F%2Fwww.pararius.com%2Fapartments%2Famsterdam,https%3A%2F%2Fwww.pararius.com%2Fapartments%2Famsterdam"}}'


# Fetch the first page to find out how many pages there are
page = 'https://www.pararius.com/apartments/amsterdam/page-1'
r = requests.get(page, headers=headers, data=data)
content = r.text
soup = BeautifulSoup(content, 'html.parser')


# Pagination: find the number of the last page
page1 = soup.find('ul', {'class': 'pagination'})
pages = page1.find_all('li')
last_page = pages[-3]
num_pages = last_page.find('a').text

fulldata = []

for n in range(1, int(num_pages)+1):
    page = 'https://www.pararius.com/apartments/amsterdam/page-' + str(n)
    print(page)


    for section in soup.find_all(class_='property-list-item-container'):
        dlink = section.find('a').get('href')
        prop_type = section.find('span', {'class': 'type'}).text
        neighborhood = section.find('a').text.strip().split()[1]
        # The 'surface' item holds size, bedrooms and furniture; split it once
        surface = section.find('li', {'class': 'surface'}).text.strip().split()
        size = surface[0]
        bedrooms = surface[2]
        furniture = surface[4]
        if furniture == 'upholstered':
            furniture = "Unfurnished"
        elif furniture == 'furnished or upholstered':
            furniture = "Furnished & Unfurnished"
        # availablefrom = surface[6]
        price = section.find('p', {'class': 'price '}).text.strip().split()[0]
        curr = "EUR" if "€" in price else "other"

        fulldata.append({
            'Direct Link': dlink,
            'Type': prop_type,
            'Neighborhood': neighborhood,
            'Size': size,
            'Bedrooms': bedrooms,
            'Furniture': furniture,
            'Price': price,
            'Currency': curr
        })

    
print(fulldata)
df = pd.DataFrame(fulldata)

df.to_excel(r'C:\Users\user\Desktop\scrap_data\tests\test.xlsx')
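Reading the page count from `pages[-3]` ties it to exactly how many non-numeric items (such as a "next" link) follow the last page number in the pagination list. A slightly more position-independent sketch, assuming the page links in `ul.pagination` are labelled with plain digits, takes the largest numeric link text instead. The function name `last_page_number` and the fallback to 1 are assumptions for illustration.

```python
from bs4 import BeautifulSoup


def last_page_number(soup):
    """Return the highest page number linked from <ul class="pagination">.

    Assumes page links have purely numeric text; items such as "next"
    are skipped by content rather than by their position in the list.
    """
    pagination = soup.find('ul', {'class': 'pagination'})
    if pagination is None:
        return 1  # hypothetical fallback: no pagination block means one page
    numbers = [
        int(a.get_text(strip=True))
        for a in pagination.find_all('a')
        if a.get_text(strip=True).isdigit()
    ]
    return max(numbers, default=1)
```

With the soup from the first page, `num_pages = last_page_number(soup)` would replace the three `pages`/`last_page` lines and already returns an `int`.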



RE: missing append in a loop? - buran - Sep-05-2019

You never make any requests inside the loop. You need to send a GET request for each page and turn the respective response into a soup object inside the loop. Currently every iteration works with the same soup object, the one created from page-1 before the loop.
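A minimal sketch of that fix, reusing the URL pattern and the `property-list-item-container` class from the question: the request and the `BeautifulSoup` call move inside the `for n` loop, so each iteration parses its own page. The `fetch_html` parameter, the `Page` column and the simplified `User-Agent` header are assumptions added for illustration (the hook only exists so the loop can be tried with canned HTML), and only the link is extracted to keep it short.

```python
from bs4 import BeautifulSoup

BASE_URL = 'https://www.pararius.com/apartments/amsterdam/page-'


def fetch_with_requests(url):
    """Download one page. requests is imported here (an assumption for
    this sketch) so the parsing loop can be tried without network access."""
    import requests
    r = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=10)
    r.raise_for_status()
    return r.text


def scrape_pages(num_pages, fetch_html=fetch_with_requests):
    """Request and parse every page INSIDE the loop and collect the rows."""
    fulldata = []
    for n in range(1, num_pages + 1):
        page = BASE_URL + str(n)
        # A new request and a new soup object for this page,
        # instead of reusing the soup built from page-1 before the loop.
        soup = BeautifulSoup(fetch_html(page), 'html.parser')
        for section in soup.find_all(class_='property-list-item-container'):
            fulldata.append({
                'Page': n,  # hypothetical extra column showing the source page
                'Direct Link': section.find('a').get('href'),
            })
    return fulldata
```

Against the live site this would be called as `scrape_pages(int(num_pages))`, and the returned list can go straight into `pd.DataFrame` as in the question. The site's markup may have changed since 2019, so the class names are not guaranteed to still match.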


RE: missing append in a loop? - zarize - Sep-05-2019

(Sep-05-2019, 07:55 AM) buran wrote: You never make any requests inside the loop. You need to send a GET request for each page and turn the respective response into a soup object inside the loop. Currently every iteration works with the same soup object, the one created from page-1 before the loop.

Easy as that!

Thank you very much!