Quote:
See my code below. Does anybody know how I can print headers only in the first row of the CSV file? Many thanks!

CSV is a little messy and hard to work with for this kind of data.
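To answer the CSV question directly first: write the header row once, before the loop that writes the data rows. A minimal sketch (the column names and rows here are made up, not the scraped data):

```python
import csv

# hypothetical example data standing in for the scraped results
rows = [('Scottstraat 3', '165000'), ('Voorbeeldstraat 1', '200000')]

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['title', 'price'])  # header: written exactly once
    for row in rows:
        writer.writerow(row)             # data rows: written inside the loop
```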
That said, I would put the data in a dictionary structure, then use json to serialize it to and from disk.
Eg.
```python
import requests
from bs4 import BeautifulSoup
import re

def fundaSpider(max_pages):
    page = 1
    d = {}
    while page <= max_pages:
        url = 'http://www.funda.nl/koop/rotterdam/p{}'.format(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, 'html.parser')
        ads = soup.find_all('li', {'class': 'search-result'})
        for ad in ads:
            title = ad.find('h3')
            title = ' '.join(title.get_text(separator='\n', strip=True).split()[:-3])
            address = ad.find('small').text.strip()
            price = ad.find('div', {'class': 'search-result-info search-result-info-price'}).text.strip()
            price = ''.join(re.findall(r'\d', price))
            size_results = ad.find('ul', {'class': 'search-result-kenmerken'})
            li = size_results.find_all('li')
            size = li[0].get_text(strip=True).split(" ")[0]
            room = li[1].text.strip().split(" ")[0]
            href = 'http://www.funda.nl' + ad.find_all('a')[2]['href']
            area = get_single_item_data(href)
            d[title] = address, price, href
        print(d)
        page += 1

def get_single_item_data(item_url):
    source_code = requests.get(item_url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, 'html.parser')
    li = soup.find_all('li', {'class': 'breadcrumb-listitem'})
    return li[2].a.text

fundaSpider(1)
```
So the structure can be chosen in many ways.
Here I chose the title as the key and the rest as a tuple.
Eg:
```python
>>> d = {'Scottstraat 3': ('3076 GX Rotterdam', '165000', 'http://www.funda.nl/koop/rotterdam/huis-85488249-scottstraat-3/')}
>>> d['Scottstraat 3']
('3076 GX Rotterdam', '165000', 'http://www.funda.nl/koop/rotterdam/huis-85488249-scottstraat-3/')
>>> d['Scottstraat 3'][2]
'http://www.funda.nl/koop/rotterdam/huis-85488249-scottstraat-3/'
```
The advantage now is that you can use json to serialize the dictionary to disk.
```python
import json

d = {'Scottstraat 3': ('3076 GX Rotterdam', '165000',
                       'http://www.funda.nl/koop/rotterdam/huis-85488249-scottstraat-3/')}

with open("my_file.json", "w") as j_in:
    json.dump(d, j_in)

with open("my_file.json") as j_out:
    saved_data = json.load(j_out)
```
Output:
```python
# It comes back as a working dictionary, but note that JSON has no
# tuple type, so the tuple is now a list
print(saved_data)
{'Scottstraat 3': ['3076 GX Rotterdam', '165000', 'http://www.funda.nl/koop/rotterdam/huis-85488249-scottstraat-3/']}
```
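If the tuples matter later, they can be restored after loading; a small sketch of the round trip:

```python
import json

d = {'Scottstraat 3': ('3076 GX Rotterdam', '165000',
                       'http://www.funda.nl/koop/rotterdam/huis-85488249-scottstraat-3/')}

# round-trip through JSON, then turn the lists back into tuples
loaded = json.loads(json.dumps(d))
restored = {key: tuple(value) for key, value in loaded.items()}
assert restored == d
```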
What @wavic suggests is fine too. Annoying errors in web scraping can also be silenced this way, as long as you don't throw away data that you actually need.
```python
try:
    room = li[1].text.strip()
except IndexError:
    pass
```
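The same idea can be wrapped in a small helper so every field lookup doesn't need its own try/except; a sketch with made-up data (`safe_item` is not part of the original code):

```python
def safe_item(items, index, default='n/a'):
    # return items[index], or the default when the list is too short
    try:
        return items[index]
    except IndexError:
        return default

print(safe_item(['75 m²', '3 kamers'], 1))  # prints: 3 kamers
print(safe_item(['75 m²'], 1))              # prints: n/a
```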