Hi folks,
I'm new to Python and to this forum. Some background: I started coding recently to make my own life easier by automating as much of it as possible. As a result, I don't have much experience, but I am doing my best to catch up.
I made a web crawler to extract info about houses for sale. Each page on the housing site contains 15 houses.
(Note that I have added some spaces to the web URL; otherwise I could not make a forum post.)
import requests
from bs4 import BeautifulSoup

def fundaSpider(max_pages):
    page = 1
    while page <= max_pages:
        url = 'ht tp://www. funda. nl/koop/rotterdam/p' + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, 'html.parser')
        for x in soup.find_all('h3', {"class": "search-result-title"}):
            location = x.get_text(strip=True)
        for y in soup.find_all('div', {'class':'search-result-info search-result-info-price'}):
            price = y.get_text(strip=True)
        for z in soup.find_all('ul', {'class': 'search-result-kenmerken'}):
            size = z.get_text(strip=True)
            print(location +";" + price +";"+ size)
        page += 1

fundaSpider(2)

The output of this code repeats the "location" and the "price" of one single listing for each of the 15 listings per page, while the "size" is returned correctly for each of the 15 listings per page.
Output:
Slaak 1123061 CZ Rotterdam;€ 175.000 k.k.;67 m²/138 m²3 kamers
Slaak 1123061 CZ Rotterdam;€ 175.000 k.k.;135 m²/171 m²5 kamers
Slaak 1123061 CZ Rotterdam;€ 175.000 k.k.;102 m²/102 m²5 kamers
Slaak 1123061 CZ Rotterdam;€ 175.000 k.k.;143 m²/127 m²5 kamers
Slaak 1123061 CZ Rotterdam;€ 175.000 k.k.;90 m²/131 m²4 kamers
Slaak 1123061 CZ Rotterdam;€ 175.000 k.k.;125 m²/270 m²5 kamers
Slaak 1123061 CZ Rotterdam;€ 175.000 k.k.;76 m²/317 m²3 kamers
Slaak 1123061 CZ Rotterdam;€ 175.000 k.k.;245 m²/190 m²6 kamers
Slaak 1123061 CZ Rotterdam;€ 175.000 k.k.;225 m²/709 m²6 kamers
Slaak 1123061 CZ Rotterdam;€ 175.000 k.k.;123 m²/103 m²6 kamers
Slaak 1123061 CZ Rotterdam;€ 175.000 k.k.;125 m²/153 m²5 kamers
Slaak 1123061 CZ Rotterdam;€ 175.000 k.k.;92 m²/101 m²4 kamers
Slaak 1123061 CZ Rotterdam;€ 175.000 k.k.;80 m²/86 m²4 kamers
Slaak 1123061 CZ Rotterdam;€ 175.000 k.k.;160 m²5 kamers
Slaak 1123061 CZ Rotterdam;€ 175.000 k.k.;66 m²3 kamers
Geelkruid 953068 DT Rotterdam;€ 269.000 k.k.;180 m²5 kamers
Geelkruid 953068 DT Rotterdam;€ 269.000 k.k.;76 m²2 kamers
Geelkruid 953068 DT Rotterdam;€ 269.000 k.k.;64 m²3 kamers
Geelkruid 953068 DT Rotterdam;€ 269.000 k.k.;99 m²4 kamers
Geelkruid 953068 DT Rotterdam;€ 269.000 k.k.;46 m²2 kamers
Geelkruid 953068 DT Rotterdam;€ 269.000 k.k.;120 m²4 kamers
Geelkruid 953068 DT Rotterdam;€ 269.000 k.k.;103 m²5 kamers
Geelkruid 953068 DT Rotterdam;€ 269.000 k.k.;83 m²2 kamers
Geelkruid 953068 DT Rotterdam;€ 269.000 k.k.;83 m²3 kamers
Geelkruid 953068 DT Rotterdam;€ 269.000 k.k.;82 m²3 kamers
Geelkruid 953068 DT Rotterdam;€ 269.000 k.k.;68 m²3 kamers
Geelkruid 953068 DT Rotterdam;€ 269.000 k.k.;85 m²3 kamers
Geelkruid 953068 DT Rotterdam;€ 269.000 k.k.;110 m²/160 m²5 kamers
Geelkruid 953068 DT Rotterdam;€ 269.000 k.k.;129 m²/224 m²6 kamers
Geelkruid 953068 DT Rotterdam;€ 269.000 k.k.;125 m²/142 m²5 kamers
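The pattern above (one location and price repeated against 15 different sizes) would be consistent with the three "for" loops running one after another with only the last one printing: by that point "location" and "price" each hold just the value from the final pass of their own loop. A minimal sketch with made-up lists standing in for the find_all results (the names locations, prices, sizes, broken_rows and rows are hypothetical, not from the original script), showing the effect and how zip pairs the items up instead:

```python
# Made-up stand-ins for the three soup.find_all(...) results.
locations = ["Street A 1", "Street B 2", "Street C 3"]
prices = ["€ 100", "€ 200", "€ 300"]
sizes = ["50 m²", "60 m²", "70 m²"]

# Separate loops: each variable is overwritten on every pass,
# so only the value from the last pass survives.
for loc in locations:
    location = loc
for p in prices:
    price = p

# Only this loop builds rows, so every row reuses the leftover values.
broken_rows = []
for size in sizes:
    broken_rows.append(location + ";" + price + ";" + size)

# zip walks the three lists in step, so each row belongs to one listing.
rows = [";".join(fields) for fields in zip(locations, prices, sizes)]

for row in rows:
    print(row)
```

Note that zip stops at the shortest list and assumes the three lists are in the same order, so it only lines up correctly if every listing has all three elements.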
Below I have included the HTML code of the webpage for one of the listings.
</div>
</a>
</div>
<div class="search-result-content">
<div class="search-result-content-inner">
<div class="search-result-header">
<a href="/koop/rotterdam/huis-85488249-scottstraat-3/" data-search-result-item-anchor="85488249">
<h3 class="search-result-title">
[color=#ff3333]Scottstraat 3[/color]
<small class="search-result-subtitle">
[color=#ff3333]3076 GX Rotterdam[/color]
</small>
</h3>
</a>
</div>
<div class="search-result-info search-result-info-price">
<span class="search-result-price">[color=#ff3333]€ 165.000 k.k.[/color]</span>
</div>
<div class="search-result-info">
<ul class="search-result-kenmerken ">
<li>
<span title="Woonoppervlakte">[color=#ff3333]67 m²[/color]</span>
/
<span title="Perceeloppervlakte">138 m²</span>
</li>
<li>[color=#ff3333]3 kamers[/color]</li>
</ul>
</div>
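Going by the HTML above, the street sits directly inside the h3 while the postcode and city sit in a nested small tag, and the size and the room count are separate li items, so each piece can be read on its own instead of calling get_text on the whole block. A rough sketch against a trimmed copy of that snippet (the parse_listing helper and the sample string are made up for illustration, and the real page may well differ):

```python
from bs4 import BeautifulSoup

# Trimmed copy of the listing HTML shown above.
sample = """
<div class="search-result-content">
  <h3 class="search-result-title">
    Scottstraat 3
    <small class="search-result-subtitle">3076 GX Rotterdam</small>
  </h3>
  <div class="search-result-info search-result-info-price">
    <span class="search-result-price">€ 165.000 k.k.</span>
  </div>
  <div class="search-result-info">
    <ul class="search-result-kenmerken ">
      <li>
        <span title="Woonoppervlakte">67 m²</span>
        /
        <span title="Perceeloppervlakte">138 m²</span>
      </li>
      <li>3 kamers</li>
    </ul>
  </div>
</div>
"""


def parse_listing(block):
    """Return (street, postcode_city, price, size, rooms) for one listing block."""
    title = block.find("h3", class_="search-result-title")
    # The first non-empty string in the h3 is the street; the <small> comes after it.
    street = next(title.stripped_strings)
    postcode_city = title.find("small").get_text(strip=True)
    price = block.find("span", class_="search-result-price").get_text(strip=True)
    items = block.find("ul", class_="search-result-kenmerken").find_all("li")
    # Join the one or two area spans with "/" (assumes the first li holds the areas).
    size = "/".join(span.get_text(strip=True) for span in items[0].find_all("span"))
    # Assumes the second li holds the room count.
    rooms = items[1].get_text(strip=True)
    return street, postcode_city, price, size, rooms


soup = BeautifulSoup(sample, "html.parser")
# One container per listing keeps location, price and size together.
results = [parse_listing(block)
           for block in soup.find_all("div", class_="search-result-content")]

for result in results:
    print(*result, sep="\n")
```

Looping over one container per listing, rather than over three separate find_all results, also keeps the fields of each house together even if a listing is missing one of them, although the helper as written would then need extra checks.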
My questions:
1. My output is not correct: I should get the location, price and size returned for each listing on each page. I suspect my three "for" loops are wrong; I have tried several things, but every variant I come up with gives incorrect results.
2. Currently the parts of "Location" and of "Size" come out run together:
Location: Slaak 1123061 CZ Rotterdam;
Price: € 175.000 k.k.;
Size: 135 m²/171 m²5 kamers
I would prefer to extract this separately:
Slaak 112
3061 CZ Rotterdam
€ 175.000 k.k.
135 m²/171 m²
5 kamers
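If only the already-joined strings are available, the pieces can still be pulled apart with regular expressions. This is a sketch under two assumptions: a Dutch postcode is always four digits, a space and two capital letters, and the room count always ends in "kamer" or "kamers". The helper names split_location and split_size are made up.

```python
import re

# Postcode: four digits, a space, two capital letters, then the city.
POSTCODE_RE = re.compile(r"\d{4} [A-Z]{2} .+$")
# Room count: digits followed by "kamer" or "kamers" at the end of the string.
ROOMS_RE = re.compile(r"\d+ kamers?$")


def split_location(text):
    """Split 'Slaak 1123061 CZ Rotterdam' into (street, postcode + city)."""
    match = POSTCODE_RE.search(text)
    if match is None:
        return text, ""  # no postcode found: leave the text as it is
    return text[:match.start()].strip(), match.group()


def split_size(text):
    """Split '135 m²/171 m²5 kamers' into (areas, room count)."""
    match = ROOMS_RE.search(text)
    if match is None:
        return text, ""  # no room count found: leave the text as it is
    return text[:match.start()].strip(), match.group()


print(split_location("Slaak 1123061 CZ Rotterdam"))
print(split_size("135 m²/171 m²5 kamers"))
```

Reading the separate HTML elements is likely more robust than this, since the regular expressions depend on the text always having the same shape.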
Any tips are welcome. All help is much appreciated!