Help with my coding

elgreedy · (This post was last modified: Oct-04-2024, 07:53 AM by Larz60+.)

Hello everyone,

I am a noob in Python, so I need your help!
I tried my best to retrieve all the information about the houses on this website with a web scraper.
It works for all the information except the prices, and I really need them.

Here is my code:

import requests
import pandas as pd
from bs4 import BeautifulSoup

# URL of the page
url = "https://www.immoweb.be/fr/recherche/maison/a-vendre?countries=BE&page=1&orderBy=relevance"

# Send HTTP request
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

# Find all article links
articles = soup.find_all('a', class_='card-link')  # Find all article links

data = []

# Iterate over each article and extract the information
for article in articles:
    article_url = article['href']
    
    # Send a request to the article URL
    article_response = requests.get(article_url)
    article_soup = BeautifulSoup(article_response.content, 'html.parser')
    
    # Extract the information
    try:
        title = article_soup.find('h1').text.strip()  # Title
        
        # Find the price container
        price_container = article_soup.find('div', class_='price')  # Adjust based on actual class name

        # Extract visible price (aria-hidden="true")
        visible_price = price_container.find('span', attrs={'aria-hidden': 'true'}).text.strip()

        # Extract hidden price (sr-only class)
        hidden_price = price_container.find('span', class_='sr-only').text.strip()

        description = article_soup.find('div', class_='description').text.strip()  # Description
        
        data.append({
            'title': title,
            'visible_price': visible_price,
            'hidden_price': hidden_price,
            'description': description,
            'url': article_url
        })
    except AttributeError as e:
        print(f"Error extracting information for: {article_url}. Error: {e}")



# Build a DataFrame from the collected rows (df was never defined before)
df = pd.DataFrame(data)

# Export the data to an Excel file
df.to_excel('immoweb_data.xlsx', index=False)
# If using Google Colab, download the Excel file
from google.colab import files
files.download('immoweb_data.xlsx')

Thanks in advance

Larz60+ wrote Oct-04-2024, 07:52 AM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
BBCode tags have been added this time. Please use BBCode tags on future posts.

**deanhystad** · Oct-04-2024, 05:11 PM

Where does it fail? Do you get something returned for price_container? What about visible_price and hidden_price?

Is there an error? If so, please post error message and trace information.

***snippsat*** · (This post was last modified: Oct-05-2024, 08:32 AM by snippsat.)

(Oct-03-2024, 10:47 PM)elgreedy Wrote: I tried my best to retrieve all the information

Or with some AI👀 help, as the code comments show.
I have no problem with this approach, but the problem will be troubleshooting and/or not knowing some basic stuff about web scraping.
After line 10 nothing will work; you have to look at what you get back with print(soup).
In there is this line.

Output:
<h1 data-translate="block_headline">Sorry, you have been blocked</h1>
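
A quick way to catch this in a script is to test the returned HTML for the block message before parsing it. This is only a minimal sketch: the helper name `looks_blocked` is made up for illustration, and it assumes the block page always contains the phrase from the output above.

```python
def looks_blocked(html: str) -> bool:
    """Return True if the HTML looks like the block page, not the listings."""
    # Assumption: the block page contains this phrase (taken from the output above).
    return "you have been blocked" in html.lower()


sample = '<h1 data-translate="block_headline">Sorry, you have been blocked</h1>'
print(looks_blocked(sample))  # True
```

In the original script this could be called as `looks_blocked(response.text)` right after `requests.get(url)`, so the script stops early instead of looping over an empty list of articles.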

You can fix this by adding a user-agent to Requests, then you get the source back.
But it is still difficult to parse, because much of the content is generated by JavaScript,
so Selenium or Playwright can help with this.
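
As a rough sketch of the user-agent part: Requests accepts a `headers` dict, so a browser-like `User-Agent` can be sent with each request. The user-agent string and the function name `fetch_html` below are example values only, not anything the site requires, and there is no guarantee the site will accept them.

```python
# Example browser-like user-agent string (an assumption, any current browser string could be used).
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/126.0 Safari/537.36"
    )
}


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Fetch a page with a browser-like User-Agent and return its HTML."""
    # Imported here so this file still loads where requests is not installed.
    import requests

    response = requests.get(url, headers=HEADERS, timeout=timeout)
    # Raise an exception on 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return response.text
```

Calling `fetch_html(url)` with the search URL from the first post should then return the page source, which can be passed to `BeautifulSoup(html, 'html.parser')` as before. The prices may still be missing from that HTML if they are filled in by JavaScript.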

Quote:I am noob in python so I need your help!

This is not a beginner-friendly site to scrape; you should start with some more basic stuff.

AlluminumFoil · Oct-07-2024, 06:37 PM

I agree, YT helped me a lot to learn the basic things, so I'll start there.
