Posts: 30
Threads: 11
Joined: Mar 2021
Mar-16-2021, 11:33 PM
Hello guys,
This is my first time using Python and I'm very excited about it.
I'm trying Scrape some data from a website ( www.zapimoveis.com.br), and I found a developer who created a project on Pypi jus to do it. His project can be accessed via ( https://pypi.org/project/zapimoveis-scraper/), and installed via (pip install zapimoveis-scraper).
My question is, how can I extract the desired data and export it to an Excel Spreadsheet? There is no tutorial or infos available on his website. Basically I would like to know how can I use his project/ codes?
At the Usage Example of his page he just says:
zapimoveis_scraper.search(localization=”go+goiania++setor-oeste”, num_pages=5)
The objects returned from search contain the following attributes:
description: property description
price: property price (monthly)
bedrooms: number of bedrooms on property
bathrooms: number of bathrooms on property
total_area_m2: property area (square meters)
vacancies: parking spots available on property
address: property address
Maybe it's a very simple and stupid question for you guys... but as I mentioned, I'm just beggining and I'm learning a lot!
Thank you
Posts: 7,320
Threads: 123
Joined: Sep 2016
You use it like this after install.
# zap.py
import zapimoveis_scraper as zap
result = zap.search(localization="go+goiania++setor-oeste", num_pages=5)
print(result[0].description)
print(result[0].price)
print(result[0].bedrooms) Output: λ python zap.py
A casa no bairro Setor Oeste possui 100 metros quadrados com 3 quartos sendo 1 suite e 2 banheiros, 02 salas, cozinha, area de serviço.
mais
R$ 1.700
3
brunolelli Wrote:My question is, how can I extract the desired data and export it to an Excel Spreadsheet? There are serval ways i find pandas most easy as get a look like Excel and has read_excel() to_excel() methods.
A other way is openpyxl
Posts: 30
Threads: 11
Joined: Mar 2021
Thank you so much!
It's working...
Would you mind providing me an example using pandas?
How can I extract all outputs and send the results to an excel .csv file?
Thank you!
Posts: 7,320
Threads: 123
Joined: Sep 2016
(Mar-17-2021, 12:34 AM)brunolelli Wrote: Would you mind providing me an example using pandas?
How can I extract all outputs and send the results to an excel .csv file? Can look this post it will be in similar way.
So you can loop over result as show in post and save in eg a dictionary for easier import to pandas.
As this is your first time using Python,so can there be some struggle with this if lack the basic knowledge.
Posts: 30
Threads: 11
Joined: Mar 2021
Mar-17-2021, 01:44 AM
(This post was last modified: Mar-17-2021, 01:45 AM by brunolelli.)
Thank you again!
I'm using the following code. What are your thoughts?
df = pd.DataFrame([], columns=['Address', 'Price', 'Total Area', 'Bathrooms', 'Badrooms', 'Vacancies', 'Description'])
for i in range(5):
df = df.append({'Address': result[i].address, 'Price': result[i].price, 'Total Area': result[i].total_area_m2, 'Bathrooms': result[i].bathrooms, 'Badrooms':result[i].bedrooms, 'Vacancies':result[i].vacancies, 'Description':result[i].description}, ignore_index=True)
df.to_csv (r'C:\Users\bruno\Desktop\FII\Zap.csv', index = False, header=True, sep=';')
print(df) As you can see here, I'm just looping through 5 items. How can I loop through all items? And it's not a fixed number, it varies...
Thank you
Posts: 30
Threads: 11
Joined: Mar 2021
Mar-17-2021, 01:50 AM
(This post was last modified: Mar-17-2021, 01:50 AM by brunolelli.)
I think I found a solution:
What do you think about it? Is it appropriate?
ofertas = len(result)
print(ofertas)
df = pd.DataFrame([], columns=['Address', 'Price', 'Total Area', 'Bathrooms', 'Badrooms', 'Vacancies', 'Description'])
for i in range(ofertas):
df = df.append({'Address': result[i].address, 'Price': result[i].price, 'Total Area': result[i].total_area_m2, 'Bathrooms': result[i].bathrooms, 'Badrooms':result[i].bedrooms, 'Vacancies':result[i].vacancies, 'Description':result[i].description}, ignore_index=True)
df.to_csv (r'C:\Users\bruno\Desktop\FII\Zap.csv', index = False, header=True, sep=';')
print(df)
Posts: 56
Threads: 2
Joined: Jan 2021
Here is a script I wrote a couple days ago. Its purpose is to go out to a Web site, find the product page, and add the product description to my spreadsheet. The spreadsheet starts out with an ID number in column A and an (empty) description in column E. If the description is blank or empty, it queries the Web site. I've elided some details.
import requests
import re
import sys
import openpyxl
ErrorLog = []
debug = False
# *******************************************************************
# logError
# Inputs:
# s: a string to add to the error log
# *******************************************************************
def logError(s):
ErrorLog.append(s)
# *******************************************************************
# msgProlog
# Inputs:
# C. which is a cell value
# Result: string
# The standard prolog to a message, which includes a trailing space
# 'A1: "123" '
# *******************************************************************
def msgProlog(C):
return (C.coordinate + ": \"" + str(C.value) + '" ')
# *******************************************************************
# deHTML
# Inputs:
# s: A string
# Result: string
# The input string with all the &...; codes replaced
# *******************************************************************
def deHTML(s):
t = s
t = t.replace('"', '"')
t = t.replace('<', '<')
t = t.replace('>', ">")
t = t.replace('&', '&')
return t
# *******************************************************************
# Main program
# *******************************************************************
# Open the workspace
workspace_name = "inventory.xlsx"
try:
wb = openpyxl.load_workbook(filename = workspace_name)
except:
print("Error opening file ", sys.exc_info()[0])
raise
sheet = wb.active
# Set the default columns in case we can't figure out what is going on
colPartNumber = 1
captionPartNumber = "#"
colManufacturer = 2
captionManufacturer = "Mfg"
colDescription = 5
captionDescription = "Description"
# First, identify the columns that have the headings
# "#" colPartNumber
# "Mfg" colManufacturer
# "Description" colDescription
for col in range(1,20):
cell = sheet.cell(1, col)
caption = sheet.cell(1, col).value
if caption == None:
print(cell.coordinate, end=": ")
print("<None>")
break
print(cell.coordinate + ":\"" + caption + "\"")
if caption == captionPartNumber:
colPartNumber = col
print(captionPartNumber + ' = ' + str(col))
elif caption == captionManufacturer:
colManufacturer = col
print(captionManufacturer + ' = ' + str(col))
elif caption == captionDescription:
colDescription = col
print(captionDescription + ' = ' + str(col))
print("")
# Now we can iterate through the cells
row = 1
try:
while(True):
row += 1
# Get the part #
partNo = sheet.cell(row = row, column = colPartNumber)
part = partNo.value
if(part == None):
print(partNo.coordinate, end=": ")
print("<None>")
break;
# do we already have a description?
desc = sheet.cell(row = row, column = colDescription)
if desc.value != None:
# We have a description. If it has not been deHTMLed, do so
nicedesc = deHTML(desc.value)
if(desc.value != nicedesc):
msg = msgProlog(partNo) + '"' + desc.value + '"=>"' + nicedesc + '"****************'
print(msg)
logError(msg)
desc.value = nicedesc
continue
msg = msgProlog(partNo) + ' already has description "' + desc.value + '"'
print(msg)
continue
# We do not have a description. Go find it
try:
r = requests.get('https://www.elided.com/products/' + str(partNo.value))
# If there was an error, throw an exception
r.raise_for_status()
except:
msg = msgProlog(partNo) + ' "' + str(r) + '"'
print(msg)
logError(msg)
print(r)
continue
# The best description is found in an entry that says
# twitter:title<useless stuff>content="(.*)"
x = re.search('twitter:title.*content="(.*)"', r.text)
if x != None:
msgProlog(partNo) + '"' + x.group(1) + '"'
print(msg)
caption = x.group(1)
caption = deHTML(caption)
desc.value = caption
else:
msg = msgProlog(partNo) + "<No attribute found>"
logError(msg)
print(msg)
except:
print("Exception occurred", sys.exc_info()[0])
# Now save the workbook
wb.save(filename = workspace_name)
print("File saved")
print("=========================== Errors ===========================")
for msg in ErrorLog:
print(msg)
brunolelli likes this post
Posts: 7,320
Threads: 123
Joined: Sep 2016
(Mar-17-2021, 01:50 AM)brunolelli Wrote: What do you think about it? Is it appropriate? If it give you what wanted then is okay👍
I can not run it as it not all code,line length get way to long.
ofertas = len(result)
print(ofertas)
df = pd.DataFrame(
[],
columns=[
"Address",
"Price",
"Total Area",
"Bathrooms",
"Badrooms",
"Vacancies",
"Description",
],
)
for i in range(ofertas):
df = df.append(
{
"Address": result[i].address,
"Price": result[i].price,
"Total Area": result[i].total_area_m2,
"Bathrooms": result[i].bathrooms,
"Badrooms": result[i].bedrooms,
"Vacancies": result[i].vacancies,
"Description": result[i].description,
},
ignore_index=True,
)
df.to_csv(r"C:\Users\bruno\Desktop\FII\Zap.csv", index=False, header=True, sep=";",)
print(df)
brunolelli likes this post
Posts: 30
Threads: 11
Joined: Mar 2021
Thank you so much guys!
You guys are the best!
Thank you
|