Posts: 30
Threads: 11
Joined: Mar 2021
Mar-16-2021, 11:33 PM
Hello guys,
This is my first time using Python and I'm very excited about it.
I'm trying Scrape some data from a website ( www.zapimoveis.com.br), and I found a developer who created a project on Pypi jus to do it. His project can be accessed via ( https://pypi.org/project/zapimoveis-scraper/), and installed via (pip install zapimoveis-scraper).
My question is, how can I extract the desired data and export it to an Excel Spreadsheet? There is no tutorial or infos available on his website. Basically I would like to know how can I use his project/ codes?
At the Usage Example of his page he just says:
zapimoveis_scraper.search(localization=”go+goiania++setor-oeste”, num_pages=5)
The objects returned from search contain the following attributes:
description: property description
price: property price (monthly)
bedrooms: number of bedrooms on property
bathrooms: number of bathrooms on property
total_area_m2: property area (square meters)
vacancies: parking spots available on property
address: property address
Maybe it's a very simple and stupid question for you guys... but as I mentioned, I'm just beggining and I'm learning a lot!
Thank you
Posts: 7,324
Threads: 123
Joined: Sep 2016
You use it like this after install.
1 2 3 4 5 6 7 |
import zapimoveis_scraper as zap
result = zap.search(localization = "go+goiania++setor-oeste" , num_pages = 5 )
print (result[ 0 ].description)
print (result[ 0 ].price)
print (result[ 0 ].bedrooms)
|
Output: λ python zap.py
A casa no bairro Setor Oeste possui 100 metros quadrados com 3 quartos sendo 1 suite e 2 banheiros, 02 salas, cozinha, area de serviço.
mais
R$ 1.700
3
brunolelli Wrote:My question is, how can I extract the desired data and export it to an Excel Spreadsheet? There are serval ways i find pandas most easy as get a look like Excel and has read_excel() to_excel() methods.
A other way is openpyxl
Posts: 30
Threads: 11
Joined: Mar 2021
Thank you so much!
It's working...
Would you mind providing me an example using pandas?
How can I extract all outputs and send the results to an excel .csv file?
Thank you!
Posts: 7,324
Threads: 123
Joined: Sep 2016
(Mar-17-2021, 12:34 AM)brunolelli Wrote: Would you mind providing me an example using pandas?
How can I extract all outputs and send the results to an excel .csv file? Can look this post it will be in similar way.
So you can loop over result as show in post and save in eg a dictionary for easier import to pandas.
As this is your first time using Python,so can there be some struggle with this if lack the basic knowledge.
Posts: 30
Threads: 11
Joined: Mar 2021
Mar-17-2021, 01:44 AM
(This post was last modified: Mar-17-2021, 01:45 AM by brunolelli.)
Thank you again!
I'm using the following code. What are your thoughts?
1 2 3 4 5 6 7 |
df = pd.DataFrame([], columns = [ 'Address' , 'Price' , 'Total Area' , 'Bathrooms' , 'Badrooms' , 'Vacancies' , 'Description' ])
for i in range ( 5 ):
df = df.append({ 'Address' : result[i].address, 'Price' : result[i].price, 'Total Area' : result[i].total_area_m2, 'Bathrooms' : result[i].bathrooms, 'Badrooms' :result[i].bedrooms, 'Vacancies' :result[i].vacancies, 'Description' :result[i].description}, ignore_index = True )
df.to_csv (r 'C:\Users\bruno\Desktop\FII\Zap.csv' , index = False , header = True , sep = ';' )
print (df)
|
As you can see here, I'm just looping through 5 items. How can I loop through all items? And it's not a fixed number, it varies...
Thank you
Posts: 30
Threads: 11
Joined: Mar 2021
Mar-17-2021, 01:50 AM
(This post was last modified: Mar-17-2021, 01:50 AM by brunolelli.)
I think I found a solution:
What do you think about it? Is it appropriate?
1 2 3 4 5 6 7 8 9 10 |
ofertas = len (result)
print (ofertas)
df = pd.DataFrame([], columns = [ 'Address' , 'Price' , 'Total Area' , 'Bathrooms' , 'Badrooms' , 'Vacancies' , 'Description' ])
for i in range (ofertas):
df = df.append({ 'Address' : result[i].address, 'Price' : result[i].price, 'Total Area' : result[i].total_area_m2, 'Bathrooms' : result[i].bathrooms, 'Badrooms' :result[i].bedrooms, 'Vacancies' :result[i].vacancies, 'Description' :result[i].description}, ignore_index = True )
df.to_csv (r 'C:\Users\bruno\Desktop\FII\Zap.csv' , index = False , header = True , sep = ';' )
print (df)
|
Posts: 56
Threads: 2
Joined: Jan 2021
Here is a script I wrote a couple days ago. Its purpose is to go out to a Web site, find the product page, and add the product description to my spreadsheet. The spreadsheet starts out with an ID number in column A and an (empty) description in column E. If the description is blank or empty, it queries the Web site. I've elided some details.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 |
import requests
import re
import sys
import openpyxl
ErrorLog = []
debug = False
def logError(s):
ErrorLog.append(s)
def msgProlog(C):
return (C.coordinate + ": \"" + str (C.value) + '" ' )
def deHTML(s):
t = s
t = t.replace( '"' ,
+ desc.value + '"=>"' + nicedesc + '"****************'
print (msg)
logError(msg)
desc.value = nicedesc
continue
msg = msgProlog(partNo) + ' already has description "' + desc.value +
print (msg)
logError(msg)
print (r)
continue
x = re.search( 'twitter:title.*content="(.*)"' , r.text)
if x ! = None :
msgProlog(partNo) +
print (msg)
caption = x.group( 1 )
caption = deHTML(caption)
desc.value = caption
else :
msg = msgProlog(partNo) + "<No attribute found>"
logError(msg)
print (msg)
except :
print ( "Exception occurred" , sys.exc_info()[ 0 ])
wb.save(filename = workspace_name)
print ( "File saved" )
print ( "=========================== Errors ===========================" )
for msg in ErrorLog:
print (msg)
|
brunolelli likes this post
Posts: 7,324
Threads: 123
Joined: Sep 2016
(Mar-17-2021, 01:50 AM)brunolelli Wrote: What do you think about it? Is it appropriate? If it give you what wanted then is okay👍
I can not run it as it not all code,line length get way to long.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 |
ofertas = len (result)
print (ofertas)
df = pd.DataFrame(
[],
columns = [
"Address" ,
"Price" ,
"Total Area" ,
"Bathrooms" ,
"Badrooms" ,
"Vacancies" ,
"Description" ,
],
)
for i in range (ofertas):
df = df.append(
{
"Address" : result[i].address,
"Price" : result[i].price,
"Total Area" : result[i].total_area_m2,
"Bathrooms" : result[i].bathrooms,
"Badrooms" : result[i].bedrooms,
"Vacancies" : result[i].vacancies,
"Description" : result[i].description,
},
ignore_index = True ,
)
df.to_csv(r "C:\Users\bruno\Desktop\FII\Zap.csv" , index = False , header = True , sep = ";" ,)
print (df)
|
brunolelli likes this post
Posts: 30
Threads: 11
Joined: Mar 2021
Thank you so much guys!
You guys are the best!
Thank you
|