How can I use the following script?

brunolelli · Mar-16-2021, 11:33 PM

Hello guys,

This is my first time using Python and I'm very excited about it.
I'm trying Scrape some data from a website (www.zapimoveis.com.br), and I found a developer who created a project on Pypi jus to do it. His project can be accessed via (https://pypi.org/project/zapimoveis-scraper/), and installed via (pip install zapimoveis-scraper).

My question is, how can I extract the desired data and export it to an Excel Spreadsheet? There is no tutorial or infos available on his website. Basically I would like to know how can I use his project/ codes?

At the Usage Example of his page he just says:
zapimoveis_scraper.search(localization=”go+goiania++setor-oeste”, num_pages=5)

The objects returned from search contain the following attributes:
description: property description
price: property price (monthly)
bedrooms: number of bedrooms on property
bathrooms: number of bathrooms on property
total_area_m2: property area (square meters)
vacancies: parking spots available on property
address: property address

Maybe it's a very simple and stupid question for you guys... but as I mentioned, I'm just beggining and I'm learning a lot!

Thank you

***snippsat*** · Mar-17-2021, 12:31 AM

You use it like this after install.

        
              # zap.py
import zapimoveis_scraper as zap
 
result  = zap.search(localization="go+goiania++setor-oeste", num_pages=5) 
print(result[0].description)
print(result[0].price)
print(result[0].bedrooms)

Output:λ python zap.py
A casa no bairro Setor Oeste possui 100 metros quadrados com 3 quartos sendo 1 suite e 2 banheiros, 02 salas, cozinha, area de serviço.
    mais
R$ 1.700
3

brunolelli Wrote:My question is, how can I extract the desired data and export it to an Excel Spreadsheet?

There are serval ways i find pandas most easy as get a look like Excel and has read_excel() to_excel() methods.
A other way is openpyxl

brunolelli · Mar-17-2021, 12:34 AM

Thank you so much!
It's working...

Would you mind providing me an example using pandas?

How can I extract all outputs and send the results to an excel .csv file?

Thank you!

***snippsat*** · Mar-17-2021, 01:13 AM

(Mar-17-2021, 12:34 AM)brunolelli Wrote: Would you mind providing me an example using pandas?

How can I extract all outputs and send the results to an excel .csv file?

Can look this post it will be in similar way.
So you can loop over result as show in post and save in eg a dictionary for easier import to pandas.
As this is your first time using Python,so can there be some struggle with this if lack the basic knowledge.

brunolelli · (This post was last modified: Mar-17-2021, 01:45 AM by brunolelli.)

Thank you again!
I'm using the following code. What are your thoughts?

        
              df = pd.DataFrame([], columns=['Address', 'Price', 'Total Area', 'Bathrooms', 'Badrooms', 'Vacancies', 'Description'])
 
for i in range(5):
    df = df.append({'Address': result[i].address, 'Price': result[i].price, 'Total Area': result[i].total_area_m2, 'Bathrooms': result[i].bathrooms, 'Badrooms':result[i].bedrooms, 'Vacancies':result[i].vacancies, 'Description':result[i].description}, ignore_index=True)
 
df.to_csv (r'C:\Users\bruno\Desktop\FII\Zap.csv', index = False, header=True, sep=';')
print(df)

As you can see here, I'm just looping through 5 items. How can I loop through all items? And it's not a fixed number, it varies...
Thank you

brunolelli · (This post was last modified: Mar-17-2021, 01:50 AM by brunolelli.)

I think I found a solution:

What do you think about it? Is it appropriate?

        
              ofertas = len(result)
print(ofertas)
 
df = pd.DataFrame([], columns=['Address', 'Price', 'Total Area', 'Bathrooms', 'Badrooms', 'Vacancies', 'Description'])
 
for i in range(ofertas):
    df = df.append({'Address': result[i].address, 'Price': result[i].price, 'Total Area': result[i].total_area_m2, 'Bathrooms': result[i].bathrooms, 'Badrooms':result[i].bedrooms, 'Vacancies':result[i].vacancies, 'Description':result[i].description}, ignore_index=True)
 
df.to_csv (r'C:\Users\bruno\Desktop\FII\Zap.csv', index = False, header=True, sep=';')
print(df)

supuflounder · Mar-17-2021, 08:56 AM

Here is a script I wrote a couple days ago. Its purpose is to go out to a Web site, find the product page, and add the product description to my spreadsheet. The spreadsheet starts out with an ID number in column A and an (empty) description in column E. If the description is blank or empty, it queries the Web site. I've elided some details.

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

67

68

69

70

71

72

73

74

75

76

77

78

79

80

81

82

83

84

85

86

87

88

89

90

91

92

93

94

95

96

97

98

99

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

import requests
import re
import sys
import openpyxl
 
ErrorLog = []
debug = False
# *******************************************************************
# logError
# Inputs:
#    s: a string to add to the error log
# *******************************************************************
def logError(s):
    ErrorLog.append(s)
# *******************************************************************
# msgProlog
# Inputs:
#    C. which is a cell value
# Result: string
#    The standard prolog to a message, which includes a trailing space
#    'A1: "123" '
# *******************************************************************
def msgProlog(C):
    return (C.coordinate + ": \"" + str(C.value) + '" ')
# *******************************************************************
# deHTML
# Inputs:
#    s: A string
# Result: string
#    The input string with all the &...; codes replaced
# *******************************************************************
def deHTML(s):
    t = s
    t = t.replace('&quot;', '"')
    t = t.replace('&lt', '<')
    t = t.replace('&gt', ">")
    t = t.replace('&amp;', '&')
    return t
# *******************************************************************
# Main program
# *******************************************************************
# Open the workspace
workspace_name = "inventory.xlsx"
try:
   wb = openpyxl.load_workbook(filename = workspace_name)
except:
    print("Error opening file ", sys.exc_info()[0])
    raise
sheet = wb.active
 
# Set the default columns in case we can't figure out what is going on
colPartNumber = 1
captionPartNumber = "#"
colManufacturer = 2
captionManufacturer = "Mfg"
colDescription = 5
captionDescription = "Description"
 
# First, identify the columns that have the headings
# "#" colPartNumber
# "Mfg" colManufacturer
# "Description" colDescription
 
for col in range(1,20):
    cell = sheet.cell(1, col)
    caption = sheet.cell(1, col).value
    if caption == None:
       print(cell.coordinate, end=": ")
       print("<None>")
       break
 
    print(cell.coordinate + ":\"" + caption + "\"")
    if caption == captionPartNumber:
        colPartNumber = col
        print(captionPartNumber + ' = ' + str(col))
    elif caption == captionManufacturer:
        colManufacturer = col
        print(captionManufacturer + ' = ' + str(col))
    elif caption == captionDescription:
        colDescription = col
        print(captionDescription + ' = ' + str(col))
 
print("")
 
# Now we can iterate through the cells
row = 1
try: 
    while(True):
        row += 1
        # Get the part #
        partNo = sheet.cell(row = row, column = colPartNumber)
        part = partNo.value
        if(part == None):
           print(partNo.coordinate, end=": ")
           print("<None>")
           break;
 
    # do we already have a description?
        desc = sheet.cell(row = row, column = colDescription)
        if desc.value != None:
            # We have a description.  If it has not been deHTMLed, do so
            nicedesc = deHTML(desc.value)
            if(desc.value != nicedesc):
               msg = msgProlog(partNo) + '"' + desc.value + '"=>"' + nicedesc + '"****************'
               print(msg)
               logError(msg)
               desc.value = nicedesc
               continue
            msg = msgProlog(partNo) + ' already has description "' + desc.value + '"'
            print(msg)
            continue
 
    # We do not have a description.  Go find it
        try:
            r = requests.get('https://www.elided.com/products/' + str(partNo.value))
 
            # If there was an error, throw an exception
            r.raise_for_status()
        except:
            msg = msgProlog(partNo) + ' "' + str(r) + '"'
            print(msg)
            logError(msg)
            print(r)
            continue
        # The best description is found in an entry that says
        # twitter:title<useless stuff>content="(.*)"
        x = re.search('twitter:title.*content="(.*)"', r.text)
        if x != None:
            msgProlog(partNo) + '"' + x.group(1) + '"'
            print(msg)
            caption = x.group(1)
            caption = deHTML(caption)
            desc.value = caption
        else:
            msg = msgProlog(partNo) + "<No attribute found>"
            logError(msg)
            print(msg)
except:
    print("Exception occurred", sys.exc_info()[0])
 
# Now save the workbook
wb.save(filename = workspace_name)
print("File saved")
print("=========================== Errors ===========================")
for msg in ErrorLog:
   print(msg)
  

***snippsat*** · Mar-17-2021, 09:25 AM

(Mar-17-2021, 01:50 AM)brunolelli Wrote: What do you think about it? Is it appropriate?

If it give you what wanted then is okay👍
I can not run it as it not all code,line length get way to long.

        
          
          
              
              ofertas = len(result)
print(ofertas)
 
df = pd.DataFrame(
    [],
    columns=[
        "Address",
        "Price",
        "Total Area",
        "Bathrooms",
        "Badrooms",
        "Vacancies",
        "Description",
    ],
)
 
for i in range(ofertas):
    df = df.append(
        {
            "Address": result[i].address,
            "Price": result[i].price,
            "Total Area": result[i].total_area_m2,
            "Bathrooms": result[i].bathrooms,
            "Badrooms": result[i].bedrooms,
            "Vacancies": result[i].vacancies,
            "Description": result[i].description,
        },
        ignore_index=True,
    )
 
df.to_csv(r"C:\Users\bruno\Desktop\FII\Zap.csv", index=False, header=True, sep=";",)
print(df)

            

        
      

brunolelli · Mar-17-2021, 12:57 PM

Thank you so much guys!

You guys are the best!

Thank you

How can I use the following script?

User Panel Messages

Announcements