![]() |
How can I use the following script? - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: General Coding Help (https://python-forum.io/forum-8.html) +--- Thread: How can I use the following script? (/thread-32921.html) |
How can I use the following script? - brunolelli - Mar-16-2021 Hello guys, This is my first time using Python and I'm very excited about it. I'm trying Scrape some data from a website (www.zapimoveis.com.br), and I found a developer who created a project on Pypi jus to do it. His project can be accessed via (https://pypi.org/project/zapimoveis-scraper/), and installed via (pip install zapimoveis-scraper). My question is, how can I extract the desired data and export it to an Excel Spreadsheet? There is no tutorial or infos available on his website. Basically I would like to know how can I use his project/ codes? At the Usage Example of his page he just says: zapimoveis_scraper.search(localization=”go+goiania++setor-oeste”, num_pages=5) The objects returned from search contain the following attributes: description: property description price: property price (monthly) bedrooms: number of bedrooms on property bathrooms: number of bathrooms on property total_area_m2: property area (square meters) vacancies: parking spots available on property address: property address Maybe it's a very simple and stupid question for you guys... but as I mentioned, I'm just beggining and I'm learning a lot! Thank you RE: How can I use the following script? - snippsat - Mar-17-2021 You use it like this after install. # zap.py import zapimoveis_scraper as zap result = zap.search(localization="go+goiania++setor-oeste", num_pages=5) print(result[0].description) print(result[0].price) print(result[0].bedrooms)
brunolelli Wrote:My question is, how can I extract the desired data and export it to an Excel Spreadsheet?There are serval ways i find pandas most easy as get a look like Excel and has read_excel() to_excel() methods.A other way is openpyxl RE: How can I use the following script? - brunolelli - Mar-17-2021 Thank you so much! It's working... Would you mind providing me an example using pandas? How can I extract all outputs and send the results to an excel .csv file? Thank you! RE: How can I use the following script? - snippsat - Mar-17-2021 (Mar-17-2021, 12:34 AM)brunolelli Wrote: Would you mind providing me an example using pandas?Can look this post it will be in similar way. So you can loop over result as show in post and save in eg a dictionary for easier import to pandas. As this is your first time using Python,so can there be some struggle with this if lack the basic knowledge. RE: How can I use the following script? - brunolelli - Mar-17-2021 Thank you again! I'm using the following code. What are your thoughts? df = pd.DataFrame([], columns=['Address', 'Price', 'Total Area', 'Bathrooms', 'Badrooms', 'Vacancies', 'Description']) for i in range(5): df = df.append({'Address': result[i].address, 'Price': result[i].price, 'Total Area': result[i].total_area_m2, 'Bathrooms': result[i].bathrooms, 'Badrooms':result[i].bedrooms, 'Vacancies':result[i].vacancies, 'Description':result[i].description}, ignore_index=True) df.to_csv (r'C:\Users\bruno\Desktop\FII\Zap.csv', index = False, header=True, sep=';') print(df)As you can see here, I'm just looping through 5 items. How can I loop through all items? And it's not a fixed number, it varies... Thank you RE: How can I use the following script? - brunolelli - Mar-17-2021 I think I found a solution: What do you think about it? Is it appropriate? ofertas = len(result) print(ofertas) df = pd.DataFrame([], columns=['Address', 'Price', 'Total Area', 'Bathrooms', 'Badrooms', 'Vacancies', 'Description']) for i in range(ofertas): df = df.append({'Address': result[i].address, 'Price': result[i].price, 'Total Area': result[i].total_area_m2, 'Bathrooms': result[i].bathrooms, 'Badrooms':result[i].bedrooms, 'Vacancies':result[i].vacancies, 'Description':result[i].description}, ignore_index=True) df.to_csv (r'C:\Users\bruno\Desktop\FII\Zap.csv', index = False, header=True, sep=';') print(df) RE: How can I use the following script? - supuflounder - Mar-17-2021 Here is a script I wrote a couple days ago. Its purpose is to go out to a Web site, find the product page, and add the product description to my spreadsheet. The spreadsheet starts out with an ID number in column A and an (empty) description in column E. If the description is blank or empty, it queries the Web site. I've elided some details. import requests import re import sys import openpyxl ErrorLog = [] debug = False # ******************************************************************* # logError # Inputs: # s: a string to add to the error log # ******************************************************************* def logError(s): ErrorLog.append(s) # ******************************************************************* # msgProlog # Inputs: # C. which is a cell value # Result: string # The standard prolog to a message, which includes a trailing space # 'A1: "123" ' # ******************************************************************* def msgProlog(C): return (C.coordinate + ": \"" + str(C.value) + '" ') # ******************************************************************* # deHTML # Inputs: # s: A string # Result: string # The input string with all the &...; codes replaced # ******************************************************************* def deHTML(s): t = s t = t.replace('"', '"') t = t.replace('<', '<') t = t.replace('>', ">") t = t.replace('&', '&') return t # ******************************************************************* # Main program # ******************************************************************* # Open the workspace workspace_name = "inventory.xlsx" try: wb = openpyxl.load_workbook(filename = workspace_name) except: print("Error opening file ", sys.exc_info()[0]) raise sheet = wb.active # Set the default columns in case we can't figure out what is going on colPartNumber = 1 captionPartNumber = "#" colManufacturer = 2 captionManufacturer = "Mfg" colDescription = 5 captionDescription = "Description" # First, identify the columns that have the headings # "#" colPartNumber # "Mfg" colManufacturer # "Description" colDescription for col in range(1,20): cell = sheet.cell(1, col) caption = sheet.cell(1, col).value if caption == None: print(cell.coordinate, end=": ") print("<None>") break print(cell.coordinate + ":\"" + caption + "\"") if caption == captionPartNumber: colPartNumber = col print(captionPartNumber + ' = ' + str(col)) elif caption == captionManufacturer: colManufacturer = col print(captionManufacturer + ' = ' + str(col)) elif caption == captionDescription: colDescription = col print(captionDescription + ' = ' + str(col)) print("") # Now we can iterate through the cells row = 1 try: while(True): row += 1 # Get the part # partNo = sheet.cell(row = row, column = colPartNumber) part = partNo.value if(part == None): print(partNo.coordinate, end=": ") print("<None>") break; # do we already have a description? desc = sheet.cell(row = row, column = colDescription) if desc.value != None: # We have a description. If it has not been deHTMLed, do so nicedesc = deHTML(desc.value) if(desc.value != nicedesc): msg = msgProlog(partNo) + '"' + desc.value + '"=>"' + nicedesc + '"****************' print(msg) logError(msg) desc.value = nicedesc continue msg = msgProlog(partNo) + ' already has description "' + desc.value + '"' print(msg) continue # We do not have a description. Go find it try: r = requests.get('https://www.elided.com/products/' + str(partNo.value)) # If there was an error, throw an exception r.raise_for_status() except: msg = msgProlog(partNo) + ' "' + str(r) + '"' print(msg) logError(msg) print(r) continue # The best description is found in an entry that says # twitter:title<useless stuff>content="(.*)" x = re.search('twitter:title.*content="(.*)"', r.text) if x != None: msgProlog(partNo) + '"' + x.group(1) + '"' print(msg) caption = x.group(1) caption = deHTML(caption) desc.value = caption else: msg = msgProlog(partNo) + "<No attribute found>" logError(msg) print(msg) except: print("Exception occurred", sys.exc_info()[0]) # Now save the workbook wb.save(filename = workspace_name) print("File saved") print("=========================== Errors ===========================") for msg in ErrorLog: print(msg) RE: How can I use the following script? - snippsat - Mar-17-2021 (Mar-17-2021, 01:50 AM)brunolelli Wrote: What do you think about it? Is it appropriate?If it give you what wanted then is okay👍 I can not run it as it not all code,line length get way to long. ofertas = len(result) print(ofertas) df = pd.DataFrame( [], columns=[ "Address", "Price", "Total Area", "Bathrooms", "Badrooms", "Vacancies", "Description", ], ) for i in range(ofertas): df = df.append( { "Address": result[i].address, "Price": result[i].price, "Total Area": result[i].total_area_m2, "Bathrooms": result[i].bathrooms, "Badrooms": result[i].bedrooms, "Vacancies": result[i].vacancies, "Description": result[i].description, }, ignore_index=True, ) df.to_csv(r"C:\Users\bruno\Desktop\FII\Zap.csv", index=False, header=True, sep=";",) print(df) RE: How can I use the following script? - brunolelli - Mar-17-2021 Thank you so much guys! You guys are the best! Thank you |