Need help with saving output into an excel file - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: General Coding Help (https://python-forum.io/forum-8.html) +--- Thread: Need help with saving output into an excel file (/thread-32984.html) |
Need help with saving output into an excel file - Beyondfacts - Mar-21-2021 I have a python scraper for ranker.com. It scrapes the name and image url from the site. import requests import pandas as pd from pandas import DataFrame #Check "ListID" for api id r = requests.get('https://api.ranker.com/lists/538997/items?limit=100') data = r.json()['listItems'] for i in data: print(i['name'], i['image'])I wanted this output to be saved into a ranks excel file. I dont have much experience with python. Hence, can someone please help me with the excel part of the code. Thank you RE: Need help with saving output into an excel file - supuflounder - Mar-21-2021 Here is a script I wrote a couple days ago. I have elided some parts. It scrapes a site to retrieve the part description. I am using openpyxl import requests import re import sys import openpyxl ErrorLog = [] debug = False # ******************************************************************* # logError # Inputs: # s: a string to add to the error log # ******************************************************************* def logError(s): ErrorLog.append(s) # ******************************************************************* # msgProlog # Inputs: # C. which is a cell value # Result: string # The standard prolog to a message, which includes a trailing space # 'A1: "123" ' # ******************************************************************* def msgProlog(C): return (C.coordinate + ": \"" + str(C.value) + '" ') # ******************************************************************* # deHTML # Inputs: # s: A string # Result: string # The input string with all the &...; codes replaced # ******************************************************************* def deHTML(s): t = s t = t.replace('"', '"') t = t.replace('<', '<') t = t.replace('>', ">") t = t.replace('&', '&') return t # ******************************************************************* # Main program # ******************************************************************* # Open the workspace workspace_name = "inventory.xlsx" try: wb = openpyxl.load_workbook(filename = workspace_name) except: print("Error opening file ", sys.exc_info()[0]) raise sheet = wb.active # Set the default columns in case we can't figure out what is going on colPartNumber = 1 captionPartNumber = "#" colManufacturer = 2 captionManufacturer = "Mfg" colDescription = 5 captionDescription = "Description" # First, identify the columns that have the headings # "#" colPartNumber # "Mfg" colManufacturer # "Description" colDescription for col in range(1,20): cell = sheet.cell(1, col) caption = sheet.cell(1, col).value if caption == None: print(cell.coordinate, end=": ") print("<None>") break print(cell.coordinate + ":\"" + caption + "\"") if caption == captionPartNumber: colPartNumber = col print(captionPartNumber + ' = ' + str(col)) elif caption == captionManufacturer: colManufacturer = col print(captionManufacturer + ' = ' + str(col)) elif caption == captionDescription: colDescription = col print(captionDescription + ' = ' + str(col)) print("") # Now we can iterate through the cells row = 1 try: while(True): row += 1 # Get the part # partNo = sheet.cell(row = row, column = colPartNumber) part = partNo.value if(part == None): print(partNo.coordinate, end=": ") print("<None>") break; # do we already have a description? desc = sheet.cell(row = row, column = colDescription) if desc.value != None: # We have a description. If it has not been deHTMLed, do so nicedesc = deHTML(desc.value) if(desc.value != nicedesc): msg = msgProlog(partNo) + '"' + desc.value + '"=>"' + nicedesc + '"****************' print(msg) logError(msg) desc.value = nicedesc continue msg = msgProlog(partNo) + ' already has description "' + desc.value + '"' print(msg) continue # We do not have a description. Go find it try: r = requests.get('https://www.elided.com/products/' + str(partNo.value)) # If there was an error, throw an exception r.raise_for_status() except: msg = msgProlog(partNo) + ' "' + str(r) + '"' print(msg) logError(msg) print(r) continue # The best description is found in an entry that says # twitter:title<useless stuff>content="(.*)" x = re.search('twitter:title.*content="(.*)"', r.text) if x != None: msgProlog(partNo) + '"' + x.group(1) + '"' print(msg) caption = x.group(1) caption = deHTML(caption) desc.value = caption else: msg = msgProlog(partNo) + "<No attribute found>" logError(msg) print(msg) except: print("Exception occurred", sys.exc_info()[0]) # Now save the workbook wb.save(filename = workspace_name) print("File saved") print("=========================== Errors ===========================") for msg in ErrorLog: print(msg) RE: Need help with saving output into an excel file - jefsummers - Mar-21-2021 I suggest using Pandas as an intermediary. Convert your json to a Pandas dataframe, then convert the pandas dataframe to an xls file. RE: Need help with saving output into an excel file - supuflounder - Mar-22-2021 (Mar-21-2021, 12:19 PM)jefsummers Wrote: I suggest using Pandas as an intermediary. If I had JSON, this might apply, but I don't. I used openpyxl because it was the first one I found. I've heard of Pandas, what is the advantage over what I have? I'm curious, since I don't actually know anything about Pandas. RE: Need help with saving output into an excel file - jefsummers - Mar-22-2021 See OP llne 6. The request is converted to a json. Pandas is widely used for tabular information, used with AI, graphing (matplotlib), etc. Here is an introduction: Brandon Rhodes - Pandas from PyCon 2015 |