Python Forum
Need help with saving output into an excel file
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Need help with saving output into an excel file
#1
I have a python scraper for ranker.com. It scrapes the name and image url from the site.

import requests
import pandas as pd
from pandas import DataFrame
#Check "ListID" for api id
r = requests.get('https://api.ranker.com/lists/538997/items?limit=100')
data = r.json()['listItems']
for i in data: print(i['name'], i['image'])
I wanted this output to be saved into a ranks excel file. I dont have much experience with python. Hence, can someone please help me with the excel part of the code. Thank you
Reply
#2
Here is a script I wrote a couple days ago. I have elided some parts. It scrapes a site to retrieve the part description. I am using openpyxl
import requests
import re
import sys
import openpyxl
 
ErrorLog = []
debug = False
# *******************************************************************
# logError
# Inputs:
#    s: a string to add to the error log
# *******************************************************************
def logError(s):
    ErrorLog.append(s)
# *******************************************************************
# msgProlog
# Inputs:
#    C. which is a cell value
# Result: string
#    The standard prolog to a message, which includes a trailing space
#    'A1: "123" '
# *******************************************************************
def msgProlog(C):
    return (C.coordinate + ": \"" + str(C.value) + '" ')
# *******************************************************************
# deHTML
# Inputs:
#    s: A string
# Result: string
#    The input string with all the &...; codes replaced
# *******************************************************************
def deHTML(s):
    t = s
    t = t.replace('"', '"')
    t = t.replace('&lt', '<')
    t = t.replace('&gt', ">")
    t = t.replace('&amp;', '&')
    return t
# *******************************************************************
# Main program
# *******************************************************************
# Open the workspace
workspace_name = "inventory.xlsx"
try:
   wb = openpyxl.load_workbook(filename = workspace_name)
except:
    print("Error opening file ", sys.exc_info()[0])
    raise
sheet = wb.active
 
# Set the default columns in case we can't figure out what is going on
colPartNumber = 1
captionPartNumber = "#"
colManufacturer = 2
captionManufacturer = "Mfg"
colDescription = 5
captionDescription = "Description"
 
# First, identify the columns that have the headings
# "#" colPartNumber
# "Mfg" colManufacturer
# "Description" colDescription
 
for col in range(1,20):
    cell = sheet.cell(1, col)
    caption = sheet.cell(1, col).value
    if caption == None:
       print(cell.coordinate, end=": ")
       print("<None>")
       break
 
    print(cell.coordinate + ":\"" + caption + "\"")
    if caption == captionPartNumber:
        colPartNumber = col
        print(captionPartNumber + ' = ' + str(col))
    elif caption == captionManufacturer:
        colManufacturer = col
        print(captionManufacturer + ' = ' + str(col))
    elif caption == captionDescription:
        colDescription = col
        print(captionDescription + ' = ' + str(col))
 
print("")
 
# Now we can iterate through the cells
row = 1
try: 
    while(True):
        row += 1
        # Get the part #
        partNo = sheet.cell(row = row, column = colPartNumber)
        part = partNo.value
        if(part == None):
           print(partNo.coordinate, end=": ")
           print("<None>")
           break;
 
    # do we already have a description?
        desc = sheet.cell(row = row, column = colDescription)
        if desc.value != None:
            # We have a description.  If it has not been deHTMLed, do so
            nicedesc = deHTML(desc.value)
            if(desc.value != nicedesc):
               msg = msgProlog(partNo) + '"' + desc.value + '"=>"' + nicedesc + '"****************'
               print(msg)
               logError(msg)
               desc.value = nicedesc
               continue
            msg = msgProlog(partNo) + ' already has description "' + desc.value + '"'
            print(msg)
            continue
 
    # We do not have a description.  Go find it
        try:
            r = requests.get('https://www.elided.com/products/' + str(partNo.value))
 
            # If there was an error, throw an exception
            r.raise_for_status()
        except:
            msg = msgProlog(partNo) + ' "' + str(r) + '"'
            print(msg)
            logError(msg)
            print(r)
            continue
        # The best description is found in an entry that says
        # twitter:title<useless stuff>content="(.*)"
        x = re.search('twitter:title.*content="(.*)"', r.text)
        if x != None:
            msgProlog(partNo) + '"' + x.group(1) + '"'
            print(msg)
            caption = x.group(1)
            caption = deHTML(caption)
            desc.value = caption
        else:
            msg = msgProlog(partNo) + "<No attribute found>"
            logError(msg)
            print(msg)
except:
    print("Exception occurred", sys.exc_info()[0])
 
# Now save the workbook
wb.save(filename = workspace_name)
print("File saved")
print("=========================== Errors ===========================")
for msg in ErrorLog:
   print(msg)
Reply
#3
I suggest using Pandas as an intermediary.
Convert your json to a Pandas dataframe, then convert the pandas dataframe to an xls file.
Reply
#4
(Mar-21-2021, 12:19 PM)jefsummers Wrote: I suggest using Pandas as an intermediary.
Convert your json to a Pandas dataframe, then convert the pandas dataframe to an xls file.

If I had JSON, this might apply, but I don't.

I used openpyxl because it was the first one I found. I've heard of Pandas, what is the advantage over what I have? I'm curious, since I don't actually know anything about Pandas.
Reply
#5
See OP llne 6. The request is converted to a json.

Pandas is widely used for tabular information, used with AI, graphing (matplotlib), etc.

Here is an introduction: Brandon Rhodes - Pandas from PyCon 2015
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Python openyxl not updating Excel file MrBean12 1 252 Mar-03-2024, 12:16 AM
Last Post: MrBean12
  Copy Paste excel files based on the first letters of the file name Viento 2 349 Feb-07-2024, 12:24 PM
Last Post: Viento
  Search Excel File with a list of values huzzug 4 1,148 Nov-03-2023, 05:35 PM
Last Post: huzzug
  Updating sharepoint excel file odd results cubangt 1 756 Nov-03-2023, 05:13 PM
Last Post: noisefloor
  Reading data from excel file –> process it >>then write to another excel output file Jennifer_Jone 0 1,048 Mar-14-2023, 07:59 PM
Last Post: Jennifer_Jone
  Save and Close Excel File avd88 0 2,844 Feb-20-2023, 07:19 PM
Last Post: avd88
  Trying to access excel file on our sharepoint server but getting errors cubangt 0 773 Feb-16-2023, 08:11 PM
Last Post: cubangt
  Import XML file directly into Excel spreadsheet demdej 0 804 Jan-24-2023, 02:48 PM
Last Post: demdej
  how to read txt file, and write into excel with multiply sheet jacklee26 14 9,525 Jan-21-2023, 06:57 AM
Last Post: jacklee26
Thumbs Up Need to compare the Excel file name with a directory text file. veeran1991 1 1,071 Dec-15-2022, 04:32 PM
Last Post: Larz60+

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020