Jul-18-2021, 06:59 AM
Hi Larz60+,
I'm sorry for the very late reply.
Thank you for providing me with this cool sample code to extract the HTML locally for each page. I've tried to go through the code and I sort of understand it (I wouldn't be able to write it myself yet), and I've applied it to my test project and it works very well indeed.
I've copied the code into my notes to use if this scenario comes up again. It's nice having the local HTML to then play around with to extract other attributes, especially when practicing.
Thanks again mate and have a great week ahead.
(Jul-09-2021, 11:04 PM)Larz60+ Wrote: here's a better way, doesn't require excel or pandas, and can be reused for any sites of the type you mention.
I have included a sample which scrapes the bird pages for species listed (compare to your 'endpart')
you can try with your url's
the pages will be downloaded and placed in directory 'renderhtml', ready to be parsed with Beautifulsoup.
note that this class uses pathlib (a Python built-in) to create POSIX paths.
It uses the lxml parser and BeautifulSoup, which you may have to install:
(from command line):
pip install lxml
pip install Beautifulsoup4

Name this module: RenderUrl.py
import os
from pathlib import Path

import requests


class RenderUrl:
    def __init__(self, baseurl=None):
        self.base_url = baseurl

        # Create new savepath if needed
        os.chdir(os.path.abspath(os.path.dirname(__file__)))
        self.savepath = Path('.') / 'renderhtml'
        self.savepath.mkdir(exist_ok=True)

        # temp storage for url suffixes
        self.suffixlist = []

    def url_emmitter(self):
        n = len(self.suffixlist)
        i = 0
        while i < n:
            suffix = self.suffixlist[i]
            url = f"{self.base_url}{suffix}"
            yield url
            i += 1

    def get_pages(self, suffixlist, cache=False):
        self.suffixlist = suffixlist
        for url in self.url_emmitter():
            print(f"fetching: {url}")
            fname = (url.split('/')[-1]).replace('-', '_')
            filename = self.savepath / f"{fname}.html"
            if cache and filename.exists():
                with filename.open('rb') as fp:
                    page = fp.read()
            else:
                response = requests.get(url)
                if response.status_code == 200:
                    page = response.content
                    with filename.open('wb') as fp:
                        fp.write(page)

Here's how to use it (put both files in same directory):
TryRenderUrl.py
from RenderUrl import RenderUrl


def main():
    '''
    AudubonBirds:
    '''
    baseurl = "https://www.massaudubon.org/learn/nature-wildlife/birds/"
    birdlist = ['american-goldfinches', 'american-kestrels', 'american-robins',
                'bald-eagles', 'baltimore-orchard-o', 'baltimore-orchard-orioles',
                'birds-of-prey']
    rurl = RenderUrl(baseurl)
    rurl.get_pages(birdlist, cache=True)


if __name__ == '__main__':
    main()

then run:

python TryRenderUrl.py
Now look in the directory renderhtml
Output:
fetching: https://www.massaudubon.org/learn/nature-wildlife/birds/american-goldfinches
fetching: https://www.massaudubon.org/learn/nature-wildlife/birds/american-kestrels
fetching: https://www.massaudubon.org/learn/nature-wildlife/birds/american-robins
fetching: https://www.massaudubon.org/learn/nature-wildlife/birds/bald-eagles
fetching: https://www.massaudubon.org/learn/nature-wildlife/birds/baltimore-orchard-o
fetching: https://www.massaudubon.org/learn/nature-wildlife/birds/baltimore-orchard-orioles
fetching: https://www.massaudubon.org/learn/nature-wildlife/birds/birds-of-prey
you will find:

Output:
american_goldfinches.html
american_kestrels.html
american_robins.html
bald_eagles.html
baltimore_orchard_orioles.html
birds_of_prey.html

ready to be parsed.
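Once the pages are on disk you can parse any of them with BeautifulSoup. Here's a minimal sketch of what that might look like; the tags pulled out (title, h1) and the tiny inline HTML stand-in are just assumptions for illustration, so it runs on its own without the downloaded files:

from bs4 import BeautifulSoup

# Stand-in for the contents of a saved page; in practice you would read
# one of the files in renderhtml instead, e.g.:
#   from pathlib import Path
#   html = Path('renderhtml/american_goldfinches.html').read_text()
html = """
<html><head><title>American Goldfinch</title></head>
<body>
  <h1>American Goldfinch</h1>
  <p class="intro">A small finch.</p>
</body></html>
"""

soup = BeautifulSoup(html, 'html.parser')  # or 'lxml' if installed
print(soup.title.string)                          # American Goldfinch
print(soup.find('p', class_='intro').get_text())  # A small finch.

Because the HTML is local, you can re-run this as often as you like while experimenting, without hitting the site again.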
Now try with your data.