Python Forum
Using Excel Cell As A Variable In A Loop
#2
Here's a better way. It doesn't require Excel or pandas, and it can be reused for any site of the type you mention.

I have included a sample that scrapes the bird pages for the species listed (compare the list to your 'endpart').
You can try it with your own URLs.

The pages will be downloaded and placed in the directory 'renderhtml', ready to be parsed with BeautifulSoup.

Note that this class uses pathlib (part of the Python standard library) to build the save path.

The module itself needs requests. Parsing the saved pages afterwards uses BeautifulSoup with the lxml parser. You may have to install these
(from command line):
pip install requests
pip install lxml
pip install beautifulsoup4
Name this module:
RenderUrl.py
from pathlib import Path

import requests


class RenderUrl:
    def __init__(self, baseurl=None):
        self.base_url = baseurl

        # Create the save directory next to this module if needed
        # (built as an absolute path, so the working directory is left alone)
        self.savepath = Path(__file__).resolve().parent / 'renderhtml'
        self.savepath.mkdir(exist_ok=True)

        # temp storage for url suffixes
        self.suffixlist = []

    def url_emitter(self):
        # Yield one complete url per suffix
        for suffix in self.suffixlist:
            yield f"{self.base_url}{suffix}"

    def get_pages(self, suffixlist, cache=False):
        self.suffixlist = suffixlist
        for url in self.url_emitter():
            print(f"fetching: {url}")
            # File name is the last url segment, with '-' replaced by '_'
            fname = url.split('/')[-1].replace('-', '_')
            filename = self.savepath / f"{fname}.html"
            if cache and filename.exists():
                # Already saved by an earlier run, no need to download again
                continue
            response = requests.get(url, timeout=30)
            # Only successful responses are saved; anything else is skipped silently
            if response.status_code == 200:
                with filename.open('wb') as fp:
                    fp.write(response.content)
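The file name for each saved page comes from the last segment of the URL, with hyphens turned into underscores. A minimal sketch of that mapping on its own follows. The helper name url_to_filename is hypothetical (it is not part of RenderUrl), and the trailing-slash handling is an added assumption that the class above does not make.

```python
def url_to_filename(url: str) -> str:
    """Return the file name get_pages would use for url (hypothetical helper)."""
    # Take the last path segment. rstrip('/') guards against a trailing
    # slash; that guard is an assumption added here, not in RenderUrl.
    last_segment = url.rstrip('/').split('/')[-1]
    return f"{last_segment.replace('-', '_')}.html"


base = "https://www.massaudubon.org/learn/nature-wildlife/birds/"
print(url_to_filename(base + "american-goldfinches"))  # american_goldfinches.html
print(url_to_filename(base + "birds-of-prey/"))        # birds_of_prey.html
```

If your own URLs end in something other than a plain slug (a query string, for example), the naming rule would need adjusting.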
Here's how to use it (put both files in the same directory):
TryRenderUrl.py
from RenderUrl import RenderUrl

def main():
    '''
        AudubonBirds:
    '''
    baseurl = "https://www.massaudubon.org/learn/nature-wildlife/birds/"
    birdlist = ['american-goldfinches','american-kestrels','american-robins','bald-eagles',
    'baltimore-orchard-o','baltimore-orchard-orioles', 'birds-of-prey']
    rurl = RenderUrl(baseurl)
    rurl.get_pages(birdlist, cache=True)

if __name__ == '__main__':
    main()
Then run:
python TryRenderUrl.py
Output:
fetching: https://www.massaudubon.org/learn/nature-wildlife/birds/american-goldfinches
fetching: https://www.massaudubon.org/learn/nature-wildlife/birds/american-kestrels
fetching: https://www.massaudubon.org/learn/nature-wildlife/birds/american-robins
fetching: https://www.massaudubon.org/learn/nature-wildlife/birds/bald-eagles
fetching: https://www.massaudubon.org/learn/nature-wildlife/birds/baltimore-orchard-o
fetching: https://www.massaudubon.org/learn/nature-wildlife/birds/baltimore-orchard-orioles
fetching: https://www.massaudubon.org/learn/nature-wildlife/birds/birds-of-prey
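Now look in the directory renderhtml.
You will find:
Output:
american_goldfinches.html
american_kestrels.html
american_robins.html
bald_eagles.html
baltimore_orchard_orioles.html
birds_of_prey.html
There is no baltimore_orchard_o.html, presumably because that URL did not return status 200, and only pages with status 200 are saved.
The files are ready to be parsed.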
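As a rough idea of what parsing the saved files could look like, here is a minimal sketch that pulls the title out of every page in a directory. The function name page_titles is hypothetical, and picking the title tag is only an example, since what you actually extract depends on your pages. It assumes beautifulsoup4 and lxml are installed; pass parser="html.parser" to use the parser built into Python instead.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def page_titles(directory, parser="lxml"):
    """Map each saved .html file name to its <title> text (None if absent)."""
    titles = {}
    for path in sorted(Path(directory).glob("*.html")):
        # The pages were written as raw bytes, so read them back as bytes
        # and let BeautifulSoup work out the encoding
        soup = BeautifulSoup(path.read_bytes(), parser)
        titles[path.name] = soup.title.get_text(strip=True) if soup.title else None
    return titles
```

Calling page_titles("renderhtml") from the directory that holds renderhtml returns a dictionary keyed by file name, which you can loop over and print.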

Now try with your data.
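If your suffixes currently sit in a spreadsheet column, one option that still avoids Excel libraries and pandas is to export the sheet to CSV and read that column with the standard csv module. This is only a sketch under assumptions: the helper load_suffixes, the file name endparts.csv, and the column header 'endpart' are all hypothetical, so change them to match your export.

```python
import csv


def load_suffixes(csv_path, column="endpart"):
    """Read url suffixes from one column of a CSV file, skipping blank cells."""
    # utf-8-sig also copes with the byte order mark that some spreadsheet
    # exports put at the start of the file
    with open(csv_path, newline="", encoding="utf-8-sig") as fp:
        reader = csv.DictReader(fp)
        return [
            row[column].strip()
            for row in reader
            if (row.get(column) or "").strip()
        ]


# Hypothetical usage with the class from this post:
# suffixes = load_suffixes("endparts.csv")
# RenderUrl(baseurl).get_pages(suffixes, cache=True)
```

The returned list can be passed to get_pages in place of birdlist.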
RE: Using Excel Cell As A Variable In A Loop - by Larz60+ - Jul-09-2021, 11:04 PM
