The pages have the same URL base, with the letter added to the end.
https://www.usa.gov/federal-agencies/a
https://www.usa.gov/federal-agencies/b
etc.
pseudo code:
for char in valid_pages
within each page, the following can be used as an anchor:
<ul class="az-list group">
After that, all links (regular <a tags) up until the </ul>
are what you need.
so seems pretty simple.
https://www.usa.gov/federal-agencies/a
https://www.usa.gov/federal-agencies/b
etc.
>>> baseurl = 'https://www.usa.gov/federal-agencies/' >>> valid_pages = 'abcdefghijlmnoprstuvw' >>> for n in range(len(valid_pages)): ... url = f'{baseurl}{valid_pages[n]}' ... print(url) ... https://www.usa.gov/federal-agencies/a https://www.usa.gov/federal-agencies/b https://www.usa.gov/federal-agencies/c https://www.usa.gov/federal-agencies/d https://www.usa.gov/federal-agencies/e https://www.usa.gov/federal-agencies/f https://www.usa.gov/federal-agencies/g https://www.usa.gov/federal-agencies/h https://www.usa.gov/federal-agencies/i https://www.usa.gov/federal-agencies/j https://www.usa.gov/federal-agencies/l https://www.usa.gov/federal-agencies/m https://www.usa.gov/federal-agencies/n https://www.usa.gov/federal-agencies/o https://www.usa.gov/federal-agencies/p https://www.usa.gov/federal-agencies/r https://www.usa.gov/federal-agencies/s https://www.usa.gov/federal-agencies/t https://www.usa.gov/federal-agencies/u https://www.usa.gov/federal-agencies/v https://www.usa.gov/federal-agencies/w >>>so can iterate over this:
pseudo code:
for char in valid_pages
within each page, the following can be used as an anchor:
<ul class="az-list group">
After that, all links (regular <a tags) up until the </ul>
are what you need.
so seems pretty simple.