Ok,
Couldn't resist writing this one:
This code can be run by itself or imported into another module.
Once it has been run, all a class needs to do to use the index is load the JSON file into a dictionary (see testit() in BuildFederalAgencyIndex.py).
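Here is a minimal sketch of what such a consumer class could look like. It assumes the JSON layout that BuildFederalAgencyIndex writes ({letter: {agency name: url}}). AgencyLookup, url_for and search are hypothetical names and are not part of the posted modules.

```python
import json
from pathlib import Path


class AgencyLookup:
    """Hypothetical consumer of FedIndex.json ({letter: {agency name: url}})."""

    def __init__(self, index_file):
        # Load the whole index into a plain dictionary
        with Path(index_file).open() as fp:
            self.fed_index = json.load(fp)

    def url_for(self, name):
        """Return the URL for an exact agency name, or None if it is not indexed."""
        # Check every letter page rather than assuming the name's first letter
        for entries in self.fed_index.values():
            if name in entries:
                return entries[name]
        return None

    def search(self, text):
        """Return {name: url} for every agency whose name contains text (case-insensitive)."""
        text = text.lower()
        return {
            name: url
            for entries in self.fed_index.values()
            for name, url in entries.items()
            if text in name.lower()
        }
```

Run from the src directory, this would be something like AgencyLookup('../data/json/FedIndex.json').url_for('Coast Guard').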
Create a project directory and a src directory:
mkdir FederalAgencies
cd FederalAgencies
mkdir src
Add an empty __init__.py module to the FederalAgencies directory. When you're done, the layout should be:
FederalAgencies/
    __init__.py
    src/
        __init__.py
        BuildFederalAgencyIndex.py
        FederalPaths.py
In the src directory, create an empty __init__.py file.
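If you'd rather script the setup steps above, here is a small pathlib sketch that creates the same two directories and empty __init__.py files. create_layout is a hypothetical helper, not part of the posted code. The two modules still have to be saved into src by hand.

```python
from pathlib import Path


def create_layout(root='FederalAgencies'):
    """Create root/ and root/src/, each with an empty __init__.py (hypothetical helper)."""
    root = Path(root)
    src = root / 'src'
    # parents=True also creates root; exist_ok=True makes reruns harmless
    src.mkdir(parents=True, exist_ok=True)
    for package_dir in (root, src):
        # touch() creates an empty file, or leaves an existing file's contents alone
        (package_dir / '__init__.py').touch()
    return src
```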
Save this in the src directory as FederalPaths.py:
from pathlib import Path
import os


class FederalPaths:
    def __init__(self):
        # Make sure the starting path is the src directory, so the
        # relative paths below resolve the same way wherever this is run from
        self.set_starting_dir()
        self.homepath = Path('.')
        self.rootpath = self.homepath / '..'
        self.datapath = self.rootpath / 'data'
        self.datapath.mkdir(exist_ok=True)
        self.outpath = self.datapath / 'json'
        self.outpath.mkdir(exist_ok=True)
        self.gov_urlbase = 'https://www.usa.gov/'
        self.baseurl = 'https://www.usa.gov/federal-agencies/'
        # Letters with an index page to scrape (k, q, x, y and z are skipped)
        self.valid_pages = 'abcdefghijlmnoprstuvw'
        self.fed_index_file = self.outpath / 'FedIndex.json'

    def set_starting_dir(self):
        # Change the working directory to the directory holding this file (src/)
        os.chdir(Path(__file__).resolve().parent)


def testit():
    FederalPaths()


if __name__ == '__main__':
    testit()
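One design note: FederalPaths calls os.chdir, which changes the working directory for the whole process, including any program that imports it. The sketch below is an alternative that anchors every path on the src directory instead. It assumes the same FederalAgencies/data/json layout, and project_paths is a hypothetical name.

```python
from pathlib import Path
from types import SimpleNamespace


def project_paths(src_dir):
    """Build the same paths as FederalPaths, anchored on src_dir instead of os.chdir."""
    src_dir = Path(src_dir).resolve()
    root = src_dir.parent
    outpath = root / 'data' / 'json'
    # parents=True creates data/ and data/json/ in one call
    outpath.mkdir(parents=True, exist_ok=True)
    return SimpleNamespace(
        rootpath=root,
        datapath=root / 'data',
        outpath=outpath,
        fed_index_file=outpath / 'FedIndex.json',
    )
```

Inside src/FederalPaths.py this would be called as project_paths(Path(__file__).parent). The resulting paths are absolute, so they keep working no matter what the caller's current directory is.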
Save this one in the src directory as BuildFederalAgencyIndex.py:
import json

import requests
from bs4 import BeautifulSoup

try:
    # Imported as part of the FederalAgencies package
    from . import FederalPaths
except ImportError:
    # Run directly from the src directory
    import FederalPaths


class BuildFederalAgencyIndex:
    def __init__(self):
        self.fpath = FederalPaths.FederalPaths()
        self.fed_index = {}
        self.build_index()

    def build_index(self):
        # One page per letter, e.g. https://www.usa.gov/federal-agencies/c
        for alpha in self.fpath.valid_pages:
            url = f'{self.fpath.baseurl}{alpha}'
            self.fed_index[alpha] = {}
            try:
                response = requests.get(url, timeout=30)
                response.raise_for_status()
                # 'lxml' requires the lxml package; 'html.parser' also works
                soup = BeautifulSoup(response.content, 'lxml')
                # The agency links are in <ul class="one_column_bullet">
                ulist = soup.find('ul', {'class': 'one_column_bullet'})
                for link in ulist.find_all('a'):
                    suffix = link.get('href')
                    href = f'{self.fpath.gov_urlbase}{suffix}'
                    self.fed_index[alpha][link.text] = href
            except (requests.RequestException, AttributeError) as err:
                # AttributeError: the page had no matching <ul> (ulist is None)
                print(f'error on page {alpha}: {err}')
        with self.fpath.fed_index_file.open('w') as jout:
            json.dump(self.fed_index, jout)


def testit():
    # Create the json file
    fa = BuildFederalAgencyIndex()
    # Test the json file
    with fa.fpath.fed_index_file.open() as fp:
        fed_index = json.load(fp)
    # Show all entries for 'c'
    for name, url in fed_index['c'].items():
        print(f'name: {name}, url: {url}')
    # Individual entry:
    print(f"\nIndividual entry url for Court of Appeals for Veterans Claims: "
          f"{fed_index['c']['Court of Appeals for Veterans Claims']}")


if __name__ == '__main__':
    testit()
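You can check the parsing step without hitting usa.gov by feeding it saved HTML. The sketch below swaps BeautifulSoup for the standard library's html.parser.HTMLParser, which is a different tool but needs nothing installed. BulletLinkParser and parse_agency_links are hypothetical names. Like build_index, the sketch assumes the links are in the first <ul class="one_column_bullet"> on the page, and it keeps the raw href values.

```python
from html.parser import HTMLParser


class BulletLinkParser(HTMLParser):
    """Collect {link text: href} from the first <ul class="one_column_bullet">."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # <ul> nesting depth inside the target list (0 = outside)
        self.done = False   # True once the first matching list has closed
        self.href = None    # href of the <a> currently being read
        self.text = []      # text fragments of that <a>
        self.links = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'ul' and not self.done:
            classes = (attrs.get('class') or '').split()
            if self.depth or 'one_column_bullet' in classes:
                self.depth += 1
        elif tag == 'a' and self.depth:
            self.href = attrs.get('href')
            self.text = []

    def handle_endtag(self, tag):
        if tag == 'ul' and self.depth:
            self.depth -= 1
            self.done = self.depth == 0
        elif tag == 'a' and self.href is not None:
            self.links[''.join(self.text).strip()] = self.href
            self.href = None

    def handle_data(self, data):
        if self.href is not None:
            self.text.append(data)


def parse_agency_links(html):
    """Return {agency name: raw href} for one saved index page."""
    parser = BulletLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

In build_index, the equivalent call would be parse_agency_links(response.text), after which each href is joined to gov_urlbase as before.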
Test run:
cd FederalAgencies/src
python BuildFederalAgencyIndex.py
This creates FedIndex.json in the FederalAgencies/data/json directory (the directories are created on the first run) and prints all of the 'c' entries, followed by a single lookup.
Output:
name: California, url: https://www.usa.gov//state-government/california
name: Capitol Police, url: https://www.usa.gov//federal-agencies/u-s-capitol-police
name: Capitol Visitor Center, url: https://www.usa.gov//federal-agencies/u-s-capitol-visitor-center
name: Career, Technical, and Adult Education, Office of, url: https://www.usa.gov//federal-agencies/office-of-career-technical-and-adult-education
name: Census Bureau, url: https://www.usa.gov//federal-agencies/u-s-census-bureau
name: Center for Food Safety and Applied Nutrition, url: https://www.usa.gov//federal-agencies/center-for-food-safety-and-applied-nutrition
name: Center for Nutrition Policy and Promotion (CNPP), url: https://www.usa.gov//federal-agencies/center-for-nutrition-policy-and-promotion
name: Centers for Disease Control and Prevention (CDC), url: https://www.usa.gov//federal-agencies/centers-for-disease-control-and-prevention
name: Centers for Medicare and Medicaid Services (CMS), url: https://www.usa.gov//federal-agencies/centers-for-medicare-and-medicaid-services
name: Central Command (CENTCOM), url: https://www.usa.gov//federal-agencies/u-s-central-command
name: Central Intelligence Agency (CIA), url: https://www.usa.gov//federal-agencies/central-intelligence-agency
name: Chemical Safety Board, url: https://www.usa.gov//federal-agencies/u-s-chemical-safety-board
name: Chief Acquisition Officers Council, url: https://www.usa.gov//federal-agencies/chief-acquisition-officers-council
name: Chief Financial Officers Council, url: https://www.usa.gov//federal-agencies/chief-financial-officers-council
name: Chief Human Capital Officers Council, url: https://www.usa.gov//federal-agencies/chief-human-capital-officers-council
name: Chief Information Officers Council, url: https://www.usa.gov//federal-agencies/chief-information-officers-council
name: Child Support Enforcement, Office of (OCSE), url: https://www.usa.gov//federal-agencies/office-of-child-support-enforcement
name: Circuit Courts of Appeal, url: https://www.usa.gov//federal-agencies/u-s-courts-of-appeal
name: Citizens' Stamp Advisory Committee, url: https://www.usa.gov//federal-agencies/citizens-stamp-advisory-committee
name: Citizenship and Immigration Services (USCIS), url: https://www.usa.gov//federal-agencies/u-s-citizenship-and-immigration-services
name: Civil Rights, Department of Education Office of, url: https://www.usa.gov//federal-agencies/office-for-civil-rights-department-of-education
name: Civil Rights, Department of Health and Human Services Office for, url: https://www.usa.gov//federal-agencies/office-for-civil-rights-department-of-health-and-human-services
name: Coast Guard, url: https://www.usa.gov//federal-agencies/u-s-coast-guard
name: Colorado, url: https://www.usa.gov//state-government/colorado
name: Commerce Department (DOC), url: https://www.usa.gov//federal-agencies/u-s-department-of-commerce
name: Commission of Fine Arts, url: https://www.usa.gov//federal-agencies/u-s-commission-of-fine-arts
name: Commission on Civil Rights, url: https://www.usa.gov//federal-agencies/commission-on-civil-rights
name: Commission on International Religious Freedom, url: https://www.usa.gov//federal-agencies/u-s-commission-on-international-religious-freedom
name: Commission on Presidential Scholars, url: https://www.usa.gov//federal-agencies/commission-on-presidential-scholars
name: Commission on Security and Cooperation in Europe (Helsinki Commission), url: https://www.usa.gov//federal-agencies/commission-on-security-and-cooperation-in-europe-helsinki-commission
name: Committee for the Implementation of Textile Agreements, url: https://www.usa.gov//federal-agencies/committee-for-the-implementation-of-textile-agreements
name: Committee on Foreign Investment in the United States, url: https://www.usa.gov//federal-agencies/committee-on-foreign-investment-in-the-united-states
name: Commodity Futures Trading Commission (CFTC), url: https://www.usa.gov//federal-agencies/u-s-commodity-futures-trading-commission
name: Community Oriented Policing Services (COPS), url: https://www.usa.gov//federal-agencies/community-oriented-policing-services
name: Community Planning and Development, url: https://www.usa.gov//federal-agencies/office-of-community-planning-and-development
name: Compliance, Office of, url: https://www.usa.gov//federal-agencies/office-of-compliance
name: Comptroller of the Currency, Office of (OCC), url: https://www.usa.gov//federal-agencies/office-of-the-comptroller-of-the-currency
name: Computer Emergency Readiness Team (US CERT), url: https://www.usa.gov//federal-agencies/computer-emergency-readiness-team
name: Congress—U.S. House of Representatives, url: https://www.usa.gov//federal-agencies/u-s-house-of-representatives
name: Congress—U.S. Senate, url: https://www.usa.gov//federal-agencies/u-s-senate
name: Congressional Budget Office (CBO), url: https://www.usa.gov//federal-agencies/congressional-budget-office
name: Congressional Research Service, url: https://www.usa.gov//federal-agencies/congressional-research-service
name: Connecticut, url: https://www.usa.gov//state-government/connecticut
name: Consular Affairs, Bureau of, url: https://www.usa.gov//federal-agencies/bureau-of-consular-affairs
name: Consumer Financial Protection Bureau, url: https://www.usa.gov//federal-agencies/consumer-financial-protection-bureau
name: Consumer Product Safety Commission (CPSC), url: https://www.usa.gov//federal-agencies/consumer-product-safety-commission
name: Coordinating Council on Juvenile Justice and Delinquency Prevention, url: https://www.usa.gov//federal-agencies/coordinating-council-on-juvenile-justice-and-delinquency-prevention
name: Copyright Office, url: https://www.usa.gov//federal-agencies/copyright-office
name: Corporation for National and Community Service, url: https://www.usa.gov//federal-agencies/corporation-for-national-and-community-service
name: Corps of Engineers, url: https://www.usa.gov//federal-agencies/u-s-army-corps-of-engineers
name: Council of Economic Advisers, url: https://www.usa.gov//federal-agencies/council-of-economic-advisers
name: Council of the Inspectors General on Integrity and Efficiency, url: https://www.usa.gov//federal-agencies/council-of-the-inspectors-general-on-integrity-and-efficiency
name: Council on Environmental Quality, url: https://www.usa.gov//federal-agencies/council-on-environmental-quality
name: Court Services and Offender Supervision Agency for the District of Columbia, url: https://www.usa.gov//federal-agencies/court-services-and-offender-supervision-agency-for-the-district-of-columbia
name: Court of Appeals for Veterans Claims, url: https://www.usa.gov//federal-agencies/u-s-court-of-appeals-for-veterans-claims
name: Court of Appeals for the Armed Forces, url: https://www.usa.gov//federal-agencies/court-of-appeals-for-the-armed-forces
name: Court of Appeals for the Federal Circuit, url: https://www.usa.gov//federal-agencies/court-of-appeals-for-the-federal-circuit
name: Court of Federal Claims, url: https://www.usa.gov//federal-agencies/court-of-federal-claims
name: Court of International Trade, url: https://www.usa.gov//federal-agencies/court-of-international-trade
name: Customs and Border Protection, url: https://www.usa.gov//federal-agencies/u-s-customs-and-border-protection
Individual entry url for Court of Appeals for Veterans Claims: https://www.usa.gov//federal-agencies/u-s-court-of-appeals-for-veterans-claims
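Every URL in the output contains a double slash (https://www.usa.gov//...). This happens because gov_urlbase ends with '/' and the scraped hrefs already start with '/'. The standard library's urllib.parse.urljoin resolves an href against a base URL the same way a browser does. One possible fix is to build href with it in build_index, i.e. href = urljoin(self.fpath.gov_urlbase, suffix). A minimal sketch, where absolute_href is a hypothetical helper name:

```python
from urllib.parse import urljoin


def absolute_href(base, href):
    """Resolve a scraped href against the site's base URL, as a browser would."""
    # Root-relative hrefs ('/x') replace the base path, relative ones ('x') are
    # appended to the base directory, and absolute URLs are returned unchanged.
    return urljoin(base, href)


for href in ('/state-government/california',
             'federal-agencies/u-s-coast-guard',
             'https://www.example.com/page'):
    print(absolute_href('https://www.usa.gov/', href))
# https://www.usa.gov/state-government/california
# https://www.usa.gov/federal-agencies/u-s-coast-guard
# https://www.example.com/page
```

With that change, the URLs stored in FedIndex.json would have a single slash after www.usa.gov.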