from bs4 import BeautifulSoup
import requests

url = 'http://publicpay.ca.gov/Reports/RawExport.aspx'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'lxml')

# the download links all sit inside the "column_main" div
col = soup.find('div', class_="column_main")
col_all = col.find_all('a')
for link in col_all:
    print(link.get('href'))
Output:
/RawExport/2015_CaliforniaStateUniversity.zip
/RawExport/2015_City.zip
/RawExport/2015_CommunityCollegeDistrict.zip
/RawExport/2015_County.zip
/RawExport/2015_FairsExpos.zip
/RawExport/2015_First5.zip
/RawExport/2015_K12Education.zip
/RawExport/2015_SpecialDistrict.zip
/RawExport/2015_StateDepartment.zip
/RawExport/2015_SuperiorCourt.zip
/RawExport/2015_UniversityOfCalifornia.zip
/RawExport/2014_CaliforniaStateUniversity.zip
/RawExport/2014_City.zip
..............
The base URL for all of these is http://publicpay.ca.gov, so the full download URL is that base plus the href extracted above.
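For example, urljoin from the standard library handles the joining (a minimal sketch; the href value here is one of those printed above):

from urllib.parse import urljoin

base = 'http://publicpay.ca.gov'
full_url = urljoin(base, '/RawExport/2015_City.zip')
# -> http://publicpay.ca.gov/RawExport/2015_City.zip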
Now you can choose a download method, e.g. urllib.request.urlretrieve(), or write the response bytes yourself in 'wb' mode with Requests.
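A minimal urlretrieve sketch (the local filename '2015_City.zip' is just an assumption for illustration):

from urllib.request import urlretrieve

urlretrieve('http://publicpay.ca.gov/RawExport/2015_City.zip', '2015_City.zip')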
For larger files, downloading in chunks can be useful:
r = requests.get(full_url, stream=True)  # stream=True so the whole file isn't held in memory
with open(path, 'wb') as f:  # path: local filename to save to
    for chunk in r.iter_content(chunk_size=1024):
        f.write(chunk)
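Putting it all together, a sketch that downloads every zip from the list scraped above (deriving the filename from the href with os.path.basename is just one convenient choice):

import os
from urllib.parse import urljoin

for link in col_all:
    href = link.get('href')
    full_url = urljoin('http://publicpay.ca.gov', href)
    path = os.path.basename(href)  # e.g. 2015_City.zip
    r = requests.get(full_url, stream=True)
    with open(path, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            f.write(chunk)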