from bs4 import BeautifulSoup
import requests

url = 'http://publicpay.ca.gov/Reports/RawExport.aspx'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'lxml')

# the download links all sit inside the "column_main" div
col = soup.find('div', class_="column_main")
col_all = col.find_all('a')
for link in col_all:
    print(link.get('href'))
Output:
/RawExport/2015_CaliforniaStateUniversity.zip
/RawExport/2015_City.zip
/RawExport/2015_CommunityCollegeDistrict.zip
/RawExport/2015_County.zip
/RawExport/2015_FairsExpos.zip
/RawExport/2015_First5.zip
/RawExport/2015_K12Education.zip
/RawExport/2015_SpecialDistrict.zip
/RawExport/2015_StateDepartment.zip
/RawExport/2015_SuperiorCourt.zip
/RawExport/2015_UniversityOfCalifornia.zip
/RawExport/2014_CaliforniaStateUniversity.zip
/RawExport/2014_City.zip
..............
The base URL for all of these is http://publicpay.ca.gov, so the full download URL is that base plus the href extracted above.
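For example, urljoin from the standard library handles the joining (a minimal sketch; the href value here is one of those printed above):

from urllib.parse import urljoin

base = 'http://publicpay.ca.gov'
full_url = urljoin(base, '/RawExport/2015_City.zip')
# -> http://publicpay.ca.gov/RawExport/2015_City.zip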
Now you can choose a download method, e.g. urllib.request.urlretrieve(), or write the response bytes yourself in 'wb' mode with Requests.
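A minimal urlretrieve sketch (the local filename '2015_City.zip' is just an assumption for illustration):

from urllib.request import urlretrieve

urlretrieve('http://publicpay.ca.gov/RawExport/2015_City.zip', '2015_City.zip')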
For larger files, downloading in chunks can be useful:
r = requests.get(full_url, stream=True)  # stream=True so the whole file isn't held in memory
with open(path, 'wb') as f:  # path: local filename to save to
    for chunk in r.iter_content(chunk_size=1024):
        f.write(chunk)
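Putting it all together, a sketch that downloads every zip from the list scraped above (deriving the filename from the href with os.path.basename is just one convenient choice):

import os
from urllib.parse import urljoin

for link in col_all:
    href = link.get('href')
    full_url = urljoin('http://publicpay.ca.gov', href)
    path = os.path.basename(href)  # e.g. 2015_City.zip
    r = requests.get(full_url, stream=True)
    with open(path, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            f.write(chunk)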