Scraping .aspx page
#1
How the heck do you scrape an .aspx page??
I tried to get the page with requests and it seems to get stuck downloading,
or else it's trying to download all the links automatically.

I have zero experience with this type of web page.

Thanks again Microsoft!
#2
Hm! Webkit?
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
#3
Can you post a link?
#4
Here's the site for California Public data Catalog: http://publicpay.ca.gov/Reports/RawExport.aspx

I see some info on Scrapy being able to scrape ASP.Net stuff, but very little.
I'd rather use beautifulsoup or lxml if possible.

One thing I noticed that makes me think there's an easy method (or at least a method) to get at the html:
right-clicking on the page in Firefox and selecting View Page Source immediately brings up the page as plain html.

Haven't determined if that's useful or not yet.
#5
from bs4 import BeautifulSoup
import requests

url = 'http://publicpay.ca.gov/Reports/RawExport.aspx'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'lxml')
# All the download links sit inside the main column div
col = soup.find('div', class_="column_main")
col_all = col.find_all('a')
for link in col_all:
    print(link.get('href'))
Output:
/RawExport/2015_CaliforniaStateUniversity.zip
/RawExport/2015_City.zip
/RawExport/2015_CommunityCollegeDistrict.zip
/RawExport/2015_County.zip
/RawExport/2015_FairsExpos.zip
/RawExport/2015_First5.zip
/RawExport/2015_K12Education.zip
/RawExport/2015_SpecialDistrict.zip
/RawExport/2015_StateDepartment.zip
/RawExport/2015_SuperiorCourt.zip
/RawExport/2015_UniversityOfCalifornia.zip
/RawExport/2014_CaliforniaStateUniversity.zip
/RawExport/2014_City.zip
..............
The full url for each file is http://publicpay.ca.gov + the link printed above.
Now you can choose a download method, e.g. urlretrieve(), or write the content to a file opened with 'wb' using Requests.
For larger files it can be useful to download in chunks.
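# 'link' is one of the hrefs printed above, e.g. '/RawExport/2015_City.zip'
path = link.split('/')[-1]
r = requests.get('http://publicpay.ca.gov' + link, stream=True)
with open(path, 'wb') as f:
    for chunk in r.iter_content(1024):
        f.write(chunk)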
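To tie the pieces above together, here is a rough sketch of building the full url from a scraped href and streaming the file to disk. The helper names (absolute_url, filename_from_url, download) and the 64 KB chunk size are my own choices for illustration, not anything from the site or from Requests. download() assumes you pass in a requests.Session() created by the caller.

```python
import posixpath
from urllib.parse import urljoin, urlsplit

# Page the hrefs were scraped from (taken from the thread)
PAGE_URL = "http://publicpay.ca.gov/Reports/RawExport.aspx"


def absolute_url(href, page_url=PAGE_URL):
    """Resolve a scraped href such as '/RawExport/2015_City.zip' against the page url."""
    return urljoin(page_url, href)


def filename_from_url(url):
    """Use the last path segment of the url as the local file name."""
    return posixpath.basename(urlsplit(url).path)


def download(session, url, path, chunk_size=64 * 1024):
    """Stream url to path in chunks so a large zip is never held fully in memory.

    'session' is assumed to be a requests.Session() supplied by the caller.
    """
    with session.get(url, stream=True) as r:
        r.raise_for_status()  # stop early on 404/500 instead of saving an error page
        with open(path, "wb") as f:
            for chunk in r.iter_content(chunk_size):
                f.write(chunk)
```

Usage would be something like: create session = requests.Session(), then for each href call url = absolute_url(href) and download(session, url, filename_from_url(url)).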
#6
Snippsat,

I was thinking you were going to be the savior on this one!

Thanks a lot, you gave me a bonus ... more than I expected!
#7
.aspx is just html that has C# on the backend (...or Visual Basic, if whoever wrote the site hates themselves). If the data is on the page, it should be easy to do. If it's NOT, and instead is something like a search form to load results, then things get more difficult. ASP.NET (or at least older versions of it) uses something called a "viewstate", which is a hidden field in forms that keeps track of the state of server-side variables. It's a trash way of doing things, and most people just used cookies/sessions anyway, but a lot of things snuck into the viewstate if you didn't pay close attention.

So if you need to get data, sometimes you have to request the base page, scrape it for no reason other than to grab the viewstate value, and THEN request the actual page, supplying the viewstate you scraped. (...and then use the new viewstate value, in case the results are paginated...)
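A minimal sketch of that two-step dance, in case it helps. It uses the stdlib HTMLParser to pull out every hidden input (on WebForms pages that normally includes __VIEWSTATE and often __EVENTVALIDATION), then posts them back along with your own form values. The function names are made up for this example, 'session' is assumed to be a requests.Session() you create yourself, and the real form field names depend entirely on the page you are scraping.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and (attrs.get("type") or "").lower() == "hidden":
            name = attrs.get("name")
            if name:
                self.fields[name] = attrs.get("value") or ""


def hidden_fields(html):
    """Return a dict of the hidden form fields found in html."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


def post_with_viewstate(session, url, form_data):
    """GET the page, copy its hidden fields, then POST them back with form_data.

    'session' is assumed to be a requests.Session(); the keys in form_data
    are whatever the target form expects (hypothetical here).
    """
    page = session.get(url)
    page.raise_for_status()
    payload = hidden_fields(page.text)   # the viewstate and friends
    payload.update(form_data)            # your own search/pagination values
    result = session.post(url, data=payload)
    result.raise_for_status()
    return result
```

For paginated results, run hidden_fields() again on each response and send the fresh values with the next request, since the viewstate changes from page to page.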
#8
As in my 1st post of this thread:

Thanks again Microsoft!
#9
I actually use asp.net at work every day. It's pretty good at what it does. It's just the older versions of it... did some odd things. It was very clear that they were trying hard to make websites feel like desktop applications, with button event handlers and whatnot.
#10
I think it looks great.
... I don't have to like its structure.
Maybe someday I'll love it.

Maybe some day hell will freeze over.
Stranger things have happened

I didn't like indentation when I started using python.
Now I don't even think about it

Another file format that drove me crazy was Lotus-123