Python Forum
Scraping .aspx page
#1
How the heck do you scrape an .aspx page??
I tried to get the page with requests, and it seems to be stuck downloading,
or it's trying to download all the links automatically.

I have zero experience with this type of web page.

Thanks again Microsoft!
Reply
#2
Hm! Webkit?
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
Reply
#3
Can you post a link?
Reply
#4
Here's the site for California Public data Catalog: http://publicpay.ca.gov/Reports/RawExport.aspx

I see some info on Scrapy being able to scrape ASP.Net stuff, but very little.
I'd rather use beautifulsoup or lxml if possible.

One thing I noticed, that makes me think there's an easy method (or at least a method) to convert to html
is that right clicking on the page while in Firefox, and selecting page source immediately brings up the page in html.

Haven't determined if that's useful or not yet.
Reply
#5
from bs4 import BeautifulSoup
import requests

url = 'http://publicpay.ca.gov/Reports/RawExport.aspx'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'lxml')
# The download links all sit inside the main content column
col = soup.find('div', class_="column_main")
col_all = col.find_all('a')
for link in col_all:
    print(link.get('href'))
Output:
/RawExport/2015_CaliforniaStateUniversity.zip
/RawExport/2015_City.zip
/RawExport/2015_CommunityCollegeDistrict.zip
/RawExport/2015_County.zip
/RawExport/2015_FairsExpos.zip
/RawExport/2015_First5.zip
/RawExport/2015_K12Education.zip
/RawExport/2015_SpecialDistrict.zip
/RawExport/2015_StateDepartment.zip
/RawExport/2015_SuperiorCourt.zip
/RawExport/2015_UniversityOfCalifornia.zip
/RawExport/2014_CaliforniaStateUniversity.zip
/RawExport/2014_City.zip
..............
The full URL for each file is http://publicpay.ca.gov plus the link printed above.
Now you can choose a download method, e.g. urlretrieve(), or write the response to a file opened in 'wb' mode with Requests.
For larger files, downloading them in chunks can be useful.
# r is a response from requests.get(file_url, stream=True); path is the local file name
with open(path, 'wb') as f:
    for chunk in r.iter_content(1024):
        f.write(chunk)
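To tie the pieces of this post together, here is a rough sketch of going from one of the printed links to a file on disk. The helper names (absolute_url, local_name, save_stream, download), the 8192-byte chunk size and the 30-second timeout are my own choices for illustration, not anything the site requires. Only download() touches the network, so the other helpers can be tried offline.

```python
import os
from urllib.parse import urljoin, urlsplit

# Page the links were scraped from (taken from the post above)
BASE_URL = "http://publicpay.ca.gov/Reports/RawExport.aspx"


def absolute_url(href, base=BASE_URL):
    """Turn a site-relative href such as /RawExport/2015_City.zip into a full URL."""
    return urljoin(base, href)


def local_name(url):
    """Use the last path segment of the URL as the local file name."""
    return os.path.basename(urlsplit(url).path)


def save_stream(chunks, path):
    """Write an iterable of byte chunks to path and return the bytes written."""
    written = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
                written += len(chunk)
    return written


def download(href, dest_dir="."):
    """Fetch one file with Requests in streaming mode (needs network access)."""
    import requests  # imported here so the helpers above work without it

    url = absolute_url(href)
    path = os.path.join(dest_dir, local_name(url))
    # stream=True avoids loading the whole zip into memory at once
    with requests.get(url, stream=True, timeout=30) as r:
        r.raise_for_status()
        save_stream(r.iter_content(chunk_size=8192), path)
    return path
```

Calling download('/RawExport/2015_City.zip') should then leave 2015_City.zip in the current directory, assuming the site still serves the file at that path.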
Reply
#6
Snippsat,

I was thinking you were going to be the savior on this one!

Thanks a lot, you gave me a bonus ... more than I expected!
Reply
#7
.aspx is just HTML that has C# on the backend (...or Visual Basic, if whoever wrote the site hates themselves). If the data is on the page, it should be easy to do. If it's NOT, and instead is something like a search form to load results, then things get more difficult. ASP.NET (or at least older versions of it) uses something called a "viewstate", which is a hidden field in forms that keeps track of the state of server-side variables. It's a trash way of doing things, and most people just used cookies/sessions anyway, but a lot of things snuck into the viewstate if you didn't pay close attention.

So if you need to get data, sometimes you have to request the base page, scrape it for no reason other than to grab the viewstate value, and THEN request the actual page, supplying the viewstate you scraped. (...and then use the new viewstate value, in case the results are paginated...)
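Roughly, that two-step dance could look like the sketch below. It copies every hidden input from the first response (on Web Forms pages that normally includes __VIEWSTATE and often __EVENTVALIDATION) and posts them back with the control that should fire. The function names and the "ctl00$Search" target used in the note afterwards are made up for illustration; the real control name has to be read from the form of the page you are scraping. It uses the standard-library HTML parser so the parsing part runs without extra packages, though BeautifulSoup would work just as well.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        a = dict(attrs)
        if (a.get("type") or "").lower() == "hidden" and a.get("name"):
            self.fields[a["name"]] = a.get("value") or ""


def hidden_fields(html):
    """Return a dict of the hidden form fields found in html."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


def build_postback(html, event_target, extra=None):
    """Build the POST payload: scraped hidden fields plus the event to trigger."""
    payload = hidden_fields(html)
    payload["__EVENTTARGET"] = event_target  # hypothetical control name, site-specific
    payload.setdefault("__EVENTARGUMENT", "")
    payload.update(extra or {})  # e.g. the visible search box values
    return payload


def fetch_results(url, event_target, extra=None):
    """GET the base page, then POST back with its viewstate (needs network access)."""
    import requests  # imported here so the parsing helpers work without it

    with requests.Session() as s:  # the session keeps cookies between the two requests
        first = s.get(url, timeout=30)
        first.raise_for_status()
        payload = build_postback(first.text, event_target, extra)
        result = s.post(url, data=payload, timeout=30)
        result.raise_for_status()
        return result.text
```

For paginated results you would feed the HTML returned by fetch_results() back into build_postback() to pick up the new viewstate before asking for the next page. Something like fetch_results(url, "ctl00$Search", {"q": "city"}) is only a guess at the shape of such a call; the field names depend entirely on the site.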
Reply
#8
As in my 1st post of this thread:

Thanks again Microsoft!
Reply
#9
I actually use ASP.NET at work every day. It's pretty good at what it does. It's just that the older versions of it... did some odd things. It was very clear that they were trying hard to make websites feel like desktop applications, with button event handlers and whatnot.
Reply
#10
I think it looks great.
... I don't have to like its structure.
Maybe someday I'll love it.

Maybe some day hell will freeze over.
Stranger things have happened

I didn't like indentation when I started using python.
Now I don't even think about it

Another file format that drove me crazy was Lotus-123
Reply

