Python Forum
Right modules to use ?
#4
(Nov-01-2019, 11:02 AM)buran Wrote: There is no problem parsing this page.

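import requests
from bs4 import BeautifulSoup

url = 'https://www.microsoft.com/en-us/download/details.aspx?id=56261'

resp = requests.get(url)
resp.raise_for_status()  # stop early on an HTTP error instead of parsing an error page
soup = BeautifulSoup(resp.text, 'html.parser')

# The download details sit in <div class="fileinfo">; print the text of each paragraph.
file_info = soup.find('div', {'class': 'fileinfo'})
for p in file_info.find_all('p'):
    print(p.text)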
Output:
1.0 SurfaceBook2_Win10_18362_19.101.13994.0.msi SurfaceBook2_Win10_15063_1802509_3.msi SurfaceBook2_Win10_16299_1803509_3.msi SurfaceBook2_Win10_17134_19.101.14240.0.msi SurfaceBook2_Win10_17763_1805009_0.msi 10/14/2019 976.4 MB 622.1 MB 956.9 MB 985.4 MB 985.4 MB
Now you need to refine it, because there are plenty of nested divs. I'll leave that to you.
Thanks!
Hmm, that's weird. When I tried it (I searched for the first div's id instead of the class), it didn't work. Seems like I messed that up badly, haha :D. I didn't know that BeautifulSoup gets the DOM from dropdowns as well.
Paquettg (PHP) wasn't able to do so.
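
For anyone hitting the same id-versus-class mix-up, here is a minimal sketch on a made-up HTML fragment. The real page's markup is not shown in this thread, so the `download-details` id and the nesting are assumptions; only the `fileinfo` class comes from the code above. The point is that `find` quietly returns `None` when a class name is passed as an id.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: only the "fileinfo" class is taken from the thread.
html = """
<div id="download-details">
  <div class="fileinfo"><p>1.0</p><p>10/14/2019</p></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Searching by class finds the inner div.
by_class = soup.find("div", class_="fileinfo")

# Searching by id only works with a value that really is an id.
by_id = soup.find("div", id="download-details")

# Passing the class name as an id matches nothing, so find() returns None.
missing = soup.find("div", id="fileinfo")

texts = [p.get_text() for p in by_class.find_all("p")]
print(texts)
print(missing)
```

Checking the result for `None` before calling `find_all` on it avoids an `AttributeError` when the selector is wrong.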

Many thanks for the reply.
Date and version are now easy to extract.
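
One possible way to pull the date and version out of the printed text is sketched below. It works on the output exactly as posted above. The function name `parse_file_info`, the regular expressions, and the idea that file names and sizes pair up by position are my assumptions, not something the page guarantees.

```python
import re

# Text copied from the output posted above.
RAW_OUTPUT = (
    "1.0 SurfaceBook2_Win10_18362_19.101.13994.0.msi "
    "SurfaceBook2_Win10_15063_1802509_3.msi "
    "SurfaceBook2_Win10_16299_1803509_3.msi "
    "SurfaceBook2_Win10_17134_19.101.14240.0.msi "
    "SurfaceBook2_Win10_17763_1805009_0.msi "
    "10/14/2019 976.4 MB 622.1 MB 956.9 MB 985.4 MB 985.4 MB"
)


def parse_file_info(text):
    """Split the flattened text into version, date, and a file -> size mapping."""
    # Assumption: the version is the first whitespace-separated token.
    version = text.split()[0]
    # Assumption: the date is written as M/D/YYYY.
    date_match = re.search(r"\b\d{1,2}/\d{1,2}/\d{4}\b", text)
    files = re.findall(r"\S+\.msi\b", text)
    sizes = re.findall(r"\b\d+(?:\.\d+)?\s(?:KB|MB|GB)\b", text)
    return {
        "version": version,
        "date": date_match.group(0) if date_match else None,
        # Assumption: the n-th size belongs to the n-th file name.
        "files": dict(zip(files, sizes)),
    }


info = parse_file_info(RAW_OUTPUT)
print(info["version"], info["date"])
for name, size in info["files"].items():
    print(name, size)
```

Parsing each `<p>` separately instead of the joined text would be more robust, but that depends on markup not shown here.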


(Nov-01-2019, 01:21 PM)snippsat Wrote: I changed from PHP to Python some weeks ago, and I'm totally into web-scraping.
Take a look at.
Web-Scraping part-1
Web-Scraping part-2

Thanks! I went through it in the past 2 hours.
It seems it's not possible to manipulate (check/uncheck) checkboxes without Selenium.

But as long as I can run Selenium in headless mode, it shouldn't affect the runtime as badly as when a new browser window pops up...

I have used Selenium that way in the past, and it was simply painful when the script opened a browser and moved the mouse automatically.
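
A rough sketch of the headless idea, assuming Chrome with a matching driver is available. The helper names and the `input[type='checkbox']` selector are hypothetical, since the thread does not show the page's checkbox markup. Selenium is imported inside the functions so the file still loads on a machine without it.

```python
def build_headless_chrome():
    """Start Chrome without a visible window."""
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")  # no browser window pops up
    return webdriver.Chrome(options=options)


def set_checkbox(element, checked):
    """Click the checkbox only if its state differs from the wanted one."""
    if element.is_selected() != checked:
        element.click()
    return element.is_selected()


def check_all_boxes(url):
    """Open url headlessly and tick every checkbox found (selector is a guess)."""
    from selenium.webdriver.common.by import By

    driver = build_headless_chrome()
    try:
        driver.get(url)
        boxes = driver.find_elements(By.CSS_SELECTOR, "input[type='checkbox']")
        return [set_checkbox(box, True) for box in boxes]
    finally:
        driver.quit()  # always close the browser, even on errors
```

Calling `check_all_boxes('https://www.microsoft.com/en-us/download/details.aspx?id=56261')` would be the entry point. Whether the checkboxes are present before some button is clicked is something to verify against the live page.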

Thanks for the great help, guys!
Both +rep'ed

Have a nice weekend


Messages In This Thread
Right modules to use ? - by Fre3k - Nov-01-2019, 09:49 AM
RE: Right modules to use ? - by buran - Nov-01-2019, 11:02 AM
RE: Right modules to use ? - by snippsat - Nov-01-2019, 01:21 PM
RE: Right modules to use ? - by Fre3k - Nov-01-2019, 03:38 PM
RE: Right modules to use ? - by nilamo - Nov-01-2019, 04:12 PM
