Python Forum

Hello guys!

Well, I'll dive straight into my question.

I changed from PHP to Python some weeks ago, and I'm totally into web-scraping.

As my first 'real-need' project, I'm trying to get the latest driver upload and its version for Surface devices.
-> https://www.microsoft.com/en-us/download...x?id=56261

Because the versions are hidden under a JavaScript dropdown, I can't access the values directly...

So my question is: is there a way in Python to open the dropdown?

I know I can do it with Selenium, but I'm looking for other solutions as well.

In PHP I used requests and an HTTP scraper module.

I want to try to work with requests only. I've heard it's the most efficient way.

If there is no way other than Selenium, I guess I'll have to use it ^^.
It's just a pain in the ass to install everything, add it to the PATH and so on until it's working properly x.x

Thanks for the answer!

There is no problem parsing this page:

import requests
from bs4 import BeautifulSoup

url = 'https://www.microsoft.com/en-us/download/details.aspx?id=56261'

resp = requests.get(url)
resp.raise_for_status()  # stop early on an HTTP error
soup = BeautifulSoup(resp.text, 'html.parser')
# the file details are in the static HTML, inside <div class="fileinfo">
file_info = soup.find('div', {'class': 'fileinfo'})
for p in file_info.find_all('p'):
    print(p.text)

Output:
1.0
SurfaceBook2_Win10_18362_19.101.13994.0.msi
SurfaceBook2_Win10_15063_1802509_3.msi
SurfaceBook2_Win10_16299_1803509_3.msi
SurfaceBook2_Win10_17134_19.101.14240.0.msi
SurfaceBook2_Win10_17763_1805009_0.msi
10/14/2019
976.4 MB
622.1 MB
956.9 MB
985.4 MB
985.4 MB

Now you need to refine it, because there are plenty of nested divs... I'll leave that to you.
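
As one possible starting point for that refinement, here is a rough sketch. The file names in the output above look like they follow a `<device>_Win10_<build>_<version>.msi` pattern, so a regular expression can split them up. The pattern and the helper name `parse_msi_name` are assumptions inferred only from the five names printed above, so treat this as a sketch rather than a reliable parser.

```python
import re

# Assumed pattern, inferred only from the file names printed above.
MSI_NAME = re.compile(
    r"(?P<device>[^_]+)_Win10_(?P<build>\d+)_(?P<version>.+)\.msi"
)

def parse_msi_name(name):
    """Return device, build and version as a dict, or None if no match."""
    match = MSI_NAME.fullmatch(name.strip())
    return match.groupdict() if match else None

# A few of the lines from the output above, mixed types on purpose.
lines = [
    "1.0",
    "SurfaceBook2_Win10_18362_19.101.13994.0.msi",
    "SurfaceBook2_Win10_15063_1802509_3.msi",
    "10/14/2019",
    "976.4 MB",
]

# Keep only the lines that look like driver file names.
drivers = [info for info in map(parse_msi_name, lines) if info]
for info in drivers:
    print(info["build"], info["version"])
```

Lines that are not file names (version, date, sizes) simply return None and are skipped.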

I changed from PHP to Python some weeks ago, and Im totally into web-scraping.
Take a look at:
Web-Scraping part-1
Web-Scraping part-2

(Nov-01-2019, 11:02 AM) buran wrote: there is no problem to parse this page

Thanks!
Hmm, that's weird. When I tried it, I searched for the first div's id instead of the class, and it didn't work. Seems like I messed that up badly, haha :D. I didn't know that BeautifulSoup gets the DOM from dropdowns as well.
Paquettg (PHP) wasn't able to do so.

Many thanks for the reply.
The date and version are now easy to extract.
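
For anyone reading later, a minimal sketch of how that extraction could look, working on the list of `p` texts from the output above. It assumes the version is the first line and that the date is the only line in MM/DD/YYYY format; the helper name `find_date` is made up for this example.

```python
from datetime import datetime

def find_date(lines, fmt="%m/%d/%Y"):
    """Return the first line that parses as a date, or None."""
    for line in lines:
        try:
            return datetime.strptime(line.strip(), fmt).date()
        except ValueError:
            continue  # not a date, try the next line
    return None

# A shortened copy of the output above.
lines = [
    "1.0",
    "SurfaceBook2_Win10_18362_19.101.13994.0.msi",
    "10/14/2019",
    "976.4 MB",
]

version = lines[0]            # assumption: version comes first
published = find_date(lines)  # a datetime.date, or None
print(version, published)
```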

(Nov-01-2019, 01:21 PM) snippsat wrote: I changed from PHP to Python some weeks ago, and Im totally into web-scraping.
Take a look at:
Web-Scraping part-1
Web-Scraping part-2

Thanks! I went through it in the past 2 hours.
Seems like it's not possible to manipulate (check/uncheck) checkboxes without Selenium.

But as long as I can run Selenium in headless mode, it shouldn't affect the runtime as badly as when a new browser pops up...

I've used Selenium that way in the past, and it was simply painful when the script opened a browser and moved the mouse automatically.

Thanks for the great help, guys!
Both +rep'ed

Have a nice weekend

(Nov-01-2019, 03:38 PM) Fre3k wrote: Seems like its not possible to manipulate ( check/uncheck) checkboxes without selenium.

A checkbox is just an input field with a "checked" attribute. There's no reason you can't check it without selenium before saving the page to disk.
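
A minimal sketch of that idea with BeautifulSoup, run against a made-up form rather than a real page. The HTML, the field names and the helper `set_checkbox` are all hypothetical; the only point is that checking a box means adding the `checked` attribute and unchecking means removing it.

```python
from bs4 import BeautifulSoup

# Hypothetical form, only for illustration.
HTML = """
<form>
  <input type="checkbox" name="newsletter" checked>
  <input type="checkbox" name="terms">
</form>
"""

def set_checkbox(soup, name, checked):
    """Add or remove the 'checked' attribute on the named checkbox."""
    box = soup.find("input", {"type": "checkbox", "name": name})
    if box is None:
        raise ValueError(f"no checkbox named {name!r}")
    if checked:
        box["checked"] = ""              # presence of the attribute is what counts
    else:
        box.attrs.pop("checked", None)   # remove it if present
    return box

soup = BeautifulSoup(HTML, "html.parser")
set_checkbox(soup, "terms", True)        # check
set_checkbox(soup, "newsletter", False)  # uncheck
print(soup)  # str(soup) is what you would write to disk
```

This only edits the HTML document itself. If the goal is to submit the form, the equivalent is to include or leave out that field in the data you send.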

Thread participants: Fre3k, buran, snippsat, nilamo