Python Forum
Get latest version off website and save it as variable [SOLVED]


Get latest version off website and save it as variable [SOLVED] - AlphaInc - Nov-14-2021

Hello everybody,

I'm trying to get the latest version number from a website (not the download itself, just the version).
For example, this is the download site "https://gpac.wp.imt.fr/downloads/", and when I manually click on Windows 64 bits, it downloads version 1.0.1.

Is there a way to detect this information and save it as a variable in a python script?

So far, I have used this, but it comes with two flaws: (1) it's not entirely Python and (2) it only gives me the entire download link:

import os

# os.system expects the whole shell pipeline as one string
os.system("curl -s https://gpac.wp.imt.fr/downloads/ | grep x64")
I could get the information for this particular software from somewhere else, but there are a few sites whose release version I would like to grab.
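
On the first flaw (not entirely Python): a minimal standard-library sketch of the same `curl | grep` idea. The helper names `fetch_text` and `grep` and the sample markup are made up for illustration; they are not from the thread or the real page.

```python
from urllib.request import urlopen


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it as text (roughly `curl -s URL`)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def grep(text: str, needle: str) -> list[str]:
    """Return the lines of text that contain needle (roughly `grep needle`)."""
    return [line for line in text.splitlines() if needle in line]


# Offline demonstration on a made-up snippet. For the real page you would
# call grep(fetch_text("https://gpac.wp.imt.fr/downloads/"), "x64").
sample = (
    '<a href="/files/demo-1.0.1-x64.exe">Windows 64 bits</a>\n'
    '<a href="/files/demo-1.0.1-win32.exe">Windows 32 bits</a>'
)
matches = grep(sample, "x64")
print(matches)
```

This still returns whole lines, so the second flaw remains: the version has to be cut out of the matching line in a separate step.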


RE: Get latest version off website and save it as variable - snippsat - Nov-14-2021

A more common way is to web-scrape the info you want.
Example:
import requests
from bs4 import BeautifulSoup

url = 'https://gpac.wp.imt.fr/downloads/'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')
print(soup.select_one('#post-147 > div > p:nth-child(3)').text)
# Just version
version = soup.select_one('#post-147 > div > p:nth-child(3) > strong').text
print(version)
Output:
The current GPAC release is 1.0.1 (released in September 2020).
1.0.1
The selector #post-147 > div > p:nth-child(3) can be copied straight from the browser: right-click the element in the inspector and choose Copy selector.
Then in BS, pass that CSS selector to select or select_one.
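
To try the selector approach without hitting the network, here is a small sketch that runs `select_one` against an inline HTML string. The markup is hypothetical, only loosely modeled on the page discussed above, and it uses the built-in `html.parser` instead of `lxml` so nothing extra has to be installed besides `bs4`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modeled on the page described above
html = """
<article id="post-147">
  <div>
    <h1>Downloads</h1>
    <p>Some introduction.</p>
    <p>The current GPAC release is <strong>1.0.1</strong> (released in September 2020).</p>
  </div>
</article>
"""

# html.parser ships with Python, so lxml is not needed for this demo
soup = BeautifulSoup(html, "html.parser")

# p:nth-child(3) is the <p> that is the third child element of the <div>
paragraph = soup.select_one("#post-147 > div > p:nth-child(3)")
version = soup.select_one("#post-147 > div > p:nth-child(3) > strong").text

# select_one returns None when nothing matches, so check before using .text
missing = soup.select_one("#post-999 > div > p")

print(paragraph.text)
print(version)
print(missing)
```

A selector copied from the browser is tied to the page layout at that moment, so checking for `None` before reading `.text` avoids an `AttributeError` if the site changes.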


RE: Get latest version off website and save it as variable - AlphaInc - Nov-14-2021

Okay, yeah, that works, thanks.
I haven't understood how I can do it for other sites, though. As another example, how do I get the information for this site: https://www.makemkv.com/download/
Do I need to find out which CSS selector I'm looking for using a browser?

Edit: Alright, I got it (sorry for getting it so late). My selector is "#content > ul:nth-child(3) > li > a" and it prints "MakeMKV 1.16.5 for Windows". How do I cut it down to only show the version? I didn't understand that part of your example.


RE: Get latest version off website and save it as variable - snippsat - Nov-14-2021

It works the same way. To get some practice with web scraping, have a look at Web-Scraping part-1.
Here is an example using two ways:
import requests
from bs4 import BeautifulSoup

url = 'https://www.makemkv.com/download/'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')
li_ver = soup.find_all('li')[5]
print(li_ver.text)
print(soup.select_one('#content > li:nth-child(6)').text)
Output:
MakeMKV v1.16.5 (1.11.2021 )
MakeMKV v1.16.5 (1.11.2021 )



RE: Get latest version off website and save it as variable - AlphaInc - Nov-14-2021

Yeah, sorry, it took me a second to get what you meant. I have it like this:

import requests
from bs4 import BeautifulSoup

url = 'https://www.makemkv.com/download/'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')
version = soup.select_one('#content > ul:nth-child(3) > li > a').text
print(version)
It gets the output:
MakeMKV 1.16.5 for Windows
Is there a way to get only 1.16.5 (without the other text), or is this the best I can get?


RE: Get latest version off website and save it as variable - DeaD_EyE - Nov-14-2021

import re
from collections import namedtuple

import bs4
import requests

MAKEMKV_BASE = "https://www.makemkv.com/download/"
# Matches a three-part version number such as 1.16.5
VERSION_REG = re.compile(r"(\d+\.\d+\.\d+)")


def parse_version(file_name: str) -> str:
    """Return the first x.y.z version found in file_name, or "" if none."""
    if match := VERSION_REG.search(file_name):
        return match.group(1)
    return ""


MakeMKV = namedtuple("MakeMKV", "url version version_tuple os")


def get_makemkv():
    """Yield one MakeMKV record per download link on the page."""
    content = requests.get(MAKEMKV_BASE).content
    doc = bs4.BeautifulSoup(content, "lxml")
    # a[href] restricts the match to links that actually have an href
    selector = "div#content > ul.bullets > li > a[href]"
    for element in doc.select(selector):
        href = element["href"]
        if href.endswith(".txt"):
            continue

        version_str = parse_version(href)
        if not version_str:
            # No version in the link; int("") below would raise ValueError
            continue
        # (1, 16, 5) compares numerically, unlike the string "1.16.5"
        version_tuple = tuple(map(int, version_str.split(".")))
        name = element.text.lower()
        if "windows" in name:
            os_type = "windows"
        elif "mac os x" in name:
            os_type = "macos"
        else:
            os_type = "unknown"

        yield MakeMKV(href, version_str, version_tuple, os_type)


for result in get_makemkv():
    print(result.os, result.version, result.url)
The inspector from Firefox helps a lot to find the elements.
I used this information to make the selector.
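
For the narrower question of getting only 1.16.5 out of the link text "MakeMKV 1.16.5 for Windows", the regex step can be tried on its own. This is a sketch, not the method from the post above: the names `extract_version` and `version_key` are made up here, and the pattern assumes the version is a run of dot-separated numbers.

```python
import re
from typing import Optional, Tuple

# Assumption: a version is two or more dot-separated integers, e.g. 1.16.5
VERSION_PATTERN = re.compile(r"\d+(?:\.\d+)+")


def extract_version(text: str) -> Optional[str]:
    """Return the first dotted version number in text, or None if absent."""
    match = VERSION_PATTERN.search(text)
    return match.group(0) if match else None


def version_key(version: str) -> Tuple[int, ...]:
    """Turn "1.16.5" into (1, 16, 5) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


latest = extract_version("MakeMKV 1.16.5 for Windows")
print(latest)

# Tuples compare number by number; plain strings compare character by character
print(version_key("1.16.5") > version_key("1.9.10"))
print("1.16.5" > "1.9.10")
```

Storing the tuple next to the string, as the namedtuple above does, makes a later "is there a newer release?" check a simple comparison.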