Python Forum

Full Version: Getting from <td> tag by using urllib,Beautifulsoup
I would like to create a program to check for updates on a regular basis.
In the "VMware ESXi" release notes, the version is in a table (i.e., in a <td> tag).
To do this, I want to fetch the page with urllib and then use BeautifulSoup to extract the contents of the <td> tag,
so I wrote the following code, but it printed "None".

import urllib.request
from bs4 import BeautifulSoup

# Spoof a browser User-Agent
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'}
root = 'https://kb.vmware.com/s/article/2143832'
request = urllib.request.Request(root, headers=header)
response = urllib.request.urlopen(request).read().decode('utf-8')
# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = BeautifulSoup(response, 'html.parser')
# find() returns the first <td> in the downloaded HTML, or None if there is none
corn_soup = soup.find('td')

print(corn_soup)
I think I'm accessing the site correctly, but I don't think I'm getting the information I need in the soup.
(Aug-18-2021, 02:58 AM) KuroBuster wrote: I think I'm accessing the site correctly, but I don't think I'm getting the information I need in the soup.
The information is generated by JavaScript, so Selenium is an option.
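
A quick way to confirm this without a browser is to count the <td> tags in the raw HTML, before any script has run. The sketch below uses only the standard library. Both sample pages are made-up stand-ins (not the real VMware markup), and TagCounter and count_tags are names invented here for illustration.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how many times a given start tag appears in raw HTML."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase
        if tag == self.tag:
            self.count += 1


def count_tags(html, tag):
    """Return the number of <tag> start tags found in the html string."""
    counter = TagCounter(tag)
    counter.feed(html)
    counter.close()
    return counter.count


# Hypothetical page shell: the table would be built later by the script,
# so the raw HTML contains no <td> at all.
SHELL_PAGE = """
<html><body>
<div id="app"></div>
<script src="/app.js"></script>
</body></html>
"""

# Hypothetical static page, where the table is part of the HTML itself.
STATIC_PAGE = "<table><tr><td>version</td><td>date</td></tr></table>"

print(count_tags(SHELL_PAGE, "td"))   # 0
print(count_tags(STATIC_PAGE, "td"))  # 2
```

If the count for the downloaded page is 0, soup.find('td') has nothing to return, which would explain the None.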

Another, more advanced way is to look at the page source and at what is sent over the network.
Here I catch the JSON response. As advice, use Requests and not urllib.
import requests
from pprint import pprint

url = 'https://kb.vmware.com/services/apexrest/v1/article?docid=2143832&lang=en_us'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'
}
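response = requests.get(url, headers=headers)
# Stop here on an HTTP error instead of trying to parse an error page as JSON
response.raise_for_status()
data = response.json()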
products = data['meta']['articleProducts']['relatedProducts']
versions = data['meta']['articleProducts']['relatedVersions']
pprint(products)
print('-' * 30)
pprint(versions) 
Output:
['VMware vSphere ESXi',
 'VMware vSphere ESX',
 'VMware vSphere',
 'VMware ESXi',
 'VMware ESX Server',
 'VMware ESX']
------------------------------
['VMware vSphere ESXi 7.0.0',
 'VMware vSphere ESXi 6.7',
 'VMware vSphere ESXi 6.5',
 'VMware vSphere ESXi 6.0',
 'VMware vSphere ESXi 5.5',
 'VMware vSphere ESXi 5.1',
 'VMware vSphere ESXi 5.0',
 'VMware vSphere ESX 4.x',
 'VMware ESXi 4.1.x Installable',
 'VMware ESXi 4.1.x Embedded',
 'VMware ESXi 4.0.x Installable',
 'VMware ESXi 4.0.x Embedded',
 'VMware ESX Server 3.5.x',
 'VMware ESX Server 3.0.x',
 'VMware ESX Server 2.5.x',
 'VMware ESX Server 2.1.x',
 'VMware ESX Server 2.0.x',
 'VMware ESX Server 1.x',
 'VMware ESX Server 1.5.x']
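
For the original goal of checking for updates on a regular basis, one possible approach is to save the versions list between runs and report anything that was not there last time. This is only a sketch: find_new_versions, check_for_updates, and the state file name are assumptions made up for this example, and it treats an update as a new entry appearing in the list.

```python
import json
from pathlib import Path


def find_new_versions(current, known):
    """Return the entries of current that are not in known, keeping order."""
    known_set = set(known)
    return [v for v in current if v not in known_set]


def check_for_updates(current, state_file):
    """Compare current with the list saved in state_file, then save current.

    On the first run there is no saved list, so every entry counts as new.
    """
    path = Path(state_file)
    if path.exists():
        known = json.loads(path.read_text(encoding="utf-8"))
    else:
        known = []
    new = find_new_versions(current, known)
    path.write_text(json.dumps(current), encoding="utf-8")
    return new
```

With the versions list from the code above, new = check_for_updates(versions, 'esxi_versions.json') returns the entries added since the previous run. The script itself can then be run on a schedule (cron, Task Scheduler).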
Ohh exactly what I was looking for!
Thanks!