Python Forum
Getting text from a <td> tag using urllib and BeautifulSoup
#1
I would like to write a program that checks for updates on a regular basis.
In the "VMware ESXi" release notes, the version is listed in a table (i.e., in a <td> tag).
My plan was to fetch the page with urllib and then use BeautifulSoup to extract the information in the <td> tag,
so I wrote the following code, but it returned "None".

import urllib.request
from bs4 import BeautifulSoup

# Spoof a browser User-Agent so the request looks like a normal page visit
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'}
root = 'https://kb.vmware.com/s/article/2143832'
request = urllib.request.Request(root, headers=header)
response = urllib.request.urlopen(request).read().decode('utf-8')
# Name the parser explicitly; BeautifulSoup warns when none is given
soup = BeautifulSoup(response, 'html.parser')
td_cell = soup.find('td')  # find() returns None when the HTML has no <td> at all

print(td_cell)
I think I'm accessing the site correctly, but I don't think I'm getting the information I need in the soup.
#2
(Aug-18-2021, 02:58 AM) KuroBuster wrote: I think I'm accessing the site correctly, but I don't think I'm getting the information I need in the soup.
The information is generated by JavaScript, so Selenium is one option.
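A quick way to confirm that the table is built by JavaScript is to check whether the raw HTML the server sends contains any <td> cells at all. If it contains none, `soup.find('td')` can only return None, no matter how the parsing is done. The sketch below uses only the standard library's `html.parser`. `TdCollector` and `td_texts` are hypothetical helper names, and the two sample pages are made-up illustrations, not the real VMware page.

```python
from html.parser import HTMLParser


class TdCollector(HTMLParser):
    """Collect the text content of every <td> cell in an HTML document."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_td:
            self.cells.append("".join(self._buffer).strip())
            self._in_td = False

    def handle_data(self, data):
        # Only keep text that appears inside an open <td>
        if self._in_td:
            self._buffer.append(data)


def td_texts(html):
    """Return the stripped text of each <td> cell found in the raw HTML."""
    parser = TdCollector()
    parser.feed(html)
    parser.close()
    return parser.cells


# Hypothetical static page: the cells are present in the HTML itself.
static_page = "<table><tr><td>Version</td><td>7.0.0</td></tr></table>"
# Hypothetical JavaScript-rendered page: the server sends only an empty
# container, and the table would be built later in the browser.
js_page = '<div id="app"></div><script src="app.js"></script>'

print(td_texts(static_page))  # ['Version', '7.0.0']
print(td_texts(js_page))      # []
```

Feeding it the string returned by `urllib.request.urlopen(...).read().decode('utf-8')` would show whether the KB page's HTML has any table cells before JavaScript runs.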

Another, more advanced, way is to look at the page source and at what is sent over the network (for example in the browser's developer tools).
Here I catch the JSON response. As a tip, use Requests rather than urllib.
import requests
from pprint import pprint

# JSON endpoint found by watching the page's network requests
url = 'https://kb.vmware.com/services/apexrest/v1/article?docid=2143832&lang=en_us'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'
}
response = requests.get(url, headers=headers)
response.raise_for_status()  # stop on HTTP errors instead of trying to parse an error page
data = response.json()
products = data['meta']['articleProducts']['relatedProducts']
versions = data['meta']['articleProducts']['relatedVersions']
pprint(products)
print('-' * 30)
pprint(versions)
Output:
['VMware vSphere ESXi',
 'VMware vSphere ESX',
 'VMware vSphere',
 'VMware ESXi',
 'VMware ESX Server',
 'VMware ESX']
------------------------------
['VMware vSphere ESXi 7.0.0',
 'VMware vSphere ESXi 6.7',
 'VMware vSphere ESXi 6.5',
 'VMware vSphere ESXi 6.0',
 'VMware vSphere ESXi 5.5',
 'VMware vSphere ESXi 5.1',
 'VMware vSphere ESXi 5.0',
 'VMware vSphere ESX 4.x',
 'VMware ESXi 4.1.x Installable',
 'VMware ESXi 4.1.x Embedded',
 'VMware ESXi 4.0.x Installable',
 'VMware ESXi 4.0.x Embedded',
 'VMware ESX Server 3.5.x',
 'VMware ESX Server 3.0.x',
 'VMware ESX Server 2.5.x',
 'VMware ESX Server 2.1.x',
 'VMware ESX Server 2.0.x',
 'VMware ESX Server 1.x',
 'VMware ESX Server 1.5.x']
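If you would rather stay with the standard library, the same JSON endpoint can be read with `urllib.request` plus the `json` module. This is a minimal sketch, assuming the endpoint answers a plain urllib request the same way it answers Requests; `fetch_article` and `products_and_versions` are hypothetical helper names, while the URL and the `meta` / `articleProducts` / `relatedProducts` / `relatedVersions` keys come from the code above.

```python
import json
import urllib.request

# JSON endpoint and User-Agent taken from the Requests example above.
API_URL = "https://kb.vmware.com/services/apexrest/v1/article?docid=2143832&lang=en_us"
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36"
}


def fetch_article(url=API_URL, timeout=10):
    """Download the article JSON with urllib and decode it into a dict."""
    request = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def products_and_versions(data):
    """Pull the product and version lists out of the decoded JSON."""
    article_products = data["meta"]["articleProducts"]
    return article_products["relatedProducts"], article_products["relatedVersions"]
```

Usage: `products, versions = products_and_versions(fetch_article())`. Keeping the download and the extraction in separate functions means the extraction can be tested against a saved copy of the JSON without touching the network.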
#3
Oh, exactly what I was looking for!
Thanks!
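Since the goal in the first post was a periodic update check, the `relatedVersions` list still has to be turned into something comparable. The sketch below picks the highest ESXi version out of entries worded like the output above and compares it with the version recorded on a previous run. The regex, `version_key`, `latest_esxi` and `is_update` are hypothetical helpers, and the sketch assumes entries keep the "ESXi <number>" wording. Where `last_seen` is stored between runs (a file, a database) is left open.

```python
import re

# Matches e.g. "VMware vSphere ESXi 7.0.0" -> "7.0.0" and
# "VMware ESXi 4.1.x Installable" -> "4.1" (the ".x" suffix is not captured).
ESXI_PATTERN = re.compile(r"ESXi (\d+(?:\.\d+)*)")


def version_key(text):
    """Turn '7.0.0' into (7, 0, 0) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def latest_esxi(versions):
    """Return the highest ESXi version number found in the list, or None."""
    found = []
    for entry in versions:
        match = ESXI_PATTERN.search(entry)
        if match:
            found.append(match.group(1))
    return max(found, key=version_key, default=None)


def is_update(latest, last_seen):
    """True when `latest` is newer than the version recorded on the previous run."""
    if latest is None:
        return False
    if last_seen is None:
        return True
    return version_key(latest) > version_key(last_seen)
```

Comparing tuples of integers avoids the string-comparison trap where "10.0" would sort before "6.7".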