Python Forum
How to extract links from grid located on webpage
#1
Hello,

How can I extract the links from the grid on this page:
Une sélection de concerts électroniques et électrisants (A selection of electronic and electrifying concerts)

I tried several different locators with Selenium, but nothing works.
Thanks in advance.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

url = 'https://www.arte.tv/fr/videos/RC-019798/electro-chillout/'
pattern = '.css-zqu0w1'
pattern_class = 'css-1tqpy7w'
pattern_class = 'css-1wbmdb2'
pattern_css = 'div.css-1tqpy7w:nth-child(1) > a:nth-child(1)'
pattern_class1 = 'css-1wbmdb2 [herf]'
#div.css-1tqpy7w:nth-child(2) > a:nth-child(1)
pattern_id = 'teaserItemLink'

options = Options()
options.add_argument("--headless")

driver = webdriver.Chrome(options=options)
driver.get(url)
grid = driver.find_elements(By.CLASS_NAME, pattern_class)
for item in grid:
    print(item.text)

aaa = driver.find_elements(By.CLASS_NAME, pattern_class1)
print(aaa)

bbb = driver.find_elements(By.ID, pattern_id)
print(bbb)
#2
Like this: copy the CSS selector from the browser's developer tools to get the correct selector.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

#--| Setup
options = Options()
options.add_argument("--headless")
ser = Service(r"C:\cmder\bin\chromedriver.exe")
browser = webdriver.Chrome(service=ser, options=options)
#--| Parse or automation
url = 'https://www.arte.tv/fr/videos/RC-019798/electro-chillout/'
browser.get(url)
time.sleep(2)
link_1 = browser.find_element(By.CSS_SELECTOR, '#__next > div > main > div.css-pb7yb6 > div > div:nth-child(1)')
>>> print(link_1.text)
Regarder Superpoze ARTE Concert Festival 2022 52 min
52 min
Superpoze
ARTE Concert Festival 2022
Link 2 uses the same selector, ending in div:nth-child(2):
>>> print(link_2.text)
Regarder La Fine Equipe Fête de l’Humanité 2020 60 min
60 min
La Fine Equipe
Fête de l’Humanité 2020
#3
What I'm looking for are links ... not text:
[Image: arte-links-in-grid.jpg]
#4
Use get_attribute() to get the href attribute.
link = browser.find_element(By.CSS_SELECTOR, '#__next > div > main > div.css-pb7yb6 > div > div:nth-child(1) > a')
>>> link.get_attribute('href')
'https://www.arte.tv/fr/videos/110984-006-A/superpoze/'
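
To collect every link in the grid rather than one nth-child at a time, the same get_attribute('href') call can be applied to all matching anchors. Below is a minimal sketch, assuming the grid anchors sit under '#__next main' (taken from the selector above). The helper names hrefs_from and grid_links are hypothetical, and the selector may need adjusting if the page layout changes.

```python
def hrefs_from(elements):
    """Return the unique, non-empty href values of the elements, in page order."""
    seen = set()
    hrefs = []
    for element in elements:
        href = element.get_attribute('href')
        # Skip anchors without an href and links already collected.
        if href and href not in seen:
            seen.add(href)
            hrefs.append(href)
    return hrefs


def grid_links(browser):
    """Collect the href of every anchor inside the main content area.

    Assumption: the grid anchors live under '#__next main', as in the
    selector used above.
    """
    # Imported here so hrefs_from() can be used without Selenium installed.
    from selenium.webdriver.common.by import By

    anchors = browser.find_elements(By.CSS_SELECTOR, '#__next main a')
    return hrefs_from(anchors)
```

With the browser object from post #2, looping over grid_links(browser) and printing each item should give one URL per line.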
#5
Thanks !
#6
It seems that you are trying to extract links from the given URL using Selenium and the Chrome WebDriver. However, there are some issues in the way you are trying to locate the elements.

Use By.CSS_SELECTOR to locate the elements. The class 'css-1wbmdb2' seems to be the one that contains the links. Then find all anchor elements (links) within this class and read their 'href' attribute to get the URLs.

Here is your updated code:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

url = 'https://www.arte.tv/fr/videos/RC-019798/electro-chillout/'

options = Options()
options.add_argument("--headless")

driver = webdriver.Chrome(options=options)
driver.get(url)

# Find all anchor elements within the specific class 'css-1wbmdb2'
links = driver.find_elements(By.CSS_SELECTOR, '.css-1wbmdb2 a')

# Extract and print the href attribute of each link
for link in links:
    href = link.get_attribute('href')
    print(href)

driver.quit()
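
Generated class names such as 'css-1wbmdb2' can change when the site is rebuilt, and the grid may not be rendered yet when find_elements runs. A hedged alternative is to wait explicitly for the anchors and keep only hrefs that look like video pages. This is only a sketch: the '/videos/' path test is an assumption based on the URLs seen in this thread, and is_video_link and wait_for_video_links are hypothetical names.

```python
from urllib.parse import urlparse


def is_video_link(href):
    """True if href is an arte.tv URL whose path contains '/videos/'."""
    if not href:
        return False
    parts = urlparse(href)
    host = parts.hostname or ''
    on_arte = host == 'arte.tv' or host.endswith('.arte.tv')
    return on_arte and '/videos/' in parts.path


def wait_for_video_links(driver, timeout=10):
    """Wait until matching anchors are present, then return unique video hrefs."""
    # Lazy imports keep is_video_link() usable without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # Assumption: video cards are plain anchors whose href contains '/videos/'.
    anchors = WebDriverWait(driver, timeout).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'a[href*="/videos/"]'))
    )
    hrefs = [a.get_attribute('href') for a in anchors]
    # dict.fromkeys removes duplicates while keeping page order.
    return list(dict.fromkeys(h for h in hrefs if is_video_link(h)))
```

Calling wait_for_video_links(driver) after driver.get(url) would replace the find_elements loop above. Links to other collection pages also contain '/videos/', so some extra filtering may be needed depending on what you want to keep.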