Python Forum
Exclude unwanted match-result
#1
Hello guys!
I ran into a situation that I think is a good learning case.
I get two matches when I call find_all on the page source loaded with Selenium.
I'm trying to scrape view counts from YouTube.

Here is my code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup

# Browser setup
options = Options()
options.add_argument("--headless")
#options.add_argument("--window-size=1980,1020")
# Selenium 4 takes the driver path through a Service object; the old
# executable_path= argument is deprecated and removed in later 4.x releases.
browser = webdriver.Chrome(service=Service(r'/usr/bin/chromedriver'), options=options)

url = "https://www.youtube.com/channel/UCwTrHPEglCkDz54iSg9ss9Q/videos"
browser.get(url)

# Click the Allow button on the YouTube consent page
consent = WebDriverWait(browser, 120).until(EC.element_to_be_clickable((By.XPATH, '//div[@class="VfPpkd-RLmnJb"]')))
consent.click()

# Hand the rendered HTML to BeautifulSoup, then close the browser
soup = BeautifulSoup(browser.page_source, 'lxml')
browser.quit()

# Both the view count and the upload date use this exact class string
views = soup.find_all('span', {'class': 'style-scope ytd-grid-video-renderer'})

for v in views:
    print("---------")
    print(v.text)
There are two spans with exactly the same markup on YouTube:
<span class="style-scope ytd-grid-video-renderer">14&nbsp;817 views</span>
...and...
<span class="style-scope ytd-grid-video-renderer">7 hours ago</span>
The output of my code:
---------
42K views
---------
1 month ago
---------
119K views
---------
1 month ago
...
...
...
I only want the view counts. Is there a way to exclude the second span (the "1 month ago" output)?

Thank you!
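A minimal offline sketch of the problem, assuming simplified markup modeled on the two spans quoted above (YouTube's real page is much larger and changes often). Because both spans carry the identical class string, find_all() returns both of them for every video. The filter at the end, which keeps only spans whose text ends with "views", is just one possible workaround and only works with an English UI.

```python
from bs4 import BeautifulSoup

# Simplified markup (an assumption), modeled on the two spans quoted above.
HTML = """
<div id="metadata-line">
  <span class="style-scope ytd-grid-video-renderer">42K views</span>
  <span class="style-scope ytd-grid-video-renderer">1 month ago</span>
</div>
<div id="metadata-line">
  <span class="style-scope ytd-grid-video-renderer">119K views</span>
  <span class="style-scope ytd-grid-video-renderer">1 month ago</span>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Both spans per video share the exact class string, so all four match.
spans = soup.find_all("span", {"class": "style-scope ytd-grid-video-renderer"})

# One possible filter: keep only spans whose text ends with "views".
# This depends on the page language (English UI assumed).
views = [s.text for s in spans if s.text.strip().endswith("views")]
print(views)  # ['42K views', '119K views']
```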
#2
(Apr-25-2021, 01:57 PM) jimsxxl wrote: There are 2 exactly the same blocks of code on Youtube:
Use a CSS selector to get the exact tag. In the browser, first inspect the element, then right-click it and choose Copy ➡ Copy selector.
Then in BeautifulSoup, use select() or select_one().
view_count = soup.select_one('#metadata-line > span:nth-child(1)')
print(view_count.text)
Output (from a Norwegian-language page; "Sett 18k ganger" means "Viewed 18K times"):
Sett 18k ganger
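A small offline sketch of the difference between select_one() and select(), assuming simplified markup with one #metadata-line div per video (modeled on the selector above, not copied from YouTube). select_one() stops at the first video, while select() with the same selector returns the view count for every video and skips the date span.

```python
from bs4 import BeautifulSoup

# Simplified markup (an assumption): one #metadata-line div per video,
# view count first, upload date second.
HTML = """
<div id="metadata-line">
  <span class="style-scope ytd-grid-video-renderer">42K views</span>
  <span class="style-scope ytd-grid-video-renderer">1 month ago</span>
</div>
<div id="metadata-line">
  <span class="style-scope ytd-grid-video-renderer">119K views</span>
  <span class="style-scope ytd-grid-video-renderer">1 month ago</span>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
selector = "#metadata-line > span:nth-child(1)"

# select_one() returns only the first match: the first video's view count.
first = soup.select_one(selector)
print(first.text)  # 42K views

# select() returns every match: one view count per video.
all_views = [span.text for span in soup.select(selector)]
print(all_views)  # ['42K views', '119K views']
```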
#3
(Apr-25-2021, 04:36 PM) snippsat wrote: Use a CSS selector to get the exact tag [...]

Wow, thank you snippsat!
I'm learning a lot; thanks for all the help.

I also used .replace() to remove the text "views":
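# title and views: lists of tags scraped earlier in the script (not shown)
for t, v in zip(title, views):
    link = t['href']                              # Relative video URL from the title link
    view = v.text.replace("views", "").strip()    # e.g. "42K views" -> "42K"

    print("Title:", t.text, "| URL: https://www.youtube.com" + link, "| Views:", view)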
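If you need the counts as numbers rather than strings, a hypothetical parse_views() helper could look like the sketch below. It assumes English UI text with an optional K/M/B suffix, and it also strips the no-break space seen in "14&nbsp;817 views" earlier in the thread. Any other format raises ValueError.

```python
import re

# Multipliers for the abbreviated suffixes assumed in English UI text.
MULTIPLIERS = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_views(text: str) -> int:
    """Hypothetical helper: convert a view-count string to an int.

    Assumed inputs: "42K views", "1.2M views", or "14 817 views" where the
    separator is a no-break space.
    """
    # Remove the word "view"/"views" and all whitespace, including the
    # no-break space (\xa0) used as a thousands separator.
    cleaned = re.sub(r"views?|\s+", "", text)
    suffix = cleaned[-1:].upper()
    if suffix in MULTIPLIERS:
        # round() guards against a float product landing just below a whole number.
        return round(float(cleaned[:-1]) * MULTIPLIERS[suffix])
    # Plain number, possibly with comma thousands separators.
    return int(cleaned.replace(",", ""))


for raw in ["42K views", "119K views", "14\xa0817 views"]:
    print(repr(raw), "->", parse_views(raw))
```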