Exclude unwanted match-result

jimsxxl · Apr-25-2021, 01:57 PM

Hello guys!
I ran into a situation that I think is a good one to learn from.
I get two matches when I use find_all() on the page I load with Selenium.
I'm trying to scrape view counts from YouTube.

Here is my code:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
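from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup
 
# Browser setup
options = Options()
options.add_argument("--headless")
#options.add_argument("--window-size=1980,1020")
# Selenium 4 takes the chromedriver path through a Service object (executable_path was deprecated and later removed)
browser = webdriver.Chrome(service=Service(r'/usr/bin/chromedriver'), options=options)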

url = "https://www.youtube.com/channel/UCwTrHPEglCkDz54iSg9ss9Q/videos"
browser.get(url)

# Click Allow-button on Youtube consent-page
consent = WebDriverWait(browser, 120).until(EC.element_to_be_clickable((By.XPATH, '//div[@class="VfPpkd-RLmnJb"]')))
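consent.click()

# Wait for the video grid to render before handing the page source to BeautifulSoup
WebDriverWait(browser, 30).until(EC.presence_of_element_located((By.TAG_NAME, 'ytd-grid-video-renderer')))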

# Send to BS
soup = BeautifulSoup(browser.page_source, 'lxml')
views = soup.find_all('span', {'class':'style-scope ytd-grid-video-renderer'})

for v in views:
    print("---------")
    print(v.text)

There are two spans with exactly the same class on YouTube:

<span class="style-scope ytd-grid-video-renderer">14&nbsp;817 views</span>
...and...
<span class="style-scope ytd-grid-video-renderer">7 hours ago</span>

The output of my code:

---------
42K views
---------
1 month ago
---------
119K views
---------
1 month ago
...
...
...

I only want the view counts. Is there a way to exclude the second match (the "1 month ago" output)?

Thank you!
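A minimal sketch of two ways to filter the find_all() results without changing the selector. It runs against a hypothetical static fragment built from the spans quoted above, not the live page. Slicing assumes the spans always alternate views/age; the text check assumes an English locale where the count ends in "views".

```python
from bs4 import BeautifulSoup

# Hypothetical sample: two same-class spans per video, as quoted in the question.
html = """
<span class="style-scope ytd-grid-video-renderer">42K views</span>
<span class="style-scope ytd-grid-video-renderer">1 month ago</span>
<span class="style-scope ytd-grid-video-renderer">119K views</span>
<span class="style-scope ytd-grid-video-renderer">1 month ago</span>
"""

soup = BeautifulSoup(html, "html.parser")
spans = soup.find_all("span", {"class": "style-scope ytd-grid-video-renderer"})

# Option 1: spans alternate views/age, so keep every second one starting at index 0.
by_position = [s.get_text(strip=True) for s in spans[::2]]

# Option 2: keep only spans whose text ends with "views" (assumes an English locale).
by_text = [s.get_text(strip=True) for s in spans if s.get_text(strip=True).endswith("views")]

print(by_position)  # ['42K views', '119K views']
print(by_text)      # ['42K views', '119K views']
```

The text check is more robust if YouTube ever adds a third span, but it breaks on other languages; the slice is locale-independent but relies on the ordering.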

snippsat · Apr-25-2021, 04:36 PM

(Apr-25-2021, 01:57 PM)jimsxxl Wrote: There are 2 exactly the same blocks of code on Youtube:

Use a CSS selector to get the exact tag. In the browser, inspect the element, then right-click it and choose Copy ➡ Copy selector.
In BeautifulSoup you then use select() or select_one().

view_count = soup.select_one('#metadata-line > span:nth-child(1)')
print(view_count.text)

Output (Norwegian locale; "Sett 18k ganger" means "Viewed 18k times"):
Sett 18k ganger
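select_one() stops at the first video; to get the views for every video, the same selector can be passed to select(). This is a sketch against a hypothetical fragment. It assumes each video's metadata sits in a `#metadata-line` div whose first span is the view count, which is what the selector above implies but may not match YouTube's current markup. BeautifulSoup's select() returns every element carrying that id, even though the id repeats once per video.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modelled on the thread: one #metadata-line div per video,
# each holding two spans with the same class (views first, then age).
html = """
<div id="metadata-line">
  <span class="style-scope ytd-grid-video-renderer">42K views</span>
  <span class="style-scope ytd-grid-video-renderer">1 month ago</span>
</div>
<div id="metadata-line">
  <span class="style-scope ytd-grid-video-renderer">119K views</span>
  <span class="style-scope ytd-grid-video-renderer">1 month ago</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# span:nth-child(1) keeps only the first element child of each #metadata-line.
view_counts = [span.get_text(strip=True) for span in soup.select("#metadata-line > span:nth-child(1)")]
print(view_counts)  # ['42K views', '119K views']
```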

jimsxxl · Apr-25-2021, 05:45 PM

Wow, thank you snippsat!
I'm learning a lot, thanks for all the help.

I also used .replace() to remove the text "views":

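# title: the video-title <a> tags collected earlier in the script (not shown in this post)
for t, v in zip(title, views):
    link = t['href']                                # Relative video URL, e.g. /watch?v=...
    view = v.text.replace("views", "").strip()      # Drop the word "views" and surrounding spaces

    print("Title:", t.text.strip(), "| URL: https://www.youtube.com" + link, "| Views:", view)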
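If you need the counts as numbers rather than text, here is a hypothetical helper, parse_views(), that goes one step further than .replace(). It assumes English labels with an optional K/M/B suffix and digit groups separated by spaces, non-breaking spaces (as in the `14&nbsp;817 views` markup above) or commas. Abbreviated labels like "42K" are already rounded by YouTube, so the result is approximate.

```python
def parse_views(text):
    """Convert a YouTube-style view label such as '42K views' to an int.

    Hypothetical helper; assumes English labels with an optional K/M/B suffix.
    """
    number = text.lower().replace("views", "").replace("view", "").strip()
    # Remove digit-grouping separators: space, non-breaking space (U+00A0), comma
    for sep in (" ", "\u00a0", ","):
        number = number.replace(sep, "")
    multipliers = {"k": 1_000, "m": 1_000_000, "b": 1_000_000_000}
    if number and number[-1] in multipliers:
        # round() absorbs float error, e.g. 1.2 * 1_000_000
        return round(float(number[:-1]) * multipliers[number[-1]])
    return int(number)


for label in ["42K views", "119K views", "14\u00a0817 views", "1.2M views"]:
    print(repr(label), "->", parse_views(label))
```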
