Python Forum
Make good dataset from youtube video


Make good dataset from youtube video - constantin01 - Feb-21-2020

Hi!

I would like to collect about 200-300 videos from YouTube in which numbers are spoken aloud (like 1500, 212431, 32121, etc.).

But I want the collection to be highly diverse: extracts from news, talk shows, movies, blogs, etc.

I can find video ids with a plain query, like this:

from youtube_transcript_api import YouTubeTranscriptApi as yt_trns
from youtube_api import YouTubeDataAPI
searches = yt.search(q="how to make a cake", max_results=5)
But how can I pick just random videos? Is it possible to do this automatically? Perhaps there are channels that collect such videos for machine learning?


RE: Make good dataset from youtube video - snippsat - Feb-21-2020

Some code is missing: yt has to be defined somewhere before yt.search can be called ;)
To take a random sample from what has already been found:
from youtube_transcript_api import YouTubeTranscriptApi as yt_trns
from youtube_api import YouTubeDataAPI
import random

api_key = 'xxxxxxxxxxxxxx'
yt = YouTubeDataAPI(api_key)
searches = yt.search(q="how to make a cake", max_results=10)
rand_vid = random.sample(searches, 2)
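A single query tends to return similar videos, so one way to get more variety is to run one query per kind of source and take a small random sample from each. This is only a minimal sketch: the query strings are made up for illustration, and `search` is assumed to be any callable that behaves like yt.search (it is passed in so the function can be tried without an API key).

```python
import random

def sample_across_queries(search, queries, per_query=2, max_results=10, seed=None):
    """Run each query through `search` and pick a few random results from each."""
    rng = random.Random(seed)  # a seed makes the sample reproducible
    picked = []
    for q in queries:
        results = list(search(q=q, max_results=max_results))
        # random.sample raises ValueError if k > len(results), so clamp k
        k = min(per_query, len(results))
        picked.extend(rng.sample(results, k))
    return picked

# Hypothetical queries, one per kind of source
queries = ["news numbers", "talk show numbers", "movie scene counting"]
# With a real client: rand_vids = sample_across_queries(yt.search, queries)
```

Keeping the sampling per query, rather than pooling everything first, stops one query with many hits from crowding out the others.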
Looking deeper, you can also change the source. Right now the search uses order_by="relevance".
The definition starts at line 590 in youtube_api.py:
def search(self, q=None, channel_id=None,
           max_results=5, order_by="relevance", next_page_token=None,
           published_after=datetime.datetime(2000,1,1),
           published_before=datetime.datetime(3000,1,1),
           location=None, location_radius='1km', region_code=None,
           safe_search=None, relevance_language=None, event_type=None,
           topic_id=None, video_duration=None, search_type="video",
           parser=P.parse_rec_video_metadata, part=['snippet']

In the docstring below it:
:param order_by: Return search results ordered by either ``relevance``, ``date``, ``rating``, ``title``, ``videoCount``, ``viewCount``. 
So you can, e.g., change it to order_by="rating" and save the file; the search will then return different results.
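Since order_by is a keyword parameter in the signature above, it can also be passed at call time instead of editing the library file. The sketch below runs the same query under several orderings and merges the results without duplicates. It assumes each result is a dict with a "video_id" key (that key name is an assumption about the parser's output), and `search` is again a callable like yt.search.

```python
def search_with_orderings(search, q,
                          orderings=("relevance", "date", "rating", "viewCount"),
                          max_results=10):
    """Run one query under several orderings and merge the unique results."""
    seen = set()
    merged = []
    for order in orderings:
        for item in search(q=q, max_results=max_results, order_by=order):
            vid = item.get("video_id")  # assumed key name, see note above
            if vid is not None and vid not in seen:
                seen.add(vid)
                merged.append(item)
    return merged

# With a real client: videos = search_with_orderings(yt.search, "counting numbers")
```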

These libraries are wrappers around the real YouTube Data API, e.g. its search list method.
Under its order parameter you will find the same values: relevance, rating, date, etc.
The wrappers make things easier than using this big API directly, which is also possible; a demo follows.
Here is a direct call to the API, e.g. to find how many subscribers a channel has:
import requests

api_key = 'xxxxxxxxxxxxxx'
channel_id = 'UC5kS0l76kC0xOzMPtOmSFGw'
response = requests.get(f'https://www.googleapis.com/youtube/v3/channels?part=statistics&id={channel_id}&key={api_key}')
js = response.json()
print(js['items'][0]['statistics']['subscriberCount'])
Output:
275000
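The search itself can be called directly in the same way. The sketch below only builds the request URL for the search endpoint and pulls video ids out of a response, so it can be checked without a key. The helper names build_search_url and extract_video_ids are made up for this example; the query parameters (part, q, type, order, maxResults, key) are the ones search list documents.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://www.googleapis.com/youtube/v3/search"

def build_search_url(api_key, q, order="relevance", max_results=10):
    """Build a search list request URL (hypothetical helper)."""
    params = {
        "part": "snippet",
        "q": q,
        "type": "video",        # only videos, no channels or playlists
        "order": order,         # relevance, date, rating, title, videoCount, viewCount
        "maxResults": max_results,
        "key": api_key,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"

def extract_video_ids(js):
    """Collect the video ids from a search list JSON response."""
    return [item["id"]["videoId"]
            for item in js.get("items", [])
            if "videoId" in item.get("id", {})]

# With a real key:
# import requests
# js = requests.get(build_search_url(api_key, "counting numbers", order="rating")).json()
# print(extract_video_ids(js))
```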