Make good dataset from youtube video - Printable Version
Python Forum (https://python-forum.io), Forum: Data Science (https://python-forum.io/forum-44.html), Thread: Make good dataset from youtube video (/thread-24592.html)
Make good dataset from youtube video - constantin01 - Feb-21-2020

Hi! I would like to collect about 200-300 videos from YouTube in which numbers are pronounced (like 1500, 212431, 32121, etc.), but I want the collection to have high diversity: extracts from news, talk shows, movies, blogs, etc. I can find video IDs with a plain query, like

    from youtube_transcript_api import YouTubeTranscriptApi as yt_trns
    from youtube_api import YouTubeDataAPI
    searches = yt.search(q="how to make a cake", max_results=5)

but how can I pick just random videos? Is it possible to do this automatically? Perhaps there are some channels that collect such videos for machine learning?

RE: Make good dataset from youtube video - snippsat - Feb-21-2020

Some code is missing: yt, as used in yt.search, has to be defined somewhere. To take a random sample out of what has already been found:

    from youtube_transcript_api import YouTubeTranscriptApi as yt_trns
    from youtube_api import YouTubeDataAPI
    import random

    api_key = 'xxxxxxxxxxxxxx'
    yt = YouTubeDataAPI(api_key)
    searches = yt.search(q="how to make a cake", max_results=10)
    # Pick 2 of the 10 results at random
    rand_vid = random.sample(searches, 2)

Looking deeper, you can also change the source. Right now the search uses order_by="relevance"; the definition starts at line 590 in youtube_api.py:

    def search(self, q=None, channel_id=None, max_results=5, order_by="relevance", next_page_token=None,
               published_after=datetime.datetime(2000,1,1), published_before=datetime.datetime(3000,1,1),
               location=None, location_radius='1km', region_code=None, safe_search=None,
               relevance_language=None, event_type=None, topic_id=None, video_duration=None,
               search_type="video", parser=P.parse_rec_video_metadata, part=['snippet']

In the docstring, under:

    :param order_by: Return search results ordered by either ``relevance``, ``date``, ``rating``, ``title``, ``videoCount``, ``viewCount``.

So you can, e.g., change it to order_by="rating" and save; then the search will be different. These libraries are wrappers around the real API, e.g. search list.
Here you will find, under order, the same parameters: relevance, rating, date, etc. The wrappers make things easier than using this big API directly, which is also possible; a demo is below. It calls the API directly, e.g. to find how many subscribers a channel has.

    import requests

    api_key = 'xxxxxxxxxxxxxx'
    channel_id = 'UC5kS0l76kC0xOzMPtOmSFGw'
    # Ask the channels endpoint for the statistics part of one channel
    response = requests.get(f'https://www.googleapis.com/youtube/v3/channels?part=statistics&id={channel_id}&key={api_key}')
    js = response.json()
    # The subscriber count sits in the first (and only) returned item
    print(js['items'][0]['statistics']['subscriberCount'])
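
To get closer to the "random but diverse" sample asked for in the first post, one option is to pool results from several queries and several orderings and then sample from that pool. Since order_by is a keyword argument in the search signature quoted above, it can be passed per call without editing the library source. This is only a rough sketch: the function name sample_diverse_videos, the example queries, and the assumption that each result is a dict with a "video_id" key are mine, not from the library docs, and fake_search is an offline stand-in so the snippet runs without an API key.

```python
import random


def sample_diverse_videos(search_fn, queries, orderings,
                          per_query=10, sample_size=5, seed=None):
    """Pool search results over queries and orderings, then sample at random.

    search_fn is assumed to accept q, max_results and order_by keyword
    arguments (like yt.search) and to return a list of dicts that each
    have a "video_id" key (an assumption about the result format).
    """
    rng = random.Random(seed)
    pool = {}
    for query in queries:
        for order in orderings:
            for item in search_fn(q=query, max_results=per_query, order_by=order):
                # Keep only the first occurrence of each video
                pool.setdefault(item["video_id"], item)
    ids = sorted(pool)
    # Never ask for more items than the pool holds
    chosen = rng.sample(ids, min(sample_size, len(ids)))
    return [pool[video_id] for video_id in chosen]


def fake_search(q, max_results, order_by):
    # Hypothetical offline stand-in for yt.search, for demonstration only
    return [{"video_id": f"{q}-{order_by}-{i}", "video_title": q}
            for i in range(max_results)]


picked = sample_diverse_videos(
    fake_search,
    queries=["news numbers", "talk show numbers"],
    orderings=["relevance", "date"],
    per_query=3,
    sample_size=4,
    seed=0,
)
print([video["video_id"] for video in picked])
```

With a real client you would pass yt.search in place of fake_search. Mixing queries per genre (news, talk shows, movies, blogs) is what gives the diversity; the random sampling only avoids always taking the top hits.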