Youtube video transcripts
Python Forum (https://python-forum.io) > Forum: Python Coding (https://python-forum.io/forum-7.html) > Forum: General Coding Help (https://python-forum.io/forum-8.html) > Thread: Youtube video transcripts (/thread-24459.html)
Youtube video transcripts - jehoshua - Feb-15-2020

Have been testing a Python API that can fetch transcripts from Youtube videos - https://github.com/jdepoix/youtube-transcript-api

Some test scripts based on the documentation.

    #!/usr/bin/python
    from youtube_transcript_api import YouTubeTranscriptApi

    video_id = 'nTg6Rqlz6ts'

    # retrieve the available transcripts
    transcript_list = YouTubeTranscriptApi.get_transcript(video_id)

    # iterate over all available transcripts
    for transcript in transcript_list:

        # the Transcript object provides metadata properties
        print(
            transcript.video_id,
            transcript.language,
            transcript.language_code,
            # whether it has been manually created or generated by YouTube
            transcript.is_generated,
            # whether this transcript can be translated or not
            transcript.is_translatable,
            # a list of languages the transcript can be translated to
            transcript.translation_languages,
        )

        # fetch the actual transcript data
        print(transcript.fetch())

        # translating the transcript will return another transcript object
        print(transcript.translate('en').fetch())

    # you can also directly filter for the language you are looking for, using the transcript list
    transcript = transcript_list.find_transcript(['de', 'en'])

    # or just filter for manually created transcripts
    transcript = transcript_list.find_manually_created_transcript(['de', 'en'])

    # or automatically generated ones
    transcript = transcript_list.find_generated_transcript(['de', 'en'])

When I run it I get this error message

Quote: $ python3 test6.py

and the second script is very basic ..

    #!/usr/bin/python
    from youtube_transcript_api import YouTubeTranscriptApi

    video_id = 'nTg6Rqlz6ts'

    transcript_list = YouTubeTranscriptApi.get_transcript(video_id)

    print(transcript_list[0])
    print(transcript_list[1])
    print(transcript_list[2])

and returns ..

Quote: $ python3 test7.py

1. Can you please advise what is causing the error in the first script?
2. How can the 2nd script be modified to search through the transcript for specific keyword/s?
There is a search example at https://stackoverflow.com/questions/54283003/how-to-get-maxresults-for-search-from-youtube-data-api-v3 . It searches on the video title and works okay. Just referencing that as it may help when adding code to the 2nd script to include a search on the transcription.

RE: Youtube video transcripts - DeaD_EyE - Feb-15-2020

The key video_id is not in transcript. Replace line 14:

    transcript.video_id,

with

    video_id,

Then try again. If the next error occurs, you know that this attribute is also not provided by the current transcript element. By the way, video_id is always the same for the different translations of a transcript.
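A note on why the attribute lookups fail, and one way to approach the keyword question. As far as the library's README of that period describes it, get_transcript returns a plain list of dictionaries (one per caption line, with the keys text, start and duration), whereas the Transcript objects that carry attributes such as video_id and language come from list_transcripts. The sketch below does not contact YouTube at all: sample_entries is made-up data in that assumed shape, and find_keyword is a hypothetical helper, not part of the library.

```python
# Made-up sample in the shape get_transcript() is assumed to return:
# a list of dicts with 'text', 'start' and 'duration' keys.
sample_entries = [
    {"text": "Welcome to the channel", "start": 0.0, "duration": 2.5},
    {"text": "Today we look at Python", "start": 2.5, "duration": 3.0},
    {"text": "python makes this easy", "start": 5.5, "duration": 2.0},
]


def find_keyword(entries, keyword):
    """Return the entries whose text contains keyword (case-insensitive)."""
    needle = keyword.lower()
    return [entry for entry in entries if needle in entry["text"].lower()]


# Dicts are read by key, not by attribute, so entry.video_id raises
# AttributeError while entry["text"] works.
first = sample_entries[0]
has_attribute = hasattr(first, "video_id")

# Print the start time and text of every caption line that mentions the keyword.
matches = find_keyword(sample_entries, "python")
for entry in matches:
    print(f"{entry['start']:>7.1f}s  {entry['text']}")
```

With real data, the list returned by get_transcript(video_id) would take the place of sample_entries, provided the installed version of the package still returns that shape; newer releases may expose a different interface.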
RE: Youtube video transcripts - jehoshua - Feb-15-2020

(Feb-15-2020, 10:03 AM) DeaD_EyE Wrote: The key

Thanks for your reply. Yes, that fixed it, but I also had to remove all of those other attributes. Possibly I need to make sure that the package has those attributes.

Have been adding extra code to the other script so that it writes to a file as well as printing; it helps to understand the data coming back.

    #!/usr/bin/python
    from youtube_transcript_api import YouTubeTranscriptApi

    video_id = 'My3QHayBBfQ'

    transcript_list = YouTubeTranscriptApi.get_transcript(video_id)

    print(transcript_list[0])
    print(transcript_list[1])
    print(transcript_list[2])

    with open('your_file.txt', 'w') as f:
        for item in transcript_list:
            f.write("%s\n" % item)

Is there a debug feature in Python? I'm used to stepping through code and examining variables, etc.
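On the debugging question: Python ships with a command-line debugger, pdb, and since Python 3.7 the built-in breakpoint() drops into it at the line where it is called. The sketch below is a variation on the file-writing script above, with made-up entries standing in for the API result so that it runs offline; dump_entries and the output path are hypothetical names chosen for illustration.

```python
import os
import tempfile


def dump_entries(entries, path):
    """Write one entry per line to path and return the number written."""
    count = 0
    with open(path, "w") as f:
        for item in entries:
            # Uncomment the next line to pause here and inspect `item`
            # in pdb (n = next line, p item = print item, c = continue).
            # breakpoint()
            f.write("%s\n" % item)
            count += 1
    return count


# Made-up entries standing in for the list returned by the API.
entries = [
    {"text": "first line", "start": 0.0, "duration": 1.5},
    {"text": "second line", "start": 1.5, "duration": 2.0},
]

out_path = os.path.join(tempfile.gettempdir(), "your_file.txt")
written = dump_entries(entries, out_path)
print(written, "entries written to", out_path)
```

Alternatively, running python3 -m pdb test7.py starts the whole script under the debugger without editing it, which is closer to stepping through from the first line.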