Python Forum

Full Version: How to access text files, hidden behind 'm3u8' resources
Hello,

yt-dlp does not differentiate between two subtitle tracks of the same language: when I specify a language code (e.g. 'fr'), it takes by default the file that is, let's say, "for the deaf", where all dialogue is subtitled, not just the dialogue spoken in languages other than French.
When exploring the video resource info (with the same yt-dlp), we can tell the two versions of the same language apart. They are hidden behind 'm3u8' resources. Is anyone aware of a method to make use of these 'm3u8' resources?
Here is an example:
import yt_dlp

link = 'https://www.arte.tv/fr/videos/107115-001-A/la-fille-de-kiev-1-6/'
ydl_opts = {}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    # Only extract the metadata, do not download the video
    info = ydl.extract_info(link, download=False)

# Print every field of each French ('fr') subtitle entry
for item in info.get('subtitles', {}).get('fr', []):
    for key, value in item.items():
        print(f'{key:<20}{value}')
Here is the output:
Output:
(env_video_dnld) pavel@MISSURI:~/env_video_dnld$ python check_subtitles.py
[ArteTV] Extracting URL: https://www.arte.tv/fr/videos/107115-001-A/la-fille-de-kiev-1-6/
[ArteTV] 107115-001-A: Downloading JSON metadata
[ArteTV] 107115-001-A: Downloading m3u8 information
[ArteTV] 107115-001-A: Downloading m3u8 information
[ArteTV] 107115-001-A: Downloading m3u8 information
[ArteTV] 107115-001-A: Downloading m3u8 information
[ArteTV] 107115-001-A: Downloading m3u8 information
<class 'dict'>
url                 https://arte-cmafhls.akamaized.net/am/cmaf/107000/107100/107115-001-A/230214064237/medias/107115-001-A_st_VF-FRA.m3u8
ext                 vtt
protocol            m3u8_native
url                 https://arte-cmafhls.akamaized.net/am/cmaf/107000/107100/107115-001-A/230214064237/medias/107115-001-A_st_VO-FRA.m3u8
ext                 vtt
protocol            m3u8_native
(env_video_dnld) pavel@MISSURI:~/env_video_dnld$
Any suggestions?
Thanks.
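The two 'fr' entries in the output above differ only in the suffix of their URL (VF-FRA versus VO-FRA). A minimal sketch of picking one of them by that suffix follows. It assumes the suffix reliably identifies the variant, which the thread does not confirm, and it does not say which of the two is the version for the hearing impaired. The helper name pick_track is hypothetical, and the list is copied from the output rather than fetched.

```python
# Entries copied from the output above. In a real script this list would be
# info['subtitles']['fr'] as returned by extract_info().
BASE = ('https://arte-cmafhls.akamaized.net/am/cmaf/107000/107100/'
        '107115-001-A/230214064237/medias/')
fr_tracks = [
    {'url': BASE + '107115-001-A_st_VF-FRA.m3u8', 'ext': 'vtt', 'protocol': 'm3u8_native'},
    {'url': BASE + '107115-001-A_st_VO-FRA.m3u8', 'ext': 'vtt', 'protocol': 'm3u8_native'},
]


def pick_track(tracks, marker):
    """Return the first entry whose URL contains '_st_<marker>.', or None.

    Assumption: the '_st_<marker>' part of the file name identifies the variant.
    """
    needle = f'_st_{marker}.'
    for track in tracks:
        if needle in track.get('url', ''):
            return track
    return None


vo_track = pick_track(fr_tracks, 'VO-FRA')
print(vo_track['url'])
```

The returned entry still points at an 'm3u8' playlist, not at the subtitle text itself, so a second step is needed to read the playlist and fetch the file it lists.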
(Feb-19-2023, 11:06 AM) Pavel_47 wrote: Any suggestions?
There is no need to write your own code for this.
Look at the Subtitle Options section of the yt-dlp documentation.
# List the available subtitle formats
(arte_env) G:\1_youtube\arte_env
λ yt-dlp --list-subs --skip-download https://www.arte.tv/fr/videos/107115-001-A/la-fille-de-kiev-1-6
[ArteTV] 107115-001-A: Downloading JSON metadata
WARNING: [ArteTV] Video is geo restricted. Retrying extraction with fake IP 90.85.161.221 (FR) as X-Forwarded-For.
[ArteTV] 107115-001-A: Downloading JSON metadata
[ArteTV] 107115-001-A: Downloading m3u8 information
[ArteTV] 107115-001-A: Downloading m3u8 information
[ArteTV] 107115-001-A: Downloading m3u8 information
[ArteTV] 107115-001-A: Downloading m3u8 information
[ArteTV] 107115-001-A: Downloading m3u8 information
[info] Available subtitles for 107115-001-A:
Language Formats
fr       vtt, vtt
de       vtt, vtt, vtt

# Download the subtitles (.vtt format)
(arte_env) G:\1_youtube\arte_env
λ yt-dlp --write-sub --all-subs --skip-download https://www.arte.tv/fr/videos/107115-001-A/la-fille-de-kiev-1-6/
[ArteTV] 107115-001-A: Downloading JSON metadata
WARNING: [ArteTV] Video is geo restricted. Retrying extraction with fake IP 53.182.113.183 (DE) as X-Forwarded-For.
[ArteTV] 107115-001-A: Downloading JSON metadata
[ArteTV] 107115-001-A: Downloading m3u8 information
[ArteTV] 107115-001-A: Downloading m3u8 information
[ArteTV] 107115-001-A: Downloading m3u8 information
[ArteTV] 107115-001-A: Downloading m3u8 information
[ArteTV] 107115-001-A: Downloading m3u8 information
[info] 107115-001-A: Downloading 1 format(s): VF-STF-2109+VF-STF-program_audio_0-VF
[info] Writing video subtitles to: La fille de Kiev (1_6) [107115-001-A].fr.vtt
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 1
[download] Destination: La fille de Kiev (1_6) [107115-001-A].fr.vtt
[download] 100% of 55.96KiB in 00:00
[info] Writing video subtitles to: La fille de Kiev (1_6) [107115-001-A].de.vtt
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 1
[download] Destination: La fille de Kiev (1_6) [107115-001-A].de.vtt
[download] 100% of 98.16KiB in 00:00
That doesn't work!
This version of the 'fr' subtitles is for people with hearing impairment: all dialogue is subtitled, including the speech in French.
But there is another vtt file, which can be downloaded using the browser's "Web Developer Tools", where only the non-French dialogue is subtitled.
That file isn't accessible from yt-dlp.
yt-dlp will get what arte.tv makes available through their API.
If Web Developer Tools shows different subtitles, then you have to parse them yourself, which can be difficult.
Different videos will have more or fewer subtitles available.
(arte_env) G:\1_youtube\arte_env
λ yt-dlp --list-subs --skip-download https://www.arte.tv/en/videos/104026-000-A/the-old-woman-and-the-lake/
[ArteTV] 104026-000-A: Downloading JSON metadata
[ArteTV] 104026-000-A: Downloading m3u8 information
[ArteTV] 104026-000-A: Downloading m3u8 information
[ArteTV] 104026-000-A: Downloading m3u8 information
[ArteTV] 104026-000-A: Downloading m3u8 information
[ArteTV] 104026-000-A: Downloading m3u8 information
[ArteTV] 104026-000-A: Downloading m3u8 information
[ArteTV] 104026-000-A: Downloading m3u8 information
[info] Available subtitles for 104026-000-A:
Language Formats
en       vtt
fr       vtt
de       vtt
es       vtt
pl       vtt
it       vtt
The link you mentioned in your example is, let's say, a "SIMPLE" link, in the sense that there is ONLY ONE version of subtitles for each language.
But on ARTE there are also "COMPLICATED" links, where TWO versions of subtitles are associated with ONE language.
Here is an example of a "COMPLICATED" link:
Output:
(env_video_dnld) pavel@MISSURI:~/env_video_dnld$ yt-dlp --list-subs https://www.arte.tv/fr/videos/107115-001-A/la-fille-de-kiev-1-6/
[ArteTV] Extracting URL: https://www.arte.tv/fr/videos/107115-001-A/la-fille-de-kiev-1-6/
[ArteTV] 107115-001-A: Downloading JSON metadata
[ArteTV] 107115-001-A: Downloading m3u8 information
[ArteTV] 107115-001-A: Downloading m3u8 information
[ArteTV] 107115-001-A: Downloading m3u8 information
[ArteTV] 107115-001-A: Downloading m3u8 information
[ArteTV] 107115-001-A: Downloading m3u8 information
[info] Available subtitles for 107115-001-A:
Language Formats
fr       vtt, vtt
de       vtt, vtt
de_sdh   vtt
(env_video_dnld) pavel@MISSURI:~/env_video_dnld$
As you can see, there are 2 versions of subtitles for French and for German.
yt-dlp can't distinguish between the two versions.
That's why I asked whether there is another method of getting the subtitles (not based on yt-dlp) that can explore .m3u8 links.
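One way to explore an .m3u8 link without yt-dlp is to read the playlist directly: in an HLS playlist, lines starting with '#' are tags, and every other non-empty line is the URI of a segment, possibly relative to the playlist URL. Below is a minimal sketch using only the standard library. The function names and the sample playlist are hypothetical, it has not been tried against arte.tv, and it ignores geo restriction and any headers the server may require.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def segment_urls(playlist_text, playlist_url):
    """Return the absolute URLs of the segments listed in an HLS playlist."""
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Lines starting with '#' are tags or comments; the rest are URIs.
        if line and not line.startswith('#'):
            urls.append(urljoin(playlist_url, line))
    return urls


def fetch_text(url):
    """Download a URL and decode the body as UTF-8 text."""
    with urlopen(url) as response:
        return response.read().decode('utf-8')


def download_subtitles(playlist_url):
    """Fetch every segment of a subtitle playlist and join the texts."""
    playlist = fetch_text(playlist_url)
    return '\n'.join(fetch_text(u) for u in segment_urls(playlist, playlist_url))


# Made-up playlist, only to illustrate the parsing step (no network access).
sample = """#EXTM3U
#EXT-X-TARGETDURATION:3000
#EXTINF:3000.0,
subs/example.vtt
#EXT-X-ENDLIST
"""
print(segment_urls(sample, 'https://example.com/medias/playlist.m3u8'))
```

Calling download_subtitles() with one of the two 'fr' URLs printed in the first post (for example the one ending in _st_VO-FRA.m3u8) would, under these assumptions, return the text of that vtt file. The yt-dlp log above reports "Total fragments: 1" for these playlists, so the simple join should be enough here; with several segments, each one would carry its own WEBVTT header and the result would need cleaning up.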