Python Forum
How to access text files, hidden behind 'm3u8' resources
#1
Hello,

The yt-dlp downloader doesn't differentiate between two subtitle tracks in the same language: when I specify a language code (e.g. 'fr'), it takes by default the file, let's say, "for the deaf", in which all dialogue is subtitled, not just the dialogue spoken in languages other than 'fr'.
When exploring the video resource info (with the same yt-dlp), we can distinguish the two versions of the same language. They are hidden behind some 'm3u8' resources. Is anyone aware of a method to exploit these 'm3u8' resources?
Here is an example:
import yt_dlp

link = 'https://www.arte.tv/fr/videos/107115-001-A/la-fille-de-kiev-1-6/'
ydl_opts = {}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    # Fetch the metadata only; do not download the video itself.
    info = ydl.extract_info(link, download=False)

# Print every field of each French ('fr') subtitle track.
for item in (info.get('subtitles') or {}).get('fr', []):
    for key, value in item.items():
        print(f'{key:<20}{value}')
Here is the output:
(env_video_dnld) pavel@MISSURI:~/env_video_dnld$ python check_subtitles.py
[ArteTV] Extracting URL: https://www.arte.tv/fr/videos/107115-001-A/la-fille-de-kiev-1-6/
[ArteTV] 107115-001-A: Downloading JSON metadata
[ArteTV] 107115-001-A: Downloading m3u8 information
[ArteTV] 107115-001-A: Downloading m3u8 information
[ArteTV] 107115-001-A: Downloading m3u8 information
[ArteTV] 107115-001-A: Downloading m3u8 information
[ArteTV] 107115-001-A: Downloading m3u8 information
<class 'dict'>
url                 https://arte-cmafhls.akamaized.net/am/cmaf/107000/107100/107115-001-A/230214064237/medias/107115-001-A_st_VF-FRA.m3u8
ext                 vtt
protocol            m3u8_native
url                 https://arte-cmafhls.akamaized.net/am/cmaf/107000/107100/107115-001-A/230214064237/medias/107115-001-A_st_VO-FRA.m3u8
ext                 vtt
protocol            m3u8_native
(env_video_dnld) pavel@MISSURI:~/env_video_dnld$
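For reference, an 'm3u8' file is a plain-text HLS playlist: lines starting with '#' are tags, and the remaining lines are URIs of the segments (here presumably small .vtt files). Below is a minimal sketch of one possible way to read such a playlist by hand, not a confirmed solution. The helper names (pick_track, segment_urls, fetch_text, download_subtitles) are hypothetical, the 'VO-FRA' / 'VF-FRA' markers are simply taken from the URLs in the output above, and the sketch assumes the URL points to a media playlist whose entries are the subtitle segments themselves.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def pick_track(tracks, marker):
    """Return the first subtitle entry whose URL contains `marker`, else None.

    `tracks` is a list of dicts like info['subtitles']['fr'];
    `marker` is a substring such as 'VO-FRA' (assumption based on the URLs).
    """
    for track in tracks:
        if marker in track.get('url', ''):
            return track
    return None


def segment_urls(playlist_text, playlist_url):
    """Return absolute URLs of the segments listed in an HLS playlist."""
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Lines starting with '#' are HLS tags or comments; the rest are URIs.
        if line and not line.startswith('#'):
            # Segment URIs may be relative to the playlist location.
            urls.append(urljoin(playlist_url, line))
    return urls


def fetch_text(url):
    """Download a URL and decode it as UTF-8 (needs network access)."""
    with urlopen(url) as response:
        return response.read().decode('utf-8')


def download_subtitles(playlist_url, out_path):
    """Fetch every segment of a subtitle playlist and write them to one file."""
    playlist = fetch_text(playlist_url)
    parts = [fetch_text(u) for u in segment_urls(playlist, playlist_url)]
    with open(out_path, 'w', encoding='utf-8') as f:
        f.write('\n'.join(parts))
```

With the info dict from the script above, this could be used as track = pick_track(info['subtitles']['fr'], 'VO-FRA') followed by download_subtitles(track['url'], 'subs_fr.vtt'). Note that if the playlist has several segments, each one starts with its own WEBVTT header, so the joined file may need those repeated headers removed before a player accepts it.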
Any suggestions?
Thanks.


Messages In This Thread
How to access text files, hidden behind 'm3u8' resources - by Pavel_47 - Feb-19-2023, 11:06 AM
