Python Forum
How to extract urls across multiple webpages at once?
#1
I am trying to download videos from a site, which requires extracting the one "download url" that is found on each "video url" page.

Example:

"video url": https://www.example.com/video/[string1]

"download url" (1 url on each video url): https://www.example.com/get_file/[string2]

Each "video url" has 1 "download url", so if I have 100 video urls, I will have 100 download urls.

There is one issue: the "download url" only becomes available on the "video url" page if you are signed in to the site. Is signing in on my default browser (Chrome) enough?

I want the code to read a list of video urls (.txt), then produce a list of download urls (.txt).
#2
Let's have a look at your video_urls.txt, at least a few lines, then people can see what needs to be done!
#3
(Jun-17-2024, 08:05 AM)ilovewacha Wrote: The "download url" only becomes available on the "video url" page if you are signed in to the site. Is signing in on my default browser (Chrome) enough?
No, being signed in in your browser is separate and has nothing to do with your Python code (the code has to do all the login work itself); this is the only tricky part of this task.
For this you can use e.g. Requests or Selenium (if the login is difficult).
To give an example using this site:
import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'
}

# Login form fields for this forum (inspect the login request to find them)
login_data = {
    "username": "your_username",
    "password": "xxxxxxx",
    "remember": "yes",
    "submit": "Login",
    "action": "do_login",
}

with requests.Session() as s:
    # Send the login form; the session keeps the cookies for all later requests
    s.post('https://python-forum.io/member.php?action=login', headers=headers, data=login_data)
    # Logged in! Cookies are sent automatically on every request made with s
    response = s.get('https://python-forum.io/index.php')
    soup = BeautifulSoup(response.content, 'lxml')
    # The welcome message only shows up for a logged-in user
    welcome = soup.find('span', class_="welcome").text
    print(welcome)
Output:
Welcome back, snippsat. You last visited: Today, 01:40 PM Log Out
A few points: I use Session so the login info doesn't get lost, and then I can parse data as a logged-in user.
You must test something similar first: log in (inspect the login request), then try to parse the download url that, as you say, only becomes available when logged in.
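Once the login works, the rest of what you ask for (read video urls from a .txt, visit each one with the same session, write the found download urls to a new .txt) could look something like the sketch below. The file names, the login url/fields and the way the download link is found (an <a> tag whose href contains get_file) are all assumptions; you have to inspect the real login request and a real video page to see what the site actually uses.

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'
}

# Assumed login endpoint and form fields -- inspect the site's real login request
login_data = {
    "username": "your_username",
    "password": "xxxxxxx",
}

with requests.Session() as s:
    # Log in first so the session cookies are set for all later requests
    s.post('https://www.example.com/login', headers=headers, data=login_data)
    # Read the video urls, one per line
    with open('video_urls.txt') as f:
        video_urls = [line.strip() for line in f if line.strip()]
    download_urls = []
    for url in video_urls:
        response = s.get(url, headers=headers)
        soup = BeautifulSoup(response.content, 'lxml')
        # Assumption: the download link is an <a> tag whose href contains "get_file"
        link = soup.find('a', href=lambda h: h and 'get_file' in h)
        if link:
            download_urls.append(link['href'])
    # Write one download url per line
    with open('download_urls.txt', 'w') as f:
        f.write('\n'.join(download_urls) + '\n')

If the site builds the download link with JavaScript, Requests will not see it and you have to switch to Selenium for that part.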