Python Forum
How to extract urls across multiple webpages at once?
#1
I am trying to download videos from a site, which requires extracting the one "download url" that is found on each "video url" page.

Example:

"video url": https://www.example.com/video/[string1]

"download url" (1 url on each video url): https://www.example.com/get_file/[string2]

Each "video url" has 1 "download url", so if I have 100 video urls, I will have 100 download urls.

There is one issue: the "download url" only becomes available on the "video url" page if you are signed in to the site. Is signing in on my default browser (Chrome) enough?

I want the code to read a list of video urls (.txt), then produce a list of download urls (.txt).
#2
Let's have a look at your video_urls.txt, at least a few lines, then people can see what needs to be done!
#3
(Jun-17-2024, 08:05 AM)ilovewacha Wrote: The "download url" only becomes available on the "video url" page if you are signed in to the site. Is signing in on my default browser (Chrome) enough?
No, being signed in in your browser is separate and has nothing to do with your Python code (the code has to do all the login work itself); this is the only tricky part of this task.
For this you can use e.g. Requests or Selenium (if the login is difficult).
To give an example using this site:
import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'
}

# Login form fields for this forum (inspect the login request to find them)
login_data = {
    "username": "your_username",
    "password": "xxxxxxx",
    "remember": "yes",
    "submit": "Login",
    "action": "do_login",
}

with requests.Session() as s:
    # Send the login form; the session keeps the cookies for all later requests
    s.post('https://python-forum.io/member.php?action=login', headers=headers, data=login_data)
    # Logged in! Cookies are sent automatically on every request made with s
    response = s.get('https://python-forum.io/index.php')
    soup = BeautifulSoup(response.content, 'lxml')
    # The welcome message only shows up for a logged-in user
    welcome = soup.find('span', class_="welcome").text
    print(welcome)
Output:
Welcome back, snippsat. You last visited: Today, 01:40 PM Log Out
A few points: I use Session so the login info doesn't get lost, and then I can parse data as a logged-in user.
You must test something similar first: log in (inspect the login request), then try to parse the download url that, as you say, only becomes available when logged in.
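Once the login works, the rest of what you ask for (read video urls from a .txt, visit each one with the same session, write the found download urls to a new .txt) could look something like the sketch below. The file names, the login url/fields and the way the download link is found (an <a> tag whose href contains get_file) are all assumptions; you have to inspect the real login request and a real video page to see what the site actually uses.

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'
}

# Assumed login endpoint and form fields -- inspect the site's real login request
login_data = {
    "username": "your_username",
    "password": "xxxxxxx",
}

with requests.Session() as s:
    # Log in first so the session cookies are set for all later requests
    s.post('https://www.example.com/login', headers=headers, data=login_data)
    # Read the video urls, one per line
    with open('video_urls.txt') as f:
        video_urls = [line.strip() for line in f if line.strip()]
    download_urls = []
    for url in video_urls:
        response = s.get(url, headers=headers)
        soup = BeautifulSoup(response.content, 'lxml')
        # Assumption: the download link is an <a> tag whose href contains "get_file"
        link = soup.find('a', href=lambda h: h and 'get_file' in h)
        if link:
            download_urls.append(link['href'])
    # Write one download url per line
    with open('download_urls.txt', 'w') as f:
        f.write('\n'.join(download_urls) + '\n')

If the site builds the download link with JavaScript, Requests will not see it and you have to switch to Selenium for that part.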