Python Forum
BeautifulSoup and pagination.
#1
I have a script that downloads MP3 files from a site. I don't know how to move on to the next page and download more music once everything on the current page has been downloaded. It seems to me that I should add some numbers to the URL, but I really cannot figure it out.
Here is my code:
#!/usr/bin/python3.4
#-*- coding: utf-8 -*-

from bs4 import BeautifulSoup
import requests
import os
from datetime import datetime


def main():

    start = datetime.now()    

    url = 'https://muzofond.org/search/e%20mantra'
    html = requests.get(url).text

    soup = BeautifulSoup(html, 'lxml')
    
    os.system('clear')
    print('-----' * 10)

    name = []
    href = []
    # Each track is an <a class="dl"> link: 'download' holds the file name, 'href' the MP3 URL
    for x in soup.find_all('a', class_='dl'):
        name.append(x.get('download'))
        href.append(x.get('href'))

    # Drop the first match, which this script treats as not being a track
    name.pop(0)
    href.pop(0)
    
    dirname = '/home/mikefromru/music/E-mantra/'

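    print('Download...')
    numberSong = len(name)
    # Download from the last link to the first, counting down as before
    for song_name, song_href in zip(reversed(name), reversed(href)):
        e_url = requests.get(song_href)
        # 'with' closes the file even if writing fails
        with open(os.path.join(dirname, song_name), 'wb') as f:
            f.write(e_url.content)
        print(numberSong, '-', song_name)
        numberSong -= 1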

    end = datetime.now()
    total = end - start
    print('The program was working for {} (h:mm:ss)'.format(str(total)))
    print('done')


if __name__ == '__main__':
    main()
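The script reads each response with `.content`, which loads the whole MP3 into memory before writing it. A minimal sketch of writing the body to disk in chunks instead, using the `iter_content` method of a requests `Response`. The helper name `save_response` and the 64 KiB chunk size are illustrative choices, not anything from the site or the thread:

```python
def save_response(response, path, chunk_size=64 * 1024):
    """Write an HTTP response body to path one chunk at a time.

    Accepts any object with an iter_content(chunk_size=...) method, such as a
    requests.Response fetched with stream=True, so only one chunk is held in
    memory at a time.
    """
    with open(path, 'wb') as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
    return path


# Usage with requests (needs network; url and path are placeholders):
#     with requests.get(url, stream=True, timeout=30) as r:
#         r.raise_for_status()
#         save_response(r, path)
```

Because the helper only relies on `iter_content`, it can be tried offline with a small fake response object.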
#2
You can do it in a loop with string formatting:
import time

for page in range(1,5):
    time.sleep(2)
    url = 'https://muzofond.org/search/e%20mantra/{}'.format(page)
    print(url)
Output:
https://muzofond.org/search/e%20mantra/1
https://muzofond.org/search/e%20mantra/2
https://muzofond.org/search/e%20mantra/3
https://muzofond.org/search/e%20mantra/4
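The loop above only builds the page URLs. A minimal sketch of combining it with the scraping step, assuming the site serves page N at `<search URL>/N` as in that loop, and assuming (not verified) that a page past the last one returns no `a.dl` links, so the loop can stop at the first empty page instead of hard-coding `range(1, 5)`. The names `iter_pages` and `get_links` are hypothetical:

```python
import time
from typing import Callable, Iterator, List, Tuple

# A (file name, MP3 URL) pair for one track.
Link = Tuple[str, str]


def iter_pages(
    base_url: str,
    get_links: Callable[[str], List[Link]],
    first_page: int = 1,
    max_pages: int = 50,
    delay: float = 0.0,
) -> Iterator[Link]:
    """Yield links page by page, stopping at the first page with no links.

    get_links is a caller-supplied (hypothetical) function that fetches one
    page and returns its (name, href) pairs. max_pages is a safety cap in case
    the site keeps answering out-of-range page numbers with real content.
    """
    for page in range(first_page, first_page + max_pages):
        url = '{}/{}'.format(base_url, page)
        links = get_links(url)
        if not links:
            break  # assumed end of the search results
        yield from links
        time.sleep(delay)  # pause between requests to go easy on the server


# Possible get_links built from the original script (needs network):
#     def get_links(url):
#         soup = BeautifulSoup(requests.get(url).text, 'lxml')
#         pairs = [(a.get('download'), a.get('href'))
#                  for a in soup.find_all('a', class_='dl')]
#         return pairs[1:]  # skip the first match, as the script does
#
#     for name, href in iter_pages('https://muzofond.org/search/e%20mantra',
#                                  get_links, delay=2):
#         ...  # download one file
```

Passing the page fetcher in as a function keeps the paging logic separate from the HTML parsing, so the stop condition can be checked without touching the site.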
If downloading more than one page is slow because each request blocks,
then it's a good task for practicing running downloads in parallel.
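A minimal sketch of running downloads in parallel with a thread pool from the standard library (`concurrent.futures.ThreadPoolExecutor`). Threads suit this job because the work is mostly waiting on the network. The function `download_all` and the `download_one(name, href)` callback it takes are hypothetical names; the callback is whatever fetches one URL and saves it under the given name:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, List, Tuple


def download_all(
    links: Iterable[Tuple[str, str]],
    download_one: Callable[[str, str], None],
    workers: int = 4,
) -> List[str]:
    """Call download_one(name, href) for every link on a small thread pool.

    Returns the names whose download raised an exception, so one bad file
    does not stop the rest.
    """
    failed = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each future back to its file name for reporting
        futures = {pool.submit(download_one, name, href): name
                   for name, href in links}
        for future in as_completed(futures):
            name = futures[future]
            try:
                future.result()  # re-raises any exception from the worker
            except Exception as exc:
                print('failed:', name, exc)
                failed.append(name)
            else:
                print('done:', name)
    return failed
```

Keeping `workers` small is deliberate: a handful of parallel requests already hides most of the network wait, and many more may get the client blocked by the site.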