I Want To Download Many Files Of Same File Extension With Either Wget Or Python
#1
I would like to download files of the same file types, .utu and .zip, from the following Microsoft Flight Simulator AI traffic websites:

http://web.archive.org/web/2005031511271....php?cat=6 (Current Repaints)

http://web.archive.org/web/2005031511294....php?cat=1 (Vintage Repaints)

On each of those pages there are subcategories for the AI aircraft types (Airbus, Boeing, etc.), and the repaint .zip file choices are shown when you click on an aircraft image.

The folder name then becomes http://web.archive.org/web/2004111419514...t=(number). Then, when you click the downloads, repaints.php? becomes download.php?fileid=(4-digit number).

What do I need to type to download all the .zip files at once? Clicking on them individually to download would take ages.

I would also like to download all files with the .utu extension, for Flight 1 Ultimate Traffic AI aircraft repaints, from the following webpage:

http://web.archive.org/web/2006051216123...=1&index=0

Then, when you click to download an Ultimate Traffic aircraft texture, the last part of the path becomes /utfiles.asp?mode=download&id=F1AIRepaint(Numbers-Numbers-Numbers).utu, and I would like to do the same as for the other websites.

I used the following code in Python 2.7.9, found in a YouTube video, inserting my own details to achieve my aim, but unsurprisingly it didn't work when I ran it (timeouts, errors, etc.), probably due to its simplicity:

import requests
from bs4 import BeautifulSoup
import wget

def download_links(url):
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "html.parser")
    for link in soup.findAll('a'):
        href = link.get('href')
        print(href)
        wget.download(href)

download_links('http://web.archive.org/web/20041225023002/http://www.projectai.com:80/libraries/acfiles.php?cat=6')
Traceback from running the code:

Error:
Traceback (most recent call last):
  File "C:\Users\Owner\Downloads\Desktop\Misc 2\Python Misc Download File Program\Project AI Files 1.py", line 13, in <module>
    download_links("http://web.archive.org/web/20041225023002/http://www.projectai.com:80/libraries/acfiles.php?cat=6")
  File "C:\Users\Owner\Downloads\Desktop\Misc 2\Python Misc Download File Program\Project AI Files 1.py", line 12, in download_links
    wget.download(href)
  File "C:\Python27\lib\site-packages\wget.py", line 526, in download
    (tmpfile, headers) = ulib.urlretrieve(binurl, tmpfile, callback)
  File "C:\Python27\lib\urllib.py", line 98, in urlretrieve
    return opener.retrieve(url, filename, reporthook, data)
  File "C:\Python27\lib\urllib.py", line 245, in retrieve
    fp = self.open(url, data)
  File "C:\Python27\lib\urllib.py", line 213, in open
    return getattr(self, name)(url)
  File "C:\Python27\lib\urllib.py", line 469, in open_file
    return self.open_local_file(url)
  File "C:\Python27\lib\urllib.py", line 479, in open_local_file
    localname = url2pathname(file)
  File "C:\Python27\lib\nturl2path.py", line 26, in url2pathname
    raise IOError, error
IOError: Bad URL: /web/20041225023002/http|//www.projectai.com|80/index.php
Any help would be much appreciated

Eddie
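
For reference, the traceback above shows what went wrong: the hrefs on the archived page are site-relative (e.g. /web/20041225023002/...), so wget.download() tries to open them as local Windows paths. A minimal sketch of one way around this, resolving each href against the page URL with urljoin and keeping only the download.php?fileid= links described in the post (the <fileid>.zip naming scheme is an assumption, and the category pages may need a second level of crawling to reach the aircraft pages first):

import requests
from bs4 import BeautifulSoup
try:
    from urlparse import urljoin        # Python 2, as used in this thread
except ImportError:
    from urllib.parse import urljoin    # Python 3

def download_zip_links(url):
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    for link in soup.find_all('a'):
        href = link.get('href')
        if not href:
            continue
        # Resolve site-relative hrefs against the page URL; passing them
        # straight to wget/urllib is what raised "IOError: Bad URL" above
        full_url = urljoin(url, href)
        # Keep only the actual download links (pattern quoted in the post)
        if 'download.php?fileid=' in full_url:
            # Assumed naming scheme: save as <fileid>.zip; the real name
            # could be read from the Content-Disposition header instead
            f_name = full_url.split('fileid=')[-1] + '.zip'
            with open(f_name, 'wb') as f:
                f.write(requests.get(full_url).content)
            print('saved ' + f_name)

download_zip_links('http://web.archive.org/web/20041225023002/http://www.projectai.com:80/libraries/acfiles.php?cat=6')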
#2
Put your code in Python code tags and the full error traceback in error tags. You can find help here.
#3
wget -r -np -A zip,utu -c -U Mozilla http://example.com
This should work: -r recurses into links, -np stays below the starting directory, -A zip,utu accepts only files with those extensions, -c resumes partial downloads, and -U Mozilla sets the User-Agent. Change to the desired destination directory before running it.
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
Reply
#4
Hi wavic, many thanks for your help. I tried out your suggestion with wget:

wget -e robots=off -r -np -A utu -c -U Mozilla http://web.archive.org/web/2007061123204...=1&index=0

But I got the following error messages:

Error:
HTTP request sent, awaiting response... 404 NOT FOUND
2018-05-19 12:45:27 ERROR 404: NOT FOUND
'index' is not recognized as an internal or external command, operable program or batch file.
Some of the .utu file downloads might have broken links; could that be what is causing this problem? How do I get round that?
Reply
#5
Try enclosing the address in quotes. The unquoted & in the URL is treated by the Windows shell as a command separator, which is why it complains that 'index' is not recognized as a command.
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
Reply
#6
How do I do that? What shall I type?
#7
Double quotes around the address.
"http://web.archive.org/web/20070611232047/http://ultimatetraffic.flight1.net:80/utfiles.asp?mode=1&index=0"
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
Reply
#8
I see, many thanks. I will try that and get back to you.

Hi wavic, I did what you suggested, and still no .utu files were downloaded, only a .tmp file.
#9
I tested it and got nothing. I see that the link to the file starts with tp://. I think that means transport protocol, and I am not sure whether wget can handle it.

Basically, this is how I download part of a website or a bunch of other files, but it doesn't work that way here. Perhaps some web scraping has to be involved.
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
Reply
#10
(May-19-2018, 10:55 AM)eddywinch82 Wrote: I used the following code in Python 2.7.9, found in a YouTube video, inserting my own details to achieve my aim, but unsurprisingly it didn't work when I ran it (timeouts, errors, etc.), probably due to its simplicity:
Yes, I guess there is some lack of understanding of this topic, and maybe of Python in general. ;)

The wget get-all-files method may or may not work.
If it doesn't work, go back and look at the site source for another method.
I took a look, and it's not so difficult to get all the .utu files.
from bs4 import BeautifulSoup
import requests

url = 'http://web.archive.org/web/20070611232047/http://ultimatetraffic.flight1.net:80/utfiles.asp?mode=1&index=0'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'lxml')
# Every download link on this page sits inside a <b> tag
b_tag = soup.find_all('b')
for a in b_tag:
    link = a.find('a')['href']
    #print(link)
    # The file name is everything after 'id=' in the query string,
    # e.g. utfiles.asp?mode=download&id=F1AIRepaint....utu
    f_name = link.split('id=')[-1]
    with open(f_name, 'wb') as f:
        f.write(requests.get(link).content)
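
If some of the archived links are dead (like the 404 earlier in the thread), a variant of the code above that checks the response before writing, and skips failures instead of crashing, might look like this (a sketch; the 30-second timeout and the status check are assumptions):

from bs4 import BeautifulSoup
import requests

url = 'http://web.archive.org/web/20070611232047/http://ultimatetraffic.flight1.net:80/utfiles.asp?mode=1&index=0'
soup = BeautifulSoup(requests.get(url).content, 'lxml')
for b in soup.find_all('b'):
    a = b.find('a')
    if a is None or not a.get('href'):
        continue  # <b> tag without a download link inside
    link = a['href']
    try:
        r = requests.get(link, timeout=30)
    except requests.RequestException as e:
        print('skipped %s (%s)' % (link, e))
        continue
    if r.status_code != 200:
        print('skipped %s (HTTP %s)' % (link, r.status_code))
        continue  # broken/dead archive link
    f_name = link.split('id=')[-1]
    with open(f_name, 'wb') as f:
        f.write(r.content)
    print('saved ' + f_name)

Checking status_code instead of calling raise_for_status() lets the loop carry on past individual dead files rather than stopping at the first one.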

