Python Forum

I Want To Download Many Files Of Same File Extension With Either Wget Or Python
I would like to download files of the same file types, .utu and .zip, from the following Microsoft Flight Simulator AI Traffic websites:

http://web.archive.org/web/2005031511271....php?cat=6 (Current Repaints)

http://web.archive.org/web/2005031511294....php?cat=1 (Vintage Repaints)

On each of those pages there are subcategories for Airbus, Boeing, etc. for the AI aircraft types, and the repaint .zip file choices are shown when you click on the aircraft image.

The folder name then becomes http://web.archive.org/web/2004111419514...t=(number). Then, when you click the downloads, repaints.php? becomes download.php?fileid=(4-digit number).

What do I need to type to download all the .zip files at once? Clicking on them individually to download would take ages.

I would also like to download all files with the .utu file extension, for the Flight 1 Ultimate Traffic AI aircraft repaints, from the following webpage:

http://web.archive.org/web/2006051216123...=1&index=0

Then, when you click to download the Ultimate Traffic aircraft texture, the last folder path becomes /utfiles.asp?mode=download&id=F1AIRepaintNumbers-Numbers-Numbers.utu, and I would like to do the same as for the other websites.

I used the following code in Python 2.7.9, found in a video on YouTube, inserting my info to achieve my aim, but unsurprisingly it didn't work when I ran it (timeouts and errors, etc.), probably due to its simplicity:

import requests
from bs4 import BeautifulSoup
import wget

def download_links(url):
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "html.parser")
    for link in soup.findAll('a'):
        href = link.get('href')
        print(href)
        wget.download(href)

download_links('http://web.archive.org/web/20041225023002/http://www.projectai.com:80/libraries/acfiles.php?cat=6')
Traceback error readout from running the code:

Error:
Traceback (most recent call last):
  File "C:\Users\Owner\Downloads\Desktop\Misc 2\Python Misc Download File Program\Project AI Files 1.py", line 13, in <module>
    download_links("http://web.archive.org/web/20041225023002/http://www.projectai.com:80/libraries/acfiles.php?cat=6")
  File "C:\Users\Owner\Downloads\Desktop\Misc 2\Python Misc Download File Program\Project AI Files 1.py", line 12, in download_links
    wget.download(href)
  File "C:\Python27\lib\site-packages\wget.py", line 526, in download
    (tmpfile, headers) = ulib.urlretrieve(binurl, tmpfile, callback)
  File "C:\Python27\lib\urllib.py", line 98, in urlretrieve
    return opener.retrieve(url, filename, reporthook, data)
  File "C:\Python27\lib\urllib.py", line 245, in retrieve
    fp = self.open(url, data)
  File "C:\Python27\lib\urllib.py", line 213, in open
    return getattr(self, name)(url)
  File "C:\Python27\lib\urllib.py", line 469, in open_file
    return self.open_local_file(url)
  File "C:\Python27\lib\urllib.py", line 479, in open_local_file
    localname = url2pathname(file)
  File "C:\Python27\lib\nturl2path.py", line 26, in url2pathname
    raise IOError, error
IOError: Bad URL: /web/20041225023002/http|//www.projectai.com|80/index.php
Any help would be much appreciated

Eddie
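As an aside, the IOError in that traceback points at what went wrong: the hrefs on the archived page are site-relative (they start with /web/...), so wget.download passes them to urllib, which treats them as local Windows paths. Joining each href against the page URL first would give wget an absolute address to fetch; a rough, untested sketch (Python 2, as in the original script; the fileid value is just a made-up example):

import wget
from urlparse import urljoin  # Python 2; on Python 3 this lives in urllib.parse

page = 'http://web.archive.org/web/20041225023002/http://www.projectai.com:80/libraries/acfiles.php?cat=6'
# hypothetical site-relative href as it appears in the archived page
href = '/web/20041225023002/http://www.projectai.com:80/libraries/download.php?fileid=1234'
wget.download(urljoin(page, href))  # absolute URL, so urllib no longer treats it as a local path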
Put your code in Python code tags and the full error traceback message in error tags. You can find help here.
wget -r -np -A zip,utu -c -U Mozilla http://example.com
This should work. Change to the desired destination directory before running it.
Hi there wavic, many thanks for your help. I tried out your suggestion with the wget program:

wget -e robots=off -r -np -A utu -c -U Mozilla http://web.archive.org/web/2007061123204...=1&index=0

But I got the following error messages:

Error:
HTTP request sent, awaiting response... 404 NOT FOUND
2018-05-19 12:45:27 ERROR 404: NOT FOUND
'index' is not recognized as an internal or external command, operable program or batch file.
Some of the .utu file downloads might have broken links; could that be what is causing this problem? How do I get round that?
Try enclosing the address in quotes.
How do I do that? What shall I type?
Double quotes around the address.
"http://web.archive.org/web/20070611232047/http://ultimatetraffic.flight1.net:80/utfiles.asp?mode=1&index=0"
I see, many thanks. I will try that and get back to you.

Hi wavic, I did what you suggested, and still no .utu files were downloaded, only a .tmp file.
I tested it and got nothing. I see that the link to the file starts with tp://. I think that means transport protocol, and I am not sure whether wget can handle it.

Basically, this is how I download part of a website or a bunch of other files, but here it doesn't work that way. Perhaps some web scraping has to be involved.
(May-19-2018, 10:55 AM)eddywinch82 Wrote: I used the following code in Python 2.7.9, found in a video on YouTube, inserting my info to achieve my aim, but unsurprisingly it didn't work when I ran it (timeouts and errors, etc.), probably due to its simplicity:
Yes, I guess there is some lack of understanding of this topic, and maybe of Python in general ;)

The wget get-all-files method may or may not work.
If it doesn't work, then go back and look at the site source for another method.
I took a look, and it's not so difficult to get all the .utu files.
from bs4 import BeautifulSoup
import requests

url = 'http://web.archive.org/web/20070611232047/http://ultimatetraffic.flight1.net:80/utfiles.asp?mode=1&index=0'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'lxml')
# The download links on this page sit inside <b> tags
b_tag = soup.find_all('b')
for a in b_tag:
    link = a.find('a')['href']
    #print(link)
    # Use everything after 'id=' in the link as the local file name
    f_name = link.split('id=')[-1]
    with open(f_name, 'wb') as f:
        f.write(requests.get(link).content)
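For the .zip repaints on the projectai.com archive pages from the first post, a similar approach might work. The sketch below is untested: it assumes the page you feed it actually lists download.php?fileid= links (per the first post, that may be the page behind each aircraft image rather than the category page itself), it uses urljoin because the archived pages use site-relative hrefs, and naming the saved file after the fileid is just a guess.

from bs4 import BeautifulSoup
import requests
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2

url = 'http://web.archive.org/web/20041225023002/http://www.projectai.com:80/libraries/acfiles.php?cat=6'
soup = BeautifulSoup(requests.get(url).content, 'lxml')
for a in soup.find_all('a', href=True):
    href = a['href']
    # assumption: the repaint download links contain 'download.php?fileid='
    if 'download.php?fileid=' in href:
        link = urljoin(url, href)  # the archived pages use site-relative hrefs
        f_name = href.split('fileid=')[-1] + '.zip'  # assumed name; the server may supply a real one
        with open(f_name, 'wb') as f:
            f.write(requests.get(link).content)

If the .zip links turn out to live on the per-aircraft pages, the same loop can simply be run against each of those page URLs instead.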