I Want To Download Many Files Of Same File Extension With Either Wget Or Python
#1
I would like to download files of the same file types, .utu and .zip, from the following Microsoft Flight Simulator AI traffic websites:

http://web.archive.org/web/2005031511271....php?cat=6 (Current Repaints)

http://web.archive.org/web/2005031511294....php?cat=1 (Vintage Repaints)

On each of those pages there are subcategories for the AI aircraft types (Airbus, Boeing, etc.), and the repaint .zip file choices are shown when you click on an aircraft image.

The folder name then becomes http://web.archive.org/web/2004111419514...t=(number). Then, when you click the downloads, repaints.php? becomes download.php?fileid=(4-digit number).

What do I need to type to download all the .zip files at once? Clicking on them individually to download would take ages.

I would also like to download all files with the .utu extension, for Flight 1 Ultimate Traffic AI aircraft repaints, from the following webpage:

http://web.archive.org/web/2006051216123...=1&index=0

Then, when you click to download an Ultimate Traffic aircraft texture, the last part of the path becomes /utfiles.asp?mode=download&id=F1AIRepaint(Numbers-Numbers-Numbers).utu, and I would like to do the same as for the other websites.

I used the following code in Python 2.7.9, found in a YouTube video, inserting my own details to achieve my aim, but unsurprisingly it didn't work when I ran it (timeouts, errors, etc.), probably due to its simplicity:

import requests
from bs4 import BeautifulSoup
import wget

def download_links(url):
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "html.parser")
    for link in soup.findAll('a'):
        href = link.get('href')
        print(href)
        wget.download(href)

download_links('http://web.archive.org/web/20041225023002/http://www.projectai.com:80/libraries/acfiles.php?cat=6')
Traceback from running the code:

Error:
Traceback (most recent call last):
  File "C:\Users\Owner\Downloads\Desktop\Misc 2\Python Misc Download File Program\Project AI Files 1.py", line 13, in <module>
    download_links("http://web.archive.org/web/20041225023002/http://www.projectai.com:80/libraries/acfiles.php?cat=6")
  File "C:\Users\Owner\Downloads\Desktop\Misc 2\Python Misc Download File Program\Project AI Files 1.py", line 12, in download_links
    wget.download(href)
  File "C:\Python27\lib\site-packages\wget.py", line 526, in download
    (tmpfile, headers) = ulib.urlretrieve(binurl, tmpfile, callback)
  File "C:\Python27\lib\urllib.py", line 98, in urlretrieve
    return opener.retrieve(url, filename, reporthook, data)
  File "C:\Python27\lib\urllib.py", line 245, in retrieve
    fp = self.open(url, data)
  File "C:\Python27\lib\urllib.py", line 213, in open
    return getattr(self, name)(url)
  File "C:\Python27\lib\urllib.py", line 469, in open_file
    return self.open_local_file(url)
  File "C:\Python27\lib\urllib.py", line 479, in open_local_file
    localname = url2pathname(file)
  File "C:\Python27\lib\nturl2path.py", line 26, in url2pathname
    raise IOError, error
IOError: Bad URL: /web/20041225023002/http|//www.projectai.com|80/index.php
Any help would be much appreciated

Eddie
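
For reference, the traceback above shows what went wrong: the hrefs on the archived page are site-relative (e.g. /web/20041225023002/...), so wget.download() tries to open them as local Windows paths. A minimal sketch of one way around this, resolving each href against the page URL with urljoin and keeping only the download.php?fileid= links described in the post (the <fileid>.zip naming scheme is an assumption, and the category pages may need a second level of crawling to reach the aircraft pages first):

import requests
from bs4 import BeautifulSoup
try:
    from urlparse import urljoin        # Python 2, as used in this thread
except ImportError:
    from urllib.parse import urljoin    # Python 3

def download_zip_links(url):
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    for link in soup.find_all('a'):
        href = link.get('href')
        if not href:
            continue
        # Resolve site-relative hrefs against the page URL; passing them
        # straight to wget/urllib is what raised "IOError: Bad URL" above
        full_url = urljoin(url, href)
        # Keep only the actual download links (pattern quoted in the post)
        if 'download.php?fileid=' in full_url:
            # Assumed naming scheme: save as <fileid>.zip; the real name
            # could be read from the Content-Disposition header instead
            f_name = full_url.split('fileid=')[-1] + '.zip'
            with open(f_name, 'wb') as f:
                f.write(requests.get(full_url).content)
            print('saved ' + f_name)

download_zip_links('http://web.archive.org/web/20041225023002/http://www.projectai.com:80/libraries/acfiles.php?cat=6')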
#2
Put your code in Python code tags and the full error traceback in error tags. You can find help here.
#3
wget -r -np -A zip,utu -c -U Mozilla http://example.com
This should work: -r recurses into links, -np stays below the starting directory, -A zip,utu accepts only files with those extensions, -c resumes partial downloads, and -U Mozilla sets the User-Agent. Change to the desired destination directory before running it.
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
Reply
#4
Hi wavic, many thanks for your help. I tried out your suggestion with wget:

wget -e robots=off -r -np -A utu -c -U Mozilla http://web.archive.org/web/2007061123204...=1&index=0

But I got the following error messages:

Error:
HTTP request sent, awaiting response... 404 NOT FOUND
2018-05-19 12:45:27 ERROR 404: NOT FOUND
'index' is not recognized as an internal or external command, operable program or batch file.
Some of the .utu file downloads might have broken links; could that be what is causing this problem? How do I get round that?
Reply
#5
Try enclosing the address in quotes. The unquoted & in the URL is treated by the Windows shell as a command separator, which is why it complains that 'index' is not recognized as a command.
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
Reply
#6
How do I do that? What shall I type?
#7
Double quotes around the address.
"http://web.archive.org/web/20070611232047/http://ultimatetraffic.flight1.net:80/utfiles.asp?mode=1&index=0"
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
Reply
#8
I see, many thanks. I will try that and get back to you.

Hi wavic, I did what you suggested, and still no .utu files were downloaded, only a .tmp file.
#9
I tested it and got nothing. I see that the link to the file starts with tp://. I think that means transport protocol, and I am not sure whether wget can handle it.

Basically, this is how I download part of a website or a bunch of other files, but it doesn't work that way here. Perhaps some web scraping has to be involved.
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
Reply
#10
(May-19-2018, 10:55 AM)eddywinch82 Wrote: I used the following code in Python 2.7.9, found in a YouTube video, inserting my own details to achieve my aim, but unsurprisingly it didn't work when I ran it (timeouts, errors, etc.), probably due to its simplicity:
Yes, I guess there is some lack of understanding of this topic, and maybe of Python in general. ;)

The wget get-all-files method may or may not work.
If it doesn't work, go back and look at the site source for another method.
I took a look, and it's not so difficult to get all the .utu files.
from bs4 import BeautifulSoup
import requests

url = 'http://web.archive.org/web/20070611232047/http://ultimatetraffic.flight1.net:80/utfiles.asp?mode=1&index=0'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'lxml')
# Every download link on this page sits inside a <b> tag
b_tag = soup.find_all('b')
for a in b_tag:
    link = a.find('a')['href']
    #print(link)
    # The file name is everything after 'id=' in the query string,
    # e.g. utfiles.asp?mode=download&id=F1AIRepaint....utu
    f_name = link.split('id=')[-1]
    with open(f_name, 'wb') as f:
        f.write(requests.get(link).content)
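
If some of the archived links are dead (like the 404 earlier in the thread), a variant of the code above that checks the response before writing, and skips failures instead of crashing, might look like this (a sketch; the 30-second timeout and the status check are assumptions):

from bs4 import BeautifulSoup
import requests

url = 'http://web.archive.org/web/20070611232047/http://ultimatetraffic.flight1.net:80/utfiles.asp?mode=1&index=0'
soup = BeautifulSoup(requests.get(url).content, 'lxml')
for b in soup.find_all('b'):
    a = b.find('a')
    if a is None or not a.get('href'):
        continue  # <b> tag without a download link inside
    link = a['href']
    try:
        r = requests.get(link, timeout=30)
    except requests.RequestException as e:
        print('skipped %s (%s)' % (link, e))
        continue
    if r.status_code != 200:
        print('skipped %s (HTTP %s)' % (link, r.status_code))
        continue  # broken/dead archive link
    f_name = link.split('id=')[-1]
    with open(f_name, 'wb') as f:
        f.write(r.content)
    print('saved ' + f_name)

Checking status_code instead of calling raise_for_status() lets the loop carry on past individual dead files rather than stopping at the first one.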

