Python Forum

Downloaded file corrupted
Hello,

I am trying to download a file from a web page, but when I try to open the downloaded file, it shows an error (see attachment 2578).


This is my code:
session = requests.Session()
retries = Retry(total=5, backoff_factor=1, status_forcelist=[500, 502, 503, 504])
session.mount('https://', HTTPAdapter(max_retries=retries))
sw = 1
for row in rows[4:]:
    if sw == 1:
        cells = row.find_elements(By.TAG_NAME, "td")
        if not cells:
            continue
        else:
            image_elements = cells[0].find_elements(By.TAG_NAME, "img")
            image_to_click = image_elements[0]
            link_element = cells[0].find_element(By.TAG_NAME, "a")
            filename = link_element.get_attribute("href")
            file_2b_downloaded = file_path + "/" + filename.split("=")[-1]

            response = requests.get(filename, verify=False)
            if response.status_code == 200:
                with open(file_2b_downloaded, 'wb') as f:
                    f.write(response.content)

                # Create a folder and move the downloaded file into it
                source_file = "C:/Users/" + username + "/Downloads"
                destination_folder = fp + "/FO" + rpf[j]
                if not os.path.exists(destination_folder):
                    os.makedirs(destination_folder)
                shutil.move(file_2b_downloaded, destination_folder)
            else:
                print(f"Failed to download. Status code: {response.status_code}")
        sw = 0
    else:
        sw = 1
How can I fix this?

Thanks in advance for the help.
(Sep-30-2023, 05:10 AM) buran Wrote: cross-posted at https://stackoverflow.com/q/77205487/4046632
Yes, I need urgent assistance with this, and I have deleted the post on the other site.
Can you post the URL of the page you are trying to download from?
It is hard to advise without knowing which web page the download code is used on.
(Sep-30-2023, 01:21 PM) snippsat Wrote: Can you post the URL of the page you are trying to download from?
It is hard to advise without knowing which web page the download code is used on.

That is the problem: the URL cannot be accessed from the Internet, only from an intranet.
emont Wrote: That is the problem: the URL cannot be accessed from the Internet, only from an intranet.
OK, to look a little more at this: does file_2b_downloaded give you the real download link?
To give an example, you should test like this, with no loop or other code, just the download itself.
Sample zip files download:
import requests

url = 'https://drive.google.com/uc?export=download&id=1o9DtaYEb1N-C_L7kqCAgfE0D5RaEwbZH'
response = requests.get(url)
with open('5mb.zip', 'wb') as f:
    f.write(response.content)
This works because here I use the real download link for the 5 MB sample zip file.
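One quick way to tell whether a response is the real file or something else (for example an HTML login or error page saved under the file's name) is to look at its first bytes. The sketch below is only an illustration: the helper name `sniff_payload` is made up for this post, and it only knows the zip and PDF signatures plus a rough HTML check.

```python
def sniff_payload(content: bytes) -> str:
    """Guess what a downloaded payload really is from its first bytes."""
    if content.startswith(b"PK\x03\x04"):
        # Zip signature; .xlsx and .docx files are zip containers too.
        return "zip"
    if content.startswith(b"%PDF-"):
        return "pdf"
    # Rough HTML check: skip leading whitespace and ignore case.
    head = content[:512].lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"


# Stand-in byte strings for illustration, not real downloads.
print(sniff_payload(b"PK\x03\x04" + b"\x00" * 16))
print(sniff_payload(b"\n<!DOCTYPE html><html><body>Please log in</body></html>"))
```

In the original code this could be called as `sniff_payload(response.content)` before writing the file. A result of "html" would suggest that the link did not return the file itself.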

If you parse the link out of the page, it looks like this:
import requests
from bs4 import BeautifulSoup

url = 'https://web-utility.com/en/sample/files/sample-zip-file-download'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')
link = soup.select_one('div.col-md-3.col-3.text-right > a').get('href')
response = requests.get(link)
with open('5mb.zip', 'wb') as f:
    f.write(response.content) 
This is a simpler case because the download link is exposed and everyone can see it.
One more example: here I looked at what is sent over the network in Chrome DevTools (Inspect -> Network); sometimes you need to do this.
import requests

headers = {
    'authority': 'drive.google.com',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
}

params = {
    'export': 'download',
    'id': '1o9DtaYEb1N-C_L7kqCAgfE0D5RaEwbZH',
}

response = requests.get('https://drive.google.com/uc', params=params, headers=headers)
with open('5mb.zip', 'wb') as f:
    f.write(response.content)
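The `params` dictionary is just another way of writing the query string from the direct download link. As a small sketch, the standard library's `urlencode` can show the URL that results; Requests does its own encoding, which I assume gives the same string for simple values like these.

```python
from urllib.parse import urlencode

base_url = "https://drive.google.com/uc"
params = {
    "export": "download",
    "id": "1o9DtaYEb1N-C_L7kqCAgfE0D5RaEwbZH",
}

# Join the base URL and the encoded query string by hand.
full_url = f"{base_url}?{urlencode(params)}"
print(full_url)
```

When testing with Requests, printing `response.url` after the call shows the final URL that was actually requested, which can be compared with the link seen in the browser.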
In some cases using Selenium can also be easier, for example if you have to press a button before you get the real download address.
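If Selenium is already logged in to the site and the download only works inside that browser session, one option is to pass the browser's cookies along with the Requests call. This is a sketch under assumptions: nothing in the thread confirms that the intranet page needs a login, the helper name `cookies_for_requests` is made up, and the sample cookie is invented.

```python
def cookies_for_requests(selenium_cookies):
    """Turn Selenium's list of cookie dicts into a plain name -> value dict."""
    return {cookie["name"]: cookie["value"] for cookie in selenium_cookies}


# Stand-in for the list returned by driver.get_cookies();
# real cookie names and values depend on the site.
sample = [
    {"name": "sessionid", "value": "abc123", "domain": "intranet.example", "path": "/"},
]
cookies = cookies_for_requests(sample)
print(cookies)
```

With a real browser session this would be used roughly as `requests.get(filename, cookies=cookies_for_requests(driver.get_cookies()))`, where `driver` is assumed to be the Selenium WebDriver that loaded the page.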