Python Forum

python selenium downloading embedded pdf
I've used Selenium to navigate to a page where I want to download a PDF, which is a report showing information I've asked for. However, I can't seem to download it because of the way the PDF is being displayed. When I inspect the PDF, here is what I see:
<embed id="plugin" type="application/x-google-chrome-pdf" src="https://uk.ixl.com/analytics/students-quickview-pdf" stream-url="chrome-extension://mhjfbmdgcfjbbpaeojofohoefgiehjai/0129a32b-174c-4e06-b6e5-0f332f853591" headers="cache-control: no-cache
cf-cache-status: DYNAMIC
cf-ray: 6263e4d4bdf940f6-LHR
cf-request-id: 08724d58f1000040f6860eb000000001
content-disposition: inline;filename=&quot;IXL-Students-Quickview_2021-02-23.pdf&quot;
content-language: en-GB
content-type: application/pdf
date: Tue, 23 Feb 2021 21:03:30 GMT
expect-ct: max-age=604800, report-uri=&quot;https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct&quot;
link: <https://uk.ixl.com/analytics/students-quickview-pdf>; rel=&quot;canonical&quot;
server: cloudflare
strict-transport-security: max-age=31536000
vary: User-Agent
x-content-type-options: nosniff
x-frame-options: SAMEORIGIN
x-ixl-trace: P1EwUHZMdG 9PaXFvPWJr RmVvMzZjY2 RlZXF4eGVv ZExrTEtPUF pzNndKSTAy YlQzYWFLOV Vzb0VqLzNk akVRaVdzOU 1VenZ4Tmd6 cGVFT21uZn J5ZVpmdnlE Vm4wNkxqdD hRb1g2cWNo bmRZK2xNbm tMazAyNHlp V0JIdnpjWk ttUHpXYi9G YTlkMjdDN2 ZicnF2cldT WitBVStUK3 ZSM2kyRkxo dVUxM256Ni 9EUHhPai9x dHVCUHpneW RrNUpNazJM Z1JSdXpjaU JaU2wwdklL RXF0eGxRaj UyNmMxbVJh bWd5MVZ0Kz VqVGthUHh4 bzU0dTBNc0 JCOU9iV0Jo NGJEZFZQdj FCSzY0ZHU5 dHpiV1g5TD hEMD0=
x-xss-protection: 1; mode=block
" background-color="0xFF525659" top-toolbar-height="0" javascript="allow" full-frame="" pdf-viewer-update-enabled="">
So when I try to download the PDF using urllib.request.urlretrieve(pdfUrl, pdfname), with pdfUrl set to the src in the code above, I get the error urllib.error.HTTPError: HTTP Error 403: Forbidden. When I load the src page directly, you can tell from the link that it is a generic page used for all reports, and it actually just gives you an error. So it seems as if the data is being streamed into the PDF viewer, if I'm reading it right where it says stream-url="chrome-extension://mhjfbmdgcfjbbpaeojofohoefgiehjai/d5f1b753-2bf4-468c-8c35-be6f1629ecc8". Any ideas how I can download this PDF? I might be missing something, but as a home user I'm not very up to speed with Python. Thanks.
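One possible cause of the 403, and this is only a guess from the symptoms, is that urllib makes a fresh request that doesn't carry the logged-in session from the Selenium browser. Below is a minimal sketch of copying the browser's cookies into the urllib request. It assumes `driver` is the existing Selenium WebDriver that is already logged in and on the report page; the function names `cookie_header`, `build_pdf_request` and `download_pdf` are made up for illustration, and sending the browser's User-Agent is an assumption about what the server might check.

```python
import shutil
import urllib.request


def cookie_header(selenium_cookies):
    """Join Selenium cookie dicts into a single Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def build_pdf_request(driver, pdf_url):
    """Build a urllib request that reuses the browser's session cookies."""
    # driver.get_cookies() returns a list of dicts with 'name' and 'value' keys.
    headers = {"Cookie": cookie_header(driver.get_cookies())}
    # Assumption: the server may also look at the User-Agent, so send the
    # same one the browser reports.
    headers["User-Agent"] = driver.execute_script("return navigator.userAgent;")
    return urllib.request.Request(pdf_url, headers=headers)


def download_pdf(driver, pdf_url, pdf_name):
    """Fetch pdf_url with the browser's cookies and save it as pdf_name."""
    request = build_pdf_request(driver, pdf_url)
    with urllib.request.urlopen(request) as response, open(pdf_name, "wb") as out:
        shutil.copyfileobj(response, out)
```

It would be called as download_pdf(driver, pdfUrl, pdfname) in place of the urlretrieve call. If the src URL only works once per report request, this may still fail; in that case another thing to try is starting Chrome with the "plugins.always_open_pdf_externally" preference set to True, so the browser saves the PDF instead of showing it in the viewer.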