Short link URL - Printable Version

Short link URL - Evil_Patrick - Jul-09-2020

Let's say I open a short-link URL using Requests, like this:

source = requests.get("http://short.url/k123h3")

Is there any way I can find out the main (long) URL behind it?

Sorry for the bad explanation and example.

RE: Short link URL - nuffink - Jul-09-2020

I would suggest that you don't open short links. They have long been associated with dodgy sites, criminal activity, phishing and all the other web-based nasties you can come across. Most companies with any sort of decent staff training on keeping their systems safe use a number of techniques to help their users avoid scams. One of those is never to open "weird" URLs without knowing for certain what is behind the link: those that contain random numbers like the one you have posted, or that don't make clear the exact domain and context of the resource you are about to access.
This is obvs not foolproof, but if I were presented with the link in your code, the only place it would be directed is the Trash, permanently. Sorry, but define your URLs, or only request from URLs that tell you exactly what you are expecting to receive.
Just my 2p

RE: Short link URL - snippsat - Jul-09-2020

You can use Requests like this.

import requests

url = 'http://bit.ly/cXEInp'
session = requests.Session()
resp = session.head(url, allow_redirects=True)
print(resp.url)

Output:
https://www.flickr.com/photos/26432908@N00/346615997/sizes/l/
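
The same idea can be wrapped in a small helper. This is only a sketch: the name expand_url, the optional session argument and the 10-second timeout are assumptions made for illustration, not part of Requests. It also prepends http:// when the scheme is missing, since Requests raises MissingSchema for a URL such as "short.url/k123h3".

```python
def expand_url(short_url, session=None, timeout=10):
    """Return the final URL reached after following all redirects.

    Hypothetical helper: `session` should behave like requests.Session.
    """
    if session is None:
        # Imported lazily so a caller can pass in their own session object.
        import requests
        session = requests.Session()

    # Requests refuses URLs without a scheme, so add one if it is missing.
    if '://' not in short_url:
        short_url = 'http://' + short_url

    # HEAD avoids downloading the body; allow_redirects=True follows every hop.
    resp = session.head(short_url, allow_redirects=True, timeout=timeout)
    return resp.url
```

Calling expand_url('http://bit.ly/cXEInp') would then do the same thing as the snippet above, with a timeout so a slow server cannot hang the script.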

RE: Short link URL - Evil_Patrick - Jul-09-2020

(Jul-09-2020, 02:14 PM)snippsat Wrote: Can use Requests like this.

Can you explain what's happening here?

RE: Short link URL - snippsat - Jul-09-2020

(Jul-09-2020, 02:41 PM)Evil_Patrick Wrote: Can you explain what's happening here?

URL shortening works by redirecting to the web page that has the original long URL.
When using allow_redirects=True, Requests will follow all redirects.
The target of each redirect is in the Location header of that response.
To see more of what's going on:

>>> import requests
>>> 
>>> url = 'http://t.co/hAplNMmSTg'
>>> session = requests.Session()
>>> resp = session.head(url, allow_redirects=True)
>>> resp.history
[<Response [301]>,
 <Response [301]>,
 <Response [301]>,
 <Response [301]>,
 <Response [301]>]
>>> 
>>> # We see that it gets redirected 5 times
>>> # Look at content of headers
>>> resp.history[0].headers
{'cache-control': 'no-cache, no-store, max-age=0', 'content-length': '0', 'date': 'Thu, 09 Jul 2020 16:50:07 GMT', 'location': 'https://t.co/hAplNMmSTg', 'server': 'tsa_o', 'x-connection-hash': '1a5db93459d4a04a1c4bef977f5ccbe5', 'x-response-time': '107'}
>>> resp.history[1].headers
{'cache-control': 'private,max-age=300', 'content-length': '0', 'date': 'Thu, 09 Jul 2020 16:50:07 GMT', 'expires': 'Thu, 09 Jul 2020 16:55:07 GMT', 'location': 'https://bit.ly/1kb2qbf', 'server': 'tsa_o', 'set-cookie': 'muc=1d5424b7-9c14-4a7f-a83b-093aed6c273f; Max-Age=63072000; Expires=Sat, 9 Jul 2022 16:50:07 GMT; Domain=t.co; Secure; SameSite=None', 'strict-transport-security': 'max-age=0', 'vary': 'Origin', 'x-connection-hash': '419be23ac2d73c03cafff5391745329d', 'x-response-time': '109'}
>>> resp.history[3].headers
{'Server': 'CloudFront', 'Date': 'Thu, 09 Jul 2020 16:50:08 GMT', 'Content-Type': 'text/html', 'Content-Length': '183', 'Connection': 'keep-alive', 'Location': 'https://www.wtatennis.com/players/player/13516/title/simona-halep', 'X-Cache': 'Redirect from cloudfront', 'Via': '1.1 8ddb6d7670d8c5a85c04a10525a71b91.cloudfront.net (CloudFront)', 'X-Amz-Cf-Pop': 'OSL50-C1', 'X-Amz-Cf-Id': 'tqFBbmJSSfqO5YtsV_vJih8avgaWFdHf1NIQEGLAc-BlgquOqHrmCg=='}
>>> resp.history[4].headers
{'Content-Length': '0', 'Connection': 'keep-alive', 'Date': 'Thu, 09 Jul 2020 16:50:08 GMT', 'Location': '/players/314320/simona-halep', 'Server': 'nginx', 'X-Cache': 'Miss from cloudfront', 'Via': '1.1 1d8cf7c8865ed1078c19a98771ad34cb.cloudfront.net (CloudFront)', 'X-Amz-Cf-Pop': 'OSL50-C1', 'X-Amz-Cf-Id': 'SnGpLonXzNkOnw2LpqTWdHp_I-3h4YkeYSQFr5WHjeyG7Dfy2mlPdw=='}
>>> resp.history[4].headers['Location']
'/players/314320/simona-halep'

So if you run it with this short URL, the final URL will be:

Output:
https://www.wtatennis.com/players/314320/simona-halep
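
Note that the last Location header above is a relative path, so it has to be resolved against the URL that returned it. Roughly what allow_redirects=True does can be sketched by following the hops by hand. The function name trace_redirects and the max_hops limit are assumptions for illustration only; urljoin comes from the standard library.

```python
from urllib.parse import urljoin


def trace_redirects(url, session=None, max_hops=10, timeout=10):
    """Follow redirects one hop at a time and return every URL visited.

    Hypothetical helper: `session` should behave like requests.Session.
    """
    if session is None:
        # Imported lazily so a caller can pass in their own session object.
        import requests
        session = requests.Session()

    hops = [url]
    for _ in range(max_hops):
        # Do not let Requests follow the redirect; inspect it ourselves.
        resp = session.head(url, allow_redirects=False, timeout=timeout)
        location = resp.headers.get('Location')
        if location is None or not 300 <= resp.status_code < 400:
            break  # Not a redirect, so this is the final URL.
        # urljoin handles both absolute and relative Location values.
        url = urljoin(url, location)
        hops.append(url)
    return hops
```

The last item of the returned list is the long URL, and the items before it are the intermediate redirects.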

RE: Short link URL - steve_shambles - Jul-10-2020

Can I suggest un-shortening the link? I just covered this
in my latest code snippets:

pip install unshortenit

from unshortenit import UnshortenIt
 
unsh = UnshortenIt()
uri = unsh.unshorten('https://wp.me/Pa5TU8-2yD')
print(uri)