Python Forum

Full Version: Short link URL
Let's say I open a short link URL using Requests, like this:

source = requests.get("http://short.url/k123h3")

Is there any way I can find out the main (long) URL behind it?

Sorry for the bad explanation and example.
I would suggest that you don't open short links. They have long been associated with dodgy sites, criminal activity, phishing and all the other web-based nasties you can come across. Most companies with any sort of decent staff training on keeping their systems safe teach a number of techniques to help users avoid falling for scams. One of those is never to open a "weird" URL without knowing for certain what is behind it: URLs that contain random characters like the one you posted, or that don't show the exact domain and context of the resource you are about to access.
This is obviously not foolproof, but if I were presented with the link in your code, the only place it would be directed is the Trash, permanently. Sorry, but define your URLs, or only request from URLs that tell you exactly what you are expecting to receive.
Just my 2p
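One way to act on that advice in code is to check the host of a URL before requesting it. This is only a sketch: the function name is_expected_host and the allowed-domain set are made up for illustration, and a host check on its own does not make a link safe.

```python
from urllib.parse import urlparse


def is_expected_host(url, allowed_domains):
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


# Hypothetical allowlist, for illustration only
allowed = {"flickr.com", "wtatennis.com"}
print(is_expected_host("https://www.flickr.com/photos/26432908@N00/346615997/sizes/l/", allowed))
print(is_expected_host("http://short.url/k123h3", allowed))
```

The same check could also be applied to an expanded URL before fetching the page it points to.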
You can use Requests like this.
import requests

url = 'http://bit.ly/cXEInp'
session = requests.Session()
resp = session.head(url, allow_redirects=True)
print(resp.url)
Output:
https://www.flickr.com/photos/26432908@N00/346615997/sizes/l/
(Jul-09-2020, 02:14 PM) snippsat wrote: Can use Requests like this.

Can you explain what's happening here?
(Jul-09-2020, 02:41 PM) Evil_Patrick wrote: Can you explain what's happening here?
URL shortening works by redirecting to the web page that has the original long URL.
With allow_redirects=True, Requests follows all redirects (head() does not follow them by default).
The target of each redirect is in the Location header of that response.
To see more of what's going on:
>>> import requests
>>> 
>>> url = 'http://t.co/hAplNMmSTg'
>>> session = requests.Session()
>>> resp = session.head(url, allow_redirects=True)
>>> resp.history
[<Response [301]>,
 <Response [301]>,
 <Response [301]>,
 <Response [301]>,
 <Response [301]>]
>>> 
>>> # We see that it gets redirected 5 times
>>> # Look at content of headers
>>> resp.history[0].headers
{'cache-control': 'no-cache, no-store, max-age=0', 'content-length': '0', 'date': 'Thu, 09 Jul 2020 16:50:07 GMT', 'location': 'https://t.co/hAplNMmSTg', 'server': 'tsa_o', 'x-connection-hash': '1a5db93459d4a04a1c4bef977f5ccbe5', 'x-response-time': '107'}
>>> resp.history[1].headers
{'cache-control': 'private,max-age=300', 'content-length': '0', 'date': 'Thu, 09 Jul 2020 16:50:07 GMT', 'expires': 'Thu, 09 Jul 2020 16:55:07 GMT', 'location': 'https://bit.ly/1kb2qbf', 'server': 'tsa_o', 'set-cookie': 'muc=1d5424b7-9c14-4a7f-a83b-093aed6c273f; Max-Age=63072000; Expires=Sat, 9 Jul 2022 16:50:07 GMT; Domain=t.co; Secure; SameSite=None', 'strict-transport-security': 'max-age=0', 'vary': 'Origin', 'x-connection-hash': '419be23ac2d73c03cafff5391745329d', 'x-response-time': '109'}
>>> resp.history[3].headers
{'Server': 'CloudFront', 'Date': 'Thu, 09 Jul 2020 16:50:08 GMT', 'Content-Type': 'text/html', 'Content-Length': '183', 'Connection': 'keep-alive', 'Location': 'https://www.wtatennis.com/players/player/13516/title/simona-halep', 'X-Cache': 'Redirect from cloudfront', 'Via': '1.1 8ddb6d7670d8c5a85c04a10525a71b91.cloudfront.net (CloudFront)', 'X-Amz-Cf-Pop': 'OSL50-C1', 'X-Amz-Cf-Id': 'tqFBbmJSSfqO5YtsV_vJih8avgaWFdHf1NIQEGLAc-BlgquOqHrmCg=='}
>>> resp.history[4].headers
{'Content-Length': '0', 'Connection': 'keep-alive', 'Date': 'Thu, 09 Jul 2020 16:50:08 GMT', 'Location': '/players/314320/simona-halep', 'Server': 'nginx', 'X-Cache': 'Miss from cloudfront', 'Via': '1.1 1d8cf7c8865ed1078c19a98771ad34cb.cloudfront.net (CloudFront)', 'X-Amz-Cf-Pop': 'OSL50-C1', 'X-Amz-Cf-Id': 'SnGpLonXzNkOnw2LpqTWdHp_I-3h4YkeYSQFr5WHjeyG7Dfy2mlPdw=='}
>>> resp.history[4].headers['Location']
'/players/314320/simona-halep'
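So if you run it with this short URL, the final URL will be the one below. The last Location is a relative path, so it is resolved against the host of the previous URL.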
Output:
https://www.wtatennis.com/players/314320/simona-halep
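To make the Location mechanism concrete, here is a rough sketch that follows the redirects one hop at a time instead of letting Requests do it. The names follow_redirects and location_via_requests are made up for this example, and the max_hops limit and timeout are arbitrary choices. The demo at the bottom uses an in-memory dict with example.com placeholder URLs so it runs without network access.

```python
from urllib.parse import urljoin


def follow_redirects(url, get_location, max_hops=10):
    """Return the chain of URLs visited, starting with url.

    get_location(url) must return the Location header value for that URL,
    or None when the response is not a redirect.
    """
    chain = [url]
    for _ in range(max_hops):
        location = get_location(chain[-1])
        if location is None:
            break
        # Location may be relative (e.g. '/players/...'), so resolve it
        # against the URL that produced it.
        chain.append(urljoin(chain[-1], location))
    return chain


def location_via_requests(url):
    """Fetch one hop with Requests without following the redirect."""
    import requests  # imported here so follow_redirects stays stdlib-only

    resp = requests.head(url, allow_redirects=False, timeout=10)
    return resp.headers.get("Location") if resp.is_redirect else None


# Offline demo with placeholder hops (not real redirects)
fake_hops = {
    "http://short.example.com/abc": "https://short.example.com/abc",
    "https://short.example.com/abc": "https://www.example.com/old/page",
    "https://www.example.com/old/page": "/new/page",
}
for hop in follow_redirects("http://short.example.com/abc", fake_hops.get):
    print(hop)

# Against a real short link, assuming network access:
# follow_redirects('http://t.co/hAplNMmSTg', location_via_requests)
```

The last element of the returned list should correspond to what resp.url gives when allow_redirects=True, as long as the chain is shorter than max_hops.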
Can I suggest un-shortening the link? I just covered this
in my latest code snippets:

pip install unshortenit

from unshortenit import UnshortenIt
 
unsh = UnshortenIt()
uri = unsh.unshorten('https://wp.me/Pa5TU8-2yD')
print(uri)