Python Forum
Short link URL
#1
Let's say I open a short link URL using Requests, like this:

source = requests.get("http://short.url/k123h3")
Is there any way to find out the main (long) URL behind it?

Sorry for the bad explanation and example.
Reply
#2
I would suggest that you don't open short links. They have long been associated with dodgy sites, criminal activity, phishing and all the other web-based nasties you can come across. Most companies with any sort of decent staff training on keeping their systems safe use a number of techniques to help their users avoid scams. One of those is: never open a "weird" URL without knowing for certain what is behind it. That includes URLs that contain random numbers like the one you posted, and ones that don't clearly state the domain and context of the resource you are about to access.
This is obviously not foolproof, but if I were presented with the link in your code, the only place it would be directed to is the Trash, permanently. Sorry, but define your URLs, or only request from URLs that tell you exactly what you are expecting to receive.
Just my 2p
Regards
-------- *
“Outside of a dog, a book is man's best friend. Inside of a dog it's too dark to read.”
Reply
#3
You can use Requests like this.
import requests

url = 'http://bit.ly/cXEInp'
session = requests.Session()
resp = session.head(url, allow_redirects=True)
print(resp.url)
Output:
https://www.flickr.com/photos/26432908@N00/346615997/sizes/l/
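The same idea can be wrapped in a small helper. This is only a sketch: the name expand_url, the 10 second timeout and the GET fallback are assumptions of this example, not part of the snippet above. It takes the session as an argument, so any object with Requests-style head() and get() methods will do. Some servers reject HEAD requests, so on an error status it retries with a streamed GET, which still avoids downloading the body.

```python
def expand_url(url, session, timeout=10):
    """Return the final URL after following redirects (hypothetical helper).

    `session` is expected to be a requests.Session, or any object with
    compatible head() and get() methods.
    """
    # HEAD follows the redirects without downloading the page body.
    resp = session.head(url, allow_redirects=True, timeout=timeout)
    if resp.status_code >= 400:
        # Assumption: the server may not support HEAD, so retry with GET.
        # stream=True defers the body download; close() releases the connection.
        resp = session.get(url, allow_redirects=True, timeout=timeout, stream=True)
        resp.close()
    return resp.url
```

With Requests installed, it would be called as expand_url('http://bit.ly/cXEInp', requests.Session()).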
Reply
#4
(Jul-09-2020, 02:14 PM)snippsat Wrote: Can use Requests like this.

Can you explain what's happening here?
Reply
#5
(Jul-09-2020, 02:41 PM)Evil_Patrick Wrote: Can you explain what's happening here?
URL shorteners work by redirecting to the web page at the original long URL.
With allow_redirects=True, Requests follows all redirects.
The target of each redirect is in the Location header of the response.
To see more of what's going on:
>>> import requests
>>> 
>>> url = 'http://t.co/hAplNMmSTg'
>>> session = requests.Session()
>>> resp = session.head(url, allow_redirects=True)
>>> resp.history
[<Response [301]>,
 <Response [301]>,
 <Response [301]>,
 <Response [301]>,
 <Response [301]>]
>>> 
>>> # We see that it gets redirected 5 times
>>> # Look at the content of the headers
>>> resp.history[0].headers
{'cache-control': 'no-cache, no-store, max-age=0', 'content-length': '0', 'date': 'Thu, 09 Jul 2020 16:50:07 GMT', 'location': 'https://t.co/hAplNMmSTg', 'server': 'tsa_o', 'x-connection-hash': '1a5db93459d4a04a1c4bef977f5ccbe5', 'x-response-time': '107'}
>>> resp.history[1].headers
{'cache-control': 'private,max-age=300', 'content-length': '0', 'date': 'Thu, 09 Jul 2020 16:50:07 GMT', 'expires': 'Thu, 09 Jul 2020 16:55:07 GMT', 'location': 'https://bit.ly/1kb2qbf', 'server': 'tsa_o', 'set-cookie': 'muc=1d5424b7-9c14-4a7f-a83b-093aed6c273f; Max-Age=63072000; Expires=Sat, 9 Jul 2022 16:50:07 GMT; Domain=t.co; Secure; SameSite=None', 'strict-transport-security': 'max-age=0', 'vary': 'Origin', 'x-connection-hash': '419be23ac2d73c03cafff5391745329d', 'x-response-time': '109'}
>>> resp.history[3].headers
{'Server': 'CloudFront', 'Date': 'Thu, 09 Jul 2020 16:50:08 GMT', 'Content-Type': 'text/html', 'Content-Length': '183', 'Connection': 'keep-alive', 'Location': 'https://www.wtatennis.com/players/player/13516/title/simona-halep', 'X-Cache': 'Redirect from cloudfront', 'Via': '1.1 8ddb6d7670d8c5a85c04a10525a71b91.cloudfront.net (CloudFront)', 'X-Amz-Cf-Pop': 'OSL50-C1', 'X-Amz-Cf-Id': 'tqFBbmJSSfqO5YtsV_vJih8avgaWFdHf1NIQEGLAc-BlgquOqHrmCg=='}
>>> resp.history[4].headers
{'Content-Length': '0', 'Connection': 'keep-alive', 'Date': 'Thu, 09 Jul 2020 16:50:08 GMT', 'Location': '/players/314320/simona-halep', 'Server': 'nginx', 'X-Cache': 'Miss from cloudfront', 'Via': '1.1 1d8cf7c8865ed1078c19a98771ad34cb.cloudfront.net (CloudFront)', 'X-Amz-Cf-Pop': 'OSL50-C1', 'X-Amz-Cf-Id': 'SnGpLonXzNkOnw2LpqTWdHp_I-3h4YkeYSQFr5WHjeyG7Dfy2mlPdw=='}
>>> resp.history[4].headers['Location']
'/players/314320/simona-halep'
So if you run it with this short URL, the final URL will be:
Output:
https://www.wtatennis.com/players/314320/simona-halep
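Note that the last hop returns a relative Location ('/players/314320/simona-halep'), yet the final URL is a full one. A relative Location is resolved against the URL that was requested. A small sketch of that step, using only the standard library: the helper name redirect_targets is made up for this example, and it assumes a response object shaped like the one Requests returns (history, status_code, url, headers).

```python
from urllib.parse import urljoin


def redirect_targets(response):
    """Return (status_code, absolute_target) for each hop in response.history.

    Hypothetical helper: expects a Requests-style response object.
    """
    hops = []
    for hop in response.history:
        # Requests' headers are case-insensitive, so 'Location' also
        # matches a header sent as 'location'.
        location = hop.headers.get("Location")
        if location is None:
            continue
        # urljoin leaves absolute URLs alone and resolves relative ones
        # against the URL that produced this redirect.
        hops.append((hop.status_code, urljoin(hop.url, location)))
    return hops
```

Called on the resp from the session above, the last entry should be the wtatennis.com URL shown in the output.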
Reply
#6
Can I suggest un-shortening the link? I just covered this in my latest code snippets:

pip install unshortenit

from unshortenit import UnshortenIt
 
unsh = UnshortenIt()
uri = unsh.unshorten('https://wp.me/Pa5TU8-2yD')
print(uri)
Reply

