Python Forum
remove string character from url
#1
I have a URL, https://www.facebook.com/xxxxxx?test1q223. Does anyone know a method to remove everything from the ? to the end of the string?
The expected result is https://www.facebook.com/xxxxxx.
Reply
#2
>>> url = 'https://www.facebook.com/xxxxxx?test1q223'
>>> url = url.split('?')[0]
>>> url
'https://www.facebook.com/xxxxxx'
>>> 
Reply
#3
There are also ways to do it using the stdlib's urllib.parse module:
>>> from urllib.parse import urlparse
>>> parsed = urlparse('https://www.facebook.com/xxxxxx?test1q223')
>>> parsed._replace(query='').geturl()
'https://www.facebook.com/xxxxxx'
Note that despite the leading underscore, _replace is a documented method of named tuples (the underscore only avoids clashes with field names), so it is safe to rely on.
You might also want to replace params and fragment, depending on your exact needs.
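To illustrate that last point, here is a minimal sketch that clears params, query and fragment in one call. The helper name strip_extras and the sample URL with ;p=1 and #top are made up for illustration; they are not from the thread.

```python
from urllib.parse import urlparse


def strip_extras(url):
    """Return url without its params, query string, or fragment."""
    parsed = urlparse(url)
    # _replace returns a new ParseResult with the given fields overwritten.
    return parsed._replace(params='', query='', fragment='').geturl()


print(strip_extras('https://www.facebook.com/xxxxxx?test1q223'))
print(strip_extras('https://www.facebook.com/xxxxxx;p=1?test1q223#top'))
```

Both calls should print https://www.facebook.com/xxxxxx.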
Reply
#4
Hi, thanks.
If I want to do this in a loop, is it possible to do it like this?
I put the URLs in a text file and export the results to a new file.
Example: 'orgginal.txt' would contain https://www.facebook.com/xxxxxx?test1q223, and out.txt would contain https://www.facebook.com/xxxxxx

with open('orgginal.txt') as f, open('out.txt', 'w') as f_out:
    for line in f:
        line = line.strip()
        # Keep everything before the first '?'; also works when there is no '?'
        line = line.split('?')[0]
        f_out.write('{}\n'.format(line))
Reply
#5
I have figured it out:

with open('orgginal.txt') as f,open('out.txt', 'w') as f_out:
    for line in f:
        line = line.strip()
        print(line)
        line = line.split('?')[0]
        print(line)
        f_out.write('{}\n'.format(line))

or
from urllib.parse import urlparse
with open('orgginal.txt') as f,open('out.txt', 'w') as f_out:
    for line in f:
        line = line.strip()
        parsed = urlparse(line)
        newline = parsed._replace(query='').geturl()
        print(newline)
        f_out.write('{}\n'.format(newline))
Reply
#6
Hi, sorry,
I have another question for you guys.
What if my URL looks like https://www.facebook.com/xxxxxx/test1q223/ and I want to remove the last / so that it becomes https://www.facebook.com/xxxxxx/test1q223
I saw that one way to do it is st = st[:-1].
But how do I determine whether the URL has a ? part or a / at the end?
For example:
https://www.facebook.com/xxxxxx/test1q223/
https://www.facebook.com/xxxxxx/?test1q223

How can I change them to:
https://www.facebook.com/xxxxxx/test1q223
https://www.facebook.com/xxxxxx

Is there a way to check for these two conditions and remove the ? part or the trailing / accordingly? Removing the query alone leaves the trailing / in place:

>>> parsed = urlparse('https://www.facebook.com/xxxxxx/est1q223/')
>>> parsed._replace(query='').geturl()
'https://www.facebook.com/xxxxxx/est1q223/'
>>>
Reply
#7
You really need to study up on slicing, see: https://www.python-course.eu/python3_seq..._types.php

This will handle all cases:
url1 = 'https://www.facebook.com/xxxxxx/test1q223/'
url2 = 'https://www.facebook.com/xxxxxx/?test1q223'
url3 = 'https://www.facebook.com/xxxxxx/test1q223'

def change_url(url):
    urlx = url.split('/')
    if url[-1] == '/':
        return url[:-1]
    if urlx[-1].startswith('?'):
        urlx[-1] = urlx[-1][1:]
        return '/'.join(urlx)
    return url

print(f'url1: {change_url(url1)}')
print(f'url2: {change_url(url2)}')
print(f'url3: {change_url(url3)}')
results:
Output:
url1: https://www.facebook.com/xxxxxx/test1q223
url2: https://www.facebook.com/xxxxxx/test1q223
url3: https://www.facebook.com/xxxxxx/test1q223
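If the goal is instead the result asked for in post #6 (drop the whole ? part and any trailing /), one alternative is to let urllib.parse do the splitting. This is only a sketch under that assumption; clean_url is a hypothetical name, not part of the code above.

```python
from urllib.parse import urlsplit, urlunsplit


def clean_url(url):
    """Drop the query string and fragment, then strip trailing slashes from the path."""
    parts = urlsplit(url)
    path = parts.path.rstrip('/')
    # Rebuild the URL with an empty query and an empty fragment.
    return urlunsplit((parts.scheme, parts.netloc, path, '', ''))


print(clean_url('https://www.facebook.com/xxxxxx/test1q223/'))
print(clean_url('https://www.facebook.com/xxxxxx/?test1q223'))
print(clean_url('https://www.facebook.com/xxxxxx/test1q223'))
```

The first and third calls should give https://www.facebook.com/xxxxxx/test1q223, and the second https://www.facebook.com/xxxxxx.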
Reply
#8
Hi, I have a question about this script.
Why didn't my first URL, url1, get its last part "/test1q223" cut off?


url1 = 'https://www.facebook.com/xxxxxx/test1q223/'
url2 = 'https://www.facebook.com/xxxxxx/?test1q223'
url3 = 'https://www.facebook.com/xxxxxx/test1q223'
url4 = 'https://www.facebook.com/xxxxxx/test1q223' 
def change_url(url):
    urlx = url.split('/')
    #print (urlx)
    #print(urlx[4])
#    if url[-1] == '/':
#        print("yes1")
  #      return ''.join(urlx[:-1])
        
    if urlx[-1].startswith('?'):
        urlx[-1] = urlx[-1][1:]
        #print("yes2")
        return '/'.join(urlx[:-1])
        
    if urlx[2]!='':
        urlx[-1] = urlx[:4][3]
        #print(urlx[:3][2])
        #print(urlx[2])
        #print("yes3")
        #print("ttt"+urlx[:3][1])
        #print(urlx)
        return '/'.join(urlx[:-1])




    return url
print(f'url1: {change_url(url1)}')
print(f'url2: {change_url(url2)}')
print(f'url3: {change_url(url3)}')
print(f'url4: {change_url(url4)}')
#output is :
url1: https://www.facebook.com/xxxxxx/test1q223
url2: https://www.facebook.com/xxxxxx
url3: https://www.facebook.com/xxxxxx
url4: https://www.facebook.com/xxxxxx
Reply
#9
Quote: Why didn't my first URL, url1, get its last part "/test1q223" cut off?
Break it down step by step (this uses f-strings, which require Python 3.6 or newer):
>>> url1 = 'https://www.facebook.com/xxxxxx/test1q223/'
>>> url2 = 'https://www.facebook.com/xxxxxx/?test1q223'
>>> url3 = 'https://www.facebook.com/xxxxxx/test1q223'
>>> def change_url(url):
...     urlx = url.split('/')
...     print(f'url: {url}, urlx: {urlx}')
...     if url[-1] == '/':
...         print(f'returning url[-1]: {url[-1]}')
...         return url[:-1]
...     if urlx[-1].startswith('?'):
...         print(f'urlx[-1][1:]: {urlx[-1][1:]}')
...         urlx[-1] = urlx[-1][1:]
...         print(f"returning '/'.join(urlx): {'/'.join(urlx)}")
...         return '/'.join(urlx)
...     # No change needed
...     return url
... 
>>> print(f'url1: {change_url(url1)}')
url: https://www.facebook.com/xxxxxx/test1q223/, urlx: ['https:', '', 'www.facebook.com', 'xxxxxx', 'test1q223', '']
returning url[-1]: /
url1: https://www.facebook.com/xxxxxx/test1q223
>>> # ------------------------------------------
... 
>>> print(f'url2: {change_url(url2)}')
url: https://www.facebook.com/xxxxxx/?test1q223, urlx: ['https:', '', 'www.facebook.com', 'xxxxxx', '?test1q223']
urlx[-1][1:]: test1q223
returning '/'.join(urlx): https://www.facebook.com/xxxxxx/test1q223
url2: https://www.facebook.com/xxxxxx/test1q223
>>> # ------------------------------------------
... 
>>> print(f'url3: {change_url(url3)}')
url: https://www.facebook.com/xxxxxx/test1q223, urlx: ['https:', '', 'www.facebook.com', 'xxxxxx', 'test1q223']
url3: https://www.facebook.com/xxxxxx/test1q223
>>> 
Reply
#10
Hi,
I still don't get it.
Why do url1, url3 and url4 look the same, except for the trailing / on url1?
If I want to remove /test1q223 from url1 so that it prints the same as url3 and url4, what do I have to change in the if statement?
I want it to print like this:
url1: https://www.facebook.com/xxxxxx
url2: https://www.facebook.com/xxxxxx
url3: https://www.facebook.com/xxxxxx
url4: https://www.facebook.com/xxxxxx

but right now it prints like this:
url1: https://www.facebook.com/xxxxxx/test1q223
url2: https://www.facebook.com/xxxxxx
url3: https://www.facebook.com/xxxxxx
url4: https://www.facebook.com/xxxxxx
Reply
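For the output asked for in the last post (every URL reduced to https://www.facebook.com/xxxxxx), one possible approach is to keep only the first path segment instead of indexing into url.split('/'). This is a minimal sketch, assuming that "first segment only" really is the rule wanted; keep_first_segment is a hypothetical helper name, not something from the thread.

```python
from urllib.parse import urlsplit, urlunsplit


def keep_first_segment(url):
    """Keep scheme, host, and only the first non-empty path segment."""
    parts = urlsplit(url)
    # Splitting '/xxxxxx/test1q223/' on '/' yields empty strings; filter them out.
    segments = [s for s in parts.path.split('/') if s]
    path = '/' + segments[0] if segments else ''
    # Query and fragment are dropped by passing empty strings.
    return urlunsplit((parts.scheme, parts.netloc, path, '', ''))


for u in ('https://www.facebook.com/xxxxxx/test1q223/',
          'https://www.facebook.com/xxxxxx/?test1q223',
          'https://www.facebook.com/xxxxxx/test1q223'):
    print(keep_first_segment(u))
```

All three lines printed should be https://www.facebook.com/xxxxxx. The function in post #8 leaves url1 longer because the trailing / makes the last item of url.split('/') an empty string, so dropping the last item removes only that empty string.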