remove string character from url

remove string character from url - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: remove string character from url (/thread-16520.html)

Pages: 1 2

remove string character from url - jacklee26 - Mar-03-2019

I have a url https://www.facebook.com/xxxxxx?test1q223, does anyone knows any method to remove from ? to the end of the string.
So I expected it should be https://www.facebook.com/xxxxxx.

RE: remove string character from url - Larz60+ - Mar-03-2019

>>> url = 'https://www.facebook.com/xxxxxx?test1q223'
>>> url = url.split('?')[:-1][0]
>>> url
'https://www.facebook.com/xxxxxx'
>>>

RE: remove string character from url - stranac - Mar-03-2019

There are also ways to do it using the stdlib's urllib.parse module:

>>> from urllib.parse import urlparse
>>> parsed = urlparse('https://www.facebook.com/xxxxxx?test1q223')
>>> parsed._replace(query='').geturl()
'https://www.facebook.com/xxxxxx'

Note that _replace is an undocumented "private" method, so its semantics might change in the future (although it's pretty unlikely).
You might also want to replace params and fragment, depending on your exact needs.

RE: remove string character from url - jacklee26 - Mar-04-2019

HI thanks.
if i wants to write in loop, is it possible be like this:
i put the url in the text file, and export it to a new file.
Example: 'orgginal.txt' would be like https://www.facebook.com/xxxxxx?test1q223, and out.txt would be like https://www.facebook.com/xxxxxx

with open('orgginal.txt') as f,open('out.txt', 'w') as f_out:
    for line in f:
        line = line.strip()
	line=line.split('?')[:-1][0]
        f_out.write('{}\n'.format(line))

RE: remove string character from url - jacklee26 - Mar-04-2019

i have figure it out

with open('orgginal.txt') as f,open('out.txt', 'w') as f_out:
    for line in f:
        line = line.strip()
        print(line)
        line=line.split('?')[:-1][0]
        print(line)
        f_out.write('{}\n'.format(line))

from urllib.parse import urlparse
with open('orgginal.txt') as f,open('out.txt', 'w') as f_out:
    for line in f:
        line = line.strip()
        parsed = urlparse(line)
        #print(line)
        newline=parsed._replace(query='').geturl()
        print(newline)
        #f_out.write('{}\n'.format(line))

RE: remove string character from url - jacklee26 - Mar-09-2019

HI sorry,
i have another question to ask you guys.
what if my url look like this https://www.facebook.com/xxxxxx/test1q223/, how do i remove the last /, it should be like this https://www.facebook.com/xxxxxx/test1q223.
i saw there is one method we can use by st = st[:-1].
But how to determine if sometime have a ? or have a / at the end.
For example
https://www.facebook.com/xxxxxx/test1q223/
https://www.facebook.com/xxxxxx/?test1q223

how to let it changed to
https://www.facebook.com/xxxxxx/test1q223
https://www.facebook.com/xxxxxx

is there any way we can check if we meet this 2 condition, remove the ? and / at the end

>>> parsed = urlparse('https://www.facebook.com/xxxxxx/est1q223/')
>>> parsed._replace(query='').geturl()
'https://www.facebook.com/xxxxxx/est1q223/'
>>>

RE: remove string character from url - Larz60+ - Mar-09-2019

You really need to study up on slicing, see: https://www.python-course.eu/python3_sequential_data_types.php

This will handle all cases:

url1 = 'https://www.facebook.com/xxxxxx/test1q223/'
url2 = 'https://www.facebook.com/xxxxxx/?test1q223'
url3 = 'https://www.facebook.com/xxxxxx/test1q223'

def change_url(url):
    urlx = url.split('/')
    if url[-1] == '/':
        return url[:-1]
    if urlx[-1].startswith('?'):
        urlx[-1] = urlx[-1][1:]
        return '/'.join(urlx)
    return url

print(f'url1: {change_url(url1)}')
print(f'url2: {change_url(url2)}')
print(f'url3: {change_url(url3)}')

results:

Output:url1: https://www.facebook.com/xxxxxx/test1q223
url2: https://www.facebook.com/xxxxxx/test1q223
url3: https://www.facebook.com/xxxxxx/test1q223

RE: remove string character from url - jacklee26 - Mar-24-2019

hi, i have a question about this script.
Why does my first url1 didn't cut off the last string" /test1q223 "?

url1 = 'https://www.facebook.com/xxxxxx/test1q223/'
url2 = 'https://www.facebook.com/xxxxxx/?test1q223'
url3 = 'https://www.facebook.com/xxxxxx/test1q223'
url4 = 'https://www.facebook.com/xxxxxx/test1q223' 
def change_url(url):
    urlx = url.split('/')
    #print (urlx)
    #print(urlx[4])
#    if url[-1] == '/':
#        print("yes1")
  #      return ''.join(urlx[:-1])
        
    if urlx[-1].startswith('?'):
        urlx[-1] = urlx[-1][1:]
        #print("yes2")
        return '/'.join(urlx[:-1])
        
    if urlx[2]!='':
        urlx[-1] = urlx[:4][3]
        #print(urlx[:3][2])
        #print(urlx[2])
        #print("yes3")
        #print("ttt"+urlx[:3][1])
        #print(urlx)
        return '/'.join(urlx[:-1])




    return url
print(f'url1: {change_url(url1)}')
print(f'url2: {change_url(url2)}')
print(f'url3: {change_url(url3)}')
print(f'url4: {change_url(url4)}')

#output is :
url1: https://www.facebook.com/xxxxxx/test1q223
url2: https://www.facebook.com/xxxxxx
url3: https://www.facebook.com/xxxxxx
url4: https://www.facebook.com/xxxxxx

RE: remove string character from url - Larz60+ - Mar-24-2019

Quote:Why does my first url1 didn't cut off the last string" /test1q223 "?

Break it down step by step (uses f-string which requires python 3.6 or newer)

>>> url1 = 'https://www.facebook.com/xxxxxx/test1q223/'
>>> url2 = 'https://www.facebook.com/xxxxxx/?test1q223'
>>> url3 = 'https://www.facebook.com/xxxxxx/test1q223'
>>> def change_url(url):
...     urlx = url.split('/')
...     print(f'url: {url}, urlx: {urlx}')
...     if url[-1] == '/':
...         print(f'returning url[-1]: {url[-1]}')
...         return url[:-1]
...     if urlx[-1].startswith('?'):
...         print(f'urlx[-1][1:]: {urlx[-1][1:]}')
...         print(f"returning '/'.join(urlx): {'/'.join(urlx)}")
...     return '/'.join(urlx)
...     # No change needed
...     return url
... 
>>> print(f'url1: {change_url(url1)}')
url: https://www.facebook.com/xxxxxx/test1q223/, urlx: ['https:', '', 'www.facebook.com', 'xxxxxx', 'test1q223', '']
returning url[-1]: /
url1: https://www.facebook.com/xxxxxx/test1q223
>>> # ------------------------------------------
... 
>>> print(f'url2: {change_url(url2)}')
url: https://www.facebook.com/xxxxxx/?test1q223, urlx: ['https:', '', 'www.facebook.com', 'xxxxxx', '?test1q223']
urlx[-1][1:]: test1q223
returning '/'.join(urlx): https://www.facebook.com/xxxxxx/?test1q223
url2: https://www.facebook.com/xxxxxx/?test1q223
>>> # ------------------------------------------
... 
>>> print(f'url3: {change_url(url3)}')
url: https://www.facebook.com/xxxxxx/test1q223, urlx: ['https:', '', 'www.facebook.com', 'xxxxxx', 'test1q223']
url3: https://www.facebook.com/xxxxxx/test1q223
>>>

RE: remove string character from url - jacklee26 - Mar-25-2019

Hi
i still don't get it.
why url1, url3 and url4 look the same except with the / for url1.
If i wants to remove the url1's /test1q223 and print the same as url3 and url4, what do i have to changed in the if statement.
i wants to print like this
url1: https://www.facebook.com/xxxxxx
url2: https://www.facebook.com/xxxxxx
url3: https://www.facebook.com/xxxxxx
url4: https://www.facebook.com/xxxxxx

but right now it's print like this
url1: https://www.facebook.com/xxxxxx/test1q223
url2: https://www.facebook.com/xxxxxx
url3: https://www.facebook.com/xxxxxx
url4: https://www.facebook.com/xxxxxx