 remove string character from url
#1
I have a URL, https://www.facebook.com/xxxxxx?test1q223. Does anyone know a method to remove everything from the ? to the end of the string?
So the expected result is https://www.facebook.com/xxxxxx.
#2
>>> url = 'https://www.facebook.com/xxxxxx?test1q223'
>>> url = url.split('?')[0]
>>> url
'https://www.facebook.com/xxxxxx'
>>> 
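A variation on the split approach above, in case some URLs have no ? at all: str.partition always returns the part before the first ?, or the whole string when there is none. This is only a sketch, and strip_query is an illustrative name, not something from the thread.

```python
def strip_query(url):
    # partition splits on the first '?' only; if there is no '?',
    # the head is the whole string and the other two parts are empty.
    head, _sep, _tail = url.partition('?')
    return head


print(strip_query('https://www.facebook.com/xxxxxx?test1q223'))
print(strip_query('https://www.facebook.com/xxxxxx'))
```

Both calls should print the same URL without the query part.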
#3
There are also ways to do it using the stdlib's urllib.parse module:
>>> from urllib.parse import urlparse
>>> parsed = urlparse('https://www.facebook.com/xxxxxx?test1q223')
>>> parsed._replace(query='').geturl()
'https://www.facebook.com/xxxxxx'
Note that, despite the leading underscore, _replace is a documented public method of named tuples (the underscore only prevents clashes with field names), so it is safe to rely on.
You might also want to replace params and fragment, depending on your exact needs.
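A rough sketch of what clearing those extra components could look like, assuming you want to drop params, query and fragment together. The helper name strip_extras and the #top fragment in the sample URL are made up for illustration.

```python
from urllib.parse import urlparse


def strip_extras(url):
    # Blank out the params (;...), query (?...) and fragment (#...)
    # components, then rebuild the URL from what is left.
    parsed = urlparse(url)
    return parsed._replace(params='', query='', fragment='').geturl()


print(strip_extras('https://www.facebook.com/xxxxxx?test1q223#top'))
```

Scheme, host and path are left untouched, so a URL that has none of the three components comes back unchanged.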
#4
Hi, thanks.
If I want to do this in a loop, is it possible to do it like this?
I put the URLs in a text file and export the results to a new file.
Example: 'orgginal.txt' would contain https://www.facebook.com/xxxxxx?test1q223, and out.txt would contain https://www.facebook.com/xxxxxx

with open('orgginal.txt') as f,open('out.txt', 'w') as f_out:
    for line in f:
        line = line.strip()
        line = line.split('?')[0]
        f_out.write('{}\n'.format(line))

#5
I have figured it out.

with open('orgginal.txt') as f,open('out.txt', 'w') as f_out:
    for line in f:
        line = line.strip()
        print(line)
        line = line.split('?')[0]
        print(line)
        f_out.write('{}\n'.format(line))
	
or
from urllib.parse import urlparse
with open('orgginal.txt') as f,open('out.txt', 'w') as f_out:
    for line in f:
        line = line.strip()
        parsed = urlparse(line)
        #print(line)
        newline=parsed._replace(query='').geturl()
        print(newline)
        f_out.write('{}\n'.format(newline))
	
#6
Hi, sorry,
I have another question for you.
What if my URL looks like this: https://www.facebook.com/xxxxxx/test1q223/ and I want to remove the last /, so that it becomes https://www.facebook.com/xxxxxx/test1q223.
I saw that one method is st = st[:-1].
But how do I determine whether the URL has a ? part or a / at the end?
For example:
https://www.facebook.com/xxxxxx/test1q223/
https://www.facebook.com/xxxxxx/?test1q223

How do I change them to:
https://www.facebook.com/xxxxxx/test1q223
https://www.facebook.com/xxxxxx

Is there any way to check for these two conditions and remove the ? part and the / at the end?

>>> parsed = urlparse('https://www.facebook.com/xxxxxx/est1q223/')
>>> parsed._replace(query='').geturl()
'https://www.facebook.com/xxxxxx/est1q223/'
>>>
#7
You really need to study up on slicing, see: https://www.python-course.eu/python3_seq..._types.php

This handles a trailing / and a ? at the start of the last segment:
url1 = 'https://www.facebook.com/xxxxxx/test1q223/'
url2 = 'https://www.facebook.com/xxxxxx/?test1q223'
url3 = 'https://www.facebook.com/xxxxxx/test1q223'

def change_url(url):
    urlx = url.split('/')
    if url[-1] == '/':
        return url[:-1]
    if urlx[-1].startswith('?'):
        urlx[-1] = urlx[-1][1:]
        return '/'.join(urlx)
    return url

print(f'url1: {change_url(url1)}')
print(f'url2: {change_url(url2)}')
print(f'url3: {change_url(url3)}')
Output:
url1: https://www.facebook.com/xxxxxx/test1q223
url2: https://www.facebook.com/xxxxxx/test1q223
url3: https://www.facebook.com/xxxxxx/test1q223
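Note that the function above only removes the ? character itself, so url2 keeps test1q223, while the question asked for https://www.facebook.com/xxxxxx. A possible alternative, sketched with urllib.parse instead of manual slicing: clear the query and strip any trailing / from the path. The name clean_url is hypothetical.

```python
from urllib.parse import urlsplit


def clean_url(url):
    parts = urlsplit(url)
    # Remove trailing slashes from the path only, then drop the
    # query and fragment before rebuilding the URL.
    path = parts.path.rstrip('/')
    return parts._replace(path=path, query='', fragment='').geturl()


url1 = 'https://www.facebook.com/xxxxxx/test1q223/'
url2 = 'https://www.facebook.com/xxxxxx/?test1q223'
url3 = 'https://www.facebook.com/xxxxxx/test1q223'

for name, url in [('url1', url1), ('url2', url2), ('url3', url3)]:
    print(f'{name}: {clean_url(url)}')
```

With this version url1 and url3 both come out as https://www.facebook.com/xxxxxx/test1q223 and url2 as https://www.facebook.com/xxxxxx, which matches the two results requested in the question.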
#8
Hi, I have a question about this script.
Why didn't my first URL, url1, have its last part "/test1q223" cut off?


url1 = 'https://www.facebook.com/xxxxxx/test1q223/'
url2 = 'https://www.facebook.com/xxxxxx/?test1q223'
url3 = 'https://www.facebook.com/xxxxxx/test1q223'
url4 = 'https://www.facebook.com/xxxxxx/test1q223' 
def change_url(url):
    urlx = url.split('/')
    #print (urlx)
    #print(urlx[4])
#    if url[-1] == '/':
#        print("yes1")
  #      return ''.join(urlx[:-1])
        
    if urlx[-1].startswith('?'):
        urlx[-1] = urlx[-1][1:]
        #print("yes2")
        return '/'.join(urlx[:-1])
        
    if urlx[2]!='':
        urlx[-1] = urlx[:4][3]
        #print(urlx[:3][2])
        #print(urlx[2])
        #print("yes3")
        #print("ttt"+urlx[:3][1])
        #print(urlx)
        return '/'.join(urlx[:-1])




    return url
print(f'url1: {change_url(url1)}')
print(f'url2: {change_url(url2)}')
print(f'url3: {change_url(url3)}')
print(f'url4: {change_url(url4)}')


#output is :
url1: https://www.facebook.com/xxxxxx/test1q223
url2: https://www.facebook.com/xxxxxx
url3: https://www.facebook.com/xxxxxx
url4: https://www.facebook.com/xxxxxx
#9
Quote:Why does my first url1 didn't cut off the last string" /test1q223 "?
Break it down step by step (this uses f-strings, which require Python 3.6 or newer):
>>> url1 = 'https://www.facebook.com/xxxxxx/test1q223/'
>>> url2 = 'https://www.facebook.com/xxxxxx/?test1q223'
>>> url3 = 'https://www.facebook.com/xxxxxx/test1q223'
>>> def change_url(url):
...     urlx = url.split('/')
...     print(f'url: {url}, urlx: {urlx}')
...     if url[-1] == '/':
...         print(f'returning url[-1]: {url[-1]}')
...         return url[:-1]
...     if urlx[-1].startswith('?'):
...         print(f'urlx[-1][1:]: {urlx[-1][1:]}')
...         urlx[-1] = urlx[-1][1:]
...         print(f"returning '/'.join(urlx): {'/'.join(urlx)}")
...         return '/'.join(urlx)
...     # No change needed
...     return url
... 
>>> print(f'url1: {change_url(url1)}')
url: https://www.facebook.com/xxxxxx/test1q223/, urlx: ['https:', '', 'www.facebook.com', 'xxxxxx', 'test1q223', '']
returning url[-1]: /
url1: https://www.facebook.com/xxxxxx/test1q223
>>> # ------------------------------------------
... 
>>> print(f'url2: {change_url(url2)}')
url: https://www.facebook.com/xxxxxx/?test1q223, urlx: ['https:', '', 'www.facebook.com', 'xxxxxx', '?test1q223']
urlx[-1][1:]: test1q223
returning '/'.join(urlx): https://www.facebook.com/xxxxxx/test1q223
url2: https://www.facebook.com/xxxxxx/test1q223
>>> # ------------------------------------------
... 
>>> print(f'url3: {change_url(url3)}')
url: https://www.facebook.com/xxxxxx/test1q223, urlx: ['https:', '', 'www.facebook.com', 'xxxxxx', 'test1q223']
url3: https://www.facebook.com/xxxxxx/test1q223
>>> 
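The trace above shows that urlx for url1 ends in an empty string, because the URL ends in /. That is likely why dropping only the last list element does not remove test1q223 from url1. A small sketch of the effect, assuming the same url1 as in the thread:

```python
url1 = 'https://www.facebook.com/xxxxxx/test1q223/'

parts = url1.split('/')
# The trailing '/' leaves an empty string as the last element.
print(parts[-1] == '')

# Dropping the last element therefore only removes that empty string.
print('/'.join(parts[:-1]))

# Stripping the trailing '/' first makes the last element a real segment.
stripped_parts = url1.rstrip('/').split('/')
print('/'.join(stripped_parts[:-1]))
```

The second print still contains test1q223; the third one ends at xxxxxx.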
#10
Hi,
I still don't get it.
Why do url1, url3 and url4 look the same, except for the trailing / on url1?
If I want to remove /test1q223 from url1 so that it prints the same as url3 and url4, what do I have to change in the if statement?
I want it to print this:
url1: https://www.facebook.com/xxxxxx
url2: https://www.facebook.com/xxxxxx
url3: https://www.facebook.com/xxxxxx
url4: https://www.facebook.com/xxxxxx

but right now it prints this:
url1: https://www.facebook.com/xxxxxx/test1q223
url2: https://www.facebook.com/xxxxxx
url3: https://www.facebook.com/xxxxxx
url4: https://www.facebook.com/xxxxxx
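One way to get the output asked for above, sketched with urllib.parse rather than by changing the if statement. It assumes the wanted result is always the scheme, the host and the first path segment only; profile_url is a hypothetical name.

```python
from urllib.parse import urlsplit


def profile_url(url):
    parts = urlsplit(url)
    # Keep the non-empty path segments, e.g.
    # '/xxxxxx/test1q223/' -> ['xxxxxx', 'test1q223']
    segments = [s for s in parts.path.split('/') if s]
    # Assumption: only the first segment is wanted.
    path = '/' + segments[0] if segments else ''
    return parts._replace(path=path, query='', fragment='').geturl()


urls = [
    'https://www.facebook.com/xxxxxx/test1q223/',
    'https://www.facebook.com/xxxxxx/?test1q223',
    'https://www.facebook.com/xxxxxx/test1q223',
    'https://www.facebook.com/xxxxxx/test1q223',
]

for number, url in enumerate(urls, start=1):
    print(f'url{number}: {profile_url(url)}')
```

Filtering out empty segments means a trailing / no longer matters, so url1 is treated the same way as url3 and url4.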