remove string character from url - Python Forum (https://python-forum.io), General Coding Help, thread /thread-16520.html
remove string character from url - jacklee26 - Mar-03-2019

I have a URL, https://www.facebook.com/xxxxxx?test1q223. Does anyone know a way to remove everything from the ? to the end of the string? The expected result is https://www.facebook.com/xxxxxx.

RE: remove string character from url - Larz60+ - Mar-03-2019

    >>> url = 'https://www.facebook.com/xxxxxx?test1q223'
    >>> url = url.split('?')[:-1][0]
    >>> url
    'https://www.facebook.com/xxxxxx'
    >>>

RE: remove string character from url - stranac - Mar-03-2019

There are also ways to do it using the stdlib's urllib.parse module:

    >>> from urllib.parse import urlparse
    >>> parsed = urlparse('https://www.facebook.com/xxxxxx?test1q223')
    >>> parsed._replace(query='').geturl()
    'https://www.facebook.com/xxxxxx'

Note that _replace looks like a "private" method because of the leading underscore, but it is a documented namedtuple method (the underscore only avoids clashes with field names), so its semantics are unlikely to change.

You might also want to replace params and fragment, depending on your exact needs.
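For comparison, here is a minimal sketch of both approaches that also copes with URLs that have no ? at all. The split('?')[:-1][0] form raises IndexError in that case, because the slice leaves an empty list. The function names strip_query_simple and strip_query_parsed are made up for this illustration and are not from the thread.

```python
from urllib.parse import urlparse


def strip_query_simple(url):
    # Hypothetical helper: str.partition returns the whole string as the
    # first element when '?' is absent, so nothing is raised in that case.
    return url.partition('?')[0]


def strip_query_parsed(url):
    # Hypothetical helper: clear params, query and fragment in one step,
    # following the suggestion to replace more than just the query.
    return urlparse(url)._replace(params='', query='', fragment='').geturl()


print(strip_query_simple('https://www.facebook.com/xxxxxx?test1q223'))
print(strip_query_simple('https://www.facebook.com/xxxxxx'))
print(strip_query_parsed('https://www.facebook.com/xxxxxx?test1q223#top'))
```

The string version is enough when the input is known to be a plain URL with at most a query string. The parsed version is the safer choice when fragments may also appear.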
RE: remove string character from url - jacklee26 - Mar-04-2019

Hi, thanks. If I want to do this in a loop, is it possible to do it like this? I put the URLs in a text file and export them to a new file. For example, 'orgginal.txt' would contain https://www.facebook.com/xxxxxx?test1q223, and out.txt would contain https://www.facebook.com/xxxxxx.

    with open('orgginal.txt') as f, open('out.txt', 'w') as f_out:
        for line in f:
            line = line.strip()
            line = line.split('?')[:-1][0]
            f_out.write('{}\n'.format(line))

RE: remove string character from url - jacklee26 - Mar-04-2019

I have figured it out:

    with open('orgginal.txt') as f, open('out.txt', 'w') as f_out:
        for line in f:
            line = line.strip()
            print(line)
            line = line.split('?')[:-1][0]
            print(line)
            f_out.write('{}\n'.format(line))

or

    from urllib.parse import urlparse
    with open('orgginal.txt') as f, open('out.txt', 'w') as f_out:
        for line in f:
            line = line.strip()
            parsed = urlparse(line)
            #print(line)
            newline = parsed._replace(query='').geturl()
            print(newline)
            #f_out.write('{}\n'.format(line))

RE: remove string character from url - jacklee26 - Mar-09-2019

Hi, sorry, I have another question for you guys. What if my URL looks like https://www.facebook.com/xxxxxx/test1q223/ and I want to remove the last /, so that it becomes https://www.facebook.com/xxxxxx/test1q223? I saw that one method is st = st[:-1]. But how do I determine whether the URL has a ? or a / at the end? For example:

    https://www.facebook.com/xxxxxx/test1q223/
    https://www.facebook.com/xxxxxx/?test1q223

How can I change these to:

    https://www.facebook.com/xxxxxx/test1q223
    https://www.facebook.com/xxxxxx

Is there any way to check for these two conditions and remove the ? part and the / at the end?

    >>> parsed = urlparse('https://www.facebook.com/xxxxxx/est1q223/')
    >>> parsed._replace(query='').geturl()
    'https://www.facebook.com/xxxxxx/est1q223/'
    >>>

RE: remove string character from url - Larz60+ - Mar-09-2019

You really need to study up on slicing, see: https://www.python-course.eu/python3_sequential_data_types.php

This will handle all cases:

    url1 = 'https://www.facebook.com/xxxxxx/test1q223/'
    url2 = 'https://www.facebook.com/xxxxxx/?test1q223'
    url3 = 'https://www.facebook.com/xxxxxx/test1q223'

    def change_url(url):
        urlx = url.split('/')
        if url[-1] == '/':
            return url[:-1]
        if urlx[-1].startswith('?'):
            urlx[-1] = urlx[-1][1:]
            return '/'.join(urlx)
        return url

    print(f'url1: {change_url(url1)}')
    print(f'url2: {change_url(url2)}')
    print(f'url3: {change_url(url3)}')

results:
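As an alternative to slicing by hand, the two outcomes asked for in the Mar-09 question (drop the query string, and drop a trailing slash) can be sketched with urllib.parse. Note that change_url as posted only strips the ? character itself, so url2 comes out as https://www.facebook.com/xxxxxx/test1q223 rather than https://www.facebook.com/xxxxxx. The name clean_url is hypothetical, and the sketch assumes the fragment should be dropped along with the query.

```python
from urllib.parse import urlsplit, urlunsplit


def clean_url(url):
    # Hypothetical helper, not from the thread.
    parts = urlsplit(url)
    # Remove any trailing slashes from the path component only.
    path = parts.path.rstrip('/')
    # Rebuild the URL with an empty query and an empty fragment (assumption).
    return urlunsplit((parts.scheme, parts.netloc, path, '', ''))


print(clean_url('https://www.facebook.com/xxxxxx/test1q223/'))
# https://www.facebook.com/xxxxxx/test1q223
print(clean_url('https://www.facebook.com/xxxxxx/?test1q223'))
# https://www.facebook.com/xxxxxx
```

Because the URL is parsed first, the ? and / checks do not have to be ordered or combined manually. Each component is cleaned on its own.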
RE: remove string character from url - jacklee26 - Mar-24-2019

Hi, I have a question about this script. Why didn't my first URL, url1, cut off the last string "/test1q223"?

    url1 = 'https://www.facebook.com/xxxxxx/test1q223/'
    url2 = 'https://www.facebook.com/xxxxxx/?test1q223'
    url3 = 'https://www.facebook.com/xxxxxx/test1q223'
    url4 = 'https://www.facebook.com/xxxxxx/test1q223'

    def change_url(url):
        urlx = url.split('/')
        #print (urlx)
        #print(urlx[4])
        # if url[-1] == '/':
        #     print("yes1")
        #     return ''.join(urlx[:-1])
        if urlx[-1].startswith('?'):
            urlx[-1] = urlx[-1][1:]
            #print("yes2")
            return '/'.join(urlx[:-1])
        if urlx[2]!='':
            urlx[-1] = urlx[:4][3]
            #print(urlx[:3][2])
            #print(urlx[2])
            #print("yes3")
            #print("ttt"+urlx[:3][1])
            #print(urlx)
            return '/'.join(urlx[:-1])
        return url

    print(f'url1: {change_url(url1)}')
    print(f'url2: {change_url(url2)}')
    print(f'url3: {change_url(url3)}')
    print(f'url4: {change_url(url4)}')

    #output is :
    url1: https://www.facebook.com/xxxxxx/test1q223
    url2: https://www.facebook.com/xxxxxx
    url3: https://www.facebook.com/xxxxxx
    url4: https://www.facebook.com/xxxxxx

RE: remove string character from url - Larz60+ - Mar-24-2019

Quote: Why didn't my first URL, url1, cut off the last string "/test1q223"?

Break it down step by step (this uses f-strings, which require Python 3.6 or newer):

    >>> url1 = 'https://www.facebook.com/xxxxxx/test1q223/'
    >>> url2 = 'https://www.facebook.com/xxxxxx/?test1q223'
    >>> url3 = 'https://www.facebook.com/xxxxxx/test1q223'
    >>> def change_url(url):
    ...     urlx = url.split('/')
    ...     print(f'url: {url}, urlx: {urlx}')
    ...     if url[-1] == '/':
    ...         print(f'returning url[-1]: {url[-1]}')
    ...         return url[:-1]
    ...     if urlx[-1].startswith('?'):
    ...         print(f'urlx[-1][1:]: {urlx[-1][1:]}')
    ...         print(f"returning '/'.join(urlx): {'/'.join(urlx)}")
    ...         return '/'.join(urlx)
    ...     # No change needed
    ...     return url
    ...
    >>> print(f'url1: {change_url(url1)}')
    url: https://www.facebook.com/xxxxxx/test1q223/, urlx: ['https:', '', 'www.facebook.com', 'xxxxxx', 'test1q223', '']
    returning url[-1]: /
    url1: https://www.facebook.com/xxxxxx/test1q223
    >>> # ------------------------------------------
    ...
    >>> print(f'url2: {change_url(url2)}')
    url: https://www.facebook.com/xxxxxx/?test1q223, urlx: ['https:', '', 'www.facebook.com', 'xxxxxx', '?test1q223']
    urlx[-1][1:]: test1q223
    returning '/'.join(urlx): https://www.facebook.com/xxxxxx/?test1q223
    url2: https://www.facebook.com/xxxxxx/?test1q223
    >>> # ------------------------------------------
    ...
    >>> print(f'url3: {change_url(url3)}')
    url: https://www.facebook.com/xxxxxx/test1q223, urlx: ['https:', '', 'www.facebook.com', 'xxxxxx', 'test1q223']
    url3: https://www.facebook.com/xxxxxx/test1q223
    >>>

RE: remove string character from url - jacklee26 - Mar-25-2019

Hi, I still don't get it. url1, url3 and url4 look the same except for the trailing / on url1. If I want to remove "/test1q223" from url1 and print the same result as url3 and url4, what do I have to change in the if statement?

I want it to print this:

    url1: https://www.facebook.com/xxxxxx
    url2: https://www.facebook.com/xxxxxx
    url3: https://www.facebook.com/xxxxxx
    url4: https://www.facebook.com/xxxxxx

but right now it prints this:

    url1: https://www.facebook.com/xxxxxx/test1q223
    url2: https://www.facebook.com/xxxxxx
    url3: https://www.facebook.com/xxxxxx
    url4: https://www.facebook.com/xxxxxx
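The difference comes from the trailing slash: splitting url1 on '/' leaves an empty string as the last list element, so dropping the last element only removes that empty string and "test1q223" survives. One possible way to get the same result for all four URLs is to keep only the first path segment. This is a sketch under that assumption (the thread does not say what should happen with deeper paths), and keep_first_segment is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit


def keep_first_segment(url):
    # Hypothetical helper, not from the thread.
    parts = urlsplit(url)
    # Split the path and ignore empty pieces caused by leading, trailing
    # or doubled slashes.
    segments = [s for s in parts.path.split('/') if s]
    # Assumption: only the first path segment should be kept.
    path = '/' + segments[0] if segments else ''
    # Query and fragment are dropped by passing empty strings.
    return urlunsplit((parts.scheme, parts.netloc, path, '', ''))


urls = [
    'https://www.facebook.com/xxxxxx/test1q223/',
    'https://www.facebook.com/xxxxxx/?test1q223',
    'https://www.facebook.com/xxxxxx/test1q223',
]
for number, url in enumerate(urls, start=1):
    print(f'url{number}: {keep_first_segment(url)}')
```

Filtering out the empty pieces first means the trailing slash no longer changes which element counts as "last", which is what tripped up the index-based version.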