Python Forum
How do I parse the string?
#1
Hello,

I have the string below:
FTP://21.105.28.15/abc/test1.txt
I want to cut it down to just the path:
/abc/

How do I parse the string?

import pymssql
conn = pymssql.connect('10.21.100.21', 'sa', 'abc', 'DBA', as_dict=False) 
cur = conn.cursor()

cur.execute('select path From table')
 
fetch = cur.fetchall()
 
for i in fetch:
    # Each row is a tuple; i[0] is e.g. FTP://21.105.28.15/abc/test1.txt
    # I want to cut this string down to /abc/ (just the path).
    print(i[0])

conn.close()
Gribouillis write Apr-10-2024, 07:32 AM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
#2
You could use urllib.parse.urlparse
>>> from urllib.parse import urlparse
>>> urlparse('FTP://21.105.28.15/abc/test1.txt')
ParseResult(scheme='ftp', netloc='21.105.28.15', path='/abc/test1.txt', params='', query='', fragment='')
>>> result = urlparse('FTP://21.105.28.15/abc/test1.txt')
>>> result.path
'/abc/test1.txt'
>>> 
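To get only the directory part (/abc/) that the original question asks for, the path returned by urlparse can be trimmed further. This is a minimal sketch, not the only way to do it: the helper name url_dir is made up for illustration, and the rows list only stands in for what cur.fetchall() is assumed to return with as_dict=False (one tuple per row).

```python
import posixpath
from urllib.parse import urlparse


def url_dir(url):
    # Path component of the URL, e.g. '/abc/test1.txt'
    path = urlparse(url).path
    # Directory part without the file name, e.g. '/abc'
    directory = posixpath.dirname(path)
    # Add the trailing slash unless the directory already ends with one ('/')
    return directory if directory.endswith('/') else directory + '/'


# Stand-in for the database rows (assumption: one tuple per row)
rows = [('FTP://21.105.28.15/abc/test1.txt',)]
for (url,) in rows:
    print(url_dir(url))
```

posixpath is used instead of os.path so that the result does not depend on the operating system the script runs on.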
#3
Maybe like this, without using any modules: find the third / and the last /

mystring = 'FTP://21.105.28.15/abc/test1.txt'
mystring2 = 'FTP://21.105.28.15/abc/def/ghi/test1.txt'

# get the index of the third /
def getFirst(url):
    count = 0
    for s in range(len(url)):
        # skip the first 2 /
        if url[s] == '/':
            count += 1
            # get the third /
            if count == 3:
                return s
    # no third /: the slice below comes out empty
    return len(url)

# get the index just past the last /
def getLast(url):
    return url.rfind('/') + 1

# pass each string in, so the indexes belong to the string being sliced
for url in (mystring, mystring2):
    wanted = url[getFirst(url):getLast(url)]
    print('wanted', repr(wanted))
Gives:

Output:
wanted '/abc/'
or

Output:
wanted '/abc/def/ghi/'
#4
Another way
astring = 'FTP://21.105.28.15/abc/test1.txt'
astring2 = 'FTP://21.105.28.15/abc/def/test1.txt'

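def remover(astring):
    # 'FTP://host/abc/test1.txt'.split('/')
    # -> ['FTP:', '', 'host', 'abc', 'test1.txt']
    parts = astring.split('/')
    # Drop the scheme, the empty item and the host by slicing
    # (removing items from a list while looping over it skips elements)
    return '/' + '/'.join(parts[3:])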

print(remover(astring))
print(remover(astring2))
Output:
/abc/test1.txt
/abc/def/test1.txt
A small adjustment can remove the file name at the end as well.
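The small adjustment mentioned above could look something like this. It is only a sketch, assuming every URL has the form scheme://host/dir/.../file; the name remover_dir is made up for illustration and is not part of the post above.

```python
def remover_dir(astring):
    # ['FTP:', '', 'host', 'abc', 'test1.txt'] -> keep only the directories
    parts = astring.split('/')
    dirs = parts[3:-1]
    # Rebuild with a leading and a trailing slash, e.g. '/abc/'
    return '/' + ''.join(d + '/' for d in dirs)


print(remover_dir('FTP://21.105.28.15/abc/test1.txt'))
print(remover_dir('FTP://21.105.28.15/abc/def/test1.txt'))
```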
#5
If you need a path, a Path object is maybe what you want.

The code is based on Gribouillis's example:
from urllib.parse import urlparse
from pathlib import PurePosixPath


def url2path(url: str) -> PurePosixPath:
    return PurePosixPath(urlparse(url).path)


path = url2path('FTP://21.105.28.15/abc/test1.txt')

print("path:", path)
print("path.parent", path.parent)

print("path.parts", path.parts)
print("path.parents", list(path.parents))

print("path.name:", path.name)
print("path.stem:", path.stem)
print("path.suffix", path.suffix)


# Path objects print like strings,
# but they are a different type.
# Here is how to convert one explicitly:
print("Path as str:", str(path))

# Not all third-party libraries convert `Path` to `str` implicitly
Output:
path: /abc/test1.txt
path.parent /abc
path.parts ('/', 'abc', 'test1.txt')
path.parents [PurePosixPath('/abc'), PurePosixPath('/')]
path.name: test1.txt
path.stem: test1
path.suffix .txt
Path as str: /abc/test1.txt
This also works with non-FTP URLs:
urls = (
    'FTP://21.105.28.15/abc/test1.txt',
    'FTPS://21.105.28.15/abc/test1.txt',
    'HTTP://21.105.28.15/abc/test1.txt',
    'HTTPS://21.105.28.15/abc/test1.txt',
    'MyOwnUselessURL://xyz.de/abc/test1.txt',
)


for url in urls:
    print(url2path(url))
Boring output:
Output:
/abc/test1.txt
/abc/test1.txt
/abc/test1.txt
/abc/test1.txt
/abc/test1.txt
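If only the directory with a trailing slash is wanted (the /abc/ from the first post), the parent of the path can be converted back to a string. A small sketch along the same lines as url2path; the name url2dir and the fallback to '/' for URLs without a path are my own assumptions.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def url2dir(url: str) -> str:
    # Fall back to '/' when the URL has no path at all (e.g. 'FTP://host')
    parent = PurePosixPath(urlparse(url).path or '/').parent
    text = str(parent)
    # str() drops the trailing slash, so add it back unless this is the root
    return text if text.endswith('/') else text + '/'


print(url2dir('FTP://21.105.28.15/abc/test1.txt'))
print(url2dir('HTTPS://21.105.28.15/abc/def/ghi/test1.txt'))
```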
#6
mystring = 'FTP://21.105.28.15/abc/test1.txt'
mystring2 = 'FTP://21.105.28.15/abc/def/ghi/test1.txt'

# Get the index of the third /
def getFirst(url):
    count = 0
    for i, char in enumerate(url):
        if char == '/':
            count += 1
            if count == 3:
                return i
    return -1  # Return -1 if there's no third '/'

# Get the index of the last /
def getLast(url):
    return url.rfind('/')  # Use rfind to get the last occurrence of '/'

# Extract the substring between the third and the last '/'
def extract_substring(url):
    first_index = getFirst(url)
    last_index = getLast(url)
    if first_index != -1 and first_index < last_index:
        return url[first_index + 1:last_index]  # Skip the '/' itself
    return ""

# Extract and print the portions of the URLs
wanted1 = extract_substring(mystring)
wanted2 = extract_substring(mystring2)

print(wanted1)  # Output: abc
print(wanted2)  # Output: abc/def/ghi
I'm using this and it works for me.
Larz60+ write Jan-21-2025, 08:58 AM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
Tags have been added. Please use BBCode tags on future posts
Gribouillis write Jan-20-2025, 08:23 PM:
Clickbait link removed. Please read What to NOT include in a post
#7
I define an "extract_path" function that uses a regular expression to find the part of the URL after "/abc/" and extract it.

import pymssql
import re

def extract_path(url):
    # Use regex to find the part after "/abc/"
    match = re.search(r'/abc/(.*)', url)
    if match:
        return match.group(1)
    return None

# Database connection
conn = pymssql.connect('10.21.100.21', 'sa', 'abc', 'DBA', as_dict=False)
cur = conn.cursor()

# Execute the query
cur.execute('SELECT path FROM table')

# Fetch all results
fetch = cur.fetchall()

# Process and print the results
for row in fetch:
    ftp_url = row[0]
    extracted_path = extract_path(ftp_url)
    if extracted_path:
        print(f"Original URL: {ftp_url}")
        print(f"Extracted path: {extracted_path}")
        print("---")

# Close the connection
conn.close()
Does this work on your end?
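Note that the pattern above only matches URLs that contain the literal "/abc/" and returns what comes after it (test1.txt). If the directory itself is wanted for any folder name, a pattern that does not hardcode "/abc/" might look like the sketch below. The names DIR_RE and extract_dir are made up for illustration, and the pattern assumes plain scheme://host/dir/file URLs without query strings.

```python
import re

# scheme://host, then capture the directory part up to and including the last '/'
DIR_RE = re.compile(r'^[A-Za-z][A-Za-z0-9+.-]*://[^/]*(/(?:[^/]*/)*)')


def extract_dir(url):
    match = DIR_RE.match(url)
    # None when the URL has no path after the host
    return match.group(1) if match else None


print(extract_dir('FTP://21.105.28.15/abc/test1.txt'))
print(extract_dir('FTP://21.105.28.15/abc/def/ghi/test1.txt'))
```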
#8
Solving this kind of problem with regex is brute-force. Sometimes it's good to take a step back and ask, if you can solve the problem differently.
#9
(Jan-21-2025, 12:48 PM)DeaD_EyE Wrote: Solving this kind of problem with regex is brute-force. Sometimes it's good to take a step back and ask, if you can solve the problem differently.

If it really works, there is no reason to refuse.
buran write Feb-13-2025, 07:59 AM:
Spam link removed