How do I parse the string?

anna17 · (This post was last modified: Apr-10-2024, 07:32 AM by Gribouillis.)

Hello,

I have the string below:
FTP://21.105.28.15/abc/test1.txt
From this string I only want to cut out the path:
/abc/

How do I parse the string?

import pymssql
conn = pymssql.connect('10.21.100.21', 'sa', 'abc', 'DBA', as_dict=False) 
cur = conn.cursor()

cur.execute('select path From table')
 
fetch = cur.fetchall()
 
for i in fetch:
    # prints e.g. FTP://21.105.28.15/abc/test1.txt
    # I want to cut this string and just get the path /abc/
    print(i)

conn.close()

Gribouillis write Apr-10-2024, 07:32 AM:
Please post all code, output and errors (in its entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.

Gribouillis · (This post was last modified: Apr-10-2024, 07:37 AM by Gribouillis.)

You could use urllib.parse.urlparse

>>> from urllib.parse import urlparse
>>> urlparse('FTP://21.105.28.15/abc/test1.txt')
ParseResult(scheme='ftp', netloc='21.105.28.15', path='/abc/test1.txt', params='', query='', fragment='')
>>> result = urlparse('FTP://21.105.28.15/abc/test1.txt')
>>> result.path
'/abc/test1.txt'
>>>
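
To get only the directory part that the question asks for ('/abc/'), one possible follow-up is to cut the parsed path at its last slash. This is only a sketch: the helper name url_directory is made up for illustration, and it assumes the last path segment is always a file name.

```python
import posixpath
from urllib.parse import urlparse


def url_directory(url):
    """Return the directory part of the URL path, with a trailing slash."""
    # urlparse() separates scheme and host from the path: '/abc/test1.txt'
    path = urlparse(url).path
    # posixpath.dirname() drops everything after the last '/': '/abc'
    directory = posixpath.dirname(path)
    # Add the trailing slash back, but avoid '//' when the file sits in the root
    return directory if directory.endswith('/') else directory + '/'


print(url_directory('FTP://21.105.28.15/abc/test1.txt'))          # /abc/
print(url_directory('FTP://21.105.28.15/abc/def/ghi/test1.txt'))  # /abc/def/ghi/
```

posixpath is used instead of os.path so that the result does not depend on the operating system the script runs on.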

Pedroski55 · Apr-10-2024, 09:40 AM

Maybe like this, without using any modules: find the third / and the last /

mystring = 'FTP://21.105.28.15/abc/test1.txt'
mystring2 = 'FTP://21.105.28.15/abc/def/ghi/test1.txt'

# get the index of the third /
def getFirst(url):
    count = 0
    for s in range(len(url)):
        # the first 2 / belong to '://'
        if url[s] == '/':
            count += 1
            # the third / starts the path
            if count == 3:
                return s

# get the slice end just after the last /
def getLast(url):
    for s in range(1, len(url) + 1):
        if url[-s] == '/':
            # -(s - 1) keeps the /; when the string ends in /, slice to the end
            return -(s - 1) or None

# pass in the string being sliced, so the indexes always belong to it
wanted = mystring[getFirst(mystring):getLast(mystring)]
# or, for the longer string
wanted = mystring2[getFirst(mystring2):getLast(mystring2)]

Gives:

Output:
wanted
'/abc/'

or

Output:
wanted
'/abc/def/ghi/'

menator01 · (This post was last modified: Apr-10-2024, 10:23 AM by menator01.)

Another way

astring = 'FTP://21.105.28.15/abc/test1.txt'
astring2 = 'FTP://21.105.28.15/abc/def/test1.txt'

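def remover(astring):
    # 'FTP://host/abc/test1.txt' splits into ['FTP:', '', 'host', 'abc', 'test1.txt']
    parts = astring.split('/')
    # drop the scheme, the empty item and the host without changing
    # the list while looping over it; the leading '' gives the first /
    return '/'.join([''] + parts[3:])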

print(remover(astring))
print(remover(astring2))

Output:
/abc/test1.txt
/abc/def/test1.txt

A small adjustment can remove the file name at the end as well.
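
One way that adjustment might look, as a rough sketch: split on '/', then keep only the items between the host and the file name. The function name dir_only is made up here, and the code assumes every string has the form scheme://host/.../file.

```python
def dir_only(astring):
    """Return the directory between the host and the file name."""
    # 'FTP://host/abc/test1.txt' -> ['FTP:', '', 'host', 'abc', 'test1.txt']
    parts = astring.split('/')
    # parts[:3] are scheme, empty item and host; parts[-1] is the file name
    middle = parts[3:-1]
    # the empty strings at both ends produce the leading and trailing /
    return '/'.join([''] + middle + [''])


print(dir_only('FTP://21.105.28.15/abc/test1.txt'))      # /abc/
print(dir_only('FTP://21.105.28.15/abc/def/test1.txt'))  # /abc/def/
```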

DeaD_EyE · (This post was last modified: Apr-10-2024, 10:26 AM by DeaD_EyE.)

If you need a path, a Path object may be what you want.

The code is based on Gribouillis's example:

from urllib.parse import urlparse
from pathlib import PurePosixPath


def url2path(url: str) -> PurePosixPath:
    return PurePosixPath(urlparse(url).path)


path = url2path('FTP://21.105.28.15/abc/test1.txt')

print("path:", path)
print("path.parent", path.parent)

print("path.parts", path.parts)
print("path.parents", list(path.parents))

print("path.name:", path.name)
print("path.stem:", path.stem)
print("path.suffix", path.suffix)


# Path objects print like strings,
# but they are a different type.
# Here is how to convert one explicitly:
print("Path as str:", str(path))

# Not all third-party libraries convert `Path` to `str` implicitly

Output:
path: /abc/test1.txt
path.parent /abc
path.parts ('/', 'abc', 'test1.txt')
path.parents [PurePosixPath('/abc'), PurePosixPath('/')]
path.name: test1.txt
path.stem: test1
path.suffix .txt
Path as str: /abc/test1.txt

This also works with non-FTP URLs:

urls = (
    'FTP://21.105.28.15/abc/test1.txt',
    'FTPS://21.105.28.15/abc/test1.txt',
    'HTTP://21.105.28.15/abc/test1.txt',
    'HTTPS://21.105.28.15/abc/test1.txt',
    'MyOwnUselessURL://xyz.de/abc/test1.txt',
)


for url in urls:
    print(url2path(url))

Boring output:

Output:
/abc/test1.txt
/abc/test1.txt
/abc/test1.txt
/abc/test1.txt
/abc/test1.txt
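
Applied to the rows from the original question, this could look roughly like the sketch below. The rows list is made-up sample data standing in for cur.fetchall(), which returns one tuple per row when as_dict=False, and the helper name directories is only for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def directories(rows):
    """Yield the directory of each URL found in the first column of rows."""
    for row in rows:
        # row[0] is the URL string, e.g. 'FTP://21.105.28.15/abc/test1.txt'
        path = PurePosixPath(urlparse(row[0]).path)
        # .parent drops the file name; as_posix() turns it back into a str
        yield path.parent.as_posix()


# Sample data standing in for cur.fetchall()
rows = [
    ('FTP://21.105.28.15/abc/test1.txt',),
    ('FTP://21.105.28.15/abc/def/ghi/test1.txt',),
]

for directory in directories(rows):
    print(directory)
```

Note that path.parent has no trailing slash, so this prints /abc rather than /abc/. Append '/' yourself if you need it.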

Documents you should read:

kakocer · (This post was last modified: Jan-21-2025, 08:58 AM by Larz60+.)

mystring = 'FTP://21.105.28.15/abc/test1.txt'
mystring2 = 'FTP://21.105.28.15/abc/def/ghi/test1.txt'

# Get the index of the third /
def getFirst(url):
    count = 0
    for i, char in enumerate(url):
        if char == '/':
            count += 1
            if count == 3:
                return i
    return -1  # Return -1 if there's no third '/'

# Get the index of the last /
def getLast(url):
    return url.rfind('/')  # Use rfind to get the last occurrence of '/'

# Extract the substring between the third and the last '/'
def extract_substring(url):
    first_index = getFirst(url)
    last_index = getLast(url)
    if first_index != -1 and first_index < last_index:
        return url[first_index + 1:last_index]  # Skip the '/' itself
    return ""

# Extract and print the portions of the URLs
wanted1 = extract_substring(mystring)
wanted2 = extract_substring(mystring2)

print(wanted1)  # Output: abc
print(wanted2)  # Output: abc/def/ghi

I'm using this and it works for me.

Larz60+ write Jan-21-2025, 08:58 AM:
Please post all code, output and errors (in its entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
Tags have been added. Please use BBCode tags on future posts

Gribouillis write Jan-20-2025, 08:23 PM:
Clickbait link removed. Please read What to NOT include in a post

Keville_35 · Jan-21-2025, 05:08 AM

I define an "extract_path" function that uses a regular expression to find the part of the URL after "/abc/" and extract it.

import pymssql
import re

def extract_path(url):
    # Use regex to find the part after "/abc/"
    match = re.search(r'/abc/(.*)', url)
    if match:
        return match.group(1)
    return None

# Database connection
conn = pymssql.connect('10.21.100.21', 'sa', 'abc', 'DBA', as_dict=False)
cur = conn.cursor()

# Execute the query
cur.execute('SELECT path FROM table')

# Fetch all results
fetch = cur.fetchall()

# Process and print the results
for row in fetch:
    ftp_url = row[0]
    extracted_path = extract_path(ftp_url)
    if extracted_path:
        print(f"Original URL: {ftp_url}")
        print(f"Extracted path: {extracted_path}")
        print("---")

# Close the connection
conn.close()

Is this working at your end?

DeaD_EyE · Jan-21-2025, 12:48 PM

Solving this kind of problem with regex is brute-force. Sometimes it's good to take a step back and ask whether you can solve the problem differently.
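
To illustrate the point: a pattern with '/abc/' hard-coded only matches URLs that contain exactly that directory, while urlparse does not need to know the directory in advance. A small comparison, using the pattern from the post above, one invented second URL and a hypothetical helper name:

```python
import re
from urllib.parse import urlparse


def path_by_regex(url):
    # Same pattern as in the post above: only matches when '/abc/' is present
    match = re.search(r'/abc/(.*)', url)
    return match.group(1) if match else None


urls = [
    'FTP://21.105.28.15/abc/test1.txt',
    'FTP://21.105.28.15/xyz/test1.txt',  # different directory (invented example)
]

for url in urls:
    # regex result first, urlparse result second
    print(path_by_regex(url), urlparse(url).path)
```

For the second URL the regex finds nothing and returns None, while urlparse still returns /xyz/test1.txt.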

michaeljordan · (This post was last modified: Feb-13-2025, 07:59 AM by buran.)

(Jan-21-2025, 12:48 PM)DeaD_EyE Wrote: Solving this kind of problem with regex is brute-force. Sometimes it's good to take a step back and ask whether you can solve the problem differently.

If it really works, there is no reason to refuse.

buran write Feb-13-2025, 07:59 AM:
Spam link removed
