Python Forum

Full Version: Python 3.6.5 pathlib weird behaviour when resolve a relative path on root (macOs)
Hi,

I recently found some weird behaviour while trying to resolve a relative path from the root directory on macOS.

I tried to resolve a Path('spam') and the interpreter answered PosixPath('//spam') (double slash for root) instead of the PosixPath('/spam') I expected.

I really do not know if this is the intended result or a bug.

I ran the interpreter from the root directory (cd /; python). Once the interpreter was running, this is what I did:

>>> import pathlib
>>> pathlib.Path.cwd()
PosixPath('/')
# since the interpreter has been launched from root
>>> p = pathlib.Path('spam')
>>> p
PosixPath('spam')
# just for checking
>>> p.resolve()
PosixPath('//spam')
# beware of double slash instead of single slash
I also checked the behaviour of Path.resolve() in a non-root directory (in my case launching the interpreter from /Applications).

>>> import pathlib
>>> pathlib.Path.cwd()
PosixPath('/Applications')
>>> p = pathlib.Path('eggs')
>>> p
PosixPath('eggs')
>>> p.resolve()
PosixPath('/Applications/eggs')
# just one slash as root in this case (as should be)
So it seems that the double slash only appears when resolving relative paths in the root directory.

More examples are:
>>> pathlib.Path('spam/egg').resolve()
PosixPath('//spam/egg')
>>> pathlib.Path('./spam').resolve()
PosixPath('//spam')
>>> pathlib.Path('./spam/egg').resolve()
PosixPath('//spam/egg')
but

>>> pathlib.Path('').resolve()
PosixPath('/')
>>> pathlib.Path('.').resolve()
PosixPath('/')
Intriguingly,

>>> pathlib.Path('spam').resolve().resolve()
PosixPath('/spam')
# 'spam'.resolve = '//spam'
# '//spam'.resolve = '/spam'!!!
>>> pathlib.Path('//spam').resolve()
PosixPath('/spam')
I searched for some information on this issue but I did not find anything useful.

The Python docs (https://docs.python.org/3/library/pathlib.html) talk about "UNC shares", but that is not the case here (I am using a macOS HFS+ filesystem).

PEP 428 (https://www.python.org/dev/peps/pep-0428/) says:

Quote:Multiple leading slashes are treated differently depending on the path flavour. They are always retained on Windows paths (because of the UNC notation):

>>> PureWindowsPath('//some/path')
PureWindowsPath('//some/path/')

On POSIX, they are collapsed except if there are exactly two leading slashes, which is a special case in the POSIX specification on pathname resolution [8] (this is also necessary for Cygwin compatibility):

>>> PurePosixPath('///some/path')
PurePosixPath('/some/path')
>>> PurePosixPath('//some/path')
PurePosixPath('//some/path')

I do not think that this is related to the aforementioned issue.
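For reference, the leading-slash rule quoted above can be checked on any platform with PurePosixPath, which never touches the filesystem. This is only a minimal sketch; the helper name kept_root is made up for illustration.

```python
from pathlib import PurePosixPath


def kept_root(path_text):
    """Return the root that PurePosixPath keeps after parsing path_text."""
    return PurePosixPath(path_text).root


# One, two, and three leading slashes.
roots = {text: kept_root(text) for text in ("/spam", "//spam", "///spam")}
for text, root in roots.items():
    print(repr(text), "->", repr(root))
```

Exactly two leading slashes are kept, while three collapse to one. That would explain why a '//spam' string, once produced, is displayed unchanged instead of being normalised to '/spam'.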

However, I also checked the POSIX specification link (http://pubs.opengroup.org/onlinepubs/009...#tag_04_11) and found:

Quote:A pathname that begins with two successive slashes may be interpreted in an implementation-defined manner, although more than two leading slashes shall be treated as a single slash.

I do not really think that this can cause a double slash when resolving a relative path on macOS.

Is this the proper behaviour of the pathlib.Path.resolve() method? Is this a bug? Am I missing something about filesystems, filenames, etc.?

Thank you in advance.

My software is:

Python 3.6.5 (default, May 15 2018, 08:20:57)
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.1)] on darwin

Running on: macOs High Sierra 10.13.4 (17E202)
I have had some weird behavior on Windows with resolve().
I wanted a path:
defaultDir = self.wpath.commandpath.resolve()
but could not get a path that the class method I was calling was happy with.
I ended up using what seems more like a hack, and this is what made the method happy:
defaultDir = self.wpath.commandpath.resolve().as_posix()
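One possible reading of the workaround above, offered as an assumption since the called method is not shown: resolve() returns a path object, while as_posix() returns a plain string with forward slashes, so the method may simply have wanted a string. The sketch below uses PureWindowsPath so that it runs on any platform; the path itself is hypothetical.

```python
from pathlib import PureWindowsPath

# Hypothetical directory, used only to show the three forms.
command_dir = PureWindowsPath(r"C:\tools\commands")

as_native_text = str(command_dir)            # string with backslashes
as_forward_slashes = command_dir.as_posix()  # string with forward slashes

print(type(command_dir).__name__)
print(as_native_text)
print(as_forward_slashes)
```

If the callee only needs a string, str(path) may be enough; as_posix() additionally changes the separators.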
Hi,

I get the same result (double slash) in Python 3.4, Python 3.5, and Python 3.7.

Specifically, I used:

Python 3.4.8 (default, Mar 29 2018, 16:18:25)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin

Python 3.5.5 (default, Mar 29 2018, 16:22:58)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin

Python 3.7.0b4 (default, May 4 2018, 22:01:49)
[Clang 9.1.0 (clang-902.0.39.1)] on darwin


(using 'bin', a real directory, instead of 'spam' in 3.4 and 3.5, because the strict parameter is not available in the resolve() method in those versions)

I also get the same result on Ubuntu 16.04 (so it is not a macOS-specific issue, but maybe a POSIX issue):

Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
I have checked on Linux (openSUSE Tumbleweed, Python 3.6.5) and the result is the same...
And it looks like a bug if you look at the pathlib code of resolve:
    def resolve(self, path, strict=False):
        sep = self.sep
        accessor = path._accessor
        seen = {}
        def _resolve(path, rest):
            if rest.startswith(sep):
                path = ''

            for name in rest.split(sep):
                if not name or name == '.':
                    # current dir
                    continue
                if name == '..':
                    # parent dir
                    path, _, _ = path.rpartition(sep)
                    continue
                # killerrex Comment:
                # HERE if base is '/' the code will produce '//' + name
                # for any other thing produces the correct one
                newpath = path + sep + name
                if newpath in seen:
                    # Already seen this path
                    path = seen[newpath]
                    if path is not None:
                        # use cached value
                        continue
                    # The symlink is not resolved, so we must have a symlink loop.
                    raise RuntimeError("Symlink loop from %r" % newpath)
                # Resolve the symbolic link
                try:
                    target = accessor.readlink(newpath)
                except OSError as e:
                    if e.errno != EINVAL and strict:
                        raise
                    # Not a symlink, or non-strict mode. We just leave the path
                    # untouched.
                    path = newpath
                else:
                    seen[newpath] = None # not resolved symlink
                    path = _resolve(path, target)
                    seen[newpath] = path # resolved symlink

            return path
        # NOTE: according to POSIX, getcwd() cannot contain path components
        # which are symlinks.
        base = '' if path.is_absolute() else os.getcwd()
        return _resolve(base, str(path)) or sep
The two solutions I can think of are:
Change the last lines to:
        base = '' if path.is_absolute() else os.getcwd()
        if base == sep:
            base = ''
        return _resolve(base, str(path)) or sep
This guarantees that _resolve is never called with a single '/' as its path argument. Alternatively, change the _resolve code so that it never creates a spurious '//' group:
                if path.endswith(sep):
                    newpath = path + name
                else:
                    newpath = path + sep + name
It could also be that some obscure corner of the POSIX specification explicitly requests this behaviour...

In any case, I think that you can open a bug about this.
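To make the diagnosis easier to try out, here is a simplified model of the component loop with the first proposed change applied (resetting a root base to ''). It is only a sketch: symlink handling and the seen cache are left out, it works on plain strings instead of Path objects, and the names walk, resolve_buggy and resolve_patched are invented for this illustration.

```python
def walk(base, rest, sep="/"):
    """Join the components of rest onto base, like the loop in _resolve."""
    path = "" if rest.startswith(sep) else base
    for name in rest.split(sep):
        if not name or name == ".":
            continue  # empty component or current dir
        if name == "..":
            path, _, _ = path.rpartition(sep)  # parent dir
            continue
        path = path + sep + name  # yields '//name' when path is '/'
    return path


def resolve_buggy(cwd, rest, sep="/"):
    """Mimic the original last lines: the cwd is used as the base as-is."""
    return walk(cwd, rest, sep) or sep


def resolve_patched(cwd, rest, sep="/"):
    """Mimic the first proposed fix: a root cwd becomes an empty base."""
    base = "" if cwd == sep else cwd
    return walk(base, rest, sep) or sep


print(resolve_buggy("/", "spam"))    # double slash
print(resolve_patched("/", "spam"))  # single slash
```

The model also reproduces the other observations in this thread: '.' and '' resolve to '/', and an input of '//spam' comes back as '/spam' because a leading separator resets the base to ''.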
Hi, again:

I have traced the issue to Lib/pathlib.py:319 in the Python 3.6 repository https://github.com/python/cpython/blob/3...pathlib.py.

In line 319:

newpath = path + sep + name
For pathlib.Path('spam').resolve() in the root directory, newpath is '//spam' since:
  • path is '/'
  • sep is '/'
  • name is 'spam'

I think that it should check that path and sep are not equal. Something like:

newpath = ('' if path == sep else path) + sep + name
So, in my opinion, this behaviour of pathlib.Path.resolve() —returning a path starting with '//' (double slash) when resolving a relative path in root directory— is a bug. Am I right?

Thanks, @killerrex. I did not see your post before sending mine. I was doing the same research as you at almost the same time.

I agree with you. The two solutions that you have proposed seem very reasonable to me. I will open a bug as soon as possible. Thanks again.
Hi,

I also think it is a bug, and my solution is more or less the same (I am a little more paranoid and check whether the path ends with the separator, no matter its length).

If you do not want to create an account on the Python tracker, I can open the bug for you, referring to these posts... as you wish.
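The difference between the two guards proposed in this thread can be seen side by side in a small sketch. The function names are invented for the illustration, and whether _resolve can ever see a path with a trailing separator other than the root itself is not established here.

```python
SEP = "/"


def join_if_equal(path, name, sep=SEP):
    """Guard that only treats a path equal to the separator specially."""
    return ("" if path == sep else path) + sep + name


def join_if_endswith(path, name, sep=SEP):
    """Guard that skips the extra separator whenever path already ends with it."""
    if path.endswith(sep):
        return path + name
    return path + sep + name


for base in ("/", "/usr", "/usr/"):
    print(repr(base), join_if_equal(base, "bin"), join_if_endswith(base, "bin"))
```

Both guards give '/bin' for a root base; they differ only for a base such as '/usr/', where the equality check still produces a doubled separator.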
pathlib is good, but there is still some pain in using it. It needs some tender loving care to make it reliable.
Thanks, I have already opened a bug about this.

In general, I find that pathlib is a great advance. I use it a lot, in every script I write. Of course, it is a new package and it has some quirks and bugs. 99.99% of the time it works wonderfully.