In this recent GitHub Issue, it’s been noted (not for the first time) that the decision we made for TfNormPath to lowercase drive-letters on Windows, when upper-case letters are idiomatic (even though case is unimportant on Windows itself), sometimes causes problems when other parts of a pipeline make the idiomatic assumption.
If we were to reverse course on that decision, we don’t think OpenUSD itself would be greatly impacted, but we want to put out a call to the community to see if it would adversely impact conventions/assumptions you’ve built up based on USD’s unusual decision to lowercase.
reactions interpreted as OMG, please do this! Though if you have concerns, a post with why would be preferable to a
Is the idea to not perform any case changes whatsoever? Or is the idea to uppercase the character instead of lowercase to be more along the Windows standard for drive letters?
I believe you’re proposing the first to not change the casing of the drive letter at all - which I think is a good way forward.
Python as a reference
Just wanted to share this as reference on what Python does. In my case Python 3.9.13.
Python 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
I believe os.path.realpath actually resolves the Windows path completely - and since my C:/ drive has an uppercased letter, it will uppercase it. This is confirmed when using os.path.realpath on a drive I do not have:
>>> print(os.path.realpath("f:/test"))
f:\test
As such, matching ‘path normalizing’ to the Python ecosystem would basically mean leaving the drive letter casing as it is.
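As a small sketch of that point (not part of the original session): the standard library's ntpath module is the Windows implementation behind os.path, and it can be imported on any platform, so the behavior can be checked without a Windows machine. The file names here are made up for illustration.

```python
import ntpath  # Windows flavor of os.path; importable on any platform

# normpath collapses ".." and converts "/" to "\", but it is purely
# string-based and leaves the drive letter's case exactly as given.
lower = ntpath.normpath("f:/test/../assets/chair.usd")
upper = ntpath.normpath("F:/test/../assets/chair.usd")

print(lower)  # f:\assets\chair.usd
print(upper)  # F:\assets\chair.usd
```

Because the two results differ only in the drive letter, a plain string comparison of them still reports a mismatch.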
Actually, @BigRoyNL , the proposal is to always uppercase the drive letter rather than always lowercase. The USD clients of TfNormPath are calling it in large part so that two paths that refer to the same location will (obviously without knowing or being able to do anything itself about symlinks and other redirections) compare/hash the same, per comment here. I’ll grant it’s a half-measure on Windows since the entire path is case-insensitive… and I don’t know what the real concerns or test-cases behind the original decision were.
That would technically mean that while you’re authoring across different DCCs with potentially differing USD versions, this ‘cache’ will often be missed in the tables, since a DCC with an older USD version still writes out a lowercased drive letter and one with a newer USD version writes an uppercased one. Would the same go for legacy assets used in a pipeline with newer ids?
Or doesn’t it really matter what casing was authored in the USD file on disk since the normalizing happens on read anyway?
After having a team chat about it, it seems like the desire for key-coherency was just speculative, and we’re leaning towards adopting the python behavior - just forget TfNormPath() ever knew anything about drive letters.
It looks like TfRealPath() also does the same thing, currently, after reading/resolving the path through the OS, so we’ll stop doing the “correction” there, also, which I think should bring us into alignment with Python’s os.path in both instances.
The question is what does os.path.normpath and similar do when the input string does not contain a drive letter? Does it add a capital one or a lowercase one?
Note that even though my working directory is set with a lowercased drive letter, os.path.realpath resolves it to the actual drive letter, which is uppercased.
Similarly, because Windows paths are case-insensitive, if I create a file C:\Test.txt with an uppercase T but look for it with os.path.realpath, it resolves it:
>>> os.path.realpath("test.txt")
'C:\\Test.txt'
But again, os.path.normpath does no such thing as checking the actual disk and hence does not resolve the uppercase T for Test.txt - nor does it do so for the drive letter:
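A minimal sketch of that behavior, using ntpath (the Windows implementation of os.path) so it runs on any platform. This is not the original poster's session, and the paths are illustrative only.

```python
import ntpath  # Windows flavor of os.path; importable on any platform

# A relative input stays relative: normpath never consults the disk or the
# current directory, so no drive letter is added.
relative = ntpath.normpath("assets/./test.txt")
drive, rest = ntpath.splitdrive(relative)

print(repr(relative))  # 'assets\\test.txt'
print(repr(drive))     # ''

# The file name case is also left as typed, even if the file on disk
# happens to be named Test.txt.
with_drive = ntpath.normpath("c:/test.txt")
print(repr(with_drive))  # 'c:\\test.txt'
```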
Was this workaround there just because using uppercased drive letters for loading USD files just did not work at all? Or what did doing that solve for you?
The context of why you needed to do that might help defining whether the proposed changes here would influence that (negatively or positively).
Side question: Who’s we in that sentence? I assume Autodesk Bifrost team?
Good question. Uppercase letters were actually working (artists could load a USD file on Windows), but some path manipulation logic was broken on Windows (for example, replacing a full path in a reference with one relative to an anchor path), because the function was comparing a “real” Windows file path with a “USD Windows file path” and their drive letters did not match.
So not directly a USD issue, more a client querying USD data one.
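To make the mismatch concrete, here is a hedged sketch of that kind of comparison. The helper drive_insensitive_key and the example paths are hypothetical, invented for illustration; this is not Bifrost's or USD's actual code.

```python
import ntpath  # Windows flavor of os.path; importable on any platform


def drive_insensitive_key(path):
    """Hypothetical helper: normalize separators, uppercase only the drive letter."""
    drive, rest = ntpath.splitdrive(ntpath.normpath(path))
    # Only touch plain "x:" drives; UNC prefixes are left alone.
    if len(drive) == 2 and drive[1] == ":":
        drive = drive.upper()
    return drive + rest


os_path = "C:\\show\\assets\\chair.usd"  # as reported by the OS
usd_path = "c:/show/assets/chair.usd"    # as returned with a lowercased drive

# A plain string comparison after normpath still fails on the drive letter.
naive_equal = ntpath.normpath(os_path) == ntpath.normpath(usd_path)

# Folding only the drive letter makes the two spellings compare equal.
keyed_equal = drive_insensitive_key(os_path) == drive_insensitive_key(usd_path)

print(naive_equal, keyed_equal)  # False True
```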
Heh - was gonna keep my big nose out of this one… but since there’s still some noise in the thread:
…some path manipulation logic was broken on Windows (for example, replacing a full path in a reference with one relative to an anchor path), because the function was comparing a “real” Windows file path with a “USD Windows file path” and their drive letters did not match.
For me, this is the most frequent reason I use functions like TfNormPath - to do path comparisons. While the use as keys in hashtables may be speculative, this use case isn’t - and therefore I think it’s good to standardize on a convention for the drive letter. Conceptually, that’s why you would want to “normalize” a path - to make a standard, canonical choice in situations where there are multiple ways of representing something. Upper- or lower-casing of the drive letter is a good example of that, in my opinion.
You can make the case that, if we’re going to normalize case for the drive letter, we should normalize the case for EVERYTHING in a windows path, (ie, do like python’s os.path.normcase) - but I think there are some good reasons why the situation is different:
the windows drive letter is essentially a windows-only construct, and so won’t affect interoperability with posix systems
in contrast, if you normalized all case in a path, you might alter a path in a git repo used on both Windows and Posix
I believe the “real” path windows drive letter is essentially ALWAYS uppercase - so you can know the “correct” / “canonical” path even for paths which don’t exist on disk
while windows compares paths case-insensitively, it nearly always stores them case sensitively - meaning that there is generally a “true” / “real” case for files which exist on disk - and, given the cross-platform issues noted above, this is the casing which would ideally be used as the “canonical” version in most cases, IMO. However, it is impossible to determine what case to use for a general path that doesn’t exist on disk in windows - and so the “safe” choice is to leave it alone.
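The contrast drawn in the points above can be sketched in Python. ntpath.normcase is the real standard-library function; uppercase_drive_only is a hypothetical helper written for this example and is not TfNormPath's implementation.

```python
import ntpath  # Windows flavor of os.path; importable on any platform


def uppercase_drive_only(path):
    """Hypothetical helper: canonicalize the drive letter, leave the rest alone."""
    drive, rest = ntpath.splitdrive(path)
    if len(drive) == 2 and drive[1] == ":":
        drive = drive.upper()
    return drive + rest


path = "c:/Repo/Assets/Chair.usd"

# normcase lowercases the whole path (and flips the slashes), which would
# change names that a POSIX checkout of the same repo treats as distinct.
everything = ntpath.normcase(path)

# Touching only the drive letter keeps directory and file names as authored.
drive_only = uppercase_drive_only(path)

print(everything)  # c:\repo\assets\chair.usd
print(drive_only)  # C:/Repo/Assets/Chair.usd

# No disk access is needed, so it also works for paths that do not exist.
missing = uppercase_drive_only("q:/does/not/Exist.usd")
print(missing)  # Q:/does/not/Exist.usd
```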
@BigRoyNL - are there some specific scenarios / workflows you had in mind that break if we standardize on drive-letter case, or are allowed if we don’t? Or is the primary motivation to do so standardization of behavior with python?
No, sorry. We don’t have any issues with the current or newer proposed behavior - we don’t rely on it being one or the other currently. I was merely responding on it using Python as a reference - which seemed like a good candidate to mention since it’s been around a long time, seems well-defined behavior and is greatly used in VFX/animation so I just assumed it would be a good reference.
However, going by the GitHub issue this ‘change request’ originated from, it seems they rely on case sensitivity, even for Windows paths, in how they remap these paths on their Linux machines. As such, I can imagine there may be an argument for allowing it to be lowercased, uppercased, or some other mix, potentially including the drive letter as well.
But, purely from my perspective and use cases I’m not too bothered with either. (As long as things work - hehe - but more specifically that existing files will continue to work as is.)
Yes it answers the question. The os.path.realpath("test.txt") one shows that when it creates a drive letter, it is uppercase. IMHO this behavior should be duplicated in any similar calls, and apparently the current version creates lower-case drive letters.
Preserving the existing case of drive letters actually seems like a good idea, as it matches the preserving of the case of filenames even if they don’t match the actual case.