Hello everyone, I’m trying to detect whether a layer has changed by hashing its .usdc data, but I’m finding that the same content (the same usda) can produce two different .usdc byte streams.
I’ve put together this little script that creates two layers containing the same prim. If I Clear() the second layer, recreate the same prim, and save it, I get two different hashes even though the content of the two layers is identical.
import hashlib

from pxr import Sdf

# First layer: create, author a prim, save once.
filepath1 = 'file1.usdc'
layer1 = Sdf.Layer.CreateNew(filepath1)
Sdf.CreatePrimInLayer(layer1, '/world')
layer1.Save()

with open(filepath1, 'rb') as f:
    data = f.read()
print(layer1.ExportToString())
print(data)
print(hashlib.md5(data).digest())

# Second layer: same prim, but cleared and re-authored before the final save.
filepath2 = 'file2.usdc'
layer2 = Sdf.Layer.CreateNew(filepath2)
Sdf.CreatePrimInLayer(layer2, '/world')
layer2.Save()
layer2.Clear()
Sdf.CreatePrimInLayer(layer2, '/world')
layer2.Save()

with open(filepath2, 'rb') as f:
    data2 = f.read()
print(layer2.ExportToString())
print(data2)
print(hashlib.md5(data2).digest())
If I don’t Clear(), CreatePrimInLayer(), and Save() again, the hashes are identical.
Looking at the documentation, the Clear() method is undo-able, so does it store extra data or timestamp info? https://openusd.org/release/api/class_sdf_layer.html#a9013e716d1676f98b48ab913031e6d01
Crate files don’t necessarily store their data in the exact same order every time. You can get lucky and they might match up, but as far as I know, there’s no such guarantee in the crate writer.
As for your specific example, I think it’s small enough that it should be identical. I’ll look into it because I’m curious, but again, I don’t think you should expect a repeatable file layout.
Right – in addition to padding bits we use hash tables in the .usdc implementation, and don’t take pains (or the perf cost) to sort everything when we write, so just doing usdcat on a .usdc file is liable to produce results that differ bitwise, but not content-wise.
Even if we did ensure that usdcat with a .usdc always produced bitwise-identical results, you can still run into trouble because an incremental Save() of a .usdc does not in general rewrite the whole file. It’s sort of “journaled”. So an edited and Save()d .usdc file would still differ bitwise from that same file run through usdcat.
I think if we want content fingerprint hashing, it would be best to do it at the SdfLayer level, so that it would work automatically and identically for every file format that USD understands, and would avoid any issues with .usdc or any other “database-esque” formats. And you could do things like freely flip/flop your .usd file between text and binary without changing the content fingerprint hash.
Of course that’s a small project. Today I think the best way to get consistent results is to usdcat the .usd file to .usda and hash the output.
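A minimal sketch of that workaround, assuming you hash a text serialization of the layer (for example the string returned by layer.ExportToString(), or the .usda that usdcat writes out) instead of the raw .usdc bytes; the layer text below is a hand-written stand-in so the example runs without USD installed:

```python
import hashlib

def content_fingerprint(layer_text: str) -> str:
    # Hash the canonical text serialization, not the on-disk bytes.
    # Byte order, padding, and journaled saves in .usdc then no longer
    # affect the result.
    return hashlib.sha256(layer_text.encode("utf-8")).hexdigest()

# In a USD session you would pass layer.ExportToString(); here we simulate
# two layers whose text exports match even though their .usdc bytes might not.
text_a = '#usda 1.0\n\ndef "world"\n{\n}\n'
text_b = '#usda 1.0\n\ndef "world"\n{\n}\n'
print(content_fingerprint(text_a) == content_fingerprint(text_b))  # True
```

Note this only gives equal fingerprints when the text exports are byte-identical; prim ordering still matters, which is where a structural diff would go further.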
Definitely – usddiff plus usdedit is almost there – usdedit converts your layer to .usda, pops you into your editor of choice, and when you’re done writes it back in the original format, so you don’t have to think about what type of file you’re dealing with.
A USD-specific merge tool would be interesting, since it could operate not only at the bare textual level, but also at the higher structural content level. Something like a user-guided selective “flattening” where the prim hierarchy and listops and dictionaries and so on are understood.
Yeah exactly. I think being able to do it at a structural level would mean that you’d not be dependent on ordering of prims for textual diffs, and also be able to handle crate files.
We’ve done something similar for other structural data types in our pipelines and it’s super handy.