We had an initial chat on the team about this last week, and came up with an ordering of actions we’d like to pursue, if this makes sense to y’all:
1. We have some new ideas to pursue for increasing performance when dealing with large numbers of sibling prims and properties, and if they pan out, it would benefit everyone. To this end, can you all get specific about where you’re having problems and what the general patterns are (e.g. using schemas, basic UsdObject APIs, low-level Sdf)? Also, on the reading/consumption side, we believe the only thing that should be slow for large numbers of properties is enumerating properties (e.g. GetAttributes(), GetPropertiesInNamespace()) - if that’s not your experience, can you describe what’s demonstrably scaling poorly?
2. Add a metadata type dictionary[] that would be embeddable inside other dictionary metadata. Seems like this would go fairly far in facilitating the approach @CalvinGu is taking, which still suffers from the inability to easily describe/encode the structure of the data, but is otherwise pretty flexible.
3. Provide dictionary and dictionary[]-valued attributes. Our understanding currently is that the main thing this provides over (2) is just the ease of creating such attributes, since new dictionary-valued metadatum fields need to be declared in plugInfo.json files, and it’s not awesome for discoverability that Calvin needs to stick all complex data into customData right now… I almost hate myself for suggesting this, but even having just a single canonical piece of metadata (like structuredData or something) could still be “organized” into schemas by adding opaque-type attributes whose purpose is to name and host the dictionary that contains the complex data (see the sketch after this list). Dictionary-valued attributes are the most complicated solution on the table because they bring in timeSamples and dictionary-style value resolution through them… not to mention code sites that may get tripped up by having a new datatype to handle.
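To make that “opaque attribute hosting a dictionary” idea concrete, here’s a rough sketch of what it could look like with today’s API, using attribute-level customData as a stand-in for a hypothetical structuredData field (the attribute name and keys are purely illustrative, and this assumes a build recent enough to have the opaque value type):

```python
from pxr import Usd, Sdf

stage = Usd.Stage.CreateInMemory()
prim = stage.DefinePrim("/Asset")

# The opaque attribute exists only to name and organize the data; opaque
# attributes can't carry authored values, which is exactly the point here.
host = prim.CreateAttribute("game:lootTable", Sdf.ValueTypeNames.Opaque)

# The complex data lives in a dictionary on that attribute. customData is
# used here as a stand-in for a hypothetical structuredData field.
host.SetCustomDataByKey("structuredData", {
    "dropRate": 0.25,
    "items": {"sword": 1, "potion": 3},
})
```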
@CalvinGu, do you recall what your performance testing revealed for the “encode as many USD properties” approach? Were the issues all on the authoring side, or did you see poor performance on the consumption side? If the latter, during import into a particular DCC, or imaging in usdview, or… ?
Just hoping to get a better idea of the problems folks are actually running into - thanks!
Hey Spiff, happy 2025, and sorry for my late reply. There were too many meetings in the past week.
I just wrote two simple Python scripts to test the performance. I’ve attached them to this thread, and here are the test results:
set 100000 customdata: 0.16260623931884766
set 100000 attributes: 1.5727198123931885
save 100000 customdata: 0.1886909008026123
save 100000 attributes: 0.4987778663635254
load 100000 customdata: 0.23992586135864258
load 100000 attributes: 0.47798800468444824
get 100000 customdata: 0.12299871444702148
get 100000 attributes: 0.40347838401794434
The results are clear, and they became one of the reasons we went the custom data route.
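For anyone who doesn’t want to open the attachments, the comparison is roughly along these lines (a simplified sketch, not the exact scripts; the key names are illustrative, and the save/load/get passes follow the same pattern):

```python
import time
from pxr import Usd, Sdf

N = 100000

# customData path: everything goes into one dictionary on a single prim.
stage = Usd.Stage.CreateInMemory()
prim = stage.DefinePrim("/Data")

start = time.time()
prim.SetCustomDataByKey("payload", {"key%d" % i: float(i) for i in range(N)})
print("set %d customdata: %s" % (N, time.time() - start))

# attribute path: one attribute per element.
stage2 = Usd.Stage.CreateInMemory()
prim2 = stage2.DefinePrim("/Data")

start = time.time()
for i in range(N):
    prim2.CreateAttribute("key%d" % i, Sdf.ValueTypeNames.Double).Set(float(i))
print("set %d attributes: %s" % (N, time.time() - start))

# save/load/get are timed the same way, using Export(), Usd.Stage.Open(),
# GetCustomDataByKey(), and GetAttribute(...).Get().
```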
To simplify our requirements, you can think of it as us needing JSON-type data embedded in USD. In fact, another game project at Tencent already uses dumped JSON strings in their USD files. Dumped JSON strings could also solve all of our problems; the reason I didn’t go that way is that I wanted to keep our USD files more human-readable, as well as more community-compliant.
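For reference, the dumped-JSON-string approach is basically this (a sketch; the attribute name is illustrative):

```python
import json
from pxr import Usd, Sdf

stage = Usd.Stage.CreateInMemory()
prim = stage.DefinePrim("/Asset")

# All of the structured data is flattened into one string attribute, which
# is fast and flexible but not human-readable in the layer and invisible
# to USD-level queries.
attr = prim.CreateAttribute("gameData", Sdf.ValueTypeNames.String)
attr.Set(json.dumps({"hp": 100, "inventory": ["sword", "potion"]}))

# Consumers have to json.loads() the value back out themselves.
data = json.loads(attr.Get())
```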
I understand that supporting dictionary and dictionary[] in attributes can be complicated, since you have to deal with time samples and more. So far, the custom data solution fulfills our needs, and there is no rush for the attributes. However, each of the three actions you proposed is very attractive to us. If you do complete any of them, please let us know and we would be more than happy to test them out in our projects.
Thanks, Calvin - that’s extremely useful! While we can and will improve the scalability of authoring and enumerating properties in OpenUSD, there is always going to be a sizeable performance gap between “put N elements into a monolithic dictionary and serialize/deserialize the dictionary into a Layer as a unit” and “put N elements as individually encoded specs into a Layer” - the latter will necessarily be much more expensive, even if we plumb batching APIs all the way down.
I expect that if you pulled elements out of the dictionary-encoded-in-the-layer using deeper identifiers in GetCustomDataByKey() you’d get similar or maybe worse performance than the Attribute encoding… but if you can always afford to pull the entire dictionary at once, then that will always be the highest performance solution.
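In other words (a sketch; the file, prim path, and key names are hypothetical):

```python
from pxr import Usd

stage = Usd.Stage.Open("data.usda")   # hypothetical layer
prim = stage.GetPrimAtPath("/Data")

# Per-element access via a ':'-delimited key path -- every call goes back
# through metadata resolution on the composed customData:
value = prim.GetCustomDataByKey("payload:key42")

# Pull the whole dictionary once, then index cheaply in Python:
payload = prim.GetCustomDataByKey("payload")
value = payload["key42"]
```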