Hi there, I am working on integrating USD/Hydra into an application via the provided C++ libraries, to visualise and edit a scene made up of 3D assets. The scene is created in memory via the USD library using DefinePrim. We use a couple of UsdGeomXforms for each asset. The project is still at a very early stage of development, and we are still learning a lot.
One critical issue we currently face with USD is that changing the stage structure (reparenting, creating, deleting prims) is incredibly slow. As a test, I created 90k prims in the root layer via UsdGeomXform::Define(), did some heavy reparenting on them, and then deleted them. That takes nearly 2 minutes on my machine, whereas our old render engine is about 1000 times faster.
I have already optimised the reparenting by batching: we collect reparent operations in an SdfBatchNamespaceEdit until we need to query the stage, and then execute them with Stage::Apply(). I know that I should do the same for the Defines (use the Sdf functions in conjunction with an SdfChangeBlock). However, executing Apply() on the collected reparents still takes around 8 seconds. That is far too much for 10k reparents, and we need to reparent and make other stage graph changes frequently. I profiled this and saw that the time is spent in UsdStage::_Recompose(), which presumably recomposes the complete scene.
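To make the batching idea concrete, here is a toy model in plain C++. It does not use USD at all: `ToyStage`, `ToyChangeBlock`, and the recompose counter are hypothetical names that only mimic the pattern I understand SdfChangeBlock to follow, namely deferring the expensive update until the outermost block goes out of scope. It is a sketch of the principle, not of how USD implements it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Toy stand-in for a stage: maps each node name to its parent's name.
// NOT USD: every name here is hypothetical and only illustrates batching.
class ToyStage {
public:
    // Adds a node under `parent` (an empty string means the root).
    void Define(const std::string& name, const std::string& parent = "") {
        parents_[name] = parent;
        NotifyChanged();
    }

    // Moves an existing node under a new parent; returns false if unknown.
    bool Reparent(const std::string& name, const std::string& newParent) {
        auto it = parents_.find(name);
        if (it == parents_.end()) return false;
        it->second = newParent;
        NotifyChanged();
        return true;
    }

    std::size_t RecomposeCount() const { return recomposeCount_; }
    std::size_t NodeCount() const { return parents_.size(); }

private:
    friend class ToyChangeBlock;

    // Counts a (pretend) expensive recompose now, or defers it while a
    // change block is open.
    void NotifyChanged() {
        if (openBlocks_ > 0) {
            dirty_ = true;
        } else {
            ++recomposeCount_;
        }
    }

    std::unordered_map<std::string, std::string> parents_;
    std::size_t recomposeCount_ = 0;
    std::size_t openBlocks_ = 0;
    bool dirty_ = false;
};

// RAII guard: while alive, edits only mark the stage dirty. A single
// recompose is counted when the outermost block is destroyed.
class ToyChangeBlock {
public:
    explicit ToyChangeBlock(ToyStage& stage) : stage_(stage) {
        ++stage_.openBlocks_;
    }
    ~ToyChangeBlock() {
        assert(stage_.openBlocks_ > 0);
        if (--stage_.openBlocks_ == 0 && stage_.dirty_) {
            stage_.dirty_ = false;
            ++stage_.recomposeCount_;
        }
    }
    ToyChangeBlock(const ToyChangeBlock&) = delete;
    ToyChangeBlock& operator=(const ToyChangeBlock&) = delete;

private:
    ToyStage& stage_;
};

// Defines `count` nodes one by one; returns how many recomposes ran.
std::size_t DefineUnbatched(ToyStage& stage, std::size_t count) {
    const std::size_t before = stage.RecomposeCount();
    for (std::size_t i = 0; i < count; ++i) {
        stage.Define("node" + std::to_string(i));
    }
    return stage.RecomposeCount() - before;
}

// Performs the same edits inside one change block.
std::size_t DefineBatched(ToyStage& stage, std::size_t count) {
    const std::size_t before = stage.RecomposeCount();
    {
        ToyChangeBlock block(stage);
        for (std::size_t i = 0; i < count; ++i) {
            stage.Define("node" + std::to_string(i));
        }
    }
    return stage.RecomposeCount() - before;
}
```

In this toy, 1000 defines cost 1000 recomposes unbatched and 1 recompose batched. My problem is that even that one remaining recompose is too slow for us in the real USD case.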
I have seen some older and newer posts suggesting that stage recomposition is inherently slow and that we should avoid changing the stage frequently. In our case, however, we need to change the stage frequently, in real time.
I learned from another thread that NVIDIA and Pixar already use custom implementations to handle performance problems like this, for example NVIDIA's UsdRT and Fabric. But these do not seem to be freely available for integration outside of Omniverse. So I guess the solution for us is either to wait for an official USD integration of UsdRT/Fabric or similar optimisations, or to build one ourselves. As we currently have little manpower on this project, we would like to avoid the second option.
So it would be good to know whether such an integration is planned for the near future, so that we can accept the poor performance for now and optimise later, once it is available.
It would also be good to know whether we are missing something essential here and there is a solution to our performance problem that we have not seen yet.
Also: are there any numbers on what performance we can expect from reparenting/creating/deleting prims with USD? I only saw one thread in which someone from VFX wrote that recomposing takes 30 seconds for their stage (probably millions of prims).
Kind regards,
Robert