Batching Ar2 queries

Hey folks,

The topic of batching Ar2 queries has surfaced quite a few times in recent discussions. It usually arises when an asset management system has non-trivial work to do in _Resolve, often in the form of a network request, where combining several queries into one doesn’t significantly increase the overall latency.

I bumped into @nporcino at SIGGRAPH and an idea surfaced: there may be potential for collecting refs during Pcp composition and batching those requests into a single call.

What are people’s thoughts - is there any mileage in this, and would it be beneficial for others?

Many thanks in advance,
Tom


Hey Tom,

Thanks for the SIGGRAPH OpenAssetIO meeting - it was a fun and insightful discussion!

Something similar was mentioned a while ago in the USD Asset Resolver Python Post, where you also commented.

I think it would be a cool thing to have in some form. My guess is that trying to add this would be a bit messy with regard to multithreading, and it might cause a performance hit that ends up slower than doing separate queries. I’d love to hear what others think about this topic too.

What is already possible now is what ColinE suggested in the post above: when your pipeline already knows most paths upfront, it can pre-cache these and store them (in its asset resolver context). If we mix that approach with relative paths, we can really reduce the number of paths to query.

I didn’t get around to it in the last few weeks, but I’ll soon add another resolver (a Hybrid/Cached Resolver) to my UsdAssetResolver repo here (GitHub - LucaScheller/VFX-UsdAssetResolver: Usd Asset Resolver Reference Implementations) for people to play around with in Houdini. (The repo now has automatic builds against the latest Houdini version on Linux and Windows :partying_face: ) It will use the above-mentioned approach with a Python/C++ combination. Maybe that will help give ideas too :wink:

Cheers,
Luca


Hi @foundrytom!

I think it’s an interesting idea but would be pretty difficult to implement in the USD core because of how USD parallelizes prim composition: each thread works on a single prim and is (more-or-less) independent from the other threads. We might be able to take advantage of batching in cases where the batches are easy to discover, like a layer’s sublayers or if a prim references multiple layers, and I’d be curious if folks thought this might be enough of a win to pursue.

I was thinking about ways that one could handle this outside the core and purely within an ArResolver implementation. One idea I had was to have an ArResolver implementation where a call to Resolve would record the asset path and then start a timer and block. Additional calls to Resolve from other threads would also record those paths and block, and then when the timer expires all of the collected paths would be resolved as a batch and the calls to Resolve would unblock. However, @amohr pointed out that something like this could run into cases where this would be a net loss in performance vs. just doing each resolve call individually.
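
Sketched in toy Python (with a hypothetical `batch_resolve` callable standing in for the single network round-trip - this is not the actual Ar API), the timer-and-block scheme might look like:

```python
import threading

class BatchingResolver:
    """Toy sketch of the idea above: resolve() calls record their path and
    block; when a short timer expires, all collected paths are resolved as
    one batch and every blocked caller is released with its result.
    batch_resolve is a hypothetical stand-in for one round-trip to an
    asset system: it maps a list of paths to a dict of results."""

    def __init__(self, batch_resolve, window=0.05):
        self._batch_resolve = batch_resolve
        self._window = window                 # seconds to wait for more callers
        self._lock = threading.Lock()
        self._pending = []                    # paths collected in this window
        self._event = threading.Event()       # signals "batch results ready"
        self._results = {}
        self._timer = None

    def _flush(self):
        # Swap out the current window's paths and event under the lock,
        # then resolve the whole batch in one call and wake the callers.
        with self._lock:
            paths, self._pending = self._pending, []
            event, self._event = self._event, threading.Event()
            self._timer = None
        self._results.update(self._batch_resolve(paths))
        event.set()

    def resolve(self, path):
        with self._lock:
            self._pending.append(path)
            event = self._event
            if self._timer is None:           # first caller starts the timer
                self._timer = threading.Timer(self._window, self._flush)
                self._timer.start()
        event.wait()                          # block until the batch completes
        return self._results[path]
```

As noted, whether this wins depends entirely on how the window length compares to per-call latency - for fast local resolves the forced wait is a net loss.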

At Pixar, our internal resolver is filesystem-based, but it has behavior where, if someone asks to resolve /dir1/dir2/foo.usd, it’ll cache the list of files in /dir1/dir2 instead of just looking for the requested file, since in our asset layout subsequent resolve calls will very often be looking for files in the same directory. This sounds kind of similar to what @LucaScheller and ColinE mentioned – taking advantage of external knowledge to guide a caching scheme.
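
A toy Python sketch of that directory-listing trick (hypothetical class and method names, not our actual resolver):

```python
import os

class DirListingCache:
    """Toy sketch of the directory-listing idea above: the first resolve in
    a directory lists the whole directory and caches the listing, so later
    resolves in the same directory become dictionary lookups instead of
    per-file filesystem checks."""

    def __init__(self):
        self._listings = {}   # directory -> set of entry names

    def resolve(self, path):
        directory, name = os.path.split(path)
        if directory not in self._listings:
            # One listdir() up front instead of a stat() per later lookup.
            self._listings[directory] = set(os.listdir(directory))
        return path if name in self._listings[directory] else None
```

The obvious caveat is staleness: files created after the listing was taken won’t be seen until the cache is dropped, which is why the cache’s lifetime matters.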

One thing that could make sense is some kind of futures-based async setup. It doesn’t necessarily handle batching, but it does mean that resolver calls don’t block and could be mapped to N threads (including single-threaded).

It would require some kind of polling event loop, but it would open up optimizations for non-local resolves over a network.

For example, a common pattern for async, as in Rust, is that the async object registers a Future/Promise. The event loop then checks futures for status updates to run callbacks, such as continuing Pcp after the fact.

It could also allow batching of composition updates, so groups of resolves can queue up the Pcp continuation. You’d also be able to time out resolves.

It does add a slight overhead in needing a thread to run the event loop, but the executor loop can run on the same thread as a worker thread as well.
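
As a rough Python sketch, using `concurrent.futures` as a stand-in for whatever executor USD would actually use (all names here are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor

class AsyncResolver:
    """Toy sketch of a futures-based resolve API: resolve_async() returns
    immediately with a Future, so a caller (e.g. composition) can queue up
    many resolves and attach continuations instead of blocking on each one.
    resolve_one is a hypothetical single-path resolve function."""

    def __init__(self, resolve_one, workers=4):
        self._resolve_one = resolve_one
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def resolve_async(self, path):
        # Non-blocking: hands the path to the pool and returns a Future.
        return self._pool.submit(self._resolve_one, path)

    def shutdown(self):
        self._pool.shutdown()

r = AsyncResolver(lambda p: "/resolved" + p)
futures = [r.resolve_async(p) for p in ("/a.usd", "/b.usd")]
# A continuation (e.g. resuming Pcp) could be attached with
# Future.add_done_callback(); here we just gather the results.
results = [f.result() for f in futures]
r.shutdown()
```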

Futures/promises is something I’d like to see in USD resolution in the future. I’m happy to put up a formal proposal.


Hey folks,

Sorry for the delay in replying.

Thanks Luca, was lovely to see everyone at the meeting :slight_smile:

What is already possible now is what ColinE suggested in the post above: when your pipeline already knows most paths upfront, it can pre-cache these and store them (in its asset resolver context).

Yeah - this is something many people have said they’ve had to implement. As you and @sunya mentioned, regardless of any Pcp-level batching possibilities, it sounds like you’d still end up with a myriad of separate requests for each nested layer anyway. So, as you say, it may not save that much versus a single up-front lookup. I did wonder, though, whether it might still result in several orders of magnitude fewer requests if the asset structure is amenable - but maybe it’s not valuable enough all told.

I didn’t get around to it in the last few weeks, but I’ll soon add another resolver (a Hybrid/Cached Resolver) to my UsdAssetResolver repo here…

Your repo is excellent - many thanks for all the work there! It’d be great to team up on this if you have any bandwidth. People have expressed a lot of interest in having common implementations of all this. The TSC has set up the OpenAssetIO roadmap to add many of these things to the core API, to save each site having to implement the same mechanisms… (multi-language, re-usable/distributable read-through cache). It’d be a shame to be working on the same thing in parallel.

One idea I had was to have an ArResolver implementation where a call to Resolve would record the asset path and then start a timer and block. Additional calls to Resolve from other threads would also record those paths and block…

Quite a few studios have told us they’re doing this to make things practically scalable; it’s certainly something we’re looking at having available as a mix-in in the core OpenAssetIO library too.

This sounds kind of similar to what @LucaScheller and ColinE mentioned – taking advantage of external knowledge to guide a caching scheme.

We’d be really interested in learning what re-usable parts we can make to help simplify getting this working against pipeline-specific logic.

Futures/promises is something I’d like to see in USD resolution in the future. I’m happy to put up a formal proposal.

@dhruvgovil would love to chat to you more on this outside of the USD context.

Thanks again for all the input, very much appreciated.


Hey, just a small update to this from my side:
I’ve added a “Cached Resolver” to the USD resolver repo; here’s a short description of what it does:

  • Cached Resolver - A resolver that first consults an internal, resolver-context-dependent cache to resolve asset paths. If the asset path is not found in the cache, it redirects the request to Python and caches the result. This is ideal for smaller studios, as it preserves the speed of C++ with the flexibility of Python.
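
In toy Python terms, the read-through behaviour amounts to something like this (hypothetical names; the real repo does this across the C++/Python boundary):

```python
class ReadThroughCache:
    """Toy sketch of the Cached Resolver's read-through behaviour: consult
    a per-context cache first, and only on a miss call out to the slower,
    flexible resolve function (in the real repo, a Python hook invoked
    from C++), caching whatever it returns for next time."""

    def __init__(self, python_resolve):
        self._python_resolve = python_resolve   # hypothetical Python hook
        self._cache = {}

    def resolve(self, identifier):
        if identifier not in self._cache:
            # Cache miss: pay the Python/network cost exactly once.
            self._cache[identifier] = self._python_resolve(identifier)
        return self._cache[identifier]
```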

Small disclaimer: I’m still learning C++, so there might still be some issues hidden in the code - the unit tests and Houdini tests pass, though. To make it work, I had to somewhat hack around the “constant” (pointer) variables in the _Resolve method by calling into Python. I wouldn’t consider this good practice at all, but it works (for now) :wink: If someone could give me a pointer as to how to implement it better, I’d very much welcome it and would be happy to change the code.

I’ve also created a separate forum post for the resolver repo here to make it more visible in search (and clearer for handling feedback).

Your repo is excellent - many thanks for all the work there! It’d be great to team up on this if you have any bandwidth. People have expressed a lot of interest in having common implementations of all this. The TSC has set up the OpenAssetIO roadmap to add many of these things to the core API, to save each site having to implement the same mechanisms… (multi-language, re-usable/distributable read-through cache). It’d be a shame to be working on the same thing in parallel.

Many thanks :slight_smile: I hope it helps smaller studios especially. I’m not sure yet where the USD asset resolver fits into the whole OpenAssetIO picture; I’d be happy to adapt the cached resolver (or create a clone adapted to OpenAssetIO, or try to help build something for OpenAssetIO) as soon as things are more standardized on that front. As mentioned above, I’m pretty sure I’m not following C++ best practices yet either, so I think it would be good to keep the two projects separate for now; in the future we can take another look when both have progressed.


Cheers,
Luca


I’ve created an internal task for us to adapt our internal resolver-caching tech to ArDefaultResolver. It addresses the problem that many individual stat()/access() calls are much slower than listing the contents of a directory, on all known filesystems. It also demonstrates the pattern of doing the caching in an ArScopedResolverCache, which we like because you don’t need to do a Stage.Reload() or hard-kick any ArResolverContexts for it to pick up changes in the filesystem (e.g. new/moved textures).
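
A toy Python analogy of that scoped-cache lifetime (hypothetical names - ArScopedResolverCache itself is C++): listings are only cached while a scope is open, so changes on disk are picked up as soon as the scope ends, with no reload or context-kick needed:

```python
from contextlib import contextmanager
import os

class ScopedCachingResolver:
    """Toy sketch of scoped caching in the style of ArScopedResolverCache:
    directory listings are cached only while a cache scope is active, so
    nothing stays stale across operations."""

    def __init__(self):
        self._listings = None   # None means "no cache scope is active"

    @contextmanager
    def cache_scope(self):
        self._listings = {}
        try:
            yield self
        finally:
            self._listings = None   # drop the cache when the scope ends

    def resolve(self, path):
        directory, name = os.path.split(path)
        if self._listings is None:           # uncached: check directly
            return path if os.path.exists(path) else None
        if directory not in self._listings:  # cached: one listdir per dir
            self._listings[directory] = set(os.listdir(directory))
        return path if name in self._listings[directory] else None
```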

I’m hopeful that this would be a drop-in performance improvement for anyone currently using or deriving from ArDefaultResolver (because USD composition already uses ArScopedResolverCache extensively), and that it’ll add to the great body of examples that Luca and others have built out!
