The sort of thing you’re talking about will be facilitated by the “second mode” spiff talks about above.
To do what you’re describing today, you’d need to maintain your own logic in your engine to compose animation and to apply constraints, such as a hand holding a ball. You can still use a USD file to carry the skin bindings and weights, but you’ll need to do that composition yourself.
Encoding a change of scene hierarchy at a particular time is not something you can do. For example, you can’t express in the USD file that a character catches a ball at 5 seconds and that the ball is therefore reparented to the character’s hand. This would normally be accomplished by animating a constraint weight from zero to one, but USD doesn’t currently have the concept of constraints; they will be enabled by OpenExec. In any case, this is not the general pattern you need in a game, although it certainly has its uses.
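To make the “animated constraint weight” idea concrete, here is a small sketch in plain Python. The function names are hypothetical and nothing here is USD or OpenExec API. It blends a ball’s own position toward a hand position using a weight that is 0 before the catch at 5 seconds and 1 afterwards (optionally ramping over a short interval). For brevity it blends positions only; a real constraint would also blend orientation.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def lerp(a: Vec3, b: Vec3, t: float) -> Vec3:
    """Component-wise linear interpolation from a (t=0) to b (t=1)."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def constraint_weight(time: float, catch_time: float = 5.0, ramp: float = 0.0) -> float:
    """Hypothetical animated weight: 0 before catch_time, 1 once the catch completes.

    With ramp > 0 the weight rises linearly over `ramp` seconds instead of stepping.
    """
    if ramp <= 0.0:
        return 1.0 if time >= catch_time else 0.0
    return min(max((time - catch_time) / ramp, 0.0), 1.0)


def constrained_position(free_pos: Vec3, hand_pos: Vec3, weight: float) -> Vec3:
    """Blend the ball's own animated position toward the hand it is constrained to."""
    return lerp(free_pos, hand_pos, weight)
```

At weight 0 the ball follows its own animation; at weight 1 it sits in the hand. This is the kind of time-varying relationship that would need constraint support to be described in the file itself.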
To illustrate why, consider an example game: each player has two hands, and the game has eight balls and twelve players. That is 12 × 2 = 24 hands, and therefore 24 × 8 = 192 possible ball-to-hand constraints; it’s neither practical nor efficient at runtime to have all of them encoded in the scene. Instead, everything would be tied up in game state logic: each ball is either in free flight or held in a particular hand by a particular player, and each frame your engine is responsible for either evolving the ball’s position by physics or teleporting it to the player’s hand.
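A minimal sketch of that per-frame game-state logic follows. It assumes a hypothetical `Ball` record whose `holder` is either `None` (free flight) or a `(player_id, hand)` pair, and a hypothetical `hand_positions` lookup that the engine fills from the posed characters each frame. Only a ball that is actually held reads a hand position, so the full set of possible ball-to-hand constraints never has to exist in the scene. The physics is a deliberately simple semi-implicit Euler step under gravity, standing in for whatever the engine really uses.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Vec3 = Tuple[float, float, float]
HandKey = Tuple[int, str]  # hypothetical key: (player_id, "left" or "right")
GRAVITY: Vec3 = (0.0, -9.81, 0.0)


@dataclass
class Ball:
    position: Vec3
    velocity: Vec3 = (0.0, 0.0, 0.0)
    # None means free flight; otherwise the hand currently holding the ball.
    holder: Optional[HandKey] = None


def step_ball(ball: Ball, dt: float, hand_positions: Dict[HandKey, Vec3]) -> None:
    """Advance one ball by one frame according to its game state."""
    if ball.holder is not None:
        # Held: teleport to the holding hand; no constraint lives in the scene.
        ball.position = hand_positions[ball.holder]
        ball.velocity = (0.0, 0.0, 0.0)
    else:
        # Free flight: semi-implicit Euler (update velocity, then position).
        ball.velocity = tuple(v + g * dt for v, g in zip(ball.velocity, GRAVITY))
        ball.position = tuple(p + v * dt for p, v in zip(ball.position, ball.velocity))


def catch(ball: Ball, player_id: int, hand: str) -> None:
    """Game event: the ball is now held by the given hand."""
    ball.holder = (player_id, hand)


def release(ball: Ball, launch_velocity: Vec3) -> None:
    """Game event: the ball leaves the hand and physics takes over."""
    ball.holder = None
    ball.velocity = launch_velocity
```

Catching and throwing are then just state changes: `catch` sets `holder`, and `release` clears it and gives the ball a launch velocity, after which the physics branch takes over on the next frame.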
Of course, given a joint hierarchy, you could also parent and unparent the ball to the hand, but that bends the use case of UsdSkel to the breaking point.
There are two concerns at work here, and they are superficially related to the degree that UsdSkel seems to solve for both.
One concern is encoding the data required by linear-blend skinning (LBS). The second concern is a hierarchy of joints meant for posing the character. Although these are commonly the same in games, they are just as commonly different. A skin-bind skeleton may have a great many bones that are unrelated to posing; for example, there may be a computed secondary bone whose purpose is to keep a shoulder shape from collapsing. This bone is necessary for LBS, but it is not something an animator will pose. Similarly, there may be a ragdoll hierarchy for physics, which may have the twenty bones crucial for posing rather than all hundred bones in the character. At runtime, the game engine constrains the LBS skeleton to the ragdoll skeleton, and it also computes, in some manner, the locations of the eighty other bones that are not in the ragdoll.
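For reference, the LBS concern boils down to the standard formula v′ = Σᵢ wᵢ · Mᵢ · Bᵢ⁻¹ · v, where Mᵢ is joint i’s posed world transform, Bᵢ its bind transform, and wᵢ its weight for the vertex. The sketch below evaluates that in pure Python and, separately, shows the runtime split described above: joints driven by the ragdoll are copied in, and every other skinning joint is filled in by an engine-side rule. All names are hypothetical and are not UsdSkel API; matrices are assumed to be row-major 4×4 acting on column vectors, bind transforms are assumed rigid (rotation plus translation) so they can be inverted cheaply, and per-vertex weights are assumed to sum to 1.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Mat4 = List[List[float]]  # row-major 4x4, acting on column vectors
Vec3 = Tuple[float, float, float]


def mat_mul(a: Mat4, b: Mat4) -> Mat4:
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def transform_point(m: Mat4, p: Vec3) -> Vec3:
    x, y, z = p
    return tuple(m[i][0] * x + m[i][1] * y + m[i][2] * z + m[i][3] for i in range(3))


def translation(tx: float, ty: float, tz: float) -> Mat4:
    return [[1.0, 0.0, 0.0, tx], [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]


def rigid_inverse(m: Mat4) -> Mat4:
    """Inverse of a rotation+translation matrix: [R^T | -R^T t]."""
    rt = [[m[j][i] for j in range(3)] for i in range(3)]
    t = [m[i][3] for i in range(3)]
    inv_t = [-sum(rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [rt[0] + [inv_t[0]], rt[1] + [inv_t[1]], rt[2] + [inv_t[2]],
            [0.0, 0.0, 0.0, 1.0]]


def skin_point(rest: Vec3, influences: Sequence[Tuple[str, float]],
               pose: Dict[str, Mat4], bind: Dict[str, Mat4]) -> Vec3:
    """Linear blend skinning of one rest-pose point: sum_i w_i * M_i * B_i^-1 * v."""
    out = [0.0, 0.0, 0.0]
    for joint, weight in influences:
        skinning = mat_mul(pose[joint], rigid_inverse(bind[joint]))
        p = transform_point(skinning, rest)
        for i in range(3):
            out[i] += weight * p[i]
    return tuple(out)


def complete_pose(ragdoll_pose: Dict[str, Mat4],
                  derived_rules: Dict[str, Callable[[Dict[str, Mat4]], Mat4]]) -> Dict[str, Mat4]:
    """Start from the joints the ragdoll drives, then fill in the rest.

    Rules run in insertion order, so a rule may read joints filled in by earlier rules.
    """
    pose = dict(ragdoll_pose)
    for joint, rule in derived_rules.items():
        pose[joint] = rule(pose)
    return pose
```

For example, with a ragdoll that drives only an `upper_arm` joint and a hypothetical rule placing a `shoulder_helper` joint at half of the arm’s vertical offset, a vertex weighted 50/50 between the two lands midway between where each joint alone would put it. The helper joint matters to skinning but never appears in the ragdoll.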
All of this is to say that when we get to a joint-based skeleton, we should also consider the need to constrain an animator’s posing-rig skeleton (which likely has additional joints that are not skin-bound) to an LBS skeleton, or to constrain a sparse ragdoll to an LBS skeleton, presumably via OpenExec. At that point, USD will have ways to describe game-like functionality.