After starting to use the shader registry, some of our unit tests started timing out on Windows.
What appears to be happening is that the TBB thread pool is shut down uncleanly as part of process teardown, leaving various off-brand mutexes related to thread pools and task pools in TBB in a locked state. After all this, the stage cache in usdShade::UsdShaderDefParser is destroyed. UsdStage::_Close makes heavy use of TBB and deadlocks because it tries to access resources that are incorrectly flagged as locked.
Our current workaround is adding an explicit clear call to UsdShaderDefParserPlugin, but it seems fragile and easy to forget. Has anyone else seen something like this, or can anyone think of a more automatic solution to avoid this issue?
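For illustration, one way to make an explicit clear call harder to forget is a small scope guard declared at the top of each test's main. This is only a sketch under assumptions: `ScopedCleanup` is a hypothetical helper, and the callback stands in for the clear call on UsdShaderDefParserPlugin, whose exact signature is not shown in this thread.

```cpp
#include <functional>
#include <utility>

// Hypothetical helper (not part of USD): runs a cleanup callback when the
// guard goes out of scope, e.g. when main() returns normally.
class ScopedCleanup {
public:
    explicit ScopedCleanup(std::function<void()> cleanup)
        : cleanup_(std::move(cleanup)) {}

    // The destructor runs during normal scope exit, which happens before
    // static and global destructors are executed.
    ~ScopedCleanup() {
        if (cleanup_) {
            cleanup_();
        }
    }

    // Non-copyable so the cleanup cannot run twice by accident.
    ScopedCleanup(const ScopedCleanup&) = delete;
    ScopedCleanup& operator=(const ScopedCleanup&) = delete;

private:
    std::function<void()> cleanup_;
};
```

A test would construct the guard first thing in main, passing a lambda that makes the explicit clear call. Note that the guard does not fire if the process leaves through std::exit, because that does not unwind the stack.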
My comments are not TBB related, and I’m not a TBB expert, but FWIW, OpenEXR encountered shutdown deadlocks in its thread pool with semaphore destruction on Windows when a thread is killed before the semaphore signals; that pattern yields a freeze in the kernel in ntdll!NtWaitForSingleObject. OpenImageIO encountered the same issue in its thread pool due to a condition variable: cv.notify_all() deadlocks when the cv attempts to communicate with the now-destructed threads in the thread pool.
This sounds an awful lot like what you are encountering. In both EXR & OIIO the solution involved adding bookkeeping on the threadpool and omitting the notify_all calls if the threads in the threadpool were already shut down.
eg
Capturing the stack in both cases led to a solution; you can see OpenEXR’s problem explicitly here
I’m not linking the OpenEXR solution because it required a lot more surgery than OIIO’s, and OIIO’s is perhaps instructive.
I’m wondering if you might be able to capture stacks during the timeout; I am guessing they might show scenarios similar to the ones I’ve described.
Since I filed the bug back in April, we have run into more instances of this problem. Most recently, we ran into the issue with the Maya SDK (DependEngine.dll) calling into TBB during shutdown:
ntdll.dll!NtDelayExecution() Unknown
> ntdll.dll!RtlDelayExecution() Unknown
KERNELBASE.dll!SwitchToThread() Unknown
[Inline Frame] tbbmalloc.dll!tbb::internal::atomic_backoff::pause() Line 367 C++
[Inline Frame] tbbmalloc.dll!AtomicBackoff::pause() Line 98 C++
tbbmalloc.dll!rml::internal::BackendSync::waitTillBlockReleased(__int64 startModifiedCnt) Line 305 C++
tbbmalloc.dll!rml::internal::Backend::askMemFromOS(unsigned __int64 blockSize, __int64 startModifiedCnt, int * lockedBinsThreshold, int numOfLockedBins, bool * splittableRet, bool needSlabRegion) Line 694 C++
tbbmalloc.dll!rml::internal::Backend::genericGetBlock(int num, unsigned __int64 size, bool needAlignedBlock) Line 835 C++
[Inline Frame] tbbmalloc.dll!rml::internal::Backend::getSlabBlock(int) Line 872 C++
tbbmalloc.dll!rml::internal::MemoryPool::getEmptyBlock(unsigned __int64 size) Line 1006 C++
tbbmalloc.dll!rml::internal::internalPoolMalloc(rml::internal::MemoryPool * memPool, unsigned __int64 size) Line 2587 C++
tbbmalloc.dll!scalable_aligned_malloc(unsigned __int64 size, unsigned __int64 alignment) Line 3097 C++
tbb.dll!tbb::internal::NFS_Allocate(unsigned __int64 n, unsigned __int64 element_size, void * __formal) Line 190 C++
[Inline Frame] DependEngine.dll!tbb::cache_aligned_allocator<char>::allocate(unsigned __int64) Line 82 C++
DependEngine.dll!tbb::strict_ppl::concurrent_queue<int,tbb::cache_aligned_allocator<int>>::allocate_block(unsigned __int64 n) Line 44 C++
DependEngine.dll!tbb::strict_ppl::internal::micro_queue<int>::push(const void * item, unsigned __int64 k, tbb::strict_ppl::internal::concurrent_queue_base_v3<int> & base, void(*)(int *, const void *) construct_item) Line 224 C++
[Inline Frame] DependEngine.dll!tbb::strict_ppl::internal::concurrent_queue_base_v3<int>::internal_push(const void *) Line 472 C++
[Inline Frame] DependEngine.dll!tbb::strict_ppl::concurrent_queue<int,tbb::cache_aligned_allocator<int>>::push(const int &) Line 133 C++
[Inline Frame] DependEngine.dll!em::detail::EventFunctionSet<std::function<void __cdecl(Tmetaclass * const &)>>::remove(int index) Line 291 C++
DependEngine.dll!em::detail::EventHub<std::function<void __cdecl(Tmetaclass * const &)>>::disconnect(int index) Line 357 C++
[Inline Frame] EvaluationManager.dll!em::EventHandler::disconnect() Line 45 C++
EvaluationManager.dll!em::EventHandler::~EventHandler() Line 39 C++
DependEngine.dll!TnodeTypeFilter::~TnodeTypeFilter() Line 32 C++
DependEngine.dll!TcustomEvaluator::~TcustomEvaluator() Line 65 C++
[Inline Frame] DynSlice.dll!em::CustomEvaluator::Registration<TdynamicsEvaluator>::{dtor}() Line 71 C++
DynSlice.dll!`anonymous namespace'::`dynamic atexit destructor for '_registration''() C++
ucrtbase.dll!<lambda>(void)() Unknown
ucrtbase.dll!__crt_seh_guarded_call<int>::operator()<<lambda_7777bce6b2f8c936911f934f8298dc43>,<lambda>(void) &,<lambda_3883c3dff614d5e0c5f61bb1ac94921c>>() Unknown
ucrtbase.dll!_execute_onexit_table() Unknown
DynSlice.dll!dllmain_crt_process_detach(const bool is_terminating) Line 182 C++
DynSlice.dll!dllmain_dispatch(HINSTANCE__ * const instance, const unsigned long reason, void * const reserved) Line 293 C++
ntdll.dll!LdrpCallInitRoutine() Unknown
ntdll.dll!LdrShutdownProcess() Unknown
ntdll.dll!RtlExitUserProcess() Unknown
kernel32.dll!ExitProcessImplementation() Unknown
ucrtbase.dll!exit_or_terminate_process() Unknown
ucrtbase.dll!common_exit() Unknown
Spot-fixing all these cases isn’t feasible, and I can’t modify the Maya code anyway, so I resorted to calling TerminateProcess on the process itself when exiting so that the global destructors are not run:
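#include <windows.h>  // TerminateProcess, GetCurrentProcess

static int ExitWithoutGlobalDestructors( int exitCode )
{
    // Kill the process immediately: global destructors are skipped and
    // attached DLLs are not notified of the shutdown.
    TerminateProcess( GetCurrentProcess(), static_cast<UINT>( exitCode ) );
    // Note: This function doesn't return but we need to have return statements in main and this helps with code flow.
    return exitCode;
}
...
int main( int argc, char** argv )
{
    ...
    return ExitWithoutGlobalDestructors( 0 );
}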