One of the things that the NVIDIA Container Toolkit does is update the ldcache in the container so that applications can discover the host driver libraries that have been injected. We also create (some) .so symlinks to match the files tracked by the driver installation. These point to the SONAME symlinks. For example: libcuda.so -> libcuda.so.1 -> libcuda.so.RM_VERSION. We create the libcuda.so symlinks before we run ldconfig, but libcuda.so is not present in the ldcache since we rely on running ldconfig to create the libcuda.so.1 symlink. This means that the ldcache in the container once it starts does not match expectations (i.e. the host state).
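For example, on a host with the driver installed the ldcache includes an entry for libcuda.so, whereas in a freshly started container that entry is missing. If we run ldconfig again in the container, the entry appears, which matches the host state.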
This also holds for the "legacy" code path since the symlink chain is only completed by running ldconfig once.
This seems innocent enough, but has the side effect that applications that run dlopen("libcuda.so", RTLD_LAZY); may not find the library if it is not in the standard library path (this could be the case for CDI).
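To make the failure mode concrete, here is a minimal sketch of how an application could probe for the library by name. It uses Python's ctypes.CDLL (which calls dlopen under the hood) rather than the C call quoted above. The helper name load_first_available and the fallback to libcuda.so.1 are assumptions for illustration, and the fallback only helps if the SONAME entry is actually in the ldcache.

```python
import ctypes


def load_first_available(names, loader=ctypes.CDLL):
    """Try each library name in order and return (name, handle) for the first
    one that loads. Raises OSError if none of them can be loaded.

    `loader` is injectable (hypothetical parameter) so the lookup logic can be
    exercised without the real driver libraries being present.
    """
    errors = []
    for name in names:
        try:
            return name, loader(name)
        except OSError as exc:
            # The dynamic loader could not resolve this name, e.g. because it
            # is neither in the ldcache nor on the default search path.
            errors.append(f"{name}: {exc}")
    raise OSError("none of the candidates could be loaded: " + "; ".join(errors))


if __name__ == "__main__":
    try:
        # The unversioned name is tried first; the SONAME is an assumed fallback.
        name, _handle = load_first_available(["libcuda.so", "libcuda.so.1"])
        print(f"loaded {name}")
    except OSError as exc:
        print(exc)
```

If this reports libcuda.so.1 instead of libcuda.so inside a container, that would be consistent with the missing cache entry described above, though other causes are possible.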
A simple workaround is to inject the update-ldcache hook twice, but we may want to consider a two-phase approach where we first run ldconfig with the -N flag to only update the links and then run ldconfig to update the cache.
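The two-phase idea could be sketched roughly as follows. This is a standalone illustration, not the toolkit's hook implementation: the function names, the optional root argument, and the injectable runner are all assumptions. The only ldconfig flags used are -N (do not rebuild the cache; links are still updated) and -r (use the given directory as root).

```python
import subprocess


def two_phase_ldconfig_commands(ldconfig="ldconfig", root=None):
    """Build the two command lines for the proposed two-phase update.

    Phase 1 runs with -N so only the symlinks are updated.
    Phase 2 is a plain run that rebuilds the cache, by which point the
    symlink chain is already complete.
    """
    base = [ldconfig]
    if root is not None:
        # -r makes ldconfig operate relative to the given root directory.
        base += ["-r", root]
    return [base + ["-N"], list(base)]


def run_two_phase(ldconfig="ldconfig", root=None, runner=subprocess.run):
    """Run both phases in order, stopping if either command fails."""
    for cmd in two_phase_ldconfig_commands(ldconfig, root):
        runner(cmd, check=True)
```

Calling run_two_phase(root="/path/to/container/rootfs") would need sufficient privileges to write the links and cache under that root; the path here is a placeholder.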
> A simple workaround is to inject the update-ldcache hook twice, but we may want to consider a two-phase approach where we first run ldconfig with the -N flag to only update the links and then run ldconfig to update the cache.
It would also make it more obvious, looking at a CDI file, why the hook is there twice: the first one has the -N option, the second one doesn't.