Description
Describe the bug

The GPU classification mode in the nvidia-tensor-based branch fails to initialize. Even after manually extracting and providing the complete set of required CUDA 11.x and cuDNN 8 libraries (~1.6 GB in total) and correctly setting LD_LIBRARY_PATH, TensorFlow in the nextcloud:stable image cannot load these libraries.
Environment:
Base Image: nextcloud:stable (Debian Bullseye/Bookworm)
Builder Image for libraries: nvcr.io/nvidia/tensorflow:22.12-tf2-py3 (Contains CUDA 11.8)
Recognize Version: Latest (using TensorFlow.js / tfjs-node-gpu)
GPU: NVIDIA (Verified via nvidia-smi on host and container)
Steps Taken:
1. Extracted the following libraries from the NVIDIA builder image into lib_gpu: libcudart.so.11.0, libcublas.so.11, libcublasLt.so.11, libcufft.so.10, libcurand.so.10, libcusolver.so.11, libcusparse.so.11, libcudnn.so.8.
2. Built the custom image using the provided Dockerfile logic, copying these into /usr/local/cuda/lib64/.
3. Set LD_LIBRARY_PATH=/usr/local/cuda/lib64 and ran ldconfig.
4. Attempted to run php occ recognize:classify.
Error Logs: Despite the files being present and the paths being set, TensorFlow repeatedly logs:
```
W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
... (repeated for all listed .so files)
W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU.
```
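The warning means dlopen() itself failed, so it may help to check what the dynamic loader can actually see inside the container before rebuilding. A diagnostic sketch (the paths are assumptions taken from the steps above):

```shell
# Diagnostic sketch — paths are assumptions, adjust to your image.

# 1. Is the directory actually registered in the loader cache?
ldconfig -p | grep -E 'libcudart|libcudnn' || echo "no CUDA entries in loader cache"

# 2. Are the copied files still valid ELF shared objects
#    (a bad strip run leaves them unreadable)?
file /usr/local/cuda/lib64/libcudart.so.11.0 2>/dev/null || true

# 3. LD_LIBRARY_PATH must be present in the environment of the
#    php-fpm/occ process itself, not just in an interactive shell:
env | grep LD_LIBRARY_PATH || echo "LD_LIBRARY_PATH not set in this environment"
```

If step 1 shows no entries, ldconfig never picked up the directory (for example because /etc/ld.so.conf.d/ has no entry for it), which would produce exactly the "No such file or directory" dlerror despite the files existing on disk.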
Observations:
The current build-all.sh in the repo is broken: it pulls CUDA 12 images, which are incompatible with Recognize's requirement for CUDA 11.
The strip command in the original script corrupts the NVIDIA binaries (file format not recognized).
Even with clean, unstripped CUDA 11.8 libraries, the Debian-based Nextcloud container seems unable to link the Ubuntu-sourced NVIDIA libraries correctly.
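The last observation can be checked directly: a shared object built against a newer glibc than the one in the Debian base will refuse to load even though the file is present and on the search path. A quick check (paths are assumptions; objdump comes from binutils):

```shell
# Highest GLIBC symbol version demanded by one of the copied libraries
# (assumed path from the steps above):
objdump -T /usr/local/cuda/lib64/libcudart.so.11.0 2>/dev/null \
  | grep -o 'GLIBC_[0-9.]*' | sort -u | tail -n1

# glibc version the Nextcloud container actually provides:
ldd --version | head -n1
```

If the first number is higher than the second, the Ubuntu-sourced libraries cannot work in this container regardless of paths, and the builder image needs to be switched to one with a matching (or older) glibc.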
Request: Could you please provide a working Dockerfile or a verified build-all.sh that addresses these linking issues? It seems there is a mismatch between the environment the libraries are pulled from and the final Nextcloud environment.
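For reference, here is a sketch of the direction a fix could take — not a verified solution; the image tags and library locations are assumptions. It encodes two ideas: pin the builder to CUDA 11.8 instead of CUDA 12, and source the libraries from an Ubuntu 20.04 base, whose glibc 2.31 matches Debian Bullseye, rather than from a newer release:

```dockerfile
# Sketch only — tags and paths are assumptions, not a verified fix.

# CUDA 11.8 runtime pinned explicitly (avoids the CUDA 12 images the
# current build-all.sh pulls); ubuntu20.04 ships glibc 2.31, the same
# as Debian Bullseye, which may avoid the cross-distro linking problem.
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu20.04 AS cuda

FROM nextcloud:stable

# Copy the runtime libraries as-is — no strip, which corrupts them.
COPY --from=cuda /usr/local/cuda/lib64/ /usr/local/cuda/lib64/
COPY --from=cuda /usr/lib/x86_64-linux-gnu/libcudnn.so.8* /usr/local/cuda/lib64/

# Register the directory with the dynamic loader instead of relying on
# LD_LIBRARY_PATH propagating into the php-fpm process.
RUN echo /usr/local/cuda/lib64 > /etc/ld.so.conf.d/cuda.conf && ldconfig
ENV LD_LIBRARY_PATH=/usr/local/cuda/lib64
```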