Fix async_io ops building error on Huawei Ascend NPU #7894

Merged
sfc-gh-truwase merged 2 commits into deepspeedai:master from huangyifan0610:huangyifan/fix-npu-aio-ops
Mar 12, 2026

@huangyifan0610
Contributor

Summary

Fixes the async_io op build error on Huawei Ascend NPU.

Environment

| Item | Version |
| --- | --- |
| kernel version | 5.15.0-101-generic |
| torch version | 2.8.0+cpu |
| deepspeed info | 0.18.7 |
| deepspeed wheel compiled with | torch 2.8 |
| torch_npu version | 2.8.0 |
| ascend_cann version | 8.1.RC1 |

DeepSpeed config: "zero_optimization.offload_optimizer.device" = "nvme" (setting device = "cpu" works without error).
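
For reference, a configuration that takes this code path might look like the minimal sketch below, written as a Python dict. Only the `zero_optimization.offload_optimizer.device` key comes from this report; the ZeRO stage, the `nvme_path` value, the batch size, and the helper `uses_nvme_offload` are assumptions added for illustration.

```python
# Minimal sketch of a DeepSpeed config that selects NVMe optimizer offload.
ds_config = {
    "train_batch_size": 8,  # assumption: placeholder value
    "zero_optimization": {
        "stage": 3,  # assumption: ZeRO stage used together with NVMe offload
        "offload_optimizer": {
            "device": "nvme",  # from the report; "cpu" does not hit the error
            "nvme_path": "/local_nvme",  # assumption: hypothetical mount point
        },
    },
}


def uses_nvme_offload(config: dict) -> bool:
    """Return True if the optimizer state is configured to offload to NVMe."""
    offload = config.get("zero_optimization", {}).get("offload_optimizer", {})
    return offload.get("device") == "nvme"


print(uses_nvme_offload(ds_config))
```

Only configurations for which this check is true need the async_io op, which is why the "cpu" setting is unaffected.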

Error Messages

When offloading from NPU to NVMe, the following error occurs:

ImportError: /.../async_io.so: undefined symbol: _ZN21deepspeed_io_handle_t18_create_io_op_descEbRKN2at6TensorEiPKcbl

nm reports the symbol as undefined (type U) in async_io.so, even though it is defined in "csrc/aio/py_lib/deepspeed_py_io_handle.cpp":

nm async_io.so | rg _ZN21deepspeed_io_handle_t18_create_io_op_descEbRKN2at6TensorEiPKcbl
#                 U _ZN21deepspeed_io_handle_t18_create_io_op_descEbRKN2at6TensorEiPKcbl
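
The mangled name decodes to `deepspeed_io_handle_t::_create_io_op_desc(bool, at::Tensor const&, int, char const*, bool, long)`. The same check can be scripted; the sketch below parses text in the default `nm` output format and collects the entries marked `U`. The function name `undefined_symbols` and the second sample line are hypothetical, and in practice the input would come from running `nm` on the built library.

```python
def undefined_symbols(nm_output: str) -> list[str]:
    """Collect symbol names that nm lists with type 'U' (undefined)."""
    symbols = []
    for line in nm_output.splitlines():
        parts = line.split()
        # Undefined entries have no address column, so they split into
        # exactly two fields: the type letter and the symbol name.
        if len(parts) == 2 and parts[0] == "U":
            symbols.append(parts[1])
    return symbols


# Sample text: the first line is from the report, the second is a
# made-up defined symbol (hypothetical) to show that it is skipped.
sample = (
    "                 U _ZN21deepspeed_io_handle_t18_create_io_op_descEbRKN2at6TensorEiPKcbl\n"
    "0000000000012340 T _Z15hypothetical_fnv\n"
)

for name in undefined_symbols(sample):
    print(name)
```

An undefined entry alone is not an error, since it may be resolved by another shared library at load time. Here it is a problem because no loaded library provides the definition.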

Solution

  1. op_builder/npu/async_io.py:
class AsyncIOBuilder(NPUOpBuilder):
    def sources(self):
        return [
            'csrc/aio/py_lib/deepspeed_py_copy.cpp', 'csrc/aio/py_lib/py_ds_aio.cpp',
            'csrc/aio/py_lib/deepspeed_py_aio.cpp', 'csrc/aio/py_lib/deepspeed_py_aio_handle.cpp',
            'csrc/aio/py_lib/deepspeed_aio_thread.cpp', 'csrc/aio/common/deepspeed_aio_utils.cpp',
            'csrc/aio/common/deepspeed_aio_common.cpp', 'csrc/aio/common/deepspeed_aio_types.cpp',
            'csrc/aio/py_lib/deepspeed_pin_tensor.cpp',
            # Adds 3 source files:
            'csrc/aio/py_lib/deepspeed_py_io_handle.cpp',
            'csrc/aio/py_lib/deepspeed_aio_op_desc.cpp',
            'csrc/aio/py_lib/deepspeed_cpu_op.cpp'
        ]
  2. csrc/aio/py_lib/deepspeed_cpu_op.cpp:
#if defined(__ENABLE_CANN__)
            // `DS_BUILD_OPS=1 install.sh` complains that ‘torch_npu’ has not
            // been declared, so inlines `torch_npu::utils::is_npu`.
            if (_buffer.is_privateuseone()) {
                auto device = at::Device("npu:0");
                _buffer.copy_(_cpu_buffer.to(device));
            }
#endif
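
The first change can be expressed as a simple list comparison. The sketch below takes the nine files the NPU builder already listed and reports which of the three added files were absent. The helper `missing_sources` and the constant names are hypothetical and are not part of DeepSpeed.

```python
# Source files listed by the NPU builder before this change (from the diff above).
NPU_SOURCES_BEFORE = [
    'csrc/aio/py_lib/deepspeed_py_copy.cpp', 'csrc/aio/py_lib/py_ds_aio.cpp',
    'csrc/aio/py_lib/deepspeed_py_aio.cpp', 'csrc/aio/py_lib/deepspeed_py_aio_handle.cpp',
    'csrc/aio/py_lib/deepspeed_aio_thread.cpp', 'csrc/aio/common/deepspeed_aio_utils.cpp',
    'csrc/aio/common/deepspeed_aio_common.cpp', 'csrc/aio/common/deepspeed_aio_types.cpp',
    'csrc/aio/py_lib/deepspeed_pin_tensor.cpp',
]

# The three files this PR adds.
ADDED_SOURCES = [
    'csrc/aio/py_lib/deepspeed_py_io_handle.cpp',
    'csrc/aio/py_lib/deepspeed_aio_op_desc.cpp',
    'csrc/aio/py_lib/deepspeed_cpu_op.cpp',
]


def missing_sources(current: list[str], required: list[str]) -> list[str]:
    """Return the required files that are absent from the current list."""
    present = set(current)
    return [path for path in required if path not in present]


print(missing_sources(NPU_SOURCES_BEFORE, ADDED_SOURCES))
print(missing_sources(NPU_SOURCES_BEFORE + ADDED_SOURCES, ADDED_SOURCES))
```

A check of this kind could run against any builder's `sources()` result to catch a translation unit that is referenced but never compiled.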

Offloading from NPU to NVMe fails at import time because async_io.so
is built without some of its source files, which leaves a symbol
undefined.

This commit adds the missing source files to the build script
(op_builder/npu/async_io.py), which fixes the error.

Signed-off-by: Huang Yifan <yifan0610@foxmail.com>
@sfc-gh-truwase sfc-gh-truwase merged commit f88d0f8 into deepspeedai:master Mar 12, 2026
10 checks passed