Fix async_io ops building error on Huawei Ascend NPU by huangyifan0610 · Pull Request #7894 · deepspeedai/DeepSpeed

huangyifan0610 · 2026-03-10T07:30:04Z

Summary

Fixes async_io ops building error on Huawei Ascend NPU.

Environment

Item	Version
kernel version	5.15.0-101-generic
torch version	2.8.0+cpu
deepspeed info	0.18.7
deepspeed wheel compiled w	torch 2.8
torch_npu version	2.8.0
ascend_cann version	8.1.RC1

The DeepSpeed config sets "zero_optimization.offload_optimizer.device" to "nvme" (setting it to "cpu" works).
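For context, a configuration of the kind that exercises this code path might look like the sketch below. Only the `zero_optimization.offload_optimizer.device` setting comes from this report; the ZeRO stage, the `nvme_path` value, and the `make_offload_config` helper are assumptions for illustration.

```python
import json


def make_offload_config(device: str, nvme_path: str = "/local_nvme") -> dict:
    """Build a minimal ZeRO config dict that offloads optimizer state.

    `nvme_path` is a placeholder (assumption); point it at a real NVMe mount.
    """
    offload = {"device": device}
    if device == "nvme":
        # Only the NVMe target needs a filesystem path for swap files.
        offload["nvme_path"] = nvme_path
    # Stage 3 is assumed here; the report does not state the stage used.
    return {"zero_optimization": {"stage": 3, "offload_optimizer": offload}}


# Print the failing variant ("nvme") as JSON for use in a config file.
print(json.dumps(make_offload_config("nvme"), indent=2))
```

Switching the argument to `"cpu"` produces the variant that the report says works.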

Error Messages

The following error occurs when offloading from NPU to NVMe:

ImportError: /.../async_io.so: undefined symbol: _ZN21deepspeed_io_handle_t18_create_io_op_descEbRKN2at6TensorEiPKcbl

nm shows that the symbol is referenced but undefined (type U) in the built library, even though its definition exists in "csrc/aio/py_lib/deepspeed_py_io_handle.cpp", so that file was not compiled into async_io.so:

nm async_io.so | rg _ZN21deepspeed_io_handle_t18_create_io_op_descEbRKN2at6TensorEiPKcbl
#                 U _ZN21deepspeed_io_handle_t18_create_io_op_descEbRKN2at6TensorEiPKcbl
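The same check can be scripted. The sketch below parses `nm` output and lists symbols of type `U`; the helper names `undefined_symbols` and `check_library` are hypothetical, and it assumes `nm` is on the PATH when `check_library` is called.

```python
import subprocess


def undefined_symbols(nm_output: str) -> list[str]:
    """Return symbol names that `nm` marks as undefined (type `U`)."""
    symbols = []
    for line in nm_output.splitlines():
        fields = line.split()
        # Undefined entries have no address column: just "U <name>".
        if len(fields) == 2 and fields[0] == "U":
            symbols.append(fields[1])
    return symbols


def check_library(path: str, needle: str) -> list[str]:
    """Run `nm` on `path` and return undefined symbols containing `needle`."""
    result = subprocess.run(
        ["nm", path], capture_output=True, text=True, check=True
    )
    return [s for s in undefined_symbols(result.stdout) if needle in s]
```

For example, `check_library("async_io.so", "_create_io_op_desc")` would return a non-empty list for a library built without the file that defines that symbol.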

Solution

op_builder/npu/async_io.py:

class AsyncIOBuilder(NPUOpBuilder):
    def sources(self):
        return [
            'csrc/aio/py_lib/deepspeed_py_copy.cpp', 'csrc/aio/py_lib/py_ds_aio.cpp',
            'csrc/aio/py_lib/deepspeed_py_aio.cpp', 'csrc/aio/py_lib/deepspeed_py_aio_handle.cpp',
            'csrc/aio/py_lib/deepspeed_aio_thread.cpp', 'csrc/aio/common/deepspeed_aio_utils.cpp',
            'csrc/aio/common/deepspeed_aio_common.cpp', 'csrc/aio/common/deepspeed_aio_types.cpp',
            'csrc/aio/py_lib/deepspeed_pin_tensor.cpp',
            # Adds 3 source files:
            'csrc/aio/py_lib/deepspeed_py_io_handle.cpp',
            'csrc/aio/py_lib/deepspeed_aio_op_desc.cpp',
            'csrc/aio/py_lib/deepspeed_cpu_op.cpp'
        ]
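To make the change to the source list explicit, the sketch below compares the list before and after this fix. `missing_sources` is a hypothetical helper, not part of DeepSpeed; the "before" list is assumed to be the entries above the `# Adds 3 source files:` comment.

```python
def missing_sources(reference: list[str], candidate: list[str]) -> list[str]:
    """Return entries of `reference` absent from `candidate`, in reference order."""
    present = set(candidate)
    return [src for src in reference if src not in present]


# Source list before this change (entries preceding the added-files comment).
before = [
    "csrc/aio/py_lib/deepspeed_py_copy.cpp",
    "csrc/aio/py_lib/py_ds_aio.cpp",
    "csrc/aio/py_lib/deepspeed_py_aio.cpp",
    "csrc/aio/py_lib/deepspeed_py_aio_handle.cpp",
    "csrc/aio/py_lib/deepspeed_aio_thread.cpp",
    "csrc/aio/common/deepspeed_aio_utils.cpp",
    "csrc/aio/common/deepspeed_aio_common.cpp",
    "csrc/aio/common/deepspeed_aio_types.cpp",
    "csrc/aio/py_lib/deepspeed_pin_tensor.cpp",
]
# The three files this change adds.
added = [
    "csrc/aio/py_lib/deepspeed_py_io_handle.cpp",
    "csrc/aio/py_lib/deepspeed_aio_op_desc.cpp",
    "csrc/aio/py_lib/deepspeed_cpu_op.cpp",
]
after = before + added

# Files the old list lacked relative to the fixed list.
for src in missing_sources(after, before):
    print(src)
```

A check like this could run against any two builders' `sources()` lists to spot files that one backend compiles and another omits.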

csrc/aio/py_lib/deepspeed_cpu_op.cpp:

#if defined(__ENABLE_CANN__)
            // `DS_BUILD_OPS=1 install.sh` complains that ‘torch_npu’ has not
            // been declared, so inlines `torch_npu::utils::is_npu`.
            if (_buffer.is_privateuseone()) {
                auto device = at::Device("npu:0");
                _buffer.copy_(_cpu_buffer.to(device));
            }
#endif

An error occurs when offloading from NPU to NVMe because the async_io ops fail to build correctly. This commit adds the missing source files to the build script (op_builder/npu/async_io.py), which fixes the error.

Signed-off-by: Huang Yifan <yifan0610@foxmail.com>

huangyifan0610 requested review from loadams and tjruwase as code owners March 10, 2026 07:30

sfc-gh-truwase approved these changes Mar 10, 2026

Merge branch 'master' into huangyifan/fix-npu-aio-ops

c7ca547

sfc-gh-truwase merged commit f88d0f8 into deepspeedai:master Mar 12, 2026
10 checks passed

sfc-gh-truwase merged 2 commits into deepspeedai:master from huangyifan0610:huangyifan/fix-npu-aio-ops
