@MasterJH5574
Contributor

This PR fixes a typo in the sampling kernel of the top-p/top-k sampling op. Prior to this PR, the kernel performed an out-of-bounds global memory read, caused by an oversight when `sample_indices` was introduced in #16675.

The existing correctness tests did not reveal this issue, whether the test file was run directly or through pytest. However, running the same test under NVIDIA's compute-sanitizer reports the illegal memory access:

```
> compute-sanitizer --tool memcheck --print-limit=5 --launch-timeout 3600 python tests/python/relax/test_frontend_nn_op.py
========= COMPUTE-SANITIZER
========= Invalid __global__ read of size 8 bytes
=========     at 0x4e90 in get_index_from_sorted_kernel
=========     by thread (7,0,0) in block (0,0,0)
=========     Address 0x7fe35ac00238 is out of bounds
=========     and is 9 bytes after the nearest allocation at 0x7fe35ac00200 of size 48 bytes
=========     Saved host backtrace up to driver entry point at kernel launch time
...
```
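
The numbers in the report can be cross-checked with a little arithmetic. The sketch below is not part of the PR; it only decodes the values printed above. It assumes the allocation is a flat array of 8-byte elements, which is inferred from the 8-byte read size rather than stated in the report, and `decode_fault` is a hypothetical helper name.

```python
# Values copied from the compute-sanitizer report above.
ALLOC_BASE = 0x7FE35AC00200  # start of the nearest allocation
ALLOC_SIZE = 48              # size of that allocation, in bytes
FAULT_ADDR = 0x7FE35AC00238  # address of the invalid read
ACCESS_SIZE = 8              # size of the read, in bytes


def decode_fault(alloc_base, alloc_size, fault_addr, access_size):
    """Relate a faulting address to the nearest allocation.

    Assumption: the allocation is a flat array whose elements are
    `access_size` bytes wide (not confirmed by the report itself).
    """
    offset = fault_addr - alloc_base          # byte offset of the read
    num_elements = alloc_size // access_size  # elements the buffer holds
    element_index = offset // access_size     # element the read targets
    in_bounds = 0 <= offset and offset + access_size <= alloc_size
    return {
        "offset": offset,
        "num_elements": num_elements,
        "element_index": element_index,
        "in_bounds": in_bounds,
    }


report = decode_fault(ALLOC_BASE, ALLOC_SIZE, FAULT_ADDR, ACCESS_SIZE)
print(report)
```

Under that assumption the 48-byte buffer holds 6 elements (valid indices 0 to 5), while the faulting read targets element 7. That is consistent with thread (7,0,0) indexing the buffer by its own thread id and running past the end.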

Member

@yongwww yongwww left a comment

Thanks for the fix!

@tqchen tqchen merged commit ef8c428 into apache:main Mar 12, 2024
thaisacs pushed a commit to thaisacs/tvm that referenced this pull request Apr 3, 2024