[BUG] Mixed Input H100 Kernel Hangs #2121

Open
manishucsd opened this issue Feb 20, 2025 · 1 comment
Labels: ? - Needs Triage, bug (Something isn't working)

Comments

Contributor

manishucsd commented Feb 20, 2025

FE4M3 x BF16 Kernel Hangs when run with beta=1

Please compile the kernel cutlass3x_sm90_tensorop_s64x128x16gemm_e4m3_bf16_f32_bf16_bf16_cvt_64x128x128_8x1x1_0_tnt_align16_warpspecialized_pingpong_epi_tma in the profiler and run it with beta=0 and beta=1. The problem is not limited to the profiler: we use this kernel for some of our shapes where it was the fastest candidate, and it also hangs when we run it with beta=1 outside the profiler. So this might be a bug in the kernel and not just a profiler issue. We used CUDA 12.4.99.

beta = 1 hangs

./tools/profiler/cutlass_profiler --kernels=cutlass3x_sm90_tensorop_s64x128x16gemm_e4m3_bf16_f32_bf16_bf16_cvt_64x128x128_8x1x1_0_tnt_align16_warpspecialized_pingpong_epi_tma --m=128 --n=256 --k=512 --beta=1

beta = 0 works

./tools/profiler/cutlass_profiler --kernels=cutlass3x_sm90_tensorop_s64x128x16gemm_e4m3_bf16_f32_bf16_bf16_cvt_64x128x128_8x1x1_0_tnt_align16_warpspecialized_pingpong_epi_tma --m=128 --n=256 --k=512



=============================
  Problem ID: 1

        Provider: CUTLASS
   OperationKind: gemm
       Operation: cutlass3x_sm90_tensorop_s64x128x16gemm_e4m3_bf16_f32_bf16_bf16_cvt_64x128x128_8x1x1_0_tnt_align16_warpspecialized_pingpong_epi_tma

          Status: Success
    Verification: ON
     Disposition: Not verified

reference_device: Not run
          cuBLAS: Not run
           cuDNN: Not run

            Math: 4002.22 GFLOP/s
=============================
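
For anyone reproducing this, a wrapper along the following lines may make the hang easier to detect without leaving a stuck terminal. It is only a sketch: `run_with_limit`, the `PROFILER` and `LIMIT` variables, and the 60-second limit are assumptions introduced here, not part of CUTLASS. The profiler flags are the ones from the commands above, with `--beta=0` passed explicitly for the working case. It relies on GNU `timeout`, which exits with status 124 when the time limit is reached.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of CUTLASS: run a command under a time limit
# and classify the result as OK, HANG (timed out), or FAIL(<exit code>).
run_with_limit() {
  local limit="$1"
  shift
  # -k 5: send SIGKILL 5 s after SIGTERM in case the process ignores SIGTERM.
  timeout -k 5 "$limit" "$@" >/dev/null 2>&1
  local rc=$?
  case "$rc" in
    0)       echo "OK" ;;
    124|137) echo "HANG" ;;   # 124: timed out; 137: killed with SIGKILL
    *)       echo "FAIL($rc)" ;;
  esac
}

# Assumed defaults; override through the environment if your paths differ.
PROFILER="${PROFILER:-./tools/profiler/cutlass_profiler}"
LIMIT="${LIMIT:-60}"   # seconds, chosen arbitrarily for this sketch
KERNEL="cutlass3x_sm90_tensorop_s64x128x16gemm_e4m3_bf16_f32_bf16_bf16_cvt_64x128x128_8x1x1_0_tnt_align16_warpspecialized_pingpong_epi_tma"

if [ -x "$PROFILER" ]; then
  for beta in 0 1; do
    result="$(run_with_limit "$LIMIT" "$PROFILER" --kernels="$KERNEL" \
      --m=128 --n=256 --k=512 --beta="$beta")"
    echo "beta=$beta: $result"
  done
else
  echo "profiler not found at $PROFILER; skipping" >&2
fi
```

Note that a status of 137 can also come from something other than the timeout killing the process, so a HANG result is a hint to investigate rather than proof of a hang.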
manishucsd added the ? - Needs Triage and bug labels Feb 20, 2025
Contributor

MARD1NO commented Feb 22, 2025

I am also hitting this problem; it raises Error Internal.
