Please compile the kernel cutlass3x_sm90_tensorop_s64x128x16gemm_e4m3_bf16_f32_bf16_bf16_cvt_64x128x128_8x1x1_0_tnt_align16_warpspecialized_pingpong_epi_tma in the profiler and run it with beta=0 and beta=1. The problem is not limited to the profiler: we use this kernel for some of our shapes, where it came out as the best-performing kernel, and it hangs when we run it with beta = 1 outside the profiler. So this might be a bug in the kernel itself and not just a profiler issue. We used CUDA 12.4.99.
./tools/profiler/cutlass_profiler --kernels=cutlass3x_sm90_tensorop_s64x128x16gemm_e4m3_bf16_f32_bf16_bf16_cvt_64x128x128_8x1x1_0_tnt_align16_warpspecialized_pingpong_epi_tma --m=128 --n=256 --k=512
=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass3x_sm90_tensorop_s64x128x16gemm_e4m3_bf16_f32_bf16_bf16_cvt_64x128x128_8x1x1_0_tnt_align16_warpspecialized_pingpong_epi_tma
Status: Success
Verification: ON
Disposition: Not verified
reference_device: Not run
cuBLAS: Not run
cuDNN: Not run
Math: 4002.22 GFLOP/s
=============================
FE4M3 x BF16 Kernel Hangs when run with beta=1
beta = 1 hangs
beta = 0 works
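For context, here is a minimal Python sketch of what beta controls in a GEMM epilogue of the form D = alpha * (A x B) + beta * C. It is a plain reference on nested lists, not CUTLASS code, and `reference_gemm` is a hypothetical helper name. It only illustrates that the source matrix C has to be read when beta is nonzero; it does not reproduce the hang, and whether the C path is actually involved is an assumption, not something confirmed here.

```python
def reference_gemm(a, b, c, alpha=1.0, beta=0.0):
    """Compute D = alpha * (A @ B) + beta * C on plain nested lists.

    Hypothetical reference helper for illustration only.
    """
    m, k, n = len(a), len(b), len(b[0])
    d = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # Accumulate the dot product of row i of A and column j of B.
            acc = sum(a[i][p] * b[p][j] for p in range(k))
            # C is only read when beta is nonzero; with beta == 0 it is skipped.
            source = beta * c[i][j] if beta != 0 else 0.0
            d[i][j] = alpha * acc + source
    return d


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]

d_beta0 = reference_gemm(a, b, c, beta=0.0)  # C does not contribute
d_beta1 = reference_gemm(a, b, c, beta=1.0)  # C is added to the product
```

With beta = 1 every element of the result differs from the beta = 0 result by the corresponding element of C, so the two runs exercise different epilogue behavior even for the same problem size.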