Incorrect results when problem size M is not divisible by 16. #3963

Open
rdspring1 opened this issue Feb 25, 2025 · 1 comment
@rdspring1
Collaborator

rdspring1 commented Feb 25, 2025

1752 / 16 = 109.5, so M = 1752 is not a multiple of 16, and the results are incorrect. 1760 / 16 = 110 and 1744 / 16 = 109, so both are multiples of 16, and both run correctly.

This is probably an issue with the (16, 16) stmatrix store, given its multiple-of-16 requirement.
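
The divisibility arithmetic can be checked with a short sketch. It assumes only that the relevant tile height is 16, taken from the (16, 16) stmatrix shape; `TILE` and `tile_remainder` are illustrative names, not part of nvFuser.

```python
TILE = 16  # assumed tile height, from the (16, 16) stmatrix store


def tile_remainder(m: int, tile: int = TILE) -> int:
    """Return the number of rows left over after covering m with full tiles."""
    return m % tile


# Compare the failing size with its two neighboring multiples of 16.
for m in (1744, 1752, 1760):
    full_tiles, leftover = divmod(m, TILE)
    print(f"M={m}: {full_tiles} full tiles, {leftover} leftover rows")
```

Only M = 1752 leaves a partial tile (8 rows), which is consistent with the store being the suspect.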

To Reproduce:
NVFUSER_ENABLE=fuse_matmul NVFUSER_DISABLE=matmul_expr_eval python profile_matmul.py 1752 4720 584 NN --verbose --validate

Error Message:

===== Matmul Parameters ========

MMA macro: Hopper_64_256_16
CircularBufferOptions:
  circular_buffer_smem_write: true
  circular_buffer_smem_read: false
  smem_circular_buffer_stage: 4
  smem_circular_buffer_prefetch_gap: 1
SupportedVectorization:
  a: 8
  b: 8
  epilogue: 8
MatMulTileOptions: warp tile [64, 256, 64], CTA tile [128, 256, 64]
Async global mem load: true
Indexing mode: int32_t
Tile rasterization order: column-major
Grid swizzle factor: 1
Tiling strategy: OneTilePerCTA
Buffering loop level: CTATiles
Circular buffering strategy: WarpSpecialized
__cluster_dims__(1, 1, 1)
Use shared memory epilogue: 1
Promote re-use of prologue shared memory: 1
Split-K factor: 1
====================================

Traceback (most recent call last):
  File "/opt/pytorch/nvfuser/doc/dev/python_scheduling/profile_matmul.py", line 209, in <module>
    main()
  File "/opt/pytorch/nvfuser/doc/dev/python_scheduling/profile_matmul.py", line 198, in main
    baseline_result, nvf_result = test_matmul_nvf(
                                  ^^^^^^^^^^^^^^^^
  File "/opt/pytorch/nvfuser/doc/dev/python_scheduling/profile_matmul.py", line 135, in test_matmul_nvf
    assert torch.allclose(
           ^^^^^^^^^^^^^^^
AssertionError

@rdspring1 self-assigned this Feb 25, 2025
@rdspring1 changed the title from "Incorrect results when problem size M is not divisible by 32." to "Incorrect results when problem size M is not divisible by 16." Feb 25, 2025

@jacobhinkle
Collaborator

For the time being should we just disable smem epilogue when N is not divisible by 16?
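
A minimal sketch of what such a temporary guard might look like, written as standalone Python rather than actual nvFuser scheduler code. The helper name `should_use_smem_epilogue` and its parameters are hypothetical; which dimension to pass in (M per the issue title, N per this comment) is left to the caller.

```python
def should_use_smem_epilogue(dim: int, requested: bool = True, tile: int = 16) -> bool:
    """Hypothetical guard: keep the smem epilogue only for tile-aligned sizes.

    dim       -- the problem dimension that must be a multiple of `tile`
    requested -- whether the heuristic asked for the smem epilogue at all
    tile      -- assumed alignment requirement of the (16, 16) stmatrix store
    """
    return requested and dim % tile == 0


# The failing size from the report falls back; its aligned neighbors do not.
for dim in (1744, 1752, 1760):
    print(dim, should_use_smem_epilogue(dim))
```

This trades the epilogue optimization for correctness on unaligned sizes; it does not address the underlying store issue.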
