[AMDGPU] Skip handling of non-byte types in promote alloca. #128769

Open
wants to merge 1 commit into
base: main

sgundapa
Contributor

Non-byte-sized types such as i1 could be packed and supported. For the time being, allocas with such element types are not promoted.

Issue found by fuzzer.

@llvmbot
Member

llvmbot commented Feb 25, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Sumanth Gundapaneni (sgundapa)

Changes

Non-byte-sized types such as i1 could be packed and supported. For the time being, allocas with such element types are not promoted.

Issue found by fuzzer.


Full diff: https://github.com/llvm/llvm-project/pull/128769.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp (+9-2)
  • (added) llvm/test/CodeGen/AMDGPU/promote-alloca-skip-non-byte-type.ll (+21)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
index 28016b5936ccf..007f930cea4f3 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
@@ -759,6 +759,14 @@ bool AMDGPUPromoteAllocaImpl::tryPromoteAllocaToVector(AllocaInst &Alloca) {
     return false;
   }
 
+  Type *VecEltTy = VectorTy->getElementType();
+  constexpr unsigned SIZE_OF_BYTE = 8;
+  unsigned ElementSizeInBits = DL->getTypeSizeInBits(VecEltTy);
+  // FIXME: The non-byte type like i1 can be packed and be supported, but
+  // currently we do not handle them.
+  if (ElementSizeInBits % SIZE_OF_BYTE != 0)
+    return false;
+
   std::map<GetElementPtrInst *, WeakTrackingVH> GEPVectorIdx;
   SmallVector<Instruction *> WorkList;
   SmallVector<Instruction *> UsersToRemove;
@@ -776,8 +784,7 @@ bool AMDGPUPromoteAllocaImpl::tryPromoteAllocaToVector(AllocaInst &Alloca) {
 
   LLVM_DEBUG(dbgs() << "  Attempting promotion to: " << *VectorTy << "\n");
 
-  Type *VecEltTy = VectorTy->getElementType();
-  unsigned ElementSize = DL->getTypeSizeInBits(VecEltTy) / 8;
+  unsigned ElementSize = ElementSizeInBits / SIZE_OF_BYTE;
   for (auto *U : Uses) {
     Instruction *Inst = cast<Instruction>(U->getUser());
 
diff --git a/llvm/test/CodeGen/AMDGPU/promote-alloca-skip-non-byte-type.ll b/llvm/test/CodeGen/AMDGPU/promote-alloca-skip-non-byte-type.ll
new file mode 100644
index 0000000000000..3d2234f0a7ac3
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/promote-alloca-skip-non-byte-type.ll
@@ -0,0 +1,21 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -mtriple=amdgcn-unknown-amdhsa -passes=amdgpu-promote-alloca < %s | FileCheck %s
+
+; Verify that we do not crash and not promote non-byte alloca types.
+define <8 x i1> @non_byte_alloca_type() {
+; CHECK-LABEL: define <8 x i1> @non_byte_alloca_type() {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[C:%.*]] = icmp ugt <16 x i1> zeroinitializer, zeroinitializer
+; CHECK-NEXT:    [[RP:%.*]] = alloca <8 x i1>, align 1
+; CHECK-NEXT:    [[TMP0:%.*]] = load <8 x i1>, ptr [[RP]], align 1
+; CHECK-NEXT:    store <16 x i1> [[C]], ptr [[RP]], align 2
+; CHECK-NEXT:    ret <8 x i1> [[TMP0]]
+;
+entry:
+  %C = icmp ugt <16 x i1> zeroinitializer, zeroinitializer
+  %RP = alloca <8 x i1>, align 1
+  %0 = load <8 x i1>, ptr %RP, align 1
+  store <16 x i1> %C, ptr %RP, align 2
+  ret <8 x i1> %0
+}
+
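
For readers following the diff, the guard it adds can be illustrated outside LLVM. The sketch below is a standalone model, not code from the patch: `kBitsPerByte`, `isByteSizedElement`, and `elementSizeInBytes` are hypothetical names, and bit widths are passed in directly instead of being queried from the data layout. It shows that the pre-existing integer division by 8 truncates to 0 for i1, which is the case the new early return filters out. Whether that zero is what triggers the failure the fuzzer hit is not stated in this thread.

```cpp
// Standalone model of the element-size arithmetic in tryPromoteAllocaToVector.
// All names here are hypothetical; nothing below is LLVM API.

// Assumed byte width, matching the hardcoded SIZE_OF_BYTE in the patch.
constexpr unsigned kBitsPerByte = 8;

// Mirrors the new guard: promotion is only attempted when the element's
// bit width is a whole number of bytes.
constexpr bool isByteSizedElement(unsigned elementSizeInBits) {
  return elementSizeInBits % kBitsPerByte == 0;
}

// Mirrors the existing ElementSize computation. Integer division truncates,
// so any width below 8 bits (for example i1) produces 0 bytes.
constexpr unsigned elementSizeInBytes(unsigned elementSizeInBits) {
  return elementSizeInBits / kBitsPerByte;
}
```

Under this model, i1 and i12 are rejected by the guard, while i8, i16, and i32 pass and get element sizes of 1, 2, and 4 bytes.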

@sgundapa sgundapa changed the title [AMDGPU] Skip handling non-byte types in promote alloca. [AMDGPU] Skip handling of non-byte types in promote alloca. Feb 25, 2025
@@ -776,8 +784,7 @@ bool AMDGPUPromoteAllocaImpl::tryPromoteAllocaToVector(AllocaInst &Alloca) {

   LLVM_DEBUG(dbgs() << "  Attempting promotion to: " << *VectorTy << "\n");
 
-  Type *VecEltTy = VectorTy->getElementType();
-  unsigned ElementSize = DL->getTypeSizeInBits(VecEltTy) / 8;
+  unsigned ElementSize = ElementSizeInBits / SIZE_OF_BYTE;
Contributor

IIUC, SIZE_OF_BYTE is defined by whatever compiler builds LLVM, not by the AMDGPU target.

Contributor Author

You mean using something like this to derive the value from the data layout: "DL.getTypeSizeInBits(Type::getInt8Ty(M->getContext()))"?

Contributor Author

I have defined it as "constexpr unsigned SIZE_OF_BYTE = 8" on line 763. Should I pick a different name?

Contributor

@shiltian shiltian Feb 25, 2025

Oh, I missed that part. Hardcoding 8 is probably fine for now and for the near future, but the proper approach is definitely to query DL.

Contributor

@arsenm arsenm left a comment

Looking at the actual code, I don't see why this doesn't just work for this case. Is the assert wrong?

; CHECK-NEXT: ret <8 x i1> [[TMP0]]
;
entry:
%C = icmp ugt <16 x i1> zeroinitializer, zeroinitializer
Contributor

Use something that can't fold away

;
entry:
%C = icmp ugt <16 x i1> zeroinitializer, zeroinitializer
%RP = alloca <8 x i1>, align 1
Contributor

Use the correct alloca address space. Also, this issue isn't about the UB under-alignment, so correct that as well.

unsigned ElementSizeInBits = DL->getTypeSizeInBits(VecEltTy);
// FIXME: The non-byte type like i1 can be packed and be supported, but
// currently we do not handle them.
if (ElementSizeInBits % SIZE_OF_BYTE != 0)
Contributor

Best to replicate typeSizeEqualsStoreSize
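
To make the comparison with `typeSizeEqualsStoreSize` concrete, here is a minimal standalone sketch of the arithmetic that check is understood to perform: a type's bit size is compared against its store size, where the store size is the bit size rounded up to whole bytes. The helper names `storeSizeInBytes` and `sizeEqualsStoreSize` are hypothetical and take raw bit widths, so this models the idea without calling into `DataLayout`. It also assumes 8-bit bytes and fixed-size types.

```cpp
#include <cstdint>

// Hypothetical model: store size in bytes is the bit width rounded up to the
// next whole byte (assumes 8-bit bytes and a fixed-size type).
constexpr std::uint64_t storeSizeInBytes(std::uint64_t sizeInBits) {
  return (sizeInBits + 7) / 8;
}

// Hypothetical model of the "type size equals store size" test: true only
// when no padding bits are needed to store the type.
constexpr bool sizeEqualsStoreSize(std::uint64_t sizeInBits) {
  return sizeInBits == storeSizeInBytes(sizeInBits) * 8;
}
```

For a single bit width this gives the same answer as the `% 8` test in the patch. One thing the model makes visible is that the result depends on which type is checked: the element type i1 (1 bit) fails, while a bit-packed `<8 x i1>` vector (8 bits in total) passes.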

store <16 x i1> %C, ptr %RP, align 2
ret <8 x i1> %0
}

Contributor

Can you add some tests for the scalar case? Was only the subvector extract a problem?

4 participants