Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crypto/sha256: Sum256 performance regression in Go 1.24 #71943

Closed
danillouz opened this issue Feb 25, 2025 · 8 comments
Closed

crypto/sha256: Sum256 performance regression in Go 1.24 #71943

danillouz opened this issue Feb 25, 2025 · 8 comments
Assignees
Labels
BugReport Issues describing a possible bug in the Go implementation. NeedsFix The path to resolution is known, but the work has not been done. Performance

Comments

@danillouz
Copy link

After upgrading from Go 1.23.4 to 1.24 our regression test started to fail for a function where we compute the SHA256 checksum for a token + salt.

Reproduction

I can reproduce this with:

package pow

import (
	"crypto/sha256"
)

func Verify(token, salt string) {
	sha256.Sum256([]byte(token + salt))
}
package pow_test

import (
	"testing"

	pow "github.com/framer/shasum/go124"
)

func BenchmarkVerify(b *testing.B) {
	b.Run("valid", func(b *testing.B) {
		benchmarkVerify(b, "SwqnDBGIiJUHhVAG", "framer")
	})

	b.Run("invalid", func(b *testing.B) {
		benchmarkVerify(b, "SwqnDBGIiJUHhVAG", "webflow")
	})
}

func benchmarkVerify(b *testing.B, token, salt string) {
	for range b.N {
		pow.Verify(token, salt)
	}
}

Benchmark results

Go 1.23.4

~/Framer/shasum/go123 
> go1.23.4 test ./... -run ^$ -benchmem -bench BenchmarkVerify

goos: darwin
goarch: arm64
pkg: github.com/framer/shasum/go123
cpu: Apple M3 Max
BenchmarkVerify/valid-14         	26785438	        44.72 ns/op	       0 B/op	       0 allocs/op
BenchmarkVerify/invalid-14       	26374399	        44.52 ns/op	       0 B/op	       0 allocs/op
PASS
ok  	github.com/framer/shasum/go123	3.484s

~/Framer/shasum/go123 took 26s 
> go1.23.4 version

go version go1.23.4 darwin/arm64

Go 1.24

~/Framer/shasum/go124 
> go test ./... -run ^$ -benchmem -bench BenchmarkVerify

goos: darwin
goarch: arm64
pkg: github.com/framer/shasum/go124
cpu: Apple M3 Max
BenchmarkVerify/valid-14         	22192354	        53.80 ns/op	      24 B/op	       1 allocs/op
BenchmarkVerify/invalid-14       	21927637	        53.91 ns/op	      24 B/op	       1 allocs/op
PASS
ok  	github.com/framer/shasum/go124	3.537s

~/Framer/shasum/go124 took 26s 
> go version

go version go1.24.0 darwin/arm64

Potential cause

I see the implementation changed between versions:

But is the performance regression expected?

The release notes only mention the new interface implementation for crypto/sha256.
And from what I understand, the fips140/sha256 is disabled by default?

@gabyhelp gabyhelp added the BugReport Issues describing a possible bug in the Go implementation. label Feb 25, 2025
@mauri870
Copy link
Member

cc @FiloSottile

@FiloSottile
Copy link
Contributor

That's surprising, we have a test to make sure there is no allocation in that path. Maybe it's broken because it doesn't use the output? Will try to reproduce.

https://cs.opensource.google/go/go/+/refs/tags/go1.24.0:src/crypto/sha256/sha256_test.go;l=298-322

@mateusz834
Copy link
Member

mateusz834 commented Feb 25, 2025

Also note that it does string concatenation []byte(token + salt), maybe now the compiler for some reason does not use the stack 32byte optimization. i guess this is where the 24B comes from.

@mateusz834
Copy link
Member

mateusz834 commented Feb 25, 2025

On gotip:

func BenchmarkVerify(b *testing.B) {
	for b.Loop() {
		f := make([]byte, 32)
		rand.Read(f)
		runtime.KeepAlive(Sum256(f))
	}
}
func BenchmarkVerify(b *testing.B) {
	for b.Loop() {
		runtime.KeepAlive(Sum256([]byte("teststring" + "test")))
	}
}

both of these (above) do not allocate, but following (below) cases allocate:

func BenchmarkVerify(b *testing.B) {
	for b.Loop() {
		runtime.KeepAlive(Verify("teststring", "test"))
	}
}

func Verify(token, salt string) [32]byte {
	return Sum256([]byte(token + salt))
}
func BenchmarkVerify(b *testing.B) {
	for b.Loop() {
		str := "teststring"
		runtime.KeepAlive(Sum256([]byte(str + "test")))
	}
}

In these cases the 32 Byte optimization is unused.

CC @golang/compiler

@mateusz834
Copy link
Member

Probably caused by CL 527935, before these changes str1+str2 would have used the 32B stack buffer, and furthermore it would take use of no-copy string -> []byte optimization. Now the temporary buf is not created for such []byte(str1 + str2) conversions.

switch {
default:
base.FatalfAt(n.Pos(), "unexpected type: %v", typ)
case typ.IsString():
buf := typecheck.NodNil()
if n.Esc() == ir.EscNone {
sz := int64(0)
for _, n1 := range n.List {
if n1.Op() == ir.OLITERAL {
sz += int64(len(ir.StringVal(n1)))
}
}
// Don't allocate the buffer if the result won't fit.
if sz < tmpstringbufsize {
// Create temporary buffer for result string on stack.
buf = stackBufAddr(tmpstringbufsize, types.Types[types.TUINT8])
}
}
args = []ir.Node{buf}
fnsmall, fnbig = "concatstring%d", "concatstrings"
case typ.IsSlice() && typ.Elem().IsKind(types.TUINT8): // Optimize []byte(str1+str2+...)
fnsmall, fnbig = "concatbyte%d", "concatbytes"
}

@danillouz
Copy link
Author

Oh interesting, thank you all for looking into this.

Can indeed confirm that the following fixes the regression:

func Verify(token, salt string) {
-       sha256.Sum256([]byte(salt + token))
+	input := salt + token
+	sha256.Sum256([]byte(input))
}

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/652395 mentions this issue: cmd/compile, runtime: optimize concatbytes

@cuonglm cuonglm self-assigned this Feb 25, 2025
@cuonglm cuonglm added the NeedsFix The path to resolution is known, but the work has not been done. label Feb 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BugReport Issues describing a possible bug in the Go implementation. NeedsFix The path to resolution is known, but the work has not been done. Performance
Projects
None yet
Development

No branches or pull requests

7 participants