KAFKA-18757: Create full-function SimpleAssignor to match KIP-932 description #18864

Open

adixitconfluent wants to merge 20 commits into trunk

Conversation

adixitconfluent (Contributor) commented Feb 11, 2025

About

The current SimpleAssignor in AK assigns all subscribed topic partitions to all share group members, which does not match the description given in KIP-932. Here are the rules from the KIP by which the assignment should happen; we have changed the step 3 implementation for the reasons described below (a sketch of steps 1 and 2 follows the list) -

  1. The assignor hashes the member IDs and maps partitions to members based on the hash. This gives an approximately even balance.
  2. If any partitions were not assigned any members by (1) and do not have members already assigned in the current assignment, members are assigned round-robin until each partition has at least one member assigned to it.
  3. We combine the current and new assignment. (Original rule - If any partitions were assigned members by (1) and also have members in the current assignment assigned by (2), the members assigned by (2) are removed.)
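
For illustration only, here is a minimal sketch of steps 1 and 2. It is not the SimpleAssignor code from this PR: member IDs and partitions are plain strings, the hash is simply Java's hashCode, and non-empty inputs are assumed.

    import java.util.*;

    // Hypothetical sketch of steps 1 and 2; not the PR's SimpleAssignor implementation.
    public class SimpleAssignorSketch {
        static Map<String, Set<String>> assign(List<String> memberIds, List<String> partitions) {
            Map<String, Set<String>> assignment = new HashMap<>();
            memberIds.forEach(m -> assignment.put(m, new HashSet<>()));

            // Step 1: hash each member ID onto a partition, which spreads members roughly evenly.
            Set<String> covered = new HashSet<>();
            for (String member : memberIds) {
                String p = partitions.get(Math.floorMod(member.hashCode(), partitions.size()));
                assignment.get(member).add(p);
                covered.add(p);
            }

            // Step 2: round-robin the partitions that step 1 left without a member,
            // so every partition ends up with at least one member.
            int next = 0;
            for (String p : partitions) {
                if (!covered.contains(p)) {
                    assignment.get(memberIds.get(next++ % memberIds.size())).add(p);
                }
            }
            return assignment;
        }
    }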

Tests

The added code has been verified with unit tests and the existing integration tests.

github-actions bot added the triage PRs from the community label Feb 11, 2025
adixitconfluent marked this pull request as ready for review February 11, 2025 18:49
AndrewJSchofield added the KIP-932 Queues for Kafka and ci-approved labels and removed the triage PRs from the community label Feb 11, 2025
github-actions bot added the core Kafka Broker label Feb 12, 2025
adixitconfluent marked this pull request as draft February 12, 2025 15:41
…ulating hash for current assignment + unit test"

This reverts commit 86a4c6f.
adixitconfluent marked this pull request as ready for review February 12, 2025 18:15
adixitconfluent (Contributor, Author)

Hi @AndrewJSchofield @apoorvmittal10, step 3 described above is a little tricky to implement, since we can only see the current assignment, not whether it was calculated by step 1 or step 2. I have implemented a way to filter the current assignment as step 3 requires in the function filterCurrentAssignment, but it is incorrect in a few cases. Maybe step 3 needs more consideration (or a future PR); in the meantime, could you please review the PR in its current state? I could also drop step 3 for now and implement it in a new PR. Let me know your thoughts.

Copilot AI left a comment

Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.

Files not reviewed (1)
  • core/src/test/scala/unit/kafka/server/ShareGroupHeartbeatRequestTest.scala: Language not supported
Comments suppressed due to low confidence (1)

group-coordinator/src/main/java/org/apache/kafka/coordinator/group/assignor/SimpleAssignor.java:304

  • The 'partition' field should be declared as 'final' to make the 'TargetPartition' class immutable.
int partition;

Copilot reviewed 2 out of 3 changed files in this pull request and generated no comments.

Files not reviewed (1)
  • core/src/test/scala/unit/kafka/server/ShareGroupHeartbeatRequestTest.scala: Language not supported
adixitconfluent (Contributor, Author) commented Feb 14, 2025

I have amended the implementation of step 3 of the assignment so that we combine the new and current assignments without revoking the partitions that were assigned by step 1 in the new assignment and have members in the current assignment from step 2. This avoids complexity in both the implementation and the runtime, because at the moment we can only see the current assignment while calculating the new one; we have no way to know which step produced a particular assignment in the current assignment. I do have a way to recreate the step-wise assignment from the current assignment, but it involves sorting and unnecessary computation, so I am deferring that approach.
IMO, step 3 helps reduce the burden on certain members of the share group. The same effect can be achieved by limiting the maximum number of partitions assigned to each member (KAFKA-18788), so the potential problem of overburdening share consumers will be addressed in a future PR.
PS - We shouldn't have any problem merging this PR to trunk with the amendment I suggested, since right now we assign all topic partitions to all share group members anyway, which already burdens the share consumers.
cc- @AndrewJSchofield @apoorvmittal10
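
For reference, here is a rough, hypothetical sketch of the amended step 3 described in the comment above: a plain union of the new and current assignments, keyed by partition with member IDs as values. It is not the PR code; in particular it omits the filtering of departed members and unsubscribed topics that the real assignor has to do.

    import java.util.*;

    // Hypothetical sketch of the amended step 3: union the new and current assignments
    // without revoking anything that is currently assigned.
    public class CombineSketch {
        static Map<String, Set<String>> combine(Map<String, Set<String>> newAssignment,
                                                Map<String, Set<String>> currentAssignment) {
            Map<String, Set<String>> finalAssignment = new HashMap<>();
            newAssignment.forEach((partition, members) ->
                    finalAssignment.computeIfAbsent(partition, k -> new HashSet<>()).addAll(members));
            currentAssignment.forEach((partition, members) ->
                    finalAssignment.computeIfAbsent(partition, k -> new HashSet<>()).addAll(members));
            return finalAssignment;
        }
    }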

AndrewJSchofield (Member)

Marked failed test as flaky in #18925.

AndrewJSchofield (Member) left a comment

Thanks for the PR. Only a partial review so far, but I've left some initial comments.

apoorvmittal10 (Collaborator) left a comment

Thanks for the PR, took an initial look. Some comments.

apoorvmittal10 (Collaborator) left a comment

Some comments, though the PR seems like a good starting point and we might improve partition stickiness while revoking and assigning partitions.

apoorvmittal10 (Collaborator) left a comment

Mostly looks good to me. Can you please share the numbers with 16 partitions and 25 share consumers, with and without the PR?

adixitconfluent (Contributor, Author)

@apoorvmittal10, here are the numbers for 16 partitions and 25 share consumers -

With PR -
1 million records of size 1024 each - 17.5 seconds
5 million records of size 1024 each - 91 seconds

Without PR -
1 million records of size 1024 each - 14.1 seconds
5 million records of size 1024 each - 72.2 seconds

As mentioned above, this PR reduces the sharing of topic partitions by the assignor, so the decline in performance is expected. With future PRs, the performance should reach an optimal level.

apoorvmittal10 (Collaborator)

Just to clarify, how was the partition allocation with the current PR code? Also, if members are removed and added, there would be more sharing of partitions as per the combine logic in the PR, correct? Will it affect the performance?

adixitconfluent (Contributor, Author)

Right now, most of the members had 1-2 topic partitions allocated to them, except for 1-2 members which had a good 12-14 partitions assigned to them.
Yes, if members are removed and added, there would be more sharing of partitions as per the combine logic in the PR. Given the small size of 1-5 million records, it should improve the performance.

apoorvmittal10 (Collaborator) left a comment

LGTM, given that the code of the simple assignor will change in future PRs. One comment to address.

TaiJuWu (Contributor) left a comment

LGTM. Just two nit questions, but they are not very important.

// the burden of certain members of the share groups. This can be achieved with the help of limiting the max
// no. of partitions assignment for every member(KAFKA-18788). Hence, the potential problem of burdening
// the share consumers will be addressed in a future PR.

A Member left a comment

Doesn't the following do the job a bit better?

        newAssignment.forEach((targetPartition, members) -> members.forEach(member ->
                finalAssignment.computeIfAbsent(member, k -> new HashSet<>()).add(targetPartition)));
        currentAssignment.forEach((targetPartition, members) -> {
            if (subscribedTopicIds.contains(targetPartition.topicId())) {
                members.forEach(member -> {
                    if (groupSpec.memberIds().contains(member) && !newAssignment.containsKey(targetPartition))
                        finalAssignment.computeIfAbsent(member, k -> new HashSet<>()).add(targetPartition);
                });
            }
        });

The problem with the code as it currently exists is that it assigns all partitions to the first member, and then as other members join, it leaves all partitions with the first member in spite of assigning the partitions to the other members.

What the snippet above does is essentially give precedence to the new assignment, and only copies over information from the current assignment which augments the new assignment. It's still not perfect because the round-robin nature of the reassignment is not sophisticated enough, but I think it's probably better.

adixitconfluent (Contributor, Author)

Makes sense. This will help reduce the burden on members, though it affects the stickiness of assignments now, since we are revoking assignments from the current assignment. We'll need to think of a way to achieve optimum sharing in future PRs. I have made this change.

AndrewJSchofield (Member) left a comment

lgtm. Needs a bit more refinement, but this is a good start.

Labels: ci-approved, core Kafka Broker, KIP-932 Queues for Kafka