CNTRLPLANE-112: Add new Azure authentication type for managed Azure HCP for cluster-image-registry #1174

Open
wants to merge 3 commits into
base: main

Conversation

bryan-cox
Member

@bryan-cox bryan-cox commented Feb 13, 2025

This PR:

  • adds a context function parameter to getCreds
  • adds the new Azure authentication for managed Azure HCP called UserAssignedIdentityCredentials
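
A minimal sketch of what threading a context through a credential lookup can look like. The `getCreds` signature, the `credential` type, and the `run` helper below are hypothetical and only illustrate the pattern; they are not the operator's actual code.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// credential is a hypothetical placeholder for whatever getCreds returns.
type credential struct {
	source string
}

// getCreds is an illustrative stand-in, not the operator's function.
// Accepting ctx lets the caller bound or cancel the lookup.
func getCreds(ctx context.Context, source string) (credential, error) {
	if err := ctx.Err(); err != nil {
		return credential{}, fmt.Errorf("getting credentials: %w", err)
	}
	return credential{source: source}, nil
}

// run calls getCreds once with a live context and once with a cancelled one.
func run() (source string, liveErr error, cancelledErr error) {
	live, cancelLive := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancelLive()
	cred, liveErr := getCreds(live, "example")

	cancelled, cancel := context.WithCancel(context.Background())
	cancel() // simulate a caller that has already given up
	_, cancelledErr = getCreds(cancelled, "example")

	return cred.source, liveErr, cancelledErr
}

func main() {
	source, liveErr, cancelledErr := run()
	fmt.Println("live context:", source, liveErr)
	fmt.Println("cancelled context:", cancelledErr)
}
```

With the context as the first parameter, a credential constructor that itself takes a context can receive the caller's deadline instead of a fresh `context.Background()`.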

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 13, 2025
@openshift-ci-robot
Contributor

openshift-ci-robot commented Feb 13, 2025

@bryan-cox: This pull request references CNTRLPLANE-112 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

In response to this:

This PR:

  • adds a context function parameter to getCreds
  • removes the filewatcher that was previously needed for Azure authentication for managed Azure HCP
  • adds the new Azure authentication for managed Azure HCP called UserAssignedIdentityCredentials

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 13, 2025
Contributor

openshift-ci bot commented Feb 13, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@bryan-cox
Member Author

This PR needs to merge before openshift/hypershift#5621 can be merged.

@bryan-cox bryan-cox changed the title from "CNTRLPLANE-112: Add new Azure authentication type for managed Azure HCP" to "CNTRLPLANE-112: Add new Azure authentication type for managed Azure HCP for cluster-image-registry" Feb 13, 2025
@flavianmissi
Member

Thanks for this, looks great!

Can you have a look at the verify job failure? It looks like an error is assigned but never checked.
Pasting the job output here for accessibility:

 GOLANGCI_LINT_CACHE=/go/src/github.com/openshift/cluster-image-registry-operator/_output/golangci-lint-cache _output/tools/golangci-lint run --timeout=300s ./cmd/... ./pkg/... ./test/...
pkg/storage/azure/azure.go:380:9: ineffectual assignment to err (ineffassign)
		cred, err = dataplane.NewUserAssignedIdentityCredential(context.Background(), userAssignedIdentityCredentialsFilePath, dataplane.WithClientOpts(clientOptions))
		      ^
pkg/storage/azure/azureclient/azureclient.go:112:10: ineffectual assignment to err (ineffassign)
		creds, err = dataplane.NewUserAssignedIdentityCredential(ctx, userAssignedIdentityCredentialsFilePath, dataplane.WithClientOpts(clientOptions))
		       ^
make: *** [Makefile:52: verify-golangci-lint] Error 1 

An assignment is ineffectual if the variable assigned is not thereafter used.

https://pkg.go.dev/github.com/gordonklaus/ineffassign#section-readme
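
For illustration, here is a small sketch of the pattern this linter reports and one way to resolve it. `newCredential`, `loadIneffectual`, and `loadChecked` are hypothetical stand-ins, not the code from `azure.go` or `azureclient.go`; the assumption is only that `err` is reassigned inside a branch and never read afterwards.

```go
package main

import (
	"errors"
	"fmt"
)

// newCredential is a hypothetical stand-in for a constructor that can fail.
func newCredential(path string) (string, error) {
	if path == "" {
		return "", errors.New("credentials file path is empty")
	}
	return "credential loaded from " + path, nil
}

// loadIneffectual compiles, but ineffassign would flag it: the second
// assignment to err is never read, so a failure there is silently dropped.
func loadIneffectual(path, fallback string) string {
	cred, err := newCredential(path)
	if err != nil {
		cred, err = newCredential(fallback) // err reassigned and never read
	}
	return cred
}

// loadChecked reads err after every assignment and returns it to the caller.
func loadChecked(path, fallback string) (string, error) {
	cred, err := newCredential(path)
	if err != nil {
		cred, err = newCredential(fallback)
		if err != nil {
			return "", fmt.Errorf("loading fallback credential: %w", err)
		}
	}
	return cred, nil
}

func main() {
	fmt.Printf("ineffectual: %q\n", loadIneffectual("", ""))
	_, err := loadChecked("", "")
	fmt.Println("checked:", err)
}
```

Checking `err` right after the assignment both satisfies the linter and surfaces the failure instead of continuing with an empty credential.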

@bryan-cox
Member Author

/test all

@openshift-ci-robot
Contributor

openshift-ci-robot commented Feb 17, 2025

@bryan-cox: This pull request references CNTRLPLANE-112 which is a valid jira issue.


This commit adds the msi-dataplane library to the go.mod for managed Azure HCP.

Signed-off-by: Bryan Cox <[email protected]>

This commit adds a context function parameter to getCreds. This will be needed for the new authentication type being used for managed Azure HCP.

Signed-off-by: Bryan Cox <[email protected]>

This commit adds a new authentication type for managed Azure HCP called UserAssignedIdentityCredentials. This new authentication type replaces the previous authentication method for managed Azure HCP.

Signed-off-by: Bryan Cox <[email protected]>

@bryan-cox
Member Author

/test all

@bryan-cox
Member Author

/test e2e-aws-ovn

@flavianmissi
Member

/retest

@bryan-cox
Member Author

/test hypershift-e2e-aks

@bryan-cox
Member Author

/test hypershift-e2e-aks

@bryan-cox
Member Author

Hey @flavianmissi - I tested this PR successfully locally yesterday in conjunction with openshift/hypershift#5655. Once this PR merges, I can merge openshift/hypershift#5655 and after it merges I will open a new PR to remove the old auth method/filewatcher. We need both auth methods at the moment to keep e2es passing.

% oc get machines.cluster.x-k8s.io -A; echo; oc get nodepools -A; echo; oc get hostedclusters -A; echo; oc get pods -n clusters-generic-hc
NAMESPACE             NAME                     CLUSTER            NODENAME                 PROVIDERID                                                                                                                                                         PHASE     AGE   VERSION
clusters-generic-hc   generic-hc-p2zw9-nbwcc   generic-hc-bphll   generic-hc-p2zw9-nbwcc   azure:///subscriptions/5f99720c-6823-4792-8a28-69efb0719eea/resourceGroups/generic-managed-rg/providers/Microsoft.Compute/virtualMachines/generic-hc-p2zw9-nbwcc   Running   89m   4.19.0-0.test-2025-02-17-141729-ci-ln-fd0wzgt-latest
clusters-generic-hc   generic-hc-p2zw9-nrhn5   generic-hc-bphll   generic-hc-p2zw9-nrhn5   azure:///subscriptions/5f99720c-6823-4792-8a28-69efb0719eea/resourceGroups/generic-managed-rg/providers/Microsoft.Compute/virtualMachines/generic-hc-p2zw9-nrhn5   Running   89m   4.19.0-0.test-2025-02-17-141729-ci-ln-fd0wzgt-latest

NAMESPACE   NAME         CLUSTER      DESIRED NODES   CURRENT NODES   AUTOSCALING   AUTOREPAIR   VERSION                                                UPDATINGVERSION   UPDATINGCONFIG   MESSAGE
clusters    generic-hc   generic-hc   2               2               False         False        4.19.0-0.test-2025-02-17-141729-ci-ln-fd0wzgt-latest   False             False

NAMESPACE   NAME         VERSION   KUBECONFIG                    PROGRESS   AVAILABLE   PROGRESSING   MESSAGE
clusters    generic-hc             generic-hc-admin-kubeconfig   Partial    True        False         The hosted control plane is available

NAME                                                  READY   STATUS    RESTARTS   AGE
azure-cloud-controller-manager-69976bffb-zrsc9        1/1     Running   0          89m
azure-disk-csi-driver-controller-7b6cdc5b98-qxlsh     11/11   Running   0          88m
azure-disk-csi-driver-operator-6b496569cb-2cx6s       1/1     Running   0          88m
azure-file-csi-driver-controller-7b6f9ffd76-wglks     11/11   Running   0          88m
azure-file-csi-driver-operator-69dbd7646f-79rxw       1/1     Running   0          88m
capi-provider-596759c649-vpfxt                        1/1     Running   0          100m
catalog-operator-5c49b58cd5-4cqd6                     2/2     Running   0          64m
certified-operators-catalog-748b6f59c9-9kjsx          1/1     Running   0          89m
cloud-network-config-controller-7d59ff54d6-gdqgh      3/3     Running   0          63m
cluster-api-7f6779c6fc-zsq6t                          1/1     Running   0          100m
cluster-image-registry-operator-84f674c97d-w8jtj      2/2     Running   0          64m
 % k logs pod/cluster-image-registry-operator-84f674c97d-w8jtj
Defaulted container "cluster-image-registry-operator" out of: cluster-image-registry-operator, client-token-minter
Waiting for client token
Waiting for client token
I0217 20:07:53.108585       1 leaderelection.go:121] The leader election gives 4 retries and allows for 30s of clock skew. The kube-apiserver downtime tolerance is 78s. Worst non-graceful lease acquisition is 2m43s. Worst graceful lease acquisition is {26s}.
I0217 20:07:53.115414       1 observer_polling.go:159] Starting file observer
...
I0217 20:08:24.420528       1 event.go:377] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"openshift-image-registry", Name:"openshift-image-registry", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'DaemonSetUpdated' Updated DaemonSet.apps/node-ca -n openshift-image-registry because it changed
I0217 20:08:24.469217       1 generator.go:63] object *v1.DaemonSet, Namespace=openshift-image-registry, Name=node-ca updated:
I0217 20:08:24.470222       1 azureclient.go:111] Using UserAssignedIdentityCredentials for Azure authentication for managed Azure HCP
...
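
As a rough sketch of how an operator might keep both authentication methods available during the transition, the selection can hinge on whether a credentials file path is configured. The `authMethod` type, the `legacy` label, the example path, and the rule that a non-empty path selects the new method are all assumptions for illustration; the PR's actual selection logic is not shown in this conversation.

```go
package main

import "fmt"

// authMethod names the credential flow to use. Hypothetical type.
type authMethod string

const (
	// Name taken from the log line above.
	authUserAssignedIdentity authMethod = "UserAssignedIdentityCredentials"
	// Placeholder label for the previous authentication method.
	authLegacy authMethod = "legacy"
)

// chooseAuthMethod assumes that a non-empty credentials file path means the
// new method is configured; otherwise it falls back to the previous one.
func chooseAuthMethod(userAssignedIdentityCredentialsFilePath string) authMethod {
	if userAssignedIdentityCredentialsFilePath != "" {
		return authUserAssignedIdentity
	}
	return authLegacy
}

func main() {
	// The path below is a made-up example value.
	for _, path := range []string{"/hypothetical/credentials.json", ""} {
		fmt.Printf("path=%q -> %s\n", path, chooseAuthMethod(path))
	}
}
```

Keeping the fallback branch lets existing e2e jobs continue to pass until the old method is removed in a follow-up change.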

@bryan-cox bryan-cox marked this pull request as ready for review February 18, 2025 11:45
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 18, 2025
@flavianmissi
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 18, 2025
Contributor

openshift-ci bot commented Feb 18, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bryan-cox, flavianmissi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 18, 2025
@bryan-cox
Member Author

@Patryk-Stefanski and/or I will look into that failing aks test. It should not be failing since he merged in the fix for the issue earlier today.

@bryan-cox
Member Author

/retest-required

@bryan-cox
Member Author

/test hypershift-e2e-aks

@flavianmissi
Member

Let's try and see if we can get e2e-azure-operator passing, even though they're not required, since the changes in this PR affect azure code. I'm pretty sure it's just flakes though 😅

/test e2e-azure-operator

@flavianmissi
Member

@bryan-cox it also looks like you're going to need QE, docs and PX approvals. Let me know if you need help getting those.

@bryan-cox
Member Author

/test e2e-azure-operator

@bryan-cox
Member Author

/test e2e-azure-operator

@bryan-cox
Member Author

/test e2e-azure-operator

Contributor

openshift-ci bot commented Feb 20, 2025

@bryan-cox: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/okd-scos-e2e-aws-ovn | 2abed34 | link | false | /test okd-scos-e2e-aws-ovn |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@flavianmissi
Member

Hmm, it's concerning that e2e-azure-operator is consistently failing. I realize it's not always the same test that fails, but if you look at the test output, the one consistent thing across the different tests is that the operator gets stuck in the Progressing and Degraded states.

Can you have a look, @bryan-cox? These suites are in the operator repository itself. They also don't gather artifacts the way the e2e tests in the openshift/release repository do, so we have to rely on whatever the test output gives us for troubleshooting... (sorry about that, it's legacy stuff). It might be easier to run them locally in your own cluster. Let me know if you need any more info.

I'll put this PR on hold for now until we can understand why these tests are flaking more than usual.
/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 20, 2025
@flavianmissi
Member

FTR I've triggered e2e-azure-operator tests on a different PR to try and understand if these tests are also flaky on the main branch or if it's something here that destabilized them.

@bryan-cox
Member Author

> @bryan-cox it also looks like you're going to need QE, docs and PX approvals. Let me know if you need help getting those.

Hey folks 👋🏻 - could I get some help with getting these labels for this PR please?

@xenolinux

/label docs-approved

@openshift-ci openshift-ci bot added the docs-approved Signifies that Docs has signed off on this PR label Feb 20, 2025
@flavianmissi
Member

After running the e2e-azure-operator tests on a different PR, we observed that they were also failing in a similar manner to the failures we're seeing here.
Given that, I think it's safe to say these tests are unstable on the main branch. I've raised the matter on slack, and for now we're going to override them.

/override ci/prow/e2e-azure-operator
/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 24, 2025
Contributor

openshift-ci bot commented Feb 24, 2025

@flavianmissi: Overrode contexts on behalf of flavianmissi: ci/prow/e2e-azure-operator


@bryan-cox
Member Author

@wewang58 could you help us with getting the QE approved label please?

@sferich888
Contributor

/label px-approved

@openshift-ci openshift-ci bot added the px-approved Signifies that Product Support has signed off on this PR label Feb 25, 2025
Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • docs-approved: Signifies that Docs has signed off on this PR
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • px-approved: Signifies that Product Support has signed off on this PR

5 participants