TRACING-4752: Add OpenTelemetry-Collector as optional sub-package #4281

Open
wants to merge 2 commits into
base: main

Contributor

copejon commented Dec 6, 2024

Which issue(s) this PR addresses:

Closes #

openshift-ci bot added the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress.) Dec 6, 2024
Contributor

openshift-ci bot commented Dec 6, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Contributor

openshift-ci bot commented Dec 6, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: copejon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) Dec 6, 2024
@@ -0,0 +1,27 @@
[Unit]
Description=MicroShift Observability
BindsTo=microshift.service
Contributor

Do we want to run the collector even when MicroShift fails?

Contributor Author

I'd think yes. If MicroShift fails to start, the metrics and log data should still be collectable by the metrics/logging backend remotely.
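
For reference, the practical difference between the two dependency styles can be sketched as follows. This is a minimal illustration of systemd semantics, not the unit shipped by this PR; the use of `Wants=` is an assumption added for the example.

```ini
[Unit]
Description=MicroShift Observability

# Option A (as in the diff above): BindsTo= ties this unit's lifetime to
# microshift.service. If MicroShift stops or fails, systemd stops this
# unit as well.
# BindsTo=microshift.service

# Option B (assumption, for illustration): a weak dependency plus ordering.
# Wants= asks systemd to start microshift.service alongside this unit,
# but this unit keeps running if MicroShift fails or stops.
# After= only orders startup; it does not create a dependency by itself.
Wants=microshift.service
After=microshift.service
```

With option B, the collector can keep shipping metrics and log data to the remote backend while MicroShift is down, which matches the intent stated above.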

Contributor

ggiguash commented Dec 9, 2024

/retitle NO-ISSUE: OpenTelemetry certificates and service for MicroShift

openshift-ci bot changed the title from "No issue generate otel cert" to "NO-ISSUE: OpenTelemetry certificates and service for MicroShift" Dec 9, 2024
openshift-ci-robot added the jira/valid-reference label (Indicates that this PR references a valid Jira ticket of any type.) Dec 9, 2024
@openshift-ci-robot

@copejon: This pull request explicitly references no jira issue.

In response to this:

Which issue(s) this PR addresses:

Closes #

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

copejon changed the title from "NO-ISSUE: OpenTelemetry certificates and service for MicroShift" to "TRACING-4752: Add OpenTelemetry-Collector as optional sub-package" Dec 12, 2024

openshift-ci-robot commented Dec 12, 2024

@copejon: This pull request references TRACING-4752 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

In response to this:

Which issue(s) this PR addresses:

Closes #

Contributor Author

copejon commented Dec 12, 2024

/jira refresh


openshift-ci-robot commented Dec 12, 2024

@copejon: This pull request references TRACING-4752 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

In response to this:

/jira refresh

copejon force-pushed the no-issue-generate-otel-cert branch from fa4f579 to fede276, December 12, 2024 21:05
copejon marked this pull request as ready for review January 21, 2025 16:45
openshift-ci bot removed the do-not-merge/work-in-progress label Jan 21, 2025
openshift-ci bot requested review from agullon and eslutsky January 21, 2025 16:46
copejon force-pushed the no-issue-generate-otel-cert branch from 2042714 to ccfea22, January 22, 2025 23:20
Requires: opentelemetry-collector

%description observability
Deploys the Red Hat build of Opentelemetry-collector as a systemd service on host. MicroShift provides client
Contributor

We need to be consistent in the naming case. Either fix this, or the Summary section, please.

Suggested change
- Deploys the Red Hat build of Opentelemetry-collector as a systemd service on host. MicroShift provides client
+ Deploys the Red Hat build of OpenTelemetry-Collector as a systemd service on host. MicroShift provides client

Comment on lines 232 to 234
certificates to permit access to the kube-apiserver metrics endpoints. If a user defined opentelemetry-collector exists
at /etc/microshift/opentelemetry-collector.yaml, this config is used. Otherwise, a default config is provided. Note that
the default configuration requires the backend endpoint be set by the user.
Contributor

Suggested change
- certificates to permit access to the kube-apiserver metrics endpoints. If a user defined opentelemetry-collector exists
- at /etc/microshift/opentelemetry-collector.yaml, this config is used. Otherwise, a default config is provided. Note that
- the default configuration requires the backend endpoint be set by the user.
+ certificates to permit access to the kube-apiserver metrics endpoints. If a user-defined configuration file exists
+ at /etc/microshift/opentelemetry-collector.yaml, this configuration is used. Otherwise, a default configuration is provided.
+ Note that the default configuration requires the backend endpoint be set by the user.

Contributor

For the backend endpoint, should we be specific about what we expect users to set?
That is, should we say that the exporters.otlp section must be edited by users?

Contributor Author

Added more specific instructions

# EXAMPLE OTLP (Prometheus) ENDPOINT CONFIG
# The otlp exporter requires an endpoint listening for OTLP connections. To prevent spamming the log with Go
# stack traces, the exporter is disabled. The endpoint is not known at installation, thus a tire-kicking of the
# microshift-observability package would result in stack traces spam in logs.
Contributor

Let's think about what we can do so that the logs are not "spammed" when the default configuration is used. It sounds as if we should install this file with an .example suffix, so that users have to explicitly rename it when they enable the collector service.

Contributor

ggiguash Jan 23, 2025

In any case, the "style" of this comment should be reworded.

Contributor Author

I've tweaked the comment and made it a little more informative.

@@ -0,0 +1,20 @@
[Unit]
Description=MicroShift Observability
After=microshift.service
Contributor

Should we use ConditionPathExists here for all the files the service expects to have before it starts?

Contributor Author

The opentelemetry-collector performs that check for us each time it starts.

Contributor

Right, but the point of the condition in systemd is to not even attempt starting the service if the path does not exist.

Contributor

Could this help to avoid unnecessary restarts?
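
A minimal sketch of the condition being suggested, assuming the configuration path from the package description is the file to check. Which certificate files to list is left open here because their final location is still under review.

```ini
[Unit]
Description=MicroShift Observability
After=microshift.service

# If the file is missing when the start job runs, systemd skips the unit
# instead of launching the collector. A skipped unit is not marked as
# failed, so no restart attempts are triggered.
ConditionPathExists=/etc/microshift/opentelemetry-collector.yaml

# Assumption: one more ConditionPathExists= line per required file
# (for example the client certificate and key) would follow here.
```

Because conditions are evaluated before the service process is launched, the collector's own startup check never runs for a missing file, and the unit is simply left inactive.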


# It takes a bit for the certs to be created. This service will reach its burst limit almost immediately, pretty much
# guaranteeing that it will reach the restart limit before it can possibly succeed.
RestartSec=200ms
Contributor

Do we really need this? We've configured the service to start After microshift, so microshift must report readiness to systemd before the current service startup is attempted. MicroShift only reports readiness after creating all certificates.
What am I missing?

Contributor Author

In earlier tests this was necessary to keep the service from crash looping, but that doesn't seem to be an issue in the latest opentelemetry-collector. Will remove

auth_type: tls
ca_file: /etc/pki/microshift-opentelemetry-collector-client/client-ca.crt
key_file: /etc/pki/microshift-opentelemetry-collector-client/client.key
cert_file: /etc/pki/microshift-opentelemetry-collector-client/client.crt
Contributor

These paths need to be updated too -> /var/lib/microshift/..../

certificates to permit access to the kube-apiserver metrics endpoints. If a user defined Opentelemetry-Collector exists
at /etc/microshift/opentelemetry-collector.yaml, this config is used. Otherwise, a default config is provided. Note that
the default configuration requires the backend endpoint be set by the user. The otlp export must also be specified as
.service.pipelines.$RECEIVER.exporter: "otlp". The specification for the otlp config is:
Contributor

Let's not use shortened words because it's a user-facing RPM description.

copejon force-pushed the no-issue-generate-otel-cert branch from aee833a to ad4892d, January 30, 2025 13:13
Requires: opentelemetry-collector

%description observability
Deploys the Red Hat build of Opentelemetry-Collector as a systemd service on host. MicroShift provides client
Contributor

Please fix the case of Opentelemetry -> OpenTelemetry to make it consistent with the summary text.

copejon force-pushed the no-issue-generate-otel-cert branch from 33de178 to 2996e90, February 11, 2025 19:58
openshift-merge-robot added the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) Feb 13, 2025
copejon force-pushed the no-issue-generate-otel-cert branch from 6c3feda to 024edb1, February 13, 2025 22:21
openshift-merge-robot removed the needs-rebase label Feb 13, 2025
copejon force-pushed the no-issue-generate-otel-cert branch 3 times, most recently from 6562511 to 0241f48, February 17, 2025 19:40
…elemetry-collector, preconfigured for microshift
copejon force-pushed the no-issue-generate-otel-cert branch from cc06c31 to e7136a4, February 18, 2025 22:07
Contributor Author

copejon commented Feb 18, 2025

I've gotten the test suite into a functioning state locally. Up next: t-shirt-sized configs to enable different levels of data collection.

add test for error in log

handle uninstalling observability systemd units gracefully

Signed-off-by: Jon Cope <[email protected]>
Contributor

openshift-ci bot commented Feb 21, 2025

@copejon: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-tests-bootc | 5845981 | link | true | /test e2e-aws-tests-bootc |
| ci/prow/e2e-aws-tests | 5845981 | link | true | /test e2e-aws-tests |
| ci/prow/e2e-aws-tests-bootc-arm | 5845981 | link | true | /test e2e-aws-tests-bootc-arm |
| ci/prow/verify | 5845981 | link | true | /test verify |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

5 participants