Instrument http add on operator #1328

elieser1101 · 2025-08-24T22:43:59Z

Added instrumentation for Prometheus-compatible endpoints and OTel.
Added tests and e2e tests.

I went this way because I was trying to get familiar with the codebase. Happy to go in any direction the reviewers advise.

Checklist

Commits are signed with Developer Certificate of Origin (DCO)
Changelog has been updated and is aligned with our changelog requirements
Any necessary documentation is added, such as:

Part of #965

Signed-off-by: Elieser Pereira <elieser.pereiraa@gmail.com>

elieser1101 · 2025-08-25T11:46:12Z

Hi folks, I'm trying to get some eyes on this but I'm not really sure who to tag. Thanks in advance for any pointers.
@JorTurFer @rickbrouwer @zroubalik tagging you three since you helped me with a previous, unrelated PR: kedacore/keda#6990

wozniakjan · 2025-08-28T08:35:52Z

operator/controllers/http/httpscaledobject_controller.go

+func (r *HTTPScaledObjectReconciler) updatePromMetricsOnDelete(ctx context.Context, scaledObject *httpv1alpha1.HTTPScaledObject, namespacedName string) {
+	logger := log.FromContext(ctx, "updatePromMetricsOnDelete", namespacedName)
+	logger.Info("updatePromMetricsOnDelete")
+	metrics.RecordDeleteHTTPScaledObjectCount(namespacedName)
+}


It looks like this is never called. Wouldn't that result in an ever-growing metric?
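A minimal sketch of one way to avoid the ever-growing value, assuming the reconciler can tell when an object is being deleted. The `hsoTracker` type and the `reconcile` function are hypothetical stand-ins, not the PR's `metrics` package or the real reconciler. The sketch keys on the namespaced name so that repeated reconciles of the same object do not inflate the count, and the delete path removes the entry again.

```go
package main

import (
	"fmt"
	"sync"
)

// hsoTracker is a hypothetical stand-in for the operator's metrics recorder.
// It remembers which HTTPScaledObjects exist, keyed by namespaced name.
type hsoTracker struct {
	mu   sync.Mutex
	seen map[string]struct{}
}

func newHSOTracker() *hsoTracker {
	return &hsoTracker{seen: make(map[string]struct{})}
}

// RecordCreate registers an object; calling it again for the same name is a no-op.
func (t *hsoTracker) RecordCreate(namespacedName string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.seen[namespacedName] = struct{}{}
}

// RecordDelete forgets an object; deleting an unknown name is a no-op.
func (t *hsoTracker) RecordDelete(namespacedName string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.seen, namespacedName)
}

// Value returns the number of objects currently tracked.
func (t *hsoTracker) Value() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	return len(t.seen)
}

// reconcile is a simplified, hypothetical reconcile step. The deleting flag
// models an object whose deletion timestamp is set.
func reconcile(t *hsoTracker, namespacedName string, deleting bool) {
	if deleting {
		t.RecordDelete(namespacedName)
		return
	}
	t.RecordCreate(namespacedName)
}

func main() {
	t := newHSOTracker()
	reconcile(t, "default/a", false)
	reconcile(t, "default/a", false) // repeated reconcile, not counted twice
	reconcile(t, "default/b", false)
	reconcile(t, "default/a", true) // deletion path decrements
	fmt.Println("tracked HTTPScaledObjects:", t.Value())
}
```

In the real operator the tracked value would feed the exported metric; how deletion is detected (finalizer, not-found on get, or similar) is left open here.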

wozniakjan · 2025-08-28T08:40:04Z

tests/checks/operator_otel_metrics/operator_otel_metrics_test.go

+	assert.True(t, ok, "operator_http_scaled_object_count_total is available")
+
+	requestCount := getMetricsValue(val)
+	assert.GreaterOrEqual(t, requestCount, float64(1))


When the whole e2e suite is executed, multiple e2e tests can run in parallel (iirc it could be around 3?). Perhaps the test could create 100 HSOs and check that the metric is equal to or over 100. Then we may want to delete those 100, wait for the metric to propagate, and check again that it's under some reasonably low number, e.g. 10.

JorTurFer · 2025-08-28T18:13:43Z

config/operator/deployment.yaml

-          containerPort: 8080
+          containerPort: 2223


Why this change? Personally, I'd prefer not to change an already existing port.

JorTurFer · 2025-08-28T18:15:27Z

docs/operate.md

+This endpoint can be enabled by setting the `OTEL_PROM_EXPORTER_ENABLED` environment variable to `true` on the operator deployment (`true` by default) and by setting `OTEL_PROM_EXPORTER_PORT` to an unused port for the endpoint to be made avaialble on (`2223` by default).
+
+### Configuring the OTEL HTTP exporter
+When configured, the ioperator can export metrics to a OTEL HTTP collector.


Suggested change

-When configured, the ioperator can export metrics to a OTEL HTTP collector.
+When configured, the operator can export metrics to a OTEL HTTP collector.

JorTurFer · 2025-08-28T18:15:55Z

docs/operate.md

+### Configuring the OTEL HTTP exporter
+When configured, the ioperator can export metrics to a OTEL HTTP collector.
+
+The OTEL exporter can be enabled by setting the `OTEL_EXPORTER_OTLP_METRICS_ENABLED` environment variable to `true` on the operator deployment (`false` by default). When enabled the `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable must also be configured so the exporter knows what collector to send the metrics to (e.g. http://opentelemetry-collector.open-telemetry-system:4318).


Suggested change

-The OTEL exporter can be enabled by setting the `OTEL_EXPORTER_OTLP_METRICS_ENABLED` environment variable to `true` on the operator deployment (`false` by default). When enabled the `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable must also be configured so the exporter knows what collector to send the metrics to (e.g. http://opentelemetry-collector.open-telemetry-system:4318).
+The OTEL exporter can be enabled by setting the `OTEL_EXPORTER_OTLP_METRICS_ENABLED` environment variable to `true` on the operator deployment (`false` by default). When enabled, the `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable must also be configured so the exporter knows what collector to send the metrics to (e.g. http://opentelemetry-collector.open-telemetry-system:4318).

JorTurFer · 2025-08-28T18:19:56Z

operator/main.go

+	if metricsCfg.OtelPrometheusExporterEnabled {
+		// start the prometheus compatible metrics server
+		// serves a prometheus compatible metrics endpoint on the configured port
+		eg.Go(func() error {
+			if err := runMetricsServer(ctx, ctrl.Log, metricsCfg); !util.IsIgnoredErr(err) {
+				setupLog.Error(err, "could not start the Prometheus metrics server")
+				return err
+			}
+
+			return nil
+		})
+	}


The operator already exposes a metrics server in Prometheus format. Let's use it instead of starting another one; you can register more metrics on that server. For example, that's what we do in KEDA: https://github.com/kedacore/keda/blob/main/pkg/metricscollector/prommetrics.go#L161

JorTurFer · 2025-08-28T18:21:48Z

operator/metrics/otelmetrics.go

+func NewOtelMetrics(options ...metric.Option) *OtelMetrics {
+	ctx := context.Background()
+
+	exporter, err := otlpmetrichttp.New(ctx)


OpenTelemetry supports both the HTTP and gRPC protocols; I'd prefer that we support both, tbh. This is how we do it in KEDA, which you can use as an example: https://github.com/kedacore/keda/blob/main/pkg/metricscollector/opentelemetry.go#L67-L86
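A rough sketch of how the protocol choice could be made, assuming it is driven by the standard `OTEL_EXPORTER_OTLP_PROTOCOL` environment variable. This is not copied from KEDA's implementation; `exporterKind` and `selectExporter` are hypothetical names, and treating an empty value as HTTP is an assumption chosen to match the PR's current HTTP-only behavior. The comments note where the real `otlpmetrichttp.New` and `otlpmetricgrpc.New` constructors would be called.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// exporterKind names which OTLP metric exporter should be constructed.
type exporterKind string

const (
	exporterHTTP exporterKind = "http"
	exporterGRPC exporterKind = "grpc"
)

// selectExporter maps a protocol string to an exporter kind.
// An empty value falls back to HTTP (an assumption, see the text above).
func selectExporter(protocol string) (exporterKind, error) {
	switch strings.ToLower(strings.TrimSpace(protocol)) {
	case "", "http/protobuf":
		// Here the operator would call otlpmetrichttp.New(ctx).
		return exporterHTTP, nil
	case "grpc":
		// Here the operator would call otlpmetricgrpc.New(ctx).
		return exporterGRPC, nil
	default:
		return "", fmt.Errorf("unsupported OTLP protocol %q", protocol)
	}
}

func main() {
	kind, err := selectExporter(os.Getenv("OTEL_EXPORTER_OTLP_PROTOCOL"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("selected exporter:", kind)
}
```

Keeping the selection in a small pure function makes it easy to unit-test separately from exporter construction.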

JorTurFer · 2025-08-28T18:24:40Z

operator/metrics/otelmetrics.go

+	provider := metric.NewMeterProvider(options...)
+	meter := provider.Meter(meterName)
+
+	httpScaledObjectCounter, err := meter.Int64UpDownCounter("operator_http_scaled_object_count", api.WithDescription("a counter of http_scaled_objects processed by the operator"))


In the OpenTelemetry context, metric names use dots instead of underscores.

Suggested change

-	httpScaledObjectCounter, err := meter.Int64UpDownCounter("operator_http_scaled_object_count", api.WithDescription("a counter of http_scaled_objects processed by the operator"))
+	httpScaledObjectCounter, err := meter.Int64UpDownCounter("keda.http.scaled.object.count", api.WithDescription("a counter of HttpScaledObjects processed by the operator"))

JorTurFer · 2025-08-28T18:26:00Z

operator/metrics/prommetrics.go

+	)
+	meter := provider.Meter(meterName)
+
+	httpScaledObjectCounter, err := meter.Int64UpDownCounter("operator_http_scaled_object_count", api.WithDescription("a counter of http_scaled_objects processed by the operator"))


Suggested change

-	httpScaledObjectCounter, err := meter.Int64UpDownCounter("operator_http_scaled_object_count", api.WithDescription("a counter of http_scaled_objects processed by the operator"))
+	httpScaledObjectCounter, err := meter.Int64UpDownCounter("keda_http_scaled_object_total", api.WithDescription("a counter of http_scaled_objects processed by the operator"))

JorTurFer · 2025-08-28T18:28:47Z

tests/checks/operator_otel_metrics/operator_otel_metrics_test.go

+	assert.True(t, ok, "operator_http_scaled_object_count_total is available")
+
+	requestCount := getMetricsValue(val)
+	assert.GreaterOrEqual(t, requestCount, float64(1))


That's a good point, as long as we don't run the tests sequentially (I'd like to avoid that tbh, but maybe we have to 🤔).
I guess we can just run both checks at once in the same test, something like:

- spawn 20 HTTPScaledObjects
- check Prometheus
- check OTel
- remove the HTTPScaledObjects
- check Prometheus
- check OTel

JorTurFer · 2025-08-28T18:29:12Z

Thanks a lot for the contribution!

elieser1101 added 3 commits August 24, 2025 01:14

basic instrumentation for the operator

4e8745b

Signed-off-by: Elieser Pereira <elieser.pereiraa@gmail.com>

add e2e tests and update config/operator to suport e2e

ebf0358

Signed-off-by: Elieser Pereira <elieser.pereiraa@gmail.com>

document how to use prometheus and otel metrics. change matric type t…

af078c4

…o fit meaning Signed-off-by: Elieser Pereira <elieser.pereiraa@gmail.com>

wozniakjan reviewed Aug 28, 2025

JorTurFer reviewed Aug 28, 2025
