-
Notifications
You must be signed in to change notification settings - Fork 130
Instrument http add on operator #1328
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Instrument http add on operator #1328
Conversation
Signed-off-by: Elieser Pereira <elieser.pereiraa@gmail.com>
Signed-off-by: Elieser Pereira <elieser.pereiraa@gmail.com>
…o fit meaning Signed-off-by: Elieser Pereira <elieser.pereiraa@gmail.com>
Hi folks, trying to get some eyes here but not really sure who to tag, thanks for any pointer in advance |
func (r *HTTPScaledObjectReconciler) updatePromMetricsOnDelete(ctx context.Context, scaledObject *httpv1alpha1.HTTPScaledObject, namespacedName string) { | ||
logger := log.FromContext(ctx, "updatePromMetricsOnDelete", namespacedName) | ||
logger.Info("updatePromMetricsOnDelete") | ||
metrics.RecordDeleteHTTPScaledObjectCount(namespacedName) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like this is never called, wouldn't this result in always growing metric?
assert.True(t, ok, "operator_http_scaled_object_count_total is available") | ||
|
||
requestCount := getMetricsValue(val) | ||
assert.GreaterOrEqual(t, requestCount, float64(1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when the whole e2e suite is executed, multiple e2e tests can run in parallel (iirc it could be like 3?). Perhaps the test could create 100 HSOs check if the metric is equal or over 100. And then we may want to delete that 100, wait for the metric to propagate and check again if it's under some reasonably low number, e.g. 10.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, meanwhile we don't execute sequential tests (I'd like to avoid it tbh, but maybe we have to 🤔)
I guess that we can just execute both checks at once in the same tests, something like:
- spawn 20 HTTPScaledObjects
- check prometheus
- check otel
- remove the HTTPScaledObjects
- check prometheus
- check otel
containerPort: 8080 | ||
containerPort: 2223 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this? Personally, I'd prefer to not change an already existing port
This endpoint can be enabled by setting the `OTEL_PROM_EXPORTER_ENABLED` environment variable to `true` on the operator deployment (`true` by default) and by setting `OTEL_PROM_EXPORTER_PORT` to an unused port for the endpoint to be made avaialble on (`2223` by default). | ||
|
||
### Configuring the OTEL HTTP exporter | ||
When configured, the ioperator can export metrics to a OTEL HTTP collector. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When configured, the ioperator can export metrics to a OTEL HTTP collector. | |
When configured, the operator can export metrics to a OTEL HTTP collector. |
### Configuring the OTEL HTTP exporter | ||
When configured, the ioperator can export metrics to a OTEL HTTP collector. | ||
|
||
The OTEL exporter can be enabled by setting the `OTEL_EXPORTER_OTLP_METRICS_ENABLED` environment variable to `true` on the operator deployment (`false` by default). When enabled the `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable must also be configured so the exporter knows what collector to send the metrics to (e.g. http://opentelemetry-collector.open-telemetry-system:4318). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The OTEL exporter can be enabled by setting the `OTEL_EXPORTER_OTLP_METRICS_ENABLED` environment variable to `true` on the operator deployment (`false` by default). When enabled the `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable must also be configured so the exporter knows what collector to send the metrics to (e.g. http://opentelemetry-collector.open-telemetry-system:4318). | |
The OTEL exporter can be enabled by setting the `OTEL_EXPORTER_OTLP_METRICS_ENABLED` environment variable to `true` on the operator deployment (`false` by default). When enabled, the `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable must also be configured so the exporter knows what collector to send the metrics to (e.g. http://opentelemetry-collector.open-telemetry-system:4318). |
if metricsCfg.OtelPrometheusExporterEnabled { | ||
// start the prometheus compatible metrics server | ||
// serves a prometheus compatible metrics endpoint on the configured port | ||
eg.Go(func() error { | ||
if err := runMetricsServer(ctx, ctrl.Log, metricsCfg); !util.IsIgnoredErr(err) { | ||
setupLog.Error(err, "could not start the Prometheus metrics server") | ||
return err | ||
} | ||
|
||
return nil | ||
}) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The operator is already exposing a metrics server in prom format. Let's use it instead of starting another one, you can register more metrics on that server, for example, that's what we do for KEDA -> https://github.com/kedacore/keda/blob/main/pkg/metricscollector/prommetrics.go#L161
func NewOtelMetrics(options ...metric.Option) *OtelMetrics { | ||
ctx := context.Background() | ||
|
||
exporter, err := otlpmetrichttp.New(ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opentelemetry supports HTTP and gRCP protocols, I prefer if we support both tbh. This is how we do it in KEDA, you can use it as example https://github.com/kedacore/keda/blob/main/pkg/metricscollector/opentelemetry.go#L67-L86
provider := metric.NewMeterProvider(options...) | ||
meter := provider.Meter(meterName) | ||
|
||
httpScaledObjectCounter, err := meter.Int64UpDownCounter("operator_http_scaled_object_count", api.WithDescription("a counter of http_scaled_objects processed by the operator")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in opentelemetry context, dots are used instead of underscores
httpScaledObjectCounter, err := meter.Int64UpDownCounter("operator_http_scaled_object_count", api.WithDescription("a counter of http_scaled_objects processed by the operator")) | |
httpScaledObjectCounter, err := meter.Int64UpDownCounter("keda.http.scaled.object.count", api.WithDescription("a counter of HttpScaledObjects processed by the operator")) |
) | ||
meter := provider.Meter(meterName) | ||
|
||
httpScaledObjectCounter, err := meter.Int64UpDownCounter("operator_http_scaled_object_count", api.WithDescription("a counter of http_scaled_objects processed by the operator")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
httpScaledObjectCounter, err := meter.Int64UpDownCounter("operator_http_scaled_object_count", api.WithDescription("a counter of http_scaled_objects processed by the operator")) | |
httpScaledObjectCounter, err := meter.Int64UpDownCounter("keda_http_scaled_object_total", api.WithDescription("a counter of http_scaled_objects processed by the operator")) |
assert.True(t, ok, "operator_http_scaled_object_count_total is available") | ||
|
||
requestCount := getMetricsValue(val) | ||
assert.GreaterOrEqual(t, requestCount, float64(1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, meanwhile we don't execute sequential tests (I'd like to avoid it tbh, but maybe we have to 🤔)
I guess that we can just execute both checks at once in the same tests, something like:
- spawn 20 HTTPScaledObjects
- check prometheus
- check otel
- remove the HTTPScaledObjects
- check prometheus
- check otel
Thanks a lot for the contribution! |
Added instrumentation for promethues compatible endpoints and otel.
Added tests and e2e tests.
I went forward this ways as I was trying to get familiar with codebase. Happy to go in any direction reviewers advise.
Checklist
README.md
docs/
directoryPart of #965