Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Adopt native histograms #3165

Merged
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions pkg/internal/controller/metrics/metrics.go
Original file line number Diff line number Diff line change
@@ -17,6 +17,8 @@ limitations under the License.
package metrics

import (
"time"

"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/collectors"
"sigs.k8s.io/controller-runtime/pkg/metrics"
@@ -60,6 +62,9 @@ var (
Help: "Length of time per reconciliation per controller",
Buckets: []float64{0.005, 0.01, 0.025, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
1.25, 1.5, 1.75, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 60},
NativeHistogramBucketFactor: 1.1,
NativeHistogramMaxBucketNumber: 100,
NativeHistogramMinResetDuration: 1 * time.Hour,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do this, can we please do it for all histograms and not just one? And what was the reasoning behind choosing that NativeHistogramBucketFactor and NativeHistogramMaxBucketNumber?

Copy link
Contributor Author

@krisztianfekete krisztianfekete Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let me also add it for workqueue metrics too!
The values I've set are sensible defaults that most project starts with (us included).

Please see the following references, reasoning:

  • NativeHistogramBucketFactor: https://github.com/prometheus/client_golang/blob/331dfab0cc853dca0242a0d96a80184087a80c1d/prometheus/histogram.go#L405-L429
  • NativeHistogramMaxBucketNumber: I've seen this being set to 100 usually, unless it is desired to mimic Exponential Histograms in OTel closer, in which case this can also be 160.
  • NativeHistogramMinResetDuration: "If at least NativeHistogramMinResetDuration has passed since the last reset of the histogram (which includes the creation of the histogram), the whole histogram is reset, i.e. all buckets are deleted and the sum and count of observations as well as the zero bucket are set to zero. Prometheus handles this as a normal counter reset, which means that some observations will be lost between scrapes, so resetting should happen rarely compared to the scraping interval. Additionally, frequent counter resets might lead to less efficient storage in the TSDB (see the TSDB section for details). A NativeHistogramMinResetDuration of one hour is a value that should work well in most situations." source: https://prometheus.io/docs/specs/native_histograms/#limiting-the-bucket-count

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's one more:

RequestLatency = prometheus.NewHistogramVec(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, not sure how could I miss that. Thanks, just pushed a commit!

}, []string{"controller"})

// WorkerCount is a prometheus metric which holds the number of