[profiling] Reduce copying and allocation in exporter #926

Open: wants to merge 7 commits into main from dsn/exporter_avoid_copy


danielsn
Contributor

What does this PR do?

Passes the EncodedProfile as a Rust object, rather than forcing it through the C-FFI straw.

Motivation

Previously we had to allocate a new Vec and copy the pprof bytes into it, even though they were already in memory. This change avoids that allocation and copy (which can be several MB).
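
To illustrate the difference, here is a minimal sketch, not the PR's actual code. The `EncodedProfile` struct below is a simplified stand-in with a single `buffer` field, and `body_from_slice` / `body_from_profile` are hypothetical helper names. It contrasts copying bytes out of a borrowed slice with moving an owned buffer:

```rust
/// Simplified, hypothetical stand-in for the encoded pprof.
pub struct EncodedProfile {
    pub buffer: Vec<u8>,
}

/// Old shape (assumed): the exporter only sees borrowed bytes, as it would
/// after a trip through an FFI slice type, so it must allocate and copy.
pub fn body_from_slice(bytes: &[u8]) -> Vec<u8> {
    bytes.to_vec() // new allocation plus a copy of every byte
}

/// New shape (assumed): the exporter takes ownership of the profile, so the
/// existing allocation is moved into the request body without copying.
pub fn body_from_profile(profile: EncodedProfile) -> Vec<u8> {
    profile.buffer // move: same heap allocation, no copy
}

fn main() {
    let profile = EncodedProfile { buffer: vec![0u8; 1024] };
    let original_ptr = profile.buffer.as_ptr();

    // Copying produces a second buffer at a different address.
    let copied = body_from_slice(&profile.buffer);
    assert_ne!(copied.as_ptr(), original_ptr);

    // Moving keeps the original heap allocation.
    let moved = body_from_profile(profile);
    assert_eq!(moved.as_ptr(), original_ptr);
    println!("copy used a new allocation; move reused the original");
}
```

The pointer comparison is only there to make the point observable; real code would simply pass the owned value along.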

Additional Notes

Anything else we should know when reviewing?

How to test the change?

Existing tests

@github-actions github-actions bot added the profiling label (Relates to the profiling* modules.) Mar 13, 2025
@pr-commenter

pr-commenter bot commented Mar 13, 2025

Benchmarks

Comparison

Benchmark execution time: 2025-03-14 21:26:48

Comparing candidate commit f85951f in PR branch dsn/exporter_avoid_copy with baseline commit b39c6ee in branch main.

Found 0 performance improvements and 1 performance regressions! Performance is the same for 51 metrics, 2 unstable metrics.

scenario:redis/obfuscate_redis_string

  • 🟥 execution_time [+5.047µs; +5.563µs] or [+15.125%; +16.671%]
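
Since the baseline table is omitted from this report, the baseline mean can only be inferred. The sketch below assumes the percentages are relative to the baseline mean, so that baseline ≈ absolute delta / relative delta. The function name `implied_baseline` is made up for illustration:

```rust
/// Infer the baseline value from an absolute delta (in µs) and the matching
/// relative delta (in percent). Assumes the percentage is relative to the
/// baseline, which the report does not state explicitly.
fn implied_baseline(delta_abs_us: f64, delta_rel_percent: f64) -> f64 {
    delta_abs_us / (delta_rel_percent / 100.0)
}

fn main() {
    // Both ends of the reported interval for redis/obfuscate_redis_string.
    let from_low = implied_baseline(5.047, 15.125);
    let from_high = implied_baseline(5.563, 16.671);
    println!("implied baseline mean: {from_low:.2}µs and {from_high:.2}µs");
}
```

Both ends of the interval point at a baseline of roughly 33.37µs, which is consistent with the candidate mean of 38.672µs reported in Group 7.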

Candidate

Candidate benchmark details

Group 1

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz f85951f 1741986835 dsn/exporter_avoid_copy
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
ip_address/quantize_peer_ip_address_benchmark execution_time 4.914µs 4.991µs ± 0.041µs 4.979µs ± 0.031µs 5.023µs 5.060µs 5.063µs 5.066µs 1.76% 0.361 -1.163 0.81% 0.003µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
ip_address/quantize_peer_ip_address_benchmark execution_time [4.985µs; 4.997µs] or [-0.113%; +0.113%] None None None

Group 2

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz f85951f 1741986835 dsn/exporter_avoid_copy
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
benching deserializing traces from msgpack to their internal representation execution_time 54.418ms 54.626ms ± 0.202ms 54.584ms ± 0.063ms 54.647ms 54.980ms 55.398ms 56.240ms 3.03% 4.088 23.746 0.37% 0.014ms 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
benching deserializing traces from msgpack to their internal representation execution_time [54.598ms; 54.654ms] or [-0.051%; +0.051%] None None None

Group 3

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz f85951f 1741986835 dsn/exporter_avoid_copy
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
two way interface execution_time 17.703µs 26.105µs ± 10.995µs 18.061µs ± 0.298µs 34.847µs 44.856µs 46.611µs 90.947µs 403.55% 1.681 5.461 42.01% 0.777µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
two way interface execution_time [24.581µs; 27.629µs] or [-5.837%; +5.837%] None None None

Group 4

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz f85951f 1741986835 dsn/exporter_avoid_copy
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
credit_card/is_card_number/ execution_time 3.897µs 3.914µs ± 0.004µs 3.914µs ± 0.001µs 3.915µs 3.918µs 3.919µs 3.948µs 0.87% 5.091 43.911 0.10% 0.000µs 1 200
credit_card/is_card_number/ throughput 253305407.634op/s 255492424.699op/s ± 260748.388op/s 255513138.319op/s ± 83960.657op/s 255595616.536op/s 255700207.676op/s 255858742.550op/s 256607878.765op/s 0.43% -5.030 43.424 0.10% 18437.695op/s 1 200
credit_card/is_card_number/ 3782-8224-6310-005 execution_time 81.677µs 83.020µs ± 0.432µs 83.043µs ± 0.308µs 83.334µs 83.716µs 83.817µs 83.934µs 1.07% -0.284 -0.198 0.52% 0.031µs 1 200
credit_card/is_card_number/ 3782-8224-6310-005 throughput 11914132.386op/s 12045609.723op/s ± 62800.446op/s 12041890.676op/s ± 44668.419op/s 12088704.316op/s 12143457.886op/s 12199167.814op/s 12243337.651op/s 1.67% 0.311 -0.164 0.52% 4440.662op/s 1 200
credit_card/is_card_number/ 378282246310005 execution_time 78.523µs 79.496µs ± 0.393µs 79.443µs ± 0.255µs 79.754µs 80.128µs 80.498µs 80.675µs 1.55% 0.308 -0.072 0.49% 0.028µs 1 200
credit_card/is_card_number/ 378282246310005 throughput 12395401.503op/s 12579569.899op/s ± 62023.246op/s 12587637.685op/s ± 40290.117op/s 12624894.714op/s 12674507.861op/s 12715754.224op/s 12735041.102op/s 1.17% -0.281 -0.092 0.49% 4385.706op/s 1 200
credit_card/is_card_number/37828224631 execution_time 3.896µs 3.914µs ± 0.003µs 3.914µs ± 0.001µs 3.915µs 3.918µs 3.920µs 3.921µs 0.18% -1.271 7.382 0.07% 0.000µs 1 200
credit_card/is_card_number/37828224631 throughput 255049327.838op/s 255501095.976op/s ± 185681.949op/s 255508444.647op/s ± 80934.974op/s 255565699.557op/s 255824843.829op/s 255960373.402op/s 256680421.924op/s 0.46% 1.288 7.491 0.07% 13129.696op/s 1 200
credit_card/is_card_number/378282246310005 execution_time 75.368µs 76.557µs ± 0.432µs 76.564µs ± 0.326µs 76.869µs 77.231µs 77.344µs 77.486µs 1.20% -0.277 -0.352 0.56% 0.031µs 1 200
credit_card/is_card_number/378282246310005 throughput 12905568.576op/s 13062507.074op/s ± 73810.616op/s 13061043.496op/s ± 55612.892op/s 13118247.491op/s 13194095.342op/s 13243937.606op/s 13268295.054op/s 1.59% 0.303 -0.326 0.56% 5219.199op/s 1 200
credit_card/is_card_number/37828224631000521389798 execution_time 51.337µs 51.436µs ± 0.037µs 51.435µs ± 0.022µs 51.457µs 51.511µs 51.548µs 51.557µs 0.24% 0.522 1.037 0.07% 0.003µs 1 200
credit_card/is_card_number/37828224631000521389798 throughput 19396087.401op/s 19441482.863op/s ± 14102.447op/s 19441903.703op/s ± 8480.490op/s 19450891.848op/s 19461385.897op/s 19472039.284op/s 19479028.878op/s 0.19% -0.516 1.029 0.07% 997.194op/s 1 200
credit_card/is_card_number/x371413321323331 execution_time 6.026µs 6.038µs ± 0.006µs 6.038µs ± 0.003µs 6.041µs 6.046µs 6.049µs 6.098µs 0.99% 4.137 39.085 0.10% 0.000µs 1 200
credit_card/is_card_number/x371413321323331 throughput 163978238.820op/s 165604649.786op/s ± 172219.853op/s 165605928.693op/s ± 75097.614op/s 165680790.096op/s 165879988.022op/s 165924761.050op/s 165939196.311op/s 0.20% -4.063 38.211 0.10% 12177.783op/s 1 200
credit_card/is_card_number_no_luhn/ execution_time 3.893µs 3.914µs ± 0.003µs 3.914µs ± 0.001µs 3.915µs 3.919µs 3.920µs 3.922µs 0.22% -1.085 8.625 0.08% 0.000µs 1 200
credit_card/is_card_number_no_luhn/ throughput 254953666.959op/s 255518685.533op/s ± 202574.991op/s 255521491.022op/s ± 96491.703op/s 255623185.028op/s 255747801.883op/s 255987246.065op/s 256847069.577op/s 0.52% 1.108 8.769 0.08% 14324.215op/s 1 200
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 execution_time 66.027µs 66.328µs ± 0.129µs 66.313µs ± 0.090µs 66.418µs 66.545µs 66.652µs 66.680µs 0.55% 0.318 -0.187 0.19% 0.009µs 1 200
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 throughput 14997094.437op/s 15076685.822op/s ± 29382.090op/s 15080061.896op/s ± 20352.638op/s 15097878.371op/s 15121341.134op/s 15135484.295op/s 15145228.195op/s 0.43% -0.308 -0.196 0.19% 2077.628op/s 1 200
credit_card/is_card_number_no_luhn/ 378282246310005 execution_time 59.399µs 59.571µs ± 0.070µs 59.562µs ± 0.044µs 59.610µs 59.697µs 59.784µs 59.848µs 0.48% 0.781 1.415 0.12% 0.005µs 1 200
credit_card/is_card_number_no_luhn/ 378282246310005 throughput 16708951.254op/s 16786713.860op/s ± 19665.883op/s 16789128.200op/s ± 12370.565op/s 16800631.704op/s 16812601.974op/s 16825131.787op/s 16835345.809op/s 0.28% -0.771 1.391 0.12% 1390.588op/s 1 200
credit_card/is_card_number_no_luhn/37828224631 execution_time 3.892µs 3.914µs ± 0.003µs 3.913µs ± 0.002µs 3.915µs 3.919µs 3.920µs 3.931µs 0.46% -0.366 11.726 0.08% 0.000µs 1 200
credit_card/is_card_number_no_luhn/37828224631 throughput 254373512.671op/s 255508371.755op/s ± 213920.685op/s 255532922.608op/s ± 99771.699op/s 255617228.601op/s 255757381.632op/s 255916940.591op/s 256912282.246op/s 0.54% 0.401 11.832 0.08% 15126.477op/s 1 200
credit_card/is_card_number_no_luhn/378282246310005 execution_time 56.191µs 56.465µs ± 0.132µs 56.456µs ± 0.079µs 56.529µs 56.727µs 56.843µs 56.868µs 0.73% 0.612 0.307 0.23% 0.009µs 1 200
credit_card/is_card_number_no_luhn/378282246310005 throughput 17584439.391op/s 17710181.154op/s ± 41231.821op/s 17712802.530op/s ± 24706.911op/s 17738992.060op/s 17767595.909op/s 17783850.804op/s 17796373.099op/s 0.47% -0.598 0.283 0.23% 2915.530op/s 1 200
credit_card/is_card_number_no_luhn/37828224631000521389798 execution_time 51.350µs 51.437µs ± 0.031µs 51.435µs ± 0.016µs 51.451µs 51.484µs 51.510µs 51.669µs 0.46% 2.174 15.384 0.06% 0.002µs 1 200
credit_card/is_card_number_no_luhn/37828224631000521389798 throughput 19353895.507op/s 19441449.132op/s ± 11637.633op/s 19442063.623op/s ± 6172.951op/s 19448208.449op/s 19455305.523op/s 19469944.889op/s 19474227.807op/s 0.17% -2.152 15.190 0.06% 822.905op/s 1 200
credit_card/is_card_number_no_luhn/x371413321323331 execution_time 6.027µs 6.038µs ± 0.004µs 6.038µs ± 0.002µs 6.040µs 6.045µs 6.047µs 6.051µs 0.21% -0.020 0.267 0.07% 0.000µs 1 200
credit_card/is_card_number_no_luhn/x371413321323331 throughput 165270500.204op/s 165619331.760op/s ± 111641.387op/s 165619236.674op/s ± 65228.631op/s 165684747.748op/s 165820279.824op/s 165863287.632op/s 165908340.651op/s 0.17% 0.025 0.266 0.07% 7894.238op/s 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
credit_card/is_card_number/ execution_time [3.913µs; 3.915µs] or [-0.014%; +0.014%] None None None
credit_card/is_card_number/ throughput [255456287.480op/s; 255528561.917op/s] or [-0.014%; +0.014%] None None None
credit_card/is_card_number/ 3782-8224-6310-005 execution_time [82.960µs; 83.080µs] or [-0.072%; +0.072%] None None None
credit_card/is_card_number/ 3782-8224-6310-005 throughput [12036906.186op/s; 12054313.261op/s] or [-0.072%; +0.072%] None None None
credit_card/is_card_number/ 378282246310005 execution_time [79.441µs; 79.550µs] or [-0.068%; +0.068%] None None None
credit_card/is_card_number/ 378282246310005 throughput [12570974.073op/s; 12588165.724op/s] or [-0.068%; +0.068%] None None None
credit_card/is_card_number/37828224631 execution_time [3.913µs; 3.914µs] or [-0.010%; +0.010%] None None None
credit_card/is_card_number/37828224631 throughput [255475362.244op/s; 255526829.708op/s] or [-0.010%; +0.010%] None None None
credit_card/is_card_number/378282246310005 execution_time [76.498µs; 76.617µs] or [-0.078%; +0.078%] None None None
credit_card/is_card_number/378282246310005 throughput [13052277.633op/s; 13072736.516op/s] or [-0.078%; +0.078%] None None None
credit_card/is_card_number/37828224631000521389798 execution_time [51.431µs; 51.442µs] or [-0.010%; +0.010%] None None None
credit_card/is_card_number/37828224631000521389798 throughput [19439528.400op/s; 19443437.327op/s] or [-0.010%; +0.010%] None None None
credit_card/is_card_number/x371413321323331 execution_time [6.038µs; 6.039µs] or [-0.014%; +0.014%] None None None
credit_card/is_card_number/x371413321323331 throughput [165580781.771op/s; 165628517.802op/s] or [-0.014%; +0.014%] None None None
credit_card/is_card_number_no_luhn/ execution_time [3.913µs; 3.914µs] or [-0.011%; +0.011%] None None None
credit_card/is_card_number_no_luhn/ throughput [255490610.588op/s; 255546760.479op/s] or [-0.011%; +0.011%] None None None
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 execution_time [66.310µs; 66.346µs] or [-0.027%; +0.027%] None None None
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 throughput [15072613.747op/s; 15080757.897op/s] or [-0.027%; +0.027%] None None None
credit_card/is_card_number_no_luhn/ 378282246310005 execution_time [59.561µs; 59.581µs] or [-0.016%; +0.016%] None None None
credit_card/is_card_number_no_luhn/ 378282246310005 throughput [16783988.358op/s; 16789439.362op/s] or [-0.016%; +0.016%] None None None
credit_card/is_card_number_no_luhn/37828224631 execution_time [3.913µs; 3.914µs] or [-0.012%; +0.012%] None None None
credit_card/is_card_number_no_luhn/37828224631 throughput [255478724.406op/s; 255538019.105op/s] or [-0.012%; +0.012%] None None None
credit_card/is_card_number_no_luhn/378282246310005 execution_time [56.447µs; 56.483µs] or [-0.032%; +0.032%] None None None
credit_card/is_card_number_no_luhn/378282246310005 throughput [17704466.820op/s; 17715895.488op/s] or [-0.032%; +0.032%] None None None
credit_card/is_card_number_no_luhn/37828224631000521389798 execution_time [51.432µs; 51.441µs] or [-0.008%; +0.008%] None None None
credit_card/is_card_number_no_luhn/37828224631000521389798 throughput [19439836.268op/s; 19443061.996op/s] or [-0.008%; +0.008%] None None None
credit_card/is_card_number_no_luhn/x371413321323331 execution_time [6.037µs; 6.039µs] or [-0.009%; +0.009%] None None None
credit_card/is_card_number_no_luhn/x371413321323331 throughput [165603859.338op/s; 165634804.183op/s] or [-0.009%; +0.009%] None None None

Group 5

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz f85951f 1741986835 dsn/exporter_avoid_copy
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
normalization/normalize_trace/test_trace execution_time 246.127ns 253.037ns ± 9.518ns 249.356ns ± 1.574ns 251.902ns 277.392ns 283.119ns 285.315ns 14.42% 2.172 3.289 3.75% 0.673ns 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
normalization/normalize_trace/test_trace execution_time [251.718ns; 254.356ns] or [-0.521%; +0.521%] None None None

Group 6

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz f85951f 1741986835 dsn/exporter_avoid_copy
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
benching string interning on wordpress profile execution_time 148.744µs 149.641µs ± 0.427µs 149.564µs ± 0.178µs 149.788µs 150.311µs 151.276µs 152.812µs 2.17% 2.963 16.894 0.28% 0.030µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
benching string interning on wordpress profile execution_time [149.582µs; 149.700µs] or [-0.040%; +0.040%] None None None

Group 7

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz f85951f 1741986835 dsn/exporter_avoid_copy
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
redis/obfuscate_redis_string execution_time 37.961µs 38.672µs ± 1.168µs 38.133µs ± 0.066µs 38.268µs 41.233µs 41.272µs 41.986µs 10.10% 1.707 0.975 3.01% 0.083µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
redis/obfuscate_redis_string execution_time [38.510µs; 38.834µs] or [-0.419%; +0.419%] None None None
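
As a sanity check on how the interval columns relate to the summary columns, the sketch below recomputes the 95% CI for this scenario from the reported mean, sd, and sample_size. It assumes a normal approximation (mean ± 1.96 · sd / √n); the benchmarking tool's actual method is not stated here, and `ci95` is a hypothetical helper name:

```rust
/// Returns (lower bound, upper bound, half-width as a percent of the mean)
/// using a normal approximation. The 1.96 factor is an assumption.
fn ci95(mean: f64, sd: f64, n: usize) -> (f64, f64, f64) {
    let sem = sd / (n as f64).sqrt(); // standard error of the mean
    let half = 1.96 * sem;
    (mean - half, mean + half, 100.0 * half / mean)
}

fn main() {
    // Values from the redis/obfuscate_redis_string row in Group 7.
    let (lo, hi, pct) = ci95(38.672, 1.168, 200);
    println!("[{lo:.3}µs; {hi:.3}µs] or [-{pct:.3}%; +{pct:.3}%]");
}
```

Under that assumption the result agrees with the reported interval to the displayed precision.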

Group 8

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz f85951f 1741986835 dsn/exporter_avoid_copy
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... execution_time 504.741µs 505.804µs ± 0.591µs 505.774µs ± 0.226µs 505.988µs 506.404µs 506.618µs 512.424µs 1.31% 7.026 76.886 0.12% 0.042µs 1 200
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... throughput 1951510.599op/s 1977054.096op/s ± 2291.575op/s 1977165.930op/s ± 883.047op/s 1978146.373op/s 1979253.280op/s 1980289.778op/s 1981215.654op/s 0.20% -6.923 75.387 0.12% 162.039op/s 1 200
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて execution_time 452.275µs 453.256µs ± 0.417µs 453.270µs ± 0.312µs 453.548µs 453.928µs 454.327µs 454.427µs 0.26% 0.182 -0.196 0.09% 0.029µs 1 200
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて throughput 2200573.712op/s 2206262.881op/s ± 2029.601op/s 2206191.058op/s ± 1520.768op/s 2207761.629op/s 2209491.171op/s 2210391.266op/s 2211045.388op/s 0.22% -0.177 -0.200 0.09% 143.514op/s 1 200
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters execution_time 175.645µs 176.639µs ± 0.351µs 176.629µs ± 0.218µs 176.854µs 177.239µs 177.350µs 177.380µs 0.43% -0.123 -0.133 0.20% 0.025µs 1 200
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters throughput 5637610.286op/s 5661295.396op/s ± 11248.009op/s 5661581.325op/s ± 6999.433op/s 5668399.331op/s 5680023.579op/s 5688703.462op/s 5693308.155op/s 0.56% 0.134 -0.124 0.20% 795.354op/s 1 200
normalization/normalize_service/normalize_service/[empty string] execution_time 37.534µs 37.634µs ± 0.043µs 37.632µs ± 0.026µs 37.660µs 37.706µs 37.744µs 37.771µs 0.37% 0.291 0.376 0.11% 0.003µs 1 200
normalization/normalize_service/normalize_service/[empty string] throughput 26475260.793op/s 26571661.763op/s ± 30110.329op/s 26573160.044op/s ± 18285.812op/s 26590349.275op/s 26617366.569op/s 26640081.394op/s 26642803.802op/s 0.26% -0.284 0.370 0.11% 2129.122op/s 1 200
normalization/normalize_service/normalize_service/test_ASCII execution_time 48.083µs 48.264µs ± 0.239µs 48.105µs ± 0.014µs 48.531µs 48.641µs 48.708µs 49.503µs 2.91% 1.298 2.028 0.49% 0.017µs 1 200
normalization/normalize_service/normalize_service/test_ASCII throughput 20200661.383op/s 20719891.151op/s ± 101756.147op/s 20787753.044op/s ± 5838.513op/s 20792068.123op/s 20794711.014op/s 20795766.647op/s 20797266.645op/s 0.05% -1.264 1.782 0.49% 7195.246op/s 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... execution_time [505.722µs; 505.886µs] or [-0.016%; +0.016%] None None None
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... throughput [1976736.506op/s; 1977371.686op/s] or [-0.016%; +0.016%] None None None
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて execution_time [453.198µs; 453.313µs] or [-0.013%; +0.013%] None None None
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて throughput [2205981.598op/s; 2206544.164op/s] or [-0.013%; +0.013%] None None None
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters execution_time [176.590µs; 176.687µs] or [-0.028%; +0.028%] None None None
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters throughput [5659736.530op/s; 5662854.262op/s] or [-0.028%; +0.028%] None None None
normalization/normalize_service/normalize_service/[empty string] execution_time [37.628µs; 37.640µs] or [-0.016%; +0.016%] None None None
normalization/normalize_service/normalize_service/[empty string] throughput [26567488.761op/s; 26575834.765op/s] or [-0.016%; +0.016%] None None None
normalization/normalize_service/normalize_service/test_ASCII execution_time [48.231µs; 48.297µs] or [-0.068%; +0.068%] None None None
normalization/normalize_service/normalize_service/test_ASCII throughput [20705788.728op/s; 20733993.574op/s] or [-0.068%; +0.068%] None None None

Group 9

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz f85951f 1741986835 dsn/exporter_avoid_copy
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
tags/replace_trace_tags execution_time 2.330µs 2.390µs ± 0.022µs 2.383µs ± 0.012µs 2.409µs 2.431µs 2.436µs 2.438µs 2.33% -0.193 0.179 0.93% 0.002µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
tags/replace_trace_tags execution_time [2.387µs; 2.393µs] or [-0.130%; +0.130%] None None None

Group 10

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz f85951f 1741986835 dsn/exporter_avoid_copy
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
concentrator/add_spans_to_concentrator execution_time 6.095ms 6.106ms ± 0.011ms 6.105ms ± 0.003ms 6.109ms 6.114ms 6.119ms 6.226ms 1.97% 7.934 82.447 0.17% 0.001ms 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
concentrator/add_spans_to_concentrator execution_time [6.105ms; 6.108ms] or [-0.024%; +0.024%] None None None

Group 11

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz f85951f 1741986835 dsn/exporter_avoid_copy
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
write only interface execution_time 1.178µs 3.182µs ± 1.416µs 2.993µs ± 0.030µs 3.014µs 3.624µs 13.872µs 14.834µs 395.70% 7.386 55.549 44.39% 0.100µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
write only interface execution_time [2.986µs; 3.378µs] or [-6.167%; +6.167%] None None None

Group 12

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz f85951f 1741986835 dsn/exporter_avoid_copy
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... execution_time 208.583µs 209.038µs ± 0.211µs 209.022µs ± 0.119µs 209.148µs 209.354µs 209.525µs 210.689µs 0.80% 2.451 17.416 0.10% 0.015µs 1 200
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... throughput 4746332.898op/s 4783829.662op/s ± 4819.299op/s 4784180.628op/s ± 2720.814op/s 4786626.962op/s 4790060.879op/s 4793073.600op/s 4794256.481op/s 0.21% -2.411 17.033 0.10% 340.776op/s 1 200
normalization/normalize_name/normalize_name/bad-name execution_time 18.233µs 18.322µs ± 0.042µs 18.313µs ± 0.023µs 18.347µs 18.392µs 18.423µs 18.566µs 1.38% 1.230 4.912 0.23% 0.003µs 1 200
normalization/normalize_name/normalize_name/bad-name throughput 53860947.328op/s 54580630.867op/s ± 124270.401op/s 54606243.114op/s ± 68944.921op/s 54654320.608op/s 54749431.520op/s 54840349.585op/s 54845533.776op/s 0.44% -1.193 4.691 0.23% 8787.244op/s 1 200
normalization/normalize_name/normalize_name/good execution_time 10.689µs 10.750µs ± 0.035µs 10.750µs ± 0.019µs 10.769µs 10.804µs 10.826µs 10.951µs 1.87% 1.044 4.633 0.32% 0.002µs 1 200
normalization/normalize_name/normalize_name/good throughput 91316246.425op/s 93021537.773op/s ± 300813.650op/s 93024720.248op/s ± 160751.504op/s 93181898.428op/s 93482910.812op/s 93539269.966op/s 93549947.036op/s 0.56% -0.991 4.326 0.32% 21270.737op/s 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... execution_time [209.008µs; 209.067µs] or [-0.014%; +0.014%] None None None
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... throughput [4783161.754op/s; 4784497.571op/s] or [-0.014%; +0.014%] None None None
normalization/normalize_name/normalize_name/bad-name execution_time [18.316µs; 18.327µs] or [-0.032%; +0.032%] None None None
normalization/normalize_name/normalize_name/bad-name throughput [54563408.184op/s; 54597853.549op/s] or [-0.032%; +0.032%] None None None
normalization/normalize_name/normalize_name/good execution_time [10.745µs; 10.755µs] or [-0.045%; +0.045%] None None None
normalization/normalize_name/normalize_name/good throughput [92979847.894op/s; 93063227.651op/s] or [-0.045%; +0.045%] None None None

Group 13

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz f85951f 1741986835 dsn/exporter_avoid_copy
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
sql/obfuscate_sql_string execution_time 66.337µs 66.583µs ± 0.222µs 66.523µs ± 0.073µs 66.619µs 66.909µs 67.233µs 68.638µs 3.18% 4.742 36.677 0.33% 0.016µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
sql/obfuscate_sql_string execution_time [66.552µs; 66.614µs] or [-0.046%; +0.046%] None None None

Baseline

Omitted due to size.

@danielsn danielsn force-pushed the dsn/exporter_avoid_copy branch from 8945950 to 0a70e3e Compare March 13, 2025 21:52
@danielsn danielsn changed the title from "DRAFT [profiling] Reduce copying and allocation in exporter" to "[profiling] Reduce copying and allocation in exporter" Mar 13, 2025
@danielsn danielsn force-pushed the dsn/exporter_avoid_copy branch from 0a70e3e to dfa6eb3 Compare March 13, 2025 21:53
@danielsn danielsn force-pushed the dsn/exporter_avoid_copy branch from dfa6eb3 to 606d179 Compare March 13, 2025 21:54
@danielsn danielsn marked this pull request as ready for review March 13, 2025 21:54
@danielsn danielsn requested review from a team as code owners March 13, 2025 21:54
@codecov-commenter

codecov-commenter commented Mar 13, 2025

Codecov Report

Attention: Patch coverage is 80.64516% with 18 lines in your changes missing coverage. Please review.
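
The two figures in that sentence determine the size of the patch. As a small worked check (the function name `changed_lines` is made up for illustration), the number of coverable changed lines follows from the missing count and the coverage percentage:

```rust
/// Total coverable lines in the patch, derived from the number of
/// uncovered lines and the reported patch coverage percentage.
fn changed_lines(missing: u32, coverage_percent: f64) -> u32 {
    (missing as f64 / (1.0 - coverage_percent / 100.0)).round() as u32
}

fn main() {
    let total = changed_lines(18, 80.64516);
    let covered = total - 18;
    println!("{covered} of {total} changed lines covered");
}
```

That works out to 75 of 93 changed lines covered, and 75 / 93 is 80.64516% to the displayed precision.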

Project coverage is 72.79%. Comparing base (b39c6ee) to head (f85951f).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #926   +/-   ##
=======================================
  Coverage   72.79%   72.79%           
=======================================
  Files         334      334           
  Lines       50916    50882   -34     
=======================================
- Hits        37062    37041   -21     
+ Misses      13854    13841   -13     
Components Coverage Δ
crashtracker 42.88% <ø> (+0.02%) ⬆️
crashtracker-ffi 6.25% <ø> (ø)
datadog-alloc 98.73% <ø> (ø)
data-pipeline 91.81% <ø> (ø)
data-pipeline-ffi 90.28% <ø> (ø)
ddcommon 81.93% <ø> (+0.55%) ⬆️
ddcommon-ffi 67.57% <ø> (+1.46%) ⬆️
ddtelemetry 61.87% <ø> (ø)
ddtelemetry-ffi 22.46% <ø> (ø)
dogstatsd 89.70% <ø> (ø)
dogstatsd-client 82.57% <ø> (ø)
ipc 82.51% <ø> (+0.10%) ⬆️
profiling 81.74% <80.64%> (-0.21%) ⬇️
profiling-ffi 68.86% <56.09%> (-1.82%) ⬇️
serverless 0.00% <ø> (ø)
sidecar 40.97% <ø> (ø)
sidecar-ffi 1.17% <ø> (ø)
spawn-worker 54.37% <ø> (ø)
tinybytes 91.24% <ø> (ø)
trace-mini-agent 74.66% <ø> (ø)
trace-normalization 98.24% <ø> (ø)
trace-obfuscation 96.00% <ø> (ø)
trace-protobuf 78.13% <ø> (ø)
trace-utils 93.11% <ø> (ø)

Member

@ivoanjo ivoanjo left a comment

Gave it a pass!

Overall, I like this PR less for the copying/allocation reduction and more because I think it's very useful to have a direct link between the profile and the exporter. In the past, for instance, we've had to expose the ProfiledEndpointsStats because it was not encoded in the pprof. With this change, we can trivially report more things (metrics? other info?) that also don't get encoded in the pprof.

Comment on lines -723 to -728
pub struct EncodedProfile {
    start: Timespec,
    end: Timespec,
    buffer: ddcommon_ffi::Vec<u8>,
    endpoints_stats: Box<ProfiledEndpointsStats>,
}
Member

Hmmm removing this leaves us in a weird in-between state -- I did find it useful to have the start/end here because in some cases I relied on libdatadog's defaults.

Providing start/end time is currently optional on a number of our APIs, so if we remove them here, I think it's worth going ahead and also making them non-optional on the other APIs -- particularly in ddog_prof_Profile_serialize (make non-optional) and possibly in ddog_prof_Profile_reset/ddog_prof_Profile_new (remove them?).

Contributor Author

I'd (mildly) rather do that on another PR so we can figure out the API we want, and when to use the current time vs requiring a time be passed in.
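
For reference when that follow-up happens, the two API shapes under discussion could look roughly like this. This is a sketch only: it uses `std::time::SystemTime` instead of the crate's own time types, and `resolve_end` and `Interval` are hypothetical names, not libdatadog APIs:

```rust
use std::time::{Duration, SystemTime};

/// Shape 1 (optional): fall back to the current time when the caller
/// does not provide an end timestamp.
fn resolve_end(end: Option<SystemTime>) -> SystemTime {
    end.unwrap_or_else(SystemTime::now)
}

/// Shape 2 (required): the caller must always supply both timestamps,
/// and an inverted interval is rejected up front.
struct Interval {
    start: SystemTime,
    end: SystemTime,
}

impl Interval {
    fn new(start: SystemTime, end: SystemTime) -> Result<Self, String> {
        if end < start {
            return Err("end precedes start".to_string());
        }
        Ok(Self { start, end })
    }

    fn duration(&self) -> Duration {
        // Cannot fail after the check in `new`; default to zero regardless.
        self.end.duration_since(self.start).unwrap_or_default()
    }
}

fn main() {
    let start = SystemTime::UNIX_EPOCH;
    let end = start + Duration::from_secs(60);

    // Optional shape: an explicit value wins over the default.
    assert_eq!(resolve_end(Some(end)), end);

    // Required shape: both timestamps are mandatory.
    let interval = Interval::new(start, end).expect("valid interval");
    println!("profile covers {:?}", interval.duration());
}
```

The optional shape is more convenient for callers relying on defaults; the required shape makes the source of each timestamp explicit.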

Comment on lines 163 to 171
pub fn build(
    &self,
    start: DateTime<Utc>,
    end: DateTime<Utc>,
    profile: EncodedProfile,
    files_to_compress_and_export: &[File],
    files_to_export_unmodified: &[File],
    additional_tags: Option<&Vec<Tag>>,
    endpoint_counts: Option<&ProfiledEndpointsStats>,
    internal_metadata: Option<serde_json::Value>,
    info: Option<serde_json::Value>,
) -> anyhow::Result<Request> {
Member

Consider making profile optional? In particular, that enables experimentation, e.g. if I don't want to send a profile, or want to post-process the profile before sending it. (It's not a very strong use-case, but making it optional seems quite simple, and otherwise it'll be annoying to have to create "dummy" profiles just to get around this.)

Contributor Author

3786ad1 (#926)

It makes the code a bit uglier, since we have to handle the Option in some places. LMK if you think it's worth it.
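
For context, the kind of Option handling being weighed here might look like the sketch below. It is illustrative only and not the code in the linked commit: `Part`, `build_parts`, and the `"profile.pprof"` part name are hypothetical, and `EncodedProfile` is reduced to a single buffer field:

```rust
/// Simplified, hypothetical stand-in for the encoded pprof.
pub struct EncodedProfile {
    pub buffer: Vec<u8>,
}

/// Hypothetical representation of one file attached to the request.
pub struct Part {
    pub name: &'static str,
    pub bytes: Vec<u8>,
}

/// Assemble the request parts. The profile is optional, so callers that
/// only want to send other files do not need to build a dummy profile.
pub fn build_parts(profile: Option<EncodedProfile>, extra_files: Vec<Part>) -> Vec<Part> {
    let mut parts = Vec::with_capacity(extra_files.len() + 1);
    if let Some(p) = profile {
        // The buffer is moved, not copied.
        parts.push(Part { name: "profile.pprof", bytes: p.buffer });
    }
    parts.extend(extra_files);
    parts
}

fn main() {
    let without = build_parts(None, vec![Part { name: "info.json", bytes: vec![] }]);
    let with = build_parts(Some(EncodedProfile { buffer: vec![1, 2, 3] }), vec![]);
    println!("{} part(s) without a profile, {} with", without.len(), with.len());
}
```

The cost is one extra branch wherever the profile is consumed, which is the trade-off raised in the reply above.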

@danielsn danielsn requested a review from a team as a code owner March 14, 2025 19:08
@r1viollet
Contributor

r1viollet commented Mar 14, 2025

Artifact Size Benchmark Report

aarch64-alpine-linux-musl
Artifact Baseline Commit Change
/aarch64-alpine-linux-musl/lib/libdatadog_profiling.a 84.34 MB 84.37 MB +.02% (+22.95 KB) 🔍
/aarch64-alpine-linux-musl/lib/libdatadog_profiling.so 8.60 MB 8.60 MB +0% (+224 B) 👌
/aarch64-alpine-linux-musl/lib/libdatadog_profiling.so.debug 26.69 MB 26.70 MB +.04% (+12.29 KB) 🔍
aarch64-apple-darwin
Artifact Baseline Commit Change
/aarch64-apple-darwin/lib/libdatadog_profiling.a 47.51 MB 47.54 MB +.04% (+23.34 KB) 🔍
/aarch64-apple-darwin/lib/libdatadog_profiling.dylib 8.89 MB 8.89 MB +0% (+224 B) 👌
aarch64-unknown-linux-gnu
Artifact Baseline Commit Change
/aarch64-unknown-linux-gnu/lib/libdatadog_profiling.so 8.54 MB 8.54 MB +0% (+224 B) 👌
/aarch64-unknown-linux-gnu/lib/libdatadog_profiling.so.debug 25.35 MB 25.37 MB +.04% (+11.65 KB) 🔍
/aarch64-unknown-linux-gnu/lib/libdatadog_profiling.a 79.00 MB 79.02 MB +.02% (+20.95 KB) 🔍
i686-alpine-linux-musl
Artifact Baseline Commit Change
/i686-alpine-linux-musl/lib/libdatadog_profiling.a 73.13 MB 73.16 MB +.03% (+24.72 KB) 🔍
/i686-alpine-linux-musl/lib/libdatadog_profiling.so 9.15 MB 9.15 MB +0% (+136 B) 👌
/i686-alpine-linux-musl/lib/libdatadog_profiling.so.debug 25.88 MB 25.89 MB +.04% (+12.92 KB) 🔍
i686-unknown-linux-gnu
Artifact Baseline Commit Change
/i686-unknown-linux-gnu/lib/libdatadog_profiling.a 74.82 MB 74.85 MB +.03% (+26.61 KB) 🔍
/i686-unknown-linux-gnu/lib/libdatadog_profiling.so 9.04 MB 9.04 MB +0% (+128 B) 👌
/i686-unknown-linux-gnu/lib/libdatadog_profiling.so.debug 23.53 MB 23.54 MB +.05% (+13.54 KB) 🔍
libdatadog-x64-windows
Artifact Baseline Commit Change
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.dll 19.25 MB 19.26 MB +.04% (+8.00 KB) 🔍
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.lib 54.81 KB 55.15 KB +.60% (+340 B) 🔍
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.pdb 133.75 MB 133.82 MB +.05% (+72.00 KB) 🔍
/libdatadog-x64-windows/debug/static/datadog_profiling_ffi.lib 861.05 MB 861.74 MB +.07% (+701.43 KB) 🔍
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.dll 5.88 MB 5.88 MB +.01% (+1.00 KB) 🔍
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.lib 54.81 KB 55.15 KB +.60% (+340 B) 🔍
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.pdb 17.87 MB 17.87 MB +.04% (+8.00 KB) 🔍
/libdatadog-x64-windows/release/static/datadog_profiling_ffi.lib 30.17 MB 30.17 MB +.02% (+7.27 KB) 🔍
libdatadog-x86-windows
Artifact Baseline Commit Change
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.dll 16.40 MB 16.40 MB +.04% (+7.00 KB) 🔍
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.lib 55.66 KB 55.99 KB +.60% (+344 B) 🔍
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.pdb 135.98 MB 136.23 MB +.18% (+264.00 KB) 🔍
/libdatadog-x86-windows/debug/static/datadog_profiling_ffi.lib 852.94 MB 852.58 MB -.04% (-360.96 KB) 💪
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.dll 4.47 MB 4.47 MB +.03% (+1.50 KB) 🔍
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.lib 55.66 KB 55.99 KB +.60% (+344 B) 🔍
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.pdb 18.44 MB 18.44 MB 0% (0 B) 👌
/libdatadog-x86-windows/release/static/datadog_profiling_ffi.lib 27.70 MB 27.70 MB +.02% (+6.04 KB) 🔍
x86_64-alpine-linux-musl
Artifact Baseline Commit Change
/x86_64-alpine-linux-musl/lib/libdatadog_profiling.a 73.13 MB 73.16 MB +.03% (+24.72 KB) 🔍
/x86_64-alpine-linux-musl/lib/libdatadog_profiling.so 9.15 MB 9.15 MB +0% (+136 B) 👌
/x86_64-alpine-linux-musl/lib/libdatadog_profiling.so.debug 25.88 MB 25.89 MB +.04% (+12.92 KB) 🔍
x86_64-apple-darwin
Artifact Baseline Commit Change
/x86_64-apple-darwin/lib/libdatadog_profiling.a 47.51 MB 47.54 MB +.04% (+23.34 KB) 🔍
/x86_64-apple-darwin/lib/libdatadog_profiling.dylib 8.89 MB 8.89 MB +0% (+224 B) 👌
x86_64-unknown-linux-gnu
Artifact Baseline Commit Change
/x86_64-unknown-linux-gnu/lib/libdatadog_profiling.a 74.82 MB 74.85 MB +.03% (+26.61 KB) 🔍
/x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so 9.04 MB 9.04 MB +0% (+128 B) 👌
/x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so.debug 23.53 MB 23.54 MB +.05% (+13.54 KB) 🔍
