Crashtracking for Windows #892

Open (wants to merge 53 commits into base: main)

kevingosse (Contributor) commented Feb 20, 2025

What does this PR do?

Add Windows support for crashtracking.

Motivation

PHP (and probably other languages) has some Windows-specific crashes, so the existing Linux support isn't enough.

Additional Notes

Our implementation of crashtracking for Windows relies on WER (Windows Error Reporting).

It works as follows: the process calls WerRegisterRuntimeExceptionModule to register the DLL that contains the crash handling code. The DLL must export three functions:

  • OutOfProcessExceptionEventCallback: gives the handler a chance to analyze the crashing process. Optionally, the crash handler can set ownershipClaimed to true to "claim" the crash. If it does, the crash handler provides some metadata that will be attached to the crash report in the Windows event log. Since we are only implementing crash tracking for telemetry, we never claim the crash.
  • OutOfProcessExceptionEventSignatureCallback: this function is only called if the crash handler claimed the crash. It's used to provide metadata that helps triage the crash.
  • OutOfProcessExceptionEventDebuggerLaunchCallback: this method is only called if the crash handler claimed the crash. It's used to provide the path to a custom debugger that will be used to debug the crashing process.

In addition, the path to the crash handler must be added to the registry, in HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting\RuntimeExceptionHelperModules (requires administrator permissions). Since Windows 10 20H1, it's possible to use HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\Windows Error Reporting\RuntimeExceptionHelperModules instead (doesn't require additional permissions in most environments). The only limitation of using HKCU instead of HKLM is that the crash handler can't claim the crash (which we don't really care about).

Ideally, we should add the key to HKEY_LOCAL_MACHINE during installation. To handle scenarios where it's not possible, the init_crashtracking_windows method looks for the key, and creates one in HKEY_CURRENT_USER if it can't be found.
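
As a rough illustration of the lookup-then-fallback logic described above, the sketch below models the two registry hives with an in-memory set. `Hive`, `FakeRegistry`, and `ensure_handler_registered` are hypothetical names invented for this example. The real init_crashtracking_windows talks to the Windows registry, and the exact lookup order (in particular, checking HKCU before writing to it) is an assumption.

```rust
use std::collections::HashSet;

/// The two hives that can hold RuntimeExceptionHelperModules entries.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Hive {
    LocalMachine,
    CurrentUser,
}

/// Hypothetical in-memory stand-in for the registry: a set of
/// (hive, DLL path) values. Not a real registry API.
#[derive(Default)]
struct FakeRegistry {
    values: HashSet<(Hive, String)>,
}

impl FakeRegistry {
    fn contains(&self, hive: Hive, dll_path: &str) -> bool {
        self.values.contains(&(hive, dll_path.to_string()))
    }

    fn insert(&mut self, hive: Hive, dll_path: &str) {
        self.values.insert((hive, dll_path.to_string()));
    }
}

/// Returns the hive in which the handler ends up registered.
/// HKLM is never written here, since that would require administrator rights.
fn ensure_handler_registered(reg: &mut FakeRegistry, dll_path: &str) -> Hive {
    if reg.contains(Hive::LocalMachine, dll_path) {
        // Already registered machine-wide, e.g. by an installer.
        return Hive::LocalMachine;
    }
    if !reg.contains(Hive::CurrentUser, dll_path) {
        // Fallback: register for the current user only.
        reg.insert(Hive::CurrentUser, dll_path);
    }
    Hive::CurrentUser
}
```

Keeping the registry behind a small abstraction like this also makes the fallback testable on machines where writing to the real hives is not desirable.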

When the process crashes, Windows suspends it, then spawns WerFault.exe. WerFault loads the registered crash handlers in order, and calls OutOfProcessExceptionEventCallback for each of them until the crash is claimed (or none are left). The crashing process is resumed when WerFault terminates (gracefully or abruptly). In theory, a crash inside WerFault has no visible consequence on the system, so this is a very safe way to inspect crashes.
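
The dispatch order described above can be modeled in a few lines. This is a simplified sketch, not the WER ABI: `Outcome` and `dispatch` are hypothetical, and the real callbacks receive the exception information and report ownership through an out parameter. It only illustrates why a telemetry-only handler that never claims the crash does not prevent later handlers from running.

```rust
/// Result of one handler's callback in this simplified model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    Passed,
    Claimed,
}

/// Calls the handlers in registration order until one claims the crash.
/// Returns how many handlers were invoked and whether the crash was claimed.
fn dispatch(handlers: &[fn() -> Outcome]) -> (usize, bool) {
    let mut invoked = 0;
    for callback in handlers {
        invoked += 1;
        if callback() == Outcome::Claimed {
            // Handlers registered after this one are never called.
            return (invoked, true);
        }
    }
    (invoked, false)
}

/// A handler that only collects telemetry and never claims the crash.
fn telemetry_only() -> Outcome {
    Outcome::Passed
}

/// A handler that claims the crash (for example, to attach a debugger).
fn claiming() -> Outcome {
    Outcome::Claimed
}
```

With the chain `[telemetry_only, claiming, telemetry_only]`, the first two handlers run and the third is skipped.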

To inspect the crashing process during OutOfProcessExceptionEventCallback, we use the APIs exposed by dbghelp.dll. dbghelp.dll is present on all Windows systems; however, the version of the library can vary. Because of this, we must be careful about which APIs we call, and favor the oldest ones. Currently we limit ourselves to SymInitializeW (which initializes the symbol handler for the process and is required before walking the call stacks) and StackWalkEx (which walks the stack of the given thread). The .NET tracer already relies on those functions for its crashtracking implementation, so we know they're safe to call on the supported versions of Windows (2012 R2 and later).

It's very hard to monitor what happens inside the crash handler loaded in WerFault.exe, because it's a background process with no window or console. To provide a bit of visibility, we call the Win32 API OutputDebugString, which emits messages that can be captured with a system-wide debug output viewer (for instance, DebugView).

How to test the change?

test_crash in collector_windows_tests launches a test application and checks that a crash report is generated. The same test application can be used to test manually if needed.

pr-commenter bot commented Feb 20, 2025

Benchmarks

Comparison

Benchmark execution time: 2025-03-13 16:28:08

Comparing candidate commit 17fe93c in PR branch kevin/crashtracking_windows with baseline commit 9a4a791 in branch main.

Found 1 performance improvements and 0 performance regressions! Performance is the same for 51 metrics, 2 unstable metrics.

scenario:tags/replace_trace_tags

  • 🟩 execution_time [-105.713ns; -98.483ns] or [-4.308%; -4.013%]

Candidate

Candidate benchmark details

Group 1

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz 17fe93c 1741882583 kevin/crashtracking_windows
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
normalization/normalize_trace/test_trace execution_time 244.703ns 254.195ns ± 11.575ns 249.080ns ± 2.137ns 255.716ns 286.873ns 289.918ns 291.201ns 16.91% 2.001 3.034 4.54% 0.819ns 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
normalization/normalize_trace/test_trace execution_time [252.591ns; 255.800ns] or [-0.631%; +0.631%] None None None

Group 2

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz 17fe93c 1741882583 kevin/crashtracking_windows
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
benching deserializing traces from msgpack to their internal representation execution_time 53.897ms 54.489ms ± 0.297ms 54.569ms ± 0.236ms 54.732ms 54.869ms 55.102ms 55.326ms 1.39% -0.047 -0.946 0.54% 0.021ms 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
benching deserializing traces from msgpack to their internal representation execution_time [54.448ms; 54.531ms] or [-0.075%; +0.075%] None None None

Group 3

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz 17fe93c 1741882583 kevin/crashtracking_windows
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
credit_card/is_card_number/ execution_time 3.892µs 3.914µs ± 0.003µs 3.914µs ± 0.002µs 3.916µs 3.919µs 3.921µs 3.922µs 0.22% -1.524 9.303 0.08% 0.000µs 1 200
credit_card/is_card_number/ throughput 254943967.208op/s 255506889.084op/s ± 208373.791op/s 255495809.869op/s ± 102691.886op/s 255596858.438op/s 255841534.417op/s 255950156.947op/s 256912479.119op/s 0.55% 1.546 9.463 0.08% 14734.252op/s 1 200
credit_card/is_card_number/ 3782-8224-6310-005 execution_time 81.935µs 82.477µs ± 0.259µs 82.408µs ± 0.144µs 82.585µs 82.979µs 83.198µs 83.913µs 1.83% 1.591 4.477 0.31% 0.018µs 1 200
credit_card/is_card_number/ 3782-8224-6310-005 throughput 11917163.988op/s 12124769.295op/s ± 37862.131op/s 12134797.918op/s ± 21184.009op/s 12149338.474op/s 12167314.094op/s 12174330.713op/s 12204863.638op/s 0.58% -1.555 4.255 0.31% 2677.257op/s 1 200
credit_card/is_card_number/ 378282246310005 execution_time 76.483µs 77.079µs ± 0.331µs 77.009µs ± 0.211µs 77.279µs 77.757µs 77.945µs 78.170µs 1.51% 0.719 0.220 0.43% 0.023µs 1 200
credit_card/is_card_number/ 378282246310005 throughput 12792555.913op/s 12973939.934op/s ± 55508.934op/s 12985515.334op/s ± 35730.144op/s 13011880.886op/s 13047409.271op/s 13064443.294op/s 13074860.112op/s 0.69% -0.697 0.174 0.43% 3925.074op/s 1 200
credit_card/is_card_number/37828224631 execution_time 3.895µs 3.914µs ± 0.003µs 3.914µs ± 0.001µs 3.916µs 3.919µs 3.921µs 3.921µs 0.18% -1.341 8.767 0.07% 0.000µs 1 200
credit_card/is_card_number/37828224631 throughput 255005633.787op/s 255469563.321op/s ± 190304.006op/s 255468759.425op/s ± 94790.961op/s 255560747.713op/s 255785423.922op/s 255918879.784op/s 256726247.796op/s 0.49% 1.361 8.903 0.07% 13456.525op/s 1 200
credit_card/is_card_number/378282246310005 execution_time 73.061µs 73.990µs ± 0.432µs 73.976µs ± 0.310µs 74.283µs 74.716µs 74.991µs 75.219µs 1.68% 0.273 -0.330 0.58% 0.031µs 1 200
credit_card/is_card_number/378282246310005 throughput 13294483.464op/s 13515772.979op/s ± 78722.643op/s 13517944.075op/s ± 56945.095op/s 13574880.363op/s 13634068.885op/s 13679316.879op/s 13687257.636op/s 1.25% -0.245 -0.349 0.58% 5566.531op/s 1 200
credit_card/is_card_number/37828224631000521389798 execution_time 51.924µs 52.209µs ± 0.104µs 52.211µs ± 0.069µs 52.269µs 52.372µs 52.463µs 52.569µs 0.69% 0.210 0.236 0.20% 0.007µs 1 200
credit_card/is_card_number/37828224631000521389798 throughput 19022493.504op/s 19153843.818op/s ± 38103.141op/s 19153084.522op/s ± 25164.113op/s 19180034.545op/s 19214789.146op/s 19228468.965op/s 19258992.219op/s 0.55% -0.197 0.221 0.20% 2694.299op/s 1 200
credit_card/is_card_number/x371413321323331 execution_time 6.427µs 6.560µs ± 0.067µs 6.556µs ± 0.047µs 6.603µs 6.680µs 6.734µs 6.780µs 3.41% 0.409 -0.096 1.02% 0.005µs 1 200
credit_card/is_card_number/x371413321323331 throughput 147500541.581op/s 152454124.437op/s ± 1559783.638op/s 152532829.253op/s ± 1085301.748op/s 153604690.646op/s 154585830.897op/s 155542386.945op/s 155583472.379op/s 2.00% -0.357 -0.185 1.02% 110293.359op/s 1 200
credit_card/is_card_number_no_luhn/ execution_time 3.897µs 3.914µs ± 0.002µs 3.914µs ± 0.001µs 3.916µs 3.918µs 3.919µs 3.921µs 0.17% -1.875 14.979 0.06% 0.000µs 1 200
credit_card/is_card_number_no_luhn/ throughput 255032041.707op/s 255464288.046op/s ± 150095.821op/s 255473826.707op/s ± 80350.781op/s 255542349.756op/s 255663941.950op/s 255793070.828op/s 256596333.494op/s 0.44% 1.899 15.178 0.06% 10613.377op/s 1 200
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 execution_time 63.606µs 63.845µs ± 0.114µs 63.828µs ± 0.052µs 63.883µs 64.006µs 64.303µs 64.336µs 0.80% 1.732 5.176 0.18% 0.008µs 1 200
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 throughput 15543414.092op/s 15663047.265op/s ± 27818.993op/s 15667049.846op/s ± 12784.564op/s 15678744.723op/s 15699902.425op/s 15712375.485op/s 15721798.273op/s 0.35% -1.710 5.091 0.18% 1967.100op/s 1 200
credit_card/is_card_number_no_luhn/ 378282246310005 execution_time 57.745µs 57.878µs ± 0.054µs 57.874µs ± 0.025µs 57.906µs 57.950µs 58.010µs 58.222µs 0.60% 1.362 8.127 0.09% 0.004µs 1 200
credit_card/is_card_number_no_luhn/ 378282246310005 throughput 17175627.902op/s 17277829.724op/s ± 15980.823op/s 17278991.765op/s ± 7524.846op/s 17285788.045op/s 17304579.010op/s 17312652.922op/s 17317368.445op/s 0.22% -1.339 7.980 0.09% 1130.015op/s 1 200
credit_card/is_card_number_no_luhn/37828224631 execution_time 3.894µs 3.914µs ± 0.003µs 3.914µs ± 0.001µs 3.915µs 3.918µs 3.921µs 3.923µs 0.21% -1.772 12.217 0.07% 0.000µs 1 200
credit_card/is_card_number_no_luhn/37828224631 throughput 254937531.325op/s 255486679.751op/s ± 184139.929op/s 255483138.128op/s ± 83832.215op/s 255560523.122op/s 255759812.369op/s 255930332.126op/s 256797907.987op/s 0.51% 1.796 12.396 0.07% 13020.659op/s 1 200
credit_card/is_card_number_no_luhn/378282246310005 execution_time 54.561µs 54.757µs ± 0.194µs 54.676µs ± 0.028µs 54.728µs 55.168µs 55.271µs 55.652µs 1.79% 1.849 2.921 0.35% 0.014µs 1 200
credit_card/is_card_number_no_luhn/378282246310005 throughput 17968680.884op/s 18262646.525op/s ± 64363.152op/s 18289721.177op/s ± 9213.750op/s 18296353.079op/s 18314111.953op/s 18325687.192op/s 18327968.480op/s 0.21% -1.833 2.821 0.35% 4551.162op/s 1 200
credit_card/is_card_number_no_luhn/37828224631000521389798 execution_time 51.934µs 52.167µs ± 0.087µs 52.165µs ± 0.055µs 52.220µs 52.312µs 52.397µs 52.470µs 0.58% 0.311 0.538 0.17% 0.006µs 1 200
credit_card/is_card_number_no_luhn/37828224631000521389798 throughput 19058513.408op/s 19169365.516op/s ± 31890.830op/s 19169879.329op/s ± 20205.172op/s 19189984.268op/s 19219269.049op/s 19232318.283op/s 19255223.982op/s 0.45% -0.299 0.523 0.17% 2255.022op/s 1 200
credit_card/is_card_number_no_luhn/x371413321323331 execution_time 6.432µs 6.561µs ± 0.062µs 6.555µs ± 0.041µs 6.596µs 6.679µs 6.718µs 6.727µs 2.61% 0.483 -0.192 0.94% 0.004µs 1 200
credit_card/is_card_number_no_luhn/x371413321323331 throughput 148664111.144op/s 152419927.235op/s ± 1431907.034op/s 152547416.404op/s ± 964655.304op/s 153512671.713op/s 154462527.101op/s 155144867.474op/s 155473988.537op/s 1.92% -0.438 -0.244 0.94% 101251.117op/s 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
credit_card/is_card_number/ execution_time [3.913µs; 3.914µs] or [-0.011%; +0.011%] None None None
credit_card/is_card_number/ throughput [255478010.481op/s; 255535767.688op/s] or [-0.011%; +0.011%] None None None
credit_card/is_card_number/ 3782-8224-6310-005 execution_time [82.441µs; 82.512µs] or [-0.043%; +0.043%] None None None
credit_card/is_card_number/ 3782-8224-6310-005 throughput [12119521.968op/s; 12130016.623op/s] or [-0.043%; +0.043%] None None None
credit_card/is_card_number/ 378282246310005 execution_time [77.033µs; 77.125µs] or [-0.059%; +0.059%] None None None
credit_card/is_card_number/ 378282246310005 throughput [12966246.930op/s; 12981632.938op/s] or [-0.059%; +0.059%] None None None
credit_card/is_card_number/37828224631 execution_time [3.914µs; 3.915µs] or [-0.010%; +0.010%] None None None
credit_card/is_card_number/37828224631 throughput [255443189.016op/s; 255495937.626op/s] or [-0.010%; +0.010%] None None None
credit_card/is_card_number/378282246310005 execution_time [73.930µs; 74.050µs] or [-0.081%; +0.081%] None None None
credit_card/is_card_number/378282246310005 throughput [13504862.778op/s; 13526683.180op/s] or [-0.081%; +0.081%] None None None
credit_card/is_card_number/37828224631000521389798 execution_time [52.195µs; 52.223µs] or [-0.028%; +0.028%] None None None
credit_card/is_card_number/37828224631000521389798 throughput [19148563.089op/s; 19159124.547op/s] or [-0.028%; +0.028%] None None None
credit_card/is_card_number/x371413321323331 execution_time [6.551µs; 6.569µs] or [-0.142%; +0.142%] None None None
credit_card/is_card_number/x371413321323331 throughput [152237953.427op/s; 152670295.448op/s] or [-0.142%; +0.142%] None None None
credit_card/is_card_number_no_luhn/ execution_time [3.914µs; 3.915µs] or [-0.008%; +0.008%] None None None
credit_card/is_card_number_no_luhn/ throughput [255443486.209op/s; 255485089.883op/s] or [-0.008%; +0.008%] None None None
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 execution_time [63.829µs; 63.861µs] or [-0.025%; +0.025%] None None None
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 throughput [15659191.820op/s; 15666902.710op/s] or [-0.025%; +0.025%] None None None
credit_card/is_card_number_no_luhn/ 378282246310005 execution_time [57.870µs; 57.885µs] or [-0.013%; +0.013%] None None None
credit_card/is_card_number_no_luhn/ 378282246310005 throughput [17275614.936op/s; 17280044.512op/s] or [-0.013%; +0.013%] None None None
credit_card/is_card_number_no_luhn/37828224631 execution_time [3.914µs; 3.914µs] or [-0.010%; +0.010%] None None None
credit_card/is_card_number_no_luhn/37828224631 throughput [255461159.728op/s; 255512199.774op/s] or [-0.010%; +0.010%] None None None
credit_card/is_card_number_no_luhn/378282246310005 execution_time [54.730µs; 54.784µs] or [-0.049%; +0.049%] None None None
credit_card/is_card_number_no_luhn/378282246310005 throughput [18253726.411op/s; 18271566.639op/s] or [-0.049%; +0.049%] None None None
credit_card/is_card_number_no_luhn/37828224631000521389798 execution_time [52.155µs; 52.179µs] or [-0.023%; +0.023%] None None None
credit_card/is_card_number_no_luhn/37828224631000521389798 throughput [19164945.754op/s; 19173785.279op/s] or [-0.023%; +0.023%] None None None
credit_card/is_card_number_no_luhn/x371413321323331 execution_time [6.553µs; 6.570µs] or [-0.131%; +0.131%] None None None
credit_card/is_card_number_no_luhn/x371413321323331 throughput [152221478.691op/s; 152618375.778op/s] or [-0.130%; +0.130%] None None None

Group 4

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz 17fe93c 1741882583 kevin/crashtracking_windows
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
write only interface execution_time 1.152µs 3.143µs ± 1.404µs 2.951µs ± 0.027µs 2.980µs 3.623µs 13.671µs 14.791µs 401.20% 7.402 55.798 44.54% 0.099µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
write only interface execution_time [2.949µs; 3.338µs] or [-6.189%; +6.189%] None None None

Group 5

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz 17fe93c 1741882583 kevin/crashtracking_windows
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
concentrator/add_spans_to_concentrator execution_time 6.001ms 6.010ms ± 0.005ms 6.009ms ± 0.002ms 6.011ms 6.016ms 6.027ms 6.042ms 0.54% 2.769 14.942 0.08% 0.000ms 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
concentrator/add_spans_to_concentrator execution_time [6.009ms; 6.010ms] or [-0.011%; +0.011%] None None None

Group 6

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz 17fe93c 1741882583 kevin/crashtracking_windows
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
ip_address/quantize_peer_ip_address_benchmark execution_time 4.911µs 4.991µs ± 0.060µs 4.956µs ± 0.015µs 5.057µs 5.088µs 5.103µs 5.108µs 3.08% 0.654 -1.303 1.20% 0.004µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
ip_address/quantize_peer_ip_address_benchmark execution_time [4.983µs; 4.999µs] or [-0.166%; +0.166%] None None None

Group 7

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz 17fe93c 1741882583 kevin/crashtracking_windows
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... execution_time 208.640µs 209.115µs ± 0.185µs 209.124µs ± 0.129µs 209.242µs 209.409µs 209.498µs 209.603µs 0.23% -0.047 -0.370 0.09% 0.013µs 1 200
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... throughput 4770931.668op/s 4782065.181op/s ± 4237.347op/s 4781860.808op/s ± 2955.786op/s 4785048.955op/s 4789088.095op/s 4791646.336op/s 4792939.042op/s 0.23% 0.052 -0.370 0.09% 299.626op/s 1 200
normalization/normalize_name/normalize_name/bad-name execution_time 18.241µs 18.329µs ± 0.050µs 18.320µs ± 0.032µs 18.355µs 18.416µs 18.482µs 18.505µs 1.01% 0.892 1.288 0.27% 0.004µs 1 200
normalization/normalize_name/normalize_name/bad-name throughput 54038001.185op/s 54558937.697op/s ± 148024.001op/s 54584758.116op/s ± 95623.525op/s 54662705.413op/s 54794893.931op/s 54808944.526op/s 54820476.994op/s 0.43% -0.872 1.236 0.27% 10466.878op/s 1 200
normalization/normalize_name/normalize_name/good execution_time 10.680µs 10.750µs ± 0.029µs 10.750µs ± 0.022µs 10.773µs 10.794µs 10.810µs 10.818µs 0.64% -0.097 -0.631 0.27% 0.002µs 1 200
normalization/normalize_name/normalize_name/good throughput 92438728.650op/s 93027215.323op/s ± 254224.490op/s 93027279.219op/s ± 186913.643op/s 93200648.636op/s 93459015.307op/s 93568944.485op/s 93631794.431op/s 0.65% 0.108 -0.628 0.27% 17976.386op/s 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... execution_time [209.089µs; 209.141µs] or [-0.012%; +0.012%] None None None
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... throughput [4781477.925op/s; 4782652.436op/s] or [-0.012%; +0.012%] None None None
normalization/normalize_name/normalize_name/bad-name execution_time [18.322µs; 18.336µs] or [-0.038%; +0.038%] None None None
normalization/normalize_name/normalize_name/bad-name throughput [54538422.994op/s; 54579452.400op/s] or [-0.038%; +0.038%] None None None
normalization/normalize_name/normalize_name/good execution_time [10.746µs; 10.754µs] or [-0.038%; +0.038%] None None None
normalization/normalize_name/normalize_name/good throughput [92991982.254op/s; 93062448.393op/s] or [-0.038%; +0.038%] None None None

Group 8

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz 17fe93c 1741882583 kevin/crashtracking_windows
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
benching string interning on wordpress profile execution_time 149.582µs 150.497µs ± 0.351µs 150.482µs ± 0.157µs 150.646µs 150.981µs 151.466µs 152.355µs 1.24% 1.284 6.819 0.23% 0.025µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
benching string interning on wordpress profile execution_time [150.448µs; 150.546µs] or [-0.032%; +0.032%] None None None

Group 9

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz 17fe93c 1741882583 kevin/crashtracking_windows
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
two way interface execution_time 17.562µs 26.194µs ± 11.732µs 17.813µs ± 0.118µs 36.120µs 46.745µs 48.923µs 89.775µs 403.99% 1.700 4.978 44.68% 0.830µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
two way interface execution_time [24.568µs; 27.820µs] or [-6.207%; +6.207%] None None None

Group 10

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz 17fe93c 1741882583 kevin/crashtracking_windows
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
redis/obfuscate_redis_string execution_time 33.594µs 34.210µs ± 0.807µs 33.713µs ± 0.055µs 35.063µs 35.710µs 35.929µs 36.860µs 9.34% 1.080 -0.421 2.35% 0.057µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
redis/obfuscate_redis_string execution_time [34.098µs; 34.322µs] or [-0.327%; +0.327%] None None None

Group 11

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz 17fe93c 1741882583 kevin/crashtracking_windows
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
sql/obfuscate_sql_string execution_time 67.493µs 67.707µs ± 0.188µs 67.695µs ± 0.067µs 67.751µs 67.870µs 68.251µs 69.793µs 3.10% 7.234 75.465 0.28% 0.013µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
sql/obfuscate_sql_string execution_time [67.681µs; 67.733µs] or [-0.038%; +0.038%] None None None

Group 12

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz 17fe93c 1741882583 kevin/crashtracking_windows
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... execution_time 502.214µs 504.165µs ± 0.884µs 504.160µs ± 0.576µs 504.742µs 505.338µs 506.188µs 510.152µs 1.19% 1.493 9.268 0.17% 0.063µs 1 200
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... throughput 1960198.816op/s 1983482.989op/s ± 3468.464op/s 1983496.319op/s ± 2267.433op/s 1985612.476op/s 1988587.519op/s 1990242.058op/s 1991181.279op/s 0.39% -1.446 8.910 0.17% 245.257op/s 1 200
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて execution_time 452.767µs 453.659µs ± 0.734µs 453.619µs ± 0.208µs 453.817µs 454.183µs 454.442µs 462.895µs 2.04% 10.021 124.063 0.16% 0.052µs 1 200
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて throughput 2160318.535op/s 2204305.959op/s ± 3507.841op/s 2204491.133op/s ± 1012.419op/s 2205565.303op/s 2207228.458op/s 2208092.839op/s 2208641.043op/s 0.19% -9.897 122.010 0.16% 248.042op/s 1 200
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters execution_time 175.210µs 176.847µs ± 0.402µs 176.902µs ± 0.177µs 177.068µs 177.328µs 177.892µs 178.312µs 0.80% -0.675 3.208 0.23% 0.028µs 1 200
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters throughput 5608150.585op/s 5654643.875op/s ± 12858.818op/s 5652856.306op/s ± 5677.667op/s 5658984.423op/s 5678499.810op/s 5692012.605op/s 5707449.502op/s 0.97% 0.707 3.222 0.23% 909.256op/s 1 200
normalization/normalize_service/normalize_service/[empty string] execution_time 37.542µs 37.652µs ± 0.050µs 37.651µs ± 0.033µs 37.684µs 37.732µs 37.776µs 37.813µs 0.43% 0.283 -0.077 0.13% 0.004µs 1 200
normalization/normalize_service/normalize_service/[empty string] throughput 26445943.492op/s 26558768.708op/s ± 35087.505op/s 26559428.342op/s ± 23192.078op/s 26582926.342op/s 26613202.606op/s 26625497.338op/s 26637151.051op/s 0.29% -0.275 -0.086 0.13% 2481.061op/s 1 200
normalization/normalize_service/normalize_service/test_ASCII execution_time 48.081µs 48.318µs ± 0.256µs 48.257µs ± 0.159µs 48.504µs 48.589µs 48.720µs 50.353µs 4.34% 3.071 20.208 0.53% 0.018µs 1 200
normalization/normalize_service/normalize_service/test_ASCII throughput 19859746.542op/s 20696785.832op/s ± 107757.760op/s 20722330.422op/s ± 68469.891op/s 20787552.415op/s 20791890.450op/s 20794483.687op/s 20798283.844op/s 0.37% -2.877 18.224 0.52% 7619.624op/s 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... execution_time [504.043µs; 504.288µs] or [-0.024%; +0.024%] None None None
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... throughput [1983002.293op/s; 1983963.685op/s] or [-0.024%; +0.024%] None None None
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて execution_time [453.557µs; 453.760µs] or [-0.022%; +0.022%] None None None
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて throughput [2203819.806op/s; 2204792.112op/s] or [-0.022%; +0.022%] None None None
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters execution_time [176.791µs; 176.902µs] or [-0.031%; +0.031%] None None None
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters throughput [5652861.767op/s; 5656425.984op/s] or [-0.032%; +0.032%] None None None
normalization/normalize_service/normalize_service/[empty string] execution_time [37.646µs; 37.659µs] or [-0.018%; +0.018%] None None None
normalization/normalize_service/normalize_service/[empty string] throughput [26553905.918op/s; 26563631.499op/s] or [-0.018%; +0.018%] None None None
normalization/normalize_service/normalize_service/test_ASCII execution_time [48.283µs; 48.353µs] or [-0.073%; +0.073%] None None None
normalization/normalize_service/normalize_service/test_ASCII throughput [20681851.642op/s; 20711720.021op/s] or [-0.072%; +0.072%] None None None

Group 13

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz 17fe93c 1741882583 kevin/crashtracking_windows
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
tags/replace_trace_tags execution_time 2.295µs 2.352µs ± 0.017µs 2.351µs ± 0.005µs 2.356µs 2.386µs 2.391µs 2.398µs 2.02% -0.207 2.330 0.71% 0.001µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
tags/replace_trace_tags execution_time [2.349µs; 2.354µs] or [-0.098%; +0.098%] None None None

Baseline

Omitted due to size.

codecov-commenter commented Feb 21, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.48%. Comparing base (7f40e14) to head (17fe93c).
Report is 13 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #892      +/-   ##
==========================================
+ Coverage   72.07%   72.48%   +0.41%     
==========================================
  Files         328      333       +5     
  Lines       48891    50097    +1206     
==========================================
+ Hits        35237    36312    +1075     
- Misses      13654    13785     +131     
Components Coverage Δ
crashtracker 42.90% <ø> (+0.02%) ⬆️
crashtracker-ffi 6.25% <ø> (ø)
datadog-alloc 98.73% <ø> (ø)
data-pipeline 92.08% <ø> (-0.13%) ⬇️
data-pipeline-ffi 90.28% <ø> (-0.12%) ⬇️
ddcommon 79.19% <ø> (-0.89%) ⬇️
ddcommon-ffi 61.05% <ø> (ø)
ddtelemetry 61.74% <ø> (ø)
ddtelemetry-ffi 22.46% <ø> (ø)
dogstatsd 89.59% <ø> (ø)
dogstatsd-client 82.57% <ø> (ø)
ipc 82.50% <ø> (-0.14%) ⬇️
profiling 81.94% <ø> (-0.09%) ⬇️
profiling-ffi 70.68% <ø> (ø)
serverless 0.00% <ø> (ø)
sidecar 40.60% <ø> (+0.36%) ⬆️
sidecar-ffi 2.94% <ø> (+2.79%) ⬆️
spawn-worker 54.37% <ø> (ø)
tinybytes 91.21% <ø> (-0.79%) ⬇️
trace-mini-agent 74.66% <ø> (+2.18%) ⬆️
trace-normalization 98.23% <ø> (ø)
trace-obfuscation 96.07% <ø> (+0.10%) ⬆️
trace-protobuf 78.13% <ø> (ø)
trace-utils 92.97% <ø> (-0.32%) ⬇️

let mut path = env::temp_dir().join(process_name);
path.set_extension("dll");

// Attempt to move it just in case it already exists
Contributor

How would this happen?

Contributor

Should we log something here? Is this unexpected?

Contributor Author

I reused the logic from the trampoline (since the need is the same): https://github.com/DataDog/libdatadog/blob/main/spawn_worker/src/win32.rs#L48

The filename is made of the user SID and the version number, so if multiple instances of PHP are running they will share the same file. I think this is a good thing for crashtracking because we need to add the path to the registry, and I'm afraid we would add a lot of garbage if the path was random.

.file("src/crashtracking_trampoline.cpp") // Path to your C++ file
.warnings(true)
.warnings_into_errors(true)
.flag("/std:c++17") // Set the C++ standard (adjust as needed)
Contributor

Does having a C++ binary increase the size of libdatadog vs C?

Contributor Author

Probably, but the size is still reasonable. The only reason I used C++ is that it has regex support in the standard library. The DLL size is 160 KB, which I believe is acceptable (it was ~60 KB in C with manual parsing).


if (!EnumProcessModules(process, nullptr, 0, &cbNeeded))
{
    OutputDebugStringW(L"Failed to enumerate process modules (1st)");
Contributor

What does 1st vs 2nd mean?

Contributor Author

We call EnumProcessModules twice (first to get the number of modules, then to populate them). It's simply to know if we failed in the first or the second call.
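The "size, then fill" pattern can be sketched in platform-neutral Rust. Here `enum_modules` is a hypothetical stand-in for `EnumProcessModules` (the real API reports the needed size in bytes, not in entries), and the module addresses are made up:

```rust
/// Hypothetical stand-in for a Win32-style API: writes as many entries as fit
/// into `buf` and reports the total number available through `needed`.
fn enum_modules(buf: &mut [u64], needed: &mut usize) -> bool {
    const LOADED: [u64; 3] = [0x1000, 0x2000, 0x3000];
    *needed = LOADED.len();
    let n = buf.len().min(LOADED.len());
    buf[..n].copy_from_slice(&LOADED[..n]);
    true
}

fn list_modules() -> Result<Vec<u64>, &'static str> {
    let mut needed = 0;
    // 1st call: empty buffer, only asks how many entries exist.
    if !enum_modules(&mut [], &mut needed) {
        return Err("Failed to enumerate process modules (1st)");
    }
    let mut modules = vec![0u64; needed];
    // 2nd call: the buffer is now large enough to be filled.
    if !enum_modules(&mut modules, &mut needed) {
        return Err("Failed to enumerate process modules (2nd)");
    }
    // The list may shrink between the two calls; drop entries that were not written.
    modules.truncate(needed.min(modules.len()));
    Ok(modules)
}
```

Distinct messages for the two calls make it possible to tell from the debug output which step failed.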

fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(
f,
"{:08x}{:04x}{:04x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}",
Contributor

Is this documented somewhere?

Contributor Author

This is the standard GUID formatting on Windows, but without the dashes: https://devblogs.microsoft.com/oldnewthing/20220928-00/?p=107221
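For illustration, a self-contained sketch of that formatting. The `Guid` struct here is a hypothetical mirror of the Win32 `GUID` layout, not the type used in the PR:

```rust
use std::fmt;

/// Field layout mirrors the Win32 GUID struct (hypothetical type for this sketch).
struct Guid {
    data1: u32,
    data2: u16,
    data3: u16,
    data4: [u8; 8],
}

impl fmt::Display for Guid {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        // Zero-padded lowercase hex for each field, concatenated without dashes.
        write!(f, "{:08x}{:04x}{:04x}", self.data1, self.data2, self.data3)?;
        for byte in self.data4.iter() {
            write!(f, "{:02x}", byte)?;
        }
        Ok(())
    }
}
```

The fixed widths matter here: without them, fields with leading zeros would produce a shorter string that no longer maps back to a unique GUID.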

Comment on lines 651 to 657
let debug_data_dir: IMAGE_DATA_DIRECTORY = if is_pe32 {
let nt_headers32: IMAGE_NT_HEADERS32 = read_memory(process_handle, nt_headers_address)?;
nt_headers32.OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_DEBUG.0 as usize]
} else {
let nt_headers64: IMAGE_NT_HEADERS64 = read_memory(process_handle, nt_headers_address)?;
nt_headers64.OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_DEBUG.0 as usize]
};
Contributor

Is there documentation for why this is the case?

Contributor Author

I don't think there is good official documentation about the PE format. It's mostly the definitions in the official headers (https://learn.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-image_nt_headers64) and then a bunch of third-party articles: https://learn.microsoft.com/en-us/archive/msdn-magazine/2002/february/inside-windows-win32-portable-executable-file-format-in-detail https://wiki.osdev.org/PE

For the Rust implementation, I simply converted the C++ code we wrote for crashtracking in .NET: https://github.com/DataDog/dd-trace-dotnet/blob/master/profiler/src/ProfilerEngine/Datadog.Profiler.Native.Windows/CrashReportingWindows.cpp#L271
which has proper testing: https://github.com/DataDog/dd-trace-dotnet/blob/master/profiler/test/Datadog.Profiler.Native.Tests/CrashReportingTest.cpp

We probably want to add similar testing in the libdatadog repository.
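As background on why the two header types need separate handling: the data directory array sits at a different offset in the PE32 and PE32+ optional headers. A rough sketch that works on raw optional-header bytes instead of reading process memory (the function name is hypothetical; the magic values, offsets, and directory index come from the `winnt.h` definitions):

```rust
use std::convert::TryInto;

const PE32_MAGIC: u16 = 0x10b;
const PE32_PLUS_MAGIC: u16 = 0x20b;
const IMAGE_DIRECTORY_ENTRY_DEBUG: usize = 6;

/// Returns (VirtualAddress, Size) of the debug data directory, given the raw
/// little-endian bytes of an optional header.
fn debug_directory(optional_header: &[u8]) -> Option<(u32, u32)> {
    let magic = u16::from_le_bytes(optional_header.get(..2)?.try_into().ok()?);
    // The fields before DataDirectory take 96 bytes in PE32 and 112 in PE32+:
    // ImageBase and the stack/heap sizes widen to 64 bits, BaseOfData is dropped.
    let directories_offset = match magic {
        PE32_MAGIC => 96,
        PE32_PLUS_MAGIC => 112,
        _ => return None,
    };
    // Each IMAGE_DATA_DIRECTORY entry is two u32 values (8 bytes).
    let entry = directories_offset + IMAGE_DIRECTORY_ENTRY_DEBUG * 8;
    let bytes = optional_header.get(entry..entry + 8)?;
    let rva = u32::from_le_bytes(bytes[..4].try_into().ok()?);
    let size = u32::from_le_bytes(bytes[4..].try_into().ok()?);
    Some((rva, size))
}
```

Reading a 32-bit image with the 64-bit layout (or the reverse) would land 16 bytes off and return the wrong directory entry, which is why the code branches on the header type.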

Comment on lines 726 to 728
if thread_entry.th32OwnerProcessID == pid {
thread_ids.push(thread_entry.th32ThreadID);
}
Contributor

We loop over every thread on the machine? Could that be expensive?

Contributor Author

Yes and yes. But as crazy as it sounds, this is the normal way.

In "recent" versions of Windows (since 2012 R2) there is an alternative that doesn't require enumerating all threads (using something called "process snapshotting", which can be thought of as Windows' sane version of vfork). However, we would need to confirm that it works correctly in the context of WER, which requires additional research. Our implementation of crashtracking in .NET uses CreateToolhelp32Snapshot, so I'd rather rely on this battle-tested solution for now.
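The filtering step itself is a linear scan over a system-wide snapshot. A platform-neutral sketch, where `ThreadEntry` is a hypothetical stand-in for the two `THREADENTRY32` fields used in the diff:

```rust
/// Hypothetical stand-in for the THREADENTRY32 fields the filter needs.
struct ThreadEntry {
    owner_process_id: u32,
    thread_id: u32,
}

/// Keeps the IDs of the threads that belong to `pid`.
/// Cost is proportional to the number of threads on the machine, not in the process.
fn threads_of(snapshot: &[ThreadEntry], pid: u32) -> Vec<u32> {
    snapshot
        .iter()
        .filter(|entry| entry.owner_process_id == pid)
        .map(|entry| entry.thread_id)
        .collect()
}
```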

Cargo.toml Outdated
Comment on lines 10 to 11
"crashtracker-ffi/tests/test_app",
"crashtracker-ffi/tests/test_app_lib",
Contributor

Why is this necessary?

Contributor Author

I need a test app to crash. The lib is because I need a DLL for WER. Maybe it's possible to reference datadog-crashtracker-ffi as a DLL but I didn't find how (when I reference the crate it gets statically linked)

Contributor Author

I reorganized the tests and removed test_app_lib (but for that I had to add a cdylib crate-type to crashtracker-ffi).

Ok(())
}

unsafe fn output_debug_string(message: &str) {
Contributor

Best practice is to put unsafe on the function if there is a contract the caller must follow, and have an unsafe block inside a "safe" function otherwise.
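A small sketch of the two cases, using made-up functions unrelated to the PR's code:

```rust
/// # Safety
/// `ptr` must be non-null, aligned, and point to an initialized `i32`.
unsafe fn read_raw(ptr: *const i32) -> i32 {
    // SAFETY: guaranteed by this function's documented contract.
    unsafe { *ptr }
}

/// Safe wrapper: the caller has no contract to uphold, so the function is
/// safe and the unsafe block stays internal.
fn first(values: &[i32]) -> Option<i32> {
    if values.is_empty() {
        return None;
    }
    // SAFETY: the slice is non-empty, so as_ptr() points to a valid first element.
    Some(unsafe { read_raw(values.as_ptr()) })
}
```

`read_raw` stays `unsafe` because only the caller can guarantee the pointer is valid; `first` checks everything itself, so marking it `unsafe` would only push needless `unsafe` blocks onto its callers.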

@kevingosse kevingosse marked this pull request as ready for review March 4, 2025 13:58
@kevingosse kevingosse requested review from a team as code owners March 4, 2025 13:58
Contributor

@danielsn danielsn left a comment

Looking pretty good!

@@ -12,10 +12,11 @@ license.workspace = true
bench = false

[features]
-default = ["cbindgen", "collector", "demangler", "receiver"]
+default = ["cbindgen", "collector", "collector_windows", "demangler", "receiver"]
Contributor

That would be cleaner if it's possible.


[[bin]]
name = "test_app"
path = "tests/test_app/src/main.rs"
Contributor

What happens with this on Linux?

Contributor Author

The test application will compile but be empty.
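One common way to get that behavior is conditional compilation. A rough sketch, assuming the Windows-only logic is gated with `cfg(windows)`; the function name is hypothetical and the PR may structure this differently:

```rust
/// Windows build: the real test app would register the handler and crash here
/// (body elided in this sketch).
#[cfg(windows)]
fn test_app_has_body() -> bool {
    true
}

/// Any other platform: the same symbol exists, so the binary still compiles,
/// but there is nothing to run.
#[cfg(not(windows))]
fn test_app_has_body() -> bool {
    false
}
```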

}

fn output_debug_string(message: &str) {
unsafe { OutputDebugStringW(&HSTRING::from(message)) };
Contributor

Safety comment

"exception_information is null"
);

let process_handle = unsafe { (*exception_information).hProcess };
Contributor

Safety comments

Comment on lines 29 to 31
unsafe {
*ptr = 42;
}
Contributor

You might want to wrap this in `black_box`, like some of the other tests do, to keep the compiler from doing something clever.
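A sketch of what that could look like, assuming `std::hint::black_box` is the helper in question. Note that `black_box` is only a best-effort hint to the optimizer, and the helper names below are hypothetical:

```rust
use std::hint::black_box;

/// Returns a null pointer that the optimizer is asked not to reason about.
fn opaque_null() -> *mut i32 {
    black_box(std::ptr::null_mut::<i32>())
}

/// Crashes on purpose by writing through the null pointer.
#[allow(dead_code)]
fn crash() {
    let ptr = opaque_null();
    // SAFETY: none, the write is intentionally invalid and expected to fault.
    unsafe { ptr.write_volatile(42) };
}
```

Without the hint, the compiler can see a constant null dereference, which is undefined behavior, and is free to remove the write or replace it with something that does not fault the way the test expects.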


if open_result == ERROR_SUCCESS {
// Check if the value exists
let query_result = unsafe { RegQueryValueExW(hkey, &name, None, None, None, None) };
Contributor

safety comments

} {
let mut frame = StackFrame::new();

frame.ip = Some(format!("{:x}", native_frame.AddrPC.Offset));
Contributor

Do you want to specify a width here?

Contributor Author

I don't understand, why would I? I'm just converting the ip to hex

Contributor

I think he means to have consistent padding, like always 16 hex chars.

Contributor Author

Oh 🤔 Those values are not meant to be interpreted by a human, so I don't think the padding makes sense
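For reference, the difference between the two options under discussion, with hypothetical helper names:

```rust
/// Compact form, as in the diff: no leading zeros.
fn ip_compact(ip: u64) -> String {
    format!("{:x}", ip)
}

/// Padded form: always 16 hex characters for a 64-bit address.
fn ip_padded(ip: u64) -> String {
    format!("{:016x}", ip)
}
```

For `0x7ff61234`, the compact form gives `7ff61234` and the padded form gives `000000007ff61234`. Both parse back to the same value, so the choice only affects readability and string length.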


// Force a segfault to crash
let ptr = std::ptr::null_mut::<i32>();
// SAFETY: Don't worry, we are crashing on purpose
Contributor

🤣
