Occasional Stripe::APIConnectionError in production #382
Hey @kgx, So in general, a CA bundle either has our root certificate or it doesn't, and so the bundle itself shouldn't be causing this sort of non-deterministic problem. An intermittent network error should also never manifest as a bad certificate. I haven't seen very many cases like yours so far, but in the ones we have seen, it often turns out to be caused by a router or some other kind of appliance on a user's internal network that steps in to intercept requests under certain circumstances (i.e. an actual MITM, thus resulting in the error). Can you say pretty confidently that that's not going on in this case? Thanks! |
@brandur, OK, good to know. The application is running in a Docker container on Google Cloud, so there is some packet forwarding going on to reach the outside world, but otherwise there should not be any MITM. |
@brandur I'm getting this error for each request since upgrading from 1.34 to 1.36, and after looking over your commits related to this change, I'm confused about why that is. If the gem's CA bundle is used on my system, it should work both before and after the refresh of said bundle between those two gem versions, as both 1.34 and 1.36 should have CA certs for Stripe endpoints, right? What would be the best way to check which CA bundle is actually being used with 1.34 and 1.36? |
Actually, the last line of my errors is a bit different than the one posted above:
Ruby 2.3.0 on Ubuntu 10.04 |
I've been putting a logging statement right around here to see where the path is actually pointed to. From what I've been able to tell though, it seems like even the new versions indeed use the bundled CA file. Is the problem that you're seeing intermittent as well? Or are you able to reproduce it regularly? |
In my case, it's not intermittent. All requests fail with the same error on 1.36. Logging shows both versions use the bundled CA file, and both versions use rest-client 1.6.9. I guess the next step is to figure out what CA file rest-client and, more importantly, net/http are using. |
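One way to see what OpenSSL falls back to when no CA file is passed explicitly is to print the defaults compiled into Ruby's OpenSSL binding. This is a minimal sketch using only the standard library. It does not inspect rest-client or the Stripe gem, and the helper name `openssl_defaults` is made up for illustration.

```ruby
require "openssl"

# Collect the CA locations and library version that Ruby's OpenSSL binding
# was built with. These defaults only matter when the HTTP client has not
# been given an explicit CA file, CA path, or certificate store.
def openssl_defaults
  {
    openssl_version: OpenSSL::OPENSSL_VERSION,
    default_cert_file: OpenSSL::X509::DEFAULT_CERT_FILE,
    default_cert_dir: OpenSSL::X509::DEFAULT_CERT_DIR,
    cert_file_exists: File.exist?(OpenSSL::X509::DEFAULT_CERT_FILE)
  }
end

openssl_defaults.each { |name, value| puts "#{name}: #{value}" }
```

Comparing this output between a working and a failing machine should at least show whether the two are linked against different OpenSSL versions.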
Although, if there were some kind of issue with the updated bundled CA file, this is how it would behave. Any idea how we can definitively exclude that option? |
Hi @nnc, since you're able to reproduce this regularly, would you mind helping me narrow down the problem? I've written a small test script here that just iterates through each CA bundle and checks whether rest-client can make a successful request with it: https://github.com/brandur/ca-test Would you mind running it on one of the problematic computers and pasting the output? Instructions are in the repo's README. Sorry about the hassle here, but I'm trying to narrow down whether the problem is related to changes in the library, is machine-specific, or is possibly even related to some kind of change in server configuration. Unfortunately I'm having no luck at all reproducing the problem locally :/ |
And for the record, here's my output on an OSX box:
And here's my output on an Ubuntu box (12.04 LTS):
|
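The idea of checking each bundle can be approximated offline. The following is only a sketch, not the ca-test script itself: it checks whether the local OpenSSL can parse a bundle file at all, and makes no network request. The function name `bundle_loads?` and the example paths are hypothetical.

```ruby
require "openssl"

# Return true if the local OpenSSL can load the PEM bundle at `path`,
# false if the file is missing or cannot be parsed.
def bundle_loads?(path)
  store = OpenSSL::X509::Store.new
  store.add_file(path)
  true
rescue OpenSSL::X509::StoreError
  false
end

# Hypothetical paths; substitute the bundles you want to compare.
["data/ca-certificates.crt", "/etc/ssl/certs/ca-certificates.crt"].each do |path|
  puts "#{path}: #{bundle_loads?(path) ? 'loads' : 'fails to load'}"
end
```

A bundle that loads here can still fail verification against a real server, so this only separates "cannot be parsed" from everything else.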
Hi @brandur, thanks for digging into this! This is on Ubuntu 10.04:
I tried it also on Ubuntu 12.04, with same Ruby version, and new cert succeeds there. Let me know if you need me to try anything else on that Ubuntu 10.04 system. |
@kgx Just for completeness, would you mind also running the script above on your Debian environment? I suspect that we're dealing with two different problems here. @nnc, I believe that yours stems from being on quite an old version of Ubuntu (and as a result, probably OpenSSL) as well. I don't want to sound too prescriptive here, but 10.04 has been EOLed for almost a year now and I'd strongly encourage you to upgrade to ensure that you're getting adequate security updates. I'm still not sure what the precise problem is, but there may have been some change in the format of a CRT/PEM that's not compatible with older versions of OpenSSL. For an immediate solution, I'd recommend either pulling our old bundle into your app, or setting it to your system bundle (if it's reasonably up-to-date): Stripe.ca_bundle_path = "/etc/ssl/certs/ca-certificates.crt" @kgx Your issue is still a bit of a mystery, but given that the problem is intermittent and you're on quite a recent version of Debian, it might be something else. |
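Before pointing the gem at a system bundle, it may be worth confirming that the file is actually there. A minimal sketch, assuming the candidate locations below: the first is the path from the comment above, the second is where RHEL-style systems usually keep their bundle. The helper name `first_readable_bundle` is hypothetical.

```ruby
# Return the first candidate path that is a regular, readable file, or nil.
def first_readable_bundle(candidates)
  candidates.find { |path| File.file?(path) && File.readable?(path) }
end

# Candidate locations are assumptions; adjust them for your distribution.
candidates = [
  "/etc/ssl/certs/ca-certificates.crt",
  "/etc/pki/tls/certs/ca-bundle.crt"
]

bundle = first_readable_bundle(candidates)
if bundle
  puts "would use #{bundle}"
  # In an app with the stripe gem loaded, the setting from the comment
  # above would go here:
  #   Stripe.ca_bundle_path = bundle
else
  puts "no system bundle found; keep the gem's bundled CA file"
end
```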
@nnc And BTW, thanks a lot for running that script! That was very helpful in getting this problem a little more localized. |
@brandur thanks for looking into this. I'll apply one of the workarounds for now. |
@brandur, I just ran the script in our production environment and the CA bundles work as expected. Based on the intermittent results, I think we are dealing with either a networking issue (most likely) or a thread safety issue (least likely). We have not seen this SSL error for about 3 days now, but we have had 2 network timeouts while making a request to the Stripe API during this period. I am going to continue to monitor for new occurrences of the SSL error. In the meantime, do you have any ideas for things we should look at? |
Ok so we have experienced this error about 15 times since my last post on 2/25. It mostly happens during webhook processing and batch jobs, so the end user impact has not been huge, but it is obviously concerning. I am going to open a new issue with jruby-openssl. |
We've been experiencing this in production when trying to charge customers. Quite the issue |
It's definitely a thread safety issue. On JRuby 9.5.0.0, I can recreate it consistently on Debian and OS X with the following code:
(If the system is fast enough you may need to increase thread count to recreate) |
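The reproduction code itself is not shown here. As a stand-in, a generic harness along these lines can run any request concurrently and tally the failures. It is a sketch, not the original reproduction, and `collect_concurrent_errors` is a hypothetical name.

```ruby
# Run `request` concurrently from `thread_count` threads and return the
# exceptions that were raised, one entry per failed call.
def collect_concurrent_errors(thread_count, &request)
  threads = Array.new(thread_count) do
    Thread.new do
      begin
        request.call
        nil
      rescue StandardError => e
        e
      end
    end
  end
  threads.map(&:value).compact
end

# Stand-in block; replace its body with the request under test (for example
# a Stripe API call) to exercise the real code path.
errors = collect_concurrent_errors(50) { :ok }
puts "#{errors.size} of 50 calls failed"
errors.group_by(&:class).each { |klass, list| puts "#{klass}: #{list.size}" }
```

Grouping by exception class makes it easier to tell certificate verification failures apart from plain network timeouts.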
OK so this thread-safety issue specifically occurs when using the new CA bundle included in the Stripe gem.
|
@nadavshatz - Can you try the following monkey patch? This seems to be working for me to prevent the problem:
But keep in mind you will be falling back to your system's default CA file, which needs to be adequately up-to-date and defeats the purpose of Stripe bundling one in the gem. |
This attempts to give some semblance of thread safety to parallel library requests by pre-assigning a certificate store, the initialization of which seems to be a weak point for thread safety. Note that while this does seem to empirically improve the situation, it doesn't guarantee actual thread safety. A longer term solution to the problem is probably to assign a per-thread HTTP client for much stronger guarantees, but this is a little further out. More discussion on that topic in #313. Fixes #382.
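The pre-assignment idea can be illustrated without the gem. The following is a sketch under stated assumptions, not the code from the patch: `SharedCaStore` is a hypothetical module that builds one `OpenSSL::X509::Store` under a mutex and hands the same object to every caller.

```ruby
require "openssl"

# Hypothetical holder for a shared certificate store; not the gem's code.
module SharedCaStore
  @mutex = Mutex.new
  @store = nil

  # Build the store at most once, even if several threads call this at the
  # same time, and return the same object afterwards. `bundle_path` is only
  # consulted on the first call.
  def self.fetch(bundle_path = nil)
    @mutex.synchronize do
      @store ||= begin
        store = OpenSSL::X509::Store.new
        if bundle_path
          store.add_file(bundle_path)
        else
          store.set_default_paths # fall back to the system's CA locations
        end
        store
      end
    end
  end
end

# Seed the store on the main thread before any worker threads start.
SharedCaStore.fetch
stores = Array.new(4) { Thread.new { SharedCaStore.fetch } }.map(&:value)
puts stores.map(&:object_id).uniq.size
```

As the description above says, sharing a pre-built store narrows the window for races but is not a guarantee of thread safety for the requests themselves.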
@kgx Thanks for continuing to dig into this. Nice find on the thread safety problem. So I think that the fundamental trouble here is that calls to rest-client are not thread safe. It's hard to say what exactly the introduction of a new cert bundle changed, but it seems that something like a larger bundle managed to push a previously tenuous situation far enough over the edge to start being problematic. Would you mind trying out the patch I wrote in #397? It uses a technique inspired by what your freedom patch above causes rest-client to do internally: pre-initializing a CA store containing the gem's cert bundle. When I apply it to your test script with one line added to seed its value like so (the
# seed a certificate store before starting to make any concurrent requests
Stripe.ca_store
api_key = 'some_api_key'
headers = {
:user_agent => "Stripe/v1 RubyBindings/#{Stripe::VERSION}",
:authorization => "Bearer #{api_key}",
:content_type => 'application/x-www-form-urlencoded'
}
... ... I'm able to successfully avoid the peer validation problems on JRuby. Note that this is definitely a hack rather than an actual guarantee of thread safety, but it should get you back to more or less where you were before the new bundle was introduced. I think that the "proper" solution to the problem is to introduce a configurable HTTP client as described in #313 and then make sure to seed one per thread. Let me know what you think. |
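To make the "one client per thread" idea concrete, here is a minimal sketch using plain `Net::HTTP`. It is an assumption about what such a design could look like, not what #313 proposes in detail, and `thread_local_client` and the `:sketch_http_client` key are hypothetical names.

```ruby
require "net/http"
require "openssl"

# Return an HTTP client owned by the calling thread, creating it on first use.
# Net::HTTP.new does not open a connection; that happens on the first request.
# The host and port are only consulted when the client is first created.
def thread_local_client(host = "api.stripe.com", port = 443)
  client = Thread.current.thread_variable_get(:sketch_http_client)
  return client if client

  http = Net::HTTP.new(host, port)
  http.use_ssl = true
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER
  Thread.current.thread_variable_set(:sketch_http_client, http)
  http
end

# Each thread gets its own client object, so no SSL state is shared.
clients = Array.new(3) { Thread.new { thread_local_client } }.map(&:value)
puts clients.map(&:object_id).uniq.size
```

The sketch uses `thread_variable_get`/`thread_variable_set` rather than `Thread.current[]` because the latter is scoped to the current fiber, not the whole thread.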
@nadavshatz Can we get more information on your situation please? Are you seeing total or intermittent failure? Are you also on JRuby? Is there anything else that might help us debug? Thanks! |
@brandur absolutely: We experience this issue in an intermittent fashion. We use Puma as the server which does use threads so I'm hoping that when you find the solution for @kgx it will work for us as well. Let me know if there is anything else I can include to help. |
@nadavshatz Thanks! It sounds like your problem is likely related to @kgx's given the very similar setup. If it's convenient to do so, you may want to try the branch in #397 mentioned above, combined with the addition of a |
Every now and then, after upgrading to the new Stripe gem 1.3.5.1 with the updated ca bundle, a Stripe API call is failing in production with the following error:
The API works fine 99.9% of the time, so it could also be a network error. However, all other external APIs are working reliably with SSL.
Our runtime environment contains the following:
Please advise on how I can troubleshoot this further, as I would like to avoid random payment processing issues for users.