Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discard unused body of Unary and ClientStream methods in background #5331

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

paskozdilar
Copy link
Contributor

@paskozdilar paskozdilar commented Mar 6, 2025

When io.Copy is called synchronously on HTTP request body, it stalls the clients who keep the request stream open.
This broke WebSocket gateway implementations which rely on Close detection to know when to terminate.

This PR runs the io.Copy method in the background.

References to other Issues or PRs

Fixes #5326.

Have you read the Contributing Guidelines?

Yes.

Brief description of what is fixed or changed

Added go keyword in front of io.Copy call in the template when body annotation is not set.

Other comments

@paskozdilar
Copy link
Contributor Author

Forgot to regenerate files. Doing it now.

@paskozdilar
Copy link
Contributor Author

Missing import for sync.WaitGroup.

I don't feel like tinkering with the import machinery - and benchmarks don't show any significant slowdown when using channels instead.

I'll update the implementation to use plain channels.

@paskozdilar
Copy link
Contributor Author

I'll try to write an integration test for this too.

@paskozdilar
Copy link
Contributor Author

paskozdilar commented Mar 7, 2025

Apparently, transactional request-response flow is inherent to Go HTTP, including grpc-gateway runtime.
As such, it is impossible to "stream" data via HTTP client as we would with a WebSocket wrapper, so it's impossible to write an integration test for this without implementing a WebSocket gateway inside the server.

@johanbrandhorst
Would it be acceptable to add a minimal WebSocket wrapper inside the integration test server?

@johanbrandhorst
Copy link
Collaborator

Sure, it'd be great to know if we break the websocket proxy again in the future, why not!

@paskozdilar
Copy link
Contributor Author

I've opened a repository for the websocket proxy itself:
https://github.com/paskozdilar/grpc-gateway-websocket

It's fairly covered with tests, which fail on io.Copy and pass on go io.Copy.
I'll use this code as reference.

@paskozdilar
Copy link
Contributor Author

Please see: #5338

@paskozdilar
Copy link
Contributor Author

Integration tests are failing even with the new changes :) good thing we decided to include them.

Unfortunately, seems like the very act of waiting for io.Copy() to finish makes the request hang on the client side - even when gRPC method returns, the connection never gets disconnected until client closes the connection.

This is certainly not desirable behavior - we want clients to be disconnected when server returns. That only leaves us with go io.Copy as an option.

I will revert the commits replacing the go io.Copy with blocking version - tests should then pass.

@johanbrandhorst
Copy link
Collaborator

Is it only affecting server side streaming methods, or bi-directional methods too?

@paskozdilar
Copy link
Contributor Author

It happens on Unary methods and Server Stream methods (i.e. non-client-stream methods).

Bidirectional and Client Stream methods don't use this template branch, so the problem doesn't apply - and neither does the root issue in which gateway doesn't read client body...


In other news, I'm having trouble with properly testing WebSocket context cancellation:

  1. in case of Unary and ServerStream methods, closing the WS connection should cancel the context of the method
  2. in case of ClientStream and Bidirectional methods, closing the WS connection should send io.EOF to the gRPC stream

Since WebSocket proxy layer cannot possibly know which method is going to be executed, we cannot decide what to do on WS close message:

  1. if we close the request pipe, Unary and ServerStream methods will treat that as EOF of request and still execute
  2. if we cancel the request context, ClientStream and Bidirectional methods will fail immediately and not return the response.

In an ideal world, grpc-gateway would generate the WebSocket proxy itself, and it would generate two different implementations depending on the method type. Maybe that would be a cool feature in the future?


So yeah, this is going to be a hairy one. I'll try to figure something out.

@johanbrandhorst
Copy link
Collaborator

I think @tmc has wanted to integrate the websocket proxy into the gateway for a long time but we just haven't decided the best way forward. I think it would be a bigger task than this issue. If we have to choose between supporting the websocket proxy and removing a bug in the gateway (#5236), then we're going to fix the bug. It might be good enough that we support the websocket proxy only for bi-directional streaming methods?

@paskozdilar
Copy link
Contributor Author

paskozdilar commented Mar 19, 2025

Eh, you're right. I'm way too deep in the rabbit hole.
I'll take a simpler approach - I'll just use tmc/grpc-gateway-websocket in the first place, and only add tests for context hanging from the bug report. That would be just no-body unary and server-stream methods (which were affected by the original change).

So I can use both 1) coder/websocket library, as suggested, and 2) your suggestion to wait for io.Copy at end of the request.
The fact that it doesn't quite work with my implemetation of websocket proxy is fault of my implementation.

For any further bugs and features, I'll open new issues and possibly further PRs.
I'd love to give full websocket integration a try :)

@paskozdilar
Copy link
Contributor Author

Will test how this works with https://github.com/jnis23/grpc-gateway-proxy-example provided by the bug report.

@paskozdilar
Copy link
Contributor Author

paskozdilar commented Mar 19, 2025

From example provided by the bug report, even "working" server-stream (with body) doesn't properly close websocket methods.
The problem is the same - after the request is unmarshalled, the body remains unread, so the EOF never reaches server.

And since server-stream request needs to return before request body is closed (in case of websockets), we should add go io.Copy there as well, after unmarshalling request, regardless of whether or not body: "*" is defined. The one with deferred wait won't work...

I'll add that to templates and integration test as well...

@paskozdilar
Copy link
Contributor Author

Integration tests are failing on integration-tests branch, bazel workflows are passing on main, and my local tests show that tmc/grpc-gateway-websocket proxy works with these changes.
As far as I'm concerned, I think this is ready for merge.

Let me know if there's anything else I need to do.

{{- end }}
go io.Copy(io.Discard, req.Body)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to do this unconditionally? I thought we were only doing this on body-less endpoints.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

One of the issues that I've noticed in the https://github.com/jnis23/grpc-gateway-proxy-example is that even the "working" example (the one with "body" defined) seems to have issues with closing connections.
After adding the go io.Copy to those too, the issue disappeared.

In both Unary and ServerStream methods, client sends a finite amount of data before server stops reading the req.Body (zero if body is not defined in proto, request-message-length if body is defined in proto).
In both cases, not reading the req.Body further causes the connections to hang.
Since server does not care for any input after those, we can safely do go io.Copy on the request body.

Here's a screen recording of the issue:

report.mp4

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just had a crazy thought - what about if we defer req.Body.Close? How does that change the behavior?

@paskozdilar paskozdilar changed the title Discard body of body-less methods in background Discard unused body of Unary and ClientStream methods in background Mar 21, 2025
@paskozdilar
Copy link
Contributor Author

I've modified the PR name to better align with changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

v2.26.2 breaks grpc-websocket-proxy
2 participants