Commit 9462e9c

authored Jun 18, 2024
Adds zero troubleshooting (#1459)
* adds zero troubleshooting, upstream errors
* runs prettier
1 parent a6cab4c commit 9462e9c

2 files changed: +127 -47 lines changed


‎content/docs/troubleshooting.mdx

+127-47
@@ -14,6 +14,62 @@ import GenerateRecoveryToken from '@site/content/_generate-recovery-token.md';
1414

1515
This article provides troubleshooting information for various tools and features in Pomerium.
1616

17+
## Pomerium Zero
18+
19+
### Configure port 443 to allow inbound access
20+
21+
**Problem**
22+
23+
Whenever you deploy a cluster, the Pomerium Zero cloud sends an inbound request to the cluster on port 443 to establish a secure connection. This is the default behavior. If the port is unavailable (for example, another process is already listening on port 443, or you haven't allowed a non-root process to bind to port 443), Pomerium Zero won't be able to establish a connection to your cluster.
24+
25+
**Solution**
26+
27+
Make port 443 available to Pomerium and allow inbound access on it. (For example, on Linux systems you can let a non-root process bind to port 443 by granting it the `CAP_NET_BIND_SERVICE` capability.)
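As a minimal sketch of the Linux approach mentioned above, the helper below grants the capability with `setcap` and prints the result with `getcap`. The function name `grant_bind_capability` is hypothetical, and the binary path in the example is an assumption that depends on how you installed Pomerium.

```bash
# Grant a binary permission to bind to privileged ports (below 1024)
# without running it as root. Requires setcap/getcap (libcap) and sudo.
grant_bind_capability() {
  local bin="$1"
  # Refuse to continue if the path is not an existing regular file.
  if [ ! -f "$bin" ]; then
    echo "binary not found: $bin" >&2
    return 1
  fi
  sudo setcap 'cap_net_bind_service=+ep' "$bin" || return 1
  # Print the capabilities now attached to the file.
  getcap "$bin"
}

# Example (the path is an assumption; adjust it to your installation):
#   grant_bind_capability /usr/sbin/pomerium
```

File capabilities are stored on the binary itself, so they are lost when the file is replaced, for example during an upgrade.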
28+
29+
If you've reserved port 443 for something else, you can change the port Pomerium sends inbound requests to by specifying a different listening port (like `:8443`) in the [**Address**](/docs/reference/address) field of the Zero Console:
30+
31+
1. Select **Settings**
32+
1. Select **Advanced**
33+
1. Enter the preferred port address
34+
1. Apply your changes
35+
36+
![Changing the default port address for incoming connections in the Zero Console](./troubleshooting/img/zero/zero-change-port-address.png)
37+
38+
:::info
39+
40+
Pomerium Zero also makes several outbound connections to the following `pomerium.app` domains on port `443` to fetch a cluster's configuration and status:
41+
42+
- console.pomerium.app:443
43+
- connect.pomerium.app:443
44+
- telemetry.pomerium.app:443
45+
46+
:::
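If you suspect a firewall is blocking these outbound connections, a quick TCP check from the host running Pomerium can help narrow it down. This is only a sketch: the function name `can_connect` is hypothetical, it assumes bash and the coreutils `timeout` command, and it tests TCP reachability only (not TLS or proxy settings).

```bash
# Return 0 if a TCP connection to host:port succeeds within 5 seconds.
# Uses bash's /dev/tcp pseudo-device and the coreutils `timeout` command.
can_connect() {
  local host="$1" port="$2"
  # Inside the child shell, $0 is the host and $1 is the port.
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example: check each endpoint listed above.
#   for host in console.pomerium.app connect.pomerium.app telemetry.pomerium.app; do
#     if can_connect "$host" 443; then echo "ok   $host"; else echo "FAIL $host"; fi
#   done
```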
47+
48+
### Delete a cluster
49+
50+
At some point, you may want to delete a cluster. Currently, you can only delete a cluster if you have multiple clusters.
51+
52+
To delete a cluster:
53+
54+
1. Select the clusters dropdown in the Zero Console navigation bar
55+
1. Select **Manage Clusters**
56+
1. Select the checkbox next to the cluster you want to delete, then select the **Delete** button in the table
57+
1. In the popup, select **Delete** to confirm
58+
59+
### Pomerium Zero loses configuration after upgrading
60+
61+
If you installed Pomerium using the Linux install script during the Pomerium Zero beta, you will need to re-run the install script the first time you upgrade Pomerium. (Subsequent upgrades will not require this step.)
62+
63+
1. First, find your current cluster token: look for a line beginning with `Environment=POMERIUM_ZERO_TOKEN=` in the file `/usr/lib/systemd/system/pomerium.service`.
64+
1. Copy this token into the following command and run it:
65+
66+
```bash
$ curl https://console.pomerium.app/install.bash | \
    env POMERIUM_ZERO_TOKEN=<cluster_token> bash -s install
```
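The token lookup in step 1 can also be scripted. The sketch below assumes the token line has exactly the form described above and is not quoted; the function name `read_zero_token` is hypothetical.

```bash
# Print the cluster token stored in a systemd unit file.
# Assumes the token line has the exact form described in step 1:
#   Environment=POMERIUM_ZERO_TOKEN=<cluster_token>
read_zero_token() {
  local unit_file="${1:-/usr/lib/systemd/system/pomerium.service}"
  # Strip the prefix from matching lines and keep only the first match.
  sed -n 's/^Environment=POMERIUM_ZERO_TOKEN=//p' "$unit_file" | head -n 1
}

# Example:
#   read_zero_token
```

If the function prints nothing, the unit file does not contain a line in that form, and you will need to locate the token manually.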
70+
71+
---
72+
1773
## Pomerium Core
1874

1975
### JWT Authentication
@@ -204,6 +260,8 @@ $ sudo rm /tmp/pomerium-envoy-admin.sock
204260

205261
Then start Pomerium again.
206262

263+
---
264+
207265
## Pomerium Enterprise
208266

209267
### Generate Recovery Token
@@ -238,53 +296,75 @@ For example, the `administrators` key allows you to specify a list of names, ema
238296

239297
If you wanted to add an email address like `John.Admin@example.com` to the `administrators` file key, Pomerium wouldn't recognize an email like `john.admin@example.com` because the strings aren't an exact match.
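To illustrate what "exact match" means here, the snippet below compares two strings that differ only in letter case. It is a generic shell comparison for illustration, not Pomerium's own matching code, and `is_exact_match` is a hypothetical name.

```bash
# Illustration only: an exact (case-sensitive) string comparison,
# not Pomerium's actual matching code.
is_exact_match() {
  [ "$1" = "$2" ]
}

is_exact_match "John.Admin@example.com" "John.Admin@example.com" && echo "match"
is_exact_match "John.Admin@example.com" "john.admin@example.com" || echo "no match"
```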
240298

241-
## Envoy error messages
242-
243-
Because Pomerium relies on Envoy to manage HTTP connections, you will notice Envoy connection errors and messages at some point in your logs as you configure Pomerium.
244-
245-
The [Envoy Response Code Details](https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_conn_man/response_code_details.html) provides an exhaustive list of Envoy-related message details.
246-
247-
We've repurposed a truncated version of the Response Code Details list in the table below for your convenience:
248-
249-
| **Name** | **Description** |
250-
| :-- | :-- |
251-
| absolute_path_rejected | The request was rejected due to using an absolute path on a route not supporting them. |
252-
| admin_filter_response | The response was generated by the admin filter. |
253-
| cluster_not_found | The request was rejected by the router filter because there was no cluster found for the selected route. |
254-
| downstream_local_disconnect | The client connection was locally closed for the provided reason. |
255-
| downstream_remote_disconnect | The client disconnected unexpectedly. |
256-
| duration_timeout | The max connection duration was exceeded. |
257-
| direct_response | A direct response was generated by the router filter. |
258-
| filter_added_invalid_request_data | A filter added request data at the wrong stage in the filter chain. |
259-
| filter_added_invalid_response_data | A filter added response data at the wrong stage in the filter chain. |
260-
| filter_chain_not_found | The request was rejected due to no matching filter chain. |
261-
| filter_removed_required_request_headers | The request was rejected in the filter manager because a configured filter removed required request headers. |
262-
| filter_removed_required_response_headers | The response was rejected in the filter manager because a configured filter removed required response headers or these values were invalid (e.g. overflown status). |
263-
| internal_redirect | The original stream was replaced with an internal redirect. |
264-
| low_version | The HTTP/1.0 or HTTP/0.9 request was rejected due to HTTP/1.0 support not being configured. |
265-
| maintenance_mode | The request was rejected by the router filter because the cluster was in maintenance mode. |
266-
| max_duration_timeout | The per-stream max duration timeout was exceeded. |
267-
| missing_host_header | The request was rejected due to a missing Host: or :authority field. |
268-
| missing_path_rejected | The request was rejected due to a missing Path or :path header field. |
269-
| no_healthy_upstream | The request was rejected by the router filter because there was no healthy upstream found. |
270-
| overload | The request was rejected due to the Overload Manager reaching configured resource limits. |
271-
| rejecting_because_detection_failed | The request was rejected because the original IP couldn’t be detected. |
272-
| path_normalization_failed | The request was rejected because path normalization was configured on and failed, probably due to an invalid path. |
273-
| request_headers_failed_strict_check | The request was rejected due to x-envoy-\* headers failing strict header validation. |
274-
| request_overall_timeout | The per-stream total request timeout was exceeded. |
275-
| request_payload_exceeded_retry_buffer_limit | Envoy is doing streaming proxying but too much data arrived while waiting to attempt a retry. |
276-
| request_payload_too_large | Envoy is doing non-streaming proxying and the request payload exceeded configured limits. |
277-
| response_payload_too_large | Envoy is doing non-streaming proxying and the response payload exceeded configured limits. |
278-
| route_configuration_not_found | The request was rejected because there was no route configuration found. |
279-
| route_not_found | The request was rejected because there was no route found. |
280-
| stream_idle_timeout | The per-stream keepalive timeout was exceeded. |
281-
| upgrade_failed | The request was rejected because it attempted an unsupported upgrade. |
282-
| upstream_max_stream_duration_reached | The request was destroyed because of it exceeded the configured max stream duration. |
283-
| upstream_per_try_timeout | The final upstream try timed out. |
284-
| upstream_reset_after_response_started | The upstream connection was reset after a response was started. This may include further details about the cause of the disconnect. |
285-
| upstream_reset_before_response_started | The upstream connection was reset before a response was started This may include further details about the cause of the disconnect. |
286-
| upstream_response_timeout | The upstream response timed out. |
287-
| via_upstream | The response code was set by the upstream. |
299+
---
300+
301+
## Upstream connection errors
302+
303+
Upstream connection errors usually indicate that something is wrong with the upstream server rather than with Pomerium. Refer to the list of errors below to learn more about a specific issue and how to resolve it.
304+
305+
:::note
306+
307+
Configuration errors in Pomerium itself can also cause upstream connection errors. In this case, you'd need to debug your Pomerium configuration to resolve the error.
308+
309+
:::
310+
311+
### No healthy upstream
312+
313+
The `no_healthy_upstream` error means that there is an issue with the upstream server that makes it unreachable from Pomerium. The error may be caused by or related to the upstream server's:
314+
315+
- Configuration or application code
316+
317+
**Resolution**: Check that there are no errors in the server's configuration files or application code that prevent it from running as expected.
318+
319+
- Network or firewall settings
320+
321+
**Resolution**: Check your network or firewall settings to make sure your server is reachable.
322+
323+
- DNS records
324+
325+
**Resolution**: This error may be caused by unresolvable DNS records applied to the upstream server. Make sure the server's DNS records are pointing to the correct IP address.
326+
327+
- Failing health checks configured in Pomerium
328+
329+
**Resolution**: If you've configured [Load Balancing Health Checks](/docs/reference/routes/load-balancing#health-checks) in Pomerium, the `no_healthy_upstream` error could be the result of a failing health check from an upstream server. Please check the server's configuration for any errors.
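The DNS and network causes above can be checked from the host running Pomerium. The following is a rough sketch, not an official diagnostic: the function name `check_upstream` is hypothetical, and it assumes a Linux host with bash, `getent`, and the coreutils `timeout` command.

```bash
# Check the DNS and network causes listed above for one upstream.
# host and port are whatever your route's upstream points to.
check_upstream() {
  local host="$1" port="$2"
  # DNS: does the hostname resolve at all?
  if ! getent hosts "$host" >/dev/null 2>&1; then
    echo "DNS: $host does not resolve"
    return 1
  fi
  # Network/firewall: can we open a TCP connection within 5 seconds?
  if ! timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
    echo "TCP: cannot connect to $host:$port"
    return 1
  fi
  echo "OK: $host:$port is reachable"
}

# Example (hostname and port are placeholders):
#   check_upstream app.internal.example.com 8080
```

A passing check only shows that the server accepts TCP connections; application errors and failing health checks still need to be investigated on the upstream server itself.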
330+
331+
### Upstream Max Stream Duration Reached
332+
333+
The `upstream_max_stream_duration_reached` error means that Pomerium cancelled the request because it exceeded the upstream server's maximum stream duration.
334+
335+
**Resolution**: By default, Pomerium sets a 10-second timeout for all requests. If your requests are taking longer than expected, see the [Connections - Timeouts](/docs/internals/connection#timeouts) page to learn how timeouts work with upstream connections, and how to configure timeouts to avoid this error.
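To see whether an upstream is slower than the timeout described above, you could time a request sent directly to it. This is a sketch under assumptions: the function names are hypothetical, the URL is a placeholder, and the 10-second limit is the default stated above.

```bash
# Print the total time, in seconds, that a request to the given URL takes.
# --max-time caps the wait so the measurement itself cannot hang forever.
measure_seconds() {
  curl --silent --output /dev/null --max-time 60 --write-out '%{time_total}\n' "$1"
}

# Succeed if the elapsed time (first argument) is greater than the limit
# (second argument); awk is used because the shell only compares integers.
exceeds_limit() {
  awk -v elapsed="$1" -v limit="$2" 'BEGIN { exit !(elapsed + 0 > limit + 0) }'
}

# Example (the URL is a placeholder for your upstream's address):
#   elapsed=$(measure_seconds "http://upstream.internal:8080/slow")
#   if exceeds_limit "$elapsed" 10; then echo "slower than the 10-second timeout"; fi
```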
336+
337+
### Upstream Per Try Timeout
338+
339+
The `upstream_per_try_timeout` error means that the final attempt to connect to the upstream server timed out.
340+
341+
**Resolution**: See the [Connections - Timeouts](/docs/internals/connection#timeouts) page to learn how timeouts work with upstream connections, and how to configure timeouts in Pomerium to avoid this error.
342+
343+
### Upstream Reset After Response Started
344+
345+
The `upstream_reset_after_response_started` error means that the upstream server reset the connection _after_ it began transmitting the response.
346+
347+
**Resolution**: See the [Connections - Timeouts](/docs/internals/connection#timeouts) page to learn how timeouts work with upstream connections, and how to configure timeouts in Pomerium to avoid this error.
348+
349+
### Upstream Reset Before Response Started
350+
351+
The `upstream_reset_before_response_started` error means the upstream server reset the connection _before_ it began transmitting the response.
352+
353+
**Resolution**: See the [Connections - Timeouts](/docs/internals/connection#timeouts) page to learn how timeouts work with upstream connections, and how to configure timeouts in Pomerium to avoid this error.
354+
355+
### Upstream Response Timeout
356+
357+
The `upstream_response_timeout` error means that the upstream server's response timed out.
358+
359+
**Resolution**: See the [Connections - Timeouts](/docs/internals/connection#timeouts) page to learn how timeouts work with upstream connections, and how to configure timeouts in Pomerium to avoid this error.
360+
361+
### Via Upstream
362+
363+
The `via_upstream` error means that the upstream service set the response code.
364+
365+
**Resolution**: To resolve this error, check the upstream service's application logs for more information about how the response status code is set.
366+
367+
---
288368

289369
## Miscellaneous
290370
