[platform] Container Orchestrator Intermittent WorkerException #34590
Comments
Please share more information about your deployment.
@marcosmarxm I've added as much as possible. Let me know if you need more details.
At Airbyte, we seek to be clear about the project priorities and roadmap. This issue has not had any activity for 180 days, suggesting that it's not as critical as others. It's possible it has already been fixed. It is being marked as stale and will be closed in 20 days if there is no activity. To keep it open, please comment to let us know why it is important to you and if it is still reproducible on recent versions of Airbyte.
This issue was closed because it has been inactive for 20 days since being marked as stale.
Helm Chart Version
0.50.4
What step the error happened?
None
Relevant information
Worker jobs failed intermittently, and the failures resolved on their own after a couple of hours.
c810ba10_3e93_4c4c_976f_8605746e4520_job_509241_attempt_5_txt.log
Relevant log output
2024-01-17 02:25:28 ERROR i.a.w.t.s.a.AppendToAttemptLogActivityImpl(log):54 - Failing job: 509241, reason: Job failed after too many retries for connection b99dbe2c-8e26-4324-9fcb-3bf7e2580b0d
2024-01-17 02:30:27 ERROR i.a.w.g.DefaultCheckConnectionWorker(run):133 - Unexpected error while checking connection:
io.airbyte.workers.exception.WorkerException: Failed to create pod for check step
at io.airbyte.workers.process.KubeProcessFactory.create(KubeProcessFactory.java:188) ~[io.airbyte-airbyte-commons-worker-0.50.34.jar:?]
at io.airbyte.workers.process.AirbyteIntegrationLauncher.check(AirbyteIntegrationLauncher.java:143) ~[io.airbyte-airbyte-commons-worker-0.50.34.jar:?]
at io.airbyte.workers.general.DefaultCheckConnectionWorker.run(DefaultCheckConnectionWorker.java:71) ~[io.airbyte-airbyte-commons-worker-0.50.34.jar:?]
at io.airbyte.workers.general.DefaultCheckConnectionWorker.run(DefaultCheckConnectionWorker.java:44) ~[io.airbyte-airbyte-commons-worker-0.50.34.jar:?]
at io.airbyte.workers.temporal.TemporalAttemptExecution.get(TemporalAttemptExecution.java:135) ~[io.airbyte-airbyte-workers-0.50.34.jar:?]
at io.airbyte.workers.temporal.check.connection.CheckConnectionActivityImpl.lambda$runWithJobOutput$1(CheckConnectionActivityImpl.java:133) ~[io.airbyte-airbyte-workers-0.50.34.jar:?]
at io.airbyte.commons.temporal.HeartbeatUtils.withBackgroundHeartbeat(HeartbeatUtils.java:57) ~[io.airbyte-airbyte-commons-temporal-core-0.50.34.jar:?]
at io.airbyte.workers.temporal.check.connection.CheckConnectionActivityImpl.runWithJobOutput(CheckConnectionActivityImpl.java:118) ~[io.airbyte-airbyte-workers-0.50.34.jar:?]
at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) ~[?:?]
at java.lang.reflect.Method.invoke(Method.java:578) ~[?:?]
at io.temporal.internal.activity.RootActivityInboundCallsInterceptor$POJOActivityInboundCallsInterceptor.executeActivity(RootActivityInboundCallsInterceptor.java:64) ~[temporal-sdk-1.17.0.jar:?]
at io.temporal.internal.activity.RootActivityInboundCallsInterceptor.execute(RootActivityInboundCallsInterceptor.java:43) ~[temporal-sdk-1.17.0.jar:?]
at io.temporal.internal.activity.ActivityTaskExecutors$BaseActivityTaskExecutor.execute(ActivityTaskExecutors.java:95) ~[temporal-sdk-1.17.0.jar:?]
at io.temporal.internal.activity.ActivityTaskHandlerImpl.handle(ActivityTaskHandlerImpl.java:92) ~[temporal-sdk-1.17.0.jar:?]
at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handleActivity(ActivityWorker.java:241) ~[temporal-sdk-1.17.0.jar:?]
at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:206) ~[temporal-sdk-1.17.0.jar:?]
at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:179) ~[temporal-sdk-1.17.0.jar:?]
at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:93) ~[temporal-sdk-1.17.0.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
at java.lang.Thread.run(Thread.java:1589) ~[?:?]
Caused by: java.lang.NullPointerException: Cannot invoke "java.lang.Integer.intValue()" because the return value of "io.airbyte.workers.process.KubePortManagerSingleton.take()" is null
at io.airbyte.workers.process.KubeProcessFactory.create(KubeProcessFactory.java:131) ~[io.airbyte-airbyte-commons-worker-0.50.34.jar:?]
... 20 more
Reproduce steps (not sure):
1. Run many sync jobs in parallel or in sequence, and make sure some of them fail during sync.
2. After some time, the exception above may appear.
3. Restarting the worker resolves the issue. Looking for some insight on this.
Airbyte setup:
1. Version 0.50.34
2. 15 workers with 40 ports open and 4 GB memory
3. Running in orchestration mode
4. 20-30 jobs every hour: MSSQL to Snowflake and GitLab to Snowflake
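The `Caused by` line in the log above suggests that `KubePortManagerSingleton.take()` returned `null` and that the result was then unboxed to an `int`. Below is a minimal sketch of how that failure mode can arise. It assumes (this is not confirmed against the Airbyte source here) that the port pool is a queue polled with a timeout and that failed jobs sometimes do not return their ports. `PortPoolSketch` and all of its methods are hypothetical names used only for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a worker port pool; NOT Airbyte's actual code.
final class PortPoolSketch {
    private final BlockingQueue<Integer> ports = new LinkedBlockingQueue<>();

    PortPoolSketch(int firstPort, int count) {
        for (int i = 0; i < count; i++) {
            ports.add(firstPort + i);
        }
    }

    // Returns a free port, or null if none becomes available within the timeout.
    Integer take(long timeoutMs) {
        try {
            return ports.poll(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    // Hands a port back to the pool so a later job can reuse it.
    void offer(int port) {
        ports.offer(port);
    }

    int available() {
        return ports.size();
    }

    public static void main(String[] args) {
        PortPoolSketch pool = new PortPoolSketch(9000, 2);

        // Two jobs take ports and fail without calling offer(): the ports leak.
        int first = pool.take(10);
        int second = pool.take(10);

        // The pool is now empty, so the next take() times out and returns null.
        Integer third = pool.take(10);
        System.out.println("third port is null: " + (third == null));

        try {
            // Unboxing the null Integer reproduces the kind of NPE seen in the log.
            int port = pool.take(10);
            System.out.println("unexpected port: " + port);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException on unboxing");
        }
    }
}
```

If this is indeed the mechanism, it would be consistent with the observation that restarting the worker clears the problem, since a restart would rebuild the pool with all ports available. A null check on the result of `take()` would turn the NPE into a clearer "no free ports" error, but it would not fix an underlying leak.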