[platform] Container Orchestrator Intermittent WorkerException #34590

Closed
sivankumar86 opened this issue Jan 28, 2024 · 4 comments
Labels
area/platform issues related to the platform community Stale team/platform-move type/bug Something isn't working

Comments

@sivankumar86
Contributor

sivankumar86 commented Jan 28, 2024

Helm Chart Version

0.50.4

What step the error happened?

None

Relevant information

Worker jobs failed intermittently, and the failures resolved after a couple of hours.
c810ba10_3e93_4c4c_976f_8605746e4520_job_509241_attempt_5_txt.log

Relevant log output

2024-01-17 02:25:28 ERROR i.a.w.t.s.a.AppendToAttemptLogActivityImpl(log):54 - Failing job: 509241, reason: Job failed after too many retries for connection b99dbe2c-8e26-4324-9fcb-3bf7e2580b0d
2024-01-17 02:30:27 ERROR i.a.w.g.DefaultCheckConnectionWorker(run):133 - Unexpected error while checking connection:
io.airbyte.workers.exception.WorkerException: Failed to create pod for check step
at io.airbyte.workers.process.KubeProcessFactory.create(KubeProcessFactory.java:188) ~[io.airbyte-airbyte-commons-worker-0.50.34.jar:?]
at io.airbyte.workers.process.AirbyteIntegrationLauncher.check(AirbyteIntegrationLauncher.java:143) ~[io.airbyte-airbyte-commons-worker-0.50.34.jar:?]
at io.airbyte.workers.general.DefaultCheckConnectionWorker.run(DefaultCheckConnectionWorker.java:71) ~[io.airbyte-airbyte-commons-worker-0.50.34.jar:?]
at io.airbyte.workers.general.DefaultCheckConnectionWorker.run(DefaultCheckConnectionWorker.java:44) ~[io.airbyte-airbyte-commons-worker-0.50.34.jar:?]
at io.airbyte.workers.temporal.TemporalAttemptExecution.get(TemporalAttemptExecution.java:135) ~[io.airbyte-airbyte-workers-0.50.34.jar:?]
at io.airbyte.workers.temporal.check.connection.CheckConnectionActivityImpl.lambda$runWithJobOutput$1(CheckConnectionActivityImpl.java:133) ~[io.airbyte-airbyte-workers-0.50.34.jar:?]
at io.airbyte.commons.temporal.HeartbeatUtils.withBackgroundHeartbeat(HeartbeatUtils.java:57) ~[io.airbyte-airbyte-commons-temporal-core-0.50.34.jar:?]
at io.airbyte.workers.temporal.check.connection.CheckConnectionActivityImpl.runWithJobOutput(CheckConnectionActivityImpl.java:118) ~[io.airbyte-airbyte-workers-0.50.34.jar:?]
at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) ~[?:?]
at java.lang.reflect.Method.invoke(Method.java:578) ~[?:?]
at io.temporal.internal.activity.RootActivityInboundCallsInterceptor$POJOActivityInboundCallsInterceptor.executeActivity(RootActivityInboundCallsInterceptor.java:64) ~[temporal-sdk-1.17.0.jar:?]
at io.temporal.internal.activity.RootActivityInboundCallsInterceptor.execute(RootActivityInboundCallsInterceptor.java:43) ~[temporal-sdk-1.17.0.jar:?]
at io.temporal.internal.activity.ActivityTaskExecutors$BaseActivityTaskExecutor.execute(ActivityTaskExecutors.java:95) ~[temporal-sdk-1.17.0.jar:?]
at io.temporal.internal.activity.ActivityTaskHandlerImpl.handle(ActivityTaskHandlerImpl.java:92) ~[temporal-sdk-1.17.0.jar:?]
at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handleActivity(ActivityWorker.java:241) ~[temporal-sdk-1.17.0.jar:?]
at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:206) ~[temporal-sdk-1.17.0.jar:?]
at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:179) ~[temporal-sdk-1.17.0.jar:?]
at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:93) ~[temporal-sdk-1.17.0.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
at java.lang.Thread.run(Thread.java:1589) ~[?:?]
Caused by: java.lang.NullPointerException: Cannot invoke "java.lang.Integer.intValue()" because the return value of "io.airbyte.workers.process.KubePortManagerSingleton.take()" is null
at io.airbyte.workers.process.KubeProcessFactory.create(KubeProcessFactory.java:131) ~[io.airbyte-airbyte-commons-worker-0.50.34.jar:?]
... 20 more
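
The `Caused by` line is the kind of message the JVM produces when a `null` `Integer` is auto-unboxed to `int`. A minimal sketch of how that can happen, assuming the port manager hands out ports from a queue and returns `null` when none becomes free within a timeout. The `PortPool` class below is a hypothetical stand-in written for illustration, not Airbyte's `KubePortManagerSingleton`:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class PortPoolSketch {
    // Hypothetical stand-in for a port manager; not Airbyte's actual class.
    static final class PortPool {
        private final BlockingQueue<Integer> ports = new LinkedBlockingQueue<>();

        PortPool(int firstPort, int count) {
            for (int i = 0; i < count; i++) {
                ports.add(firstPort + i);
            }
        }

        // Returns null if no port frees up within the timeout (assumed behaviour).
        Integer take(long timeoutMillis) {
            try {
                return ports.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }

        // Hands a port back to the pool.
        void offer(Integer port) {
            ports.offer(port);
        }
    }

    // Returns true if unboxing the result of take() throws a NullPointerException.
    static boolean unboxingFailsWhenExhausted() {
        PortPool pool = new PortPool(9000, 2);
        pool.take(10);
        pool.take(10); // both ports are now held and never returned
        try {
            int port = pool.take(10); // auto-unboxing null calls Integer.intValue()
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(unboxingFailsWhenExhausted());
    }
}
```

If the real pool behaves like this sketch, the exception would point to port exhaustion rather than to a Kubernetes API failure, but that is an inference from the message, not something the log confirms.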

Steps to reproduce (not sure):
1. Run many sync jobs in parallel or in sequence, and make sure some of them fail during the sync.
2. After some time, you may hit the exception above.
3. Restarting the worker resolves the issue. I am looking for some insight into this.
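
One pattern that would fit these steps is a port that is reserved for a job and not handed back when the job fails, so the pool drains until a restart resets it. The sketch below is only an illustration of that hypothesis: `PortLeakSketch`, `runFailingJob`, and the starting port 9000 are made up for the example, and nothing here is taken from the Airbyte code base. The pool size of 40 mirrors the setup described in this report.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PortLeakSketch {
    // Hypothetical pool of local ports; names are illustrative only.
    private final Deque<Integer> free = new ArrayDeque<>();

    PortLeakSketch(int firstPort, int count) {
        for (int i = 0; i < count; i++) {
            free.add(firstPort + i);
        }
    }

    int available() {
        return free.size();
    }

    // Simulates one job that fails after reserving a port.
    void runFailingJob(boolean releaseInFinally) {
        Integer port = free.poll();
        if (port == null) {
            throw new IllegalStateException("no free ports");
        }
        try {
            throw new RuntimeException("simulated sync failure");
        } catch (RuntimeException e) {
            // Swallowed for the demo; a real worker would log and retry.
        } finally {
            if (releaseInFinally) {
                free.offer(port); // hand the port back even on failure
            }
        }
    }

    // Runs the given number of failing jobs against a fresh 40-port pool.
    static int remainingAfter(int jobs, boolean releaseInFinally) {
        PortLeakSketch pool = new PortLeakSketch(9000, 40);
        for (int i = 0; i < jobs; i++) {
            pool.runFailingJob(releaseInFinally);
        }
        return pool.available();
    }

    public static void main(String[] args) {
        System.out.println(remainingAfter(40, false)); // leaked: pool is empty
        System.out.println(remainingAfter(40, true));  // released: pool is full
    }
}
```

Releasing in `finally` is the design choice that keeps the pool size stable regardless of how a job ends. Whether the worker in this report actually leaks ports this way is not established by the log.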

Airbyte setup:

1. Version 0.50.34
2. 15 workers with 40 ports open and 4 GB memory
3. Running in orchestration mode
4. 20-30 jobs every hour: mssql to snowflake and gitlab to snowflake

@marcosmarxm
Member

Please share more information about your deployment and the steps you executed before hitting the problem, so that other people can reproduce it.

@sivankumar86
Contributor Author

@marcosmarxm I added as much as possible. Let me know if you need more details.

@marcosmarxm marcosmarxm changed the title Airbyte (Container Orchestrator) Intermittent WorkerException [platform] Container Orchestrator Intermittent WorkerException Apr 30, 2024
@octavia-squidington-iii
Collaborator

At Airbyte, we seek to be clear about the project priorities and roadmap. This issue has not had any activity for 180 days, suggesting that it's not as critical as others. It's possible it has already been fixed. It is being marked as stale and will be closed in 20 days if there is no activity. To keep it open, please comment to let us know why it is important to you and if it is still reproducible on recent versions of Airbyte.

@octavia-squidington-iii
Collaborator

This issue was closed because it has been inactive for 20 days since being marked as stale.


5 participants