Airbyte is not stable in k8s env #38853

Closed
sivankumar86 opened this issue Jun 2, 2024 · 4 comments

Comments

@sivankumar86
Contributor

Topic

Airbyte syncs stop after some time

Relevant information

Hi Team,
I am using the Helm chart to deploy Airbyte on EKS. Sync jobs stop running after some time, and the problem goes away once we restart the worker pods. It looks like the worker is not releasing some resource. Let me know if you need more details.

Versions:

Helm : 0.94.x
Airbyte : 0.61.x

Env:

io.airbyte.workers.exception.WorkerException: Failed to create pod for check step
	at io.airbyte.workers.process.KubeProcessFactory.create(KubeProcessFactory.java:197) ~[io.airbyte-airbyte-commons-worker-0.60.1.jar:?]
	at io.airbyte.workers.process.AirbyteIntegrationLauncher.check(AirbyteIntegrationLauncher.java:149) ~[io.airbyte-airbyte-commons-worker-0.60.1.jar:?]
	at io.airbyte.workers.general.DefaultCheckConnectionWorker.run(DefaultCheckConnectionWorker.java:71) ~[io.airbyte-airbyte-commons-worker-0.60.1.jar:?]
	at io.airbyte.workers.general.DefaultCheckConnectionWorker.run(DefaultCheckConnectionWorker.java:44) ~[io.airbyte-airbyte-commons-worker-0.60.1.jar:?]
	at io.airbyte.workers.temporal.TemporalAttemptExecution.get(TemporalAttemptExecution.java:142) ~[io.airbyte-airbyte-workers-0.60.1.jar:?]
	at io.airbyte.workers.temporal.check.connection.CheckConnectionActivityImpl.lambda$runWithJobOutput$1(CheckConnectionActivityImpl.java:226) ~[io.airbyte-airbyte-workers-0.60.1.jar:?]
	at io.airbyte.commons.temporal.HeartbeatUtils.withBackgroundHeartbeat(HeartbeatUtils.java:57) ~[io.airbyte-airbyte-commons-temporal-core-0.60.1.jar:?]
	at io.airbyte.workers.temporal.check.connection.CheckConnectionActivityImpl.runWithJobOutput(CheckConnectionActivityImpl.java:211) ~[io.airbyte-airbyte-workers-0.60.1.jar:?]
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) ~[?:?]
	at java.base/java.lang.reflect.Method.invoke(Method.java:580) ~[?:?]
	at io.temporal.internal.activity.RootActivityInboundCallsInterceptor$POJOActivityInboundCallsInterceptor.executeActivity(RootActivityInboundCallsInterceptor.java:64) ~[temporal-sdk-1.22.3.jar:?]
	at io.temporal.internal.activity.RootActivityInboundCallsInterceptor.execute(RootActivityInboundCallsInterceptor.java:43) ~[temporal-sdk-1.22.3.jar:?]
	at io.temporal.internal.activity.ActivityTaskExecutors$BaseActivityTaskExecutor.execute(ActivityTaskExecutors.java:107) ~[temporal-sdk-1.22.3.jar:?]
	at io.temporal.internal.activity.ActivityTaskHandlerImpl.handle(ActivityTaskHandlerImpl.java:124) ~[temporal-sdk-1.22.3.jar:?]
	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handleActivity(ActivityWorker.java:278) ~[temporal-sdk-1.22.3.jar:?]
	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:243) ~[temporal-sdk-1.22.3.jar:?]
	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:216) ~[temporal-sdk-1.22.3.jar:?]
	at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:105) ~[temporal-sdk-1.22.3.jar:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
	at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
Caused by: java.lang.NullPointerException: Cannot invoke "java.lang.Integer.intValue()" because the return value of "io.airbyte.workers.process.KubePortManagerSingleton.take()" is null
	at io.airbyte.workers.process.KubeProcessFactory.create(KubeProcessFactory.java:139) ~[io.airbyte-airbyte-commons-worker-0.60.1.jar:?]

c810ba10_3e93_4c4c_976f_8605746e4520_job_639176_attempt_1_txt.log

@sivankumar86
Contributor Author

I think failed jobs are not releasing a resource, but I am not sure. Steps to reproduce:

  1. Create a connection whose sync fails.
  2. Set up a k8s environment with only one worker, using the default config (max 5).
  3. Run the sync and make sure it fails more than 5 times (~10 times).
  4. Now every sync fails with an unable-to-create-pod error.
  5. Restart the worker pod; sync jobs then start running again.
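
The suspected failure mode can be sketched in plain Java. This is a minimal illustration, not Airbyte's actual code: `PortPoolSketch`, `runLeaky`, `runSafe`, the pool size of 5, and the port numbers are all hypothetical. It assumes the pool hands out ports from a blocking queue and returns `null` when none frees up in time, which would match the `NullPointerException` on `KubePortManagerSingleton.take()` in the trace above.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a worker's port pool; not Airbyte's implementation.
final class PortPoolSketch {
    private final BlockingQueue<Integer> ports = new LinkedBlockingQueue<>();

    PortPoolSketch(int firstPort, int count) {
        for (int i = 0; i < count; i++) {
            ports.add(firstPort + i);
        }
    }

    // Returns null when no port becomes free within the timeout.
    Integer take(long timeoutMillis) {
        try {
            return ports.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    // Puts a port back into the pool, ignoring nulls and duplicates.
    void offer(Integer port) {
        if (port != null && !ports.contains(port)) {
            ports.add(port);
        }
    }

    int available() {
        return ports.size();
    }

    // Leaky pattern: the failure path never returns the port to the pool.
    static void runLeaky(PortPoolSketch pool, boolean fail) {
        int port = pool.take(10); // unboxing a null Integer throws NullPointerException
        if (fail) {
            throw new IllegalStateException("simulated sync failure on port " + port);
        }
        pool.offer(port);
    }

    // Safer pattern: check for null, and release the port in a finally block.
    static void runSafe(PortPoolSketch pool, boolean fail) {
        Integer port = pool.take(10);
        if (port == null) {
            throw new IllegalStateException("no free port available");
        }
        try {
            if (fail) {
                throw new IllegalStateException("simulated sync failure on port " + port);
            }
        } finally {
            pool.offer(port);
        }
    }

    public static void main(String[] args) {
        PortPoolSketch pool = new PortPoolSketch(9000, 5);
        for (int i = 0; i < 5; i++) {
            try {
                runLeaky(pool, true);
            } catch (IllegalStateException expected) {
                // Each simulated failure leaks one port.
            }
        }
        System.out.println("free ports after 5 leaky failures: " + pool.available());
        try {
            runLeaky(pool, false);
        } catch (NullPointerException e) {
            System.out.println("pool exhausted, take() returned null");
        }
    }
}
```

If the real worker behaves like `runLeaky`, a restart would refill the pool, which is consistent with step 5. Releasing in a `finally` block, as in `runSafe`, is one common way to avoid this kind of leak.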

@marcosmarxm
Member

Thanks for reporting the issue, @sivankumar86. I have forwarded it to the platform team for further investigation.

@octavia-squidington-iii
Collaborator

At Airbyte, we seek to be clear about the project priorities and roadmap. This issue has not had any activity for 180 days, suggesting that it's not as critical as others. It's possible it has already been fixed. It is being marked as stale and will be closed in 20 days if there is no activity. To keep it open, please comment to let us know why it is important to you and if it is still reproducible on recent versions of Airbyte.

@octavia-squidington-iii
Collaborator

This issue was closed because it has been inactive for 20 days since being marked as stale.
