Potentially having large jobs block smaller ones #28588
Interesting question, @mathetes87. Airbyte only has a simple scheduling system for connections. In my opinion, since you're already triggering syncs with external API calls, possibility 2 with an orchestration tool like Dagster or Airflow could be a good approach. For possibility 1, you'd probably need to disable the syncs in your main instance and enable them only in the secondary one. I had some concerns about the point you raised regarding multiple workers writing to the same database backend, but since Airbyte supports multiple workers in Kubernetes deployments, I don't think this will be a problem.
Hi! We are a SaaS company with multiple clients whose data we sync every day using Airbyte, and so far it has been flawless. You have built a really great developer experience here. Really awesome work!
We have an EC2 instance running Airbyte on Docker and an RDS instance for the Airbyte database.
The thing is, we plan to have a large inflow of new clients whose first sync jobs would probably take a while to complete, possibly several days. Subsequent sync jobs are typically really small, since we use cursors to move only the incremental data. Also really important: we trigger all of our sync jobs externally with an API call.
Given this scenario, how can we keep these new clients from blocking the sync jobs for our existing, smaller ones?
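For context, triggering a sync externally looks roughly like the sketch below. It targets Airbyte's open-source API endpoint for manually triggering a connection sync; the host and connection ID are placeholders, and the helper names are ours, not Airbyte's.

```python
# Hedged sketch: externally triggering an Airbyte connection sync.
# The endpoint path follows Airbyte's open-source API; host URL and
# connection IDs below are illustrative placeholders.
import json
import urllib.request


def build_sync_request(host: str, connection_id: str) -> urllib.request.Request:
    """Build the POST request for Airbyte's trigger-sync endpoint."""
    url = f"{host}/api/v1/connections/sync"
    payload = json.dumps({"connectionId": connection_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def trigger_sync(host: str, connection_id: str) -> dict:
    """Fire the sync request and return the parsed JSON response."""
    req = build_sync_request(host, connection_id)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Each client's daily sync is just one such call with that client's connection ID.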
I'm thinking about two possibilities:
1. Multiple Airbyte Docker instances: have one instance sync our existing clients and the other handle our new, long-running jobs. This would probably be the easiest solution, but would it cause problems to have multiple Airbyte instances sharing the same database backend?
2. External queues with a logical capacity allocation: have a queue for each group (initial syncs and incremental syncs) and assign a fixed number of concurrent jobs to each. When a job finishes, only a job from the same queue would use the freed capacity. This would require more coordination and orchestration logic somewhere outside Airbyte.
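The capacity split in possibility 2 can be sketched with two fixed-size worker pools, one per queue, so a slot freed by a finished job is only reused by a job from the same queue. This is a minimal illustration with made-up job names and concurrency caps; real jobs would call Airbyte's API instead of sleeping.

```python
# Sketch of possibility 2: separate queues for initial and incremental
# syncs, each with its own concurrency cap, so long first syncs can
# never starve the small daily ones. Job names, durations, and pool
# sizes are illustrative assumptions.
from concurrent.futures import ThreadPoolExecutor
import time


def run_sync(job_id: str, seconds: float) -> str:
    time.sleep(seconds)  # stand-in for an actual Airbyte sync
    return job_id


# Two pools act as per-queue capacity: when a job completes, its slot
# is only reclaimed by pending jobs submitted to that same pool.
initial_pool = ThreadPoolExecutor(max_workers=2)      # long first syncs
incremental_pool = ThreadPoolExecutor(max_workers=4)  # small daily syncs

initial_jobs = [initial_pool.submit(run_sync, f"new-{i}", 0.2) for i in range(3)]
incr_jobs = [incremental_pool.submit(run_sync, f"inc-{i}", 0.01) for i in range(8)]

# Incremental jobs drain quickly regardless of how many initial syncs
# are queued, because they never compete for the same slots.
done_incremental = [f.result() for f in incr_jobs]
done_initial = [f.result() for f in initial_jobs]
```

The same idea carries over to an orchestrator: Airflow pools or Dagster concurrency limits give you named capacity buckets without hand-rolling the queues.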
Any other ideas? Is the first option even possible?
Thanks