Potentially having large jobs block smaller ones #28588
Interesting question, @mathetes87. Airbyte only has a simple scheduling system for connections. In my opinion, since you're already triggering syncs with external API calls, possibility 2 with an orchestration tool like Dagster or Airflow could be a good approach. For possibility 1, you'd probably need to disable the syncs in your main instance and enable them only in the secondary one. I had some concerns about the point you raised regarding multiple workers writing to the same database backend, but since Airbyte supports multiple workers in Kubernetes deployments, I don't think this will be a problem.
Hi! We are a SaaS company with multiple clients whose data we sync every day using Airbyte, and so far it has been flawless. You have built a really great developer experience here. Really awesome work!
We have an EC2 instance running Airbyte on Docker and an RDS instance for the Airbyte database.
The thing is, we plan to have a large inflow of new clients whose first sync jobs would probably take a while to complete, possibly several days. Subsequent sync jobs are typically really small, since we use cursors to move only the incremental data. Also really important: we trigger all of our sync jobs externally with an API call.
Given this scenario, how can we keep these new clients from blocking the sync jobs for our existing, smaller ones?
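For context, triggering a sync externally looks roughly like the sketch below. It targets Airbyte's open-source API endpoint for manually triggering a connection sync; the host and connection ID are placeholders, and the helper names are ours, not Airbyte's.

```python
# Hedged sketch: externally triggering an Airbyte connection sync.
# The endpoint path follows Airbyte's open-source API; host URL and
# connection IDs below are illustrative placeholders.
import json
import urllib.request


def build_sync_request(host: str, connection_id: str) -> urllib.request.Request:
    """Build the POST request for Airbyte's trigger-sync endpoint."""
    url = f"{host}/api/v1/connections/sync"
    payload = json.dumps({"connectionId": connection_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def trigger_sync(host: str, connection_id: str) -> dict:
    """Fire the sync request and return the parsed JSON response."""
    req = build_sync_request(host, connection_id)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Each client's daily sync is just one such call with that client's connection ID.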
I'm thinking about two possibilities:
1. Multiple Airbyte Docker instances: have one instance sync our existing clients and the other handle our new, long-running jobs. This would probably be the easiest solution, but would it cause problems to have multiple Airbyte instances sharing the same database backend?
2. External queues with a logical capacity allocation: have a queue for each group (initial syncs and incremental syncs) and assign a fixed number of concurrent jobs to each. When a job finishes, only a job from the same queue would use the freed capacity. This would require more coordination and orchestration logic somewhere outside Airbyte.
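The capacity split in possibility 2 can be sketched with two fixed-size worker pools, one per queue, so a slot freed by a finished job is only reused by a job from the same queue. This is a minimal illustration with made-up job names and concurrency caps; real jobs would call Airbyte's API instead of sleeping.

```python
# Sketch of possibility 2: separate queues for initial and incremental
# syncs, each with its own concurrency cap, so long first syncs can
# never starve the small daily ones. Job names, durations, and pool
# sizes are illustrative assumptions.
from concurrent.futures import ThreadPoolExecutor
import time


def run_sync(job_id: str, seconds: float) -> str:
    time.sleep(seconds)  # stand-in for an actual Airbyte sync
    return job_id


# Two pools act as per-queue capacity: when a job completes, its slot
# is only reclaimed by pending jobs submitted to that same pool.
initial_pool = ThreadPoolExecutor(max_workers=2)      # long first syncs
incremental_pool = ThreadPoolExecutor(max_workers=4)  # small daily syncs

initial_jobs = [initial_pool.submit(run_sync, f"new-{i}", 0.2) for i in range(3)]
incr_jobs = [incremental_pool.submit(run_sync, f"inc-{i}", 0.01) for i in range(8)]

# Incremental jobs drain quickly regardless of how many initial syncs
# are queued, because they never compete for the same slots.
done_incremental = [f.result() for f in incr_jobs]
done_initial = [f.result() for f in initial_jobs]
```

The same idea carries over to an orchestrator: Airflow pools or Dagster concurrency limits give you named capacity buckets without hand-rolling the queues.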
Any other ideas? Is the first option even possible?
Thanks