
Data loss when reactivating a connection #55770

Open
SkinnyPigeon opened this issue Mar 14, 2025 · 4 comments

Comments

@SkinnyPigeon
Contributor

Topic

Data loss

Relevant information

Yesterday we paused all of our MySQL to S3 connections for two hours. When we resumed them, we found that each of them immediately started to clear all of the data for each of the streams in our S3 buckets. Obviously, this has been painful to recover from and we are very interested in preventing this from happening again.

The following are logs from the server pod at the time that one of the connections was reenabled:

i.a.c.t.ConnectionManagerUtils(signalWorkflowAndRepairIfNecessary):95 - Retrieved existing connection manager workflow for connection 8f70b590-876f-422c-b39d-9c41767a8f52. Executing signal.
i.a.c.s.h.SchedulerHandler(createJob):600 - Found the following streams to reset for connection 8f70b590-876f-422c-b39d-9c41767a8f52: [io.airbyte.config.StreamDescriptor@3103115f[namespace=public,name=log,additionalProperties={}]]
i.a.c.h.ResourceRequirementsUtils(getResourceRequirementsForJobType):64 - Merged resource requirements. mergedResourceReqs=io.airbyte.config.ResourceRequirements@46983c1[cpuRequest=250m,cpuLimit=2,memoryRequest=2Gi,memoryLimit=2Gi,ephemeralStorageRequest=<null>,ephemeralStorageLimit=<null>,additionalProperties={}] connectionResourceReqs=null actorResourceReqs=null actorDefinitionResourceReqs=io.airbyte.config.ScopedResourceRequirements@78195c45[_default=<null>,jobSpecific=[io.airbyte.config.JobTypeResourceLimit@1bd523fd[jobType=sync,resourceRequirements=io.airbyte.config.ResourceRequirements@5040cf73[cpuRequest=<null>,cpuLimit=<null>,memoryRequest=2Gi,memoryLimit=2Gi,ephemeralStorageRequest=<null>,ephemeralStorageLimit=<null>,additionalProperties={}],additionalProperties={}]],additionalProperties={}] workerDefaultResourceReqs=io.airbyte.config.ResourceRequirements@18ff2931[cpuRequest=250m,cpuLimit=2,memoryRequest=512Mi,memoryLimit=8Gi,ephemeralStorageRequest=<null>,ephemeralStorageLimit=<null>,additionalProperties={}] jobType=sync
i.a.p.j.DefaultJobPersistence(enqueueJob):589 - enqueuing pending job for scope: 8f70b590-876f-422c-b39d-9c41767a8f52
i.a.c.s.h.ConnectionsHandler(applySchemaChange):1368 - Applying schema change for connection '8f70b590-876f-422c-b39d-9c41767a8f52' only
i.a.c.s.h.ConnectionsHandler(applySchemaChange):1414 - Sending notification of manually applying schema change for connectionId: '8f70b590-876f-422c-b39d-9c41767a8f52'

What these appear to show is that Airbyte detected a schema change, which triggered the data wipe. However, according to your documentation, this should not have occurred, as we have "Approve all changes myself" as the setting. No dialog box opened asking us to confirm this action; Airbyte simply took it upon itself to drop a few terabytes of data.
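For anyone triaging a similar incident, the reset is visible in the server logs before data is actually dropped. A minimal sketch (the log format is taken from the `SchedulerHandler(createJob)` line quoted above; the function name is ours, not part of Airbyte) that scans log output for scheduled stream resets:

```python
import re

# Matches the SchedulerHandler log line quoted above, e.g.
# "... Found the following streams to reset for connection <uuid>: [...]"
RESET_LINE = re.compile(
    r"Found the following streams to reset for connection "
    r"(?P<conn_id>[0-9a-f-]{36}):"
)

def find_unexpected_resets(log_text: str) -> list[str]:
    """Return the connection IDs for which Airbyte scheduled a stream reset."""
    return [m.group("conn_id") for m in RESET_LINE.finditer(log_text)]
```

Running this over the server pod logs right after resuming a connection would have flagged `8f70b590-876f-422c-b39d-9c41767a8f52` before the clear job ran.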

The screenshot shows the timeline for one of our connections. As can be seen, it was disabled, reenabled, cleared the data, then synced.

[Screenshot: connection timeline showing the connection disabled, re-enabled, data cleared, then synced]

We are using:

  • Airbyte 1.5.0 OSS
  • MySQL source connector v3.7.1
  • S3 destination connector v0.6.1

Was this a bug? Did we have something incorrectly set? How are we supposed to pause this type of connection without risking massive data loss in the future?

Thanks,
Euan

@marcosmarxm
Member

Can you upload the logs from the clear data job?

@SkinnyPigeon
Contributor Author

Airbyte.Clearing.Data.mp4

This is a video of it occurring. As you can see, the clearing step starts by itself.

@natikgadzhi
Contributor

cc @theyueli
