
Data loss when reactivating a connection #55770

Open
SkinnyPigeon opened this issue Mar 14, 2025 · 4 comments

Comments

@SkinnyPigeon
Contributor

Topic

Data loss

Relevant information

Yesterday we paused all of our MySQL to S3 connections for two hours. When we resumed them, we found that each of them immediately started to clear all of the data for each of the streams in our S3 buckets. Obviously, this has been painful to recover from and we are very interested in preventing this from happening again.

The following are logs from the server pod at the time that one of the connections was reenabled:

i.a.c.t.ConnectionManagerUtils(signalWorkflowAndRepairIfNecessary):95 - Retrieved existing connection manager workflow for connection 8f70b590-876f-422c-b39d-9c41767a8f52. Executing signal.
i.a.c.s.h.SchedulerHandler(createJob):600 - Found the following streams to reset for connection 8f70b590-876f-422c-b39d-9c41767a8f52: [io.airbyte.config.StreamDescriptor@3103115f[namespace=public,name=log,additionalProperties={}]]
i.a.c.h.ResourceRequirementsUtils(getResourceRequirementsForJobType):64 - Merged resource requirements. mergedResourceReqs=io.airbyte.config.ResourceRequirements@46983c1[cpuRequest=250m,cpuLimit=2,memoryRequest=2Gi,memoryLimit=2Gi,ephemeralStorageRequest=<null>,ephemeralStorageLimit=<null>,additionalProperties={}] connectionResourceReqs=null actorResourceReqs=null actorDefinitionResourceReqs=io.airbyte.config.ScopedResourceRequirements@78195c45[_default=<null>,jobSpecific=[io.airbyte.config.JobTypeResourceLimit@1bd523fd[jobType=sync,resourceRequirements=io.airbyte.config.ResourceRequirements@5040cf73[cpuRequest=<null>,cpuLimit=<null>,memoryRequest=2Gi,memoryLimit=2Gi,ephemeralStorageRequest=<null>,ephemeralStorageLimit=<null>,additionalProperties={}],additionalProperties={}]],additionalProperties={}] workerDefaultResourceReqs=io.airbyte.config.ResourceRequirements@18ff2931[cpuRequest=250m,cpuLimit=2,memoryRequest=512Mi,memoryLimit=8Gi,ephemeralStorageRequest=<null>,ephemeralStorageLimit=<null>,additionalProperties={}] jobType=sync
i.a.p.j.DefaultJobPersistence(enqueueJob):589 - enqueuing pending job for scope: 8f70b590-876f-422c-b39d-9c41767a8f52
i.a.c.s.h.ConnectionsHandler(applySchemaChange):1368 - Applying schema change for connection '8f70b590-876f-422c-b39d-9c41767a8f52' only
i.a.c.s.h.ConnectionsHandler(applySchemaChange):1414 - Sending notification of manually applying schema change for connectionId: '8f70b590-876f-422c-b39d-9c41767a8f52'

What these appear to show is that Airbyte detected a schema change, which triggered the data wipe. However, according to your documentation, this should not have occurred, as we have "Approve all changes myself" as the setting. No dialog box opened asking us to confirm this action; Airbyte simply took it upon itself to drop a few terabytes of data.
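For anyone triaging a similar incident, the reset is visible in the server logs before data is actually dropped. A minimal sketch (the log format is taken from the `SchedulerHandler(createJob)` line quoted above; the function name is ours, not part of Airbyte) that scans log output for scheduled stream resets:

```python
import re

# Matches the SchedulerHandler log line quoted above, e.g.
# "... Found the following streams to reset for connection <uuid>: [...]"
RESET_LINE = re.compile(
    r"Found the following streams to reset for connection "
    r"(?P<conn_id>[0-9a-f-]{36}):"
)

def find_unexpected_resets(log_text: str) -> list[str]:
    """Return the connection IDs for which Airbyte scheduled a stream reset."""
    return [m.group("conn_id") for m in RESET_LINE.finditer(log_text)]
```

Running this over the server pod logs right after resuming a connection would have flagged `8f70b590-876f-422c-b39d-9c41767a8f52` before the clear job ran.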

The screenshot shows the timeline for one of our connections. As can be seen, it was disabled, reenabled, cleared the data, then synced.

[Screenshot: connection timeline showing the connection disabled, re-enabled, data cleared, then synced]

We are using:

  • Airbyte 1.5.0 OSS
  • MySQL source connector v3.7.1
  • S3 destination connector v0.6.1

Was this a bug? Did we have something incorrectly set? How are we supposed to pause this type of connection without risking massive data loss in the future?

Thanks,
Euan

@marcosmarxm
Member

Can you upload the logs from the clear data job?

@SkinnyPigeon
Contributor Author

Airbyte.Clearing.Data.mp4

This is a video of it occurring. As you can see, the clearing step starts by itself.

@natikgadzhi
Contributor

cc @theyueli
