From 2f77be6e6e024a3f3d4c9958928e213e59593da0 Mon Sep 17 00:00:00 2001 From: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Date: Sat, 15 Mar 2025 17:40:16 +0000 Subject: [PATCH 01/10] docs: add Airbyte File Sync and Permission Sync documentation Co-Authored-By: Aaron Steers --- docs/understanding-airbyte/file-transfer.md | 68 +++++++++++++++++ docs/understanding-airbyte/permission-sync.md | 76 +++++++++++++++++++ docusaurus/redirects.yml | 4 + docusaurus/sidebars.js | 2 + 4 files changed, 150 insertions(+) create mode 100644 docs/understanding-airbyte/file-transfer.md create mode 100644 docs/understanding-airbyte/permission-sync.md diff --git a/docs/understanding-airbyte/file-transfer.md b/docs/understanding-airbyte/file-transfer.md new file mode 100644 index 0000000000000..9357eb2e41625 --- /dev/null +++ b/docs/understanding-airbyte/file-transfer.md @@ -0,0 +1,68 @@ +# Airbyte File Sync + +Airbyte File Sync is a capability that allows you to move unstructured data, non-text data, and compressed files between sources and destinations without parsing their contents. This document explains how File Sync works, which connectors support it, and how to use it. + +## Overview + +Traditional data integration in Airbyte involves extracting structured data as individual records, which are then processed and loaded into a destination. However, many use cases require transferring raw files without parsing their contents: + +- Moving binary files (images, videos, PDFs) +- Transferring compressed files (ZIP, GZIP) +- Migrating unstructured text data +- Preserving file formats for specialized processing + +File Sync addresses these needs by copying files exactly as they appear in the source to the destination, preserving their original format and content. + +## How File Sync Works + +When using File Sync: + +1. The source connector identifies files to be transferred +2. Instead of parsing file contents into records, the file is transferred as-is +3. 
The destination connector writes the raw file to the target location +4. File metadata (name, path, size, etc.) is preserved + +This differs from standard Airbyte syncs where files would be parsed into individual records. + +## Supported Connectors + +File Sync is currently supported by the following connectors: + +### Sources +- [SFTP Bulk](../integrations/sources/sftp-bulk.md) +- [Microsoft SharePoint](../integrations/sources/microsoft-sharepoint.md) +- [S3](../integrations/sources/s3.md) + +### Destinations +- [S3](../integrations/destinations/s3.md) +- [Deepset](../integrations/destinations/deepset.md) + +## Using File Sync + +To use File Sync: + +1. Configure a connection using a source and destination that both support File Sync +2. The File Sync mode will be automatically enabled when compatible connectors are used +3. Files will be transferred without parsing their contents + +### Configuration Example + +When configuring a connection between SFTP Bulk (source) and S3 (destination): + +1. Set up the SFTP Bulk source with your server credentials and file paths +2. Configure the S3 destination with your bucket information +3. The connection will automatically use File Sync mode + +## Limitations + +- Both the source and destination must support File Sync +- File Sync is designed for raw file movement, not for transforming data +- Maximum file size limits may apply depending on the connectors + +## Technical Implementation + +File Sync is implemented in the Airbyte CDK (Connector Development Kit) version 0.48.0 and above. Connectors that support this feature have the `supportsFileTransfer: true` flag in their metadata.yaml file. + +## Future Enhancements + +The File Sync capability is being expanded to support more source and destination connectors. Check the documentation of specific connectors to see if they support File Sync. 
diff --git a/docs/understanding-airbyte/permission-sync.md b/docs/understanding-airbyte/permission-sync.md new file mode 100644 index 0000000000000..042d41b202b56 --- /dev/null +++ b/docs/understanding-airbyte/permission-sync.md @@ -0,0 +1,76 @@ +# Airbyte Permission Sync + +Permission Sync is a capability in Airbyte that allows you to transfer access control information and permission structures between systems. This document explains how Permission Sync works, which connectors support it, and how to use it. + +## Overview + +When transferring data between systems, it's often important to maintain not just the data itself but also the permission structures that govern access to that data. Permission Sync addresses this need by: + +- Preserving user and group access controls +- Maintaining role-based permissions +- Transferring ownership information +- Replicating sharing settings + +This ensures that when data is moved between systems, the appropriate access controls are maintained. + +## How Permission Sync Works + +When using Permission Sync: + +1. The source connector extracts both data and associated permission metadata +2. Permission structures are mapped between source and destination systems +3. The destination connector applies compatible permission settings +4. User and group mappings are maintained where possible + +Permission Sync can work alongside regular data synchronization or File Sync operations. + +## Supported Connectors + +Permission Sync is currently in early development with limited connector support. The following connectors are planned to support Permission Sync: + +### Sources +- Microsoft SharePoint (in development) +- Google Drive (planned) +- Box (planned) + +### Destinations +- S3 (in development) +- Google Cloud Storage (planned) +- Azure Blob Storage (planned) + +## Using Permission Sync + +To use Permission Sync: + +1. Configure a connection using a source and destination that both support Permission Sync +2. 
Enable the Permission Sync option in the connection settings +3. Configure user/group mapping if needed for cross-system synchronization + +### Configuration Example + +When configuring a connection between Microsoft SharePoint (source) and S3 (destination): + +1. Set up the SharePoint source with your tenant credentials +2. Configure the S3 destination with your bucket information and IAM settings +3. Enable Permission Sync in the advanced options +4. Configure user mapping between SharePoint users and AWS IAM roles/users + +## Limitations + +- Permission structures vary significantly between systems, so perfect mapping is not always possible +- Some permission types may not have equivalents in destination systems +- User and group identity mapping may require manual configuration +- Permission Sync is most effective between systems with similar access control models + +## Technical Implementation + +Permission Sync is implemented as an extension to the Airbyte protocol, allowing connectors to exchange permission metadata alongside regular data records. Connectors that support this feature have the `supportsPermissionSync: true` flag in their metadata.yaml file. + +## Future Enhancements + +The Permission Sync capability is being actively developed with plans to support more source and destination connectors. 
Future enhancements will include: + +- More granular permission mapping options +- Support for complex role-based access control (RBAC) systems +- Automated user/group identity mapping +- Audit logging for permission changes during sync diff --git a/docusaurus/redirects.yml b/docusaurus/redirects.yml index 6f505d38f2541..e3b0dbf555813 100644 --- a/docusaurus/redirects.yml +++ b/docusaurus/redirects.yml @@ -37,6 +37,10 @@ to: /cloud/managing-airbyte-cloud/configuring-connections - from: /cloud/managing-airbyte-cloud/manage-schema-changes to: /using-airbyte/schema-change-management +- from: /file-sync + to: /understanding-airbyte/file-transfer +- from: /permission-sync + to: /understanding-airbyte/permission-sync # November 2023 documentation restructure: - from: - /project-overview/product-support-levels diff --git a/docusaurus/sidebars.js b/docusaurus/sidebars.js index 9f1aa15f5547c..25a22dd69c853 100644 --- a/docusaurus/sidebars.js +++ b/docusaurus/sidebars.js @@ -544,6 +544,8 @@ const understandingAirbyte = { "understanding-airbyte/supported-data-types", "understanding-airbyte/secrets", "understanding-airbyte/cdc", + "understanding-airbyte/file-transfer", + "understanding-airbyte/permission-sync", "understanding-airbyte/resumability", "understanding-airbyte/json-avro-conversion", "understanding-airbyte/schemaless-sources-and-destinations", From 96b6bb89c27ca15ea08d708796a49c7eea0241ce Mon Sep 17 00:00:00 2001 From: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Date: Sat, 15 Mar 2025 17:50:02 +0000 Subject: [PATCH 02/10] docs: address PR feedback for file sync and permission sync documentation Co-Authored-By: Aaron Steers --- docs/understanding-airbyte/file-transfer.md | 31 +++++++++++-------- docs/understanding-airbyte/permission-sync.md | 18 ++++++----- 2 files changed, 28 insertions(+), 21 deletions(-) diff --git a/docs/understanding-airbyte/file-transfer.md b/docs/understanding-airbyte/file-transfer.md index 
9357eb2e41625..2e7c172d31810 100644 --- a/docs/understanding-airbyte/file-transfer.md +++ b/docs/understanding-airbyte/file-transfer.md @@ -17,10 +17,10 @@ File Sync addresses these needs by copying files exactly as they appear in the s When using File Sync: -1. The source connector identifies files to be transferred -2. Instead of parsing file contents into records, the file is transferred as-is -3. The destination connector writes the raw file to the target location -4. File metadata (name, path, size, etc.) is preserved +1. The source connector identifies files to be transferred. +2. Instead of parsing file contents into records, the file is transferred as-is. +3. The destination connector writes the raw file to the target location. +4. File metadata (name, path, size, etc.) is preserved. This differs from standard Airbyte syncs where files would be parsed into individual records. @@ -29,13 +29,14 @@ This differs from standard Airbyte syncs where files would be parsed into indivi File Sync is currently supported by the following connectors: ### Sources -- [SFTP Bulk](../integrations/sources/sftp-bulk.md) + +- [SFTP (Gen 2)](../integrations/sources/sftp-bulk.md) - [Microsoft SharePoint](../integrations/sources/microsoft-sharepoint.md) - [S3](../integrations/sources/s3.md) ### Destinations + - [S3](../integrations/destinations/s3.md) -- [Deepset](../integrations/destinations/deepset.md) ## Using File Sync @@ -49,19 +50,23 @@ To use File Sync: When configuring a connection between SFTP Bulk (source) and S3 (destination): -1. Set up the SFTP Bulk source with your server credentials and file paths -2. Configure the S3 destination with your bucket information -3. The connection will automatically use File Sync mode +1. Set up the SFTP Bulk source with your server credentials and file paths. +2. Configure the S3 destination with your bucket information. +3. The connection will automatically use File Sync mode. 
## Limitations -- Both the source and destination must support File Sync -- File Sync is designed for raw file movement, not for transforming data -- Maximum file size limits may apply depending on the connectors +- Both the source and destination must support File Sync. +- File Sync is designed for raw file movement, not for transforming data. +- Maximum file size limits may apply depending on the connectors. ## Technical Implementation -File Sync is implemented in the Airbyte CDK (Connector Development Kit) version 0.48.0 and above. Connectors that support this feature have the `supportsFileTransfer: true` flag in their metadata.yaml file. +File Sync is implemented in two Airbyte CDKs: +- Python Files CDK: Provides file transfer capabilities for Python-based connectors +- Java/Kotlin Bulk Destination CDK: Supports file transfer for Java-based connectors + +Connectors that support this feature have the `supportsFileTransfer: true` flag in their metadata.yaml file. ## Future Enhancements diff --git a/docs/understanding-airbyte/permission-sync.md b/docs/understanding-airbyte/permission-sync.md index 042d41b202b56..2a02c17c4edc0 100644 --- a/docs/understanding-airbyte/permission-sync.md +++ b/docs/understanding-airbyte/permission-sync.md @@ -6,10 +6,10 @@ Permission Sync is a capability in Airbyte that allows you to transfer access co When transferring data between systems, it's often important to maintain not just the data itself but also the permission structures that govern access to that data. Permission Sync addresses this need by: -- Preserving user and group access controls -- Maintaining role-based permissions -- Transferring ownership information -- Replicating sharing settings +- Preserving user and group access controls. +- Maintaining role-based permissions. +- Transferring ownership information. +- Replicating sharing settings. This ensures that when data is moved between systems, the appropriate access controls are maintained. 
@@ -17,10 +17,10 @@ This ensures that when data is moved between systems, the appropriate access con When using Permission Sync: -1. The source connector extracts both data and associated permission metadata -2. Permission structures are mapped between source and destination systems -3. The destination connector applies compatible permission settings -4. User and group mappings are maintained where possible +1. The source connector extracts both data and associated permission metadata. +2. Permission structures are mapped between source and destination systems. +3. The destination connector applies compatible permission settings. +4. User and group mappings are maintained where possible. Permission Sync can work alongside regular data synchronization or File Sync operations. @@ -29,11 +29,13 @@ Permission Sync can work alongside regular data synchronization or File Sync ope Permission Sync is currently in early development with limited connector support. The following connectors are planned to support Permission Sync: ### Sources + - Microsoft SharePoint (in development) - Google Drive (planned) - Box (planned) ### Destinations + - S3 (in development) - Google Cloud Storage (planned) - Azure Blob Storage (planned) From 8b3e4b3259b2a007ec0e9d1873bb0a44c0bf1df8 Mon Sep 17 00:00:00 2001 From: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Date: Sat, 15 Mar 2025 17:53:29 +0000 Subject: [PATCH 03/10] docs: add periods to all bulleted lists with sentence structure Co-Authored-By: Aaron Steers --- docs/understanding-airbyte/file-transfer.md | 18 +++++------ docs/understanding-airbyte/permission-sync.md | 30 +++++++++---------- 2 files changed, 24 insertions(+), 24 deletions(-) diff --git a/docs/understanding-airbyte/file-transfer.md b/docs/understanding-airbyte/file-transfer.md index 2e7c172d31810..1bfe2b714e333 100644 --- a/docs/understanding-airbyte/file-transfer.md +++ b/docs/understanding-airbyte/file-transfer.md @@ -6,10 +6,10 @@ Airbyte File 
Sync is a capability that allows you to move unstructured data, non Traditional data integration in Airbyte involves extracting structured data as individual records, which are then processed and loaded into a destination. However, many use cases require transferring raw files without parsing their contents: -- Moving binary files (images, videos, PDFs) -- Transferring compressed files (ZIP, GZIP) -- Migrating unstructured text data -- Preserving file formats for specialized processing +- Moving binary files (images, videos, PDFs). +- Transferring compressed files (ZIP, GZIP). +- Migrating unstructured text data. +- Preserving file formats for specialized processing. File Sync addresses these needs by copying files exactly as they appear in the source to the destination, preserving their original format and content. @@ -42,9 +42,9 @@ File Sync is currently supported by the following connectors: To use File Sync: -1. Configure a connection using a source and destination that both support File Sync -2. The File Sync mode will be automatically enabled when compatible connectors are used -3. Files will be transferred without parsing their contents +1. Configure a connection using a source and destination that both support File Sync. +2. The File Sync mode will be automatically enabled when compatible connectors are used. +3. Files will be transferred without parsing their contents. ### Configuration Example @@ -63,8 +63,8 @@ When configuring a connection between SFTP Bulk (source) and S3 (destination): ## Technical Implementation File Sync is implemented in two Airbyte CDKs: -- Python Files CDK: Provides file transfer capabilities for Python-based connectors -- Java/Kotlin Bulk Destination CDK: Supports file transfer for Java-based connectors +- Python Files CDK: Provides file transfer capabilities for Python-based connectors. +- Java/Kotlin Bulk Destination CDK: Supports file transfer for Java-based connectors. 
Connectors that support this feature have the `supportsFileTransfer: true` flag in their metadata.yaml file. diff --git a/docs/understanding-airbyte/permission-sync.md b/docs/understanding-airbyte/permission-sync.md index 2a02c17c4edc0..ad9a723fdba25 100644 --- a/docs/understanding-airbyte/permission-sync.md +++ b/docs/understanding-airbyte/permission-sync.md @@ -44,25 +44,25 @@ Permission Sync is currently in early development with limited connector support To use Permission Sync: -1. Configure a connection using a source and destination that both support Permission Sync -2. Enable the Permission Sync option in the connection settings -3. Configure user/group mapping if needed for cross-system synchronization +1. Configure a connection using a source and destination that both support Permission Sync. +2. Enable the Permission Sync option in the connection settings. +3. Configure user/group mapping if needed for cross-system synchronization. ### Configuration Example When configuring a connection between Microsoft SharePoint (source) and S3 (destination): -1. Set up the SharePoint source with your tenant credentials -2. Configure the S3 destination with your bucket information and IAM settings -3. Enable Permission Sync in the advanced options -4. Configure user mapping between SharePoint users and AWS IAM roles/users +1. Set up the SharePoint source with your tenant credentials. +2. Configure the S3 destination with your bucket information and IAM settings. +3. Enable Permission Sync in the advanced options. +4. Configure user mapping between SharePoint users and AWS IAM roles/users. 
## Limitations -- Permission structures vary significantly between systems, so perfect mapping is not always possible -- Some permission types may not have equivalents in destination systems -- User and group identity mapping may require manual configuration -- Permission Sync is most effective between systems with similar access control models +- Permission structures vary significantly between systems, so perfect mapping is not always possible. +- Some permission types may not have equivalents in destination systems. +- User and group identity mapping may require manual configuration. +- Permission Sync is most effective between systems with similar access control models. ## Technical Implementation @@ -72,7 +72,7 @@ Permission Sync is implemented as an extension to the Airbyte protocol, allowing The Permission Sync capability is being actively developed with plans to support more source and destination connectors. Future enhancements will include: -- More granular permission mapping options -- Support for complex role-based access control (RBAC) systems -- Automated user/group identity mapping -- Audit logging for permission changes during sync +- More granular permission mapping options. +- Support for complex role-based access control (RBAC) systems. +- Automated user/group identity mapping. +- Audit logging for permission changes during sync. 
From 8e684e1296f05907384871f0b05ab9a024fdcdc0 Mon Sep 17 00:00:00 2001 From: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Date: Sat, 15 Mar 2025 17:56:32 +0000 Subject: [PATCH 04/10] docs: update permission sync destination compatibility information Co-Authored-By: Aaron Steers --- docs/understanding-airbyte/permission-sync.md | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/docs/understanding-airbyte/permission-sync.md b/docs/understanding-airbyte/permission-sync.md index ad9a723fdba25..b01ba50b89c63 100644 --- a/docs/understanding-airbyte/permission-sync.md +++ b/docs/understanding-airbyte/permission-sync.md @@ -26,7 +26,7 @@ Permission Sync can work alongside regular data synchronization or File Sync ope ## Supported Connectors -Permission Sync is currently in early development with limited connector support. The following connectors are planned to support Permission Sync: +Permission Sync is currently in early development with limited connector support. The following source connectors are planned to support Permission Sync: ### Sources @@ -36,9 +36,7 @@ Permission Sync is currently in early development with limited connector support ### Destinations -- S3 (in development) -- Google Cloud Storage (planned) -- Azure Blob Storage (planned) +Permission Sync uses standard record-type processing, making it compatible with all Airbyte destinations. 
## Using Permission Sync From 8f46b25b766ab8fadb26f0961dfd56f6885f2d51 Mon Sep 17 00:00:00 2001 From: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Date: Sat, 15 Mar 2025 18:01:04 +0000 Subject: [PATCH 05/10] docs: update permission sync workflow description Co-Authored-By: Aaron Steers --- docs/understanding-airbyte/permission-sync.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/understanding-airbyte/permission-sync.md b/docs/understanding-airbyte/permission-sync.md index b01ba50b89c63..4662cfaa15ef4 100644 --- a/docs/understanding-airbyte/permission-sync.md +++ b/docs/understanding-airbyte/permission-sync.md @@ -18,11 +18,11 @@ This ensures that when data is moved between systems, the appropriate access con When using Permission Sync: 1. The source connector extracts both data and associated permission metadata. -2. Permission structures are mapped between source and destination systems. -3. The destination connector applies compatible permission settings. -4. User and group mappings are maintained where possible. +2. Permission structures are replicated from the source and sent as records. +3. The destination connector receives permission information as incoming records. +4. Permission logic allows restrictions to be reconstructed in downstream applications. -Permission Sync can work alongside regular data synchronization or File Sync operations. +While Permission Sync and File Sync connections can often complement each other, they are distinct and separate features and should be set up as separate connections. 
## Supported Connectors From b8edb39fd0d2a28302f8830cd8384897e45c0b81 Mon Sep 17 00:00:00 2001 From: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Date: Sat, 15 Mar 2025 18:04:27 +0000 Subject: [PATCH 06/10] docs: add unstructured documents parsing documentation Co-Authored-By: Aaron Steers --- .../unstructured-documents.md | 78 +++++++++++++++++++ docusaurus/redirects.yml | 2 + docusaurus/sidebars.js | 1 + 3 files changed, 81 insertions(+) create mode 100644 docs/understanding-airbyte/unstructured-documents.md diff --git a/docs/understanding-airbyte/unstructured-documents.md b/docs/understanding-airbyte/unstructured-documents.md new file mode 100644 index 0000000000000..cc269d8390a94 --- /dev/null +++ b/docs/understanding-airbyte/unstructured-documents.md @@ -0,0 +1,78 @@ +# Parsing Unstructured Documents + +Airbyte provides capabilities for extracting and processing unstructured text documents from various sources. This document explains how Airbyte's unstructured document parsing works, which connectors support it, and how to use it. + +## Overview + +Traditional data integration typically focuses on structured data with well-defined schemas. However, many organizations need to extract value from unstructured documents such as: + +- Text documents (Word, PDF, TXT) +- Emails and email attachments +- Web pages and HTML content +- Presentations and spreadsheets +- Scanned documents with OCR text + +Airbyte's unstructured document parsing capabilities address these needs by extracting text content from various document formats and making it available for analysis, search, or AI processing. + +## How Unstructured Document Parsing Works + +When using unstructured document parsing: + +1. The source connector identifies documents to be processed. +2. The document parser extracts text content from the documents. +3. The extracted text is normalized and cleaned. +4. The text is sent as records to the destination. 
+ +This process enables you to work with text from documents in the same way you work with other structured data in Airbyte. + +## Supported Connectors + +Unstructured document parsing is currently supported by the following connectors: + +### Sources + +- Google Drive +- Microsoft SharePoint +- S3 +- SFTP (Gen 2) + +## Using Unstructured Document Parsing + +To use unstructured document parsing: + +1. Configure a connection using a source that supports document parsing. +2. Enable the document parsing option in the connection settings. +3. Configure any additional parsing options (e.g., language detection, OCR settings). +4. The parsed text will be extracted and sent to your destination. + +### Configuration Example + +When configuring a connection between Google Drive (source) and a destination: + +1. Set up the Google Drive source with your account credentials. +2. Enable the "Parse Documents" option in the advanced settings. +3. Configure document type filters if needed (e.g., only process PDFs). +4. Complete the connection setup with your desired destination. + +## Limitations + +- Document parsing may not extract formatting, images, or complex layouts. +- Very large documents may be truncated based on size limits. +- OCR accuracy depends on document quality and language support. +- Some document types may require specific parser configurations. + +## Technical Implementation + +Unstructured document parsing is implemented using the "Unstructured Text Documents" parser in the Python Files CDK. This parser leverages open-source libraries to extract text from various document formats. + +Connectors that support this feature have the `supportsUnstructuredDocumentParsing: true` flag in their metadata.yaml file. + +## Future Enhancements + +The unstructured document parsing capability is being actively developed with plans to support more document types and extraction features. Future enhancements will include: + +- Improved layout preservation. 
+- Better table extraction from documents. +- Enhanced metadata extraction. +- Support for more document formats. +- Integration with AI models for content analysis. diff --git a/docusaurus/redirects.yml b/docusaurus/redirects.yml index e3b0dbf555813..1fb95c0c3c613 100644 --- a/docusaurus/redirects.yml +++ b/docusaurus/redirects.yml @@ -41,6 +41,8 @@ to: /understanding-airbyte/file-transfer - from: /permission-sync to: /understanding-airbyte/permission-sync +- from: /unstructured-data + to: /understanding-airbyte/unstructured-documents # November 2023 documentation restructure: - from: - /project-overview/product-support-levels diff --git a/docusaurus/sidebars.js b/docusaurus/sidebars.js index 25a22dd69c853..0a13c861d3bdd 100644 --- a/docusaurus/sidebars.js +++ b/docusaurus/sidebars.js @@ -546,6 +546,7 @@ const understandingAirbyte = { "understanding-airbyte/cdc", "understanding-airbyte/file-transfer", "understanding-airbyte/permission-sync", + "understanding-airbyte/unstructured-documents", "understanding-airbyte/resumability", "understanding-airbyte/json-avro-conversion", "understanding-airbyte/schemaless-sources-and-destinations", From e5778eb5d63dbda5b157e7908eced62c15e339a2 Mon Sep 17 00:00:00 2001 From: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Date: Sat, 15 Mar 2025 18:05:53 +0000 Subject: [PATCH 07/10] docs: add related topics sections to all documentation pages Co-Authored-By: Aaron Steers --- docs/understanding-airbyte/file-transfer.md | 5 +++++ docs/understanding-airbyte/permission-sync.md | 5 +++++ docs/understanding-airbyte/unstructured-documents.md | 5 +++++ 3 files changed, 15 insertions(+) diff --git a/docs/understanding-airbyte/file-transfer.md b/docs/understanding-airbyte/file-transfer.md index 1bfe2b714e333..798b7c6551a5c 100644 --- a/docs/understanding-airbyte/file-transfer.md +++ b/docs/understanding-airbyte/file-transfer.md @@ -71,3 +71,8 @@ Connectors that support this feature have the `supportsFileTransfer: 
true` flag ## Future Enhancements The File Sync capability is being expanded to support more source and destination connectors. Check the documentation of specific connectors to see if they support File Sync. + +## Related Topics + +- [Permission Sync](./permission-sync.md) - Learn about transferring access control information between systems +- [Parsing Unstructured Documents](./unstructured-documents.md) - Learn about extracting text from unstructured documents diff --git a/docs/understanding-airbyte/permission-sync.md b/docs/understanding-airbyte/permission-sync.md index 4662cfaa15ef4..8efe6fdaf560f 100644 --- a/docs/understanding-airbyte/permission-sync.md +++ b/docs/understanding-airbyte/permission-sync.md @@ -74,3 +74,8 @@ The Permission Sync capability is being actively developed with plans to support - Support for complex role-based access control (RBAC) systems. - Automated user/group identity mapping. - Audit logging for permission changes during sync. + +## Related Topics + +- [File Sync](./file-transfer.md) - Learn about transferring files between systems without parsing +- [Parsing Unstructured Documents](./unstructured-documents.md) - Learn about extracting text from unstructured documents diff --git a/docs/understanding-airbyte/unstructured-documents.md b/docs/understanding-airbyte/unstructured-documents.md index cc269d8390a94..d8010e497b4f6 100644 --- a/docs/understanding-airbyte/unstructured-documents.md +++ b/docs/understanding-airbyte/unstructured-documents.md @@ -76,3 +76,8 @@ The unstructured document parsing capability is being actively developed with pl - Enhanced metadata extraction. - Support for more document formats. - Integration with AI models for content analysis. 
+ +## Related Topics + +- [File Sync](./file-transfer.md) - Learn about transferring files between systems without parsing +- [Permission Sync](./permission-sync.md) - Learn about transferring access control information between systems From 0954438f1c759de5b781097bf29c5e4ead49038b Mon Sep 17 00:00:00 2001 From: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Date: Sat, 15 Mar 2025 18:13:33 +0000 Subject: [PATCH 08/10] docs: move Understanding Airbyte section to Using Airbyte Co-Authored-By: Aaron Steers --- docusaurus/sidebars.js | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docusaurus/sidebars.js b/docusaurus/sidebars.js index 0a13c861d3bdd..440cfb68d4cdd 100644 --- a/docusaurus/sidebars.js +++ b/docusaurus/sidebars.js @@ -605,6 +605,7 @@ module.exports = { type: "doc", id: "using-airbyte/mappings", }, + understandingAirbyte, { type: "category", label: "Transformations", @@ -763,7 +764,6 @@ module.exports = { label: "Using PyAirbyte", id: "using-airbyte/pyairbyte/getting-started", }, - understandingAirbyte, { type: "category", label: "Licenses", From 0832ab9711969688667873a18ca6517d2d0e9c65 Mon Sep 17 00:00:00 2001 From: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Date: Sat, 15 Mar 2025 18:19:30 +0000 Subject: [PATCH 09/10] docs: move documentation files to using-airbyte directory Co-Authored-By: Aaron Steers --- .../file-transfer.md | 0 .../permission-sync.md | 0 .../unstructured-documents.md | 10 +++++----- docusaurus/redirects.yml | 6 +++--- 4 files changed, 8 insertions(+), 8 deletions(-) rename docs/{understanding-airbyte => using-airbyte}/file-transfer.md (100%) rename docs/{understanding-airbyte => using-airbyte}/permission-sync.md (100%) rename docs/{understanding-airbyte => using-airbyte}/unstructured-documents.md (95%) diff --git a/docs/understanding-airbyte/file-transfer.md b/docs/using-airbyte/file-transfer.md similarity index 100% rename from 
docs/understanding-airbyte/file-transfer.md
rename to docs/using-airbyte/file-transfer.md
diff --git a/docs/understanding-airbyte/permission-sync.md b/docs/using-airbyte/permission-sync.md
similarity index 100%
rename from docs/understanding-airbyte/permission-sync.md
rename to docs/using-airbyte/permission-sync.md
diff --git a/docs/understanding-airbyte/unstructured-documents.md b/docs/using-airbyte/unstructured-documents.md
similarity index 95%
rename from docs/understanding-airbyte/unstructured-documents.md
rename to docs/using-airbyte/unstructured-documents.md
index d8010e497b4f6..76ae65755133b 100644
--- a/docs/understanding-airbyte/unstructured-documents.md
+++ b/docs/using-airbyte/unstructured-documents.md
@@ -6,11 +6,11 @@ Airbyte provides capabilities for extracting and processing unstructured text do

 Traditional data integration typically focuses on structured data with well-defined schemas. However, many organizations need to extract value from unstructured documents such as:

-- Text documents (Word, PDF, TXT)
-- Emails and email attachments
-- Web pages and HTML content
-- Presentations and spreadsheets
-- Scanned documents with OCR text
+- Text documents (Word, PDF, TXT).
+- Emails and email attachments.
+- Web pages and HTML content.
+- Presentations and spreadsheets.
+- Scanned documents with OCR text.

 Airbyte's unstructured document parsing capabilities address these needs by extracting text content from various document formats and making it available for analysis, search, or AI processing.
diff --git a/docusaurus/redirects.yml b/docusaurus/redirects.yml
index 1fb95c0c3c613..ebeb753501e5f 100644
--- a/docusaurus/redirects.yml
+++ b/docusaurus/redirects.yml
@@ -38,11 +38,11 @@
 - from: /cloud/managing-airbyte-cloud/manage-schema-changes
   to: /using-airbyte/schema-change-management
 - from: /file-sync
-  to: /understanding-airbyte/file-transfer
+  to: /using-airbyte/file-transfer
 - from: /permission-sync
-  to: /understanding-airbyte/permission-sync
+  to: /using-airbyte/permission-sync
 - from: /unstructured-data
-  to: /understanding-airbyte/unstructured-documents
+  to: /using-airbyte/unstructured-documents
 # November 2023 documentation restructure:
 - from:
     - /project-overview/product-support-levels

From 5ced64e10ddf36ed89b10e85ff2f81fa8f3b1d93 Mon Sep 17 00:00:00 2001
From: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Date: Sat, 15 Mar 2025 18:19:41 +0000
Subject: [PATCH 10/10] docs: revert sidebar changes

Co-Authored-By: Aaron Steers
---
 docusaurus/sidebars.js | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/docusaurus/sidebars.js b/docusaurus/sidebars.js
index 440cfb68d4cdd..9f1aa15f5547c 100644
--- a/docusaurus/sidebars.js
+++ b/docusaurus/sidebars.js
@@ -544,9 +544,6 @@ const understandingAirbyte = {
     "understanding-airbyte/supported-data-types",
     "understanding-airbyte/secrets",
     "understanding-airbyte/cdc",
-    "understanding-airbyte/file-transfer",
-    "understanding-airbyte/permission-sync",
-    "understanding-airbyte/unstructured-documents",
     "understanding-airbyte/resumability",
     "understanding-airbyte/json-avro-conversion",
     "understanding-airbyte/schemaless-sources-and-destinations",
@@ -605,7 +602,6 @@ module.exports = {
       type: "doc",
       id: "using-airbyte/mappings",
     },
-    understandingAirbyte,
     {
       type: "category",
       label: "Transformations",
@@ -764,6 +760,7 @@ module.exports = {
       label: "Using PyAirbyte",
       id: "using-airbyte/pyairbyte/getting-started",
     },
+    understandingAirbyte,
     {
       type: "category",
       label: "Licenses",