Commit 81e9aff

Joe Reuter authored and aaronsteers committed
airbyte-lib: Hidden documentation (#34702)
Co-authored-by: Aaron ("AJ") Steers <[email protected]>
1 parent b9335d8 commit 81e9aff

16 files changed, +312 -61 lines changed

docs/assets/docs/airbyte-lib-high-level-architecture.svg

+1

docs/contributing-to-airbyte/writing-docs.md

+20
Original file line number | Diff line number | Diff line change
@@ -324,6 +324,26 @@ Back to ordinary markdown content.
324324
```
325325
Eagle-eyed readers may note that _all_ markdown should support this feature since it's part of the html spec. However, it's worth special mention since these dropdowns have been styled to be a graceful visual fit within our rendered documentation in all environments.
326326

327+
#### Documenting airbyte-lib usage
328+
329+
airbyte-lib is a Python library that allows running syncs from within a Python script for a subset of connectors. Documentation for airbyte-lib connectors is generated automatically from the connector's JSON schema spec.
330+
There are a few approaches to combine full control over the documentation with automatic generation for common cases:
331+
* If a connector is airbyte-lib enabled (`remoteRegistries.pypi.enabled` set in the `metadata.yaml` file of the connector) and there is no second-level heading `Usage with airbyte-lib` in the documentation, the documentation will be automatically generated and placed above the `Changelog` section.
332+
* Manually specifying a `Usage with airbyte-lib` section disables this automatic generation. The following is a good starting point for this section:
333+
```md
334+
<HideInUI>
335+
336+
## Usage with airbyte-lib
337+
338+
<AirbyteLibExample connector="source-google-sheets" />
339+
340+
<SpecSchema connector="source-google-sheets" />
341+
342+
</HideInUI>
343+
```
344+
345+
The `AirbyteLibExample` component generates a code example that can be run with airbyte-lib, including an auto-generated sample configuration based on the configuration schema. The `SpecSchema` component generates a reference table from the connector's JSON schema spec, like a non-interactive version of the connector form in the UI. It can be used on any docs page.
346+
327347
## Additional guidelines
328348

329349
- If you're updating a connector doc, follow the [Connector documentation template](https://hackmd.io/Bz75cgATSbm7DjrAqgl4rw)
@@ -0,0 +1,61 @@
1+
import AirbyteLibConnectors from '@site/src/components/AirbyteLibConnectors';
2+
3+
# Getting Started with AirbyteLib (Beta)
4+
5+
AirbyteLib is a library that provides a set of utilities to use Airbyte connectors in Python. It is meant to be used in situations where setting up an Airbyte server or cloud account is not possible or desirable, for example in a Jupyter notebook or when iterating on early prototypes on a developer's workstation.
6+
7+
## Installation
8+
9+
```bash
10+
pip install airbyte-lib
11+
```
12+
13+
Or, during the beta, you may want to install the latest version from source with:
14+
15+
```bash
16+
pip install 'git+https://github.com/airbytehq/airbyte.git@master#egg=airbyte-lib&subdirectory=airbyte-lib'
17+
```
18+
19+
## Usage
20+
21+
Data can be extracted from sources and loaded into caches:
22+
23+
```python
24+
import airbyte_lib as ab
25+
26+
source = ab.get_connector(
27+
"source-spacex-api",
28+
config={"id": "605b4b6aaa5433645e37d03f"},
29+
install_if_missing=True,
30+
)
31+
source.check()
32+
33+
source.set_streams(["launches", "rockets", "capsules"])
34+
35+
cache = ab.new_local_cache()
36+
result = source.read_all(cache)
37+
38+
for name, records in result.cache.streams.items():
39+
print(f"Stream {name}: {len(records)} records")
40+
```
41+
42+
## API Reference
43+
44+
For details on specific classes and methods, please refer to our [AirbyteLib API Reference](./reference).
45+
46+
## Architecture
47+
48+
[comment]: <> (Edit under https://docs.google.com/drawings/d/1M7ti2D4ha6cEtPnk04RLp1SSh3au4dRJsLupnGPigHQ/edit?usp=sharing)
49+
50+
![Architecture](../../assets/docs/airbyte-lib-high-level-architecture.svg)
51+
52+
airbyte-lib is a Python library that can be run in any context that supports Python >=3.9. It contains the following main components:
53+
* **Source**: A source object uses a Python connector and includes a configuration object. The configuration object is a dictionary that contains the connector's configuration, such as authentication or connection settings. The source object is used to read data from the connector.
54+
* **Cache**: Data can be read directly from the source object. However, it is recommended to use a cache object to store the data. The cache object temporarily stores records from the source in a SQL database, such as a local DuckDB file or a Postgres or Snowflake instance.
55+
* **Result**: An object holding the records from a read operation on a source. It allows quick access to the records of each synced stream via the cache object that was used. Data can be accessed as a list of records, a Pandas DataFrame, or via SQLAlchemy queries.
56+
57+
## Available connectors
58+
59+
The following connectors are available:
60+
61+
<AirbyteLibConnectors />
@@ -0,0 +1,15 @@
1+
import AirbyteLibDefinitions from '@site/src/components/AirbyteLibDefinitions';
2+
3+
# airbyte-lib reference
4+
5+
This page contains the reference documentation for the airbyte-lib library.
6+
7+
## Main `airbyte_lib` module
8+
9+
<AirbyteLibDefinitions module="airbyte_lib" />
10+
11+
## Caches `airbyte_lib.caches`
12+
13+
The following cache implementations are available:
14+
15+
<AirbyteLibDefinitions module="airbyte_lib.caches" />

docusaurus/docusaurus.config.js

+7-1
@@ -11,6 +11,7 @@ const darkCodeTheme = themes.dracula;
1111

1212
const docsHeaderDecoration = require("./src/remark/docsHeaderDecoration");
1313
const productInformation = require("./src/remark/productInformation");
14+
const connectorList = require("./src/remark/connectorList");
1415
const specDecoration = require("./src/remark/specDecoration");
1516

1617
const redirects = yaml.load(
@@ -66,6 +67,10 @@ const config = {
6667
test: /\.ya?ml$/,
6768
use: "yaml-loader",
6869
},
70+
{
71+
test: /\.html$/i,
72+
loader: "html-loader",
73+
},
6974
],
7075
},
7176
};
@@ -90,7 +95,8 @@ const config = {
9095
editUrl: "https://github.com/airbytehq/airbyte/blob/master/docs",
9196
path: "../docs",
9297
exclude: ["**/*.inapp.md"],
93-
remarkPlugins: [docsHeaderDecoration, productInformation, specDecoration],
98+
beforeDefaultRemarkPlugins: [specDecoration, connectorList], // use before-default plugins so TOC rendering picks up inserted headings
99+
remarkPlugins: [docsHeaderDecoration, productInformation],
94100
},
95101
blog: false,
96102
theme: {

docusaurus/package.json

+1
@@ -105,6 +105,7 @@
105105
"del": "6.1.1",
106106
"docusaurus-plugin-hubspot": "^1.0.0",
107107
"docusaurus-plugin-segment": "^1.0.3",
108+
"html-loader": "^4.2.0",
108109
"js-yaml": "^4.1.0",
109110
"json-schema-faker": "^0.5.4",
110111
"node-fetch": "^3.3.2",

docusaurus/pnpm-lock.yaml

+14
Some generated files are not rendered by default.
@@ -0,0 +1,22 @@
1+
export default function AirbyteLibConnectors({
2+
connectorsJSON,
3+
}) {
4+
const connectors = JSON.parse(connectorsJSON);
5+
return <ul>
6+
{connectors.map((connector) => <li key={connector.name_oss}>
7+
<a href={`${getRelativeDocumentationUrl(connector)}#reference`}>{connector.name_oss}</a>
8+
</li>)}
9+
</ul>
10+
}
11+
12+
function getRelativeDocumentationUrl(connector) {
13+
// get the relative path from the dockerRepository_oss (e.g. airbyte/source-amazon-sqs -> /integrations/sources/amazon-sqs)
14+
15+
const fullDockerImage = connector.dockerRepository_oss;
16+
console.log(fullDockerImage);
17+
const dockerImage = fullDockerImage.split("airbyte/")[1];
18+
19+
const [integrationType, ...integrationName] = dockerImage.split("-");
20+
21+
return `/integrations/${integrationType}s/${integrationName.join("-")}`;
22+
}
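
The mapping performed by `getRelativeDocumentationUrl` above can be tried outside of React. The following is a minimal sketch that copies the function body without the debug logging; the sample registry entries are hypothetical and only fill in the one field the function reads.

```javascript
// Standalone copy of the mapping logic from the component above,
// minus the console.log call, so it can be run with plain Node.js.
function getRelativeDocumentationUrl(connector) {
  // "airbyte/source-amazon-sqs" -> "source-amazon-sqs"
  const dockerImage = connector.dockerRepository_oss.split("airbyte/")[1];
  // First segment is the integration type, the rest is the connector name.
  const [integrationType, ...integrationName] = dockerImage.split("-");
  return `/integrations/${integrationType}s/${integrationName.join("-")}`;
}

// Hypothetical registry entries (illustration only).
const examples = [
  { dockerRepository_oss: "airbyte/source-amazon-sqs" },
  { dockerRepository_oss: "airbyte/source-google-sheets" },
];

const urls = examples.map(getRelativeDocumentationUrl);
console.log(urls);
```

Note that the function assumes the repository name starts with `airbyte/`; for any other prefix, `split("airbyte/")[1]` is `undefined` and the subsequent `split` call would throw.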
@@ -0,0 +1,17 @@
1+
import React from 'react';
2+
3+
// Add additional modules here
4+
import main_docs from "../../../airbyte-lib/docs/generated/airbyte_lib.html";
5+
import caches_docs from "../../../airbyte-lib/docs/generated/airbyte_lib/caches.html";
6+
7+
const docs = {
8+
"airbyte_lib": main_docs,
9+
"airbyte_lib.caches": caches_docs,
10+
}
11+
12+
13+
export default function AirbyteLibDefinitions({ module }) {
14+
return <>
15+
<div dangerouslySetInnerHTML={{ __html: docs[module] }} />
16+
</>
17+
}

docusaurus/src/components/AirbyteLibExample.jsx

+24-6
@@ -1,14 +1,32 @@
1-
import React from "react";
1+
import React, { useMemo } from "react";
22
import { JSONSchemaFaker } from "json-schema-faker";
33
import CodeBlock from '@theme/CodeBlock';
44

5+
/**
6+
* Generate a fake config based on the spec.
7+
*
8+
* As our specs are not 100% consistent, errors may occur.
9+
* Try to generate a few times before giving up.
10+
*/
11+
function generateFakeConfig(spec) {
12+
let tries = 5;
13+
while (tries > 0) {
14+
try {
15+
return JSON.stringify(JSONSchemaFaker.generate(spec), null, 2)
16+
}
17+
catch (e) {
18+
tries--;
19+
}
20+
}
21+
return "{ ... }";
22+
}
523

624
export const AirbyteLibExample = ({
725
specJSON,
8-
connector
26+
connector,
927
}) => {
10-
const spec = JSON.parse(specJSON);
11-
const fakeConfig = JSONSchemaFaker.generate(spec);
28+
const spec = useMemo(() => JSON.parse(specJSON), [specJSON]);
29+
const fakeConfig = useMemo(() => generateFakeConfig(spec), [spec]);
1230
return <>
1331
<p>
1432
Install the Python library via:
@@ -20,12 +38,12 @@ export const AirbyteLibExample = ({
2038
language="python"
2139
>{`import airbyte_lib as ab
2240
23-
config = ${JSON.stringify(fakeConfig, null, 2)}
41+
config = ${fakeConfig}
2442
2543
result = ab.get_connector(
2644
"${connector}",
2745
config=config,
28-
).read_all()
46+
).read()
2947
3048
for record in result.cache.streams["my_stream:name"]:
3149
print(record)`} </CodeBlock>
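
The retry loop introduced in `generateFakeConfig` can be exercised in isolation. This is a sketch under assumptions: `generateWithRetries` and `makeFlakyGenerator` are hypothetical names, and the flaky generator merely stands in for `JSONSchemaFaker.generate`, which is not called here.

```javascript
// Same control flow as generateFakeConfig in the diff, but the generator
// is passed in so the loop can be tested without json-schema-faker.
function generateWithRetries(generate, spec, tries = 5) {
  while (tries > 0) {
    try {
      return JSON.stringify(generate(spec), null, 2);
    } catch (e) {
      tries--; // swallow the error and try again
    }
  }
  return "{ ... }"; // placeholder shown when every attempt failed
}

// Hypothetical generator that throws on its first `failures` calls.
function makeFlakyGenerator(failures) {
  let calls = 0;
  return () => {
    calls++;
    if (calls <= failures) throw new Error("inconsistent spec");
    return { api_key: "example" };
  };
}

const recovered = generateWithRetries(makeFlakyGenerator(2), {});
const gaveUp = generateWithRetries(makeFlakyGenerator(10), {});
console.log(recovered);
console.log(gaveUp);
```

With two failures the third attempt succeeds and a formatted JSON string is returned; with ten failures all five attempts are used up and the placeholder is returned instead.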

docusaurus/src/components/SpecSchema.jsx

+9-9
@@ -27,7 +27,7 @@ function JSONSchemaViewer(props) {
2727
Type
2828
</div>
2929
<div class={className(styles.headerItem, styles.tableHeader)}>
30-
Title
30+
Property name
3131
</div>
3232
<JSONSchemaObject schema={props.schema} />
3333
</div>
@@ -108,16 +108,16 @@ function getType(schema) {
108108

109109
function JSONSchemaProperty({ propertyKey, schema, required, depth = 0 }) {
110110
const newDepth = depth + 1;
111-
const propertyName = <>
112-
<div>{propertyKey || schema.title}</div>
111+
const fieldName = <>
112+
<div>{schema.title || propertyKey}</div>
113113
{required && <div className={styles.tag}>required</div>}
114114
</>;
115-
const typeAndTitle = <>
115+
const typeAndPropertyName = <>
116116
<div className={styles.headerItem}>
117117
{getType(schema)}
118118
</div>
119119
<div className={styles.headerItem}>
120-
{schema.title && <div>{schema.title}</div>}
120+
{propertyKey && <div>{propertyKey}</div>}
121121
</div>
122122
</>;
123123
if (showCollapsible(schema)) {
@@ -126,9 +126,9 @@ function JSONSchemaProperty({ propertyKey, schema, required, depth = 0 }) {
126126
<>
127127
<Disclosure.Button className={className(styles.headerItem, styles.clickable, styles.propertyName)} style={getIndentStyle(newDepth)}>
128128
<div className={className({ [styles.open]: open })}></div>
129-
{propertyName}
129+
{fieldName}
130130
</Disclosure.Button>
131-
{typeAndTitle}
131+
{typeAndPropertyName}
132132
<Disclosure.Panel className={styles.contents}>
133133
{showDescription(schema) && <Description schema={schema} style={getIndentStyle(newDepth + 1)} />}
134134
{schema.type === "object" && schema.oneOf && <JSONSchemaOneOf schema={schema} depth={newDepth} />}
@@ -140,9 +140,9 @@ function JSONSchemaProperty({ propertyKey, schema, required, depth = 0 }) {
140140
} else {
141141
return <>
142142
<div className={className(styles.headerItem, styles.propertyName)} style={getIndentStyle(newDepth)}>
143-
{propertyName}
143+
{fieldName}
144144
</div>
145-
{typeAndTitle}
145+
{typeAndPropertyName}
146146
</>
147147
}
148148
}

docusaurus/src/connector_registry.js

+6-1
@@ -8,4 +8,9 @@ const fetchCatalog = async () => {
88
return json;
99
};
1010

11-
module.exports = fetchCatalog();
11+
module.exports = {
12+
catalog: fetchCatalog(),
13+
isPypiConnector: (connector) => {
14+
return Boolean(connector.remoteRegistries_oss?.pypi?.enabled);
15+
}
16+
}
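
To illustrate how the `isPypiConnector` predicate behaves on different registry shapes, here is a small sketch. The predicate is copied from the diff; the registry entries are hypothetical and reduced to the fields that matter.

```javascript
// Predicate copied from connector_registry.js above.
const isPypiConnector = (connector) =>
  Boolean(connector.remoteRegistries_oss?.pypi?.enabled);

// Hypothetical registry entries (illustration only).
const sampleRegistry = [
  { name_oss: "Enabled", remoteRegistries_oss: { pypi: { enabled: true } } },
  { name_oss: "Disabled", remoteRegistries_oss: { pypi: { enabled: false } } },
  { name_oss: "No pypi entry", remoteRegistries_oss: {} },
  { name_oss: "No registries" },
];

const pypiNames = sampleRegistry.filter(isPypiConnector).map((c) => c.name_oss);
console.log(pypiNames);
```

Optional chaining means entries without `remoteRegistries_oss` or without a `pypi` key are treated as not enabled rather than raising an error.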
+24
@@ -0,0 +1,24 @@
1+
const visit = require("unist-util-visit").visit;
2+
const { catalog, isPypiConnector } = require("../connector_registry");
3+
4+
const plugin = () => {
5+
const transformer = async (ast, vfile) => {
6+
7+
const registry = await catalog;
8+
9+
visit(ast, "mdxJsxFlowElement", (node) => {
10+
if (node.name !== "AirbyteLibConnectors") return;
11+
12+
const connectors = registry.filter(isPypiConnector);
13+
14+
node.attributes.push({
15+
type: "mdxJsxAttribute",
16+
name: "connectorsJSON",
17+
value: JSON.stringify(connectors)
18+
});
19+
});
20+
};
21+
return transformer;
22+
};
23+
24+
module.exports = plugin;
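
The effect of the transformer above on a page can be sketched without Docusaurus. Everything in this example is an assumption made for illustration: `visitNodes` is a hypothetical stand-in for `unist-util-visit`, the tree is hand-written rather than produced by an MDX parser, and the connector list is made up.

```javascript
// Minimal depth-first walk; a stand-in for unist-util-visit in this sketch.
function visitNodes(node, type, visitor) {
  if (node.type === type) visitor(node);
  for (const child of node.children ?? []) visitNodes(child, type, visitor);
}

// Hand-written tree resembling a page that contains <AirbyteLibConnectors />.
const ast = {
  type: "root",
  children: [
    { type: "heading", depth: 2, children: [{ type: "text", value: "Available connectors" }] },
    { type: "mdxJsxFlowElement", name: "AirbyteLibConnectors", attributes: [], children: [] },
  ],
};

// Hypothetical, already filtered connector list.
const connectors = [
  { name_oss: "Example", dockerRepository_oss: "airbyte/source-example" },
];

// Same mutation as in the plugin: attach the list as a JSON string attribute.
visitNodes(ast, "mdxJsxFlowElement", (node) => {
  if (node.name !== "AirbyteLibConnectors") return;
  node.attributes.push({
    type: "mdxJsxAttribute",
    name: "connectorsJSON",
    value: JSON.stringify(connectors),
  });
});

const injected = ast.children[1].attributes[0];
console.log(injected.name, JSON.parse(injected.value).length);
```

Passing the list as a serialized string matches what the `AirbyteLibConnectors` component expects, since it calls `JSON.parse(connectorsJSON)` on the prop.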
