Commit 9031b49

jjoyce0510 (John Joyce) authored
fix(docs): Add improvements in examples for PATCH documentation (#12165)
Co-authored-by: John Joyce <[email protected]>
Co-authored-by: John Joyce <[email protected]>
1 parent 89acda6 commit 9031b49

14 files changed, +321 -148 lines changed

docs/advanced/patch.md

+78 -32
@@ -1,69 +1,120 @@
11
import Tabs from '@theme/Tabs';
22
import TabItem from '@theme/TabItem';
33

4-
# But First, Semantics: Upsert versus Patch
4+
# Emitting Patch Updates to DataHub
55

66
## Why Would You Use Patch
77

8-
By default, most of the SDK tutorials and API-s involve applying full upserts at the aspect level. This means that typically, when you want to change one field within an aspect without modifying others, you need to do a read-modify-write to not overwrite existing fields.
9-
To support these scenarios, DataHub supports PATCH based operations so that targeted changes to single fields or values within arrays of fields are possible without impacting other existing metadata.
8+
By default, most of the SDK tutorials and APIs involve applying full upserts at the aspect level, i.e. replacing the aspect entirely.
9+
This means that when you want to change even a single field within an aspect without modifying others, you need to do a read-modify-write to avoid overwriting existing fields.
10+
To support these scenarios, DataHub supports `PATCH` operations, which make targeted changes to individual fields or to values within arrays of fields without impacting other existing metadata.
1011

1112
:::note
1213

13-
Currently, PATCH support is only available for a selected set of aspects, so before pinning your hopes on using PATCH as a way to make modifications to aspect values, confirm whether your aspect supports PATCH semantics. The complete list of Aspects that are supported are maintained [here](https://github.com/datahub-project/datahub/blob/9588440549f3d99965085e97b214a7dabc181ed2/entity-registry/src/main/java/com/linkedin/metadata/models/registry/template/AspectTemplateEngine.java#L24). In the near future, we do have plans to automatically support PATCH semantics for aspects by default.
14+
Currently, PATCH support is only available for a selected set of aspects, so before relying on PATCH to modify aspect values, confirm whether your aspect supports PATCH semantics. The complete list of supported aspects is maintained [here](https://github.com/datahub-project/datahub/blob/9588440549f3d99965085e97b214a7dabc181ed2/entity-registry/src/main/java/com/linkedin/metadata/models/registry/template/AspectTemplateEngine.java#L24).
1415

1516
:::
1617

17-
## How To Use Patch
18+
## How To Use Patches
1819

19-
Examples for using Patch are sprinkled throughout the API guides.
2020
Here's how to find the appropriate classes for the language of your choice.
2121

22-
2322
<Tabs>
24-
<TabItem value="Java" label="Java SDK">
23+
<TabItem value="Python" label="Python SDK" default>
2524

26-
The Java Patch builders are aspect-oriented and located in the [datahub-client](https://github.com/datahub-project/datahub/tree/master/metadata-integration/java/datahub-client/src/main/java/datahub/client/patch) module under the `datahub.client.patch` namespace.
25+
The Python Patch builders are entity-oriented and located in the [metadata-ingestion](https://github.com/datahub-project/datahub/tree/9588440549f3d99965085e97b214a7dabc181ed2/metadata-ingestion/src/datahub/specific) module under the `datahub.specific` namespace.
26+
Patch builder helper classes exist for:
2727

28-
Here are a few illustrative examples using the Java Patch builders:
28+
- [Datasets](https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/specific/dataset.py)
29+
- [Charts](https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/specific/chart.py)
30+
- [Dashboards](https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/specific/dashboard.py)
31+
- [Data Jobs (Tasks)](https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/specific/datajob.py)
32+
- [Data Products](https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/specific/dataproduct.py)
2933

34+
And we are gladly accepting contributions for Containers, Data Flows (Pipelines), Tags, Glossary Terms, Domains, and ML Models.
3035

31-
### Add Custom Properties
36+
### Add & Remove Owners for Dataset
3237

33-
```java
34-
{{ inline /metadata-integration/java/examples/src/main/java/io/datahubproject/examples/DatasetCustomPropertiesAdd.java show_path_as_comment }}
38+
To add & remove specific owners for a dataset:
39+
40+
```python
41+
{{ inline /metadata-ingestion/examples/library/dataset_add_owner_patch.py show_path_as_comment }}
3542
```
3643

37-
### Add and Remove Custom Properties
44+
### Add & Remove Tags for Dataset
3845

39-
```java
40-
{{ inline /metadata-integration/java/examples/src/main/java/io/datahubproject/examples/DatasetCustomPropertiesAddRemove.java show_path_as_comment }}
46+
To add & remove specific tags for a dataset:
47+
48+
```python
49+
{{ inline /metadata-ingestion/examples/library/dataset_add_tag_patch.py show_path_as_comment }}
4150
```
4251

43-
### Add Data Job Lineage
52+
And for a specific schema field within the Dataset:
4453

45-
```java
46-
{{ inline /metadata-integration/java/examples/src/main/java/io/datahubproject/examples/DataJobLineageAdd.java show_path_as_comment }}
54+
```python
55+
{{ inline /metadata-ingestion/examples/library/dataset_field_add_tag_patch.py show_path_as_comment }}
4756
```
4857

49-
</TabItem>
50-
<TabItem value="Python" label="Python SDK" default>
58+
### Add & Remove Glossary Terms for Dataset
59+
60+
To add & remove specific glossary terms for a dataset:
61+
62+
```python
63+
{{ inline /metadata-ingestion/examples/library/dataset_add_glossary_term_patch.py show_path_as_comment }}
64+
```
65+
66+
And for a specific schema field within the Dataset:
67+
68+
```python
69+
{{ inline /metadata-ingestion/examples/library/dataset_field_add_glossary_term_patch.py show_path_as_comment }}
70+
```
71+
72+
### Add & Remove Structured Properties for Dataset
5173

52-
The Python Patch builders are entity-oriented and located in the [metadata-ingestion](https://github.com/datahub-project/datahub/tree/9588440549f3d99965085e97b214a7dabc181ed2/metadata-ingestion/src/datahub/specific) module and located in the `datahub.specific` module.
74+
To add & remove structured properties for a dataset:
5375

54-
Here are a few illustrative examples using the Python Patch builders:
76+
```python
77+
{{ inline /metadata-ingestion/examples/library/dataset_add_structured_properties_patch.py show_path_as_comment }}
78+
```
5579

56-
### Add Properties to Dataset
80+
### Add & Remove Upstream Lineage for Dataset
81+
82+
To add & remove a lineage edge connecting a dataset to its upstream or input, at both the dataset and schema field level:
5783

5884
```python
59-
{{ inline /metadata-ingestion/examples/library/dataset_add_properties.py show_path_as_comment }}
85+
{{ inline /metadata-ingestion/examples/library/dataset_add_upstream_lineage_patch.py show_path_as_comment }}
86+
```
87+
88+
### Add & Remove Read-Only Custom Properties for Dataset
89+
90+
To add & remove specific custom properties for a dataset:
91+
92+
```python
93+
{{ inline /metadata-ingestion/examples/library/dataset_add_remove_custom_properties_patch.py show_path_as_comment }}
94+
```
95+
96+
</TabItem>
97+
<TabItem value="Java" label="Java SDK">
98+
99+
The Java Patch builders are aspect-oriented and located in the [datahub-client](https://github.com/datahub-project/datahub/tree/master/metadata-integration/java/datahub-client/src/main/java/datahub/client/patch) module under the `datahub.client.patch` namespace.
100+
101+
### Add & Remove Read-Only Custom Properties
102+
103+
```java
104+
{{ inline /metadata-integration/java/examples/src/main/java/io/datahubproject/examples/DatasetCustomPropertiesAddRemove.java show_path_as_comment }}
105+
```
106+
107+
### Add Data Job Lineage
108+
109+
```java
110+
{{ inline /metadata-integration/java/examples/src/main/java/io/datahubproject/examples/DataJobLineageAdd.java show_path_as_comment }}
60111
```
61112

62113
</TabItem>
63114
</Tabs>
64115

65116

66-
## How Patch works
117+
## Advanced: How Patch works
67118

68119
To understand how patching works, it's important to understand a bit about our [models](../what/aspect.md). Entities are comprised of Aspects
69120
which can be reasoned about as JSON representations of the object models. To be able to patch these we utilize [JsonPatch](https://jsonpatch.com/). The components of a JSON Patch are the path, operation, and value.
@@ -73,9 +124,6 @@ which can be reasoned about as JSON representations of the object models. To be
73124
The JSON path refers to a value within the schema. This can be a single field or can be an entire object reference depending on what the path is.
74125
For our patches we are primarily targeting single fields or even single array elements within a field. To be able to target array elements by id, we go through a translation process
75126
of the schema to transform arrays into maps. This allows a path to reference a particular array element by key rather than by index, for example a specific tag urn being added to a dataset.
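The array-to-map translation described above can be pictured with plain Python dictionaries. This is only an illustrative sketch, not DataHub's implementation: the helper names `to_keyed_map` and `to_array`, the `dataset` key field, the `type` values, and the example urns are all assumptions made for the example.

```python
from typing import Any, Dict, List

Upstream = Dict[str, Any]


def to_keyed_map(items: List[Upstream], key_field: str) -> Dict[str, Upstream]:
    """Re-key an array of objects by one of their fields (hypothetical helper)."""
    return {item[key_field]: item for item in items}


def to_array(keyed: Dict[str, Upstream]) -> List[Upstream]:
    """Turn the keyed map back into the array form (hypothetical helper)."""
    return list(keyed.values())


# Illustrative urns, not taken from a real deployment.
URN_A = "urn:li:dataset:(urn:li:dataPlatform:hive,upstream_a,PROD)"
URN_B = "urn:li:dataset:(urn:li:dataPlatform:hive,upstream_b,PROD)"

upstreams = [{"dataset": URN_A, "type": "TRANSFORMED"}]

keyed = to_keyed_map(upstreams, key_field="dataset")

# A path such as /upstreams/<URN_B> now addresses exactly one element by key,
# with no need to know its position in the array.
keyed[URN_B] = {"dataset": URN_B, "type": "TRANSFORMED"}

# Writing to the same key again replaces the element instead of duplicating it.
keyed[URN_B] = {"dataset": URN_B, "type": "COPY"}

patched_upstreams = to_array(keyed)
```

Addressing elements by key rather than by index means a patch does not depend on the current order or length of the stored array.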
76-
This is important to note that for some fields in our schema that are arrays which do not necessarily restrict uniqueness, this puts a uniqueness constraint on the key.
77-
The key for objects stored in arrays is determined manually by examining the schema and a long term goal is to make these keys annotation driven to reduce the amount of code needed to support
78-
additional aspects to be patched. There is a generic patch endpoint, but it requires any array field keys to be specified at request time, putting a lot of burden on the API user.
79127

80128
#### Examples
81129

@@ -87,8 +135,7 @@ Breakdown:
87135
* `/upstreams` -> References the upstreams field of the UpstreamLineage aspect; this is an array of Upstream objects where the key is the Urn
88136
* `/urn:...` -> The dataset to be targeted by the operation
89137

90-
91-
A patch path for targeting a fine grained lineage upstream:
138+
A patch path for targeting a fine-grained lineage upstream:
92139

93140
`/fineGrainedLineages/TRANSFORM/urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD),foo)/urn:li:query:queryId/urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created_upstream,PROD),bar)`
94141

@@ -118,7 +165,6 @@ using adds, but generally the most useful use case for patch is to add elements
118165

119166
Remove operations require the specified path to be present, or an error will be thrown. Otherwise they operate as one would expect: the specified path is removed from the aspect.
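The remove behavior can be sketched on a keyed map using plain Python. This is a minimal illustration under stated assumptions, not the server-side code: `apply_add` and `apply_remove` are hypothetical helpers, the tag urn is made up, and `KeyError` merely stands in for whatever error DataHub actually raises.

```python
from typing import Any, Dict


def apply_add(keyed: Dict[str, Any], key: str, value: Any) -> None:
    """Store the value at the key, replacing any existing entry (hypothetical helper)."""
    keyed[key] = value


def apply_remove(keyed: Dict[str, Any], key: str) -> None:
    """Delete the key, raising if it is not present (hypothetical helper)."""
    if key not in keyed:
        raise KeyError(f"path not present: {key}")
    del keyed[key]


tags: Dict[str, Any] = {}
apply_add(tags, "urn:li:tag:Legacy", {"tag": "urn:li:tag:Legacy"})
apply_remove(tags, "urn:li:tag:Legacy")  # succeeds because the key was present

try:
    apply_remove(tags, "urn:li:tag:Legacy")  # the key is already gone
    second_remove_failed = False
except KeyError:
    second_remove_failed = True
```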
120167

121-
122168
### Value
123169

124170
Value is the actual information that will be stored at a path. If the path references an object then this will include the JSON key value pairs for that object.
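Putting path, operation, and value together, a patch can be written out as a list of JSON Patch operations. The sketch below uses only the Python standard library and does not call any DataHub API: `escape_segment` and `make_op` are hypothetical helpers, the `dataset` and `type` fields in the value are assumed for illustration, and the second urn is made up. The segment escaping follows the JSON Pointer rules that JSON Patch paths use.

```python
import json
from typing import Any, Dict, List, Optional


def escape_segment(segment: str) -> str:
    """Escape one path segment per JSON Pointer: '~' becomes '~0', '/' becomes '~1'."""
    return segment.replace("~", "~0").replace("/", "~1")


def make_op(op: str, segments: List[str], value: Optional[Any] = None) -> Dict[str, Any]:
    """Build one JSON Patch operation; a 'remove' operation carries no value."""
    path = "/" + "/".join(escape_segment(s) for s in segments)
    operation: Dict[str, Any] = {"op": op, "path": path}
    if op != "remove":
        operation["value"] = value
    return operation


UPSTREAM_URN = (
    "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created_upstream,PROD)"
)
# Illustrative urn for an upstream that should no longer be linked.
OLD_UPSTREAM_URN = "urn:li:dataset:(urn:li:dataPlatform:hive,old_upstream,PROD)"

patch = [
    # The value holds the JSON key/value pairs of the object stored at the path.
    make_op(
        "add",
        ["upstreams", UPSTREAM_URN],
        {"dataset": UPSTREAM_URN, "type": "TRANSFORMED"},
    ),
    make_op("remove", ["upstreams", OLD_UPSTREAM_URN]),
]

print(json.dumps(patch, indent=2))
```

In practice the SDK patch builders shown earlier assemble these operations for you; writing them by hand is mainly useful for understanding what a patch contains.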

docs/api/tutorials/custom-properties.md

+2 -2
@@ -74,7 +74,7 @@ The following code adds custom properties `cluster_name` and `retention_time` to
7474
<TabItem value="python" label="Python" default>
7575

7676
```python
77-
{{ inline /metadata-ingestion/examples/library/dataset_add_properties.py show_path_as_comment }}
77+
{{ inline /metadata-ingestion/examples/library/dataset_add_custom_properties_patch.py show_path_as_comment }}
7878
```
7979

8080
</TabItem>
@@ -128,7 +128,7 @@ The following code shows you how can add and remove custom properties in the sam
128128
<TabItem value="python" label="Python" default>
129129

130130
```python
131-
{{ inline /metadata-ingestion/examples/library/dataset_add_remove_properties.py show_path_as_comment }}
131+
{{ inline /metadata-ingestion/examples/library/dataset_add_remove_custom_properties_patch.py show_path_as_comment }}
132132
```
133133

134134
</TabItem>
@@ -0,0 +1,19 @@
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.ingestion.graph.client import DataHubGraph, DataHubGraphConfig
from datahub.specific.dataset import DatasetPatchBuilder

# Create DataHub Client
datahub_client = DataHubGraph(DataHubGraphConfig(server="http://localhost:8080"))

# Create Dataset URN
dataset_urn = make_dataset_urn(platform="hive", name="fct_users_created", env="PROD")

# Create Dataset Patch to Add Custom Properties
patch_builder = DatasetPatchBuilder(dataset_urn)
patch_builder.add_custom_property("cluster_name", "datahubproject.acryl.io")
patch_builder.add_custom_property("retention_time", "2 years")
patch_mcps = patch_builder.build()

# Emit Dataset Patch
for patch_mcp in patch_mcps:
    datahub_client.emit(patch_mcp)
@@ -0,0 +1,22 @@
from datahub.emitter.mce_builder import make_dataset_urn, make_term_urn
from datahub.ingestion.graph.client import DataHubGraph, DataHubGraphConfig
from datahub.metadata.schema_classes import GlossaryTermAssociationClass
from datahub.specific.dataset import DatasetPatchBuilder

# Create DataHub Client
datahub_client = DataHubGraph(DataHubGraphConfig(server="http://localhost:8080"))

# Create Dataset URN
dataset_urn = make_dataset_urn(
    platform="snowflake", name="fct_users_created", env="PROD"
)

# Create Dataset Patch to Add + Remove Glossary Terms on the dataset
patch_builder = DatasetPatchBuilder(dataset_urn)
patch_builder.add_term(GlossaryTermAssociationClass(make_term_urn("term-to-add-id")))
patch_builder.remove_term(make_term_urn("term-to-remove-id"))
patch_mcps = patch_builder.build()

# Emit Dataset Patch
for patch_mcp in patch_mcps:
    datahub_client.emit(patch_mcp)
@@ -0,0 +1,24 @@
from datahub.emitter.mce_builder import make_dataset_urn, make_group_urn, make_user_urn
from datahub.ingestion.graph.client import DataHubGraph, DataHubGraphConfig
from datahub.metadata.schema_classes import OwnerClass, OwnershipTypeClass
from datahub.specific.dataset import DatasetPatchBuilder

# Create DataHub Client
datahub_client = DataHubGraph(DataHubGraphConfig(server="http://localhost:8080"))

# Create Dataset URN
dataset_urn = make_dataset_urn(
    platform="snowflake", name="fct_users_created", env="PROD"
)

# Create Dataset Patch to Add + Remove Owners
patch_builder = DatasetPatchBuilder(dataset_urn)
patch_builder.add_owner(
    OwnerClass(make_user_urn("user-to-add-id"), OwnershipTypeClass.TECHNICAL_OWNER)
)
patch_builder.remove_owner(make_group_urn("group-to-remove-id"))
patch_mcps = patch_builder.build()

# Emit Dataset Patch
for patch_mcp in patch_mcps:
    datahub_client.emit(patch_mcp)

metadata-ingestion/examples/library/dataset_add_properties.py

-44
This file was deleted.
@@ -0,0 +1,19 @@
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.ingestion.graph.client import DataHubGraph, DataHubGraphConfig
from datahub.specific.dataset import DatasetPatchBuilder

# Create DataHub Client
datahub_client = DataHubGraph(DataHubGraphConfig(server="http://localhost:8080"))

# Create Dataset URN
dataset_urn = make_dataset_urn(platform="hive", name="fct_users_created", env="PROD")

# Create Dataset Patch to Add + Remove Custom Properties
patch_builder = DatasetPatchBuilder(dataset_urn)
patch_builder.add_custom_property("cluster_name", "datahubproject.acryl.io")
patch_builder.remove_custom_property("retention_time")
patch_mcps = patch_builder.build()

# Emit Dataset Patch
for patch_mcp in patch_mcps:
    datahub_client.emit(patch_mcp)

metadata-ingestion/examples/library/dataset_add_remove_properties.py

-46
This file was deleted.

metadata-ingestion/examples/library/dataset_add_structured_properties.py

-24
This file was deleted.
