Commit eefb576

Address review comments
1 parent 5dbebbc commit eefb576

19 files changed

+588
-1028
lines changed

docs/lineage/prefect.md

+90-4
@@ -8,13 +8,13 @@ DataHub supports integration of
88

99
## What is Prefect Datahub Block?
1010

11-
Blocks are primitive within Prefect that enable the storage of configuration and provide an interface for interacting with external systems. We integrated [prefect-datahub](https://prefecthq.github.io/prefect-datahub/) block which use [Datahub Rest](../../metadata-ingestion/sink_docs/datahub.md#datahub-rest) emitter to emit metadata events while running prefect flow.
11+
Blocks are a primitive within Prefect that enable the storage of configuration and provide an interface for interacting with external systems. We integrated the `prefect-datahub` block, which uses the [Datahub Rest](../../metadata-ingestion/sink_docs/datahub.md#datahub-rest) emitter to emit metadata events while running a Prefect flow.
1212

1313
## Prerequisites to use Prefect Datahub Block
1414

1515
1. You need to use either Prefect Cloud (recommended) or the self hosted Prefect server.
16-
2. Refer [Cloud Quickstart](https://docs.prefect.io/2.10.13/cloud/cloud-quickstart/) to setup Prefect Cloud.
17-
3. Refer [Host Prefect server](https://docs.prefect.io/2.10.13/host/) to setup self hosted Prefect server.
16+
2. Refer to the [Cloud Quickstart](https://docs.prefect.io/latest/getting-started/quickstart/) to set up Prefect Cloud.
17+
3. Refer to [Host Prefect server](https://docs.prefect.io/latest/guides/host/) to set up a self-hosted Prefect server.
1818
4. Make sure the Prefect API URL is set correctly. You can check it by running the command below:
1919
```shell
2020
prefect profile inspect
@@ -24,7 +24,93 @@ prefect profile inspect
2424

2525
## Setup
2626

27-
For setup details please refer [prefect-datahub](https://prefecthq.github.io/prefect-datahub/).
27+
### Installation
28+
29+
Install `prefect-datahub` with `pip`:
30+
31+
```shell
32+
pip install 'prefect-datahub'
33+
```
34+
35+
Requires an installation of Python 3.7+.
36+
37+
### Saving configurations to a block
38+
39+
This is a one-time activity in which you save the configuration to the [Prefect block document store](https://docs.prefect.io/latest/concepts/blocks/#saving-blocks).
40+
While saving, you can provide the configurations below. Any configuration that is not provided is set to its default value.
41+
42+
Config | Type | Default | Description
43+
--- | --- | --- | ---
44+
datahub_rest_url | `str` | *http://localhost:8080* | DataHub GMS REST URL
45+
env | `str` | *PROD* | The environment that all assets produced by this orchestrator belong to. For more detail and possible values, see [here](https://datahubproject.io/docs/graphql/enums/#fabrictype).
46+
platform_instance | `str` | *None* | The instance of the platform that all assets produced by this recipe belong to. For more detail, see [here](https://datahubproject.io/docs/platform-instances/).
47+
48+
```python
49+
from prefect_datahub.datahub_emitter import DatahubEmitter
50+
DatahubEmitter(
51+
    datahub_rest_url="http://localhost:8080",
52+
    env="PROD",
53+
    platform_instance="local_prefect"
54+
).save("BLOCK-NAME-PLACEHOLDER")
55+
```
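
The defaults listed in the configuration table above can be sketched with a plain dataclass. This is only an illustration of how omitted values fall back to their defaults: `EmitterConfigSketch` and `resolve_config` are hypothetical names, not part of `prefect-datahub`, and the real `DatahubEmitter` block may resolve its fields differently.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class EmitterConfigSketch:
    """Hypothetical stand-in that mirrors the documented defaults.

    This is NOT the prefect-datahub DatahubEmitter class.
    """

    datahub_rest_url: str = "http://localhost:8080"
    env: str = "PROD"
    platform_instance: Optional[str] = None


def resolve_config(**overrides) -> dict:
    """Return the effective configuration: overrides first, defaults otherwise."""
    return asdict(EmitterConfigSketch(**overrides))


if __name__ == "__main__":
    # Only platform_instance is provided; the other two fields use defaults.
    print(resolve_config(platform_instance="local_prefect"))
```

Calling `resolve_config()` with no arguments returns the three defaults from the table; passing a keyword argument overrides just that field.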
56+
57+
Congrats! You can now load the saved block to use your configurations in your Flow code:
58+
59+
```python
60+
from prefect_datahub.datahub_emitter import DatahubEmitter
61+
DatahubEmitter.load("BLOCK-NAME-PLACEHOLDER")
62+
```
63+
64+
!!! info "Registering blocks"
65+
66+
    Register blocks in this module to
67+
    [view and edit them](https://docs.prefect.io/ui/blocks/)
68+
    on Prefect Cloud:
69+
70+
    ```bash
71+
    prefect block register -m prefect_datahub
72+
    ```
73+
74+
### Load the saved block in prefect workflows
75+
76+
After installing `prefect-datahub` and [saving the configuration](#saving-configurations-to-a-block), you can use it within your Prefect workflows to emit metadata events, as shown below:
77+
78+
```python
79+
from prefect import flow, task
80+
from prefect_datahub.dataset import Dataset
81+
from prefect_datahub.datahub_emitter import DatahubEmitter
82+
83+
datahub_emitter = DatahubEmitter.load("MY_BLOCK_NAME")
84+
85+
@task(name="Transform", description="Transform the data")
86+
def transform(data):
87+
    data = data.split(" ")
88+
    datahub_emitter.add_task(
89+
        inputs=[Dataset("snowflake", "mydb.schema.tableA")],
90+
        outputs=[Dataset("snowflake", "mydb.schema.tableC")],
91+
    )
92+
    return data
93+
94+
@flow(name="ETL flow", description="Extract transform load flow")
95+
def etl():
96+
    data = transform("This is data")
97+
    datahub_emitter.emit_flow()
98+
```
99+
100+
**Note**: To emit the tasks, you must also emit the flow. Otherwise, nothing is emitted.
101+
102+
## Concept mapping
103+
104+
Prefect concepts are documented [here](https://docs.prefect.io/latest/concepts/), and DataHub concepts are documented [here](https://datahubproject.io/docs/what-is-datahub/datahub-concepts).
105+
106+
Prefect Concept | DataHub Concept
107+
--- | ---
108+
[Flow](https://docs.prefect.io/latest/concepts/flows/) | [DataFlow](https://datahubproject.io/docs/generated/metamodel/entities/dataflow/)
109+
[Flow Run](https://docs.prefect.io/latest/concepts/flows/#flow-runs) | [DataProcessInstance](https://datahubproject.io/docs/generated/metamodel/entities/dataprocessinstance)
110+
[Task](https://docs.prefect.io/latest/concepts/tasks/) | [DataJob](https://datahubproject.io/docs/generated/metamodel/entities/datajob/)
111+
[Task Run](https://docs.prefect.io/latest/concepts/tasks/#tasks) | [DataProcessInstance](https://datahubproject.io/docs/generated/metamodel/entities/dataprocessinstance)
112+
[Task Tag](https://docs.prefect.io/latest/concepts/tasks/#tags) | [Tag](https://datahubproject.io/docs/generated/metamodel/entities/tag/)
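
As a rough aid for reading the table above, the same mapping can be written as a lookup. This sketch only restates the table; `PREFECT_TO_DATAHUB` and `datahub_concept` are hypothetical names and do not exist in `prefect-datahub`.

```python
from typing import Optional

# Prefect concept -> DataHub concept, copied from the table above.
PREFECT_TO_DATAHUB = {
    "Flow": "DataFlow",
    "Flow Run": "DataProcessInstance",
    "Task": "DataJob",
    "Task Run": "DataProcessInstance",
    "Task Tag": "Tag",
}


def datahub_concept(prefect_concept: str) -> Optional[str]:
    """Return the mapped DataHub concept, or None if the table has no entry."""
    return PREFECT_TO_DATAHUB.get(prefect_concept)


if __name__ == "__main__":
    for name in ("Flow", "Task Run", "Deployment"):
        print(name, "->", datahub_concept(name))
```

Note that both flow runs and task runs map to the same DataHub entity type, so the mapping is not invertible.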
113+
28114

29115
## How to validate saved block and emit of metadata
30116

metadata-ingestion-modules/prefect-plugin/docs/concept_mapping.md

-12
This file was deleted.

metadata-ingestion-modules/prefect-plugin/docs/datahub_emitter.md

-2
This file was deleted.

metadata-ingestion-modules/prefect-plugin/docs/gen_blocks_catalog.py

-102
This file was deleted.

metadata-ingestion-modules/prefect-plugin/docs/gen_examples_catalog.py

-120
This file was deleted.

metadata-ingestion-modules/prefect-plugin/docs/gen_home_page.py

-21
This file was deleted.
Binary file not shown.
Binary file not shown.

metadata-ingestion-modules/prefect-plugin/docs/overrides/partials/integrations/analytics/custom.html

-16
This file was deleted.
