
Commit 2989175

docs(ingest): custom transformer remote executor (#12864)
1 parent 4305a62 commit 2989175


5 files changed: +63 -3 lines changed


metadata-ingestion/docs/transformer/dataset_transformer.md

+20
@@ -1632,4 +1632,24 @@ After running `datahub ingest -c <path_to_recipe>`, our MCEs will now have the f
 ],
 ```
 
+### Using this in the remote executor (DataHub Cloud only)
+
+Build the image with your transformer
+```
+docker build -t acryldata:customtransform1 -f metadata-ingestion/examples/transforms/example.Dockerfile metadata-ingestion/examples/transforms
+```
+
+Test it works
+```
+docker run -it --rm acryldata:customtransform1 bash
+```
+
+Inside the docker container
+```
+source venv/bin/activate
+datahub ingest -c ./custom_transformer/recipe.dhub.yaml
+```
+
+If you use this image for remote executor then you can set `file:///datahub-executor/custom_transformer` as an extra pip dependency to install the transformer in your ingestion.
+
 All the files for this tutorial may be found [here](../../examples/transforms/).
metadata-ingestion/examples/transforms/example.Dockerfile
@@ -0,0 +1,18 @@
+FROM 795586375822.dkr.ecr.us-west-2.amazonaws.com/datahub-executor:v0.3.8.2-acryl
+
+COPY setup.py custom_transformer/setup.py
+COPY custom_transform_example.py custom_transformer/custom_transform_example.py
+COPY owners.json custom_transformer/owners.json
+COPY recipe.dhub.yaml custom_transformer/recipe.dhub.yaml
+
+USER root
+
+RUN chown -R root:root custom_transformer/* && \
+    chmod -R u+rw custom_transformer && \
+    python3 -m venv venv && \
+    . venv/bin/activate && \
+    pip install uv && \
+    uv pip install 'acryl-datahub' && \
+    uv pip install -e 'file:///datahub-executor/custom_transformer'
+
+USER datahub
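
The `COPY` destinations in this Dockerfile are relative, while the editable install and the recipe's `owners_json` setting use absolute paths. A minimal sketch of how the two line up, assuming the base image's working directory is `/datahub-executor` (inferred from the `file:///datahub-executor/custom_transformer` URI, not stated in the Dockerfile). `WORKDIR`, `resolve` and `install_uri` are illustrative names, not part of the commit.

```python
from pathlib import PurePosixPath

# Assumption: the executor base image sets its working directory to
# /datahub-executor; the Dockerfile never states this explicitly.
WORKDIR = PurePosixPath("/datahub-executor")


def resolve(copy_destination: str) -> PurePosixPath:
    """Absolute in-image path for a relative COPY destination."""
    return WORKDIR / copy_destination


def install_uri(copy_destination: str) -> str:
    """file:// URI of the directory that holds a copied setup.py."""
    return f"file://{resolve(copy_destination).parent}"


if __name__ == "__main__":
    print(resolve("custom_transformer/owners.json"))
    print(install_uri("custom_transformer/setup.py"))
```

If the base image used a different working directory, both the `uv pip install -e` target and the recipe's `owners_json` path would need to change with it.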
metadata-ingestion/examples/transforms/owners.json
@@ -0,0 +1,6 @@
+[
+  "urn:li:corpuser:athos",
+  "urn:li:corpuser:porthos",
+  "urn:li:corpuser:aramis",
+  "urn:li:corpGroup:the_three_musketeers"
+]
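
The owners file mixes user and group URNs in one flat list. Before baking it into an image, it may be worth a quick sanity check that every entry has one of those two prefixes. The sketch below uses only the standard library and does not reflect how the example transformer itself reads the file; `split_owner_urns` is a hypothetical helper.

```python
import json

# Same content as the owners.json added in this commit.
OWNERS_JSON = """
[
  "urn:li:corpuser:athos",
  "urn:li:corpuser:porthos",
  "urn:li:corpuser:aramis",
  "urn:li:corpGroup:the_three_musketeers"
]
"""

USER_PREFIX = "urn:li:corpuser:"
GROUP_PREFIX = "urn:li:corpGroup:"


def split_owner_urns(urns):
    """Separate user URNs from group URNs, rejecting anything else."""
    users, groups = [], []
    for urn in urns:
        if urn.startswith(USER_PREFIX):
            users.append(urn)
        elif urn.startswith(GROUP_PREFIX):
            groups.append(urn)
        else:
            raise ValueError(f"unexpected owner URN: {urn!r}")
    return users, groups


if __name__ == "__main__":
    users, groups = split_owner_urns(json.loads(OWNERS_JSON))
    print(f"{len(users)} users, {len(groups)} groups")
```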
metadata-ingestion/examples/transforms/recipe.dhub.yaml
@@ -0,0 +1,11 @@
+source:
+  type: demo-data
+  config: {}
+
+transformers:
+  - type: "custom_transform_example_alias"
+    config:
+      owners_json: /datahub-executor/custom_transformer/owners.json
+
+sink:
+  type: "console"
metadata-ingestion/examples/transforms/setup.py
@@ -1,7 +1,12 @@
-from setuptools import find_packages, setup
+from setuptools import setup
 
 setup(
-    name="custom_transform_example",
+    name="custom_transformer",
     version="1.0",
-    packages=find_packages(),
+    py_modules=["custom_transform_example"],
+    entry_points={
+        "datahub.ingestion.transformer.plugins": [
+            "custom_transform_example_alias = custom_transform_example:AddCustomOwnership",
+        ],
+    },
 )
