docs(ingest): custom transformer remote executor #12864
Codecov Report: All modified and coverable lines are covered by tests. All tests successful; no failed tests found.
@@ -0,0 +1,18 @@
FROM 795586375822.dkr.ecr.us-west-2.amazonaws.com/datahub-executor:v0.3.8.2-acryl
This should probably be a build arg: https://docs.docker.com/build/building/variables/
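A minimal sketch of what the build-arg suggestion could look like. The argument name `BASE_IMAGE`, the file name `buildarg-example.Dockerfile`, and the `custom-executor` tag are assumptions made for illustration; only the default image reference comes from the diff above. The snippet writes a two-line Dockerfile fragment whose base image can be overridden at build time:

```shell
# Write a Dockerfile fragment whose base image is a build argument.
# BASE_IMAGE is a hypothetical name; the default is the image from the diff.
cat > buildarg-example.Dockerfile <<'EOF'
ARG BASE_IMAGE=795586375822.dkr.ecr.us-west-2.amazonaws.com/datahub-executor:v0.3.8.2-acryl
FROM ${BASE_IMAGE}
EOF

# Overriding the default would then look like this (needs Docker and pull
# access to the registry, so it is left commented out):
# docker build -f buildarg-example.Dockerfile --build-arg BASE_IMAGE=<your-executor-image> -t custom-executor .
```

An `ARG` declared before `FROM` is only visible to the `FROM` line itself, which is enough for swapping the base image.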
This is an example. Nobody will be using this actually. Everyone will have their own structure of transformers.
Doesn't mean we can't have a good default for folks.
If this is meant to be just an example, please explicitly call it out in dataset_transformer.md.
It is in the examples folder. That is explicit.
The whole heading https://datahubproject.io/docs/metadata-ingestion/docs/transformer/dataset_transformer/#writing-a-custom-transformer-from-scratch is an example already.
datahub ingest -c ./custom_transformer/recipe.dhub.yaml
```

If you use this image for the remote executor, you can set `file:///datahub-executor/custom_transformer` as an extra pip dependency to install the transformer in your ingestion.
We could add a screenshot here.
@@ -1632,4 +1632,24 @@ After running `datahub ingest -c <path_to_recipe>`, our MCEs will now have the f
],
```

### Using this in the remote executor (DataHub Cloud only)

Build the image with your transformer
I would add a line or two explaining what's in metadata-ingestion/examples/transforms/example.Dockerfile and metadata-ingestion/examples/transforms, and how to use them.
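As a rough sketch of how the example might be used, assuming the Dockerfile lives at `metadata-ingestion/examples/transforms/example.Dockerfile` and that its build context is the surrounding `transforms` folder (both inferred from the paths in this thread, not confirmed). The image tag and the `run` dry-run helper are hypothetical; with `DRY_RUN=1` (the default here) the command is only printed:

```shell
# Hypothetical tag; replace with your own registry/repository.
IMAGE_TAG="${IMAGE_TAG:-custom-transformer-executor:local}"
# Paths inferred from the review thread; adjust to your checkout.
CONTEXT_DIR="metadata-ingestion/examples/transforms"
DOCKERFILE="${CONTEXT_DIR}/example.Dockerfile"

# Print the command when DRY_RUN=1 (the default), run it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

run docker build -f "$DOCKERFILE" -t "$IMAGE_TAG" "$CONTEXT_DIR"
```

Setting `DRY_RUN=0` runs the build for real, which requires Docker and pull access to the base image.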
Awesome work! It will save us so much time.