
docs(ingest): custom transformer remote executor #12864

Merged: anshbansal merged 1 commit into master from ab-2025-mar-13-custom-transformer on Mar 13, 2025


anshbansal (Collaborator)

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

github-actions bot added the ingestion label (PR or Issue related to the ingestion of metadata) Mar 13, 2025

codecov bot commented Mar 13, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

✅ All tests successful. No failed tests found.


datahub-cyborg bot added the needs-review label (for PRs that need review from a maintainer) Mar 13, 2025
@@ -0,0 +1,18 @@
FROM 795586375822.dkr.ecr.us-west-2.amazonaws.com/datahub-executor:v0.3.8.2-acryl
Collaborator:

This should probably be a build arg: https://docs.docker.com/build/building/variables/

anshbansal (Collaborator, Author), Mar 13, 2025:

This is an example. Nobody will actually use it as-is; everyone will have their own structure of transformers.

Collaborator:

Doesn't mean we can't have a good default for folks.

Collaborator:

If this is meant to be just an example, please call that out explicitly in dataset_transformer.md.

anshbansal (Collaborator, Author):

It is in the examples folder. That makes it explicit.
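For readers adapting the example, the build-arg suggestion in this thread could look roughly like the sketch below. Only the `FROM` line of the 18-line example Dockerfile is visible here, so this is not the merged file: the `BASE_IMAGE` argument name and the `custom_transformer` source directory in the build context are assumptions. The destination path is the one the docs reference as an extra pip dependency.

```dockerfile
# Hypothetical build arg: declared before FROM so it can select the base image.
# The default is the executor image pinned in the example.
ARG BASE_IMAGE=795586375822.dkr.ecr.us-west-2.amazonaws.com/datahub-executor:v0.3.8.2-acryl
FROM ${BASE_IMAGE}

# Assumption: the transformer package lives in ./custom_transformer in the build
# context. It is copied to the path later referenced as
# file:///datahub-executor/custom_transformer.
COPY custom_transformer /datahub-executor/custom_transformer
```

Building with `docker build --build-arg BASE_IMAGE=<your executor image> -f example.Dockerfile .` would swap the base image without editing the file. Omitting `--build-arg` falls back to the pinned default.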


datahub-cyborg bot added the pending-submitter-response label (issue/request has been reviewed but requires a response from the submitter) and removed the needs-review label Mar 13, 2025
pedro93 approved these changes Mar 13, 2025
datahub-cyborg bot added the pending-submitter-merge label and removed the pending-submitter-response label Mar 13, 2025
```shell
datahub ingest -c ./custom_transformer/recipe.dhub.yaml
```

If you use this image for the remote executor, you can set `file:///datahub-executor/custom_transformer` as an extra pip dependency to install the transformer in your ingestion.
Collaborator:

We could add a screenshot here.

@@ -1632,4 +1632,24 @@ After running `datahub ingest -c <path_to_recipe>`, our MCEs will now have the f
],
```

### Using this in the remote executor (DataHub Cloud only)

Build the image with your transformer
Collaborator:

I would add a line or two explaining what's in `metadata-ingestion/examples/transforms/example.Dockerfile` and `metadata-ingestion/examples/transforms`, and how to use them.

skrydal self-requested a review March 13, 2025 14:08
skrydal (Collaborator) left a comment:

Awesome work! It will save us so much time.

anshbansal merged commit 2989175 into master Mar 13, 2025
156 of 163 checks passed
anshbansal deleted the ab-2025-mar-13-custom-transformer branch March 13, 2025 15:08
Labels: ingestion, pending-submitter-merge

3 participants