-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(ingestion) Adding vertexAI ingestion source (v1 - model group and model) #12632
Conversation
Codecov ReportAttention: Patch coverage is
📢 Thoughts on this report? Let us know! 🚀 New features to boost your workflow:
|
a204827
to
98aa10a
Compare
8e36548
to
51a13d7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Every entity needs container aspects
super().__init__(**data) | ||
|
||
if self.credential: | ||
self._credentials_path = self.credential.create_credential_temp_file( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we actually need to create a credentials file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, not needed while GCPcredential need it, deleted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a bit confused - it looks like we're still writing the credentials to a file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not a blocker - but we should not be writing credentials to disk if we can avoid it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this file is actually used to create credentials object using service_account.Credentials util, which feed into aitplatform.init()
credentials = (
service_account.Credentials.from_service_account_file(
self.config._credentials_path
)
if self.config.credential
else None
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right - but we're getting the config as json or as a true file path in the original GCPCredential
then we take that credential and write it to a new file, storing the file path in self.config._credentials_path
. and then we load self.config._credentials_path
again.
that flow is pretty strange
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not so unusual to see this pattern (cred part only is written to temp file and loaded), but I understand your point of avoiding yet another file write, how about changing it to something like below,
credentials = (
service_account.Credentials.from_service_account_info(
self.config.get_credentials(). --> passing dict
)
super().__init__(**data) | ||
|
||
if self.credential: | ||
self._credentials_path = self.credential.create_credential_temp_file( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a bit confused - it looks like we're still writing the credentials to a file
super().__init__(**data) | ||
|
||
if self.credential: | ||
self._credentials_path = self.credential.create_credential_temp_file( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not a blocker - but we should not be writing credentials to disk if we can avoid it
if job.create_time | ||
else datetime_to_ts_millis(datetime.now()) | ||
) | ||
created_actor = f"urn:li:platformResource:{self.platform}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we using a platformResource here? or is this just a dummy value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is placeholder, since actor info is missing in Vertex Training job itself.
MLflow connector is using "urn:li:corpuser:datahub
. I can change to it for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving to unblock you, but noting that we agreed a number of things will need to be done in follow ups
- Fixing the way the unit tests are set up
- Moving to use
MCPW.construct_many
- Changing the audit stamp urns to not be platformResources
- Changing the auth flow to avoid writing credentials to a file
- (couple others that I might be forgetting - basically all pending "conversation" in the github PR review)
thanks for review! @hsheth2
|
Checklist