✨ Source-LinkedIn-Ads: Performance improvements for Campaign Analytics Streams #55747
What
When working with Ad Campaign Analytics streams, the connector sends an excessive number of requests to the LinkedIn API. Some of them are unnecessary because we know in advance that the API will not return any data. The reason is that the connector creates the same slices for all campaigns fetched incrementally, even though some of the campaigns are already COMPLETED, PAUSED, etc.
I have already created an issue for this: Slow Performance of Analytics Streams. You can find more details in the issue.
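To make the problem concrete, here is a minimal sketch of how the request count grows when every campaign gets the same slices, compared with skipping slices that cannot return data. It is not connector code: the `Campaign` class, the helpers `date_slices` and `useful_slices`, the 30-day window, and the rule "a non-active campaign yields no data after its last modification" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Campaign:
    # Hypothetical, simplified view of a campaign record.
    id: int
    status: str          # e.g. "ACTIVE", "PAUSED", "COMPLETED"
    last_modified: date  # when the campaign was last changed


def date_slices(start: date, end: date, step_days: int = 30):
    """Split [start, end] into consecutive windows of at most step_days days."""
    slices = []
    cursor = start
    while cursor <= end:
        slice_end = min(cursor + timedelta(days=step_days - 1), end)
        slices.append((cursor, slice_end))
        cursor = slice_end + timedelta(days=1)
    return slices


def useful_slices(campaign: Campaign, slices):
    """Assumed rule: a non-active campaign yields no data after its last change."""
    if campaign.status == "ACTIVE":
        return slices
    return [s for s in slices if s[0] <= campaign.last_modified]


campaigns = [
    Campaign(1, "ACTIVE", date(2024, 5, 1)),
    Campaign(2, "COMPLETED", date(2024, 1, 20)),
    Campaign(3, "PAUSED", date(2024, 2, 10)),
]
all_slices = date_slices(date(2024, 1, 1), date(2024, 6, 30))

# One request per (campaign, slice) pair.
naive_requests = len(campaigns) * len(all_slices)
filtered_requests = sum(len(useful_slices(c, all_slices)) for c in campaigns)
print(naive_requests, filtered_requests)
```

With only three campaigns the difference is small, but it scales with the number of campaigns that are no longer active. The actual filtering rule in this PR may differ from the simplified one assumed here.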
How
As this connector uses PerPartitionCursor, I extended this class and passed some extra information about campaigns to the DatetimeBasedCursor. With this information, the extended DatetimeBasedCursor class filters the slices it creates.
To do so, I extended the partition routers of the Ad Campaign Analytics streams in metadata.yaml so that we get the extra fields status, runSchedule and lastModified from the parent. I created an AnalyticsPerPartitionCursor cursor that passes the extra information to the CampaignAnalyticsDatetimeBasedCursor. Then, this cursor uses the information while generating slices.
I have also changed the state structure for the Ad Campaign Analytics streams. With the new structure, states will also keep the latest values of status, lastModified and runSchedule for campaigns. This information will be used in the next sync to decide on slices.
Review guide
Please check that the logic for filtering slices for campaigns is correctly defined in CampaignAnalyticsDatetimeBasedCursor.stream_slices.
User Impact
It will shorten the sync duration for Ad Campaign Analytics streams. For instance, we have over 1000 campaigns in our LinkedIn Ads account after 2024-01-01. Previously, the incremental sync time for any Ad Campaign Analytics stream was over 4 hours. Currently, it's around 40-45 minutes.
Can this PR be safely reverted and rolled back?