add update deduplication when there are multiple entries that updates the #30

Open
wants to merge 3 commits into
base: main

Conversation

kingman
Collaborator

@kingman kingman commented Jul 16, 2024

Deduplicates updates when multiple entries update the same ad performance entry.

@kingman kingman requested a review from chmstimoteo July 16, 2024 10:05
@martenlindblad
Contributor

We're also seeing issues related to this.

Collaborator

@chmstimoteo chmstimoteo left a comment


LGTM!

@martenlindblad
Contributor

@chmstimoteo
We had the same issue with definitions/ads_domain/google_ads/update/update_dim_ads.sqlx.

I added a dedup filter to the source SQL files. If there's interest in that solution, I can open a PR for it and skip the workaround in this PR.
Views updated: ad.sqlx, adgroup.sqlx, campaign.sqlx, customer.sqlx
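For readers following along, here is a minimal sketch of what a source-level dedup filter can look like. It is not the code from the proposed PR. The table `ad_source` and the columns `ad_name` and `updated_at` are hypothetical, and SQLite (via Python's `sqlite3`, which needs SQLite 3.25+ for window functions) stands in for BigQuery so the query can be run locally.

```python
import sqlite3


def latest_row_per_ad(rows):
    """Return one row per ad_id, keeping the row with the newest updated_at."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical source table; the real views have different columns.
    conn.execute(
        "CREATE TABLE ad_source (ad_id INTEGER, ad_name TEXT, updated_at TEXT)"
    )
    conn.executemany("INSERT INTO ad_source VALUES (?, ?, ?)", rows)
    query = """
        SELECT ad_id, ad_name, updated_at
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY ad_id
                       ORDER BY updated_at DESC
                   ) AS row_num
            FROM ad_source
        )
        WHERE row_num = 1   -- the dedup filter: newest row per key only
        ORDER BY ad_id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


rows = [
    (1, "Spring sale", "2024-07-14"),
    (1, "Spring sale v2", "2024-07-15"),  # later entry for the same ad
    (2, "Brand", "2024-07-15"),
]
deduped = latest_row_per_ad(rows)
```

Filtering at the source means every downstream update statement sees at most one row per key, so the update scripts themselves would not need a per-table workaround.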

@chmstimoteo
Collaborator

@martenlindblad pls create a PR with the proposed changes.
Very helpful, love getting these collaborations going.

@martenlindblad
Contributor

martenlindblad commented Nov 8, 2024

> @martenlindblad pls create a PR with the proposed changes. Very helpful, love getting these collaborations going.

Just added it:
#33

Collaborator

@chmstimoteo chmstimoteo left a comment


Please apply a similar fix to update_fact_ad_conversion_daily.sqlx if it has not been applied yet.

ad_id,
date_id,
device,
MAX(account_status) AS account_status,
Collaborator


@kingman the columns deduplicated with MAX() are slowly changing dimensions, right?

Clicks, costs, and impressions can be set to MAX... to get the latest performance values.
Statuses are different: for those we should take the current status rather than the maximum (check this: https://cloud.google.com/bigquery/docs/google-ads-transfer#query_your_data)
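To make the concern concrete, here is a small sketch of how MAX() on a status column can diverge from the current status. MAX() on a string compares lexicographically, so it returns "PAUSED" over "ENABLED" regardless of which one arrived last. The table `perf` and the ordering column `loaded_at` are hypothetical, and SQLite (3.25+ for window functions) again stands in for BigQuery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical staging table with two entries for the same key.
conn.execute(
    """CREATE TABLE perf (
           ad_id INTEGER, date_id INTEGER, device TEXT,
           clicks INTEGER, account_status TEXT, loaded_at TEXT)"""
)
conn.executemany(
    "INSERT INTO perf VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 20240715, "MOBILE", 10, "PAUSED", "2024-07-15T01:00"),
        (1, 20240715, "MOBILE", 12, "ENABLED", "2024-07-16T01:00"),  # newest
    ],
)

query = """
    WITH ranked AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY ad_id, date_id, device
                   ORDER BY loaded_at DESC
               ) AS row_num
        FROM perf
    )
    SELECT ad_id, date_id, device,
           MAX(clicks) AS clicks,                  -- metric: MAX is fine here
           MAX(account_status) AS max_status,      -- lexicographic maximum
           MAX(CASE WHEN row_num = 1
                    THEN account_status END) AS current_status  -- newest row
    FROM ranked
    GROUP BY ad_id, date_id, device
"""
result = conn.execute(query).fetchall()
conn.close()
```

In this sample the account was re-enabled in the newest entry, yet `max_status` comes back as "PAUSED" while `current_status` is "ENABLED". Taking the status from the most recent row avoids that mismatch, assuming a reliable ordering column is available.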


3 participants