New BYOD ydd files #20

Open
wants to merge 1 commit into base: master


zbayoff (Owner) commented Feb 16, 2023

BYOD Dataset Review

BYOD Dataset Information

Dataset High Level Information

Dataset owner information
Owner Name owner name
Owner Team owner team
Owner Email team email
Owner Slack Channel #slack channel
What data is being provided?

asdfas

Where does this data come from?

asdf

  • PR description for this section completed

Why do you need this data and how will you use it?

How do you plan on using this data?

asdf

How often will this data be queried?

fdsa

Is there any existing IA data that you would like to join this data on?

asdf

  • PR description for this section completed

Dataset detailed information

What regions/cloud providers/datacenter will this data reside in?

fdsa

Will this dataset potentially need to be backfilled?

asdf

How often will the structure of this data change?

fasdf

What is the approximate size of the data?

asdfas

  • PR description for this section completed

Data sensitivity and access control

What teams (other than the owner team) will need access to this data?

fdsa

Does the data constitute substantive “Customer Data” sent to Datadog for processing?

asdf

Do you have a data deletion process? [Answer if your dataset DOES include Customer Data (answered yes above)]

fdsa

Does the data contain usage data? (https://datadoghq.atlassian.net/wiki/spaces/IA/pages/2717450835/Usage+Data)

asdf

Are any parts of the data region-restricted?

fdsa

Does this data contain any PII? (If yes, please add details)

fasdf

  • PR description for this section completed

Review Checklist

For Dataset Owner / PR Submitter

Go through the checklist and make sure you can check each box. We cannot merge the PR until each of these is done.

  • Reviewed Best practices for new datasets for naming and description guidelines.
  • All sections of BYOD Dataset Information filled out
  • YDD YAML File completely filled out for Dataset
    • base_table_name matches the file name and dataset name from description
    • Table name is specific enough to this dataset. This name is clear, unambiguous, and suggests what data can be found in this dataset and what its scope is.
    • data.display_name appropriately matches
    • description is filled out and is a high-level description of the dataset. Make sure that the description:
      • can be understood by any user not familiar with the dataset
      • explains the main analysis use cases for the dataset
      • explains where the data comes from and which team is the owner
      • explicitly defines any acronyms
      • states the granularity of the dataset (e.g. “one line per ID per data date”)
    • region_restrictions field matches the region-restriction answer in the PR description
    • data.cloud_locations is filled in for each DC you want data ingested from
    • My dataset actually exists at these locations so it can be tested
    • file_format is accurate to the data being provided
    • owner, owner_slack_channel match PR description
    • optional_parameters section is filled in if needed
    • optional_parameters.load_type is set if the load type is not INSERT
    • optional_parameters.date_partition_interval_hours is set if the data is not produced daily (or set to 24 to be explicit)
    • table_columns is filled out and matches the schema of the dataset in cloud storage
    • each column has a description. This description does not just repeat the column name, and it explicitly defines any acronyms or obscure terminology.
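
Several of the YDD items above can be checked mechanically before requesting review. The following is a minimal sketch, not part of the BYOD tooling: it assumes the YDD YAML has already been parsed into a Python mapping, that `display_name` and `cloud_locations` are nested under a `data` key, and that each `table_columns` entry carries `name` and `description` keys. The function name `check_ydd` and that layout are hypothetical; only the field names come from the checklist.

```python
from typing import Any, Dict, List


def _normalize(text: str) -> str:
    """Lower-case and treat underscores as spaces so 'user_id' equals 'User ID'."""
    return text.replace("_", " ").strip().lower()


def check_ydd(ydd: Dict[str, Any], file_stem: str) -> List[str]:
    """Return checklist problems found in a parsed YDD mapping (assumed layout)."""
    problems: List[str] = []

    # base_table_name must match the file name (without extension).
    if ydd.get("base_table_name") != file_stem:
        problems.append("base_table_name does not match the file name")

    # Top-level fields the checklist requires to be filled in.
    for field in ("description", "file_format", "owner", "owner_slack_channel"):
        if not ydd.get(field):
            problems.append(f"{field} is missing or empty")

    # Fields assumed to be nested under 'data'.
    data = ydd.get("data") or {}
    for field in ("display_name", "cloud_locations"):
        if not data.get(field):
            problems.append(f"data.{field} is missing or empty")

    # Every column needs a description that does more than repeat its name.
    columns = ydd.get("table_columns") or []
    if not columns:
        problems.append("table_columns is empty")
    for col in columns:
        name = col.get("name", "")
        desc = (col.get("description") or "").strip()
        if not desc:
            problems.append(f"column {name!r} has no description")
        elif _normalize(desc) == _normalize(name):
            problems.append(f"column {name!r} description only repeats the column name")

    return problems
```

An empty list means none of these mechanical checks failed. Items that need judgment, such as whether the table name is specific enough or whether acronyms are defined, still need a human reviewer.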

For Internal Analytics

Go through the checklist and make sure you can check each box. We cannot merge the PR until each of these is done.

Dataset details review (IAX / IAD)

  • Confirmed that all sections of BYOD Dataset Information filled out
  • Confirmed that YDD is completely and accurately filled out
    • Table and columns have quality descriptions
    • privacy_metadata is filled out and matches description
    • dataset_documentation link works and has accurate description of dataset
  • Raised a review request with Privacy team (github team privacy-ops) if any of contains_pii, contains_customer_data, or contains_employee_data are True. (If there are other sensitive aspects to the data beyond these questions, use your discretion on when to contact Privacy.)
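
The privacy rule in the last item can be expressed as a small check. This is only a sketch: it assumes the three flags are stored as booleans inside the `privacy_metadata` mapping, which the checklist does not state, and the helper name `privacy_flags_set` is hypothetical.

```python
from typing import Any, Dict, List

# Flag names taken from the checklist; their location in the YDD is an assumption.
PRIVACY_FLAGS = ("contains_pii", "contains_customer_data", "contains_employee_data")


def privacy_flags_set(privacy_metadata: Dict[str, Any]) -> List[str]:
    """Return the privacy flags that are explicitly True in the given mapping."""
    return [flag for flag in PRIVACY_FLAGS if privacy_metadata.get(flag) is True]
```

A non-empty result means a review request to the Privacy team is expected. An empty result does not rule one out, since other sensitive aspects are left to reviewer discretion.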

BYOD technical review (IAI)

  • Ran a test for this dataset (mortar byod job with --test_run true)
  • If adding this dataset required changes to Snowflake (users/secrets/storage_integrations) or cloudops/cloud-inventory (data access for storage_integrations), those PRs are linked here.
