Telemetry implementation #2359

Closed
wants to merge 22 commits into from

djsaunde
Contributor

Description

Telemetry for Axolotl. Implementation of system described in #2323.

This PR adds the following modules:

  • axolotl.telemetry.manager: TelemetryManager class that wraps the PostHog Python SDK and sends telemetry. It handles opt-out (via the DO_NOT_TRACK or AXOLOTL_DO_NOT_TRACK env vars), organization whitelisting (every base_model value is redacted unless it is whitelisted), and path sanitization (config (key, value) pairs for _path or _dir keys are redacted).
  • axolotl.telemetry.errors: Function decorator for tracking errors in important methods in the codebase (preprocess, train, evaluate, inference, etc.). Includes path sanitization that removes local paths but keeps axolotl paths and other Python library paths for debugging purposes.
  • axolotl.telemetry.runtime_metrics: Dataclass for metrics tracked during model training, including timing metrics, memory metrics, and elapsed steps / epochs.
  • axolotl.telemetry.callbacks: Trainer callback that sends telemetry for the runtime metrics dataclass described above.
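
As a rough illustration of the opt-out and redaction behavior described in the list above, here is a minimal sketch. It is not the code from this PR: the function names, the ALLOWED_ORGS set, the "[REDACTED]" marker, the set of truthy opt-out values, and the choice to match keys by suffix are all assumptions made for the example.

```python
import os
from typing import Any, Mapping, Optional

# Hypothetical allowlist; the PR description does not name the real entries.
ALLOWED_ORGS = {"example-org"}
# Hypothetical placeholder used in place of redacted values.
REDACTED = "[REDACTED]"


def telemetry_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return False if either opt-out variable is set to a truthy value.

    Which values count as truthy is an assumption of this sketch.
    """
    env = os.environ if env is None else env
    for name in ("DO_NOT_TRACK", "AXOLOTL_DO_NOT_TRACK"):
        if env.get(name, "").strip().lower() in {"1", "true", "yes"}:
            return False
    return True


def sanitize_config(cfg: Mapping[str, Any]) -> dict:
    """Return a copy of cfg with paths and non-allowlisted base models redacted."""
    out = {}
    for key, value in cfg.items():
        if key.endswith(("_path", "_dir")):
            # Path-like config entries may reveal local directory layout.
            out[key] = REDACTED
        elif key == "base_model":
            # Keep "org/name" identifiers only when the org is allowlisted.
            text = str(value)
            org = text.split("/", 1)[0]
            keep = "/" in text and org in ALLOWED_ORGS
            out[key] = value if keep else REDACTED
        else:
            out[key] = value
    return out
```

With these assumptions, sanitize_config({"base_model": "example-org/model", "output_dir": "/home/user/out"}) keeps the base_model value and redacts output_dir, and nothing would be sent at all when telemetry_enabled() returns False.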

Motivation and Context

See #2323.

How has this been tested?

Pytest tests for each added module, plus some manual testing and inspection of the resulting telemetry in the PostHog console.

Collaborator

@NanoCode012 NanoCode012 left a comment

Just a quick look: do we need to disable telemetry on CI runs, or move them to a separate tracking group, so they don't skew the data?
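
One possible way to separate CI runs from user runs, sketched here only as an illustration: check the CI and GITHUB_ACTIONS environment variables, both of which GitHub Actions sets to "true". The run_context function and its "ci" / "user" labels are hypothetical and not part of this PR.

```python
import os
from typing import Mapping, Optional


def run_context(env: Optional[Mapping[str, str]] = None) -> str:
    """Label a run as "ci" or "user" based on common CI environment variables."""
    env = os.environ if env is None else env
    # GitHub Actions sets both of these to "true"; other CI systems commonly set CI.
    ci_vars = ("CI", "GITHUB_ACTIONS")
    if any(env.get(name, "").strip().lower() in {"1", "true"} for name in ci_vars):
        return "ci"
    return "user"
```

The returned label could then be attached as an event property so CI traffic can be filtered out, or used to skip sending telemetry entirely.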

@djsaunde
Contributor Author

Example events in PostHog:

image

E.g., expanding the properties under system-info:

image

@winglian
Collaborator

It's possible that some of the environment manipulation is breaking DeepSpeed, since HF needs information about some of the DeepSpeed options to be set in the environment.

@djsaunde
Contributor Author

Closing for now in favor of #2366.

@djsaunde djsaunde closed this Feb 26, 2025