Telemetry implementation #2359
Just a quick look: do we need to disable telemetry on CI runs, or move them to a separate tracking group, so they don't skew the data?
Some of the environment manipulation may be breaking DeepSpeed, since HF needs information about some of the DeepSpeed options to be set in the environment.
…/ end, error tracking
Closing for now in favor of #2366.
Description
Telemetry for Axolotl. Implements the system described in #2323.
Features the following:
- `axolotl.telemetry.manager`: `TelemetryManager` class that interacts with the PostHog Python SDK and sends telemetry. Handles opt-out (via `DO_NOT_TRACK` or `AXOLOTL_DO_NOT_TRACK` env vars), organization whitelisting (all `base_model`s except for those that are whitelisted are redacted), and path sanitization (`_path` or `_dir` config (key, value) pairs are redacted).
- `axolotl.telemetry.errors`: Function decorator implementation for tracking errors in various important methods in the codebase (`preprocess`, `train`, `evaluate`, `inference`, etc.). Includes path sanitization to remove local paths, but keeps axolotl paths / other Python library paths for debugging purposes.
- `axolotl.telemetry.runtime_metrics`: Dataclass for metrics tracked during model training, including timing metrics, memory metrics, and elapsed steps / epochs.
- `axolotl.telemetry.callbacks`: Includes callback for trainer to send telemetry related to the aforementioned runtime metrics class.

Motivation and Context
See #2323.
How has this been tested?
Pytest tests for each added module + some manual testing and viewing of telemetry in PostHog console.
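
For reviewers, the opt-out and redaction rules listed in the description can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in this PR: the function names `telemetry_enabled` and `sanitize_config`, the `[REDACTED]` placeholder, the set of values treated as "opted out", the suffix-based key matching, and the `example-org` whitelist entry are all hypothetical.

```python
import os

# All names below are hypothetical and chosen for illustration only.
OPT_OUT_VARS = ("DO_NOT_TRACK", "AXOLOTL_DO_NOT_TRACK")
WHITELISTED_ORGS = {"example-org"}  # placeholder; the real whitelist is not shown here
REDACTED = "[REDACTED]"


def telemetry_enabled(env=None):
    """Return False if either opt-out variable holds a truthy value.

    Assumption: "1", "true", and "yes" (case-insensitive) count as opting out.
    """
    env = os.environ if env is None else env
    return not any(
        env.get(var, "").strip().lower() in ("1", "true", "yes")
        for var in OPT_OUT_VARS
    )


def sanitize_config(config):
    """Return a copy of config with paths and non-whitelisted base models redacted."""
    clean = {}
    for key, value in config.items():
        if key.endswith(("_path", "_dir")):
            # Assumption: keys are matched by suffix; values may hold local paths.
            clean[key] = REDACTED
        elif key == "base_model":
            # Treat the part before the first "/" as the organization.
            org = str(value).split("/", 1)[0].lower()
            clean[key] = value if org in WHITELISTED_ORGS else REDACTED
        else:
            clean[key] = value
    return clean
```

With these assumptions, `sanitize_config({"base_model": "example-org/example-model", "output_dir": "/home/user/out"})` keeps the base model and redacts `output_dir`, while a `base_model` from any other organization (or a bare local path) is redacted.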