Replies: 8 comments 14 replies
-
"opt-out"??? No thanks. Not cool.
-
"opt-in" + nice request to participate = +1
-
My thinking on telemetry is that it needs to be opt-out or you might as well not bother. In my experience, the people who opt in to something like telemetry are the same people who will take the time to write a detailed bug report/issue. Perhaps a compromise for the "no opt-out telemetry at all" group would be to have axolotl prominently display a warning/message and then wait for X seconds to let the user cancel and disable the telemetry if they so wish. This should probably be accompanied by a setting to simply skip the wait (perhaps unifying it into a single AXOLOTL_TELEMETRY=yes|no, defaulting to yes + the wait). Overall, my stance is that telemetry, even if opt-in, would be a boon for the project, though I don't believe opt-in would be worth the effort to set up infrastructure and code changes.
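A rough Python sketch of the "notice + wait" compromise described in this comment, for illustration only. The function name `resolve_telemetry`, the 10-second default, and the exact message are assumptions; the variable name AXOLOTL_TELEMETRY is the commenter's suggestion, not an existing Axolotl setting.

```python
import os
import sys
import time


def resolve_telemetry(env=None, wait_seconds=10, sleep=time.sleep, out=sys.stderr):
    """Return True if telemetry should be enabled for this run.

    Hypothetical behavior, following the comment's proposal:
      AXOLOTL_TELEMETRY=no   -> disabled, no wait
      AXOLOTL_TELEMETRY=yes  -> enabled, no wait
      unset / anything else  -> enabled after a visible notice and a pause
    """
    env = os.environ if env is None else env
    choice = env.get("AXOLOTL_TELEMETRY", "").strip().lower()
    if choice == "no":
        return False
    if choice == "yes":
        return True

    # No explicit decision: show the notice and give the user time to cancel.
    print(
        f"Telemetry is enabled by default. Press Ctrl+C within {wait_seconds}s "
        "to disable it for this run. Set AXOLOTL_TELEMETRY=no to disable it "
        "permanently, or AXOLOTL_TELEMETRY=yes to skip this wait.",
        file=out,
    )
    try:
        sleep(wait_seconds)
    except KeyboardInterrupt:
        # The user cancelled during the grace period: treat it as an opt-out.
        return False
    return True
```

Passing `env`, `sleep`, and `out` as parameters is only there to make the behavior easy to test without touching the real environment or actually waiting.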
-
I would have to see an implementation that isn't a listicle that reads like Cursor slop (no offense :p), but as long as nothing too bad is sent upstream (private model names, private dataset samples, etc.) and it's redacted on the client, I don't have much of a problem with opt-out telemetry here.
-
I believe all data collection should be based on consent instead of opt-out. You could show a message on startup asking for consent, or even refuse to start unless a variable or command line flag is present. That is the only alternative to opt-in that ensures that no unwanted telemetry is collected. Adding opt-out mechanisms is bad, as many people likely wouldn't even know that there is telemetry and would accidentally leave it on. If few people give consent, then it implies that this feature is not wanted and that is a risk that just needs to be taken.
-
It's worth pointing out that you're probably already sending data to HuggingFace; they just happened to have this implemented from the beginning. https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables#hfhubdisabletelemetry
-
I am uncomfortable with sharing even the smallest amount of data. Thanks for alerting me to https://consoledonottrack.com/; at least that helps patch up the huggingface telemetry. But it is sad that as a user I have little control over my information, even basic device information, and have to blindly trust that open source projects, and the telemetry providers like PostHog that serve them, respect my preferences. I understand the need and desire for data collection. It would be great to see beforehand which configurations go OOM and which ones don't. That would be super valuable, save countless hours and money, etc. But unless it's extremely obvious to the users, I would not support it. As @fergusq says, refusing to start axolotl until the user has made a decision would be something I would be okay with, like the Debian installer, which requires users to choose one or the other.
-
I prefer the reverse. This is open source software, and the environment variables could not be clearer about how to opt out. The telemetry events are super super clear and do not contain PII. I hate the idea that the maintainers on this project are being forced to do gymnastics to even understand what users are doing in order to build better products. If you care about telemetry, then we should have a notice at the start that says how to opt out, which should be no more complicated than setting an environment variable. Let's not make the lives of the maintainers here harder! There's even an industry standard for this! https://consoledonottrack.com/
-
This RFC proposes adding opt-out telemetry to Axolotl to better understand user engagement patterns and improve the library's functionality. The telemetry system will be designed with user privacy as the top priority, collecting only non-personally identifiable information (non-PII) about library usage patterns and system configurations.
Motivation
Currently, we lack visibility into how users interact with Axolotl, which models and configurations are most popular, which hardware configurations are most common, and where users encounter issues. Since Axolotl runs exclusively on users' local machines or their configured cloud providers, we have no centralized way to understand usage patterns.
This information gap makes it challenging to:
Design
axolotl library, and regularly report stats / insights gained to the community.
Data Collection Points
Implementation Details
Each training run will be assigned a unique ID (UUID4) that will be used to associate different telemetry events from the same run. This ID will not be tied to any user information but will allow us to understand the flow of events in a single training session.
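As a minimal sketch of the per-run ID idea, assuming a simple in-memory event list: the class name `RunTelemetry`, the `record` method, and the event names used below are hypothetical and not taken from Axolotl's code. Only the use of a random UUID4 per run comes from the RFC text.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class RunTelemetry:
    """Groups telemetry events from a single training run (hypothetical helper)."""

    # Random UUID4, generated fresh for each run; it is not derived from
    # any user, account, or machine information.
    run_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    events: list = field(default_factory=list)

    def record(self, event, properties=None):
        """Store one event tagged with this run's ID and return it."""
        payload = {
            "run_id": self.run_id,
            "event": event,
            "properties": dict(properties or {}),
        }
        self.events.append(payload)
        return payload


run = RunTelemetry()
start = run.record("train-start", {"num_gpus": 2})  # illustrative event name
end = run.record("train-end")                       # illustrative event name
assert start["run_id"] == end["run_id"]             # same run, same ID
```

Because the ID is created anew for every run, two runs by the same user cannot be linked through it; it only ties together events within one session.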
Env Variables
Users can opt out of telemetry using environment variables:
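A small Python sketch of how such an opt-out check might look, assuming the two variables named in this section (DO_NOT_TRACK and AXOLOTL_DO_NOT_TRACK) and that a value of 1 disables telemetry. The function name `telemetry_enabled` is hypothetical, not Axolotl's actual API.

```python
import os

# Variable names as given in this RFC; a value of "1" opts out.
OPT_OUT_VARS = ("DO_NOT_TRACK", "AXOLOTL_DO_NOT_TRACK")


def telemetry_enabled(env=None):
    """Return False if either opt-out variable is set to "1"."""
    env = os.environ if env is None else env
    return not any(env.get(name, "").strip() == "1" for name in OPT_OUT_VARS)
```

With this check, running a shell command such as `DO_NOT_TRACK=1 python train.py` (script name illustrative) would make `telemetry_enabled()` return False before any event is sent.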
The telemetry system will respect both the global DO_NOT_TRACK environment variable (an established convention across many OSS applications) and the Axolotl-specific AXOLOTL_DO_NOT_TRACK variable. If either of these is set to 1, no telemetry will be collected.
Data Structure
We are planning to track events using PostHog. We'll use PostHog's built-in properties for system information where available, and custom properties for Axolotl-specific data. E.g.:
Privacy and Security
Alternatives?
Implementation Considerations
Next Steps