
Data Usage policy for pyOpenSci - what does our policy look like for data usage? #183

Open
@lwasser


Hi Everyone,
I know that data usage is an important topic, so I wanted to open an issue to capture community thoughts on it. Below is the language we are currently using around data usage, based on our first iterations.

### Telemetry & user-informed consent

Your package should avoid collecting usage analytics. That said,
we understand that package-use data can be invaluable for the
development process. If the package does collect such data, it should do so
by prioritizing user-informed consent. This means that before any data are
collected, the user understands:

1. What data are collected
2. How the data are collected
3. What you plan to do with the data
4. How and where the data are stored

Once the user is informed of what will be collected and how that data will be handled, stored and used, you can implement `opt-in` consent. `opt-in` means that the user agrees to usage-data collection prior to it being collected (rather than having to opt-out when using your package).
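To make the opt-in idea concrete, here is a minimal Python sketch of how a package might gate collection on explicit consent. It is only an illustration, not an agreed pyOpenSci pattern: the names `TelemetryNotice`, `load_consent`, `request_consent`, `record_event`, the `telemetry_opt_in` key, and the JSON consent file are all hypothetical, and the "sink" is just an in-memory list standing in for whatever storage a maintainer would actually use.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class TelemetryNotice:
    """Hypothetical container for the four things a user should know up front."""

    what: str     # what data are collected
    how: str      # how the data are collected
    purpose: str  # what the maintainers plan to do with the data
    storage: str  # how and where the data are stored

    def render(self) -> str:
        return (
            f"What is collected: {self.what}\n"
            f"How it is collected: {self.how}\n"
            f"What it is used for: {self.purpose}\n"
            f"How and where it is stored: {self.storage}"
        )


# Example wording only; a real package would describe its own practices.
NOTICE = TelemetryNotice(
    what="names of the functions you call (no arguments, no file paths)",
    how="one event appended per call",
    purpose="deciding which features to prioritize",
    storage="a local list in this sketch; a real package would name its backend",
)


def load_consent(path: Path) -> bool:
    """Return True only if the user has explicitly opted in.

    A missing, unreadable, or malformed file counts as 'no consent'.
    """
    try:
        return json.loads(path.read_text()).get("telemetry_opt_in") is True
    except (FileNotFoundError, json.JSONDecodeError, AttributeError):
        return False


def request_consent(notice: TelemetryNotice, path: Path, ask=input) -> bool:
    """Show the notice, ask once, and persist the answer. The default is 'no'."""
    answer = ask(notice.render() + "\nShare usage data? [y/N] ")
    opted_in = answer.strip().lower() in {"y", "yes"}
    path.write_text(json.dumps({"telemetry_opt_in": opted_in}))
    return opted_in


def record_event(event: dict, path: Path, sink: list) -> bool:
    """Append the event to the sink only when consent is on file."""
    if load_consent(path):
        sink.append(event)
        return True
    return False
```

The design choice worth noting is that every failure mode (no file, unparsable file, an empty answer at the prompt) resolves to "do not collect", which is what distinguishes opt-in from opt-out.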

We will evaluate usage data collected by packages on a case-by-case basis
and reserve the right not to review a package if the data collection is overly
invasive.

There are some good, important points here regarding how maintainers collect data to support development.

For maintainers:

  • The data can be useful to improve user experience
  • The data can inform / focus development efforts

On the user end:
There is a widespread assumption today that collecting data is OK, and it is even part of many organizations' business models (think Facebook). In return, they offer users a tool / service at no / low cost.
However, there is a level of trust and ethical acknowledgement to consider. People should have some control over data derived from their activities.

We have a few models, such as Homebrew, which is clear up front about collecting data upon install. The user can opt in or out there.
Is there a model for (scientific) Python that we could follow that would balance the needs of developers with the privacy of users?

References to two items:

pinging:

@sneakers-the-rat @NickleDave @tupui @stefanv @Batalex @yuvipanda @pradyun @choldgraf @skrawcz and anyone else who has some thoughts on what our policy looks like. Let's have an open, productive conversation here so we have a record of it!
