Field to specify whether a dataset can be downloaded using datalad #885

Open
emmetaobrien opened this issue Apr 10, 2024 · 6 comments

@emmetaobrien
Collaborator

emmetaobrien commented Apr 10, 2024

This issue proposes adding a new field, named download_options, to the extraProperties section of the CONP DATS.json file. The field will indicate whether the dataset in question can be downloaded by the default datalad procedure documented on the CONP interface page, and will be used to determine which of the current "Direct download", "Download with Datalad" and "Third-party account required" buttons/links on the right-hand side of the dataset card are displayed and/or emphasised. Suggestions and other feedback are welcome; this is very much an opening draft.

Suggested behaviour:

  1. By default, when a DATS.json file does not contain an extraProperties=>download_options entry, the dataset card should be displayed as it is currently.
  2. Datasets that require a third-party account for access should have that emphasised more strongly than the current design does, by removing non-functional "Direct download" and "Download with Datalad" buttons and adding a much more prominent "Third party account required" button, perhaps in a different colour from the existing green buttons in order to be immediately visible on the list of datasets. (Initial design suggestions: avoid red/amber, which would suggest the dataset is unavailable rather than available by a different means; the background colour should differ significantly in brightness as well as hue so that it is also visually distinct to colour-blind users, which might also recommend dark text on a brighter background?)
  3. While initial design discussions have focused on this as a yes/no flag, to distinguish two categories of datasets which cannot be straightforwardly separated using existing fields, the download_options field could equally well support multiple values, to facilitate multiple alternative download button layouts. Is this an option we want to consider at this time?

Implementation:

Once the questions raised above are addressed to the point of implementing a prototype, this functionality calls for:

  1. Annotating existing datasets that require a non-default download_options entry (such as OBI data). A couple of these can be done quickly for initial test purposes to allow the other steps to proceed in parallel. (Emmet)
  2. Modifying the dataset display card to show redesigned buttons/links if a download_options value is read. (Laetitia)
  3. Modifying the DATS editor to enable users to specify a download_options value (Alexandre) and come up with a clear and concise description of what it is for and when/why/how it should be filled in. (Patrick) Radio buttons or a pull-down list seem like workable options here.
@emmetaobrien
Collaborator Author

Edit: for consistency with general practice, the field name should be downloadOptions rather than download_options.
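
As a rough illustration of what such an entry might look like and how the interface could read it, here is a minimal Python sketch. It assumes the entry follows the usual DATS extraProperties layout of category/values pairs; the value "offsite" and the helper name get_download_options are hypothetical, since the set of allowed values has not been decided in this thread.

```python
import json
from typing import List

# Hypothetical DATS.json fragment. The category/values layout is assumed
# from the usual DATS extraProperties shape; "offsite" is an illustrative
# value only, not an agreed vocabulary.
DATS_EXAMPLE = """
{
  "title": "Example dataset",
  "extraProperties": [
    {"category": "downloadOptions", "values": [{"value": "offsite"}]}
  ]
}
"""


def get_download_options(dats: dict) -> List[str]:
    """Return every downloadOptions value, or an empty list if the entry is absent."""
    for prop in dats.get("extraProperties", []):
        if prop.get("category") == "downloadOptions":
            return [v["value"] for v in prop.get("values", []) if "value" in v]
    return []


options = get_download_options(json.loads(DATS_EXAMPLE))
```

Returning a list rather than a single string keeps the door open for the multi-valued variant raised in the original proposal.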

@GHPBZ
Collaborator

GHPBZ commented Apr 24, 2024

The current download options are:

  • Direct/browser-based (local archive offered through "Direct Download" button)
    • Triggered by Privacy=open and availability of zip archive (or only zip archive?)
      • What creates zip archive?
  • DataLad ("Download with DataLad" button)
    • Necessarily in all cases where browser download is available
    • Also some other cases, like Community Server
    • Currently lacking trigger conditions
  • DataLad with credentials (account required but provides DataLad access if we have the right crawler; "Download with DataLad" button)
    • Currently lacking trigger conditions
  • Offsite only (no CONP download; currently offered linking from "Third-party account required")
    • Currently, "Third-party account required" is triggered by registrationPage, but there is no trigger for 'only offsite' download
  • Should we have a default display of buttons that is modified by the proposed downloadOptions field?
  • We probably can't expect data contributors to toggle all the right options themselves (as it would require knowing more about the Portal than they should) so we'll have to think about what information we collect from them that can then be mapped to Portal download options.

@emmetaobrien
Collaborator Author

emmetaobrien commented Apr 24, 2024

Thinking about this, it seems to me that the default display of buttons should match with the most restrictive option we provide, which in this case is "Offsite only", so that any errors here have the failure mode of being overly restrictive rather than releasing (or appearing to release) data more widely than we should.

Having the interface read a DATS.json file with no entry for downloadOptions as "Offsite only" has the practical advantage that many of the datasets with "Offsite only" status are provided by conp-bot crawling OSF or Zenodo, and our workflow for these cases is specifically designed to exclude manual editing of the datasets at our end.
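
A minimal sketch of this fail-closed default, in Python for illustration only. The mode names and the function resolve_mode are hypothetical, not existing portal code; the point is just that a missing or unrecognised value resolves to the most restrictive option.

```python
from enum import Enum
from typing import Optional


class DownloadMode(Enum):
    # Hypothetical mode names; the real vocabulary is still under discussion.
    DIRECT = "direct"
    CREDENTIALS = "credentials"
    OFFSITE = "offsite"


def resolve_mode(raw_value: Optional[str]) -> DownloadMode:
    """Map a raw downloadOptions value to a mode.

    Anything missing or unrecognised falls back to OFFSITE, the most
    restrictive option, so errors never widen access.
    """
    if raw_value is None:
        return DownloadMode.OFFSITE
    try:
        return DownloadMode(raw_value.strip().lower())
    except ValueError:
        return DownloadMode.OFFSITE
```

With this shape, crawled OSF or Zenodo datasets that carry no downloadOptions entry need no manual editing to be shown as offsite.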

@emmetaobrien
Collaborator Author

A first test entry for downloadOptions has been set up to allow interface development to begin: #888.

@emmetaobrien
Collaborator Author

emmetaobrien commented Aug 21, 2024

Proposed refinements of the categorisation laid out by @GHPBZ above, based on recent meeting:

"Direct download" and "datalad download" categories are exactly the same right now, so can be treated as a single category.

The desired logic is as follows:

IF privacy == "open" THEN
    dataset gets "direct download"/"datalad download" button
ELSE IF storedIn contains the string "loris" THEN
    dataset gets LORIS-specific "download with credentials" button
ELSE IF (..other options as we define them for specific conditions later..) THEN
    dataset gets the button defined for that condition
ELSE (if none of the previous conditions apply)
    dataset is marked "offsite"

This separates datasets into clearly defined categories based on information already stored in the DATS.json file, and so avoids asking users to explicitly specify something fiddly and non-intuitive to control the button attached to their datasets.
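
The logic above could be sketched in Python roughly as follows. This is an illustration under assumptions rather than the portal's actual code: privacy and storedIn are taken as plain strings already extracted from DATS.json, the returned labels are hypothetical, and the case-insensitive comparison is an added assumption.

```python
def choose_button(privacy: str, stored_in: str) -> str:
    """Pick a download button category from fields already in DATS.json.

    The return values are hypothetical labels, not existing portal identifiers.
    """
    # Open datasets get the combined direct/datalad download button.
    if privacy.strip().lower() == "open":
        return "download"
    # Non-open datasets stored in LORIS get the credentials button.
    if "loris" in stored_in.lower():
        return "loris_credentials"
    # Further source-specific checks would be added here as they are defined.
    # Fall through to the most restrictive category.
    return "offsite"
```

For example, choose_button("registered", "LORIS instance") gives "loris_credentials", while any non-open dataset with an unrecognised store falls through to "offsite".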

Suggestions for the "other options" above include checks to identify braincode and eegnet data. We could identify each of these by looking for a matching string in the name or source URL of the dataset; like the condition for "loris", this will want close monitoring.
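
One way to keep these string checks in a single place is a small keyword table, sketched below in Python. The table contents and the function name match_source are assumptions for illustration. Plain substring matching can produce false positives, which is one reason the conditions would need close monitoring.

```python
from typing import Optional

# Hypothetical keyword -> category table; extend as new sources are defined.
SOURCE_KEYWORDS = {
    "loris": "loris",
    "braincode": "braincode",
    "eegnet": "eegnet",
}


def match_source(name: str, url: str) -> Optional[str]:
    """Return the first category whose keyword appears in the dataset name or source URL."""
    haystack = f"{name} {url}".lower()
    for keyword, category in SOURCE_KEYWORDS.items():
        if keyword in haystack:
            return category
    return None
```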

@emmetaobrien
Collaborator Author

Note from this morning's CONP dev meeting: a case has arisen which our current setup does not handle well, namely connecting to an offsite dataset with no login needed. Design for this TBD.
