Add overview/architecture docs
asmacdo committed Feb 10, 2025
1 parent fce80c7 commit f9f80d0
Showing 6 changed files with 78 additions and 30 deletions.
4 changes: 0 additions & 4 deletions docs/59_getting_started_replicating_dandi.md

This file was deleted.

70 changes: 70 additions & 0 deletions docs/59_overview.md
@@ -0,0 +1,70 @@
# Overview and Architecture

The docs in this directory describe how to create your own DANDI ecosystem (i.e. a clone of the entire DANDI ecosystem).
We suggest briefly reading through each document in this guide before you start.

This section provides a high-level view of how DANDI’s core components fit together in a typical “full stack” deployment.

## The Big Picture

The DANDI platform is essentially composed of:

1. **Storage**: S3 buckets (AWS) where data actually resides.
2. **API**: A Django/Girder-based backend application (hosted on Heroku) that handles the DANDI data model, user authentication, and orchestrates S3 interactions.
3. **Frontend**: A Vue-based web application (hosted on Netlify) for users to browse, search, and manage data in the archive.
4. **Workers**: Celery workers (also on Heroku) for asynchronous tasks such as file checksum calculations, analytics, and housekeeping.

5. **Observability**: Log aggregation and alerting (Heroku logs, optional additional logs), plus Sentry for error-tracking and notifications. TODO(asmacdo) verify
6. **Infrastructure-As-Code**: Terraform scripts that glue everything together—AWS S3 resources, Netlify or domain DNS, Heroku apps, etc.

These services interconnect as follows:

<img
src="../img/client_requests.png"
alt="client_requests"
style="width: 90%; height: auto; display: block; margin-left: auto; margin-right: auto;"/>

* The user (or script) interacts with the **Web UI** or the **DANDI CLI**.
* The **Web UI** calls into the **API** (over HTTPS).
* The **API** queries or updates metadata in its Postgres DB (hosted on Heroku).
* The **API** calls AWS S3 to read/write DANDI assets.
* Certain heavy or background jobs are queued as Celery tasks and handled by the **Workers**.
* Domain names, certificates, and load-balancing records are handled by AWS Route 53 or Netlify’s DNS, depending on whether it’s the API subdomain or the apex domain for the UI.
* Large chunks of data can be streamed from S3 directly to the client via presigned URLs.
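
The presigned-URL idea in the last bullet can be illustrated with a minimal sketch. This is not AWS Signature Version 4 and not DANDI's code: it is a simplified HMAC-signed, time-limited URL, and `SECRET_KEY`, `presign`, `verify`, and the example host are all hypothetical names chosen for illustration. The point it shows is that the API can hand the client a URL that the storage side can validate on its own, so the data bytes never pass through the API.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical shared secret for this toy model. Real S3 presigning uses
# AWS credentials and Signature Version 4 instead.
SECRET_KEY = b"example-secret"


def _sign(path: str, expires: int) -> str:
    """Return a hex HMAC-SHA256 signature over the path and expiry time."""
    message = f"{path}\n{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def presign(base_url: str, path: str, ttl_seconds: int,
            now: Optional[float] = None) -> str:
    """API side: build a URL that grants access to `path` until it expires."""
    issued = time.time() if now is None else now
    expires = int(issued) + ttl_seconds
    query = urlencode({"expires": expires, "signature": _sign(path, expires)})
    return f"{base_url}{path}?{query}"


def verify(url: str, now: Optional[float] = None) -> bool:
    """Storage side: accept only an unexpired URL with a matching signature."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    expected = _sign(parsed.path, expires)
    return hmac.compare_digest(signature.encode(), expected.encode())


if __name__ == "__main__":
    url = presign("https://storage.example.org", "/blobs/abc", ttl_seconds=60)
    print(url, verify(url))
```

Because the signature covers both the path and the expiry time, changing either one invalidates the URL, which is what makes it safe to hand to a client.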

## Key Components

<img
src="../img/deployment.png"
alt="dandi_deployment"
style="width: 90%; height: auto; display: block; margin-left: auto; margin-right: auto;"/>


### 1. AWS S3 Storage

AWS provides storage buckets, as well as domain management, for resources across the DANDI ecosystem.

* **Primary Storage**: S3 buckets are the primary storage for the data (Zarr, NWB, etc.).
* **Configured via Terraform**: Bucket creation, IAM policies, log routing, etc., are specified in `terraform/*.tf`.

### 2. Heroku

Heroku provisions the servers, worker processes, and the database for the API.

1. **API**: Django, extended by Girder 4, provides REST endpoints for metadata, asset management, versioning, and authentication.
2. **Postgres**: Stores user metadata, dandiset metadata, and references to S3 objects.
3. **Workers (Celery)**: Offload long-running tasks (checksums, analytics, zarr validation, etc.).
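
The offloading pattern in item 3 can be sketched as follows. This is an illustration only, not DANDI's implementation: a standard-library `ThreadPoolExecutor` stands in for the Celery workers and their broker, and `sha256_of_stream`, `handle_upload`, and `CHUNK_SIZE` are hypothetical names. It shows a request handler enqueuing a checksum job and returning immediately while the hashing runs elsewhere.

```python
import hashlib
import io
from concurrent.futures import Future, ThreadPoolExecutor
from typing import BinaryIO

# Read 1 MiB at a time so a large file is never held fully in memory.
CHUNK_SIZE = 1024 * 1024


def sha256_of_stream(stream: BinaryIO) -> str:
    """Long-running task: hash a binary stream chunk by chunk."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(CHUNK_SIZE), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Stand-in for the worker pool (assumption: a real deployment would use
# Celery workers fed by a message broker instead of threads).
worker_pool = ThreadPoolExecutor(max_workers=2)


def handle_upload(stream: BinaryIO) -> Future:
    """Request handler: enqueue the checksum job and return without waiting."""
    return worker_pool.submit(sha256_of_stream, stream)


if __name__ == "__main__":
    pending = handle_upload(io.BytesIO(b"example asset bytes"))
    # The caller can do other work here; result() blocks until the job is done.
    print(pending.result())
```

Keeping the slow work out of the request path is what lets the API stay responsive while checksums or validation run in the background.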

### 3. Netlify (UI)

* **Frontend server**: Serves a static build of the DANDI Archive frontend (Vue.js).
* **Autodeployment**: On each push or merge to `main` (or whichever branch is configured), Netlify automatically builds and deploys.
* **Configuration**:
    - **`netlify.toml`**: Describes build commands and environment variables for staging vs. production.
    - **`.env.production`**: Holds the environment variables for the production build of the Vue-based app (e.g. `VITE_API_URL`, `VITE_SENTRY_DSN`).
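
As a rough illustration of the configuration above, the following sketch checks that a `.env.production`-style file defines the two variables named in this section. The parser, the `REQUIRED_VARS` tuple, and the example values are assumptions made for illustration; a real deployment may require more variables, and this is not a check that ships with DANDI or Netlify.

```python
from typing import Dict, List

# Variable names taken from the examples in this section; the list is an
# assumption and may be incomplete for a real deployment.
REQUIRED_VARS = ("VITE_API_URL", "VITE_SENTRY_DSN")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(text: str) -> List[str]:
    """Return the required variables that are absent or empty."""
    values = parse_env(text)
    return [name for name in REQUIRED_VARS if not values.get(name)]


# Placeholder values for illustration only.
EXAMPLE = """
# .env.production
VITE_API_URL=https://api.example.org/api
VITE_SENTRY_DSN=
"""

if __name__ == "__main__":
    print(missing_vars(EXAMPLE))
```

A check like this could run before a deploy so that a missing value fails early instead of surfacing in the built frontend.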

### 4. Terraform Infrastructure

Terraform is the single source of truth for spinning up or tearing down resources such as S3 buckets, IAM users, Route 53 DNS, Heroku pipeline config, Netlify domain config, etc.

* **Repo**: The [`dandi-infrastructure`](https://github.com/dandi/dandi-infrastructure) repo.
* **Terraform Cloud**: Used to run or apply changes after you push commits to the infrastructure repo.
32 changes: 7 additions & 25 deletions docs/60_initialize_vendors.md
@@ -2,30 +2,12 @@

The DANDI ecosystem relies on vendor services to operate. So first you will need to set up accounts with the following vendors:

**Heroku**

Provisions the servers and worker processes for the API. Heroku also handles the Postgres instance responsible
for data models in the DANDI Archive.

**AWS**

Provides storage buckets, as well as domain management, for resources across the DANDI ecosystem

**GitHub**

Serves as the authentication provider for accounts across the DANDI ecosystem

**Terraform Cloud**

Manages provisioned resources across cloud vendors in a version-controlled manner.

**Netlify**

Deploys production frontend build, as well as staging previews to assist with frontend development

**Sentry**

Provides observability and monitoring for API events
* **Heroku**: Provisions the API components.
* **AWS**: Provides storage buckets, as well as domain management, for resources across the DANDI ecosystem.
* **GitHub**: Serves as the authentication provider for accounts across the DANDI ecosystem.
* **Terraform Cloud**: Manages provisioned resources across cloud vendors in a version-controlled manner.
* **Netlify**: Deploys the production frontend build, as well as staging previews to assist with frontend development.
* **Sentry**: Provides observability and monitoring for API events.

Some services are not yet integrated within the main infrastructure:

@@ -440,4 +422,4 @@ style="width: 60%; height: auto; display: block; margin-left: auto; margin-righ

## datalad (TBD)

## git-annex (TBD)
Binary file added docs/img/client_requests.png
Binary file added docs/img/deployment.png
2 changes: 1 addition & 1 deletion mkdocs.yml
@@ -41,7 +41,7 @@ nav:
- REST API Swagger: https://api.dandiarchive.org/swagger
- REST API Redoc: https://api.dandiarchive.org/redoc
- Create DANDI Instance:
- Getting Started: "59_getting_started_replicating_dandi.md"
- Overview and Architecture: "59_overview.md"
- Initialize Vendor Accounts for DANDI: "60_initialize_vendors.md"
- DANDI Authentication: "61_dandi_authentication.md"
- DANDI CLI: "62_dandi_cli.md"