
Commit

Add example notebook
Add example notebook for uploading data to the caltechdata system.
AbakahAlexander authored Dec 18, 2024
1 parent abb6cb3 commit 206c35b
Showing 2 changed files with 275 additions and 1 deletion.
263 changes: 263 additions & 0 deletions Uploading_dataset_to_CaltechDATA.ipynb
@@ -0,0 +1,263 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"mount_file_id": "16c_qOZHmh9T8mBCqk5a6_os2H-D4lJuH",
"authorship_tag": "ABX9TyMYP9gVRivbIGD6aN1ednNp",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/AbakahAlexander/caltechdata_api/blob/main/Uploading_dataset_to_CaltechDATA.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"Use this command to install the *caltechdata_api* package, which provides the functions for uploading data, making changes, and so on. If you install it in an online environment such as Google Colab, you have to rerun this command each time you reconnect to the runtime."
],
"metadata": {
"id": "vs_acZFiE-nF"
}
},
{
"cell_type": "code",
"source": [
"%pip install caltechdata_api"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "s5BHYqHwlKVp",
"outputId": "d9936859-53fd-4312-df21-af2c846b9f8d"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Requirement already satisfied: caltechdata_api in /usr/local/lib/python3.10/dist-packages (1.8.2)\n",
"Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from caltechdata_api) (2.32.3)\n",
"Requirement already satisfied: datacite>1.1.0 in /usr/local/lib/python3.10/dist-packages (from caltechdata_api) (1.2.0)\n",
"Requirement already satisfied: tqdm>=4.62.3 in /usr/local/lib/python3.10/dist-packages (from caltechdata_api) (4.66.6)\n",
"Requirement already satisfied: pyyaml in /usr/local/lib/python3.10/dist-packages (from caltechdata_api) (6.0.2)\n",
"Requirement already satisfied: s3fs in /usr/local/lib/python3.10/dist-packages (from caltechdata_api) (2024.10.0)\n",
"Requirement already satisfied: cryptography in /usr/local/lib/python3.10/dist-packages (from caltechdata_api) (43.0.3)\n",
"Requirement already satisfied: s3cmd in /usr/local/lib/python3.10/dist-packages (from caltechdata_api) (2.4.0)\n",
"Requirement already satisfied: jsonschema>=3.0.0 in /usr/local/lib/python3.10/dist-packages (from datacite>1.1.0->caltechdata_api) (4.23.0)\n",
"Requirement already satisfied: lxml>=4.5.2 in /usr/local/lib/python3.10/dist-packages (from datacite>1.1.0->caltechdata_api) (5.3.0)\n",
"Requirement already satisfied: idutils>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from datacite>1.1.0->caltechdata_api) (1.4.2)\n",
"Requirement already satisfied: importlib-metadata>=6.11.0 in /usr/local/lib/python3.10/dist-packages (from datacite>1.1.0->caltechdata_api) (8.5.0)\n",
"Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->caltechdata_api) (3.4.0)\n",
"Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->caltechdata_api) (3.10)\n",
"Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->caltechdata_api) (2.2.3)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->caltechdata_api) (2024.8.30)\n",
"Requirement already satisfied: cffi>=1.12 in /usr/local/lib/python3.10/dist-packages (from cryptography->caltechdata_api) (1.17.1)\n",
"Requirement already satisfied: python-dateutil in /usr/local/lib/python3.10/dist-packages (from s3cmd->caltechdata_api) (2.8.2)\n",
"Requirement already satisfied: python-magic in /usr/local/lib/python3.10/dist-packages (from s3cmd->caltechdata_api) (0.4.27)\n",
"Requirement already satisfied: aiobotocore<3.0.0,>=2.5.4 in /usr/local/lib/python3.10/dist-packages (from s3fs->caltechdata_api) (2.15.2)\n",
"Requirement already satisfied: fsspec==2024.10.0.* in /usr/local/lib/python3.10/dist-packages (from s3fs->caltechdata_api) (2024.10.0)\n",
"Requirement already satisfied: aiohttp!=4.0.0a0,!=4.0.0a1 in /usr/local/lib/python3.10/dist-packages (from s3fs->caltechdata_api) (3.10.10)\n",
"Requirement already satisfied: botocore<1.35.37,>=1.35.16 in /usr/local/lib/python3.10/dist-packages (from aiobotocore<3.0.0,>=2.5.4->s3fs->caltechdata_api) (1.35.36)\n",
"Requirement already satisfied: wrapt<2.0.0,>=1.10.10 in /usr/local/lib/python3.10/dist-packages (from aiobotocore<3.0.0,>=2.5.4->s3fs->caltechdata_api) (1.16.0)\n",
"Requirement already satisfied: aioitertools<1.0.0,>=0.5.1 in /usr/local/lib/python3.10/dist-packages (from aiobotocore<3.0.0,>=2.5.4->s3fs->caltechdata_api) (0.12.0)\n",
"Requirement already satisfied: aiohappyeyeballs>=2.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->s3fs->caltechdata_api) (2.4.3)\n",
"Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->s3fs->caltechdata_api) (1.3.1)\n",
"Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->s3fs->caltechdata_api) (24.2.0)\n",
"Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->s3fs->caltechdata_api) (1.5.0)\n",
"Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->s3fs->caltechdata_api) (6.1.0)\n",
"Requirement already satisfied: yarl<2.0,>=1.12.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->s3fs->caltechdata_api) (1.17.1)\n",
"Requirement already satisfied: async-timeout<5.0,>=4.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->s3fs->caltechdata_api) (4.0.3)\n",
"Requirement already satisfied: pycparser in /usr/local/lib/python3.10/dist-packages (from cffi>=1.12->cryptography->caltechdata_api) (2.22)\n",
"Requirement already satisfied: isbnlib>=3.10.8 in /usr/local/lib/python3.10/dist-packages (from idutils>=1.0.0->datacite>1.1.0->caltechdata_api) (3.10.14)\n",
"Requirement already satisfied: zipp>=3.20 in /usr/local/lib/python3.10/dist-packages (from importlib-metadata>=6.11.0->datacite>1.1.0->caltechdata_api) (3.20.2)\n",
"Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0.0->datacite>1.1.0->caltechdata_api) (2024.10.1)\n",
"Requirement already satisfied: referencing>=0.28.4 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0.0->datacite>1.1.0->caltechdata_api) (0.35.1)\n",
"Requirement already satisfied: rpds-py>=0.7.1 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0.0->datacite>1.1.0->caltechdata_api) (0.21.0)\n",
"Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil->s3cmd->caltechdata_api) (1.16.0)\n",
"Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /usr/local/lib/python3.10/dist-packages (from botocore<1.35.37,>=1.35.16->aiobotocore<3.0.0,>=2.5.4->s3fs->caltechdata_api) (1.0.1)\n",
"Requirement already satisfied: typing-extensions>=4.1.0 in /usr/local/lib/python3.10/dist-packages (from multidict<7.0,>=4.5->aiohttp!=4.0.0a0,!=4.0.0a1->s3fs->caltechdata_api) (4.12.2)\n",
"Requirement already satisfied: propcache>=0.2.0 in /usr/local/lib/python3.10/dist-packages (from yarl<2.0,>=1.12.0->aiohttp!=4.0.0a0,!=4.0.0a1->s3fs->caltechdata_api) (0.2.0)\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"This cell imports the *caltechdata_write* function from the *caltechdata_api* package. This is the function used to upload data to the CaltechDATA site. You can import other functions as well if you want to do more than upload."
],
"metadata": {
"id": "RU9G1N45HL8W"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "tHe1a9jFk9gn"
},
"outputs": [],
"source": [
"from caltechdata_api import caltechdata_write"
]
},
{
"cell_type": "markdown",
"source": [
"Before you proceed, generate an access token from the appropriate site. If you're uploading to the CaltechDATA test system, generate the token at *https://data.caltechlibrary.dev/*. If you're uploading to the real system, generate it at *https://data.caltech.edu/*. The cell below stores your API token in the *RDMTOK* environment variable to authenticate access to the CaltechDATA API."
],
"metadata": {
"id": "pdVHbtELH4Dm"
}
},
{
"cell_type": "code",
"source": [
"import os\n",
"os.environ[\"RDMTOK\"] = \"your access token here\""
],
"metadata": {
"id": "A1rOU5lvSrrm"
},
"execution_count": 30,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"The metadata contains basic information about the upload you're making, such as the title of the data, its creators, etc."
],
"metadata": {
"id": "A0BAd60EmoRd"
}
},
{
"cell_type": "code",
"source": [
"metadata = {\n",
" \"titles\": [{\"title\": \"title of data\"}],\n",
" \"creators\": [{\"name\": \"name of creator\"}],\n",
" \"description\": \"brief description of the data\",\n",
" \"publication_date\": \"date of upload\",\n",
" \"types\" : {\"resourceType\": \"Dataset\", \"resourceTypeGeneral\": \"Dataset\"},\n",
" \"descriptions\": []\n",
"}"
],
"metadata": {
"id": "sRFby6hVldz5"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"There are two options here. Use *file_links* if the data you're uploading is on a remote server. Use *files* if the data you're uploading is on your local machine or on Google Drive (if you're using Colab)."
],
"metadata": {
"id": "Abo_NlSCm42f"
}
},
{
"cell_type": "code",
"source": [
"#file_links = [\"links to the files\"]\n",
"files = [\"path to the files\"]\n",
"#files = file_links[0].split(\"/\")[-1]"
],
"metadata": {
"id": "wbqM_N9-mGt4"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Assign your access token to the *token* variable, which is passed to *caltechdata_write* in the final cell."
],
"metadata": {
"id": "McI_z-hind1n"
}
},
{
"cell_type": "code",
"source": [
"token = \"your access token here\""
],
"metadata": {
"id": "Y9O7758SliOr"
},
"execution_count": 31,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"If you used *files* in the earlier snippet, you don't need *file_links* here, and vice versa. Set production to True if you're uploading the data to the real CaltechDATA website, or to False if you're uploading to the test system. Set publish to True if you want the data published on the website. All other arguments should remain the same.\n"
],
"metadata": {
"id": "Zue_AfuMnlRX"
}
},
{
"cell_type": "code",
"source": [
"response = caltechdata_write(\n",
" metadata=metadata,\n",
" token=token,\n",
" files=files,\n",
" production=False,\n",
" schema=\"43\",\n",
" publish=False,\n",
" # file_links=file_links,\n",
" s3=None,\n",
" community=None,\n",
" authors=False,\n",
" file_descriptions=[],\n",
" s3_link=None,\n",
" default_preview=None,\n",
" review_message=None,\n",
")\n",
"\n",
"print(response)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "MKc6rv6llVPO",
"outputId": "515864b6-63ce-436c-f652-208097ea720c"
},
"execution_count": 32,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"wjms7-60x72\n"
]
}
]
}
]
}
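The notebook's last few cells (token, *files* versus *file_links*, and the *production* and *publish* flags) can be condensed into one helper. The following is only a sketch under stated assumptions: `build_upload_kwargs` and `upload` are hypothetical names that are not part of caltechdata_api, the token is read from the `RDMTOK` environment variable that the notebook sets, and treating *files* and *file_links* as strictly exclusive is a choice of this sketch rather than a documented rule of the library.

```python
import os


def build_upload_kwargs(metadata, files=None, file_links=None,
                        production=False, publish=False,
                        token_env="RDMTOK"):
    """Assemble keyword arguments for caltechdata_write (hypothetical helper)."""
    # The notebook presents files and file_links as alternatives;
    # this sketch enforces that by requiring exactly one of them.
    if bool(files) == bool(file_links):
        raise ValueError("Provide exactly one of files or file_links")

    # Read the token from the environment instead of hard-coding it.
    token = os.environ.get(token_env)
    if not token:
        raise RuntimeError(f"Set the {token_env} environment variable first")

    kwargs = {
        "metadata": metadata,
        "token": token,
        "production": production,  # False targets the test system
        "schema": "43",            # same value as in the notebook
        "publish": publish,        # False leaves the record unpublished
    }
    if files:
        kwargs["files"] = list(files)
    else:
        kwargs["file_links"] = list(file_links)
    return kwargs


def upload(**kwargs):
    """Call caltechdata_write with the assembled arguments."""
    # Imported lazily so build_upload_kwargs works without the package installed.
    from caltechdata_api import caltechdata_write
    return caltechdata_write(**kwargs)
```

With this in place, a call such as `response = upload(**build_upload_kwargs(metadata, files=files))` could stand in for the notebook's final cell; the defaults keep the upload on the test system and unpublished until both flags are switched deliberately.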
13 changes: 12 additions & 1 deletion codemeta.json
@@ -37,6 +37,17 @@
"@type": "Organization",
"name": "Caltech"
}
},
{
"@type": "Person",
"givenName": "Alexander A",
"familyName": "Abakah",
"affiliation": {
"@type": "Organization",
"name": "Caltech Library"
},
"email": "aabakah@caltech.edu",
"@id": "https://orcid.org/0009-0003-5640-6691"
}
],
"developmentStatus": "active",
@@ -72,4 +83,4 @@
},
"programmingLanguage": "Python",
"identifier": "10.22002/bv2pv-2b295"
}
}
