Commit
Add example notebook for uploading data to the caltechdata system.
1 parent abb6cb3, commit 206c35b
Showing 2 changed files with 275 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,263 @@ | ||
{ | ||
"nbformat": 4, | ||
"nbformat_minor": 0, | ||
"metadata": { | ||
"colab": { | ||
"provenance": [], | ||
"mount_file_id": "16c_qOZHmh9T8mBCqk5a6_os2H-D4lJuH", | ||
"authorship_tag": "ABX9TyMYP9gVRivbIGD6aN1ednNp", | ||
"include_colab_link": true | ||
}, | ||
"kernelspec": { | ||
"name": "python3", | ||
"display_name": "Python 3" | ||
}, | ||
"language_info": { | ||
"name": "python" | ||
} | ||
}, | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "view-in-github", | ||
"colab_type": "text" | ||
}, | ||
"source": [ | ||
"<a href=\"https://colab.research.google.com/github/AbakahAlexander/caltechdata_api/blob/main/Uploading_dataset_to_CaltechDATA.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"Use this command to install the *caltechdata_api* package. It gives you access to the functions for uploading data, making changes, and so on. If you install it in an online code editor such as Google Colab, you have to rerun this command each time you reconnect to the runtime." | ||
], | ||
"metadata": { | ||
"id": "vs_acZFiE-nF" | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"source": [ | ||
"%pip install caltechdata_api" | ||
], | ||
"metadata": { | ||
"colab": { | ||
"base_uri": "https://localhost:8080/" | ||
}, | ||
"id": "s5BHYqHwlKVp", | ||
"outputId": "d9936859-53fd-4312-df21-af2c846b9f8d" | ||
}, | ||
"execution_count": null, | ||
"outputs": [ | ||
{ | ||
"output_type": "stream", | ||
"name": "stdout", | ||
"text": [ | ||
"Requirement already satisfied: caltechdata_api in /usr/local/lib/python3.10/dist-packages (1.8.2)\n", | ||
"Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from caltechdata_api) (2.32.3)\n", | ||
"Requirement already satisfied: datacite>1.1.0 in /usr/local/lib/python3.10/dist-packages (from caltechdata_api) (1.2.0)\n", | ||
"Requirement already satisfied: tqdm>=4.62.3 in /usr/local/lib/python3.10/dist-packages (from caltechdata_api) (4.66.6)\n", | ||
"Requirement already satisfied: pyyaml in /usr/local/lib/python3.10/dist-packages (from caltechdata_api) (6.0.2)\n", | ||
"Requirement already satisfied: s3fs in /usr/local/lib/python3.10/dist-packages (from caltechdata_api) (2024.10.0)\n", | ||
"Requirement already satisfied: cryptography in /usr/local/lib/python3.10/dist-packages (from caltechdata_api) (43.0.3)\n", | ||
"Requirement already satisfied: s3cmd in /usr/local/lib/python3.10/dist-packages (from caltechdata_api) (2.4.0)\n", | ||
"Requirement already satisfied: jsonschema>=3.0.0 in /usr/local/lib/python3.10/dist-packages (from datacite>1.1.0->caltechdata_api) (4.23.0)\n", | ||
"Requirement already satisfied: lxml>=4.5.2 in /usr/local/lib/python3.10/dist-packages (from datacite>1.1.0->caltechdata_api) (5.3.0)\n", | ||
"Requirement already satisfied: idutils>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from datacite>1.1.0->caltechdata_api) (1.4.2)\n", | ||
"Requirement already satisfied: importlib-metadata>=6.11.0 in /usr/local/lib/python3.10/dist-packages (from datacite>1.1.0->caltechdata_api) (8.5.0)\n", | ||
"Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->caltechdata_api) (3.4.0)\n", | ||
"Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->caltechdata_api) (3.10)\n", | ||
"Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->caltechdata_api) (2.2.3)\n", | ||
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->caltechdata_api) (2024.8.30)\n", | ||
"Requirement already satisfied: cffi>=1.12 in /usr/local/lib/python3.10/dist-packages (from cryptography->caltechdata_api) (1.17.1)\n", | ||
"Requirement already satisfied: python-dateutil in /usr/local/lib/python3.10/dist-packages (from s3cmd->caltechdata_api) (2.8.2)\n", | ||
"Requirement already satisfied: python-magic in /usr/local/lib/python3.10/dist-packages (from s3cmd->caltechdata_api) (0.4.27)\n", | ||
"Requirement already satisfied: aiobotocore<3.0.0,>=2.5.4 in /usr/local/lib/python3.10/dist-packages (from s3fs->caltechdata_api) (2.15.2)\n", | ||
"Requirement already satisfied: fsspec==2024.10.0.* in /usr/local/lib/python3.10/dist-packages (from s3fs->caltechdata_api) (2024.10.0)\n", | ||
"Requirement already satisfied: aiohttp!=4.0.0a0,!=4.0.0a1 in /usr/local/lib/python3.10/dist-packages (from s3fs->caltechdata_api) (3.10.10)\n", | ||
"Requirement already satisfied: botocore<1.35.37,>=1.35.16 in /usr/local/lib/python3.10/dist-packages (from aiobotocore<3.0.0,>=2.5.4->s3fs->caltechdata_api) (1.35.36)\n", | ||
"Requirement already satisfied: wrapt<2.0.0,>=1.10.10 in /usr/local/lib/python3.10/dist-packages (from aiobotocore<3.0.0,>=2.5.4->s3fs->caltechdata_api) (1.16.0)\n", | ||
"Requirement already satisfied: aioitertools<1.0.0,>=0.5.1 in /usr/local/lib/python3.10/dist-packages (from aiobotocore<3.0.0,>=2.5.4->s3fs->caltechdata_api) (0.12.0)\n", | ||
"Requirement already satisfied: aiohappyeyeballs>=2.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->s3fs->caltechdata_api) (2.4.3)\n", | ||
"Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->s3fs->caltechdata_api) (1.3.1)\n", | ||
"Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->s3fs->caltechdata_api) (24.2.0)\n", | ||
"Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->s3fs->caltechdata_api) (1.5.0)\n", | ||
"Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->s3fs->caltechdata_api) (6.1.0)\n", | ||
"Requirement already satisfied: yarl<2.0,>=1.12.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->s3fs->caltechdata_api) (1.17.1)\n", | ||
"Requirement already satisfied: async-timeout<5.0,>=4.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->s3fs->caltechdata_api) (4.0.3)\n", | ||
"Requirement already satisfied: pycparser in /usr/local/lib/python3.10/dist-packages (from cffi>=1.12->cryptography->caltechdata_api) (2.22)\n", | ||
"Requirement already satisfied: isbnlib>=3.10.8 in /usr/local/lib/python3.10/dist-packages (from idutils>=1.0.0->datacite>1.1.0->caltechdata_api) (3.10.14)\n", | ||
"Requirement already satisfied: zipp>=3.20 in /usr/local/lib/python3.10/dist-packages (from importlib-metadata>=6.11.0->datacite>1.1.0->caltechdata_api) (3.20.2)\n", | ||
"Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0.0->datacite>1.1.0->caltechdata_api) (2024.10.1)\n", | ||
"Requirement already satisfied: referencing>=0.28.4 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0.0->datacite>1.1.0->caltechdata_api) (0.35.1)\n", | ||
"Requirement already satisfied: rpds-py>=0.7.1 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0.0->datacite>1.1.0->caltechdata_api) (0.21.0)\n", | ||
"Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil->s3cmd->caltechdata_api) (1.16.0)\n", | ||
"Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /usr/local/lib/python3.10/dist-packages (from botocore<1.35.37,>=1.35.16->aiobotocore<3.0.0,>=2.5.4->s3fs->caltechdata_api) (1.0.1)\n", | ||
"Requirement already satisfied: typing-extensions>=4.1.0 in /usr/local/lib/python3.10/dist-packages (from multidict<7.0,>=4.5->aiohttp!=4.0.0a0,!=4.0.0a1->s3fs->caltechdata_api) (4.12.2)\n", | ||
"Requirement already satisfied: propcache>=0.2.0 in /usr/local/lib/python3.10/dist-packages (from yarl<2.0,>=1.12.0->aiohttp!=4.0.0a0,!=4.0.0a1->s3fs->caltechdata_api) (0.2.0)\n" | ||
] | ||
} | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"This command imports the *caltechdata_write* function from *caltechdata_api*. This is the function used to upload data to the CaltechDATA site. You can import other functions as well if you want to do more than upload." | ||
], | ||
"metadata": { | ||
"id": "RU9G1N45HL8W" | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"id": "tHe1a9jFk9gn" | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"from caltechdata_api import caltechdata_write" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"Before you proceed, generate your access token on the appropriate site. If you're uploading to the CaltechDATA test system, generate the access token at *https://data.caltechlibrary.dev/*. If you're uploading to the real system, generate the access token at *https://data.caltech.edu/*. Put your API token here to authenticate access to the CaltechDATA API." | ||
], | ||
"metadata": { | ||
"id": "pdVHbtELH4Dm" | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"source": [ | ||
"import os\n", | ||
"os.environ[\"RDMTOK\"] = \"your access token here\"" | ||
], | ||
"metadata": { | ||
"id": "A1rOU5lvSrrm" | ||
}, | ||
"execution_count": 30, | ||
"outputs": [] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"The metadata contains basic information about the upload you're making, such as the title of the data, its creators, etc." | ||
], | ||
"metadata": { | ||
"id": "A0BAd60EmoRd" | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"source": [ | ||
"metadata = {\n", | ||
" \"titles\": [{\"title\": \"title of data\"}],\n", | ||
" \"creators\": [{\"name\": \"name of creator\"}],\n", | ||
" \"description\": \"brief description of the data\",\n", | ||
" \"publication_date\": \"date of upload\",\n", | ||
" \"types\" : {\"resourceType\": \"Dataset\", \"resourceTypeGeneral\": \"Dataset\"},\n", | ||
" \"descriptions\": []\n", | ||
"}" | ||
], | ||
"metadata": { | ||
"id": "sRFby6hVldz5" | ||
}, | ||
"execution_count": null, | ||
"outputs": [] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"There are two options here. Use *file_links* if the data you're uploading is on a remote server. Use *files* if the data you're uploading is on your local machine or in Google Drive (if you're using Colab)." | ||
], | ||
"metadata": { | ||
"id": "Abo_NlSCm42f" | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"source": [ | ||
"#file_links = [\"links to the files\"]\n", | ||
"files = [\"path to the files\"]\n", | ||
"#files = file_links[0].split(\"/\")[-1]" | ||
], | ||
"metadata": { | ||
"id": "wbqM_N9-mGt4" | ||
}, | ||
"execution_count": null, | ||
"outputs": [] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"Put your access token here." | ||
], | ||
"metadata": { | ||
"id": "McI_z-hind1n" | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"source": [ | ||
"token = \"your access token here\"" | ||
], | ||
"metadata": { | ||
"id": "Y9O7758SliOr" | ||
}, | ||
"execution_count": 31, | ||
"outputs": [] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"If you used *files* in the previous snippet, you don't have to use *file_links* here, and vice versa. Set production to True if you're uploading the data to the real CaltechDATA website. If you're uploading to the test system, set production to False. Set publish to True if you want the data published on the website. All other arguments should remain the same.\n" | ||
], | ||
"metadata": { | ||
"id": "Zue_AfuMnlRX" | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"source": [ | ||
"response = caltechdata_write(\n", | ||
" metadata = metadata,\n", | ||
" token=token,\n", | ||
" files=files,\n", | ||
" production=False,\n", | ||
" schema=\"43\",\n", | ||
" publish=False,\n", | ||
" #file_links= file_links,\n", | ||
" s3=None,\n", | ||
" community=None,\n", | ||
" authors=False,\n", | ||
" file_descriptions=[],\n", | ||
" s3_link=None,\n", | ||
" default_preview=None,\n", | ||
" review_message=None,\n", | ||
")\n", | ||
"\n", | ||
"print(response)" | ||
], | ||
"metadata": { | ||
"colab": { | ||
"base_uri": "https://localhost:8080/" | ||
}, | ||
"id": "MKc6rv6llVPO", | ||
"outputId": "515864b6-63ce-436c-f652-208097ea720c" | ||
}, | ||
"execution_count": 32, | ||
"outputs": [ | ||
{ | ||
"output_type": "stream", | ||
"name": "stdout", | ||
"text": [ | ||
"wjms7-60x72\n" | ||
] | ||
} | ||
] | ||
} | ||
] | ||
} |
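The notebook above fills in the metadata dictionary, the file list, and the token by hand. As a minimal sketch of how those inputs might be prepared in plain Python, the helpers below mirror the notebook's cells. The names `build_metadata`, `filenames_from_links`, and `read_token` are hypothetical and are not part of caltechdata_api. The ISO 8601 date default is an assumption, since the notebook only says "date of upload". The block does not contact CaltechDATA.

```python
import os
from datetime import date


def build_metadata(title, creator, description, publication_date=None):
    """Return a dict with the same keys as the notebook's metadata cell.

    Hypothetical helper. The ISO 8601 default for publication_date is an
    assumption; the notebook does not state the expected date format.
    """
    if publication_date is None:
        publication_date = date.today().isoformat()
    return {
        "titles": [{"title": title}],
        "creators": [{"name": creator}],
        "description": description,
        "publication_date": publication_date,
        "types": {"resourceType": "Dataset", "resourceTypeGeneral": "Dataset"},
        "descriptions": [],
    }


def filenames_from_links(file_links):
    """Take the last path segment of each link.

    This extends the notebook's commented-out line
    `file_links[0].split("/")[-1]` from the first link to every link.
    """
    return [link.rstrip("/").split("/")[-1] for link in file_links]


def read_token(env_var="RDMTOK"):
    """Read the access token from the environment variable the notebook sets."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return token


if __name__ == "__main__":
    metadata = build_metadata(
        title="title of data",
        creator="name of creator",
        description="brief description of the data",
    )
    # example.org is a placeholder URL, not a real data location.
    names = filenames_from_links(["https://example.org/data/sample.csv"])
    print(metadata["titles"], names)
```

The resulting `metadata` and token could then be passed to `caltechdata_write` as the `metadata` and `token` arguments, in the same way as the notebook's final cell. Reading the token from `RDMTOK` avoids typing it into a second cell.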