docs/source/guide/storage.md
21 additions & 15 deletions
@@ -382,9 +382,9 @@ After you [configure access to your S3 bucket](#Configure-access-to-your-S3-buck
     - In the **Session Token** field, specify a session token of the temporary security credentials for an AWS account with access to your S3 bucket.
     - (Optional) Enable **Treat every bucket object as a source file** if your bucket contains BLOB storage files such as JPG, MP3, or similar file types. This setting creates a URL for each bucket object to use for labeling. Leave this option disabled if you have multiple JSON files in the bucket with one task per JSON file.
     - (Optional) Enable **Recursive scan** to perform recursive scans over the bucket contents if you have nested folders in your S3 bucket.
-    - Choose whether to disable **Use pre-signed URLs**.
-        - All s3://... links will be resolved on the fly and converted to https URLs, if this option is on.
-        - All s3://... objects will be preloaded into Label Studio tasks as base64 codes, if this option is off. It's not recommended way, because Label Studio task payload will be huge and UI will slow down. Also it requires GET permissions from your storage.
+    - Choose whether to disable [**Use pre-signed URLs**](#Pre-signed-URLs-vs-storage-proxies).
+        - **ON** - Label Studio generates a pre-signed URL to load media.
+        - **OFF** - Label Studio proxies media using its own backend.
     - Adjust the counter for how many minutes the pre-signed URLs are valid.
 8. Click **Add Storage**.

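As a rough illustration of what "valid for N minutes" means for a pre-signed URL, the sketch below signs a URL with an expiry timestamp and later checks it. It is a generic HMAC scheme, not the AWS Signature Version 4 scheme that S3 uses and not Label Studio's implementation; `SECRET`, `presign`, `is_valid`, and the query parameter names are all hypothetical.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"demo-secret"  # hypothetical signing key, for illustration only


def presign(url: str, minutes_valid: int, now: int) -> str:
    """Return `url` (which must have no query string) with an expiry and signature appended."""
    expires = now + minutes_valid * 60
    payload = f"{url}|{expires}".encode()
    signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return f"{url}?{urlencode({'expires': expires, 'signature': signature})}"


def is_valid(signed_url: str, now: int) -> bool:
    """Check that the signature matches and the link has not expired."""
    parts = urlsplit(signed_url)
    query = parse_qs(parts.query)
    expires = int(query["expires"][0])
    base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"
    payload = f"{base_url}|{expires}".encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, query["signature"][0]) and now <= expires
```

With the option off, the browser would instead request the media from the Label Studio backend, so no expiring link is handed to the client.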
@@ -547,7 +547,9 @@ In the Label Studio UI, do the following to set up the connection:
     - In the **External ID** field, specify the external ID that identifies Label Studio to your AWS account. You can find the external ID on your **Organization** page.
     - Enable **Treat every bucket object as a source file** if your bucket contains BLOB storage files such as JPG, MP3, or similar file types. This setting creates a URL for each bucket object to use for labeling. Leave this option disabled if you have multiple JSON files in the bucket with one task per JSON file.
     - Enable **Recursive scan** to perform recursive scans over the bucket contents if you have nested folders in your S3 bucket.
-    - Choose whether to disable **Use pre-signed URLs**. If your tasks contain s3://... links, they must be pre-signed in order to be displayed in the browser.
+    - Choose whether to disable [**Use pre-signed URLs**](#Pre-signed-URLs-vs-storage-proxies).
+        - **ON** - Label Studio generates a pre-signed URL to load media.
+        - **OFF** - Label Studio proxies media using its own backend.
     - Adjust the counter for how many minutes the pre-signed URLs are valid.
 8. Click **Add Storage**.

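The difference between the two modes of **Treat every bucket object as a source file** can be sketched as follows. This is an illustration under assumptions, not Label Studio's import code: the function names are hypothetical, and the `image` data key depends on your labeling configuration.

```python
import json


def tasks_from_objects(bucket: str, keys: list[str], data_key: str = "image") -> list[dict]:
    """Option enabled: each bucket object becomes one task that points at its URL."""
    return [{"data": {data_key: f"s3://{bucket}/{key}"}} for key in keys]


def tasks_from_json_files(file_contents: list[str]) -> list[dict]:
    """Option disabled: each JSON file in the bucket already describes one task."""
    return [json.loads(content) for content in file_contents]
```

For example, `tasks_from_objects("my-bucket", ["a.jpg"])` yields one task whose data field holds `s3://my-bucket/a.jpg`, while `tasks_from_json_files` simply parses the task definitions you stored yourself.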
@@ -686,7 +688,9 @@ In the Label Studio UI, do the following to set up the connection:
 7. Adjust the remaining optional parameters:
     - In the **File Filter Regex** field, specify a regular expression to filter bucket objects. Use `.*` to collect all objects.
     - Enable **Treat every bucket object as a source file** if your bucket contains BLOB storage files such as JPG, MP3, or similar file types. This setting creates a URL for each bucket object to use for labeling, such as `gs://my-gcs-bucket/image.jpg`. Leave this option disabled if you have multiple JSON files in the bucket with one task per JSON file.
-    - Choose whether to disable **Use pre-signed URLs**. If your tasks contain gs://... links, they must be pre-signed in order to be displayed in the browser.
+    - Choose whether to disable [**Use pre-signed URLs**](#Pre-signed-URLs-vs-storage-proxies).
+        - **ON** - Label Studio generates a pre-signed URL to load media.
+        - **OFF** - Label Studio proxies media using its own backend.
     - Adjust the counter for how many minutes the pre-signed URLs are valid.
 8. In the **Google Application Credentials** field, add a JSON file with the GCS credentials you created to manage authentication for your bucket.

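To preview which objects a **File Filter Regex** would select, you can test the pattern locally. The sketch below assumes the expression must match the whole object name (`re.fullmatch`); whether Label Studio anchors the pattern this way is an assumption, and `filter_keys` is a hypothetical helper.

```python
import re


def filter_keys(keys: list[str], pattern: str = ".*") -> list[str]:
    """Return the object names that the regular expression matches in full."""
    regex = re.compile(pattern)
    return [key for key in keys if regex.fullmatch(key)]


keys = ["image.jpg", "tasks.json", "nested/photo.jpg"]
jpg_only = filter_keys(keys, r".*\.jpg")  # keeps only names ending in .jpg
everything = filter_keys(keys)            # the default `.*` keeps every object
```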
@@ -951,7 +955,7 @@ Before you begin, ensure you are in the correct project:
 4. Under **Add a provider pool**, complete the following fields:

     * **Select a provider**: Select AWS. This is the location where the Label Studio components responsible for issuing requests are stored.
-    * **Provider name**: Enter `Label Studio Production` or another display name.
+    * **Provider name**: Enter `Label Studio App Production` (you can use a different display name, but you need to ensure that the corresponding provider ID is still `label-studio-app-production`).
     * **Provider ID**: Enter `label-studio-app-production`.
     * **AWS Account ID**: Enter `490065312183`.

@@ -964,15 +968,15 @@ Before you begin, ensure you are in the correct project:
     * Click **Edit mapping** and then add the following:

         - `google.subject = assertion.arn`
-        - `attribute.aws_role = assertion.arn`
+        - `attribute.aws_role = assertion.arn.contains('assumed-role') ? assertion.arn.extract('{account_arn}assumed-role/') + 'assumed-role/' + assertion.arn.extract('assumed-role/{role_name}/') : assertion.arn` (this might be filled in by default)
         - `attribute.aws_account = assertion.account`
         - `attribute.external_id = assertion.external_id`

 6. Click **Save**.

 7. Go to **IAM & Admin > Service Accounts** and find the service account you want to allow AWS (Label Studio) to impersonate. See [Service account permissions](#Service-account-permissions) above.

-8. From the **Permissions** tab, click **Grant Access**.
+8. From the **Principals with access** tab, click **Grant Access**.



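The new `attribute.aws_role` mapping strips the session name from an assumed-role ARN so that the attribute identifies only the role. A minimal Python sketch of the same logic for the common ARN shape, assuming the ARN looks like `...:assumed-role/<role>/<session>`; `map_aws_role`, the role name, and the session name are hypothetical, and edge cases may differ from the CEL `extract` function used in the actual mapping.

```python
def map_aws_role(arn: str) -> str:
    """Mirror the mapping: keep the account prefix and role name, drop the session."""
    marker = "assumed-role/"
    if marker not in arn:
        # Not an assumed-role ARN: the mapping passes it through unchanged.
        return arn
    account_arn, _, rest = arn.partition(marker)
    role_name = rest.split("/", 1)[0]  # text between "assumed-role/" and the next "/"
    return account_arn + marker + role_name


mapped = map_aws_role("arn:aws:sts::490065312183:assumed-role/my-role/session-1")
# mapped == "arn:aws:sts::490065312183:assumed-role/my-role"
```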
@@ -994,10 +998,10 @@ Before you begin, ensure you are in the correct project:
 Before setting up your connection in Label Studio, note the following (you will be asked to provide them):

 * Your pool ID - available from **IAM & Admin > Workload Identity Pools**
-* Your provider ID - available from **IAM & Admin > Workload Identity Pools**
+* Your provider ID - available from **IAM & Admin > Workload Identity Pools** (this should be `label-studio-app-production`)
 * Your service account email - available from **IAM & Admin > Service Accounts**. Select the service account and the email is listed under **Details**.
-* Your Google project number - available from **IAM & Admin > Settings**.
-* Your Google project ID - available from **IAM & Admin > Settings**.
+* Your Google project number - available from **IAM & Admin > Settings**
+* Your Google project ID - available from **IAM & Admin > Settings**

 </details>

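The project number, pool ID, and provider ID noted above together identify the workload identity provider in Google Cloud. The sketch below composes the provider's full resource name from those values; the function name and the sample project number and pool ID are hypothetical, and whether Label Studio builds this string internally is an assumption.

```python
def provider_resource_name(project_number: str, pool_id: str, provider_id: str) -> str:
    """Build the full resource name of a workload identity pool provider."""
    return (
        f"//iam.googleapis.com/projects/{project_number}"
        f"/locations/global/workloadIdentityPools/{pool_id}"
        f"/providers/{provider_id}"
    )


# Hypothetical project number and pool ID; the provider ID is the one this guide uses.
name = provider_resource_name("123456789012", "my-pool", "label-studio-app-production")
```

Note that the resource name uses the project number, not the project ID, which is one reason to record both values.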
@@ -1013,9 +1017,9 @@ Select the **GCS (WIF auth)** storage type and then complete the following field
 | Bucket Name | Enter the name of the Google Cloud bucket. |
 | Bucket Prefix | Optionally, enter the folder name within the bucket that you would like to use. For example, `data-set-1` or `data-set-1/subfolder-2`. |
-| File Name Filter | Specify a regular expression to filter bucket objects. Use `.*` to collect all objects. |
-| Treat every bucket object as a source file | Enable this option if your bucket contains BLOB storage files such as JPG, MP3, or similar file types. This setting creates a URL for each bucket object to use for labeling, such as `gs://my-gcs-bucket/image.jpg`. Leave this option disabled if you have multiple JSON files in the bucket with one task per JSON file. |
-| Use pre-signed URLs | If your tasks contain `gs://…` links, they must be pre-signed in order to be displayed in the browser. |
+| File Name Filter | Optionally, specify a regular expression to filter bucket objects. |
+| [Treat every bucket object as a source file](#Treat-every-bucket-object-as-a-source-file) | Enable this option if your bucket contains BLOB storage files such as JPG, MP3, or similar file types. This setting creates a URL for each bucket object to use for labeling, such as `gs://my-gcs-bucket/image.jpg`. Leave this option disabled if you are specifying your tasks in JSON files. |
+| [Use pre-signed URLs](#Pre-signed-URLs-vs-storage-proxies) | **ON** - Label Studio generates a pre-signed URL to load media. <br /> **OFF** - Label Studio proxies media using its own backend. |
 | Pre-signed URL counter | Adjust the counter for how many minutes the pre-signed URLs are valid. |
 | Workload Identity Pool ID | This is the ID you specified when creating the Workload Identity Pool. You can find this in Google Cloud Console under **IAM & Admin > Workload Identity Pools**. |
 | Workload Identity Provider ID | This is the ID you specified when setting up the provider. You can find this in Google Cloud Console under **IAM & Admin > Workload Identity Pools**. |
@@ -1140,7 +1144,9 @@ In the Label Studio UI, do the following to set up the connection:
     - In the **Account Name** field, specify the account name for the Azure storage. You can also set this field as an environment variable, `AZURE_BLOB_ACCOUNT_NAME`.
     - In the **Account Key** field, specify the secret key to access the storage account. You can also set this field as an environment variable, `AZURE_BLOB_ACCOUNT_KEY`.
     - Enable **Treat every bucket object as a source file** if your bucket contains BLOB storage files such as JPG, MP3, or similar file types. This setting creates a URL for each bucket object to use for labeling, for example `azure-blob://container-name/image.jpg`. Leave this option disabled if you have multiple JSON files in the bucket with one task per JSON file.
-    - Choose whether to disable **Use pre-signed URLs**, or [shared access signatures](https://docs.microsoft.com/en-us/rest/api/storageservices/delegate-access-with-shared-access-signature). If your tasks contain azure-blob://... links, they must be pre-signed in order to be displayed in the browser.
+    - Choose whether to disable [**Use pre-signed URLs**](#Pre-signed-URLs-vs-storage-proxies), or [shared access signatures](https://docs.microsoft.com/en-us/rest/api/storageservices/delegate-access-with-shared-access-signature).
+        - **ON** - Label Studio generates a pre-signed URL to load media.
+        - **OFF** - Label Studio proxies media using its own backend.
     - Adjust the counter for how many minutes the shared access signatures are valid.
 8. Click **Add Storage**.
 9. Repeat these steps for **Target Storage** to sync completed data annotations to a container.