
Commit 9ff40a9

Merge branch 'develop' into 'fb-utc-22'
Workflow run: https://github.com/HumanSignal/label-studio/actions/runs/15350785737
2 parents ee9c70b + c5e5295 commit 9ff40a9

13 files changed: +147 -56 lines changed

.github/workflows/git-command.yml

Lines changed: 2 additions & 3 deletions
@@ -30,8 +30,6 @@ jobs:
           token: ${{ secrets.GIT_PAT }}
           repository: ${{ github.event.client_payload.pull_request.head.repo.full_name }}
           ref: ${{ github.event.client_payload.pull_request.head.ref }}
-          submodules: 'recursive'
-          fetch-depth: 0

       - name: Checkout Actions Hub
         uses: actions/checkout@v4
@@ -52,7 +50,8 @@ jobs:
         uses: ./.github/actions-hub/actions/git-merge
         with:
           base_branch: ${{ github.event.client_payload.slash_command.args.unnamed.arg2 || github.event.client_payload.pull_request.base.ref }}
-          head_branch: ${{ github.event.client_payload.pull_request.head.ref }}
+          base_repository: ${{ github.event.client_payload.pull_request.base.repo.full_name }}
+          head_sha: ${{ github.event.client_payload.pull_request.head.sha }}
           our_files: "pyproject.toml poetry.lock web"

       - name: Git Push

.gitleaks.toml

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ description = "Global allow list"
 stopwords = [
   '''keyEntities''',
   '''shiftKey''',
+  '''duplicated''',
 ]
 paths = [
   '''gitleaks\.toml''',

docs/source/guide/release_notes/onprem/2.24.0.md

Lines changed: 3 additions & 0 deletions
@@ -38,6 +38,9 @@ When you open Label Studio, you will see a new Home page. Here you can find link

 ![Screenshot of home page](/images/releases/2-24-home.png)

+!!! note
+    The home page is not available for environments using whitelabeling.
+
 #### Annotator Evaluation settings

 There is a new Annotator Evaluation section under **Settings > Quality**.

docs/source/guide/storage.md

Lines changed: 21 additions & 15 deletions
@@ -382,9 +382,9 @@ After you [configure access to your S3 bucket](#Configure-access-to-your-S3-buck
     - In the **Session Token** field, specify a session token of the temporary security credentials for an AWS account with access to your S3 bucket.
     - (Optional) Enable **Treat every bucket object as a source file** if your bucket contains BLOB storage files such as JPG, MP3, or similar file types. This setting creates a URL for each bucket object to use for labeling. Leave this option disabled if you have multiple JSON files in the bucket with one task per JSON file.
     - (Optional) Enable **Recursive scan** to perform recursive scans over the bucket contents if you have nested folders in your S3 bucket.
-    - Choose whether to disable **Use pre-signed URLs**.
-        - All s3://... links will be resolved on the fly and converted to https URLs, if this option is on.
-        - All s3://... objects will be preloaded into Label Studio tasks as base64 codes, if this option is off. It's not recommended way, because Label Studio task payload will be huge and UI will slow down. Also it requires GET permissions from your storage.
+    - Choose whether to disable [**Use pre-signed URLs**](#Pre-signed-URLs-vs-storage-proxies).
+        - **ON** - Label Studio generates a pre-signed URL to load media.
+        - **OFF** - Label Studio proxies media using its own backend.
     - Adjust the counter for how many minutes the pre-signed URLs are valid.
 8. Click **Add Storage**.

@@ -547,7 +547,9 @@ In the Label Studio UI, do the following to set up the connection:
     - In the **External ID** field, specify the external ID that identifies Label Studio to your AWS account. You can find the external ID on your **Organization** page.
     - Enable **Treat every bucket object as a source file** if your bucket contains BLOB storage files such as JPG, MP3, or similar file types. This setting creates a URL for each bucket object to use for labeling. Leave this option disabled if you have multiple JSON files in the bucket with one task per JSON file.
     - Enable **Recursive scan** to perform recursive scans over the bucket contents if you have nested folders in your S3 bucket.
-    - Choose whether to disable **Use pre-signed URLs**. If your tasks contain s3://... links, they must be pre-signed in order to be displayed in the browser.
+    - Choose whether to disable [**Use pre-signed URLs**](#Pre-signed-URLs-vs-storage-proxies).
+        - **ON** - Label Studio generates a pre-signed URL to load media.
+        - **OFF** - Label Studio proxies media using its own backend.
     - Adjust the counter for how many minutes the pre-signed URLs are valid.
 8. Click **Add Storage**.

@@ -686,7 +688,9 @@ In the Label Studio UI, do the following to set up the connection:
 7. Adjust the remaining optional parameters:
     - In the **File Filter Regex** field, specify a regular expression to filter bucket objects. Use `.*` to collect all objects.
     - Enable **Treat every bucket object as a source file** if your bucket contains BLOB storage files such as JPG, MP3, or similar file types. This setting creates a URL for each bucket object to use for labeling, such as `gs://my-gcs-bucket/image.jpg`. Leave this option disabled if you have multiple JSON files in the bucket with one task per JSON file.
-    - Choose whether to disable **Use pre-signed URLs**. If your tasks contain gs://... links, they must be pre-signed in order to be displayed in the browser.
+    - Choose whether to disable [**Use pre-signed URLs**](#Pre-signed-URLs-vs-storage-proxies).
+        - **ON** - Label Studio generates a pre-signed URL to load media.
+        - **OFF** - Label Studio proxies media using its own backend.
     - Adjust the counter for how many minutes the pre-signed URLs are valid.
 8. In the **Google Application Credentials** field, add a JSON file with the GCS credentials you created to manage authentication for your bucket.

@@ -951,7 +955,7 @@ Before you begin, ensure you are in the correct project:
 4. Under **Add a provider pool**, complete the following fields:

     * **Select a provider**: Select AWS. This is the location where the Label Studio components responsible for issuing requests are stored.
-    * **Provider name**: Enter `Label Studio Production` or another display name.
+    * **Provider name**: Enter `Label Studio App Production` (you can use a different display name, but you need to ensure that the corresponding provider ID is still `label-studio-app-production`)
     * **Provider ID**: Enter `label-studio-app-production`.
     * **AWS Account ID**: Enter `490065312183`.

@@ -964,15 +968,15 @@ Before you begin, ensure you are in the correct project:
     * Click **Edit mapping** and then add the following:

         - `google.subject = assertion.arn`
-        - `attribute.aws_role = assertion.arn`
+        - `attribute.aws_role = assertion.arn.contains('assumed-role') ? assertion.arn.extract('{account_arn}assumed-role/') + 'assumed-role/' + assertion.arn.extract('assumed-role/{role_name}/') : assertion.arn` (this might be filled in by default)
         - `attribute.aws_account = assertion.account`
         - `attribute.external_id = assertion.external_id`

 6. Click **Save**.

 7. Go to **IAM & Admin > Service Accounts** and find the service account you want to allow AWS (Label Studio) to impersonate. See [Service account permissions](#Service-account-permissions) above.

-8. From the **Permissions** tab, click **Grant Access**.
+8. From the **Principals with access** tab, click **Grant Access**.

     ![Screenshot of grant access button](/images/storages/gcs-grant-access.png)
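The new `attribute.aws_role` mapping above drops the session name from an assumed-role ARN, so every session of one role maps to the same attribute value. As a rough illustration only, the Python sketch below mimics that behavior. `normalize_aws_role` and the sample ARNs are hypothetical, and the function approximates the CEL `extract` calls rather than reproducing their exact semantics.

```python
def normalize_aws_role(arn: str) -> str:
    """Approximate the attribute.aws_role mapping (hypothetical helper)."""
    marker = "assumed-role/"
    if marker not in arn:
        # Not an assumed-role ARN: the mapping passes it through unchanged.
        return arn
    # Everything before the marker corresponds to {account_arn}.
    account_arn, _, rest = arn.partition(marker)
    # The first path segment after the marker corresponds to {role_name};
    # whatever follows (the session name) is discarded.
    role_name = rest.split("/", 1)[0]
    return f"{account_arn}{marker}{role_name}"


# Sample ARNs are made up for illustration.
session_arn = "arn:aws:sts::123456789012:assumed-role/my-role/session-1"
user_arn = "arn:aws:iam::123456789012:user/alice"
print(normalize_aws_role(session_arn))
print(normalize_aws_role(user_arn))
```

Keeping the role but not the session name is what lets a single principal binding on the service account cover every session of that role.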

@@ -994,10 +998,10 @@ Before you begin, ensure you are in the correct project:
 Before setting up your connection in Label Studio, note the following (you will be asked to provide them)

 * Your pool ID - available from **IAM & Admin > Workload Identity Pools**
-* Your provider ID - available from **IAM & Admin > Workload Identity Pools**
+* Your provider ID - available from **IAM & Admin > Workload Identity Pools** (this should be `label-studio-app-production`)
 * Your service account email - available from **IAM & Admin > Service Accounts**. Select the service account and the email is listed under **Details**.
-* Your Google project number - available from **IAM & Admin > Settings**.
-* Your Google project ID - available from **IAM & Admin > Settings**.
+* Your Google project number - available from **IAM & Admin > Settings**
+* Your Google project ID - available from **IAM & Admin > Settings**

 </details>

@@ -1013,9 +1017,9 @@ Select the **GCS (WIF auth)** storage type and then complete the following field
 | ------------------------------------------ | ------------------------------------------------- |
 | Bucket Name | Enter the name of the Google Cloud bucket. |
 | Bucket Prefix | Optionally, enter the folder name within the bucket that you would like to use. For example, `data-set-1` or `data-set-1/subfolder-2`. |
-| File Name Filter | Specify a regular expression to filter bucket objects. Use `.*` to collect all objects. |
-| Treat every bucket object as a source file | Enable this option if your bucket contains BLOB storage files such as JPG, MP3, or similar file types. This setting creates a URL for each bucket object to use for labeling, such as `gs://my-gcs-bucket/image.jpg`. Leave this option disabled if you have multiple JSON files in the bucket with one task per JSON file. |
-| Use pre-signed URLs | If your tasks contain `gs://…` links, they must be pre-signed in order to be displayed in the browser. |
+| File Name Filter | Optionally, specify a regular expression to filter bucket objects. |
+| [Treat every bucket object as a source file](#Treat-every-bucket-object-as-a-source-file) | Enable this option if your bucket contains BLOB storage files such as JPG, MP3, or similar file types. This setting creates a URL for each bucket object to use for labeling, such as `gs://my-gcs-bucket/image.jpg`. Leave this option disabled if you have are specifying your tasks in JSON files. |
+| [Use pre-signed URLs](#Pre-signed-URLs-vs-storage-proxies) | **ON** - Label Studio generates a pre-signed URL to load media. <br /> **OFF** - Label Studio proxies media using its own backend. |
 | Pre-signed URL counter | Adjust the counter for how many minutes the pre-signed URLs are valid. |
 | Workload Identity Pool ID | This is the ID you specified when creating the Work Identity Pool. You can find this in Google Cloud Console under **IAM & Admin > Workload Identity Pools**. |
 | Workload Identity Provider ID | This is the ID you specified when setting up the provider. You can find this in Google Cloud Console under **IAM & Admin > Workload Identity Pools**. |
@@ -1140,7 +1144,9 @@ In the Label Studio UI, do the following to set up the connection:
     - In the **Account Name** field, specify the account name for the Azure storage. You can also set this field as an environment variable,`AZURE_BLOB_ACCOUNT_NAME`.
     - In the **Account Key** field, specify the secret key to access the storage account. You can also set this field as an environment variable,`AZURE_BLOB_ACCOUNT_KEY`.
     - Enable **Treat every bucket object as a source file** if your bucket contains BLOB storage files such as JPG, MP3, or similar file types. This setting creates a URL for each bucket object to use for labeling, for example `azure-blob://container-name/image.jpg`. Leave this option disabled if you have multiple JSON files in the bucket with one task per JSON file.
-    - Choose whether to disable **Use pre-signed URLs**, or [shared access signatures](https://docs.microsoft.com/en-us/rest/api/storageservices/delegate-access-with-shared-access-signature). If your tasks contain azure-blob://... links, they must be pre-signed in order to be displayed in the browser.
+    - Choose whether to disable [**Use pre-signed URLs**](#Pre-signed-URLs-vs-storage-proxies), or [shared access signatures](https://docs.microsoft.com/en-us/rest/api/storageservices/delegate-access-with-shared-access-signature).
+        - **ON** - Label Studio generates a pre-signed URL to load media.
+        - **OFF** - Label Studio proxies media using its own backend.
     - Adjust the counter for how many minutes the shared access signatures are valid.
 8. Click **Add Storage**.
 9. Repeat these steps for **Target Storage** to sync completed data annotations to a container.

label_studio/core/permissions.py

Lines changed: 2 additions & 0 deletions
@@ -45,6 +45,8 @@ class AllPermissions(BaseModel):
     model_provider_connection_view: str = 'model_provider_connection.view'
     model_provider_connection_change: str = 'model_provider_connection.change'
     model_provider_connection_delete: str = 'model_provider_connection.delete'
+    webhooks_view: str = 'webhooks.view'
+    webhooks_change: str = 'webhooks.change'


 all_permissions = AllPermissions()

label_studio/data_manager/actions/remove_duplicates.py

Lines changed: 32 additions & 30 deletions
@@ -144,41 +144,43 @@ def restore_storage_links_for_duplicated_tasks(duplicates) -> None:
     total_restored_links = 0
     for data in list(duplicates):
         tasks = duplicates[data]
-        source = None
+
+        def _get_storagelink(task):
+            for link in classes:
+                if link_id := task.get(link):
+                    return classes[link], link_id
+            return None

         # find first task with existing StorageLink
+        tasks_without_storagelinks = []
+        tasks_with_storagelinks = []
         for task in tasks:
-            for link in classes:
-                if link in task and task[link] is not None:
-                    # we don't support case when there are many storage links in duplicated tasks
-                    if source is not None:
-                        source = None
-                        break
-                    source = (
-                        task,
-                        classes[link],
-                        task[link],
-                    )  # last arg is a storage link id
+            if _get_storagelink(task):
+                tasks_with_storagelinks.append(task)
+            else:
+                tasks_without_storagelinks.append(task)

         # add storage links to duplicates
-        if source:
-            storage_link_class = source[1]  # get link name
-            for task in tasks:
-                if task['id'] != source[0]['id']:
-                    # get already existing StorageLink
-                    link_instance = storage_link_class.objects.get(id=source[2])
-
-                    # assign existing StorageLink to other duplicated tasks
-                    link = storage_link_class(
-                        task_id=task['id'],
-                        key=link_instance.key,
-                        row_index=link_instance.row_index,
-                        row_group=link_instance.row_group,
-                        storage=link_instance.storage,
-                    )
-                    link.save()
-                    total_restored_links += 1
-                    logger.info(f"Restored storage link for task {task['id']} from source task {source[0]['id']}")
+        if tasks_with_storagelinks:
+            # we don't support case when there are many storage links in duplicated tasks
+            storage_link_class, storage_link_id = _get_storagelink(tasks_with_storagelinks[0])
+            # get already existing StorageLink
+            link_instance = storage_link_class.objects.get(id=storage_link_id)
+
+            for task in tasks_without_storagelinks:
+                # assign existing StorageLink to other duplicated tasks
+                link = storage_link_class(
+                    task_id=task['id'],
+                    key=link_instance.key,
+                    row_index=link_instance.row_index,
+                    row_group=link_instance.row_group,
+                    storage=link_instance.storage,
+                )
+                link.save()
+                total_restored_links += 1
+                logger.info(
+                    f"Restored storage link for task {task['id']} from source task {tasks_with_storagelinks[0]['id']}"
+                )

     logger.info(f'Restored {total_restored_links} storage links for duplicated tasks')
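The change above replaces the single `source` tuple with two lists: duplicated tasks that already have a storage link and those that do not. The first linked task then serves as the source for every unlinked one. The sketch below shows only that partitioning step on plain dicts, with no Django models. `LINK_CLASSES`, the `s3_link` and `gcs_link` field names, and the sample tasks are hypothetical stand-ins for the module-level `classes` mapping and real task rows.

```python
from typing import Optional

# Hypothetical stand-in for the module-level `classes` mapping
# (task field name -> storage link class); strings replace ORM classes here.
LINK_CLASSES = {"s3_link": "S3ImportStorageLink", "gcs_link": "GCSImportStorageLink"}


def get_storagelink(task: dict) -> Optional[tuple]:
    """Return (link class, link id) for the first link field set on the task."""
    for field, link_class in LINK_CLASSES.items():
        if link_id := task.get(field):
            return link_class, link_id
    return None


def partition_tasks(tasks: list) -> tuple:
    """Split duplicated tasks into those with and without a storage link."""
    with_links, without_links = [], []
    for task in tasks:
        if get_storagelink(task):
            with_links.append(task)
        else:
            without_links.append(task)
    return with_links, without_links


# Sample duplicated tasks, made up for illustration.
duplicates = [
    {"id": 1, "s3_link": 10},
    {"id": 2, "s3_link": None},
    {"id": 3},
    {"id": 4, "s3_link": 11},
]
with_links, without_links = partition_tasks(duplicates)
# The first linked task is the source whose link is copied to the unlinked ones.
source_class, source_link_id = get_storagelink(with_links[0])
```

Unlike the removed code, which reset `source` to `None` as soon as a second link turned up, this shape still restores links when several duplicates already have one; those tasks are simply left as they are.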
184186

label_studio/feature_flags.json

Lines changed: 55 additions & 1 deletion
@@ -3201,6 +3201,33 @@
     "version": 3,
     "deleted": false
   },
+  "fflag_feat_utc_46_session_timeout_policy": {
+    "key": "fflag_feat_utc_46_session_timeout_policy",
+    "on": false,
+    "prerequisites": [],
+    "targets": [],
+    "contextTargets": [],
+    "rules": [],
+    "fallthrough": {
+      "variation": 0
+    },
+    "offVariation": 1,
+    "variations": [
+      true,
+      false
+    ],
+    "clientSideAvailability": {
+      "usingMobileKey": false,
+      "usingEnvironmentId": false
+    },
+    "clientSide": false,
+    "salt": "be7ce7b1242a4e82a5c1239d8a4f7195",
+    "trackEvents": false,
+    "trackEventsFallthrough": false,
+    "debugEventsUntilDate": null,
+    "version": 2,
+    "deleted": false
+  },
   "fflag_feature_all_optic_1421_cold_start_v2": {
     "key": "fflag_feature_all_optic_1421_cold_start_v2",
     "on": false,
@@ -3225,7 +3252,7 @@
     "trackEvents": false,
     "trackEventsFallthrough": false,
     "debugEventsUntilDate": null,
-    "version": 2,
+    "version": 4,
     "deleted": false
   },
   "fflag_feature_all_optic_1541_performance_score_on_latest_review_short": {
@@ -4403,6 +4430,33 @@
     "version": 8,
     "deleted": false
   },
+  "fflag_root_13_annotation_results_filtering": {
+    "key": "fflag_root_13_annotation_results_filtering",
+    "on": false,
+    "prerequisites": [],
+    "targets": [],
+    "contextTargets": [],
+    "rules": [],
+    "fallthrough": {
+      "variation": 0
+    },
+    "offVariation": 1,
+    "variations": [
+      true,
+      false
+    ],
+    "clientSideAvailability": {
+      "usingMobileKey": false,
+      "usingEnvironmentId": false
+    },
+    "clientSide": false,
+    "salt": "4ab31fe3b28f4abd90ac0558b7d41584",
+    "trackEvents": false,
+    "trackEventsFallthrough": false,
+    "debugEventsUntilDate": null,
+    "version": 2,
+    "deleted": false
+  },
   "fix_backend_dev_3134_exclude_deactivated_users": {
     "key": "fix_backend_dev_3134_exclude_deactivated_users",
     "on": false,
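Both new flags are added with `"on": false`. The diff does not show how this file is consumed, so the following is an assumption: under LaunchDarkly-style evaluation, a flag that is off serves `variations[offVariation]`, and a flag that is on with no matching targets or rules serves the `fallthrough` variation. A minimal sketch of that rule, where `evaluate_flag` is a hypothetical helper and not part of Label Studio:

```python
# Trimmed copy of one of the new flag definitions (only the fields used below).
FLAG = {
    "key": "fflag_feat_utc_46_session_timeout_policy",
    "on": False,
    "targets": [],
    "rules": [],
    "fallthrough": {"variation": 0},
    "offVariation": 1,
    "variations": [True, False],
}


def evaluate_flag(flag: dict) -> bool:
    """Resolve a flag that has no targets or rules (assumed semantics)."""
    if not flag["on"]:
        # Flag is switched off: serve the off variation.
        index = flag["offVariation"]
    else:
        # Flag is on and nothing matched: serve the fallthrough variation.
        index = flag["fallthrough"]["variation"]
    return flag["variations"][index]


default_value = evaluate_flag(FLAG)
enabled_value = evaluate_flag({**FLAG, "on": True})
```

Under that assumption both new flags resolve to `false` until they are switched on, so the features they guard stay disabled by default.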

label_studio/tests/data_manager/test_api_actions.py

Lines changed: 10 additions & 2 deletions
@@ -141,10 +141,18 @@ def test_action_remove_duplicates(business_client, project_id, storage_model, li
     link_model.objects.create(task=task4, key='duplicated.jpg', storage=storage)

     # task 5: add a non-duplicated task using the same key, ensuring multiple tasks in the same key don't interfere
-    task_data = {'data': {'image': 'normal2.jpg'}}
-    task5 = make_task(task_data, project)
+    different_task_data = {'data': {'image': 'normal2.jpg'}}
+    task5 = make_task(different_task_data, project)
     link_model.objects.create(task=task5, key='duplicated.jpg', row_index=1, storage=storage)

+    # task 6: add duplicated task with a different storage link
+    task6 = make_task(task_data, project)
+    link_model.objects.create(task=task6, key='duplicated2.jpg', storage=storage)
+
+    # task 7: add duplicated task with a different storage link
+    task7 = make_task(task_data, project)
+    link_model.objects.create(task=task7, key='duplicated3.jpg', storage=storage)
+
     # call the "remove duplicated tasks" action
     status = business_client.post(
         f'/api/dm/actions?project={project_id}&id=remove_duplicates',

label_studio/tests/webhooks/webhooks.tavern.yml

Lines changed: 2 additions & 1 deletion
@@ -19,6 +19,7 @@ stages:
       method: POST
       json:
         url: "http://127.0.0.1:6666/webhook"
+        project: "{configured_project.id}"
         headers:
           Autorization: "Token 66666666666666666666666"
           Security: "123123123123123"
@@ -35,7 +36,7 @@ stages:
       status_code: 200
       json:
         id: !int "{webhook_id}"
-        project: null
+        project: !int "{configured_project.id}"
         actions: !anylist
         created_at: !anystr
         updated_at: !anystr
