Commit d034128

authored
Merge branch 'develop' into fb-bros-23
2 parents b78761c + aaf97d2 commit d034128

File tree

37 files changed: +1014 -51 lines changed

.github/workflows/apply-linters.yml

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ jobs:
           ref: ${{ inputs.branch_name }}

       - name: Set up Python
-        uses: actions/setup-python@v4
+        uses: actions/setup-python@v5
         with:
           python-version: '3.12'

.github/workflows/docker-build-ontop.yml

Lines changed: 1 addition & 1 deletion
@@ -110,7 +110,7 @@ jobs:
             ${{ steps.calculate-docker-tags.outputs.docker-tags }}

       - name: Push Docker image
-        uses: docker/build-push-action@v6.16.0
+        uses: docker/build-push-action@v6.17.0
         id: docker_build_and_push
         with:
           context: .

.github/workflows/docker-build.yml

Lines changed: 1 addition & 1 deletion
@@ -130,7 +130,7 @@ jobs:
             type=raw,value=${{ steps.version.outputs.build_version }}

       - name: Push Docker image
-        uses: docker/build-push-action@v6.16.0
+        uses: docker/build-push-action@v6.17.0
         id: docker_build_and_push
         with:
           context: .

.github/workflows/docker-release-promote.yml

Lines changed: 1 addition & 1 deletion
@@ -195,7 +195,7 @@ jobs:
             ${{ steps.generate-tags.outputs.ubuntu-tags }}

       - name: Build and Push Release Ubuntu Docker image
-        uses: docker/build-push-action@v6.16.0
+        uses: docker/build-push-action@v6.17.0
         id: docker_build
         with:
           context: ${{ steps.release_dockerfile.outputs.release_dir }}

.github/workflows/tests.yml

Lines changed: 1 addition & 1 deletion
@@ -140,7 +140,7 @@ jobs:

       - name: Upload coverage to Codecov
         if: ${{ github.event.pull_request.head.repo.fork == false && github.event.pull_request.user.login != 'dependabot[bot]' }}
-        uses: codecov/codecov-action@v5.4.2
+        uses: codecov/codecov-action@v5.4.3
         with:
           name: codecov-python-${{ matrix.python-version }}
           flags: pytests

docs/source/guide/install_enterprise_docker.md

Lines changed: 3 additions & 0 deletions
@@ -19,6 +19,9 @@ See [Secure Label Studio](security.html) for more details about security and har

 To install Label Studio Community Edition, see [Install Label Studio](https://labelstud.io/guide/install). This page is specific to the Enterprise version of Label Studio.

+!!! note
+    On-prem deployments of Label Studio Enterprise are not supported for Academic licenses.
+
 {% insertmd includes/deploy.md %}

 ## Install Label Studio Enterprise using Docker

docs/source/guide/install_enterprise_k8s.md

Lines changed: 3 additions & 0 deletions
@@ -23,6 +23,9 @@ Your Kubernetes cluster can be self-hosted or installed somewhere such as Amazon

 </div>

+!!! note
+    On-prem deployments of Label Studio Enterprise are not supported for Academic licenses.
+
 This high-level architecture diagram that outlines the main components of a Label Studio Enterprise deployment.

 <img src="/images/LSE_k8s_scheme.png"/>

docs/source/tags/pdf.md

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+---
+title: PDF
+type: tags
+order: 302
+meta_title: PDF Tag for loading PDF documents
+meta_description: Label Studio PDF Tag for loading PDF documents for machine learning and data science projects.
+---
+
+The `Pdf` tag displays a PDF document for labeling. Use for performing document-level annotations, transcription, and summarization.
+
+Use with the following data types: PDF.
+
+### Parameters
+
+| Param | Type | Default | Description |
+| --- | --- | --- | --- |
+| name | <code>string</code> | | Name of the element |
+| value | <code>string</code> | | Value of the element - field name to retrieve the PDF URL from |
+
+### Supported Control tags
+Document-level annotations are supported with Pdf tag, for example:
+
+- Document classification with [Choices](/tags/choices.html)
+- Document rating with [Rating](/tags/rating.html)
+- Transcription and summarization with [TextArea](/tags/textarea.html)
+
+### Example
+
+Labeling configuration to label PDF documents:
+
+```html
+<View>
+  <Pdf name="pdf" value="$pdf" />
+  <Choices name="choices" toName="pdf">
+    <Choice value="Legal" />
+    <Choice value="Financial" />
+    <Choice value="Technical" />
+  </Choices>
+</View>
+```
+
+**Example Input data:**
+
+```json
+{
+  "pdf": "https://app.humansignal.com/static/samples/sample.pdf"
+}
+```
+
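
Note on the new `Pdf` tag docs: the `value="$pdf"` attribute means each task must carry a `pdf` field holding the document URL. The following is a minimal sketch of preparing such tasks for import. The helper name `make_pdf_tasks` is hypothetical, and the `data` wrapper is assumed from the sample in `config.yml` later in this commit rather than from the tag documentation itself.

```python
import json


def make_pdf_tasks(urls):
    """Wrap each PDF URL in a task whose data key matches value="$pdf"."""
    return [{"data": {"pdf": url}} for url in urls]


# Sample URL taken from the example input data in pdf.md.
tasks = make_pdf_tasks(["https://app.humansignal.com/static/samples/sample.pdf"])

# Serialized form that could be saved as a JSON import file.
tasks_json = json.dumps(tasks, indent=2)
```

The key inside `data` has to match the name after `$` in the labeling configuration; renaming one without the other would leave the tag without a source URL.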

docs/source/templates/pdf_classification.md

Lines changed: 5 additions & 5 deletions
@@ -26,11 +26,11 @@ If you want to perform PDF classification, use this template. This template prom
     <Choice value="Important article"/>
     <Choice value="Yellow press"/>
   </Choices>
-  <HyperText name="pdf" value="$pdf" inline="true"/>
+  <Pdf name="pdf" value="$pdf"/>
 </View>

 <!-- {
-  "pdf": "<embed src='https://app.heartex.ai/static/samples/sample.pdf' width='100%' height='600px'/>"
+  "pdf": "/static/samples/sample.pdf"
 } -->
 ```

@@ -56,9 +56,9 @@ Use the [Choices](/tags/choices.html) control tag to present classification opti
 </Choices>
 ```

-Use the [HyperText](/tags/hypertext.html) tag to render an inline version of the PDF data:
+Use the [Pdf](/tags/pdf.html) tag to render an inline version of the PDF data:
 ```xml
-<HyperText name="pdf" value="$pdf" inline="true"/>
+<Pdf name="pdf" value="$pdf"/>
 ```

 ### Input data
@@ -74,4 +74,4 @@ Label Studio does not support labeling PDF-formatted files directly. You should
 ## Related tags
 - [Rating](/tags/rating.html)
 - [Choices](/tags/choices.html)
-- [HyperText](/tags/hypertext.html)
+- [Pdf](/tags/pdf.html)

label_studio/annotation_templates/structured-data-parsing/pdf-classification/config.xml

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
     <Choice value="Important article"/>
     <Choice value="Yellow press"/>
   </Choices>
-  <HyperText name="pdf" value="$pdf" inline="true"/>
+  <Pdf name="pdf" value="$pdf"/>
 </View>


label_studio/annotation_templates/structured-data-parsing/pdf-classification/config.yml

Lines changed: 2 additions & 2 deletions
@@ -12,13 +12,13 @@ config: |
       <Choice value="Important article"/>
       <Choice value="Yellow press"/>
     </Choices>
-    <HyperText name="pdf" value="$pdf" inline="true"/>
+    <Pdf name="pdf" value="$pdf"/>
   </View>


   <!-- {
     "data": {
-      "pdf": "<embed src='/static/samples/sample.pdf' width='100%' height='600px'/>"
+      "pdf": "/static/samples/sample.pdf"
     }
   } -->

label_studio/core/settings/base.py

Lines changed: 1 addition & 0 deletions
@@ -494,6 +494,7 @@
         '.mp4',
         '.webm',
         '.webp',
+        '.pdf',
     ]
 )

label_studio/data_manager/actions/remove_duplicates.py

Lines changed: 2 additions & 0 deletions
@@ -172,6 +172,8 @@ def restore_storage_links_for_duplicated_tasks(duplicates) -> None:
             link = storage_link_class(
                 task_id=task['id'],
                 key=link_instance.key,
+                row_index=link_instance.row_index,
+                row_group=link_instance.row_group,
                 storage=link_instance.storage,
             )
             link.save()
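
Note on the `remove_duplicates.py` change: a restored link now carries `row_index` and `row_group` over from the original, so a task that came from one row of a multi-row file keeps pointing at that row. A rough sketch of the idea, using a hypothetical `LinkSketch` dataclass and `relink` helper in place of the Django storage link model:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkSketch:
    """Hypothetical stand-in for a storage link row; not the Django model."""
    task_id: int
    key: str
    storage: str
    row_index: Optional[int] = None
    row_group: Optional[int] = None


def relink(link: LinkSketch, task_id: int) -> LinkSketch:
    """Recreate a link for another task, preserving the row position fields."""
    return LinkSketch(
        task_id=task_id,
        key=link.key,
        storage=link.storage,
        row_index=link.row_index,
        row_group=link.row_group,
    )


original = LinkSketch(task_id=1, key="tasks.json", storage="s3", row_index=2)
restored = relink(original, task_id=7)
```

Without the two extra fields, the restored link would fall back to `None` for both and lose track of which row in `tasks.json` the task was created from.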

label_studio/feature_flags.json

Lines changed: 27 additions & 0 deletions
@@ -3120,6 +3120,33 @@
     "version": 2,
     "deleted": false
   },
+  "fflag_feat_root_11_support_jsonl_cloud_storage": {
+    "key": "fflag_feat_root_11_support_jsonl_cloud_storage",
+    "on": false,
+    "prerequisites": [],
+    "targets": [],
+    "contextTargets": [],
+    "rules": [],
+    "fallthrough": {
+      "variation": 0
+    },
+    "offVariation": 1,
+    "variations": [
+      true,
+      false
+    ],
+    "clientSideAvailability": {
+      "usingMobileKey": false,
+      "usingEnvironmentId": false
+    },
+    "clientSide": false,
+    "salt": "85e018dcd2e64c689a61ee7ed3c5edb2",
+    "trackEvents": false,
+    "trackEventsFallthrough": false,
+    "debugEventsUntilDate": null,
+    "version": 2,
+    "deleted": false
+  },
   "fflag_feature_all_optic_1421_cold_start_v2": {
     "key": "fflag_feature_all_optic_1421_cold_start_v2",
     "on": false,
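
Note on the new flag entry: with `"on": false`, the flag should resolve to the variation at `offVariation`, which here is index 1, i.e. `false`. The sketch below illustrates that lookup under the assumption that the file follows the usual LaunchDarkly-style semantics. It ignores `targets`, `rules`, and `prerequisites` (all empty for this flag), and `evaluate_flag` is a hypothetical helper, not the feature-flag client the project actually uses.

```python
def evaluate_flag(flag: dict) -> bool:
    """Resolve a flag definition to one of its variations (simplified)."""
    variations = flag["variations"]
    if not flag["on"]:
        # Flag is switched off: serve the designated "off" variation.
        return variations[flag["offVariation"]]
    # Flag is on and no rule matched: serve the fallthrough variation.
    return variations[flag["fallthrough"]["variation"]]


# Only the fields the sketch reads, copied from the entry added in this commit.
jsonl_flag = {
    "on": False,
    "fallthrough": {"variation": 0},
    "offVariation": 1,
    "variations": [True, False],
}

default_value = evaluate_flag(jsonl_flag)
```

Under these assumptions the JSONL cloud storage feature stays disabled by default, and turning `on` to `true` would serve variation 0 (`true`).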

label_studio/io_storages/azure_blob/models.py

Lines changed: 3 additions & 3 deletions
@@ -209,17 +209,17 @@ def iterkeys(self):
                 continue
             yield file.name

-    def get_data(self, key) -> list[dict]:
+    def get_data(self, key) -> Union[dict, list[dict]]:
         if self.use_blob_urls:
             data_key = settings.DATA_UNDEFINED_NAME
-            return [{data_key: f'{self.url_scheme}://{self.container}/{key}'}]
+            return {data_key: f'{self.url_scheme}://{self.container}/{key}'}

         container = self.get_container()
         blob = container.download_blob(key)
         blob_str = blob.content_as_text()
         value = json.loads(blob_str)
         if isinstance(value, dict):
-            return [value]
+            return value
         elif isinstance(value, list):
             for idx, item in enumerate(value):
                 if not isinstance(item, dict):

label_studio/io_storages/base_models.py

Lines changed: 24 additions & 15 deletions
@@ -341,7 +341,7 @@ def _scan_and_create_links_v2(self):
         raise NotImplementedError

     @classmethod
-    def add_task(cls, data, project, maximum_annotations, max_inner_id, storage, key, link_class):
+    def add_task(cls, data, project, maximum_annotations, max_inner_id, storage, key, row_index, link_class):
         # predictions
         predictions = data.get('predictions', [])
         if predictions:
@@ -375,8 +375,8 @@ def add_task(cls, data, project, maximum_annotations, max_inner_id, storage, key
             inner_id=max_inner_id,
         )

-        link_class.create(task, key, storage)
-        logger.debug(f'Create {storage.__class__.__name__} link with key={key} for task={task}')
+        link_class.create(task, key, storage, row_index=row_index)
+        logger.debug(f'Create {storage.__class__.__name__} link with {key=} and {row_index=} for {task=}')

         raise_exception = not flag_set(
             'ff_fix_back_dev_3342_storage_scan_with_invalid_annotations', user=AnonymousUser()
@@ -423,10 +423,10 @@ def _scan_and_create_links(self, link_class):
             logger.debug('Scanning key %s', key)
             self.info_update_progress(last_sync_count=tasks_created, tasks_existed=tasks_existed)

-            # skip if task already exists
-            if link_class.exists(key, self):
+            # skip if key has already been synced
+            if n_tasks_linked := link_class.n_tasks_linked(key, self):
                 logger.debug('%s link %s already exists', self.__class__.__name__, key)
-                tasks_existed += 1  # update progress counter
+                tasks_existed += n_tasks_linked  # update progress counter
                 continue

             logger.debug('%s: found new key %s', self.__class__.__name__, key)
@@ -441,13 +441,20 @@ def _scan_and_create_links(self, link_class):
                 )
                 continue

-            if not flag_set('fflag_feat_dia_2092_multitasks_per_storage_link'):
-                tasks_data = tasks_data[:1]
+            if isinstance(tasks_data, dict):
+                tasks_data = [tasks_data]
+                row_indices = [None]
+            else:
+                if not flag_set('fflag_feat_dia_2092_multitasks_per_storage_link'):
+                    tasks_data = tasks_data[:1]
+                row_indices = range(len(tasks_data))

-            for task_data in tasks_data:
+            for row_index, task_data in zip(row_indices, tasks_data):
                 # TODO: batch this loop body with add_task -> add_tasks in a single bulk write.
-                # Also have to handle any mismatch between len(tasks_data) and settings.WEBHOOK_BATCH_SIZE
-                task = self.add_task(task_data, self.project, maximum_annotations, max_inner_id, self, key, link_class)
+                # See DIA-2062 for prerequisites
+                task = self.add_task(
+                    task_data, self.project, maximum_annotations, max_inner_id, self, key, row_index, link_class
+                )
                 max_inner_id += 1

             # update progress counters for storage info
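
Note on the hunk above: a single JSON object from `get_data` becomes one task with `row_index=None`, while a list becomes one task per element with `row_index` equal to its position (truncated to the first element when the multitask flag is off). The same pairing can be sketched as a standalone function. `rows_with_indices` and its `multitask_enabled` parameter are hypothetical stand-ins for the inline logic and the `flag_set(...)` check.

```python
def rows_with_indices(tasks_data, multitask_enabled=True):
    """Pair each task payload with the row index stored on its storage link."""
    if isinstance(tasks_data, dict):
        # One object per key: there is no row position to record.
        return [(None, tasks_data)]
    if not multitask_enabled:
        # Flag off: only the first element of the list becomes a task.
        tasks_data = tasks_data[:1]
    return list(enumerate(tasks_data))


single = rows_with_indices({"text": "a"})
multi = rows_with_indices([{"text": "a"}, {"text": "b"}])
first_only = rows_with_indices([{"text": "a"}, {"text": "b"}], multitask_enabled=False)
```

Keeping `None` for the single-object case distinguishes "this key held exactly one object" from "this is row 0 of a list", which a plain `enumerate` over a wrapped list would not do.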
@@ -702,12 +709,14 @@ class ImportStorageLink(models.Model):
     row_index = models.IntegerField(null=True, blank=True, help_text='Parquet row index, or JSON[L] object index')

     @classmethod
-    def exists(cls, key, storage):
-        return cls.objects.filter(key=key, storage=storage.id).exists()
+    def n_tasks_linked(cls, key, storage):
+        return cls.objects.filter(key=key, storage=storage.id).count()

     @classmethod
-    def create(cls, task, key, storage):
-        link, created = cls.objects.get_or_create(task_id=task.id, key=key, storage=storage, object_exists=True)
+    def create(cls, task, key, storage, row_index=None, row_group=None):
+        link, created = cls.objects.get_or_create(
+            task_id=task.id, key=key, row_index=row_index, row_group=row_group, storage=storage, object_exists=True
+        )
         return link

     class Meta:
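
Note on replacing `exists` with `n_tasks_linked`: because one storage key can now back several tasks, the scan adds the number of linked tasks to `tasks_existed` instead of a flat 1. A small in-memory sketch of that accounting follows. The `count_existing` helper and the `(key, row_index)` tuples are hypothetical substitutes for the ORM query on `ImportStorageLink`.

```python
from collections import Counter


def count_existing(keys, links):
    """Return (tasks_existed, new_keys) for a scan over storage keys.

    links is a list of (key, row_index) pairs that were already synced.
    """
    linked = Counter(key for key, _ in links)
    tasks_existed = 0
    new_keys = []
    for key in keys:
        # A non-zero count means the key was synced before; skip it,
        # but credit every task it produced to the progress counter.
        if n_tasks_linked := linked[key]:
            tasks_existed += n_tasks_linked
            continue
        new_keys.append(key)
    return tasks_existed, new_keys


links = [("a.json", 0), ("a.json", 1), ("b.json", None)]
result = count_existing(["a.json", "b.json", "c.json"], links)
```

Here `a.json` contributes two existing tasks and `b.json` one, so the counter reaches 3 while only `c.json` is treated as new.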

label_studio/io_storages/gcs/models.py

Lines changed: 3 additions & 3 deletions
@@ -180,17 +180,17 @@ def iterkeys(self):
             return_key=True,
         )

-    def get_data(self, key) -> list[dict]:
+    def get_data(self, key) -> Union[dict, list[dict]]:
         if self.use_blob_urls:
-            return [{settings.DATA_UNDEFINED_NAME: GCS.get_uri(self.bucket, key)}]
+            return {settings.DATA_UNDEFINED_NAME: GCS.get_uri(self.bucket, key)}
         data = GCS.read_file(
             client=self.get_client(),
             bucket_name=self.bucket,
             key=key,
             convert_to=GCS.ConvertBlobTo.JSON,
         )
         if isinstance(data, dict):
-            return [data]
+            return data
         elif isinstance(data, list):
             for idx, item in enumerate(data):
                 if not isinstance(item, dict):

label_studio/io_storages/localfiles/models.py

Lines changed: 5 additions & 5 deletions
@@ -78,16 +78,16 @@ def iterkeys(self):
                 continue
             yield str(file)

-    def get_data(self, key) -> list[dict]:
+    def get_data(self, key) -> dict | list[dict]:
         path = Path(key)
         if self.use_blob_urls:
             # include self-hosted links pointed to local resources via
             # {settings.HOSTNAME}/data/local-files?d=<path/to/local/dir>
             document_root = Path(settings.LOCAL_FILES_DOCUMENT_ROOT)
             relative_path = str(path.relative_to(document_root))
-            return [
-                {settings.DATA_UNDEFINED_NAME: f'{settings.HOSTNAME}/data/local-files/?d={quote(str(relative_path))}'}
-            ]
+            return {
+                settings.DATA_UNDEFINED_NAME: f'{settings.HOSTNAME}/data/local-files/?d={quote(str(relative_path))}'
+            }

         try:
             with open(path, encoding='utf8') as f:
@@ -99,7 +99,7 @@ def get_data(self, key) -> list[dict]:
             )

         if isinstance(value, dict):
-            return [value]
+            return value
         elif isinstance(value, list):
             for idx, item in enumerate(value):
                 if not isinstance(item, dict):

label_studio/io_storages/redis/models.py

Lines changed: 3 additions & 2 deletions
@@ -3,6 +3,7 @@

 import json
 import logging
+from typing import Union

 import redis
 from django.db import models
@@ -89,7 +90,7 @@ def iterkeys(self):
         for key in client.keys(path + '*'):
             yield key

-    def get_data(self, key) -> list[dict]:
+    def get_data(self, key) -> Union[dict, list[dict]]:
         client = self.get_client()
         value_str = client.get(key)
         if not value_str:
@@ -98,7 +99,7 @@ def get_data(self, key) -> list[dict]:
         value = json.loads(value_str)
         # NOTE: this validation did not previously exist, we were accepting any JSON values
         if isinstance(value, dict):
-            return [value]
+            return value
         elif isinstance(value, list):
             for idx, item in enumerate(value):
                 if not isinstance(item, dict):
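
Note on the `get_data` changes across the Azure, GCS, local files, and Redis storages: each now returns a bare `dict` for a single JSON object and a `list[dict]` for a JSON array, instead of always wrapping the result in a list. The shared contract can be sketched as below. The hunks cut off before the error branches, so the `ValueError` type and messages are assumptions, and `parse_task_blob` is a hypothetical helper rather than a function in the codebase.

```python
import json
from typing import Union


def parse_task_blob(blob_str: str, key: str) -> Union[dict, list[dict]]:
    """Parse a storage object into one task dict or a list of task dicts."""
    value = json.loads(blob_str)
    if isinstance(value, dict):
        # Single object: returned as-is, no longer wrapped in a list.
        return value
    if isinstance(value, list):
        for idx, item in enumerate(value):
            if not isinstance(item, dict):
                # Assumed error handling; the diff does not show this branch.
                raise ValueError(f"Error on key {key} item {idx}: expected a JSON object")
        return value
    raise ValueError(f"Error on key {key}: expected a JSON object or a list of objects")


one = parse_task_blob('{"text": "a"}', "one.json")
many = parse_task_blob('[{"text": "a"}, {"text": "b"}]', "many.json")
```

Callers then branch on the return type, which is what the updated `_scan_and_create_links` does when it assigns row indices.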
