This section contains the API reference for the `distilabel` image generation models, both for the [`ImageGenerationModel`][distilabel.models.image_generation.ImageGenerationModel] synchronous implementation, and for the [`AsyncImageGenerationModel`][distilabel.models.image_generation.AsyncImageGenerationModel] asynchronous one.
For more information and examples on how to use existing LLMs or create custom ones, please refer to [Tutorial - ImageGenerationModel](../../../sections/how_to_guides/basic/task/image_task.md).
This section contains the API reference for the `distilabel` image generation tasks.
For more information on how the [`ImageTask`][distilabel.steps.tasks.ImageTask] works, and to see some examples, check the [Tutorial - Task - ImageTask](../../sections/how_to_guides/basic/task/generator_task.md) page.
docs/sections/how_to_guides/advanced/distiset.md
@@ -119,6 +119,33 @@ class MagpieGenerator(GeneratorTask, MagpieBase):
The `Citations` section can include any number of bibtex references. To define them, you can add as many elements as needed, just like in the example: each citation will be a block of the form ` ```@misc{...}``` `. This information will be automatically used in the README of your `Distiset` if you decide to call `distiset.push_to_hub`. Alternatively, if the `Citations` section is not found but the `References` section contains URLs pointing to `https://arxiv.org/`, we will try to obtain the `Bibtex` equivalent automatically. This way, Hugging Face can automatically track the paper for you, and it's easier to find other datasets citing the same paper or to visit the paper page directly.
+#### Image Datasets
+
+!!! info "Keep reading if you are interested in Image datasets"
+
+The `Distiset` object has a new method, `transform_columns_to_image`, specifically to transform the images to `PIL.Image.Image` before pushing the dataset to the Hugging Face Hub.
+
+Since version `1.5.0` we have the [`ImageGeneration`](https://distilabel.argilla.io/dev/components-gallery/task/imagegeneration/) task, which generates images from text. By default, the whole process works internally with a string representation of the images, which keeps processing simple. However, if the generated dataset is going to be stored on the Hugging Face Hub, a proper image object may be preferable, for example so that the images can be seen in the dataset viewer. Let's take a look at the following pipeline, extracted from `examples/image_generation.py` at the root of the repository, to see how we can do it:
+
+```diff
+# Assume all the imports are already done, we are only interested
+with Pipeline(name="image_generation_pipeline") as pipeline:
+After running the pipeline, we call [`transform_columns_to_image`][distilabel.distiset.Distiset.transform_columns_to_image] on the image columns we may have generated (in this case we only want to transform the `image` column, but a list of columns can be passed). The transformation applies to every leaf node in the pipeline, so if there are several subsets, the `image` column is looked up in each of them.
### Save and load from disk
Take into account that these methods work as `datasets.load_from_disk` and `datasets.Dataset.save_to_disk` so the arguments are directly passed to those methods. This means you can also make use of `storage_options` argument to save your [`Distiset`][distilabel.distiset.Distiset] in your cloud provider, including the distilabel artifacts (`pipeline.yaml`, `pipeline.log` and the `README.md` with the dataset card). You can read more in `datasets` documentation [here](https://huggingface.co/docs/datasets/filesystems#saving-serialized-datasets).
docs/sections/how_to_guides/basic/step/generator_step.md
@@ -9,7 +9,7 @@ from typing_extensions import override
from distilabel.steps import GeneratorStep

if TYPE_CHECKING:
-    from distilabel.steps.typing import StepColumns, GeneratorStepOutput
+    from distilabel.typing import StepColumns, GeneratorStepOutput

class MyGeneratorStep(GeneratorStep):
    instructions: List[str]
@@ -67,7 +67,7 @@ We can define a custom generator step by creating a new subclass of the [`Genera
The default signature for the `process` method is `process(self, offset: int = 0) -> GeneratorStepOutput`. The argument `offset` should be respected, no more arguments can be provided, and the type-hints and return type-hints should be respected too because it should be able to receive any number of inputs by default i.e. more than one [`Step`][distilabel.steps.Step] at a time could be connected to the current one.
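The `process(self, offset: int = 0)` signature above can be sketched in plain Python without importing `distilabel`. The sketch assumes that a generator step yields `(batch, is_last_batch)` tuples and that `offset` is the number of rows already produced, so generation can resume from that point. `generate_batches`, `GeneratorOutput` and `batch_size` are hypothetical names used only for illustration.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Assumed shape of a generator step's output: (batch of rows, is_last_batch).
GeneratorOutput = Iterator[Tuple[List[Dict[str, Any]], bool]]


def generate_batches(
    instructions: List[str], batch_size: int, offset: int = 0
) -> GeneratorOutput:
    """Yield batches of rows, skipping the first `offset` instructions."""
    remaining = instructions[offset:]
    while remaining:
        batch = [{"instruction": text} for text in remaining[:batch_size]]
        remaining = remaining[batch_size:]
        # The flag is True only for the final batch.
        yield batch, not remaining


# Resume after one row has already been produced.
batches = list(generate_batches(["a", "b", "c"], batch_size=2, offset=1))
```

Honouring `offset` is what would let a resumed run continue from where it stopped instead of regenerating rows it already produced.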
!!! WARNING
-For the custom [`Step`][distilabel.steps.Step] subclasses to work properly with `distilabel` and with the validation and serialization performed by default over each [`Step`][distilabel.steps.Step] in the [`Pipeline`][distilabel.pipeline.Pipeline], the type-hint for both [`StepInput`][distilabel.steps.StepInput] and [`StepOutput`][distilabel.steps.typing.StepOutput] should be used and not surrounded with double-quotes or imported under `typing.TYPE_CHECKING`, otherwise, the validation and/or serialization will fail.
+For the custom [`Step`][distilabel.steps.Step] subclasses to work properly with `distilabel` and with the validation and serialization performed by default over each [`Step`][distilabel.steps.Step] in the [`Pipeline`][distilabel.pipeline.Pipeline], the type-hint for both [`StepInput`][distilabel.steps.StepInput] and [`StepOutput`][distilabel.typing.StepOutput] should be used and not surrounded with double-quotes or imported under `typing.TYPE_CHECKING`, otherwise, the validation and/or serialization will fail.
=== "Inherit from `GeneratorStep`"
@@ -81,7 +81,7 @@ We can define a custom generator step by creating a new subclass of the [`Genera
from distilabel.steps import GeneratorStep

if TYPE_CHECKING:
-    from distilabel.steps.typing import StepColumns, GeneratorStepOutput
+    from distilabel.typing import StepColumns, GeneratorStepOutput

class MyGeneratorStep(GeneratorStep):
    instructions: List[str]
@@ -104,7 +104,7 @@ We can define a custom generator step by creating a new subclass of the [`Genera
from distilabel.steps import step

if TYPE_CHECKING:
-    from distilabel.steps.typing import GeneratorStepOutput
+    from distilabel.typing import GeneratorStepOutput

@step(outputs=[...], step_type="generator")
def CustomGeneratorStep(offset: int = 0) -> "GeneratorStepOutput":
docs/sections/how_to_guides/basic/step/global_step.md
@@ -16,7 +16,7 @@ We can define a custom step by creating a new subclass of the [`GlobalStep`][dis
The default signature for the `process` method is `process(self, *inputs: StepInput) -> StepOutput`. The argument `inputs` should be respected, no more arguments can be provided, and the type-hints and return type-hints should be respected too because it should be able to receive any number of inputs by default i.e. more than one [`Step`][distilabel.steps.Step] at a time could be connected to the current one.
!!! WARNING
-For the custom [`GlobalStep`][distilabel.steps.GlobalStep] subclasses to work properly with `distilabel` and with the validation and serialization performed by default over each [`Step`][distilabel.steps.Step] in the [`Pipeline`][distilabel.pipeline.Pipeline], the type-hint for both [`StepInput`][distilabel.steps.StepInput] and [`StepOutput`][distilabel.steps.typing.StepOutput] should be used and not surrounded with double-quotes or imported under `typing.TYPE_CHECKING`, otherwise, the validation and/or serialization will fail.
+For the custom [`GlobalStep`][distilabel.steps.GlobalStep] subclasses to work properly with `distilabel` and with the validation and serialization performed by default over each [`Step`][distilabel.steps.Step] in the [`Pipeline`][distilabel.pipeline.Pipeline], the type-hint for both [`StepInput`][distilabel.steps.StepInput] and [`StepOutput`][distilabel.typing.StepOutput] should be used and not surrounded with double-quotes or imported under `typing.TYPE_CHECKING`, otherwise, the validation and/or serialization will fail.
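One way to see why this warning matters: a name imported only under `typing.TYPE_CHECKING` does not exist at runtime, so code that resolves annotations at runtime cannot find it. The sketch below uses only the standard library, with `decimal.Decimal` standing in for a type such as `StepInput`. It illustrates general Python behaviour under that assumption and is not `distilabel`'s actual validation code.

```python
from typing import TYPE_CHECKING, get_type_hints

if TYPE_CHECKING:
    # Only seen by static type checkers; never executed at runtime.
    from decimal import Decimal


def process(value: "Decimal") -> None:
    """The quoted annotation is stored as a plain string."""


# Quoting the hint keeps it as text instead of a real type object.
raw_annotation = process.__annotations__["value"]

try:
    get_type_hints(process)
    resolved = True
except NameError:
    # `Decimal` was never imported at runtime, so the lookup fails.
    resolved = False
```

Importing the type at module level and using it unquoted avoids both problems, which is what the warning asks for.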
=== "Inherit from `GlobalStep`"
@@ -27,7 +27,7 @@ We can define a custom step by creating a new subclass of the [`GlobalStep`][dis
from distilabel.steps import GlobalStep, StepInput

if TYPE_CHECKING:
-    from distilabel.steps.typing import StepColumns, StepOutput
+    from distilabel.typing import StepColumns, StepOutput

class CustomStep(Step):
    @property
@@ -61,7 +61,7 @@ We can define a custom step by creating a new subclass of the [`GlobalStep`][dis
docs/sections/how_to_guides/basic/step/index.md
@@ -11,7 +11,7 @@ from typing import TYPE_CHECKING
from distilabel.steps import Step, StepInput

if TYPE_CHECKING:
-    from distilabel.steps.typing import StepColumns, StepOutput
+    from distilabel.typing import StepColumns, StepOutput

class MyStep(Step):
    @property
@@ -87,7 +87,7 @@ We can define a custom step by creating a new subclass of the [`Step`][distilabe
The default signature for the `process` method is `process(self, *inputs: StepInput) -> StepOutput`. The argument `inputs` should be respected, no more arguments can be provided, and the type-hints and return type-hints should be respected too because it should be able to receive any number of inputs by default i.e. more than one [`Step`][distilabel.steps.Step] at a time could be connected to the current one.
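The point about receiving more than one input can be illustrated with a plain-Python sketch. It assumes each positional input is a list of row dictionaries (one list per connected upstream step) and that the output is yielded as a list of rows. `add_length` and `Rows` are hypothetical names and not part of `distilabel`.

```python
from typing import Any, Dict, Iterator, List

Rows = List[Dict[str, Any]]  # assumed shape of one input batch


def add_length(*inputs: Rows) -> Iterator[Rows]:
    """Process one batch per upstream step and yield one output batch each."""
    for rows in inputs:
        yield [{**row, "length": len(row["text"])} for row in rows]


# Two upstream steps connected to the same step produce two positional inputs.
outputs = list(add_length([{"text": "hi"}], [{"text": "hello"}, {"text": ""}]))
```

Because the function takes `*inputs`, the same body works unchanged whether one or several upstream steps are connected.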
!!! WARNING
-For the custom [`Step`][distilabel.steps.Step] subclasses to work properly with `distilabel` and with the validation and serialization performed by default over each [`Step`][distilabel.steps.Step] in the [`Pipeline`][distilabel.pipeline.Pipeline], the type-hint for both [`StepInput`][distilabel.steps.StepInput] and [`StepOutput`][distilabel.steps.typing.StepOutput] should be used and not surrounded with double-quotes or imported under `typing.TYPE_CHECKING`, otherwise, the validation and/or serialization will fail.
+For the custom [`Step`][distilabel.steps.Step] subclasses to work properly with `distilabel` and with the validation and serialization performed by default over each [`Step`][distilabel.steps.Step] in the [`Pipeline`][distilabel.pipeline.Pipeline], the type-hint for both [`StepInput`][distilabel.steps.StepInput] and [`StepOutput`][distilabel.typing.StepOutput] should be used and not surrounded with double-quotes or imported under `typing.TYPE_CHECKING`, otherwise, the validation and/or serialization will fail.
=== "Inherit from `Step`"
@@ -98,7 +98,7 @@ We can define a custom step by creating a new subclass of the [`Step`][distilabe
from distilabel.steps import Step, StepInput

if TYPE_CHECKING:
-    from distilabel.steps.typing import StepColumns, StepOutput
+    from distilabel.typing import StepColumns, StepOutput

class CustomStep(Step):
    @property
@@ -132,7 +132,7 @@ We can define a custom step by creating a new subclass of the [`Step`][distilabe