* Add link to accept conditions of segmentation 3.0
* Add table with available models. Add some latencies
* Add more info on selecting different models
* Add missing info on available models
* Improve top menu
* Improve python badge
* Move things around. Simplify code and wording
* Add dark themed logo
* Remove whitespace at the top
* Update README.md
* Rename from_pyannote to from_pretrained in segmentation and embedding blocks
* Separate huggingface links from model name
* Fix reproducibility link
* Add animated diarization pipeline diagram
* Improve pipeline gif
* Update README.md
* Update snippet gif. Fix torch multiprocessing crash with pyannote 3.1. Other README improvements
* Update README.md
* Fix bad link
**1) Make sure your system has the following dependencies:**
```
ffmpeg < 4.4
portaudio == 19.6.X
libsndfile >= 1.2.2
```
Alternatively, we provide an `environment.yml` file for a pre-configured conda environment:
```shell
conda env create -f diart/environment.yml
conda activate diart
```
**2) Install the package:**
```shell
pip install diart
```
### Get access to 🎹 pyannote models
By default, diart is based on [pyannote.audio](https://github.com/pyannote/pyannote-audio) models from the [huggingface](https://huggingface.co/) hub.
To use them, follow these steps:
1) [Accept user conditions](https://huggingface.co/pyannote/segmentation) for the `pyannote/segmentation` model
2) [Accept user conditions](https://huggingface.co/pyannote/segmentation-3.0) for the newest `pyannote/segmentation-3.0` model
3) [Accept user conditions](https://huggingface.co/pyannote/embedding) for the `pyannote/embedding` model
4) Install [huggingface-cli](https://huggingface.co/docs/huggingface_hub/quick-start#install-the-hub-library) and [log in](https://huggingface.co/docs/huggingface_hub/quick-start#login) with your user access token (or provide it manually in diart CLI or API).
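The login step only needs to be done once per machine. A minimal sketch from the terminal, assuming `huggingface_hub` is installed and you have created a user access token in your Hugging Face account settings:

```shell
# Installs the hub client, then prompts for your user access token
pip install huggingface_hub
huggingface-cli login
```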
## 🎙️ Stream audio
A live conversation:

```shell
diart.stream microphone
```

By default, diart runs a speaker diarization pipeline, equivalent to setting `--pipeline SpeakerDiarization`,
but you can also set it to `--pipeline VoiceActivityDetection`. See `diart.stream -h` for more options.
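For example, to run only voice activity detection on the microphone (both pipeline names are taken from the options above):

```shell
diart.stream microphone --pipeline VoiceActivityDetection
```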
If you have an ONNX model, you can use `from_onnx()`:
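A hedged sketch of what this might look like: the class, file name, and argument names below are assumptions for illustration, so check the `from_onnx` signature in your installed diart version before relying on them.

```python
from diart.models import EmbeddingModel

# Hypothetical ONNX file and input/output tensor names, for illustration only
embedding = EmbeddingModel.from_onnx(
    "my_embedding_model.onnx",
    input_names=["waveform", "weights"],
    output_name="embedding",
)
```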
```python
optimizer(num_iter=100)
```

This will write results to an SQLite database in `/output/dir`.
### Distributed tuning
For bigger datasets, it is sometimes more convenient to run multiple optimization processes in parallel.
To do this, create a study on a [recommended DBMS](https://optuna.readthedocs.io/en/stable/tutorial/10_key_features/004_distributed.html#sphx-glr-tutorial-10-key-features-004-distributed-py) (e.g. MySQL or PostgreSQL) making sure that the study and database names match:
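For instance, with MySQL and optuna's CLI (the database and study names below are placeholders; the only requirement stated above is that they match):

```shell
# Create the database, then an optuna study pointing at it
mysql -u root -e "CREATE DATABASE IF NOT EXISTS diart_study"
optuna create-study --study-name "diart_study" --storage "mysql://root@localhost/diart_study"
```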
```python
import diart.operators as dops
from diart.sources import MicrophoneAudioSource
from diart.blocks import SpeakerSegmentation, OverlapAwareSpeakerEmbedding
```
of the paper implementation in RTTM format for every entry of Table 1 and Figure 5.
This includes the VBx offline topline as well as our proposed online approach with
latencies 500ms, 1s, 2s, 3s, 4s, and 5s.
```
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
```
<p style="color:grey;font-size:14px;">Logo generated by <a href="https://www.designevo.com/" title="Free Online Logo Maker">DesignEvo free logo designer</a></p>