
Commit b688c7b

Added documentation and fixed sample for Windows.

1 parent 574490d commit b688c7b

14 files changed: +291 -112 lines changed

README.md

+39-1
@@ -1,3 +1,41 @@
## UBhashini

A C# wrapper for the ULCA Bhashini API.

### Installation

This *should* work on any reasonably modern Unity version. Built and tested in Unity 2022.3.29f1.

#### From OpenUPM Through Unity Package Manager

1. Open project settings
2. Select `Package Manager`
3. Add the OpenUPM package registry:
    - Name: `OpenUPM`
    - URL: `https://package.openupm.com`
    - Scope(s):
        - `com.uralstech`
        - *`com.utilities`
4. Open the Unity Package Manager window (`Window` -> `Package Manager`)
5. Change the registry from `Unity` to `My Registries`
6. Add the `UBhashini`, *`Utilities.Encoder.Wav` and *`Utilities.Audio` packages

#### From GitHub Through Unity Package Manager

1. Open the Unity Package Manager window (`Window` -> `Package Manager`)
2. Select the `+` icon and `Add package from git URL...`
3. Paste the UPM branch URL and press enter:
    - `https://github.com/Uralstech/UBhashini.git#upm`

*\*Adding additional dependencies:*<br/>
Follow the steps detailed in the OpenUPM installation method, but only install the *`Utilities.Encoder.Wav` and *`Utilities.Audio` packages.

*Optional, but required if you don't want to encode your AudioClips into Base64 strings manually, or if you want to use the samples.

### Documentation

See <https://github.com/Uralstech/UBhashini/blob/master/UBhashini/Packages/com.uralstech.ubhashini/Documentation~/README.md>.

---

Made with the help of the [*great documentation by Himanshu Gupta!*](https://bhashini.gitbook.io/bhashini-apis)

UBhashini/Packages/com.uralstech.ubhashini/CHANGELOG.md

-9

This file was deleted.

UBhashini/Packages/com.uralstech.ubhashini/Documentation~/README.md

@@ -1,3 +1,115 @@
## UBhashini Documentation

### Setup

Add an instance of `BhashiniApiManager` to your scene, and set it up with your ULCA user ID and API key, as detailed in the [*Bhashini documentation*](https://bhashini.gitbook.io/bhashini-apis/pre-requisites-and-onboarding).
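As an optional check (a sketch that is not part of the package documentation; the class name `BhashiniSetupCheck`, and the assumption that `BhashiniApiManager.Instance` is `null` when no manager exists in the scene, are mine), you can verify the manager is reachable before making any requests:

```csharp
using UnityEngine;
using Uralstech.UBhashini;

/// <summary>Logs an error on startup if the scene has no BhashiniApiManager.</summary>
public class BhashiniSetupCheck : MonoBehaviour
{
    private void Start()
    {
        // Assumption: the singleton accessor is null when no manager has been added to the scene.
        if (BhashiniApiManager.Instance == null)
            Debug.LogError("No BhashiniApiManager found! Add one to the scene and set your ULCA user ID and API key.");
    }
}
```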
### Pipelines

From the [*Bhashini documentation*](https://bhashini.gitbook.io/bhashini-apis):

> ULCA Pipeline is a set of tasks that any specific pipeline supports. For example, any specific pipeline (identified by unique pipeline ID) can support the following:
>
> - only ASR (Speech To Text)
> - only NMT (Translate)
> - only TTS
> - ASR + NMT
> - NMT + TTS
> - ASR + NMT + TTS
>
> Our R&D institutes can create pipelines using any of the available models on ULCA.

In other words, computation (STT, TTS, Translate) is done on a "pipeline". A pipeline supports a list of tasks, in a defined order, like:

- (input: audio) STT -> Translate (output: text)
- (input: text) Translate -> TTS (output: audio)

In the examples above:

- Case 1 (STT -> Translate): From the given audio clip, the STT model computes text, which is automatically sent to the translate model, and text is returned.
- Case 2 (Translate -> TTS): From the given text, the translate model computes text, which is automatically sent to the TTS model, and audio is returned.

You can have any combination of these tasks, or just individual ones. You can even chain all three:

- STT -> Translate -> TTS!

#### Code

So, before we do any computation, we have to set up our pipeline:

```csharp
using Uralstech.UBhashini;
using Uralstech.UBhashini.Data;

// This example shows a pipeline configured for a set of tasks which will receive spoken English audio
// as input, transcribe and translate it to Hindi, and finally convert the text to spoken Hindi audio.

BhashiniPipelineConfigResponse response = await BhashiniApiManager.Instance.ConfigurePipeline(new BhashiniPipelineTask[]
{
    BhashiniPipelineTask.GetConfigurationTask(BhashiniPipelineTaskType.SpeechToText, "en"), // Here, "en" is the source language.
    BhashiniPipelineTask.GetConfigurationTask(BhashiniPipelineTaskType.TextTranslation, "en", "hi"), // Here, "en" is still the source language, but "hi" is the target language.
    BhashiniPipelineTask.GetConfigurationTask(BhashiniPipelineTaskType.TextToSpeech, "hi"), // Here, the source language is "hi".
});
```

The Bhashini API follows the [*ISO-639*](https://www.loc.gov/standards/iso639-2/php/code_list.php) standard for language codes.

The API wrapper class, `BhashiniApiManager`, usually returns `null` if a request fails. Check the Console or logs for errors in such cases.
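For example, a minimal guard (a sketch; it reuses the `response` variable from the snippet above and assumes `using UnityEngine;` for `Debug`):

```csharp
// Sketch: bail out early if the configuration request failed.
if (response == null)
{
    Debug.LogError("Pipeline configuration failed! Check the Console for the API error.");
    return;
}
```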
Now, we store the computation inference data in variables:

```csharp
BhashiniPipelineInferenceData _inferenceData = response.PipelineEndpoint;

BhashiniPipelineData _sttData = response.PipelineResponseConfig[0].Data[0];
BhashiniPipelineData _translateData = response.PipelineResponseConfig[1].Data[0];
BhashiniPipelineData _ttsData = response.PipelineResponseConfig[2].Data[0];
```

Here, as we specified the expected source and target languages for each task in the pipeline, we know the order of pipeline configurations in `PipelineResponseConfig`.
This may not always be the case. It is recommended to check the array of configurations for the desired model(s).
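As a hedged sketch of such a check (it assumes only the members shown above, plus that `Data` is an indexable collection LINQ can enumerate; add `using System.Linq;` to your script):

```csharp
// Sketch: make sure the Speech-To-Text task actually returned a model before picking one.
var sttModels = response.PipelineResponseConfig[0].Data;
if (sttModels == null || !sttModels.Any())
{
    Debug.LogError("No Speech-To-Text model was returned for the requested languages!");
    return;
}

BhashiniPipelineData sttData = sttModels[0];
```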
### Computation

Now that we have the inference data and pipeline tasks configured, we can go straight into computation.

#### Code

```csharp
_audioClip = ...   // The input AudioClip to transcribe.
_audioSource = ... // An AudioSource to play the synthesized speech.

BhashiniPipelineTask[] tasks = new BhashiniPipelineTask[]
{
    _sttData.GetSpeechToTextTask(),
    _translateData.GetTextTranslateTask(),
    _ttsData.GetTextToSpeechTask(BhashiniVoiceType.Male),
};

BhashiniComputeResponse response = await BhashiniApiManager.Instance.ComputeOnPipeline(_inferenceData, tasks, audioSource: _audioClip);

AudioClip result = await response.GetTextToSpeechResult();
_audioSource.PlayOneShot(result);
```
`ComputeOnPipeline` accepts three optional parameters:

- `textSource` - This is for text-input-based tasks, like Translate or TTS.
- `audioSource` - This is for audio-input-based tasks, like STT. This parameter also requires the `Utilities.Encoder.Wav` and `Utilities.Audio` packages.
- `rawBase64AudioSource` - This is also for audio-input-based tasks, but takes the raw Base64-encoded audio data. You will have to encode your audio manually.

You must only provide one of these parameters at a time, chosen based on the first task given to the pipeline, as sketched below.
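For instance, here is a sketch of a text-input computation. It assumes `_inferenceData`, `_translateData` and `_ttsData` came from a pipeline that was configured for Translate -> TTS:

```csharp
// Sketch: a text-input pipeline, so only textSource is provided.
BhashiniPipelineTask[] textTasks = new BhashiniPipelineTask[]
{
    _translateData.GetTextTranslateTask(),
    _ttsData.GetTextToSpeechTask(BhashiniVoiceType.Male),
};

BhashiniComputeResponse textResponse = await BhashiniApiManager.Instance.ComputeOnPipeline(_inferenceData, textTasks, textSource: "How are you?");

AudioClip spokenTranslation = await textResponse.GetTextToSpeechResult();
_audioSource.PlayOneShot(spokenTranslation);
```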
Also, `GetSpeechToTextTask` takes an optional `sampleRate` argument. By default, it is 44100, but make sure it matches your audio data.
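For example (a sketch; it assumes the input clip is the `_audioClip` from the earlier snippet and uses Unity's `AudioClip.frequency` as its sample rate):

```csharp
// Sketch: keep the STT task's sample rate in sync with the actual clip.
BhashiniPipelineTask sttTask = _sttData.GetSpeechToTextTask(sampleRate: _audioClip.frequency);
```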
`BhashiniComputeResponse` contains three utility functions to help extract the actual text or audio response:

- `GetSpeechToTextResult`
- `GetTextTranslateResult`
- `GetTextToSpeechResult`

You should call them based on the last task in the pipeline's task list. If your pipeline's last task is STT, use `GetSpeechToTextResult`; if the last task is Translate, use `GetTextTranslateResult` (see the sketch below).
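As a hedged sketch of reading a text result (the return type of `GetTextTranslateResult` is an assumption here; unlike `GetTextToSpeechResult`, which returns an `AudioClip` as shown above, it is treated as returning a plain string):

```csharp
// Sketch: translateResponse is assumed to be a BhashiniComputeResponse from a pipeline
// whose *last* task was Translate. Check the package's API for the real return type.
string translatedText = translateResponse.GetTextTranslateResult();
Debug.Log($"Translated text: {translatedText}");
```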
`ComputeOnPipeline` and `GetTextToSpeechResult` will throw a `BhashiniAudioIOException` if they encounter an unsupported audio format.
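A sketch of handling that (it wraps the compute call from the earlier example and assumes `using UnityEngine;` for `Debug`):

```csharp
try
{
    BhashiniComputeResponse computeResponse = await BhashiniApiManager.Instance.ComputeOnPipeline(_inferenceData, tasks, audioSource: _audioClip);
    _audioSource.PlayOneShot(await computeResponse.GetTextToSpeechResult());
}
catch (BhashiniAudioIOException exception)
{
    // Thrown when the input or output audio is in a format the package cannot handle.
    Debug.LogError($"Unsupported audio format: {exception.Message}");
}
```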
---

And that's it! You've learnt how to use the Bhashini API in Unity!

UBhashini/Packages/com.uralstech.ubhashini/Samples~/ASR-Translate-TTS.meta

+8
Some generated files are not rendered by default.

UBhashini/Packages/com.uralstech.ubhashini/Samples~/ASR-Translate-TTS/Scenes.meta

+8
Some generated files are not rendered by default.

UBhashini/Packages/com.uralstech.ubhashini/Samples~/ASR-Translate-TTS/Scenes/ASR-Translate-TTS_Demo.unity

+1-1
@@ -2633,7 +2633,7 @@ MonoBehaviour:
  m_GameObject: {fileID: 1214495822}
  m_Enabled: 1
  m_EditorHideFlags: 0
- m_Script: {fileID: 11500000, guid: 44079af7b9d52724eb845b2b44230d8f, type: 3}
+ m_Script: {fileID: 11500000, guid: 578eb5db91665c74d8f273702776cc17, type: 3}
  m_Name:
  m_EditorClassIdentifier:
  _audioSource: {fileID: 1214495828}

UBhashini/Packages/com.uralstech.ubhashini/CHANGELOG.md.meta → UBhashini/Packages/com.uralstech.ubhashini/Samples~/ASR-Translate-TTS/Scenes/ASR-Translate-TTS_Demo.unity.meta

+2-2
Some generated files are not rendered by default.

UBhashini/Packages/com.uralstech.ubhashini/Samples~/ASR-Translate-TTS/Scripts.meta

+8
Some generated files are not rendered by default.
