## UBhashini Documentation

A C# wrapper for the ULCA Bhashini API.

### Setup

Add an instance of `BhashiniApiManager` to your scene, and set it up with your ULCA user ID and API key, as detailed in the [*Bhashini documentation*](https://bhashini.gitbook.io/bhashini-apis/pre-requisites-and-onboarding).
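
As a quick sanity check, you can verify that the manager exists in your scene at runtime. This is a minimal sketch; only `BhashiniApiManager.Instance` is from the actual API, and the credentials themselves are set on the component in the Inspector:

```csharp
using UnityEngine;
using Uralstech.UBhashini;

public class BhashiniSetupCheck : MonoBehaviour
{
    private void Start()
    {
        // The singleton instance should be available once the component is in the scene.
        if (BhashiniApiManager.Instance == null)
            Debug.LogError("No BhashiniApiManager found! Add one to the scene and set your ULCA credentials on it.");
    }
}
```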

### Pipelines

From the [*Bhashini documentation*](https://bhashini.gitbook.io/bhashini-apis):
> ULCA Pipeline is a set of tasks that any specific pipeline supports. For example, any specific pipeline (identified by unique pipeline ID) can support the following:
>
> - only ASR (Speech To Text)
> - only NMT (Translate)
> - only TTS
> - ASR + NMT
> - NMT + TTS
> - ASR + NMT + TTS
>
> Our R&D institutes can create pipelines using any of the available models on ULCA.

Basically, computation (STT, TTS, Translate) is done on a "pipeline". A pipeline is configured to support a list of tasks in a defined order, like:

- (input: audio) STT -> Translate (output: text)
- (input: text) Translate -> TTS (output: audio)

In the given examples:

- Case 1 (STT -> Translate): From the given audio clip, the STT model computes text, which is automatically sent to the translate model, and text is returned.
- Case 2 (Translate -> TTS): From the given text, the translate model computes text, which is automatically sent to the TTS model, and audio is returned.

You can use any combination of these tasks, or just individual ones. You can even chain all three:

- STT -> Translate -> TTS!

#### Code

So, before we do any computation, we have to set up our pipeline:

```csharp
using Uralstech.UBhashini;
using Uralstech.UBhashini.Data;

// This example shows a pipeline configured for a set of tasks which will receive spoken English audio
// as input, transcribe and translate it to Hindi, and finally convert the text to spoken Hindi audio.

BhashiniPipelineConfigResponse response = await BhashiniApiManager.Instance.ConfigurePipeline(new BhashiniPipelineTask[]
{
    BhashiniPipelineTask.GetConfigurationTask(BhashiniPipelineTaskType.SpeechToText, "en"), // Here, "en" is the source language.
    BhashiniPipelineTask.GetConfigurationTask(BhashiniPipelineTaskType.TextTranslation, "en", "hi"), // Here, "en" is still the source language, but "hi" is the target language.
    BhashiniPipelineTask.GetConfigurationTask(BhashiniPipelineTaskType.TextToSpeech, "hi"), // Here, the source language is "hi".
});
```

The Bhashini API follows the [*ISO-639*](https://www.loc.gov/standards/iso639-2/php/code_list.php) standard for language codes.

The API wrapper class, `BhashiniApiManager`, usually returns `null` if a request fails. Check the debug window or logs for errors in such cases.
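
For example, a configuration request can be guarded like this (a minimal sketch):

```csharp
if (response == null)
{
    // The request failed; details will be in the Unity console or player logs.
    Debug.LogError("Pipeline configuration failed!");
    return;
}
```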

Now, we store the computation inference data in variables:

```csharp
BhashiniPipelineInferenceData _inferenceData = response.PipelineEndpoint;

BhashiniPipelineData _sttData = response.PipelineResponseConfig[0].Data[0];
BhashiniPipelineData _translateData = response.PipelineResponseConfig[1].Data[0];
BhashiniPipelineData _ttsData = response.PipelineResponseConfig[2].Data[0];
```

Here, as we specified the expected source and target languages for each task in the pipeline, we know the order of pipeline configurations in `PipelineResponseConfig`. This may not always be the case, so it is recommended to check the array of configurations for the desired model(s).
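
If you cannot rely on the order, you could search the configurations instead. The sketch below is hypothetical: the `TaskType` property is assumed for illustration and may not match the wrapper's actual field name:

```csharp
foreach (var config in response.PipelineResponseConfig)
{
    // TaskType is a hypothetical property; check the wrapper's types for the real field.
    if (config.TaskType == BhashiniPipelineTaskType.SpeechToText)
        _sttData = config.Data[0];
}
```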

### Computation

Now that we have the inference data and the pipeline configured, we can go straight into computation.

#### Code

```csharp
// Assume these are assigned elsewhere, e.g., from a microphone recording and a scene AudioSource.
_audioClip = ...
_audioSource = ...

BhashiniPipelineTask[] tasks = new BhashiniPipelineTask[]
{
    _sttData.GetSpeechToTextTask(),
    _translateData.GetTextTranslateTask(),
    _ttsData.GetTextToSpeechTask(BhashiniVoiceType.Male),
};

BhashiniComputeResponse response = await BhashiniApiManager.Instance.ComputeOnPipeline(_inferenceData, tasks, audioSource: _audioClip);

AudioClip result = await response.GetTextToSpeechResult();
_audioSource.PlayOneShot(result);
```

`ComputeOnPipeline` accepts three optional parameters:
- `textSource` - This is for text-input-based tasks, like Translate or TTS.
- `audioSource` - This is for audio-input-based tasks, like STT. This parameter also requires the `Utilities.Encoder.Wav` and `Utilities.Audio` packages.
- `rawBase64AudioSource` - This is also for audio-input-based tasks, but takes the raw Base64-encoded audio data. You will have to encode your audio manually.

You must only provide one of these parameters at a time, based on the first task given to the pipeline.
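
For example, a text-input pipeline (Translate -> TTS) would use `textSource` instead. A minimal sketch, assuming the pipeline was configured with these two tasks:

```csharp
BhashiniPipelineTask[] textTasks = new BhashiniPipelineTask[]
{
    _translateData.GetTextTranslateTask(),
    _ttsData.GetTextToSpeechTask(BhashiniVoiceType.Male),
};

// Only textSource is set, as the first task (Translate) takes text input.
BhashiniComputeResponse textResponse = await BhashiniApiManager.Instance.ComputeOnPipeline(_inferenceData, textTasks, textSource: "Hello, world!");
```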

Also, `GetSpeechToTextTask` takes an optional `sampleRate` argument. By default, it is 44100, but make sure it matches your audio data.
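
For example, if your recorded clip is 16 kHz (a minimal sketch):

```csharp
// Match the sample rate of the recorded clip, not the default of 44100.
BhashiniPipelineTask sttTask = _sttData.GetSpeechToTextTask(sampleRate: 16000);
```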

`BhashiniComputeResponse` contains three utility functions to help extract the actual text or audio response:
- `GetSpeechToTextResult`
- `GetTextTranslateResult`
- `GetTextToSpeechResult`

You should call them based on the last task in the pipeline's task list. If your pipeline's last task is STT, use `GetSpeechToTextResult`; if the last task is translate, use `GetTextTranslateResult`.
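
For instance, for a pipeline ending in translation (a sketch; the exact signature of `GetTextTranslateResult` may differ from what is assumed here):

```csharp
// Translation was the last task, so extract the translated text instead of audio.
string translatedText = response.GetTextTranslateResult();
Debug.Log($"Translated text: {translatedText}");
```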

`ComputeOnPipeline` and `GetTextToSpeechResult` will throw `BhashiniAudioIOException` errors if they encounter an unsupported audio format.
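
A minimal sketch of handling this:

```csharp
try
{
    AudioClip clip = await response.GetTextToSpeechResult();
    _audioSource.PlayOneShot(clip);
}
catch (BhashiniAudioIOException exception)
{
    Debug.LogError($"Unsupported audio format: {exception.Message}");
}
```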

---

And that's it! You've learnt how to use the Bhashini API in Unity!