# Quick Start

## Setup

Add an instance of [`BhashiniManager`](~/api/Uralstech.UBhashini.BhashiniManager.yml) to your scene, and set it up with your ULCA User ID and API key, as detailed in the [*Bhashini documentation*](https://bhashini.gitbook.io/bhashini-apis/pre-requisites-and-onboarding).

## Pipelines

As per the [*Bhashini documentation*](https://bhashini.gitbook.io/bhashini-apis):
> ULCA Pipeline is a set of tasks that any specific pipeline supports. For example, any specific pipeline (identified by unique pipeline ID) can support the following:
>
> - only ASR (Speech To Text)
> - only NMT (Translate)
> - only TTS
> - ASR + NMT
> - NMT + TTS
> - ASR + NMT + TTS
>
> Our R&D institutes can create pipelines using any of the available models on ULCA.

In short, computation (STT, TTS and Translate) is done on a "pipeline". A pipeline is configured to support a list of tasks in a defined order, like:

- (input: audio) STT -> Translate (output: text)
- (input: text) Translate -> TTS (output: audio)

In the given examples:

- Case 1 (STT -> Translate): From the given audio clip, the STT model transcribes text, which is automatically sent to the translation model, and the translated text is returned.
- Case 2 (Translate -> TTS): From the given text, the translation model computes the translation, which is automatically sent to the TTS model, and synthesized audio is returned.

You can have any combination of these tasks, or just individual ones. You can even have pipelines like:

- STT -> Translate -> TTS!

### Code

Before we do any computation, we have to set up our pipeline:

```csharp
using Uralstech.UBhashini;
using Uralstech.UBhashini.Data;
using Uralstech.UBhashini.Data.Pipeline;

// This example shows a pipeline configured for a set of tasks which will receive spoken English audio
// as input, transcribe and translate it to Hindi, and finally convert the text to spoken Hindi audio.

BhashiniPipelineResponse pipeline = await BhashiniManager.Instance.ConfigurePipeline(
    new BhashiniPipelineRequestTask(BhashiniTask.SpeechToText, "en"), // Here, "en" is the source language.
    new BhashiniPipelineRequestTask(BhashiniTask.Translation, "en", "hi"), // Here, "en" is still the source language, but "hi" is the target language.
    new BhashiniPipelineRequestTask(BhashiniTask.TextToSpeech, "hi") // Here, the source language is "hi".
);
```

> The Bhashini API follows the [*ISO-639*](https://www.loc.gov/standards/iso639-2/php/code_list.php) standard for language codes.
>
> UBhashini throws `BhashiniAudioIOException` and `BhashiniRequestException` errors when requests fail. Be sure to add try-catch blocks in your code!

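For example, here is a minimal sketch of catching a failed configuration request. It assumes the exception types are reachable through the `Uralstech.UBhashini` namespaces imported above; check the API reference for their exact locations.

```csharp
// Continues from the snippet above; also needs "using UnityEngine;" for Debug.
try
{
    BhashiniPipelineResponse pipeline = await BhashiniManager.Instance.ConfigurePipeline(
        new BhashiniPipelineRequestTask(BhashiniTask.SpeechToText, "en"));

    // ... use the pipeline ...
}
catch (BhashiniRequestException exception)
{
    // The request failed, e.g. due to invalid credentials or network issues.
    Debug.LogError($"Could not configure pipeline: {exception.Message}");
}
```
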
Now, we store the inference endpoint and task configurations in variables:

```csharp
BhashiniPipelineInferenceEndpoint endpoint = pipeline.InferenceEndpoint;

BhashiniPipelineTaskConfiguration sttTaskConfig = pipeline.PipelineConfigurations[0].Configurations[0];
BhashiniPipelineTaskConfiguration translateTaskConfig = pipeline.PipelineConfigurations[1].Configurations[0];
BhashiniPipelineTaskConfiguration ttsTaskConfig = pipeline.PipelineConfigurations[2].Configurations[0];
```

Here, as we specified the expected source and target languages for each task in the pipeline, the `Configurations`
array in each entry of `PipelineConfigurations` will most likely contain only one `BhashiniPipelineTaskConfiguration` object. This may not
always be the case, so it is recommended to check the array of configurations for the desired model(s). The order of `PipelineConfigurations`
matches the order of the tasks passed to `ConfigurePipeline`.

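If you want to be defensive about this, a minimal sketch like the following could help. It assumes `Configurations` is an array, as described above, and that `UnityEngine` is imported for `Debug`; the error message and the choice of simply taking the first entry are illustrative.

```csharp
// Hypothetical defensive check: make sure the STT task actually got a configuration
// back before using it. The same pattern applies to the other tasks.
BhashiniPipelineTaskConfiguration[] sttConfigs = pipeline.PipelineConfigurations[0].Configurations;
if (sttConfigs == null || sttConfigs.Length == 0)
{
    Debug.LogError("No Speech-To-Text configuration was returned for the requested languages!");
    return;
}

BhashiniPipelineTaskConfiguration sttTaskConfig = sttConfigs[0];
```
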
You can also use the following shortcuts to get the task configurations:

```csharp
BhashiniPipelineTaskConfiguration sttTaskConfig = pipeline.SpeechToTextConfiguration.First;
BhashiniPipelineTaskConfiguration translateTaskConfig = pipeline.TranslateConfiguration.First;
BhashiniPipelineTaskConfiguration ttsTaskConfig = pipeline.TextToSpeechConfiguration.First;
```

`SpeechToTextConfiguration`, `TranslateConfiguration` and `TextToSpeechConfiguration` get the first configuration matching their task type in
the response, and `First` gets the first available `BhashiniPipelineTaskConfiguration` from its `Configurations` array.

## Computation

Now that we have the inference data and pipelines configured, we can go straight into computation.

### Code

```csharp
using Uralstech.UBhashini.Data.Compute;

// The code below records a 10-second audio clip from the device microphone.

int sampleRate = AudioSettings.GetConfiguration().sampleRate;
AudioClip audioInput = Microphone.Start(string.Empty, false, 10, sampleRate);

while (Microphone.IsRecording(string.Empty))
    await Task.Yield();

if (!TryGetComponent(out AudioSource audioSource))
    audioSource = gameObject.AddComponent<AudioSource>();

// Now, we send the clip to Bhashini.

BhashiniComputeResponse computedResults = await BhashiniManager.Instance.ComputeOnPipeline(endpoint,
    new BhashiniInputData(audioInput),
    sttTaskConfig.ToSpeechToTextTask(sampleRate: sampleRate),
    translateTaskConfig.ToTranslateTask(),
    ttsTaskConfig.ToTextToSpeechTask()
);

audioSource.PlayOneShot(await computedResults.GetTextToSpeechResult());
```

`BhashiniInputData` has two constructors (see the sketch after this list):
- `BhashiniInputData(string text = null, string audio = null)`
  - Here, `text` is the input for translation and TTS requests, and `audio` is Base64-encoded audio for STT requests. Only provide one of them.
- `BhashiniInputData(AudioClip audio, BhashiniAudioFormat audioFormat = BhashiniAudioFormat.Wav)`
  - This constructor allows you to directly provide an `AudioClip` as input, but requires the `Utilities.Encoder.Wav` or
    `Utilities.Audio` packages, based on the chosen encoding.

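As a minimal sketch of both constructors (the example text and the `audioInput` clip are just placeholders):

```csharp
// Text input, e.g. for pipelines that start with translation or TTS.
BhashiniInputData textInput = new BhashiniInputData(text: "Hello, world!");

// AudioClip input, e.g. for pipelines that start with STT. BhashiniAudioFormat.Wav
// requires the Utilities.Encoder.Wav package, as noted above.
BhashiniInputData clipInput = new BhashiniInputData(audioInput, BhashiniAudioFormat.Wav);
```
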
Also, `ToSpeechToTextTask` takes an optional `sampleRate` argument. It defaults to 44100, so make sure it matches your audio data.

`BhashiniComputeResponse` contains three utility functions to help extract the actual text or audio response:
- `GetSpeechToTextResult`
- `GetTranslateResult`
- `GetTextToSpeechResult`

You should call them based on the last task in the pipeline's task list. If your pipeline's last task is STT, use `GetSpeechToTextResult`.
If the last task is translate, use `GetTranslateResult`. If it's TTS, use `GetTextToSpeechResult`.

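For example, a pipeline configured with only the STT task ends with STT, so `GetSpeechToTextResult` is the getter to use. A minimal sketch, assuming the method returns the transcription directly (check the API reference for the exact return type):

```csharp
// Hypothetical STT-only computation: the last (and only) task is STT,
// so GetSpeechToTextResult is the matching getter. Needs "using UnityEngine;" for Debug.
BhashiniComputeResponse sttResults = await BhashiniManager.Instance.ComputeOnPipeline(endpoint,
    new BhashiniInputData(audioInput),
    sttTaskConfig.ToSpeechToTextTask(sampleRate: sampleRate)
);

Debug.Log(sttResults.GetSpeechToTextResult());
```
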
`ComputeOnPipeline` and `GetTextToSpeechResult` will throw `BhashiniAudioIOException` errors if they encounter an unsupported format.

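A minimal sketch of handling this for the TTS result (the exception's namespace and exact failure conditions should be checked against the API reference):

```csharp
// Continues from the computation snippet above; needs "using UnityEngine;" for Debug.
try
{
    AudioClip speech = await computedResults.GetTextToSpeechResult();
    audioSource.PlayOneShot(speech);
}
catch (BhashiniAudioIOException exception)
{
    // The response audio could not be decoded, e.g. because the format is unsupported.
    Debug.LogError($"Could not decode the TTS audio: {exception.Message}");
}
```
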
---

And that's it! You've learnt how to use the Bhashini API in Unity!