Commit 002c958

Merge pull request #1 from Uralstech/unstable
UBhashini 2.0.0
2 parents 2f5d804 + f8eea2c commit 002c958

110 files changed (+2493 -1324 lines changed)

.editorconfig

+341
Large diffs are not rendered by default.

.github/FUNDING.yml

+1
@@ -0,0 +1 @@
+buy_me_a_coffee: udayshankar

.github/ISSUE_TEMPLATE/bug_report.md

+29
@@ -0,0 +1,29 @@
+---
+name: Bug report
+about: Create a report to help UBhashini improve
+title: "(Summary of the bug)"
+labels: bug
+assignees: Uralstech
+
+---
+
+**Describe the bug**
+A clear and concise description of what the bug is.
+
+**To Reproduce**
+Steps to reproduce the behavior:
+1. Go to '...'
+2. Click on '....'
+3. Scroll down to '....'
+4. See error
+
+**Expected behavior**
+A clear and concise description of what you expected to happen.
+
+**Development environment:**
+- Unity version: [e.g. 2022.3, 2020.2]
+- Build target: [e.g. iOS, Android, Editor]
+- UBhashini version [e.g. 1.1.0]
+
+**Additional context**
+Add any other context about the problem here.
+20
@@ -0,0 +1,20 @@
+---
+name: Feature request
+about: Suggest an idea for this project
+title: "(Summary of the feature)"
+labels: enhancement
+assignees: Uralstech
+
+---
+
+**Is your feature request related to a problem? Please describe.**
+A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
+
+**Describe the solution you'd like**
+A clear and concise description of what you want to happen.
+
+**Describe alternatives you've considered**
+A clear and concise description of any alternative solutions or features you've considered.
+
+**Additional context**
+Add any other context about the feature request here.

.github/ISSUE_TEMPLATE/question.md

+17
@@ -0,0 +1,17 @@
+---
+name: Question
+about: Ask a question about UBhashini
+title: "(Summary of the question)"
+labels: question
+assignees: Uralstech
+
+---
+
+**Question**
+What is your question about UBhashini?
+
+**Context**
+Provide any context or background information that might be relevant to your question.
+
+**Additional Information**
+Add any other details that might help in answering your question.
+43
@@ -0,0 +1,43 @@
+# Trigger the action on push to master
+on:
+  push:
+    branches:
+      - master
+
+# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
+permissions:
+  actions: read
+  pages: write
+  id-token: write
+
+# Allow only one concurrent deployment, skipping runs queued between the run in-progress and latest queued.
+# However, do NOT cancel in-progress runs as we want to allow these production deployments to complete.
+concurrency:
+  group: "pages"
+  cancel-in-progress: false
+
+jobs:
+  publish-docs:
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v3
+      - name: Dotnet Setup
+        uses: actions/setup-dotnet@v3
+        with:
+          dotnet-version: 8.x
+
+      - run: dotnet tool update -g docfx
+      - run: docfx Documentation/docfx.json
+
+      - name: Upload artifact
+        uses: actions/upload-pages-artifact@v3
+        with:
+          # Upload entire repository
+          path: 'Documentation/_site'
+      - name: Deploy to GitHub Pages
+        id: deployment
+        uses: actions/deploy-pages@v4

.github/workflows/upm-subtree-split.yml

+2
@@ -18,3 +18,5 @@ jobs:
           fetch-depth: 0
 
       - uses: RageAgainstThePixel/upm-subtree-split@v1.1
+        with:
+          package-root: "**/Packages/com.uralstech.ubhashini"

CODE_OF_CONDUCT.md

+128
@@ -0,0 +1,128 @@
+# Contributor Covenant Code of Conduct
+
+## Our Pledge
+
+We as members, contributors, and leaders pledge to make participation in our
+community a harassment-free experience for everyone, regardless of age, body
+size, visible or invisible disability, ethnicity, sex characteristics, gender
+identity and expression, level of experience, education, socio-economic status,
+nationality, personal appearance, race, religion, or sexual identity
+and orientation.
+
+We pledge to act and interact in ways that contribute to an open, welcoming,
+diverse, inclusive, and healthy community.
+
+## Our Standards
+
+Examples of behavior that contributes to a positive environment for our
+community include:
+
+* Demonstrating empathy and kindness toward other people
+* Being respectful of differing opinions, viewpoints, and experiences
+* Giving and gracefully accepting constructive feedback
+* Accepting responsibility and apologizing to those affected by our mistakes,
+  and learning from the experience
+* Focusing on what is best not just for us as individuals, but for the
+  overall community
+
+Examples of unacceptable behavior include:
+
+* The use of sexualized language or imagery, and sexual attention or
+  advances of any kind
+* Trolling, insulting or derogatory comments, and personal or political attacks
+* Public or private harassment
+* Publishing others' private information, such as a physical or email
+  address, without their explicit permission
+* Other conduct which could reasonably be considered inappropriate in a
+  professional setting
+
+## Enforcement Responsibilities
+
+Community leaders are responsible for clarifying and enforcing our standards of
+acceptable behavior and will take appropriate and fair corrective action in
+response to any behavior that they deem inappropriate, threatening, offensive,
+or harmful.
+
+Community leaders have the right and responsibility to remove, edit, or reject
+comments, commits, code, wiki edits, issues, and other contributions that are
+not aligned to this Code of Conduct, and will communicate reasons for moderation
+decisions when appropriate.
+
+## Scope
+
+This Code of Conduct applies within all community spaces, and also applies when
+an individual is officially representing the community in public spaces.
+Examples of representing our community include using an official e-mail address,
+posting via an official social media account, or acting as an appointed
+representative at an online or offline event.
+
+## Enforcement
+
+Instances of abusive, harassing, or otherwise unacceptable behavior may be
+reported to the community leaders responsible for enforcement at
+uralstech@gmail.com.
+All complaints will be reviewed and investigated promptly and fairly.
+
+All community leaders are obligated to respect the privacy and security of the
+reporter of any incident.
+
+## Enforcement Guidelines
+
+Community leaders will follow these Community Impact Guidelines in determining
+the consequences for any action they deem in violation of this Code of Conduct:
+
+### 1. Correction
+
+**Community Impact**: Use of inappropriate language or other behavior deemed
+unprofessional or unwelcome in the community.
+
+**Consequence**: A private, written warning from community leaders, providing
+clarity around the nature of the violation and an explanation of why the
+behavior was inappropriate. A public apology may be requested.
+
+### 2. Warning
+
+**Community Impact**: A violation through a single incident or series
+of actions.
+
+**Consequence**: A warning with consequences for continued behavior. No
+interaction with the people involved, including unsolicited interaction with
+those enforcing the Code of Conduct, for a specified period of time. This
+includes avoiding interactions in community spaces as well as external channels
+like social media. Violating these terms may lead to a temporary or
+permanent ban.
+
+### 3. Temporary Ban
+
+**Community Impact**: A serious violation of community standards, including
+sustained inappropriate behavior.
+
+**Consequence**: A temporary ban from any sort of interaction or public
+communication with the community for a specified period of time. No public or
+private interaction with the people involved, including unsolicited interaction
+with those enforcing the Code of Conduct, is allowed during this period.
+Violating these terms may lead to a permanent ban.
+
+### 4. Permanent Ban
+
+**Community Impact**: Demonstrating a pattern of violation of community
+standards, including sustained inappropriate behavior, harassment of an
+individual, or aggression toward or disparagement of classes of individuals.
+
+**Consequence**: A permanent ban from any sort of public interaction within
+the community.
+
+## Attribution
+
+This Code of Conduct is adapted from the [Contributor Covenant][homepage],
+version 2.0, available at
+https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
+
+Community Impact Guidelines were inspired by [Mozilla's code of conduct
+enforcement ladder](https://github.com/mozilla/diversity).
+
+[homepage]: https://www.contributor-covenant.org
+
+For answers to common questions about this code of conduct, see the FAQ at
+https://www.contributor-covenant.org/faq. Translations are available at
+https://www.contributor-covenant.org/translations.

Documentation/.gitignore

+2
@@ -0,0 +1,2 @@
+api/
+_site/

Documentation/DocSource/QuickStart.md

+137
@@ -0,0 +1,137 @@
+# Quick Start
+
+## Setup
+
+Add an instance of [`BhashiniManager`](~/api/Uralstech.UBhashini.BhashiniManager.yml) to your scene, and set it up with your ULCA User ID and API key, as detailed in the [*Bhashini documentation*](https://bhashini.gitbook.io/bhashini-apis/pre-requisites-and-onboarding).
+
+## Pipelines
+
+As per the [*Bhashini documentation*](https://bhashini.gitbook.io/bhashini-apis):
+> ULCA Pipeline is a set of tasks that any specific pipeline supports. For example, any specific pipeline (identified by unique pipeline ID) can support the following:
+>
+> - only ASR (Speech To Text)
+> - only NMT (Translate)
+> - only TTS
+> - ASR + NMT
+> - NMT + TTS
+> - ASR + NMT + TTS
+>
+> Our R&D institutes can create pipelines using any of the available models on ULCA.
+
+Basically, computation (STT, TTS, Translate) is done on a "pipeline". A "pipeline" is set to support a list of tasks, in a defined order, like:
+
+- (input: audio) STT -> Translate (output: text)
+- (input: text) Translate -> TTS (output: audio)
+
+In the given examples:
+
+- Case 1 (STT -> Translate): From the given audio clip, the STT model computes text, which is sent automatically to the translate model, and text is returned.
+- Case 2 (Translate -> TTS): From the given text, the translate model computes text, which is sent automatically to the TTS model, and audio is returned.
+
+You can have any combination of these tasks, or just individual ones. You can even have tasks like:
+
+- STT -> Translate -> TTS!
+
+### Code
+
+So, before we do any computation, we have to set up our pipelines:
+
+```csharp
+using Uralstech.UBhashini;
+using Uralstech.UBhashini.Data;
+using Uralstech.UBhashini.Data.Pipeline;
+
+// This example shows a pipeline configured for a set of tasks which will receive spoken English audio
+// as input, transcribe and translate it to Hindi, and finally convert the text to spoken Hindi audio.
+
+BhashiniPipelineResponse pipeline = await BhashiniManager.Instance.ConfigurePipeline(
+    new BhashiniPipelineRequestTask(BhashiniTask.SpeechToText, "en"), // Here, "en" is the source language.
+    new BhashiniPipelineRequestTask(BhashiniTask.Translation, "en", "hi"), // Here, "en" is still the source language, but "hi" is the target language.
+    new BhashiniPipelineRequestTask(BhashiniTask.TextToSpeech, "hi") // Here, the source language is "hi".
+);
+```
+
+> The Bhashini API follows the [*ISO-639*](https://www.loc.gov/standards/iso639-2/php/code_list.php) standard for language codes.
+>
+> The UBhashini throws `BhashiniAudioIOException` and `BhashiniRequestException` errors when requests fail. Be sure to add try-catch blocks in your code!
+
+Now, we store the computation inference data in variables:
+
+```csharp
+BhashiniPipelineInferenceEndpoint endpoint = pipeline.InferenceEndpoint;
+
+BhashiniPipelineTaskConfiguration sttTaskConfig = pipeline.PipelineConfigurations[0].Configurations[0];
+BhashiniPipelineTaskConfiguration translateTaskConfig = pipeline.PipelineConfigurations[1].Configurations[0];
+BhashiniPipelineTaskConfiguration ttsTaskConfig = pipeline.PipelineConfigurations[2].Configurations[0];
+```
+
+Here, as we specified the expected source and target languages for each task in the pipeline, it is very likely that the `Configurations`
+array in `PipelineConfigurations` will only contain one `BhashiniPipelineTaskConfiguration` object. This may not always be the case, so
+it is recommended to check the array of configurations for the desired model(s). The order of `PipelineConfigurations` is based
+on the order of the tasks array in the input for `ConfigurePipeline`.
+
+You can also use the following shortcuts to get the task configs.
+
+```csharp
+BhashiniPipelineTaskConfiguration sttTaskConfig = pipeline.SpeechToTextConfiguration.First;
+BhashiniPipelineTaskConfiguration translateTaskConfig = pipeline.TranslateConfiguration.First;
+BhashiniPipelineTaskConfiguration ttsTaskConfig = pipeline.TextToSpeechConfiguration.First;
+```
+
+`SpeechToTextConfiguration`, `TranslateConfiguration` and `TextToSpeechConfiguration` will get the first config matching the task type in
+the response and `First` gets the first available `BhashiniPipelineTaskConfiguration` from its `Configurations` array.
+
+## Computation
+
+Now that we have the inference data and pipelines configured, we can go straight into computation.
+
+### Code
+
+```csharp
+using Uralstech.UBhashini.Data.Compute;
+
+// The below code records a 10 seconds audio clip from the device microphone.
+
+int sampleRate = AudioSettings.GetConfiguration().sampleRate;
+AudioClip audioInput = Microphone.Start(string.Empty, false, 10, sampleRate);
+
+while (Microphone.IsRecording(string.Empty))
+    await Task.Yield();
+
+if (!TryGetComponent(out AudioSource audioSource))
+    audioSource = gameObject.AddComponent<AudioSource>();
+
+// Now, we send the clip to Bhashini.
+
+BhashiniComputeResponse computedResults = await BhashiniManager.Instance.ComputeOnPipeline(endpoint,
+    new BhashiniInputData(audioInput),
+    sttTaskConfig.ToSpeechToTextTask(sampleRate: sampleRate),
+    translateTaskConfig.ToTranslateTask(),
+    ttsTaskConfig.ToTextToSpeechTask()
+);
+
+audioSource.PlayOneShot(await computedResults.GetTextToSpeechResult());
+```
+
+`BhashiniInputData` has two constructors:
+- `BhashiniInputData(string text = null, string audio = null)`
+    - Here, `text` is input for translation and TTS requests, and `audio` is base64-encoded audio for STT requests. Only provide one.
+- `BhashiniInputData(AudioClip audio, BhashiniAudioFormat audioFormat = BhashiniAudioFormat.Wav)`
+    - This constructor allows you to directly provide an `AudioClip` as input, but requires the `Utilities.Encoder.Wav` or
+    `Utilities.Audio` packages, based on the chosen encoding.
+
+Also, `ToSpeechToTextTask` takes an optional `sampleRate` argument. By default, it is 44100, but make sure it matches with your audio data.
+
+`BhashiniComputeResponse` contains three utility functions to help extract the actual text or audio response:
+- `GetSpeechToTextResult`
+- `GetTranslateResult` and
+- `GetTextToSpeechResult`
+
+You should call them based on the last task in the pipeline's task list. If your pipeline's last task is STT, use `GetSpeechToTextResult`.
+If the last task is translate, use `GetTranslateResult`. If it's TTS, use `GetTextToSpeechResult`.
+
+`ComputeOnPipeline` and `GetTextToSpeechResult` will throw `BhashiniAudioIOException` errors if they encounter an unsupported format.
+
+---
+
+And that's it! You've learnt how to use the Bhashini API in Unity!
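
Review note on QuickStart.md: the guide recommends wrapping requests in try-catch blocks and checking the `Configurations` array before indexing it, but shows neither. A minimal sketch of both follows, reusing only calls that appear in the diff above. It rests on several assumptions that the diff does not confirm: that the exception types are visible through the imported namespaces, that `PipelineConfigurations` and `Configurations` are arrays exposing `Length`, and that the code runs inside a `MonoBehaviour`. The class name `PipelineSetupExample` is hypothetical.

```csharp
using UnityEngine;
using Uralstech.UBhashini;
using Uralstech.UBhashini.Data;
using Uralstech.UBhashini.Data.Pipeline;
// Assumption: BhashiniRequestException is reachable through the namespaces above.
// If it lives in another namespace, add the matching using directive.

// Hypothetical example component, not part of the UBhashini package.
public class PipelineSetupExample : MonoBehaviour
{
    private async void Start()
    {
        try
        {
            // Same request as in the QuickStart: English speech in, Hindi speech out.
            BhashiniPipelineResponse pipeline = await BhashiniManager.Instance.ConfigurePipeline(
                new BhashiniPipelineRequestTask(BhashiniTask.SpeechToText, "en"),
                new BhashiniPipelineRequestTask(BhashiniTask.Translation, "en", "hi"),
                new BhashiniPipelineRequestTask(BhashiniTask.TextToSpeech, "hi")
            );

            // Assumption: both collections are arrays, so Length is available.
            // Check before indexing instead of assuming one configuration per task.
            if (pipeline.PipelineConfigurations.Length < 3)
            {
                Debug.LogWarning("Fewer task configurations were returned than tasks requested.");
                return;
            }

            for (int i = 0; i < 3; i++)
            {
                if (pipeline.PipelineConfigurations[i].Configurations.Length == 0)
                {
                    Debug.LogWarning($"Task {i} has no available configuration.");
                    return;
                }
            }

            BhashiniPipelineTaskConfiguration sttTaskConfig = pipeline.PipelineConfigurations[0].Configurations[0];
            Debug.Log($"Pipeline configured. First speech-to-text configuration: {sttTaskConfig}");
        }
        catch (BhashiniRequestException exception)
        {
            // The QuickStart states that failed requests throw this exception type.
            Debug.LogError($"Bhashini request failed: {exception.Message}");
        }
    }
}
```

The same pattern would apply to `ComputeOnPipeline`, with an additional catch for `BhashiniAudioIOException`, since the guide says that call can throw it on unsupported audio formats.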
