LLMs are built on a class of deep learning architectures called transformer networks.
Why the transformer architecture is well suited for LLMs:
- Positional encoding: positional encoding embeds the position at which each token occurs within a given sequence. Instead of feeding the words of a sentence into the neural network strictly one after another, positional encoding lets the words be fed in non-sequentially (in parallel) while their order is still preserved (see the first sketch after this list).
- Self-attention: self-attention assigns a weight to each part of the input while processing it. This weight signifies how important that part of the input is in the context of the rest of the input (see the second sketch after this list).
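A minimal NumPy sketch of the sinusoidal positional encoding from the original transformer paper; the sequence length and model dimension below are arbitrary illustration values.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encoding: each position gets a unique vector."""
    positions = np.arange(seq_len)[:, None]                # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                     # (1, d_model)
    # Each pair of dimensions uses a different frequency.
    angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                        # (seq_len, d_model)
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])             # even dims -> sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])             # odd dims  -> cosine
    return encoding

# The encoding is simply added to the token embeddings, so the model still
# "sees" word order even though all tokens are processed in parallel.
token_embeddings = np.random.randn(8, 16)                   # 8 tokens, d_model = 16
inputs = token_embeddings + sinusoidal_positional_encoding(8, 16)
```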
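And a minimal NumPy sketch of single-head scaled dot-product self-attention; the projection matrices here are random stand-ins for learned weights.

```python
import numpy as np

def scaled_dot_product_attention(x, w_q, w_k, w_v):
    """Single-head self-attention: every token attends to every other token."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v                  # queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])              # pairwise token similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax -> attention weights
    return weights @ v                                    # weighted mix of value vectors

d_model = 16
x = np.random.randn(8, d_model)                           # 8 token embeddings
w_q, w_k, w_v = (np.random.randn(d_model, d_model) for _ in range(3))
out = scaled_dot_product_attention(x, w_q, w_k, w_v)      # shape (8, d_model)
```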
Main use-cases:
- Text generation (e.g., story writing).
- Summarization.
- Translation.
- Classification.
- Chatbots.
How do they work? LLMs are usually trained through unsupervised (self-supervised) learning: the model finds previously unknown patterns in raw data on its own, which removes the need for manual data labelling (see the sketch below).
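A tiny sketch of why no labels are needed: in causal language modelling the "labels" are just the input tokens shifted by one position, so raw text supervises itself. The token IDs below are made up for illustration.

```python
# Hypothetical token IDs for the sentence "the cat sat on the mat".
tokens = [12, 98, 405, 7, 12, 333]

# Inputs and targets come from the same sequence, shifted by one position.
inputs  = tokens[:-1]   # [12, 98, 405, 7, 12]
targets = tokens[1:]    # [98, 405, 7, 12, 333]

for step, target in enumerate(targets):
    context = tokens[:step + 1]
    # In practice the model is trained with a cross-entropy loss to
    # predict `target` given `context`.
    print(f"context {context} -> predict {target}")
```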
Foundation models: models that serve multiple use-cases without requiring task-specific training to solve a particular task.
The ability of a foundation model to generate text for a wide variety of purposes without much instruction or task-specific training is called zero-shot learning (see the sketch below).
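A minimal sketch of zero-shot prompting, assuming the Hugging Face transformers library is installed. The model name is only an example (Llama 2 chat weights are gated and need access approval); any instruction-following causal LM would do.

```python
from transformers import pipeline

# Example model name only; swap in any instruction-tuned causal LM you have access to.
generator = pipeline("text-generation", model="meta-llama/Llama-2-7b-chat-hf")

# Zero-shot: the task is described purely in the prompt, with no examples
# and no task-specific training.
prompt = (
    "Classify the sentiment of this review as positive or negative:\n"
    "'The battery dies within an hour.'\n"
    "Sentiment:"
)
print(generator(prompt, max_new_tokens=10)[0]["generated_text"])
```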
To make the model behave in a certain way, or to customize it for higher accuracy, there are several techniques. A few of them (an adapter sketch follows the list):
- Prompt tuning.
- Fine-tuning.
- Adapters.
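A minimal PyTorch sketch of the adapter idea: a small bottleneck module with a residual connection is inserted into an otherwise frozen model, so only a handful of new parameters are trained. The dimensions and placement are illustrative, not tied to any specific library.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: a small trainable module added to a frozen LLM layer."""
    def __init__(self, d_model: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)   # project down to a small dim
        self.up = nn.Linear(bottleneck, d_model)     # project back up
        self.act = nn.ReLU()

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen model's behaviour as the baseline.
        return hidden + self.up(self.act(self.down(hidden)))

# Only the adapter's few parameters are trained; the base LLM stays frozen,
# which is far cheaper than full fine-tuning.
adapter = Adapter(d_model=768)
hidden_states = torch.randn(1, 10, 768)              # (batch, seq_len, d_model)
adapted = adapter(hidden_states)
```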
Open-source and other notable models:
- Llama 2 (open weights).
- Claude 2 (proprietary, not open source).
- Vicuna.
- Mistral, from Mistral AI (released recently).