Training / Fine-tuning Pipeline

A training pipeline for fine-tuning an LLM on a proprietary Q&A dataset and storing the resulting model in a model registry.

Table of Contents

  • 1. Motivation
  • 2. Install
  • 3. Usage

1. Motivation

The best way to specialize an LLM for your task is to fine-tune it on a small dataset tied to your business use case.

In this case, we use the finance dataset generated with the q_and_a_dataset_generator to specialize the LLM in answering investing questions.

2. Install

2.1. Dependencies

Main dependencies you have to install yourself:

  • Python 3.10
  • Poetry 1.5.1
  • GNU Make 4.3
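
You can check that the expected versions are available with:

python --version
poetry --version
make --version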

Installing all the other dependencies is as easy as running:

make install

For development, run:

make install_dev
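
Both targets presumably wrap Poetry; a rough manual equivalent, assuming a standard pyproject.toml at the repository root, is:

poetry install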

Prepare credentials:

cp .env.example .env

-> and fill in the .env file with your credentials.
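
The exact variable names are listed in .env.example. Purely as a hypothetical illustration, each entry follows the usual KEY=value format:

EXAMPLE_API_KEY=<your-key>   # hypothetical name; use the keys from .env.example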

2.2. Beam

Optional step, needed only if you want to use Beam.

-> Create a Beam account & configure it.

Afterward, upload the dataset to a Beam volume:

make upload_dataset_to_beam

3. Usage

3.1. Train

Local

For debugging, or to check that everything works, run the following to train the model on a small subset of the dataset:

make dev_train_local

For training on the whole dataset, run the following:

make train_local
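
To give a sense of what a training run does under the hood, here is a minimal, self-contained LoRA fine-tuning sketch built on Hugging Face transformers, peft, and datasets. The base model, dataset path, field names, and hyperparameters are illustrative assumptions, not values taken from this repo; the dev_ targets above would simply train on a small slice of the same data (e.g. dataset.select(range(32))).

from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Small stand-in base model so the sketch runs anywhere; the repo fine-tunes a larger LLM.
BASE_MODEL = "gpt2"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# LoRA trains a small set of adapter weights instead of the full model.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

# Hypothetical path and field names; the real Q&A data comes from q_and_a_dataset_generator.
dataset = load_dataset("json", data_files="dataset/qa_dataset.json", split="train")

def tokenize(sample):
    text = f"Question: {sample['question']}\nAnswer: {sample['answer']}"
    return tokenizer(text, truncation=True, max_length=512)

dataset = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="output", num_train_epochs=1, per_device_train_batch_size=1),
    train_dataset=dataset,
    # Causal LM collator: labels are the input ids, shifted inside the model.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("output")  # the real pipeline registers the weights in a model registry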

Using Beam

As with local training, for debugging or testing, run:

make dev_train_beam

For training on the whole dataset, run:

make train_beam
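
On Beam, the same training function is typically wrapped in an app definition that declares the runtime and the attached dataset volume. The sketch below assumes the legacy decorator-based beam-sdk v0 interface (names and resources are hypothetical, and newer Beam SDK versions expose a different API), so treat it as a shape, not this repo's actual deployment:

from beam import App, Image, Runtime, Volume

app = App(
    name="train_qa",  # hypothetical app name
    runtime=Runtime(
        cpu=4,
        memory="32Gi",
        gpu="A10G",
        image=Image(python_version="python3.10", python_packages="requirements.txt"),
    ),
    # The volume populated by `make upload_dataset_to_beam`.
    volumes=[Volume(name="qa_dataset", path="./qa_dataset")],
)

@app.run()
def train():
    # Same fine-tuning logic as the local run, reading the dataset from the volume.
    ...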

3.2. Inference

Local

Testing or debugging:

make dev_infer_local

For inference on the whole dataset:

make infer_local
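
As a sketch of what an inference run involves, assuming the LoRA adapter saved by the training sketch above (the output path, base model, and prompt format are illustrative assumptions):

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the fine-tuned adapter from the (hypothetical) local output directory;
# the real pipeline would pull the weights from the model registry instead.
model = AutoPeftModelForCausalLM.from_pretrained("output")
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # same base model as the training sketch

prompt = "Question: Is a total-market index fund a reasonable long-term investment?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))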

Using Beam

Testing or debugging:

make dev_infer_beam

For inference on the whole dataset:

make infer_beam

3.3. PEP8 Linting & Formatting

Check the code for linting issues:

make lint_check

Fix linting issues (some issues can't be fixed automatically, so you might need to resolve them manually):

make lint_fix

Check the code for formatting issues:

make format_check

Fix the code for formatting issues:

make format_fix
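
These targets presumably wrap standard Python tooling. Purely as an illustration, assuming the Makefile uses ruff and black (check the Makefile for the actual tools), the rough equivalents would be:

poetry run ruff check .          # lint check
poetry run ruff check . --fix    # lint fix
poetry run black --check .       # format check
poetry run black .               # format fix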