This repository contains a set of predefined validation notebooks, based on best practices, to help you validate your models, along with metadata and configuration files to streamline the process. You can download prepopulated configuration files (JSON) from the Vectice UI for the model you would like to validate.
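The snippet below is only a minimal sketch of how such a configuration file might be read in Python; the file name and keys are hypothetical placeholders, not the actual schema exported by Vectice.

```python
import json

# Load a prepopulated configuration file downloaded from the Vectice UI.
# The file name and keys below are hypothetical examples, not the actual schema.
with open("validation_config.json") as f:
    config = json.load(f)

print(config.get("model_name"), config.get("dataset_path"))  # hypothetical keys
```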
### Identify which validation suite to run

This repository is structured to guide you towards the appropriate validation suite based on your specific use case. The folders are organized by validation category, and each category contains specialized test suites to assess different aspects of your model.
### Select the Appropriate Test Suite

Navigate through the folder structure to select the most relevant test suite for your needs. Each test suite is designed for a specific validation task, such as Classification, Explainability, Resilience, Robustness, Model Performance, PD Models, or LLM Use Cases.
### Download and Customize as Needed

Once you’ve identified the correct test suite, download the corresponding notebook. If the default notebook doesn’t fully meet your needs, you can modify the provided tests or create your own. The repository also includes example notebooks to help you customize your validation tests.
The repository is structured around key validation categories, with each category containing specialized test suites to cover different validation needs. Below is a guide to help you choose the right suite based on your use case:
| Use Case | Validation Category | Folder | Test Suites Available |
|---|---|---|---|
| Validating model predictions for classification | Classification | classification | - classification_notebook.ipynb: Covers accuracy, precision, recall, F1-score, AUC. |
| Ensuring model predictions are interpretable | Explainability | explainability | - shap_explainability_notebook.ipynb: Uses SHAP for model interpretation.<br>- lime_explainability.ipynb: Uses LIME for model interpretation. |
| Testing model robustness to input variations | Robustness | robustness | - robustness_notebook.ipynb: Evaluates model sensitivity to small input changes. |
| Assessing model performance under varied conditions | Resilience | resilience | - resilience_notebook.ipynb: Measures how well the model maintains performance under stress tests. |
| Evaluating overall model performance | Model Performance | performance | - performance_notebook.ipynb: Provides an overview of standard performance metrics. |
| Validating Probability of Default (PD) models | PD Models | pd_models | - pd_validation_notebook.ipynb: Tests focused on validating probability of default predictions. |
| Validating Large Language Models (LLMs) | LLM Use Cases | llm_use_cases | - llm_judge_notebook.ipynb: Uses Giskard to assess the LLM’s ability to judge the accuracy and relevance of its outputs.<br>- llm_evaluation_notebook.ipynb: Focuses on evaluating LLM performance on specific tasks like summarization, translation, and question answering.<br>- llm_bias_detection_notebook.ipynb: Detects and measures bias in LLM outputs. |
### Classification

Contains test suites focused on evaluating classification model metrics like accuracy, precision, recall, F1-score, and AUC.
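For illustration only (this is not code from the notebook), a minimal scikit-learn sketch of the metrics this suite covers, using a toy model and dataset as placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score, roc_auc_score
)

# Toy data and model standing in for your own model and test set.
X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:, 1]  # probability of the positive class

print({
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "auc": roc_auc_score(y_test, y_proba),
})
```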
### Explainability

Focuses on tests that ensure the model's predictions can be interpreted. Includes notebooks using SHAP and LIME for model explainability.
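For example, a minimal LIME sketch on a toy model (the SHAP notebook follows the same idea with SHAP values); everything below is a placeholder, not code from the notebooks:

```python
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy model and data standing in for the model under validation.
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# LIME fits a simple local surrogate around one prediction to explain it.
explainer = LimeTabularExplainer(
    X, mode="classification", feature_names=[f"f{i}" for i in range(X.shape[1])]
)
explanation = explainer.explain_instance(X[0], model.predict_proba, num_features=4)
print(explanation.as_list())  # (feature condition, weight) pairs for this prediction
```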
### Robustness

Includes test suites designed to evaluate how sensitive the model is to small changes in the input data.
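A minimal sketch of the underlying idea, assuming a toy model and a simple Gaussian perturbation (not the repository's actual tests):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy model standing in for the model under validation.
X, y = make_classification(n_samples=1000, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Perturb inputs with small Gaussian noise and check how often predictions change.
rng = np.random.default_rng(0)
noise_scale = 0.05 * X.std(axis=0)  # "small" relative to each feature's spread
X_noisy = X + rng.normal(0, noise_scale, size=X.shape)

agreement = np.mean(model.predict(X) == model.predict(X_noisy))
print(f"Prediction agreement under small perturbations: {agreement:.2%}")
```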
### Resilience

Contains test suites that assess how well the model maintains its performance under varied conditions, such as different data distributions or adversarial scenarios.
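A minimal sketch of this kind of stress test, assuming a toy model and a simulated covariate shift (not the repository's actual tests):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Toy model standing in for the model under validation.
X, y = make_classification(n_samples=2000, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X[:1000], y[:1000])

X_test, y_test = X[1000:], y[1000:]
baseline = accuracy_score(y_test, model.predict(X_test))

# Simulate a covariate shift by scaling and shifting the test features,
# then compare performance against the baseline.
X_shifted = X_test * 1.5 + 0.5
stressed = accuracy_score(y_test, model.predict(X_shifted))
print(f"baseline accuracy: {baseline:.3f}, under covariate shift: {stressed:.3f}")
```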
### Model Performance

Provides a comprehensive evaluation of the model’s performance across multiple metrics, offering a holistic view of its effectiveness.
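For illustration, a minimal sketch using cross-validation across several scikit-learn metrics on a toy model; it is not the notebook's implementation:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Toy model and data standing in for the model under validation.
X, y = make_classification(n_samples=1000, random_state=0)
model = LogisticRegression(max_iter=1000)

# Cross-validated scores across several metrics give a broader view than a
# single train/test split on a single metric.
metrics = ["accuracy", "precision", "recall", "f1", "roc_auc"]
scores = cross_validate(model, X, y, cv=5, scoring=metrics)
for name in metrics:
    print(f"{name}: {scores['test_' + name].mean():.3f}")
```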
### PD Models

This folder is dedicated to validating Probability of Default (PD) models. It includes tests to assess the calibration, discriminatory power, and overall reliability of PD predictions.
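A minimal sketch of two common PD checks, discriminatory power (AUC/Gini) and calibration, on a toy model; it is not the notebook's implementation:

```python
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Toy PD-style model: predicted probabilities of default for an imbalanced binary outcome.
X, y = make_classification(n_samples=2000, weights=[0.9], random_state=0)
model = LogisticRegression(max_iter=1000).fit(X[:1500], y[:1500])
pd_scores = model.predict_proba(X[1500:])[:, 1]
y_test = y[1500:]

# Discriminatory power: AUC and the Gini coefficient (Gini = 2*AUC - 1).
auc = roc_auc_score(y_test, pd_scores)
print(f"AUC: {auc:.3f}, Gini: {2 * auc - 1:.3f}")

# Calibration: compare mean predicted PDs with observed default rates per bin.
observed, predicted = calibration_curve(y_test, pd_scores, n_bins=10, strategy="quantile")
for p, o in zip(predicted, observed):
    print(f"mean predicted PD {p:.3f} vs observed default rate {o:.3f}")
```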
### LLM Use Cases

Contains notebooks specifically designed for validating Large Language Models (LLMs). The test suites include:

- LLM as Judge: Using the Giskard library to evaluate the LLM’s ability to judge the accuracy and relevance of its outputs.
- Task-Specific LLM Evaluation: Focused on assessing LLM performance on tasks such as summarization, translation, and question answering.
- LLM Bias Detection: A notebook dedicated to detecting and measuring bias in LLM outputs.
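To illustrate the LLM-as-judge idea without depending on the Giskard API, here is a minimal generic sketch; `call_llm`, the prompt, and the scoring scale are hypothetical placeholders:

```python
def call_llm(prompt: str) -> str:
    # Placeholder: replace with a real call to your LLM provider.
    return "5"

JUDGE_PROMPT = """You are grading an answer for accuracy and relevance.
Question: {question}
Answer: {answer}
Reply with a single integer score from 1 (poor) to 5 (excellent)."""

def judge(question: str, answer: str) -> int:
    """Ask a judge LLM to score another model's answer."""
    reply = call_llm(JUDGE_PROMPT.format(question=question, answer=answer))
    return int(reply.strip().split()[0])  # naive parsing; harden before real use

# Example usage over a small evaluation set of (question, model_answer) pairs.
eval_set = [("What is the capital of France?", "Paris is the capital of France.")]
scores = [judge(q, a) for q, a in eval_set]
print(f"mean judge score: {sum(scores) / len(scores):.2f}")
```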