black-box-benchmarking

SCARF (System for Comprehensive Assessment of RAG Frameworks) is a modular evaluation framework for benchmarking deployed Retrieval-Augmented Generation (RAG) applications. It offers end-to-end, black-box assessment across multiple configurations and supports automated testing with several vector databases and LLMs.

python nlp open-source benchmarking machine-learning ai research-tool evaluation-framework rag black-box-benchmarking llm

Updated Apr 17, 2025
Python

thomasWeise / BBDOB_W_Model

The W-Model, a tunable Black-Box Discrete Optimization Benchmarking (BB-DOB) problem, implemented for the BB-DOB@GECCO Workshop.

benchmark discrete-mathematics noise local-search experiments epistasis evolutionary-algorithm multi-objective deterministic random-walk fitness-landscape combinatorial-optimization neutrality black-box-benchmarking non-separability multi-objectivity ruggedness benchmark-problem hill-climber

Updated Oct 14, 2020
Java

pinouche / bbob_2009

Functions for the BBOB 2009 optimization challenge (https://coco.gforge.inria.fr/).

coco rastrigin black-box-benchmarking bbob rosenbrock

Updated Jan 9, 2020
Jupyter Notebook

Here are 7 public repositories matching this topic...

airbnb / artificial-adversary

optuna / kurobako

optuna / kurobako-py

sile / kurobako-go

Eustema-S-p-A / SCARF

thomasWeise / BBDOB_W_Model

pinouche / bbob_2009
