A comprehensive evaluation platform for AI agents powered by NVIDIA's AgentIQ framework. This dashboard enables detailed analysis and comparison of agent performance across multiple dimensions, including RAG capabilities, workflow efficiency, and decision-making accuracy.
**🤖 Advanced Agent Evaluation**
- RAG (Retrieval-Augmented Generation) performance metrics
- Multi-step reasoning assessment
- Workflow efficiency analysis
- Decision-making accuracy tracking
**📊 Interactive Dashboard**
- Real-time evaluation monitoring
- Comparative performance visualization
- Detailed agent behavior analysis
- Resource utilization insights
**🔍 Comprehensive Metrics**
- Agent accuracy and reliability
- Response quality assessment
- Processing efficiency metrics
- Resource optimization tracking
**🛠 Flexible Configuration**
- Support for multiple agent architectures
- Customizable evaluation scenarios
- Extensible testing frameworks
- Configurable performance thresholds
The main interface provides easy access to all evaluation features and real-time monitoring capabilities.
Compare multiple models side by side across key metrics (a minimal comparison sketch follows the list):
- Accuracy rates across different models
- Response time analysis
- Token usage patterns
- Overall performance trends
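As a rough illustration of how such a comparison can be assembled (not the dashboard's actual code; the model names and column names below are hypothetical), per-model results can be collected into a pandas DataFrame and ranked:

```python
import pandas as pd

# Hypothetical per-model results; the real dashboard derives these
# from AgentIQ evaluation runs.
results = pd.DataFrame([
    {"model": "llama-3.1-8b",  "accuracy": 0.87, "avg_latency_s": 1.2, "tokens_per_reply": 410},
    {"model": "llama-3.1-70b", "accuracy": 0.93, "avg_latency_s": 3.8, "tokens_per_reply": 460},
    {"model": "mistral-7b",    "accuracy": 0.84, "avg_latency_s": 1.0, "tokens_per_reply": 390},
])

# Rank models by accuracy (descending), breaking ties by latency (ascending).
comparison = results.sort_values(["accuracy", "avg_latency_s"],
                                 ascending=[False, True])
print(comparison.to_string(index=False))
```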
In-depth analysis of individual model performance, including:
- Confusion matrix visualization
- Response length distribution
- Prediction breakdown
- Sample predictions and errors
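As a minimal sketch of the confusion-matrix piece (the labels here are hypothetical, and the dashboard's own implementation may differ), scikit-learn can compute the matrix directly from labeled predictions:

```python
from sklearn.metrics import confusion_matrix
import pandas as pd

# Hypothetical gold labels and model predictions for a 3-class task.
y_true = ["spam", "ham", "spam", "promo", "ham", "promo"]
y_pred = ["spam", "ham", "promo", "promo", "ham", "spam"]

labels = sorted(set(y_true))
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Rows are true labels, columns are predicted labels.
print(pd.DataFrame(cm, index=labels, columns=labels))
```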
Detailed analysis of Retrieval-Augmented Generation capabilities:
- **RAG Accuracy**: how accurate the model's responses are when using retrieved information
- **RAG Groundedness**: how well responses are grounded in the retrieved information (a toy scoring sketch follows the list)
- **RAG Relevance**: how relevant the responses are to the input queries
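There is no single formula behind these scores; production scorers typically use an LLM judge or an entailment model. Purely as illustration, groundedness can be roughly approximated by token overlap between the response and the retrieved context:

```python
def groundedness_score(response: str, retrieved_context: str) -> float:
    """Toy heuristic: fraction of response tokens that appear in the
    retrieved context. Illustration only; not the evaluator's actual metric."""
    response_tokens = set(response.lower().split())
    context_tokens = set(retrieved_context.lower().split())
    if not response_tokens:
        return 0.0
    return len(response_tokens & context_tokens) / len(response_tokens)

print(groundedness_score(
    "AgentIQ quickly evaluates agent workflows",
    "The AgentIQ framework evaluates and profiles agent workflows",
))  # -> 0.8 (4 of 5 response tokens appear in the context)
```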
Comprehensive error analysis and performance insights:
- Error pattern distribution
- Sample error cases
- Performance optimization opportunities
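One simple way to surface error patterns (a sketch with hypothetical labels, not the evaluator's actual logic) is to count expected-vs-predicted confusions among failed cases:

```python
from collections import Counter

# Hypothetical (expected, predicted) pairs for failed evaluations.
failures = [
    ("spam", "ham"), ("spam", "ham"), ("promo", "spam"),
    ("ham", "spam"), ("spam", "ham"),
]

# Count each expected -> predicted confusion to find dominant patterns.
pattern_counts = Counter(f"{exp} -> {pred}" for exp, pred in failures)
for pattern, count in pattern_counts.most_common():
    print(f"{pattern}: {count}")
```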
Key metrics tracked for each model (an illustrative record sketch follows the list):
- **Total Predictions**: number of evaluations performed
- **Correct Predictions**: count of correct answers
- **Accuracy Percentage**: success rate
- **Average Response Length**: output size
- **Token Usage Statistics**: resource utilization
- **Processing Time**: latency analysis
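A per-model summary along these lines might be represented as follows; the field names are assumptions for illustration, not the project's actual schema:

```python
from dataclasses import dataclass

@dataclass
class ModelMetrics:
    """Illustrative per-model evaluation summary (hypothetical schema)."""
    model_name: str
    total_predictions: int
    correct_predictions: int
    avg_response_length: float   # tokens or characters per reply
    total_tokens_used: int
    avg_processing_time_s: float

    @property
    def accuracy_pct(self) -> float:
        # Guard against division by zero when no evaluations have run yet.
        if self.total_predictions == 0:
            return 0.0
        return 100.0 * self.correct_predictions / self.total_predictions

m = ModelMetrics("llama-3.1-8b", 200, 174, 412.5, 96_000, 1.3)
print(f"{m.model_name}: {m.accuracy_pct:.1f}% accurate")
```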
Prerequisites:

- Python 3.8+ (3.10+ recommended)
- NVIDIA AI Endpoints API key
- AgentIQ framework installation (see below)
Clone the repository and set up a virtual environment:

```bash
git clone https://github.com/ahsanblock/NVIDIA-AgentIQ-Agents-Evaluator.git
cd NVIDIA-AgentIQ-Agents-Evaluator

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```
Then install AgentIQ in an adjacent directory:

```bash
cd ..
git clone https://github.com/NVIDIA/AgentIQ.git
cd AgentIQ

# Install AgentIQ in its own virtual environment
python -m venv .venv
source .venv/bin/activate
pip install -e .

# Return to the evaluator
cd ../NVIDIA-AgentIQ-Agents-Evaluator
```
Set the required environment variables:

```bash
export AGENTIQ_PATH=/path/to/your/AgentIQ
export NVIDIA_API_KEY=your-api-key-here
```
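Before launching, it can help to confirm both variables are set. This check is just a convenience sketch, not part of the project:

```python
import os
import sys

# Fail fast if the required configuration is missing.
missing = [var for var in ("AGENTIQ_PATH", "NVIDIA_API_KEY")
           if not os.environ.get(var)]
if missing:
    sys.exit(f"Missing required environment variables: {', '.join(missing)}")
print("Environment looks good.")
```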
Start the evaluation dashboard:

```bash
streamlit run app.py
```

Then open http://localhost:8501 in your browser.
**RAG Capabilities**
- Accuracy in information retrieval
- Response groundedness
- Context relevance
- Source utilization
**Workflow Efficiency**
- Task completion rates
- Processing time analysis
- Resource utilization
- Optimization metrics
**Decision Making**
- Response accuracy
- Reasoning quality
- Error handling
- Edge case management
Add your own evaluation scenarios as CSV files with the following structure (a loading sketch follows):

```csv
body,label,subject
"Content to evaluate...",category,"Subject line"
```
We welcome contributions! Please feel free to submit pull requests or open issues for improvements.
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
- NVIDIA for providing the AgentIQ framework
- Contributors to the open-source AI community
- All developers and researchers advancing agent technology
*Advancing AI Agent Evaluation and Analysis*