Artificial Intelligence Vulnerability Scoring System (AIVSS)

1. Introduction

The Artificial Intelligence Vulnerability Scoring System (AIVSS) provides a standardized, comprehensive framework for evaluating and quantifying the security risks associated with AI systems. This framework focuses particularly on Large Language Models (LLMs) and cloud-based deployments, while remaining applicable to a wide range of AI systems. AIVSS adapts traditional security vulnerability scoring concepts to the unique characteristics and challenges of AI, drawing insights from leading AI threat taxonomies and security standards. This document outlines the AIVSS framework, including detailed scoring rubrics, an implementation checklist, and considerations for environmental factors.

2. The Need for AIVSS

Traditional vulnerability scoring systems, such as the Common Vulnerability Scoring System (CVSS), are insufficient for addressing the unique security challenges posed by AI systems. These challenges include:

  • Adversarial Attacks: AI systems are vulnerable to adversarial attacks that manipulate model behavior through crafted inputs, a threat not adequately captured by traditional systems.
  • Model Degradation: AI models can degrade over time due to concept drift or data poisoning, impacting their accuracy and reliability.
  • Lifecycle Vulnerabilities: AI systems have complex lifecycles, from data collection and training to deployment and maintenance, each stage introducing potential vulnerabilities.
  • Ethical and Societal Impacts: AI systems can have significant ethical and societal implications, such as bias and discrimination, which are not considered in traditional security assessments.
  • Dynamic Nature of AI: AI systems are often dynamic and adaptive, making static scoring methods less effective.

AIVSS addresses these challenges by providing a comprehensive framework tailored to the specific security risks of AI.

3. Framework Components

AIVSS consists of the following key components:

3.1. Base Metrics

Base Metrics capture the fundamental characteristics of a vulnerability that are constant over time and across different environments.

  • Attack Vector (AV): Reflects the context by which vulnerability exploitation is possible.
    • Network (N): 0.85
    • Adjacent Network (A): 0.62
    • Local (L): 0.55
    • Physical (P): 0.2
  • Attack Complexity (AC): Measures the conditions beyond the attacker's control that must exist to exploit the vulnerability.
    • Low (L): 0.77
    • High (H): 0.44
  • Privileges Required (PR): Describes the level of privileges an attacker must possess before successfully exploiting the vulnerability.
    • None (N): 0.85
    • Low (L): 0.62
    • High (H): 0.27
  • User Interaction (UI): Captures the requirement for a user, other than the attacker, to participate in the successful compromise of the vulnerable component.
    • None (N): 0.85
    • Required (R): 0.62
  • Scope (S): Measures whether a vulnerability in one vulnerable component impacts resources in components beyond its security scope.
    • Unchanged (U): 1.0
    • Changed (C): 1.5
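
For teams that implement the scoring programmatically, the Base Metric values above can be captured in a simple lookup. This is an illustrative sketch; the dictionary name BASE_METRIC_VALUES is not defined by AIVSS.

# Illustrative lookup of the Base Metric values listed above (names are not prescribed by AIVSS).
BASE_METRIC_VALUES = {
    'AV': {'N': 0.85, 'A': 0.62, 'L': 0.55, 'P': 0.2},   # Attack Vector
    'AC': {'L': 0.77, 'H': 0.44},                        # Attack Complexity
    'PR': {'N': 0.85, 'L': 0.62, 'H': 0.27},             # Privileges Required
    'UI': {'N': 0.85, 'R': 0.62},                        # User Interaction
    'S':  {'U': 1.0, 'C': 1.5},                          # Scope
}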

3.2. AI-Specific Metrics

AI-Specific Metrics capture the unique vulnerabilities and risks associated with AI systems. These metrics are evaluated based on a detailed scoring rubric (provided in Section 4).

AISpecificMetrics = [MR × DS × EI × DC × AD × AA × LL × GV × CS] × ModelComplexityMultiplier
  • MR (Model Robustness): Assesses the system's resilience to adversarial attacks and model degradation.
  • DS (Data Sensitivity): Evaluates the risks associated with the confidentiality, integrity, and provenance of the data used by the AI system.
  • EI (Ethical Implications): Considers potential biases, transparency issues, accountability concerns, and societal impacts.
  • DC (Decision Criticality): Measures the potential consequences of incorrect or malicious decisions made by the AI system.
  • AD (Adaptability): Assesses the system's ability to adapt to evolving threats and maintain security over time.
  • AA (Adversarial Attack Surface): Evaluates the system's exposure to various adversarial attack techniques.
  • LL (Lifecycle Vulnerabilities): Considers security risks at different stages of the AI system's lifecycle.
  • GV (Governance and Validation): Assesses the presence and effectiveness of governance mechanisms and validation processes.
  • CS (Cloud Security Alliance LLM Taxonomy): Addresses specific threats to LLMs in cloud environments, as defined by the CSA LLM Threat Taxonomy.
  • ModelComplexityMultiplier: A factor that adjusts the AI-Specific Metrics score based on the complexity of the AI model (ranging from 1.0 for simple models to 1.5 for highly complex models).
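
The formula above reduces to a product of the nine metric scores, scaled by the complexity multiplier. The sketch below shows one way to express it; the function and argument names are illustrative, not part of the specification.

from math import prod

# Each of the nine metrics (MR, DS, EI, DC, AD, AA, LL, GV, CS) is a 0.0-1.0 score
# from the rubric in Section 4; the multiplier ranges from 1.0 to 1.5.
def ai_specific_metrics(scores, model_complexity_multiplier=1.0):
    metrics = ('MR', 'DS', 'EI', 'DC', 'AD', 'AA', 'LL', 'GV', 'CS')
    return prod(scores[m] for m in metrics) * model_complexity_multiplier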

3.3. Environmental Metrics

Environmental Metrics reflect the characteristics of the AI system's deployment environment that can influence the overall risk.

  • Confidentiality Requirement (CR): Measures the importance of maintaining the confidentiality of the data processed by the AI system.
  • Integrity Requirement (IR): Measures the importance of maintaining the integrity of the data and the AI system's outputs.
  • Availability Requirement (AR): Measures the importance of ensuring the AI system's availability for its intended purpose.
  • Societal Impact Requirement (SIR): Measures the importance of mitigating potential negative societal impacts of the AI system.

These are rated:

  • Not Defined (X): 1.0 (default, does not modify the score)
  • Low (L): 0.5
  • Medium (M): 1.0
  • High (H): 1.5
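
In code, these ratings reduce to a small lookup (the name ENV_REQUIREMENT_VALUES is illustrative):

# Values for CR, IR, AR, and SIR; 'X' (Not Defined) leaves the score unmodified.
ENV_REQUIREMENT_VALUES = {'X': 1.0, 'L': 0.5, 'M': 1.0, 'H': 1.5}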

3.4. Modified Base Metrics

These metrics are based on the Base Metrics but can be modified according to the specific environment:

  • Modified Attack Vector (MAV)
  • Modified Attack Complexity (MAC)
  • Modified Privileges Required (MPR)
  • Modified User Interaction (MUI)
  • Modified Scope (MS)

These metrics are rated the same way as Base Metrics, with the addition of:

  • Not Defined (X): Uses the unmodified Base Metric value.
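
A minimal sketch of that fallback rule, assuming the per-metric rating table from Section 3.1 is available (the helper name and signature are illustrative):

# Resolve a Modified Base Metric: 'X' (Not Defined) falls back to the unmodified Base Metric rating.
def resolve_modified_metric(modified_rating, base_rating, rating_values):
    # rating_values: the value table for one metric, e.g. BASE_METRIC_VALUES['AC'] from the earlier sketch
    rating = base_rating if modified_rating == 'X' else modified_rating
    return rating_values[rating]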

4. Detailed Scoring Rubric for AI-Specific Metrics (Higher Score = More Severe)

Each sub-category within the AI-Specific Metrics is scored on a scale of 0.0 to 1.0, with the following general interpretation:

  • 0.0: No Known Vulnerability: Indicates no known vulnerability or a formally proven resistance to the specific threat.
  • 0.1 - 0.3: Low Vulnerability: Indicates a low vulnerability with strong mitigation in place, but some minor weaknesses may still exist.
  • 0.4 - 0.6: Medium Vulnerability: Indicates a moderate vulnerability with some mitigation, but significant weaknesses remain.
  • 0.7 - 1.0: Critical/High Vulnerability: Indicates a severe vulnerability with little to no mitigation in place.
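
A small helper can map a sub-category score back to these interpretation bands (illustrative only):

# Map a 0.0-1.0 rubric score to the bands defined above.
def rubric_band(score):
    if score == 0.0:
        return 'No Known Vulnerability'
    if score <= 0.3:
        return 'Low Vulnerability'
    if score <= 0.6:
        return 'Medium Vulnerability'
    return 'Critical/High Vulnerability'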

MR (Model Robustness)

  • Evasion Resistance

    • 0.0: Formally verified robustness against a wide range of evasion attacks.
    • 0.1-0.3: Robust to most known evasion attacks, multiple defense mechanisms employed (e.g., adversarial training, input sanitization, certified robustness).
    • 0.4-0.6: Susceptible to some evasion attacks, basic adversarial training or input validation in place.
    • 0.7-1.0: Highly susceptible to common evasion attacks (e.g., FGSM, PGD). No or minimal defenses.
    • Examples:
      • 0.0: Model's robustness proven through formal methods.
      • 0.2: Model uses a combination of adversarial training, input filtering, and certified robustness techniques.
      • 0.5: Model trained with adversarial examples, but still vulnerable to more sophisticated attacks.
      • 0.8: Model easily fooled by adding small perturbations to input images.
  • Gradient Masking/Obfuscation

    • 0.0: Gradients are completely hidden or formally proven to be unrecoverable.
    • 0.1-0.3: Strong gradient masking techniques used (e.g., Shattered Gradients, Thermometer Encoding), making gradient-based attacks significantly more difficult.
    • 0.4-0.6: Basic gradient obfuscation methods employed (e.g., adding noise), but gradients can still be partially recovered.
    • 0.7-1.0: Gradients are easily accessible and interpretable, no masking techniques used.
    • Examples:
      • 0.0: Model uses homomorphic encryption or other methods to make gradients completely inaccessible.
      • 0.2: Model uses advanced techniques like shattered gradients to make gradient-based attacks computationally expensive.
      • 0.5: Some noise is added to gradients, but they still reveal information about the model.
      • 0.9: Model's gradients can be easily calculated and visualized.
  • Robustness Certification

    • 0.0: Formal robustness certification obtained from a reputable third-party organization.
    • 0.1-0.3: Rigorous robustness testing against a wide range of attacks and using multiple metrics (e.g., CLEVER, Robustness Gym).
    • 0.4-0.6: Basic robustness testing against a limited set of attacks or using simple metrics.
    • 0.7-1.0: No robustness testing performed.
    • Examples:
      • 0.0: Model certified by a recognized certification body for robustness against specific attack types.
      • 0.2: Model evaluated using a comprehensive robustness testing framework like Robustness Gym.
      • 0.5: Model tested against FGSM attacks with a limited range of perturbation budgets.
      • 0.8: No testing for robustness against adversarial examples.

DS (Data Sensitivity)

  • Data Confidentiality

    • 0.0: Data fully anonymized using techniques like differential privacy or homomorphic encryption.
    • 0.1-0.3: Strong encryption (e.g., AES-256) used for data at rest and in transit, strict access controls and key management practices in place.
    • 0.4-0.6: Sensitive data with basic access controls (e.g., passwords), but no encryption.
    • 0.7-1.0: Highly sensitive data (e.g., PII, financial data) stored or processed with no or minimal protection.
    • Examples:
      • 0.0: Data is fully anonymized and provably unlinkable to individuals.
      • 0.2: Data encrypted at rest and in transit, with strict access controls and key rotation policies.
      • 0.5: Data access restricted by user roles, but data is stored in plain text.
      • 0.9: Training data includes unencrypted PII accessible to all developers.
  • Data Integrity

    • 0.0: Data integrity formally verified using techniques like blockchain or Merkle trees.
    • 0.1-0.3: Strong integrity checks (e.g., digital signatures, cryptographic hashes) and tamper detection mechanisms in place.
    • 0.4-0.6: Basic integrity checks (e.g., checksums) used, but no tamper-proof mechanisms.
    • 0.7-1.0: No data integrity checks, data can be easily modified without detection.
    • Examples:
      • 0.0: Data is stored on a blockchain, ensuring immutability and tamper-proof integrity.
      • 0.2: Data is digitally signed, and any modification is detected and alerted.
      • 0.5: Checksums used to verify data integrity upon access.
      • 0.8: Data can be altered without any detection.
  • Data Provenance

    • 0.0: Data provenance formally verified and auditable, with mechanisms to ensure the authenticity and trustworthiness of the data source.
    • 0.1-0.3: Detailed data lineage tracked, including all transformations and processing steps, with a clear audit trail.
    • 0.4-0.6: Basic information about data sources available, but lineage is incomplete or unclear.
    • 0.7-1.0: No information about data origin, collection methods, or transformations.
    • Examples:
      • 0.0: Data provenance is cryptographically verified and tamper-proof.
      • 0.2: Full data lineage is tracked, including all processing steps and data owners.
      • 0.5: Data sources are documented, but the transformations applied are not clearly recorded.
      • 0.9: Origin and collection method of the data are unknown.

EI (Ethical Implications)

  • Bias and Discrimination

    • 0.0: System demonstrably fair and unbiased across different groups, with ongoing monitoring and auditing for bias.
    • 0.1-0.3: Rigorous fairness testing using multiple metrics (e.g., equal opportunity, predictive rate parity) and bias mitigation techniques applied (e.g., re-weighting, adversarial debiasing).
    • 0.4-0.6: Some awareness of potential bias, basic fairness metrics (e.g., demographic parity) monitored, but no active mitigation.
    • 0.7-1.0: High risk of discriminatory outcomes, no bias detection or mitigation methods used.
    • Examples:
      • 0.0: System's fairness is formally verified and continuously monitored.
      • 0.2: System is trained using techniques like adversarial debiasing and regularly audited for fairness.
      • 0.5: Fairness metrics are monitored, but no actions are taken to address identified biases.
      • 0.9: System consistently produces biased outputs against certain demographic groups.
  • Transparency and Explainability

    • 0.0: System's decision-making process is fully transparent and formally explainable, with clear causal relationships established.
    • 0.1-0.3: Highly explainable, system uses inherently interpretable models (e.g., decision trees) or provides reliable and comprehensive explanations for all decisions.
    • 0.4-0.6: Limited explainability, some post-hoc explanations (e.g., LIME, SHAP) can be generated, but they may not be reliable or comprehensive.
    • 0.7-1.0: Black-box system, no insight into decision-making process.
    • Examples:
      • 0.0: System's logic is fully transparent and can be formally verified.
      • 0.2: System uses an interpretable model or provides detailed and reliable explanations for each decision.
      • 0.5: Post-hoc explanations can be generated, but they are not always accurate or complete.
      • 0.8: No explanation provided for the system's decisions.
  • Accountability

    • 0.0: Full accountability with mechanisms for redress, remediation, and independent oversight.
    • 0.1-0.3: Clear accountability framework in place, with defined roles, responsibilities, and processes for addressing errors and disputes.
    • 0.4-0.6: Some responsibility assigned to developers or operators, but no formal accountability framework.
    • 0.7-1.0: No clear lines of accountability for system's actions or errors.
    • Examples:
      • 0.0: System has a formal accountability framework with mechanisms for independent audits and public reporting.
      • 0.2: Clear roles and responsibilities defined for development, deployment, and operation, with an incident response plan.
      • 0.5: Development team is generally responsible, but there are no clear procedures for handling errors.
      • 0.9: Unclear who is responsible when the system makes a mistake.
  • Societal Impact

    • 0.0: System designed to maximize positive societal impact and minimize negative consequences, with ongoing monitoring and engagement with affected communities.
    • 0.1-0.3: Thorough societal impact assessment conducted, considering a wide range of stakeholders and potential harms, with mitigation strategies in place.
    • 0.4-0.6: Some consideration of potential societal impacts, but no comprehensive assessment or proactive mitigation.
    • 0.7-1.0: High risk of negative societal impacts (e.g., job displacement, manipulation, erosion of trust), no assessment or mitigation.
    • Examples:
      • 0.0: System is designed with a strong ethical framework, promoting fairness, transparency, and societal well-being.
      • 0.2: A comprehensive societal impact assessment has been conducted, and mitigation strategies are in place.
      • 0.5: Developers acknowledge potential negative impacts but have not taken concrete steps to address them.
      • 0.8: System could be used for mass surveillance or to spread misinformation without any safeguards.

DC (Decision Criticality)

  • Safety-Critical

    • 0.0: System formally verified to meet safety-critical standards (e.g., ISO 26262 for automotive, IEC 62304 for medical devices).
    • 0.1-0.3: Rigorous safety testing performed, including edge cases and failure scenarios, with failsafe mechanisms and human oversight.
    • 0.4-0.6: Basic safety measures in place (e.g., some redundancy), but no rigorous safety testing or formal verification.
    • 0.7-1.0: System used in safety-critical applications (e.g., autonomous driving, medical diagnosis) without proper safety considerations or failsafe mechanisms.
    • Examples:
      • 0.0: System is certified to meet relevant safety standards for its application domain.
      • 0.2: System undergoes rigorous safety testing and has multiple failsafe mechanisms in place.
      • 0.5: System has some backup systems, but they have not been thoroughly tested.
      • 0.9: System used to control a critical function without any redundancy or failsafe mechanisms.
  • Financial Impact

    • 0.0: System designed to minimize financial risks, with real-time fraud prevention, anomaly detection, and comprehensive insurance coverage.
    • 0.1-0.3: Robust financial controls and fraud detection mechanisms in place, regular audits conducted to identify and mitigate financial risks.
    • 0.4-0.6: Some measures to mitigate financial risks (e.g., transaction limits), but no comprehensive risk assessment or fraud prevention mechanisms.
    • 0.7-1.0: High risk of significant financial loss due to system errors or malicious attacks, no safeguards in place.
    • Examples:
      • 0.0: System has multiple layers of financial controls, real-time fraud prevention, and insurance against financial losses.
      • 0.2: System uses advanced fraud detection algorithms and undergoes regular financial audits.
      • 0.5: System has some transaction limits and basic fraud monitoring.
      • 0.8: System errors could lead to large unauthorized transactions without any detection.
  • Reputational Damage

    • 0.0: System designed to minimize reputational risks, with ongoing monitoring of public perception, proactive engagement with stakeholders, and a robust crisis management plan.
    • 0.1-0.3: Reputational risk assessment conducted, considering various scenarios and stakeholders, with communication plans and mitigation strategies in place.
    • 0.4-0.6: Some awareness of reputational risks, limited monitoring of public perception, but no proactive measures to address negative publicity.
    • 0.7-1.0: High risk of severe reputational damage due to system errors, biases, or security breaches, no mitigation strategies.
    • Examples:
      • 0.0: System is designed to be transparent and ethical, minimizing the risk of reputational damage, and the company has a strong track record of responsible AI practices.
      • 0.2: A reputational risk assessment has been conducted, and a crisis communication plan is in place.
      • 0.5: Company monitors social media for negative comments but has no plan to address them.
      • 0.9: System errors or biases could lead to widespread public criticism and loss of trust.
  • Operational Disruption

    • 0.0: System designed for high availability and resilience, with real-time monitoring, automated recovery, and regular testing of failover mechanisms.
    • 0.1-0.3: Robust operational controls, including redundancy, failover mechanisms, and a comprehensive business continuity and disaster recovery plan.
    • 0.4-0.6: Some measures to mitigate operational risks (e.g., limited redundancy), but no comprehensive business continuity plan.
    • 0.7-1.0: High risk of significant operational disruption due to system failures or attacks, no backup systems or recovery plans.
    • Examples:
      • 0.0: System is designed for 24/7 availability with multiple layers of redundancy and automated recovery.
      • 0.2: System has a comprehensive business continuity plan that is regularly tested and updated.
      • 0.5: System has some redundant components, but failover procedures are not regularly tested.
      • 0.8: System failure could bring down critical business operations with no backup.

AD (Adaptability)

  • Continuous Monitoring

    • 0.0: Real-time monitoring with automated response to detected threats, including dynamic model adaptation and rollback capabilities.
    • 0.1-0.3: Comprehensive monitoring of system inputs, outputs, and internal states, with anomaly detection algorithms and automated alerts for suspicious activity.
    • 0.4-0.6: Basic monitoring in place (e.g., logging system outputs), but limited analysis and no automated alerts.
    • 0.7-1.0: No monitoring for adversarial attacks, anomalies, or performance degradation.
    • Examples:
      • 0.0: System has real-time intrusion detection and automated response capabilities.
      • 0.2: System uses a SIEM system to monitor for anomalies and generate alerts.
      • 0.5: System logs are stored but only analyzed manually on a periodic basis.
      • 0.9: No logs are collected, and no monitoring is performed.
  • Retraining Capabilities

    • 0.0: Continuous and automated retraining triggered by performance degradation, concept drift, or the availability of new data, with minimal human intervention.
    • 0.1-0.3: Automated retraining pipeline in place, allowing for regular updates with new data and model improvements.
    • 0.4-0.6: Manual retraining possible, but infrequent and time-consuming, with limited automation.
    • 0.7-1.0: No capability to retrain the model, or retraining requires significant manual effort and downtime.
    • Examples:
      • 0.0: Model continuously learns and adapts to new data and changing conditions.
      • 0.2: Model is automatically retrained on a regular schedule using an automated pipeline.
      • 0.5: Model can be retrained manually, but it requires significant effort and downtime.
      • 0.8: Model cannot be updated without rebuilding it from scratch.
  • Threat Intelligence Integration

    • 0.0: Proactive threat hunting based on threat intelligence, with automated analysis and correlation of threat data to identify and mitigate potential risks before they impact the system.
    • 0.1-0.3: Threat intelligence feeds integrated into security monitoring and response systems, providing automated alerts and updates on emerging threats.
    • 0.4-0.6: Basic threat intelligence used (e.g., manually reviewing threat reports), but not systematically integrated into security operations.
    • 0.7-1.0: No integration with threat intelligence feeds or other sources of security information.
    • Examples:
      • 0.0: System uses threat intelligence to proactively identify and mitigate potential vulnerabilities.
      • 0.2: System automatically ingests and analyzes threat intelligence feeds, generating alerts for relevant threats.
      • 0.5: Security team occasionally reviews threat intelligence reports but takes no specific actions.
      • 0.9: Security team is not aware of current threats to AI systems.
  • Adversarial Training

    • 0.0: Continuous adversarial training with evolving attack techniques, incorporating new attacks as they are discovered, and using formal verification methods to ensure robustness.
    • 0.1-0.3: Robust adversarial training against a wide range of attacks (e.g., PGD, C&W) with larger perturbation budgets, using multiple techniques (e.g., ensemble adversarial training, certified defenses).
    • 0.4-0.6: Basic adversarial training with a limited set of attack types (e.g., FGSM) and small perturbation budgets.
    • 0.7-1.0: No adversarial training used during model development.
    • Examples:
      • 0.0: Model undergoes continuous adversarial training and is formally verified for robustness against specific attack models.
      • 0.2: Model is trained using a combination of different adversarial training techniques and attack types.
      • 0.5: Model is trained with FGSM-generated adversarial examples.
      • 0.8: Model is not trained to be resistant to any adversarial examples.

AA (Adversarial Attack Surface)

  • Model Inversion

    • 0.0: Model provably resistant to model inversion attacks, with formal guarantees on the privacy of the training data.
    • 0.1-0.3: Strong defenses against model inversion, such as differential privacy or data sanitization techniques, significantly increasing the difficulty of reconstructing training data.
    • 0.4-0.6: Some measures to mitigate model inversion (e.g., limiting model output precision), but significant risks remain.
    • 0.7-1.0: High risk of model inversion attacks, sensitive training data can be easily reconstructed from model outputs or gradients.
    • Examples:
      • 0.0: Model is formally proven to be resistant to model inversion under specific attack models.
      • 0.2: Model is trained with differential privacy, providing strong guarantees against model inversion.
      • 0.5: Model's output is rounded or perturbed to make inversion more difficult, but some information may still be leaked.
      • 0.9: An attacker can easily reconstruct faces or other sensitive data from the model's outputs.
  • Model Extraction

    • 0.0: Model provably resistant to model extraction, with formal guarantees on the difficulty of creating a functional copy.
    • 0.1-0.3: Strong defenses against model extraction, such as anomaly detection on API queries, model watermarking, and legal agreements with users, making it significantly more difficult and costly to steal the model.
    • 0.4-0.6: Some measures to mitigate model extraction (e.g., rate limiting, watermarking), but a determined attacker can still succeed.
    • 0.7-1.0: High risk of model extraction, attackers can easily create a functional copy of the model by querying its API.
    • Examples:
      • 0.0: Model is designed to be resistant to model extraction, and its functionality cannot be replicated through black-box queries.
      • 0.2: Model uses watermarking and anomaly detection to detect and prevent extraction attempts.
      • 0.5: API access is rate-limited, but an attacker can still extract the model over a longer period.
      • 0.8: An attacker can create a copy of the model by making a large number of API calls.
  • Membership Inference

    • 0.0: Model provably resistant to membership inference attacks, with formal guarantees on the privacy of individual training data points.
    • 0.1-0.3: Strong defenses against membership inference, such as differential privacy or model stacking, significantly reducing the attacker's ability to infer membership.
    • 0.4-0.6: Some measures to mitigate membership inference (e.g., regularization, dropout), but significant risks remain.
    • 0.7-1.0: High risk of membership inference attacks, attackers can easily determine whether a specific data point was used in the model's training set.
    • Examples:
      • 0.0: Model is formally proven to be resistant to membership inference under specific attack models.
      • 0.2: Model is trained with differential privacy, providing strong protection against membership inference.
      • 0.5: Model uses regularization techniques that may reduce the risk of membership inference, but no formal guarantees.
      • 0.9: An attacker can easily determine if a particular individual's data was used to train the model.

LL (Lifecycle Vulnerabilities)

  • Development

    • 0.0: Secure development environment with formal verification of code, strict access controls, and continuous monitoring for security threats.
    • 0.1-0.3: Secure development lifecycle (SDL) practices followed, including code reviews, static analysis, and vulnerability scanning, with access controls on development resources.
    • 0.4-0.6: Basic security measures in the development environment (e.g., developer workstations have antivirus software), some secure coding guidelines, but no formal secure development lifecycle (SDL).
    • 0.7-1.0: Insecure development environment, no secure coding practices, no access controls on development resources.
    • Examples:
      • 0.0: Development environment is isolated and continuously monitored, with formal methods used to verify the security of critical code components.
      • 0.2: SDL practices are followed, including code reviews, static analysis, and vulnerability scanning, with access to code repositories restricted based on roles.
      • 0.5: Developers use company-provided laptops with basic security software, and some secure coding guidelines are in place.
      • 0.8: Developers work on personal laptops with no security controls, and code is stored in a public repository without access restrictions.
  • Training

    • 0.0: Secure and isolated training environment with formal verification of the training process, strict access controls, and continuous monitoring for intrusions and anomalies.
    • 0.1-0.3: Secure training environment with access controls, data encryption at rest and in transit, and regular security audits.
    • 0.4-0.6: Basic security measures in the training environment (e.g., training data stored on a password-protected server), but no encryption or strict access controls.
    • 0.7-1.0: Insecure training environment, no data security or access controls, training data stored and processed on unsecured systems.
    • Examples:
      • 0.0: Training is performed in a secure enclave with strict access controls, continuous monitoring, and formal verification of the training process.
      • 0.2: Training data is encrypted at rest and in transit, access is restricted based on roles, and the training environment is regularly audited for security.
      • 0.5: Training data is stored on a password-protected server, but access is not strictly controlled.
      • 0.8: Training data is stored on a public cloud server without any encryption or access controls.
  • Deployment

    • 0.0: Secure and isolated deployment environment with continuous monitoring, automated security patching, and formal verification of the deployment process.
    • 0.1-0.3: Secure deployment environment with strong authentication and authorization, regular security updates, and intrusion detection systems.
    • 0.4-0.6: Basic security measures in the deployment environment (e.g., model deployed behind a firewall), but no strong authentication or authorization mechanisms.
    • 0.7-1.0: Insecure deployment environment, no access controls or security monitoring, model deployed on publicly accessible servers without any protection.
    • Examples:
      • 0.0: Model is deployed in a secure enclave with strict access controls, continuous monitoring, and automated security patching.
      • 0.2: Model is deployed in a secure cloud environment with strong authentication, authorization, and regular security updates.
      • 0.5: Model is deployed behind a firewall, but API keys are shared among multiple users.
      • 0.8: Model is deployed on a public server with no authentication required to access its API.
  • Operations

    • 0.0: Continuous security monitoring with automated incident response capabilities, regular security audits, and a dedicated security operations center (SOC).
    • 0.1-0.3: Comprehensive security monitoring using a SIEM system, automated alerts for suspicious activity, and a well-defined incident response plan that is regularly tested.
    • 0.4-0.6: Basic security monitoring (e.g., manually reviewing logs), limited incident response capabilities, no formal incident response plan.
    • 0.7-1.0: No security monitoring or incident response plan, system logs not collected or analyzed.
    • Examples:
      • 0.0: A dedicated SOC monitors the system 24/7, with automated incident response capabilities and regular security audits.
      • 0.2: A SIEM system is used to monitor security events, generate alerts, and trigger incident response procedures.
      • 0.5: System logs are collected and manually reviewed on a weekly basis, and there is a basic incident response plan.
      • 0.8: No logs are collected, and there is no process for responding to security incidents.

GV (Governance and Validation)

  • Compliance

    • 0.0: System exceeds regulatory requirements and sets industry best practices for compliance, with a proactive approach to adapting to new regulations.
    • 0.1-0.3: Full compliance with relevant regulations and industry standards, with a dedicated compliance team and regular audits.
    • 0.4-0.6: Basic understanding of regulations, some ad-hoc compliance efforts, but no formal compliance program.
    • 0.7-1.0: No awareness of or compliance with relevant regulations (e.g., GDPR, CCPA, HIPAA) or industry standards.
    • Examples:
      • 0.0: System is designed to be compliant by design, exceeding regulatory requirements and setting industry best practices.
      • 0.2: System is fully compliant with all applicable regulations, with regular audits and a dedicated compliance team.
      • 0.5: Some efforts are made to comply with regulations, but there are significant gaps and no formal compliance program.
      • 0.8: System collects and processes personal data without user consent or proper safeguards, violating data privacy regulations.
  • Auditing

    • 0.0: Regular independent audits by reputable third parties, with formal verification of the system's security, fairness, and ethical performance.
    • 0.1-0.3: Regular internal audits conducted, covering all aspects of the AI system lifecycle, with clear audit trails and documentation.
    • 0.4-0.6: Infrequent or limited audits (e.g., only auditing code for security vulnerabilities), with no independent verification.
    • 0.7-1.0: No auditing of the AI system's design, development, deployment, or operation.
    • Examples:
      • 0.0: Independent audits are conducted annually by a reputable third party, with the results publicly reported.
      • 0.2: Regular internal audits are conducted, covering security, fairness, and performance, with detailed audit trails.
      • 0.5: Code is audited for security vulnerabilities before deployment, but no other audits are conducted.
      • 0.8: No audit logs are maintained, and no audits are performed.
  • Risk Management

    • 0.0: Proactive and continuous AI risk management, with a dedicated AI risk management team, regular risk assessments, and a strong focus on anticipating and mitigating emerging AI risks.
    • 0.1-0.3: Comprehensive AI risk management framework in place, with specific processes for identifying, assessing, mitigating, and monitoring AI risks, fully integrated into the organizational risk framework.
    • 0.4-0.6: Basic risk assessment for AI systems, limited mitigation strategies, AI risks partially integrated into the organizational risk framework.
    • 0.7-1.0: No AI-specific risk management processes, AI risks not considered in the overall organizational risk framework.
    • Examples:
      • 0.0: AI risk management is a continuous process, integrated with the organization's overall risk management and governance structures.
      • 0.2: A comprehensive AI risk management framework is in place, with regular risk assessments and mitigation plans.
      • 0.5: AI risks are assessed on an ad-hoc basis, with limited mitigation strategies.
      • 0.8: AI risks are not considered in the organization's risk management processes.
  • Human Oversight

    • 0.0: Human-in-the-loop system with well-defined roles and responsibilities, clear procedures for human-machine collaboration, and mechanisms for human oversight at various stages of the system's operation.
    • 0.1-0.3: Clear mechanisms for human review and intervention in the system's decision-making process, with well-defined roles and responsibilities for human operators.
    • 0.4-0.6: Limited human oversight, primarily reactive (e.g., users can report errors), no clear mechanisms for human intervention or override.
    • 0.7-1.0: No human oversight or intervention in the AI system's decision-making process.
    • Examples:
      • 0.0: System is designed for human-machine collaboration, with humans playing a central role in the decision-making process.
      • 0.2: System has mechanisms for human operators to review and override its decisions in specific cases.
      • 0.5: Users can report errors, but there is no process for human intervention in the system's decisions.
      • 0.8: System operates autonomously without any human control or monitoring.
  • Ethical Framework Alignment

    • 0.0: System demonstrably adheres to and promotes ethical AI principles, with ongoing monitoring and auditing of ethical performance.
    • 0.1-0.3: System design and operation align with established ethical frameworks (e.g., OECD AI Principles, Montreal Declaration for Responsible AI), with mechanisms for addressing ethical concerns.
    • 0.4-0.6: Basic awareness of ethical guidelines, limited implementation, no formal ethical review process.
    • 0.7-1.0: No consideration of ethical frameworks or principles in the design, development, or deployment of the AI system.
    • Examples:
      • 0.0: System's ethical performance is regularly assessed, and it actively promotes ethical AI principles.
      • 0.2: System design incorporates principles from relevant ethical frameworks, and there is a process for addressing ethical concerns.
      • 0.5: Developers are aware of ethical guidelines but have not formally integrated them into the system's design.
      • 0.8: System is developed and deployed without any consideration for ethical implications.

CS (Cloud Security Alliance LLM Taxonomy)

  • Model Manipulation

    • 0.0: System provably resistant to model manipulation, with formal verification of robustness against prompt injection and other adversarial techniques.
    • 0.1-0.3: Strong defenses against model manipulation (e.g., input filtering, adversarial training, output validation), making it difficult to manipulate the model's behavior.
    • 0.4-0.6: Some defenses against manipulation (e.g., basic input sanitization), but vulnerabilities remain, and the model can be manipulated with some effort.
    • 0.7-1.0: Highly vulnerable to model manipulation, including prompt injection and other adversarial techniques, with no or minimal defenses in place.
    • Examples:
      • 0.0: Model's resistance to prompt injection is formally verified.
      • 0.2: Model uses a combination of input filtering, adversarial training, and output validation to defend against manipulation.
      • 0.5: Model has basic input sanitization, but can still be manipulated by carefully crafted prompts.
      • 0.8: Model is easily manipulated by prompt injection attacks.
  • Data Poisoning

    • 0.0: System provably resistant to data poisoning, with formal guarantees on the integrity and security of the training data.
    • 0.1-0.3: Strong data validation, anomaly detection, and provenance tracking mechanisms in place, making it very difficult to successfully poison the training data.
    • 0.4-0.6: Some measures to mitigate data poisoning (e.g., outlier detection), but risks remain, and targeted poisoning attacks may still be possible.
    • 0.7-1.0: High risk of data poisoning, with no or minimal measures to ensure the integrity and security of the training data.
    • Examples:
      • 0.0: Training data is stored on an immutable ledger with cryptographic verification of its integrity.
      • 0.2: Robust data validation, anomaly detection, and provenance tracking mechanisms are used to prevent and detect data poisoning.
      • 0.5: Basic outlier detection is used, but sophisticated poisoning attacks may still succeed.
      • 0.8: Training data can be easily tampered with, and there are no mechanisms to detect poisoning.
  • Sensitive Data Disclosure

    • 0.0: System provably prevents sensitive data disclosure, with formal guarantees on the privacy of sensitive information.
    • 0.1-0.3: Strong access controls, encryption, and output sanitization mechanisms in place, making it very difficult to extract sensitive data from the system.
    • 0.4-0.6: Some measures to prevent data leakage (e.g., output filtering), but vulnerabilities remain, and sensitive information may be disclosed under certain circumstances.
    • 0.7-1.0: High risk of sensitive data disclosure, with no or minimal measures to protect sensitive information processed or stored by the system.
    • Examples:
      • 0.0: System uses homomorphic encryption or other privacy-preserving techniques to prevent any sensitive data disclosure.
      • 0.2: Strong access controls, encryption, and output sanitization are used to prevent data leakage.
      • 0.5: Model outputs are filtered to remove potentially sensitive information, but some leakage may still occur.
      • 0.8: Model may reveal sensitive information in its outputs, and there are no safeguards against data exfiltration.
  • Model Stealing

    • 0.0: Model provably resistant to model stealing, with formal guarantees on the difficulty of creating a functional copy.
    • 0.1-0.3: Strong defenses against model stealing (e.g., anomaly detection on API queries, model watermarking, legal agreements), making it significantly more difficult and costly to steal the model.
    • 0.4-0.6: Some measures to mitigate model stealing (e.g., rate limiting), but a determined attacker can still succeed.
    • 0.7-1.0: High risk of model stealing, and attackers can easily create a functional copy of the model by querying its API.
    • Examples:
      • 0.0: Model is designed to be resistant to model extraction, and its functionality cannot be replicated through black-box queries.
      • 0.2: Model uses a combination of watermarking, anomaly detection, and legal agreements to deter and detect model stealing.
      • 0.5: API access is rate-limited, but an attacker can still extract the model over a longer period.
      • 0.8: An attacker can create a copy of the model by making a large number of API calls.
  • Failure/Malfunctioning

    • 0.0: System designed for high availability and fault tolerance, with formal verification of its reliability.
    • 0.1-0.3: Robust error handling, monitoring, and redundancy mechanisms in place, significantly reducing the risk of failures or malfunctions.
    • 0.4-0.6: Some measures to ensure reliability (e.g., basic error handling), but risks remain, and the system may experience downtime or produce incorrect outputs under certain conditions.
    • 0.7-1.0: High risk of failures or malfunctions, with no or minimal measures to ensure the system's reliability.
    • Examples:
      • 0.0: System is designed with multiple layers of redundancy and failover mechanisms, and its reliability is formally verified.
      • 0.2: System has robust error handling, monitoring, and self-healing capabilities.
      • 0.5: System has basic error handling and logging, but may experience downtime due to unexpected errors.
      • 0.8: System is prone to crashes or errors, and there are no mechanisms to ensure its continuous operation.
  • Insecure Supply Chain

    • 0.0: Secure and auditable supply chain, with formal verification of all third-party components and dependencies.
    • 0.1-0.3: Strong supply chain security practices in place (e.g., code signing, dependency verification, regular audits), minimizing the risk of supply chain attacks.
    • 0.4-0.6: Some measures to mitigate supply chain risks (e.g., using trusted sources), but vulnerabilities may still exist in third-party components.
    • 0.7-1.0: High risk of supply chain vulnerabilities, with no or minimal measures to ensure the security of third-party components and dependencies.
    • Examples:
      • 0.0: All third-party components are formally verified for security, and the supply chain is continuously monitored for vulnerabilities.
      • 0.2: Strong security practices are followed throughout the supply chain, including code signing, dependency verification, and regular audits.
      • 0.5: Third-party libraries are used from reputable sources, but they are not thoroughly vetted for security vulnerabilities.
      • 0.8: System relies on outdated or unpatched third-party components with known vulnerabilities.
  • Insecure Apps/Plugins

    • 0.0: Secure development and integration practices for apps/plugins enforced, with formal verification of their security.
    • 0.1-0.3: Strong security guidelines and vetting process for apps/plugins, minimizing the risk of vulnerabilities introduced by third-party integrations.
    • 0.4-0.6: Some security measures for apps/plugins (e.g., sandboxing), but risks remain, and vulnerabilities may be introduced through insecure integrations.
    • 0.7-1.0: High risk of vulnerabilities from insecure apps/plugins, with no or minimal measures to ensure the security of third-party integrations.
    • Examples:
      • 0.0: All apps/plugins undergo a rigorous security review and are formally verified before being allowed to integrate with the system.
      • 0.2: Strong security guidelines are in place for app/plugin development, and a vetting process is used to minimize risks.
      • 0.5: Apps/plugins are sandboxed, but they may still have access to sensitive data or functionalities.
      • 0.8: Apps/plugins can be easily installed without any security checks, potentially introducing vulnerabilities into the system.
  • Denial of Service (DoS)

    • 0.0: System provably resistant to DoS attacks, with formal guarantees on its availability under high load or malicious traffic.
    • 0.1-0.3: Strong defenses against DoS attacks (e.g., traffic filtering, rate limiting, auto-scaling), making it very difficult to disrupt the system's availability.
    • 0.4-0.6: Some measures to mitigate DoS attacks (e.g., basic rate limiting), but the system may still be vulnerable to sophisticated attacks.
    • 0.7-1.0: Highly vulnerable to DoS attacks, with no or minimal measures to protect the system's availability.
    • Examples:
      • 0.0: System is designed to withstand massive traffic spikes and is formally verified for its resistance to DoS attacks.
      • 0.2: System uses a combination of traffic filtering, rate limiting, and auto-scaling to mitigate DoS attacks.
      • 0.5: System has basic rate limiting, but can still be overwhelmed by a large number of requests.
      • 0.8: System can be easily made unavailable by sending a large number of requests or malicious traffic.
  • Loss of Governance/Compliance

    • 0.0: System meets or exceeds all relevant regulatory and governance requirements, with a proactive approach to adapting to new regulations and a strong focus on maintaining compliance.
    • 0.1-0.3: Strong compliance framework and controls in place, ensuring adherence to relevant regulations and governance policies.
    • 0.4-0.6: Some compliance efforts, but gaps remain, and the system may not fully meet all regulatory or governance requirements.
    • 0.7-1.0: High risk of non-compliance with regulations or governance policies, with no or minimal measures to ensure adherence.
    • Examples:
      • 0.0: System is designed to be compliant by design, with automated mechanisms to ensure adherence to regulations and policies.
      • 0.2: System is regularly audited for compliance, and a dedicated team ensures that all requirements are met.
      • 0.5: Some efforts are made to comply with regulations, but there are significant gaps and no formal compliance program.
      • 0.8: System does not meet data privacy regulations, and there are no mechanisms to ensure compliance with internal policies.

5. Scoring Methodology

Base Formula

AIVSS_Score = [
    (w₁ × ModifiedBaseScore) +
    (w₂ × AISpecificMetrics) +
    (w₃ × ImpactMetrics)
] × TemporalMetrics × MitigationMultiplier

Where: 0 ≤ AIVSS_Score ≤ 10
  • w₁, w₂, w₃: Weights assigned to each component (Modified Base, AI-Specific, Impact). Suggested starting point: w₁ = 0.3, w₂ = 0.5, w₃ = 0.2 (giving more weight to AI-specific risks). Adjust based on the specific AI system and its risk profile.
  • TemporalMetrics: Adjustments based on exploitability, remediation level, and report confidence (similar to CVSS Temporal Score).
    • Exploitability (E):
      • Not Defined (ND): 1.0
      • Unproven (U): 0.9
      • Proof-of-Concept (P): 0.95
      • Functional (F): 1.0
      • High (H): 1.0
    • Remediation Level (RL):
      • Not Defined (ND): 1.0
      • Official Fix (O): 0.95
      • Temporary Fix (T): 0.96
      • Workaround (W): 0.97
      • Unavailable (U): 1.0
    • Report Confidence (RC):
      • Not Defined (ND): 1.0
      • Unknown (U): 0.92
      • Reasonable (R): 0.96
      • Confirmed (C): 1.0
  • MitigationMultiplier: A factor (ranging from 1.0 to 1.5) that increases the score based on the lack of effective mitigations. 1.0 = Strong Mitigation; 1.5 = No/Weak Mitigation.
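
Put together, the base formula can be sketched as follows. The weights default to the suggested starting point; the assumption that the three component scores are already expressed on a consistent scale (so that the clamp to the 0-10 range is meaningful) is an implementation choice, not prescribed here.

# Illustrative sketch of the AIVSS base formula (names and defaults are not prescribed by AIVSS).
def aivss_score(modified_base, ai_specific, impact,
                temporal=1.0, mitigation_multiplier=1.0,
                w1=0.3, w2=0.5, w3=0.2):
    score = ((w1 * modified_base + w2 * ai_specific + w3 * impact)
             * temporal * mitigation_multiplier)
    return min(10.0, max(0.0, score))  # keep within the documented 0-10 range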

6. Component Calculations

1. Modified Base Metrics

ModifiedBaseScore = min(10, [MAV × MAC × MPR × MUI × MS] × ScopeMultiplier)

Where the Modified Base Metrics (MAV, MAC, MPR, MUI, MS) are derived from the Base Metrics, adjusted according to the specific environment and using the Environmental Metrics. Each Modified Base Metric can be rated the same way as the Base Metrics, with the addition of:

  • Not Defined (X): Uses the unmodified Base Metric value.

2. AI-Specific Metrics

AISpecificMetrics = [MR × DS × EI × DC × AD × AA × LL × GV × CS] × ModelComplexityMultiplier
  • Each metric (MR, DS, EI, DC, AD, AA, LL, GV, CS) is scored from 0.0 to 1.0 based on the severity of the vulnerability in each sub-category, using the detailed scoring rubric provided above (higher score = more severe issue).
  • ModelComplexityMultiplier: A factor (1.0 to 1.5) to account for the increased attack surface and complexity of more advanced models.

3. Impact Metrics

ImpactMetrics = (C + I + A + SI) / 4
  • C (Confidentiality Impact): Impact on data confidentiality.
  • I (Integrity Impact): Impact on data and system integrity.
  • A (Availability Impact): Impact on system availability.
  • SI (Societal Impact): Broader societal harms (e.g., discrimination, manipulation). Informed by the EI (Ethical Implications) sub-categories.

Severity Levels (for C, I, A, SI):

  • None: 0.0
  • Low: 0.22
  • Medium: 0.55
  • High: 0.85
  • Critical: 1.0
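
A minimal sketch of this calculation, using the severity values above (names are illustrative):

# Impact Metrics: average of the four impact severities.
IMPACT_SEVERITY = {'None': 0.0, 'Low': 0.22, 'Medium': 0.55, 'High': 0.85, 'Critical': 1.0}

def impact_metrics(c, i, a, si):
    return sum(IMPACT_SEVERITY[level] for level in (c, i, a, si)) / 4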

4. Environmental Score

The Environmental Score is calculated by modifying the Base Score with the Environmental metrics. The formula integrates these considerations:

EnvironmentalScore = [(ModifiedBaseScore + (Environmental Component)) × TemporalMetrics] × (1 + EnvironmentalMultiplier)

Environmental Component is derived from the AI-Specific Metrics, adjusted based on the environmental context:

EnvironmentalComponent = [CR × IR × AR × SIR] × AISpecificMetrics

Where:

  • CR, IR, AR, SIR are the Confidentiality, Integrity, Availability, and Societal Impact Requirements, respectively.
  • EnvironmentalMultiplier adjusts the score based on specific environmental factors not covered by CR, IR, AR, SIR.
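
The two formulas above can be combined into a single sketch (names are illustrative; the cap at 10 is an assumption made here for consistency with the overall score range):

# Environmental Score: Modified Base Score plus the environmentally weighted AI-Specific component.
def environmental_score(modified_base, ai_specific, cr, ir, ar, sir,
                        temporal=1.0, environmental_multiplier=0.0):
    environmental_component = (cr * ir * ar * sir) * ai_specific
    return min(10.0, (modified_base + environmental_component)
               * temporal * (1 + environmental_multiplier))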

Risk Categories

Critical: 9.0 - 10.0
High:     7.0 - 8.9
Medium:   4.0 - 6.9
Low:      0.1 - 3.9
None:     0.0
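
A final score can be mapped to these categories with a simple helper (illustrative):

# Map an AIVSS score (0-10) to its risk category.
def risk_category(score):
    if score >= 9.0:
        return 'Critical'
    if score >= 7.0:
        return 'High'
    if score >= 4.0:
        return 'Medium'
    if score > 0.0:
        return 'Low'
    return 'None'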

7. Implementation Guide

Prerequisites

  • Access to AI system architecture details
  • Security assessment tools
  • Understanding of ML/AI concepts and the specific AI model under assessment.
  • Expertise in ethical AI principles and potential societal impacts.
  • Familiarity with cloud security principles, particularly the CSA LLM Threat Taxonomy if the AI system is cloud-based or an LLM.
  • Experience with vulnerability analysis, particularly in the context of AI/ML systems.

Roles and Responsibilities:

  • AI Security Team/Specialist: Leads the AIVSS assessment, coordinates with other teams, ensures accuracy and completeness.
  • AI Developers/Data Scientists: Provide technical details, assist in identifying vulnerabilities, implement mitigations.
  • Security Engineers: Assess base metrics, evaluate the security of development, training, and deployment environments, contribute to the overall assessment.
  • Compliance/Risk Officer: Ensures alignment with regulations and organizational risk management frameworks.
  • Ethical AI Officer/Review Board: Evaluates ethical implications and provides guidance on mitigating ethical risks.

8. AIVSS Assessment Checklist

This checklist provides a simplified and actionable guide for organizations to conduct an AIVSS assessment.

Phase 1: System and Environment Definition

  • 1.1 Identify the AI system to be assessed, including its components, data flows, and dependencies.
  • 1.2 Define the system's operational environment, including its deployment model (cloud, on-premise, hybrid), network configuration, and user base.
  • 1.3 Determine the Environmental metrics (CR, IR, AR, SIR) based on the system's specific context.
  • 1.4 Document the Modified Base Metrics (MAV, MAC, MPR, MUI, MS) based on environmental factors.

Phase 2: Base and AI-Specific Metrics Evaluation

  • 2.1 Evaluate the Base Metrics (AV, AC, PR, UI, S) based on the identified vulnerability.
  • 2.2 Assess each AI-Specific Metric (MR, DS, EI, DC, AD, AA, LL, GV, CS) using the detailed scoring rubric:
    • 2.2.1 Model Robustness (MR): Evasion Resistance, Gradient Masking, Robustness Certification.
    • 2.2.2 Data Sensitivity (DS): Data Confidentiality, Data Integrity, Data Provenance.
    • 2.2.3 Ethical Implications (EI): Bias and Discrimination, Transparency and Explainability, Accountability, Societal Impact.
    • 2.2.4 Decision Criticality (DC): Safety-Critical, Financial Impact, Reputational Damage, Operational Disruption.
    • 2.2.5 Adaptability (AD): Continuous Monitoring, Retraining Capabilities, Threat Intelligence Integration, Adversarial Training.
    • 2.2.6 Adversarial Attack Surface (AA): Model Inversion, Model Extraction, Membership Inference.
    • 2.2.7 Lifecycle Vulnerabilities (LL): Development, Training, Deployment, Operations.
    • 2.2.8 Governance and Validation (GV): Compliance, Auditing, Risk Management, Human Oversight, Ethical Framework Alignment.
    • 2.2.9 Cloud Security Alliance LLM Taxonomy (CS): Model Manipulation, Data Poisoning, Sensitive Data Disclosure, Model Stealing, Failure/Malfunctioning, Insecure Supply Chain, Insecure Apps/Plugins, Denial of Service (DoS), Loss of Governance/Compliance.
  • 2.3 Determine the Model Complexity Multiplier based on the assessed AI model.

Phase 3: Impact and Temporal Assessment

  • 3.1 Assess the Impact Metrics (C, I, A, SI) based on the potential consequences of the vulnerability.
  • 3.2 Evaluate the Temporal Metrics (E, RL, RC) based on the current exploitability, available remediation, and report confidence.

Phase 4: Mitigation and Scoring

  • 4.1 Evaluate the effectiveness of existing mitigations and determine the Mitigation Multiplier.
  • 4.2 Calculate the Modified Base Score.
  • 4.3 Calculate the AI-Specific Metrics Score.
  • 4.4 Calculate the Impact Metrics Score.
  • 4.5 Calculate the Environmental Component.
  • 4.6 Calculate the Environmental Score.
  • 4.7 Generate the final AIVSS Score using the formula.

Phase 5: Reporting and Remediation

  • 5.1 Document the assessment findings in a comprehensive report, including the AIVSS score, detailed metric scores, justifications, and supporting evidence.
  • 5.2 Communicate the assessment results to relevant stakeholders (technical teams, management, board of directors).
  • 5.3 Develop and prioritize recommendations for remediation based on the AIVSS score and the identified vulnerabilities.
  • 5.4 Implement the recommended mitigations and track progress.
  • 5.5 Re-assess the AI system after implementing mitigations to validate their effectiveness and update the AIVSS score.

9. Example Assessment

# Example vulnerability assessment (Illustrative and Simplified)
vulnerability = {
    'attack_vector': 0.85,        # Network (N)
    'attack_complexity': 0.44,    # High (H)
    'privileges_required': 0.62,  # Low (L)
    'user_interaction': 0.85,     # None (N)
    'scope': 1.0,                 # Unchanged (U)
    'model_robustness': {
        'evasion_resistance': 0.7,  # High susceptibility to evasion
        'gradient_masking': 0.8,  # Gradients easily accessible
    },
    'data_sensitivity': {
        'data_confidentiality': 0.9,  # Sensitive data with minimal protection
        'data_integrity': 0.7  # No data integrity checks
    },
    'ethical_impact': {
        'bias_discrimination': 0.8, # High risk of discriminatory outcomes
        'transparency_explainability': 0.7, # Black-box system
    },
    'cloud_security': {
        'model_manipulation': 0.8, # Vulnerable to prompt injection
        'data_poisoning': 0.6, # Some risk of data poisoning
        'sensitive_data_disclosure': 0.7, # Risk of sensitive data leakage
        'model_stealing': 0.5, # Some model stealing mitigations, but risks remain
        'failure_malfunctioning': 0.7, # Risk of failures
        'insecure_supply_chain': 0.6, # Some supply chain risks
        'insecure_apps_plugins': 0.4, # Some app/plugin security, but risks remain
        'denial_of_service': 0.8, # Vulnerable to DoS
        'loss_of_governance_compliance': 0.7 # Risk of non-compliance
    },
    # ... (Other AI-specific metrics with sub-categories)
    'confidentiality_impact': 'High',  # 0.85
    'integrity_impact': 'Medium',  # 0.55
    'availability_impact': 'Low',  # 0.22
    'societal_impact': 'Medium',  # 0.55
    'temporal_metrics': {
        'exploitability': 'Proof-of-Concept', # 0.95
        'remediation_level': 'Temporary Fix', # 0.96
        'report_confidence': 'Confirmed' # 1.0
    },
    'mitigation_multiplier': 1.4, # Example: Weak Mitigation
    'model_complexity_multiplier': 1.4,  # Example: Complex model (e.g., large language model)
     'environmental_metrics': {
        'cr': 'High', # 1.5
        'ir': 'Medium', # 1.0
        'ar': 'Low', # 0.5
        'sir': 'Medium', # 1.0
    },

}

# Adjust Base Metrics according to the environment
vulnerability['modified_attack_vector'] = vulnerability['attack_vector']  # No change
vulnerability['modified_attack_complexity'] = vulnerability['attack_complexity'] * 0.5  # Example: complexity score halved in this environment (illustrative)
vulnerability['modified_privileges_required'] = vulnerability['privileges_required']  # No change
vulnerability['modified_user_interaction'] = vulnerability['user_interaction']  # No change
vulnerability['modified_scope'] = vulnerability['scope']  # No change

# Calculate the score (simplified and illustrative)

# Modified Base Metrics
modified_base_score = min(10, vulnerability['modified_attack_vector'] *
                          vulnerability['modified_attack_complexity'] *
                          vulnerability['modified_privileges_required'] *
                          vulnerability['modified_user_interaction'] *
                          vulnerability['modified_scope'])  # ≈ 0.099

# AI-Specific Metrics - Example calculations (using average for simplicity):
mr_score = (vulnerability['model_robustness']['evasion_resistance'] +
            vulnerability['model_robustness']['gradient_masking']) / 2  # = 0.75
ds_score = (vulnerability['data_sensitivity']['data_confidentiality'] +
            vulnerability['data_sensitivity']['data_integrity']) / 2  # = 0.8
ei_score = (vulnerability['ethical_impact']['bias_discrimination'] +
            vulnerability['ethical_impact']['transparency_explainability']) / 2  # = 0.75

# Cloud Security (CS) - using the detailed rubric and averaging:
cs_score = (vulnerability['cloud_security']['model_manipulation'] +
            vulnerability['cloud_security']['data_poisoning'] +
            vulnerability['cloud_security']['sensitive_data_disclosure'] +
            vulnerability['cloud_security']['model_stealing'] +
            vulnerability['cloud_security']['failure_malfunctioning'] +
            vulnerability['cloud_security']['insecure_supply_chain'] +
            vulnerability['cloud_security']['insecure_apps_plugins'] +
            vulnerability['cloud_security']['denial_of_service'] +
            vulnerability['cloud_security']['loss_of_governance_compliance']) / 9 # = 0.64

# Assume other AI-specific metrics are calculated and we have these scores (Illustrative):
dc_score = 0.7
ad_score = 0.55
aa_score = 0.6
ll_score = 0.75
gv_score = 0.8

# Calculate the overall AI-Specific Metrics score:
ais_score = (mr_score * ds_score * ei_score * dc_score * ad_score * aa_score *
             ll_score * gv_score * cs_score) * vulnerability['model_complexity_multiplier']  # ≈ 0.056

# Impact Metrics
impact_score = (vulnerability['confidentiality_impact'] + vulnerability['integrity_impact'] + vulnerability['availability_impact'] + vulnerability['societal_impact']) / 4 # = 0.543

# Temporal Metrics - using average for simplicity:
temporal_score = (vulnerability['temporal_metrics']['exploitability'] +
                  vulnerability['temporal_metrics']['remediation_level'] +
                  vulnerability['temporal_metrics']['report_confidence']) / 3  # = 0.97

# Environmental Component
environmental_component = (vulnerability['environmental_metrics']['cr'] *
                           vulnerability['environmental_metrics']['ir'] *
                           vulnerability['environmental_metrics']['ar'] *
                           vulnerability['environmental_metrics']['sir']) * ais_score  # ≈ 0.042

# Environmental Score
environmental_score = min(10, ((modified_base_score + environmental_component) *
                               temporal_score) * vulnerability['mitigation_multiplier'])  # ≈ 0.191

# Final AIVSS Score
final_score = (((0.3 * modified_base_score) + (0.5 * ais_score) + (0.2 * impact_score)) *
               temporal_score * vulnerability['mitigation_multiplier'])  # ≈ 0.23

Interpretation of the Example:

In this example, the final AIVSS score is approximately 0.23, which falls into the Low risk category. The Environmental Score (≈ 0.19) is calculated alongside it, showing how the environment-specific requirements (CR, IR, AR, SIR) adjust the Modified Base Score for this deployment.

Important Considerations:

  • Illustrative: This is a simplified example. Real-world assessments will involve a more thorough evaluation of each sub-category.
  • Weighting: The weights (w₁, w₂, w₃ — 0.3, 0.5, and 0.2 in this example) can significantly influence the final score. Organizations should carefully consider the appropriate weights based on their specific risk profiles; a brief sensitivity sketch follows this list.
  • Context: The interpretation of the AIVSS score should always be done in the context of the specific AI system, its intended use, and the potential consequences of a security incident.
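As a minimal illustration of how weighting choices shift the result, the sketch below recomputes the example's final score under an alternative, purely hypothetical weight set that emphasizes impact.

# Weight-sensitivity sketch using the component scores from the example above.
# The alternative weight set is hypothetical and chosen only for illustration.
def weighted_final(w_base, w_ais, w_impact,
                   modified_base=0.099, ais=0.056, impact=0.543,
                   temporal=0.97, mitigation=1.4):
    return ((w_base * modified_base) + (w_ais * ais) + (w_impact * impact)) * temporal * mitigation

print(weighted_final(0.3, 0.5, 0.2))  # ≈ 0.23 (weights used in the example)
print(weighted_final(0.2, 0.3, 0.5))  # ≈ 0.42 (impact-heavy weighting raises the score)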

10. Reporting and Communication:

  • Assessment Report: A comprehensive report should be generated (a minimal report-skeleton sketch follows this list), including:
    • A summary of the AI system being assessed.
    • The final AIVSS score and risk category.
    • Detailed scores for each metric and sub-category.
    • Justification for the assigned scores, referencing the scoring rubric and evidence gathered.
    • An analysis of the key vulnerabilities and their potential impact.
    • Recommendations for mitigation, prioritized based on the severity of the vulnerabilities.
    • Appendices with supporting documentation (e.g., threat models, assessment data).
  • Communication: Assessment findings should be communicated to relevant stakeholders, including:
    • Technical Teams: To inform remediation efforts.
    • Management: To support risk management decisions.
    • Board of Directors: To provide an overview of the organization's AI security posture.
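As a lightweight starting point, the report contents can be captured as structured data before rendering; the field names and example values in the sketch below are illustrative assumptions drawn from the example in Section 9, not a prescribed schema.

# Illustrative assessment-report skeleton (field names and values are
# assumptions based on the Section 9 example, not a prescribed schema).
assessment_report = {
    'system_summary': 'Cloud-hosted LLM application (hypothetical)',
    'aivss_score': 0.23,
    'risk_category': 'Low',
    'metric_scores': {'MR': 0.75, 'DS': 0.8, 'EI': 0.75, 'CS': 0.64},
    'justifications': {'MR': 'High susceptibility to evasion; gradients easily accessible'},
    'key_vulnerabilities': ['Prompt injection', 'Denial of service'],
    'recommendations': ['Filter model inputs and outputs', 'Rate-limit inference endpoints'],
    'appendices': ['Threat model', 'Assessment data'],
}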

11. Integration with Risk Management Frameworks:

  • Mapping: AIVSS metrics can be mapped to existing risk categories within organizational risk management frameworks (e.g., NIST Cybersecurity Framework, ISO 27001); an illustrative mapping sketch follows this list.
  • Risk Assessments: AIVSS assessments can be incorporated into broader risk assessments.
  • Audits: AIVSS can be used as a framework for conducting audits of AI systems.
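As one illustration, each AIVSS metric could be tagged with the risk-framework categories it most closely informs; the mapping below to NIST Cybersecurity Framework functions is a hypothetical example, not an endorsed correspondence.

# Hypothetical mapping of AIVSS metrics to NIST Cybersecurity Framework
# functions (illustrative only; organizations should define their own mapping).
AIVSS_TO_NIST_CSF = {
    'MR': ['Protect', 'Detect'],    # Model Robustness
    'DS': ['Protect'],              # Data Sensitivity
    'AD': ['Detect', 'Respond'],    # Adaptability
    'LL': ['Identify', 'Protect'],  # Lifecycle Vulnerabilities
    'GV': ['Govern', 'Identify'],   # Governance and Validation
}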

12. Appendix: AI Threat Taxonomies

Taxonomy | Description | Link
MITRE ATLAS | A knowledge base of adversary tactics and techniques based on real-world observations, specifically focused on threats to machine learning systems. It provides a framework for understanding the adversarial ML lifecycle and includes case studies of attacks. | https://atlas.mitre.org/
NIST AI 100-2 E2023 | A taxonomy of adversarial machine learning, including attacks, defenses, and consequences. It provides a detailed framework for understanding and categorizing threats to AI systems and offers guidance on risk management. | https://csrc.nist.gov/pubs/ai/100/2/e2023/final
EU HLEG Trustworthy AI | Ethics guidelines for trustworthy AI developed by the European Commission's High-Level Expert Group on Artificial Intelligence. It focuses on human-centric AI principles, including fairness, transparency, accountability, and societal well-being. | https://digital-strategy.ec.europa
ISO/IEC JTC 1/SC 42 | An international standards body developing standards for artificial intelligence. It covers various aspects of AI, including risk management, trustworthiness, bias, and governance. | https://www.iso.org/committee/6794475.html
AI Incident Database | A database of real-world incidents involving AI systems, including failures, accidents, and malicious attacks. It provides valuable data for understanding the risks associated with AI and informing risk management strategies. | https://incidentdatabase.ai/
DARPA's GARD | The Guaranteeing AI Robustness against Deception (GARD) program aims to develop defenses against adversarial attacks on AI systems. It focuses on developing robust AI that can withstand attempts to deceive or manipulate it. | https://www.darpa.mil/research/programs/guaranteeing-ai-robustness-against-deception
OECD AI Principles | Principles for responsible stewardship of trustworthy AI, adopted by the Organisation for Economic Co-operation and Development (OECD). They cover aspects such as inclusive growth, human-centered values, transparency, robustness, and accountability. | https://oecd.ai/en/ai-principles
MITRE Atlas Matrix | The Adversarial ML Threat Matrix captures the tactics, techniques, and procedures used by adversaries to attack ML systems. It is structured similarly to the ATT&CK framework but specialized for the domain of machine learning. | https://atlas.mitre.org/
CSA LLM Threat Taxonomy | Defines common threats related to large language models in the cloud. Key categories include model manipulation, data poisoning, sensitive data disclosure, model stealing, and others specific to cloud-based LLM deployments. | https://cloudsecurityalliance.org/artifacts/csa-large-language-model-llm-threats-taxonomy
MIT AI Threat Taxonomy | A comprehensive classification of AI attack surfaces, adversarial techniques, and governance vulnerabilities. It details various types of attacks and provides mitigation strategies. | https://arxiv.org/pdf/2408.12622
OWASP Top 10 for LLMs | Highlights the most critical security risks for large language model applications. It covers vulnerabilities like prompt injection, data leakage, insecure output handling, and model denial of service, among others. | https://owasp.org/www-project-top-10-for-large-language-model-applications/

Note: This table is intended to be a starting point and may not be exhaustive. New taxonomies and frameworks may emerge as the field of AI security continues to evolve.

13. Continuous Improvement

This AIVSS framework should be treated as a living document. It will be revised and updated. Organizations are encouraged to provide feedback, contribute to its development, and adapt it to their specific needs. Regular updates will be released to incorporate new research, threat intelligence, and best practices.

14. Conclusion

The AIVSS framework provides a structured and comprehensive approach to assessing and quantifying the security risks of AI systems. By using this framework, along with the provided checklist, organizations can gain a better understanding of their AI-specific vulnerabilities, prioritize remediation efforts, and improve their overall AI security posture. The detailed scoring rubric, the inclusion of relevant AI threat taxonomies, the addition of the Environmental Score, and the focus on practical implementation make AIVSS a valuable tool for securing the future of AI. Continuous improvement, community engagement, and adaptation to the evolving threat landscape will be crucial for the long-term success and adoption of AIVSS.

Disclaimer:

AIVSS is a framework for assessing and scoring the security risks of AI systems. It is not a guarantee of security, and it should not be used as the sole basis for making security decisions. Organizations should use AIVSS in conjunction with other security best practices and should always exercise their own judgment when assessing and mitigating AI security risks.
