code4thought

iQ4AI

Feed useful insights into the AI processes of your organisation

Our proprietary AI-testing platform enables the analysis of any type of data and AI model, based on the ISO 29119-11 technical report, which provides guidelines for testing AI-based systems.
We dig deep into the AI model and the data it utilizes in order to provide a comprehensive overview of all relevant aspects of an AI system, so that our clients can understand what’s going on in the AI model/algorithm itself. Our analysis focuses on the Quality of the AI system and includes:
Performance Testing
We assess how effectively an AI system performs its intended tasks. This includes evaluating the system’s accuracy and efficiency in various operational scenarios, and it is crucial for determining whether the AI system meets the operational requirements of a business.
Trustworthiness Testing
It encompasses several critical aspects:
  • Fairness / Bias Testing: Ensuring the AI system does not exhibit biased decision-making and treats all user groups equitably.
  • Transparency / Explainability Analysis: Providing explanations of how the AI system’s decisions are derived, and assessing the validity, clarity and understandability of its decision-making processes.
  • Robustness / Security: Testing for vulnerabilities to protect the system from cyber threats and safeguard data integrity.
  • Reliability: Evaluating the consistency and dependability of the AI system over time and under different conditions.
These analyses ensure a comprehensive evaluation of AI systems, addressing both their functional efficiency and their ethical, secure and transparent operation. Based on these results, as well as our clients’ own business context, our advisors develop a full AI risk profile and actionable, prioritized recommendations that keep their objectives on track.
iQ4AI is a proprietary AI Risk Intelligence and Self-Assessment platform where analyses can be performed to proactively surface:
  • Operational
  • Ethical
  • Legal & Regulatory
  • Financial
  • Reputational and
  • Security risks
throughout a standard AI model development lifecycle.
Such risks are linked to data and model vulnerabilities of an innate, conceptual and/or operational nature, such as data drift, concept drift, selection and creator bias, unexpected behaviour, model “over-tuning”, various data-model “mismatches”, evasion attacks, data poisoning and others…
The user of the platform experiences the self-assessment process as a guided journey through the following aspects:
  • Performance
  • Bias-Fairness
  • Explainability and
  • Robustness – Security
which produce a collection of actionable insights represented by well-established metrics in an intuitive manner so that they may be integrated efficiently into an organisation’s governance, risk management and decision-making process. The AI Risk Intelligence platform is industry agnostic as it focuses on the AI Model and the data it utilises.

Foundation Steps for Testing AI systems

Model Performance

Assess how effectively an AI system performs its intended tasks. This includes evaluating the system’s accuracy and efficiency in various operational scenarios.
Selected metrics adhere to the specifications of ISO 4213 (Assessment of machine learning classification performance).
Categories of these metrics are listed below, with a minimal illustration after the list:
  • Overall accuracy
  • Sensitivity Metrics
  • F-Scores
  • Error Analysis
  • Nuanced Insights
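As an illustration only, and not the platform’s internal implementation, the metric categories above map onto well-known scikit-learn calls; y_true and y_pred below are placeholder arrays standing in for a model’s test labels and predictions.

```python
# Minimal sketch of the metric categories above, computed with scikit-learn.
import numpy as np
from sklearn.metrics import (accuracy_score, recall_score, f1_score,
                             confusion_matrix, classification_report)

y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1])   # placeholder test labels
y_pred = np.array([0, 1, 0, 0, 1, 1, 1, 1])   # placeholder model predictions

print("Overall accuracy:", accuracy_score(y_true, y_pred))
print("Sensitivity (recall):", recall_score(y_true, y_pred))      # sensitivity metrics
print("F1 score:", f1_score(y_true, y_pred))                      # F-scores
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))    # error analysis
print(classification_report(y_true, y_pred))                      # per-class, nuanced view
```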

Fairness/Bias Testing

Ensuring the AI system does not exhibit biased decision-making and treats all user groups equitably
Selected metrics adhere to legislation such as the New York City Bias Law and to the specifications of ISO 29119-11 (Guidelines on the testing of AI-based systems):
  • Fairness Assessment
  • Advanced Disparity
  • Expert Disparity
  • Disaggregated Metrics

Transparency/ Explainability Analysis

Providing explanations of how the AI system’s decisions are derived, and assessing the validity, clarity and understandability of its decision-making processes.
Selected approaches follow the specifications of ISO 29119-11 (Guidelines on the testing of AI-based systems) and state-of-the-art methods; a brief sketch follows the list:
  • LIME
  • Shapley Values
  • MASHAP (our proprietary method)
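The following is a brief, hedged sketch of how the first two methods can be applied with the open-source shap and lime libraries on an illustrative scikit-learn model; MASHAP is proprietary and is not reproduced here.

```python
# Sketch of local explanations with LIME and Shapley values on an illustrative model.
# Requires: scikit-learn, shap, lime.
import shap
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
X, y = data.data, data.target
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Shapley values: additive feature attributions for a single prediction
shap_explainer = shap.TreeExplainer(model)
shap_values = shap_explainer.shap_values(X[:1])        # per-feature contributions

# LIME: a local surrogate fitted around the same instance
lime_explainer = LimeTabularExplainer(X, feature_names=list(data.feature_names),
                                      class_names=list(data.target_names),
                                      mode="classification")
lime_exp = lime_explainer.explain_instance(X[0], model.predict_proba, num_features=5)
print(lime_exp.as_list())                              # top local feature contributions
```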

Platform’s Projects

A business problem or use case, transformed into a modelled machine learning hypothesis, is implemented within the product’s “Project”. A project may contain a set of datasets and models that represent the whole modelling process for that specific problem, where each dataset-model pairing is a milestone of the process.

Platform’s Audits

For each such pairing, an Audit may be created, which aims at seamlessly feeding useful evaluation metrics and insights into the AI Risk Assessment and Compliance Evaluation processes of an organisation in a meaningful and efficient manner.

Platform’s Aspects

The model quality analysis is split into logical groups or “aspects”:
  • Performance
  • Fairness
  • Statistical Analysis
  • Robustness
  • Explainability
Each aspect:
  • targets a different quality aspect of a model,
  • is created as a standalone, on its own merit, “review” of that aspect, covering standard data science, engineering or compliance requirements and
  • is structured to follow the typical analysis structure and progression of a data scientist’s work in a robust and concise manner.

Platform’s Metrics

The metrics produced per analysis aspect are not constructed merely as a “phonebook” style recorded list of well-known ML metrics; instead, curated lists of metrics are displayed on every aspect based on combinations of the use case/problem parameter specs and specific user preferences, associated with both model quality KPIs and business targets.
Such preferences may include a user’s wish to focus on metrics that are easily interpreted into actionable business insights, that describe more accurately the model’s fit to the data and its generalization potential, or that may be used for model selection in production.
Each metric is accompanied by its definition, a simple example of how it can be used and a “diagnostic” message, rating its value against the specific analysis aspect aims, based on pre-defined thresholds.
This “literature” is created to appeal to varying types and levels of data science expertise, so that it is accessible to business and traditional analytics people as well as data scientists. It is designed to keep the user on a specific, concise analysis/thinking path while reviewing the results, helping them stay focused and avoid confusion caused by the sheer number of different options they could consider. It also attempts to emulate a “data science buddy” or a “second pair of eyes” for the user, to the extent possible given the nature of this type of analysis.
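As a purely hypothetical sketch of how such a threshold-based “diagnostic” message could be expressed, the metric, thresholds and wording below are illustrative and not the platform’s actual rules.

```python
# Hypothetical sketch of a threshold-based diagnostic for a single metric.
from dataclasses import dataclass

@dataclass
class MetricDiagnostic:
    name: str
    value: float
    good_above: float   # illustrative threshold for a "good" rating
    fair_above: float   # illustrative threshold for a "fair" rating

    def message(self) -> str:
        if self.value >= self.good_above:
            rating = "good"
        elif self.value >= self.fair_above:
            rating = "fair"
        else:
            rating = "poor"
        return f"{self.name} = {self.value:.2f} is rated '{rating}' for this analysis aspect."

print(MetricDiagnostic("F1 score", 0.71, good_above=0.8, fair_above=0.6).message())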

Compare Audits

Furthermore, by comparing the “appropriate” audits in the same view, without additional effort from the user, one can gain a holistic understanding of the modelling process and its progression, the key decisions that impacted the process outcome across various milestones, and unique insights into the key data drivers behind the original problem. Ultimately, this enables an easier, more efficient and more accurate assessment of the appropriate next actions in pursuit of the associated business targets.
The user is guided in creating these pairs appropriately and in navigating through the relevant analysis aspects in the “correct” order so that:
  • comparing modelling milestone outcomes makes sense both from a scientific and from a business perspective, and
  • useful insights may be produced into the strengths and weaknesses of the modelling process, as well as the inner workings of the business problem these pairs attempt to describe, without requiring additional effort from the user on mundane, repetitive and time-consuming tasks.
Going forward, different predefined reporting templates will be available, targeting various personas ranging from a junior business analyst to a senior software engineer, a seasoned data scientist, the Head of Analytics, the CTO and the CIO.
The user will also be able to select which diagrams or tables they wish to save separately or, for example, into a presentation file, and even to create their own reporting template.

Main functions

Overview

Achieve early functional level monitoring independently of any complex MLOps standards and processes that may or may not be implemented in an enterprise infrastructure
Support the model selection process in production, based on model optimization and/or business KPIs criteria.
Maintain a copy of each audit representing a specific modelling process of a business the user can access at any time.
Detect model and data drift by comparing all aspect analyses for similar audits over time
Develop a single source of truth for production model reporting that can enable multi-disciplinary teams to participate efficiently in governance
Key results and insights from all analysis aspects are consolidated in one overview page to create a “story” of a model’s inner workings, behaviour towards different inputs and performance that can be shared across multi-disciplinary teams
Results overview tailored to specific problems and role levels to support decision-making processes connected to regulatory compliance, revenue generation and cost management
Centralized view of all models for MLOps, ML engineers, data scientists and line of business to collaborate throughout the ML lifecycle.
Observe model health by tracking performance, drift, data integrity or model bias
Detect behavior changes in models with highly imbalanced datasets.
Highlight potential risk “territories” for model performance, data drift, data integrity and the remaining quality aspects.
Validate model before deployment with out-of-the-box performance metrics.
Identify potential underlying reasons causing model performance or model drift issues with root cause analysis.
Improve models with actionable insights from powerful dashboards showing feature impact, correlation or distribution.


Model Performance

Assess “fit-for-purpose” status of models for the business problem(s) they describe, in multiple stages of the modelling process and for multiple inputs, based on their performance
A data exploratory view, displaying basic dataset statistics on its feature types and values, enables the user to identify pointers to potential dataset deficiencies – phenomena such as data drift, outlier patterns and the presence of hidden bias
The user is guided intuitively and interactively through a plethora of well-established metrics analyses towards an insight-enriched model performance assessment, tailored to the business problem, data and technical specifications associated with it.
Multiple analysis approaches are accessed simultaneously, represented by a mix of traditional and innovative techniques, enabling the user to broaden the analysis basis as they see fit.
Evaluate model performance and ensure all models are compliant for audit purposes
Provide stakeholders with fine-grained control and visibility into models, enabling them with deep model interpretations
Reduce risks from model degradation or model bias

Fairness / Bias Testing

Ensure sustained compliance with diversity and inclusivity regulations through recurrent evaluations of how fairly a model treats the data it “learns” from.
The user is guided intuitively and interactively through a plethora of well-established and innovative intersectional fairness metrics, including metrics associated with existing laws (e.g. the NY Bias Law).
They may view how the model handles any individual sensitive trait, or different combinations of sensitive traits across the protected groups they wish to investigate, and the impact on fairness standards.
Support the performance optimization process without jeopardising regulatory compliance
Analyse the impact of the performance – fairness tradeoff on a modelling process by combining appropriate metrics in the same intuitive dashboard
Assess models with model and dataset fairness checks at any point in the ML lifecycle.
Increase confidence in model outcomes by detecting intersectional unfairness and algorithmic biases.
Leverage out-of-the-box fairness metrics including disparate impact, group benefit, equal opportunity and demographic parity.
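For illustration, three of these metrics can be computed from scratch for a binary classifier and a single binary sensitive attribute; the arrays below are made-up data, not platform output.

```python
# Sketch of disparate impact, demographic parity and equal opportunity with NumPy.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 0, 0])
group  = np.array(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])

def selection_rate(pred, mask):
    return pred[mask].mean()                      # share of positive predictions in a group

def tpr(true, pred, mask):
    positives = mask & (true == 1)
    return pred[positives].mean()                 # true positive rate within a group

sr_a = selection_rate(y_pred, group == "A")
sr_b = selection_rate(y_pred, group == "B")

disparate_impact = sr_b / sr_a                    # ratio of selection rates
demographic_parity_diff = abs(sr_a - sr_b)        # gap in selection rates
equal_opportunity_diff = abs(tpr(y_true, y_pred, group == "A")
                             - tpr(y_true, y_pred, group == "B"))  # gap in TPR

print(f"Disparate impact: {disparate_impact:.2f}")
print(f"Demographic parity difference: {demographic_parity_diff:.2f}")
print(f"Equal opportunity difference: {equal_opportunity_diff:.2f}")
```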

Explainability

Ensure transparency and provide nuanced visibility of a model’s inner workings and behaviour using a mix of well-established and proprietary explainable AI methods
A combination of model-agnostic and model-specific techniques, including LIME, SHAP and partial dependence plots, is applied to produce human-readable explanations at the individual prediction and global dataset levels.
Multiple graphic representations of these explanations enable the user to understand which features may be the strongest predictors and which predictor combinations may be the key drivers of the model’s behaviour
Explainability analysis supports the production of actionable business insights and, combined with results from the other quality aspects, better informs the user about anticipated model generalisation and potential hidden bias.
Enable stakeholders and regulators with human-readable explanations to understand model behavior.
Obtain global and local-level explanations of how different features contribute to a model’s predictions.
Compare and contrast feature values and their impact on model prediction.

For Whom

Key Users

Data Scientists and AI Engineers:

  • Responsible for developing and deploying AI models.
  • Use the tool to validate model performance, detect bias, and ensure robustness and security.

Governance, Risk, and Compliance (GRC) Teams:

  • Focus on ensuring that AI models comply with relevant regulations, such as the GDPR, and with ISO standards.
  • Use the tool to monitor trustworthiness and adherence to ethical guidelines.

AI Model Auditors and Quality Assurance Professionals:

  • Tasked with auditing AI systems for quality, bias, and ethical considerations.
  • Leverage the platform for standardized testing and comparative evaluations of AI models over time.

CIOs and CTOs:

  • Oversee technology strategy and ensure that AI systems align with the company’s governance and risk management policies.
  • Use high-level reporting from the tool to make strategic decisions regarding AI deployment.

Product Managers & AI System Operators:

  • Ensure that AI models are performing efficiently within products or services.
  • Use the tool to identify issues in AI deployment and improve operational performance.

Technology Consultants:

  • Specialize in providing AI solutions to clients, particularly in regulated industries.
  • Use iQ4AI as a key service offering for their clients to validate and improve AI model quality.

Features

General Features

Secure Access

Role-based access control (RBAC) and permissions management.

Containerized

Deployment of services in Docker containers.

On-Prem

Support for on-premises deployment for organizations with specific requirements.

Audit comparability

One may have a holistic understanding of the modelling process, progression, the key decisions that impacted the process outcome across various milestones and unique insights regarding the key data drivers behind the original problem

Curated lists of metrics

The metrics produced per analysis aspect are not constructed merely as a “phonebook” style recorded list of well-known ML metrics; instead, curated lists of metrics are displayed on every aspect based on combinations of the use case/problem parameter specs and specific user preferences, associated with both model quality KPIs and business targets:
Each metric is accompanied by its definition, a simple example of how it can be used and a “diagnostic” message, rating its value against the specific analysis aspect aims, based on pre-defined thresholds

Reporting

Generation of comprehensive reports for stakeholders, giving the ability to compare audit results as an ML system evolves over time.
Performance Features

Performance monitoring

Track your model’s performance and accuracy with out-of-the-box metrics & compare different modelling outcomes.

Data drift

Easily monitor data drift, uncover data integrity issues, and compare data distributions between baseline and production datasets to boost model performance
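One common way to quantify such drift, shown here purely as an illustration and not necessarily the platform’s internal method, is the Population Stability Index (PSI) between a baseline and a production sample.

```python
# Sketch: Population Stability Index (PSI) between baseline and production distributions.
import numpy as np

def psi(baseline: np.ndarray, production: np.ndarray, bins: int = 10) -> float:
    edges = np.histogram_bin_edges(baseline, bins=bins)
    b_counts, _ = np.histogram(baseline, bins=edges)
    p_counts, _ = np.histogram(production, bins=edges)
    b_frac = np.clip(b_counts / b_counts.sum(), 1e-6, None)   # avoid log(0) / div by zero
    p_frac = np.clip(p_counts / p_counts.sum(), 1e-6, None)
    return float(np.sum((p_frac - b_frac) * np.log(p_frac / b_frac)))

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, size=5_000)
production = rng.normal(0.3, 1.2, size=5_000)                 # shifted distribution
print(f"PSI = {psi(baseline, production):.3f}")               # > 0.2 is often read as drift
```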

Class imbalance

Detect changes in low-frequency predictions due to class imbalance
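A minimal sketch of this idea, assuming made-up prediction arrays and an arbitrary alert threshold, compares the rate of a rare predicted class between a baseline window and a recent window.

```python
# Sketch: flag a shift in the rate of a rare predicted class (illustrative data).
import numpy as np

baseline_preds = np.array([0] * 950 + [1] * 50)    # rare class "1" at 5%
recent_preds   = np.array([0] * 990 + [1] * 10)    # rare class "1" drops to 1%

baseline_rate = (baseline_preds == 1).mean()
recent_rate = (recent_preds == 1).mean()
relative_change = (recent_rate - baseline_rate) / baseline_rate

if abs(relative_change) > 0.5:                     # arbitrary 50% relative-change threshold
    print(f"Rare-class prediction rate moved from {baseline_rate:.1%} to {recent_rate:.1%}")
```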

Feature quality

Uncover data integrity issues in your data pipeline, including missing feature values, data type mismatches, or range violations
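As a simple illustration of these checks, assuming a hypothetical schema and valid ranges, pandas can surface missing values, dtype mismatches and range violations.

```python
# Sketch of basic data-integrity checks on an illustrative dataframe.
import pandas as pd

df = pd.DataFrame({"age": [34, None, 29, 310],                   # a missing value and a range violation
                   "income": ["52000", 61000, 47000, 58000]})    # mixed types in one column

expected_dtypes = {"age": "float", "income": "int"}              # hypothetical schema
valid_ranges = {"age": (0, 120)}                                 # hypothetical valid range

print("Missing values per feature:\n", df.isna().sum())

for col, expected in expected_dtypes.items():
    observed = pd.api.types.infer_dtype(df[col])
    print(f"{col}: expected ~{expected}, observed '{observed}'")

low, high = valid_ranges["age"]
print("Range violations:\n", df[(df["age"] < low) | (df["age"] > high)])
```
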
Fairness Features

Algorithmic Bias Detection

Detect algorithmic bias using powerful visualizations and metrics

Intersectional Bias Detection

Discover potential bias by examining multiple dimensions simultaneously (e.g. gender, race, etc.)
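A minimal sketch of such an intersectional view, using made-up data and hypothetical column names, disaggregates the positive-prediction rate across combinations of two sensitive attributes.

```python
# Sketch: selection rates per intersection of two sensitive attributes (illustrative data).
import pandas as pd

df = pd.DataFrame({
    "gender":    ["F", "F", "M", "M", "F", "M", "F", "M"],
    "ethnicity": ["X", "Y", "X", "Y", "X", "X", "Y", "Y"],
    "predicted": [1,   0,   1,   1,   0,   1,   0,   0],
})

# Positive-prediction rate for every gender x ethnicity subgroup
rates = df.groupby(["gender", "ethnicity"])["predicted"].mean()
print(rates)

# A large gap between best- and worst-treated subgroups hints at intersectional bias
print("Max subgroup gap:", rates.max() - rates.min())
```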

Model Fairness

Obtain fairness information by comparing model outcomes and model performance for each subgroup of interest

Dataset Fairness

Check for fairness in your dataset before training your model by catching feature dependencies and ensuring your labels are balanced across subgroups

Fairness Metrics

Use out-of-the-box fairness metrics, such as disparate impact, demographic parity, equal opportunity, and group benefit, to help you increase transparency in your models
Explainability Features

Shapley values

Increase your model’s transparency and interpretability using SHAP values and LIME, enhanced by our proprietary explainability method, MASHAP.

‘What-if’ analysis

Gain a better understanding of your model’s predictions by changing any value and studying the impact on scenario outcomes.
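A simple sketch of this kind of probe, using an illustrative scikit-learn model and feature rather than the platform’s actual mechanism, perturbs one feature of a single instance and compares the predicted probabilities.

```python
# Sketch: "what-if" probe on a single feature of a single instance.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X, y = data.data, data.target
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)

instance = X[0].copy()
feature_idx = list(data.feature_names).index("mean radius")

before = model.predict_proba([instance])[0, 1]
instance[feature_idx] *= 1.10                      # what if this feature were 10% larger?
after = model.predict_proba([instance])[0, 1]

print(f"P(class=1) before: {before:.3f}, after: {after:.3f}, delta: {after - before:+.3f}")
```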

Global and local explanations

Understand how each feature you select contributes to the model’s predictions (global) and uncover the root cause of an individual issue (local).

Surrogate models

Improve the interpretability of your models before they go into production by using automatically generated surrogate models.
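For illustration, assuming a generic scikit-learn setup rather than the platform’s own surrogate generation, a shallow decision tree can be trained to mimic a black-box model’s predictions.

```python
# Sketch: interpretable decision-tree surrogate for a black-box classifier.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
X, y = data.data, data.target

black_box = GradientBoostingClassifier(random_state=0).fit(X, y)
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, black_box.predict(X))

# Fidelity: how closely the surrogate reproduces the black-box predictions
fidelity = (surrogate.predict(X) == black_box.predict(X)).mean()
print(f"Surrogate fidelity: {fidelity:.2%}")
print(export_text(surrogate, feature_names=list(data.feature_names)))
```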

Analytics Features

Dashboards

Increase business alignment and confidence in decision-making by enabling teams across the organization to glean insights

Charts

Build custom reports with the insights you need to gain deep understanding of your models and their impact on business outcomes, from monitoring metrics, feature impact, correlation, and distribution to partial dependence plot (PDP) charts
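As an example of one such chart, a partial dependence plot can be produced with scikit-learn on illustrative data; the platform’s own charts may be rendered differently.

```python
# Sketch: partial dependence plot (PDP) for one feature of an illustrative model.
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import PartialDependenceDisplay

data = load_breast_cancer()
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(data.data, data.target)

PartialDependenceDisplay.from_estimator(model, data.data, features=["mean radius"],
                                        feature_names=list(data.feature_names))
plt.show()
```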

Model validation

Evaluate your model’s performance and validate it before deploying it into production

Why us

Platform’s Advantages

International standards

Designed to support users of varying expertise levels, serving as a common reference point for multidisciplinary teams.

Easily upgradeable: Moving to a new version is a seamless process

Integrates with MLOps: Integrates with CI/CD pipelines to act as a feedback loop before deployment to production

Can be combined with Inspections

Benefits

Platform’s Benefits

Automates repetitive testing tasks, leading to reduced testing time and improved team alignment.

Supports continuous improvement and testing practices, resulting in resource optimization, increased release throughput, and talent satisfaction.

Offers centralized control of AI testing strategies, including test data selection, model artifacts in various formats, metric selection, and insights based on established methodologies.

Ensures consistent testing processes over time, reducing admin overhead, promoting reproducibility, and enhancing reliability.

Complements compliance requirements with risk-based approaches and naturally connects business targets with research and innovation.

Bundles

Basic

AI Testing and Auditing Tool:

Access for auditing up to 5 AI models or systems.

Min. Duration:

12 months

Inspections:

10 inspections
This bundle is suitable for organizations needing periodic reviews of a few AI systems with limited complexity.

Extended

AI Testing and Auditing Tool:

Access for auditing up to 15 AI models or systems.

Min. Duration:

12 months

Inspections:

20 inspections
This bundle is suitable for organizations needing periodic reviews of a larger number of AI systems.

Advanced

AI Testing and Auditing Tool:

Access for auditing an unlimited number of AI models or systems.

Min. Duration:

12 months

Inspections:

30 inspections
+ 5 advanced inspections
Designed for large enterprises with extensive AI deployment, this bundle offers scalable auditing tools and comprehensive inspections to manage compliance and optimize AI systems effectively.
*Each inspection includes foundation analysis, in-depth results interpretation, strategic advice, and ongoing client collaboration.
**Each advanced inspection includes advanced analyses alongside results interpretation for a selected number of systems.
WE’D LOVE TO HELP YOU
