MONTHLY INSIDER
ISO/IEC TR 29119-11: A Technical Report Providing Guidelines for Testing AI-Based Systems
27/08/2024
Today, software permeates every aspect of life, and artificial intelligence (AI) is becoming increasingly integral to business operations. As AI systems grow in complexity and ubiquity, ensuring their reliability, safety, and effectiveness is paramount.
This is where ISO/IEC TR 29119-11 comes into play: a technical report dedicated to testing AI-based systems, which offers businesses a structured and internationally recognized approach to AI software testing.
In full disclosure, this is one of our ‘favourite’ standards related to AI systems, chosen by our team a few years ago as the foundation for code4thought’s proprietary AI Quality Testing platform.
Overview of ISO/IEC TR 29119-11
ISO/IEC TR 29119-11 is actually a technical report and part of the broader ISO/IEC 29119 series of standards, which provides guidelines and best practices for software testing across various phases of the software development lifecycle.
TR 29119-11 addresses the unique challenges of testing AI-based systems. Unlike traditional software, many AI systems learn from data and may continue to evolve after deployment, making their behavior less predictable and more complex to test. ISO/IEC TR 29119-11 provides guidelines and best practices to help ensure these systems perform as intended and to mitigate the associated risks.
ISO/IEC TR 29119-11 emerged in late 2020 from the growing need for standardized testing methodologies for AI systems. As AI technologies spread through various industries (from self-driving cars and smart vacuums to checkout-free grocery shopping and machine learning for healthcare), stakeholders recognized the absence of comprehensive testing standards.
This gap led to collaborative efforts among international experts in AI and software testing, culminating in the development of ISO/IEC TR 29119-11, which reflects a consensus on the best practices required to ensure the quality and reliability of AI systems.
ISO/IEC TR 29119-11 is meticulously structured to cover all facets of AI system testing. Its key components include:
- Scope: Outlines its purpose and its applicability to different types of AI systems.
- Concepts: Defines key terms and concepts to ensure a common understanding among practitioners.
- Planning: Provides guidance on planning tests for AI systems, considering factors such as system complexity, learning mechanisms, and operational environment.
- Design: Discusses methodologies for designing effective tests, including the selection of appropriate test data and scenarios.
- Execution: Covers the execution of tests, including monitoring system behavior, logging results, and handling anomalies.
- Evaluation and Reporting: Offers guidelines for evaluating test results and reporting findings to stakeholders.
- Further Considerations: Addresses the ethical and legal implications of testing AI systems, emphasizing the need for transparency and accountability.
Implementation of the Standard
Implementing ISO/IEC TR 29119-11 requires a systematic approach. The process begins with a gap analysis, in which current testing practices are assessed against the standard to identify areas for improvement. Next, the testing team and relevant stakeholders should be trained on the standard’s guidance and best practices, so that everyone involved understands the necessary steps and the guidelines are applied effectively.
The next steps are developing comprehensive test plans and designing test cases that align with the standard’s guidance. Appropriate tools for test automation, data management, and result analysis are then identified and deployed to facilitate the testing process.
It is worth underlining that the standard only provides guidance on which aspects to test and suggests possible approaches for those tests. It is up to each team or organization to decide the “how”. For instance, in the case of Bias Testing, the standard suggests expert reviews of datasets, which can be time-consuming and error-prone. To implement this test, our team at code4thought has automated the task using industry-accepted metrics such as the Disparate Impact Ratio.
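To make the metric concrete, here is a minimal sketch of how a Disparate Impact Ratio could be computed. It is an illustration only, not code4thought’s implementation: the function name, the group labels "A" and "B", the sample decisions, and the 0.8 cut-off (the commonly cited “four-fifths” rule of thumb) are all assumptions.

```python
from collections import defaultdict


def disparate_impact_ratio(outcomes, groups, protected, reference):
    """Ratio of favourable-outcome rates: protected group / reference group.

    outcomes: iterable of 0/1 model decisions (1 = favourable outcome).
    groups:   iterable of group labels, aligned with outcomes.
    """
    favourable = defaultdict(int)
    total = defaultdict(int)
    for outcome, group in zip(outcomes, groups):
        total[group] += 1
        favourable[group] += outcome

    if total[protected] == 0 or total[reference] == 0:
        raise ValueError("both groups need at least one record")
    reference_rate = favourable[reference] / total[reference]
    if reference_rate == 0:
        raise ValueError("reference group has no favourable outcomes")
    return (favourable[protected] / total[protected]) / reference_rate


# Hypothetical model decisions for two illustrative groups, "A" and "B".
decisions = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
group_of = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]

ratio = disparate_impact_ratio(decisions, group_of, protected="B", reference="A")

# Assumed threshold: ratios below 0.8 are flagged for closer review.
needs_review = ratio < 0.8
print(f"Disparate Impact Ratio: {ratio:.2f}, needs review: {needs_review}")
```

A ratio close to 1 means both groups receive favourable outcomes at similar rates. The threshold that triggers a review is a policy choice for the organization and is not prescribed by the sketch.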
During the actual testing phase, tests are executed rigorously, AI system behavior is monitored, and outcomes are meticulously logged for further analysis. The final phase involves evaluating the test results, documenting findings, and communicating them to stakeholders to ensure transparency and informed decision-making.
Benefits and Challenges
The benefits of ISO/IEC TR 29119-11 are numerous. Enhanced quality assurance ensures that AI systems are thoroughly tested, leading to improved reliability and performance. The standard helps identify and mitigate potential risks associated with AI system deployment, providing a standardized approach to testing that facilitates consistency and repeatability.
Furthermore, adhering to the standard builds confidence among stakeholders, including customers, regulators, and partners, by demonstrating compliance with recognized guidelines. Moreover, it promotes adequate and trustworthy testing practices, ensuring AI systems operate transparently and responsibly.
However, there are challenges associated with implementing the standard. AI systems’ intrinsic complexity can make testing challenging. Additionally, the process can be resource-intensive, requiring significant time and effort. Keeping testing practices up-to-date with the latest advancements can also be demanding as AI technologies evolve.
For example, testing Generative AI presents unique challenges due to its non-deterministic nature and the vast range of possible outputs. Evaluating the quality, consistency, and appropriateness of generated content requires complex metrics and human judgment, while ensuring the AI’s responses remain ethical and unbiased adds another layer of complexity to the testing process.
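One simple way to probe non-determinism is to send the same prompt several times and measure how much the answers agree. The sketch below is a rough illustration under stated assumptions: the generator is a hypothetical stub rather than a real model, and token-set (Jaccard) overlap is a crude stand-in for the richer metrics and human judgment such testing needs in practice.

```python
from itertools import combinations


def jaccard(a, b):
    """Token-set overlap between two strings, from 0.0 to 1.0."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def consistency_score(generate, prompt, runs=5):
    """Mean pairwise similarity of outputs from repeated calls with one prompt."""
    if runs < 2:
        raise ValueError("at least two runs are needed to compare outputs")
    outputs = [generate(prompt) for _ in range(runs)]
    pairs = list(combinations(outputs, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def make_stub_generator(canned_responses):
    """Hypothetical stand-in for a real model: cycles through fixed responses."""
    state = {"calls": 0}

    def generate(prompt):
        response = canned_responses[state["calls"] % len(canned_responses)]
        state["calls"] += 1
        return response

    return generate


# Two illustrative stubs: one always answers the same way, one alternates.
stable = make_stub_generator(["The refund takes five days"])
unstable = make_stub_generator(
    ["The refund takes five days", "Refunds are not offered"]
)

question = "How long does a refund take?"
stable_score = consistency_score(stable, question, runs=4)
unstable_score = consistency_score(unstable, question, runs=4)
print(f"stable: {stable_score:.2f}, unstable: {unstable_score:.2f}")
```

A low score does not by itself mean the system is wrong, since varied wording can be acceptable. It only signals which prompts deserve a closer, possibly human, review.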
Another consideration is handling large volumes of test data and ensuring their quality. Effective implementation also requires interdisciplinary expertise in both AI and software testing, and therefore collaboration among various stakeholders.
Adopting the Standard: How and Why
Adopting ISO/IEC TR 29119-11 involves several strategies. Securing commitment from senior leadership is essential to driving the adoption process and allocating the necessary time and resources. The standard’s guidelines should be tailored to fit the specific needs and context of the organization, and a culture of continuous improvement should be fostered, with regular reviews and refinements of testing practices.
Encouraging collaboration between AI developers, testers, and other stakeholders is crucial, and ongoing training should be provided to keep skills current. Starting with pilot projects can demonstrate the value of the standard and refine implementation strategies before scaling up.
Today, the pressure on businesses to adopt ISO/IEC TR 29119-11 is more evident than ever, mainly due to the EU AI Act. Adhering to recognized standards can facilitate compliance with the Act’s requirements and help avoid legal issues. Demonstrating robust and consistent testing practices can also differentiate a company from competitors, providing a significant competitive advantage.
Additionally, ensuring the quality, reliability, and safety of AI systems enhances customer trust and satisfaction, while systematic testing can identify inefficiencies and areas for improvement, leading to operational gains. Lastly, by ensuring the high quality of their systems, businesses can confidently innovate and deploy new AI-driven solutions that meet their customers’ needs.
code4thought Predicts and Shapes the Testing Future of AI-Based Systems
As AI evolves and is adopted extensively, the importance of standardized testing practices will only grow. Businesses that adopt the ISO/IEC TR 29119-11 standard early will be better positioned to navigate the complexities of AI, ensuring their systems are reliable, trustworthy, and compliant.
ISO/IEC TR 29119-11 complements other AI-related ISO/IEC standards by focusing specifically on the testing of AI systems. While ISO/IEC 42001 provides a general framework for AI management systems, ISO/IEC 25059 addresses quality requirements for AI systems, and ISO/IEC 23894 offers risk management guidelines for AI, TR 29119-11 delves into the practical aspects of testing AI systems. It aligns with the principles set out in these standards, offering detailed guidance on test design, execution, and evaluation tailored to the unique challenges of AI.
This technical report addresses the non-deterministic nature of AI outputs, the need for extensive data validation, and the importance of testing for bias and ethical considerations. By doing so, it fills a crucial gap in the standardization landscape, providing testers and developers with concrete methodologies to help ensure AI systems meet the quality, reliability, and ethical standards outlined in the broader ISO framework.
The AI Quality Testing & Audit solution offered by code4thought is based on a tried-and-true, fact-based methodology built around ISO/IEC TR 29119-11. Our proprietary AI Quality Testing platform enables the analysis of any type of data and AI model and, combined with the expertise of our professionals, helps reduce the dangers and mistrust associated with AI-based systems. Contact us to learn how we can support you.