AI Act Technical Documentation: A Practical Guide to Annex IV Requirements
TL;DR — What you need to know about Annex IV documentation
- Who: Every provider of a high-risk AI system must prepare Annex IV technical documentation before placing the system on the EU market or putting it into service.
- What: Nine mandatory sections covering system description, development process, monitoring, performance metrics, risk management, lifecycle changes, harmonised standards, declaration of conformity, and post-market monitoring.
- When: Documentation must exist before conformity assessment begins — not after. The high-risk obligations deadline is 2 August 2026.
- How long: Expect 40–60 hours for simple systems, 60–100 hours for moderate systems, and 100–200+ hours for complex systems. Retrospective documentation takes 2–3x longer.
- SME relief: Article 11(2) allows SMEs and startups to use a simplified form, though all nine sections must still be addressed.
- Living document: Annex IV documentation is not a one-time deliverable — it must be updated throughout the AI system's entire lifecycle.
- Conformity gate: Without complete documentation, you cannot pass conformity assessment. Without conformity assessment, you cannot legally operate.
Why technical documentation matters
Article 11 requires that technical documentation of a high-risk AI system be drawn up before that system is placed on the market or put into service, and be kept up to date throughout its lifecycle.
The documentation serves two purposes:
- Demonstrate compliance with the requirements in Articles 8–15 — risk management, data governance, transparency, human oversight, accuracy, robustness, and cybersecurity.
- Provide national competent authorities with all necessary information to assess the system's conformity.
This is not a formality. During conformity assessment — whether self-assessed or evaluated by a notified body — the assessor evaluates your documentation against the legal requirements. Gaps in documentation translate directly to assessment failures, delays, and in the worst case, inability to place your system on the market.
The connection to conformity assessment is direct: for systems requiring self-assessment under Annex VI, the provider's own quality management team reviews the documentation. For systems requiring a notified body under Annex VII (primarily biometric identification), an external assessor scrutinises every section. In both cases, incomplete or vague documentation is the single most common reason for assessment failure.
The nine mandatory sections of Annex IV
The following sections correspond to the structure specified in Annex IV of the AI Act. For each section, we explain what to include, what not to include, and provide a practical example based on a credit scoring AI system — one of the most common high-risk classifications under Annex III point 5(b).
Section 1: General description of the AI system
What to include:
- The system's intended purpose, stated precisely
- The provider's name, address, and contact details
- System version number and any predecessor versions
- How the system interacts with external hardware or software
- Versions of relevant software or firmware, and requirements for version updates
- All forms in which the system is placed on the market (SaaS, API, embedded, on-premise)
- The hardware on which the system is intended to run
- For product components: photographs showing external features, marking, and internal layout
- A basic description of the user interface provided to the deployer
What NOT to include: Marketing copy, aspirational feature descriptions, or vague claims about the system's capabilities. Write as if explaining the system to a regulator who has never seen it before.
Credit scoring example: "CreditScore Pro v3.2 is an AI system that assesses the creditworthiness of natural persons applying for consumer loans between EUR 1,000 and EUR 50,000. It ingests applicant financial history, employment data, and transaction patterns via API integration with the deploying bank's core banking system. It outputs a numerical score (300–850) and a risk category (low/medium/high/very high). It is deployed as a cloud-hosted SaaS application running on AWS eu-west-1. The system does not make autonomous lending decisions — it provides a recommendation that a human credit officer evaluates."
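Annex IV does not prescribe any particular format for this section, but capturing the Section 1 fields as structured metadata makes completeness easy to check before an assessment. A minimal sketch; every name and value below is hypothetical:

```python
from dataclasses import dataclass, fields

@dataclass
class SystemDescription:
    """Annex IV Section 1 fields as structured metadata (illustrative only)."""
    intended_purpose: str
    provider_name: str
    provider_contact: str
    version: str
    predecessor_versions: str
    external_interactions: str
    distribution_forms: str   # e.g. "SaaS, API, embedded, on-premise"
    target_hardware: str
    ui_description: str

def missing_fields(desc: SystemDescription) -> list[str]:
    """Return the names of any fields left empty, as a completeness check."""
    return [f.name for f in fields(desc) if not getattr(desc, f.name).strip()]

desc = SystemDescription(
    intended_purpose="Assess creditworthiness of natural persons applying for consumer loans",
    provider_name="Example Provider GmbH",
    provider_contact="compliance@example.eu",
    version="3.2",
    predecessor_versions="3.0, 3.1",
    external_interactions="API integration with the deployer's core banking system",
    distribution_forms="Cloud-hosted SaaS",
    target_hardware="AWS eu-west-1",
    ui_description="",  # deliberately left blank to show the check firing
)
print(missing_fields(desc))  # ['ui_description']
```

A check like this can run in CI so that a release with an incomplete Section 1 fails early rather than during conformity assessment.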
Section 2: Detailed description of the elements and development process
This is the most technically demanding section. It must cover six sub-areas:
Design and development:
- General logic of the system and the algorithms used
- Key design choices, including rationale and assumptions made
- System architecture explaining how software components build on or feed into each other
- Computational resources used for development, training, testing, and validation
- Third-party tools, libraries, or pre-trained models used, with version numbers
Data practices:
- Training methodologies and techniques
- Training data: description of datasets, data provenance, scope, main characteristics
- How data was obtained and selected
- Labelling procedures and data-cleaning methodologies
- Data assessment in terms of suitability, biases, and potential gaps
Human oversight:
- Measures designed into the system to facilitate human oversight under Article 14
Pre-determined changes:
- Any pre-determined changes to the system and its performance, with details of the technical solutions to ensure continued compliance
Validation and testing:
- Validation and testing procedures, including the data used and its main characteristics
- Metrics used to measure accuracy, robustness, and compliance
- Test logs and test reports with dates and signatures
Cybersecurity:
- Technical solutions addressing Article 15 requirements
- Measures against AI-specific vulnerabilities: data poisoning, model poisoning, adversarial examples
Credit scoring example — data practices section: "Training data comprises 2.4 million anonymised historical loan applications from the period 2018–2024, sourced from three EU banking partners under data sharing agreements. The dataset includes 43 features per application. Applicants' protected characteristics (gender, ethnicity, age) were excluded from model inputs but retained in a separate analysis dataset for bias testing. Labelling: each application was labelled with the actual repayment outcome (default/no-default) at 12 months. Data cleaning: 14,200 records (0.6%) were excluded due to incomplete repayment data. Bias assessment: the training dataset over-represents applicants aged 30–50 and under-represents applicants under 25. This was addressed through stratified sampling during training and post-hoc calibration of scores across age brackets."
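The stratified-sampling decision in the example rests on first measuring how each subgroup is represented in the training data. A minimal sketch of such a representation check, using hypothetical applicant ages and brackets:

```python
from collections import Counter

def representation_report(ages, brackets):
    """Share of training records per age bracket, to surface over/under-representation."""
    def bracket(age):
        for label, (lo, hi) in brackets.items():
            if lo <= age < hi:
                return label
        return "other"
    counts = Counter(bracket(a) for a in ages)
    total = sum(counts.values())
    return {label: round(counts[label] / total, 3) for label in brackets}

# Hypothetical sample of applicant ages drawn from the training set
ages = [22, 24, 31, 35, 38, 41, 44, 47, 52, 58, 33, 36, 39, 42, 45, 48, 29, 61, 37, 40]
brackets = {"<25": (0, 25), "25-50": (25, 50), "50+": (50, 120)}
print(representation_report(ages, brackets))
# {'<25': 0.1, '25-50': 0.75, '50+': 0.15}
```

The resulting shares can be pasted directly into the bias-assessment paragraph of Section 2, alongside the mitigation chosen (stratified sampling, post-hoc calibration, or both).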
Handling third-party and pre-trained models: If your system uses a model you did not train — a fine-tuned foundation model, a pre-trained embedding model, or a third-party classification API — you must still document the base model's characteristics, your adaptation process, and any limitations inherited from the base model. "We used Model X" is not sufficient. Request technical documentation, model cards, or data sheets from your suppliers. Document what you know, what you do not know, and what steps you have taken to address gaps. If the supplier cannot provide adequate documentation, this itself is a risk that must be documented and mitigated.
Section 3: Monitoring, functioning, and control
- The system's capabilities and limitations in performance, including degrees of accuracy for specific persons or groups
- Foreseeable unintended outcomes and sources of risks to health, safety, and fundamental rights
- Human oversight specifications: technical measures to facilitate interpretation of outputs
- Specifications for input data, as appropriate
Credit scoring example: "The system's accuracy (AUC-ROC) is 0.87 on the general test population. Known limitations: accuracy drops to 0.79 for applicants with fewer than 12 months of credit history, and to 0.81 for self-employed applicants with irregular income patterns. The system may produce unreliable scores for applicants from countries with incompatible credit reporting frameworks. Human oversight: the deployer dashboard displays the score, the top five contributing factors, and a confidence indicator. If confidence is below 70%, the system flags the case for mandatory manual review."
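The confidence-gated manual review rule in the example amounts to a small routing function. A sketch, with the 70% threshold taken from the example and all field names illustrative:

```python
def review_decision(score: int, confidence: float) -> dict:
    """Route a scored application: low-confidence cases go to mandatory manual review."""
    needs_review = confidence < 0.70  # threshold from the human-oversight specification
    return {
        "score": score,
        "confidence": confidence,
        "route": "manual_review" if needs_review else "credit_officer_dashboard",
    }

print(review_decision(612, 0.64)["route"])  # manual_review
```

Documenting the rule as executable logic, not just prose, also gives the assessor direct evidence that the oversight measure is actually built into the system.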
Section 4: Appropriateness of performance metrics
- The metrics chosen to measure performance
- Why these metrics are appropriate for the specific system and intended purpose
- The benchmark(s) against which performance is measured
Disaggregated accuracy requirements: The AI Act expects performance metrics to be broken down across relevant subgroups — not reported only as aggregate figures. For a credit scoring system, this means reporting accuracy, false positive rates, and false negative rates disaggregated by age bracket, gender, geographic region, and employment type. A single aggregate "95% accuracy" figure is insufficient and will likely be challenged during conformity assessment.
Credit scoring example: "Primary metric: AUC-ROC, chosen because it measures discriminative ability across all classification thresholds, which is appropriate for a scoring system where deployers set their own acceptance thresholds. Secondary metrics: false positive rate (FPR) and false negative rate (FNR), reported disaggregated by age group (<25, 25–35, 35–50, 50–65, 65+), gender, and employment type (employed, self-employed, unemployed). Benchmark: the system's performance is compared against the incumbent logistic regression model used by the primary banking partner, using the same test dataset."
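Disaggregated FPR/FNR reporting of the kind described above can be computed from labelled test outcomes with a few lines of code. A sketch on hypothetical data; in practice these figures come from your evaluation pipeline:

```python
def fpr_fnr(records):
    """False positive and false negative rates from (y_true, y_pred) pairs."""
    fp = sum(1 for t, p in records if t == 0 and p == 1)
    tn = sum(1 for t, p in records if t == 0 and p == 0)
    fn = sum(1 for t, p in records if t == 1 and p == 0)
    tp = sum(1 for t, p in records if t == 1 and p == 1)
    return {"FPR": fp / (fp + tn), "FNR": fn / (fn + tp)}

def disaggregate(records_by_group):
    """Report FPR/FNR per subgroup rather than one aggregate figure."""
    return {group: fpr_fnr(recs) for group, recs in records_by_group.items()}

# Hypothetical test-set outcomes: (actual_default, predicted_default), by age bracket
by_group = {
    "<25":   [(1, 1), (0, 1), (0, 0), (1, 0), (0, 1), (1, 1)],
    "25-50": [(0, 0), (0, 0), (1, 1), (1, 1), (0, 0), (1, 1)],
}
report = disaggregate(by_group)
```

A subgroup gap that only shows up in a breakdown like this (here the hypothetical under-25 group) is exactly what an aggregate accuracy figure hides.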
Section 5: Risk management system
- The risk management system under Article 9
- Known or foreseeable risks identified
- Risk evaluation results
- Risk management measures adopted and residual risk assessment
- Evidence that the process was iterative and carried out throughout the development lifecycle
Credit scoring example: "Risk register includes 23 identified risks. Top-5 by severity: (1) systematic bias against young applicants with thin credit files — mitigated by age-stratified calibration and mandatory manual review for applicants under 25; (2) proxy discrimination via postal code — mitigated by excluding geographic features and testing for disparate impact; (3) data drift from changing economic conditions — mitigated by quarterly model performance monitoring and retraining triggers; (4) adversarial manipulation of input data — mitigated by input validation, anomaly detection, and transaction pattern cross-verification; (5) over-reliance by deployers on automated scores — mitigated by requiring human review for all borderline scores (550–650 range)."
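A machine-checkable risk register makes the "iterative process" evidence easier to produce. The entry below is a hypothetical sketch mirroring the first risk in the example; the field names are illustrative, not prescribed by Article 9:

```python
# One entry from a hypothetical risk register: identified risk, evaluation,
# mitigations, residual risk, and a review history proving iteration.
risk_entry = {
    "id": "R-001",
    "description": "Systematic bias against young applicants with thin credit files",
    "severity": "high",
    "likelihood": "medium",
    "mitigations": [
        "Age-stratified score calibration",
        "Mandatory manual review for applicants under 25",
    ],
    "residual_risk": "low",
    "reviews": [
        {"date": "2025-11-03", "outcome": "mitigations verified on Q3 test set"},
    ],
}

def open_high_risks(register):
    """List risks whose residual risk is still high, for escalation before release."""
    return [r["id"] for r in register if r["residual_risk"] == "high"]

print(open_high_risks([risk_entry]))  # []
```

A release gate that fails while `open_high_risks` is non-empty gives you a direct, dated audit trail of when each residual risk was accepted or reduced.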
Section 6: Changes throughout the lifecycle
- All relevant changes made to the system throughout its lifecycle
- How changes were tested and validated
- Version control and change management procedures
This section must be maintained as a living record. Every model update, retraining, feature addition, or performance recalibration should be logged with the date, rationale, test results, and confirmation of continued compliance.
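One way to keep such a living record is an append-only change log next to the model artifacts. A sketch as JSON records; the file name and fields are illustrative, not prescribed by Annex IV:

```python
import json
import os
import tempfile
from datetime import date

def log_change(changelog_path, version, rationale, tests_passed):
    """Append a Section 6 lifecycle-change record with date, rationale, and test outcome."""
    try:
        with open(changelog_path) as f:
            entries = json.load(f)
    except FileNotFoundError:
        entries = []
    entries.append({
        "date": date.today().isoformat(),
        "version": version,
        "rationale": rationale,
        "tests_passed": tests_passed,
        "compliance_confirmed": tests_passed,  # simplification: gated on test outcome
    })
    with open(changelog_path, "w") as f:
        json.dump(entries, f, indent=2)
    return entries

# Demonstration in a temporary directory
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "changes.json")
    log_change(path, "3.3", "Quarterly retraining on 2025-Q3 data", True)
    entries = log_change(path, "3.4", "Hotfix: input validation for missing income field", True)
```

Because each entry carries a date and test result, the log doubles as the audit trail assessors look for in Section 6.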
Section 7: Applied harmonised standards
- If harmonised standards under Article 40 were applied, list them with version numbers
- Where harmonised standards were not applied, describe the solutions adopted to meet the requirements of Chapter III, Section 2
As of April 2026, CEN/CENELEC has published draft standards but not all have been formally harmonised. Document which standards you followed and, for areas without harmonised standards, explain how you met the legal requirements directly from the text of Articles 8–15.
Section 8: EU declaration of conformity
- A copy of the EU declaration of conformity under Article 47
This section is completed at the end of the conformity assessment process. The declaration references the system, the provider, the harmonised standards or other specifications used, and the conformity assessment procedure followed.
Section 9: Post-market monitoring system
- The post-market monitoring system under Article 72
- How performance data is collected and analysed after deployment
- Thresholds and triggers for corrective action
- Incident reporting procedures under Article 73
Credit scoring example: "Performance monitoring: automated weekly calculation of AUC-ROC, FPR, and FNR on a rolling 90-day window of production decisions, disaggregated by subgroup. Alert thresholds: if any subgroup AUC-ROC drops below 0.80, an investigation is triggered within 48 hours. If aggregate AUC-ROC drops below 0.83, the system is flagged for retraining. Feedback loop: deployment partners report quarterly on actual default rates for AI-scored applications, enabling back-testing of predictions. Incident reporting: the post-market monitoring team reports serious incidents to the relevant market surveillance authority within 15 days per Article 73."
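The alert thresholds in the example map naturally onto a small decision function. A sketch, with the numeric thresholds taken directly from the example text and everything else illustrative:

```python
def monitoring_action(aggregate_auc, subgroup_aucs):
    """Map monitored metrics to the corrective actions in the example thresholds."""
    actions = []
    for group, auc in subgroup_aucs.items():
        if auc < 0.80:
            actions.append(f"investigate:{group}")   # investigation within 48 hours
    if aggregate_auc < 0.83:
        actions.append("flag_for_retraining")
    return actions or ["no_action"]

print(monitoring_action(0.86, {"<25": 0.78, "25-50": 0.88}))
# ['investigate:<25']
```

Encoding the thresholds once, in code that both the monitoring pipeline and the documentation reference, avoids the drift that occurs when prose and production disagree.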
Real-world documentation scenarios
Scenario 1: Medical device AI (radiology)
A provider of an AI system that assists radiologists in detecting lung nodules on CT scans (high-risk under Article 6(1) as a safety component of a device covered by the Medical Devices Regulation) faces the most demanding documentation requirements. Third-party conformity assessment by a notified body is likely required. Section 2 must include detailed descriptions of the training dataset (tens of thousands of annotated scans), inter-rater reliability of the labelling, performance broken down by nodule size, patient demographics, and scanner manufacturer. Section 5 must address the risk of missed detections (false negatives) and false alarms (false positives) with specific residual risk quantification.
Scenario 2: HR screening tool (recruitment)
An HR technology company providing an AI system that filters job applications (high-risk under Annex III point 4(a)) must document in Section 2 how the training data was curated to avoid encoding historical hiring biases. Section 3 must specify accuracy disaggregated by gender, age, ethnicity, and disability status. Section 5 must address risks including indirect discrimination via proxy features (university name as a proxy for socioeconomic status, gap years as a proxy for caregiving responsibilities). Section 9 must describe how the provider monitors whether the system's recommendations lead to disparate outcomes across protected groups in production.
Scenario 3: Critical infrastructure monitoring
A provider of an AI system that monitors electrical grid stability and triggers automated load-shedding decisions (high-risk under Annex III point 2) must document in Section 1 the system's interaction with SCADA systems and grid hardware. Section 2 must cover the simulation environments used for testing, since live grid testing is impractical. Section 5 must address cascading failure risks, including scenarios where the AI incorrectly triggers load-shedding and causes unplanned outages affecting hospitals and emergency services.
Common pitfalls and how to fix them
Pitfall 1: Writing documentation retrospectively
The AI Act requires documentation to be prepared during development, not after the system is complete. If design decisions, training data choices, and test results were not documented as they happened, reconstructing them is harder and less credible to an assessor.
Fix: Start a documentation log from day one. Record key decisions, dataset descriptions, and test results in real time. Integrate documentation tasks into your sprint or development cycle.
Pitfall 2: Treating documentation as a one-time deliverable
Annex IV documentation is a living document. It must be kept up to date throughout the AI system's lifecycle. Any significant change — a model update, a new training dataset, a change in intended purpose — triggers a documentation update.
Fix: Tie documentation updates to your CI/CD pipeline. Every release that changes model behaviour should trigger a documentation review. Use version control (Git) for documentation alongside code.
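A crude version of such a release gate can be sketched with file modification times; a real pipeline would compare git history for the model artifact and the documentation instead. All paths below are hypothetical:

```python
import os
import tempfile
import time

def docs_stale(model_path: str, docs_path: str) -> bool:
    """True if the model artifact changed more recently than the documentation."""
    return os.path.getmtime(model_path) > os.path.getmtime(docs_path)

# Demonstration with temporary files standing in for the model and the docs
with tempfile.TemporaryDirectory() as d:
    model = os.path.join(d, "model.bin")
    docs = os.path.join(d, "annex_iv.md")
    open(docs, "w").close()
    open(model, "w").close()
    os.utime(docs, (time.time() - 3600,) * 2)  # docs last touched an hour ago
    stale = docs_stale(model, docs)
print(stale)  # True
```

Wired into CI, a check like this fails any release where the model changed but the Annex IV documentation did not.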
Pitfall 3: Ignoring inherited limitations from third-party components
If your system uses a pre-trained model, a third-party dataset, or an external API, you must document the limitations and risks inherited from these components. "We used GPT-4 via API" is not sufficient.
Fix: Request technical documentation or model cards from your suppliers. Document what you know and what you do not know. If a supplier cannot provide adequate documentation, document this gap and explain your mitigation strategy.
Pitfall 4: Vague or aggregate-only accuracy claims
"The system achieves 95% accuracy" fails the Annex IV standard. You must specify:
- The metric used (precision, recall, F1, AUC-ROC, etc.)
- The dataset on which it was measured
- The population segments for which it was measured (disaggregated performance)
- Known failure modes and performance drops under specific conditions
Fix: Report performance disaggregated across all relevant subgroups. Document the datasets, conditions, and thresholds used. Be explicit about where performance degrades.
Pitfall 5: Missing cybersecurity documentation
Many teams document functional performance but neglect cybersecurity. Article 15 requires measures against data poisoning, model poisoning, adversarial inputs, and unauthorised access, and Annex IV requires you to document those measures.
Fix: Conduct a threat model specific to AI vulnerabilities. Document each threat, the mitigation measures, and the residual risk. This is distinct from your general IT security posture.
Pitfall 6: No version control or audit trail
Assessors will look for evidence that documentation evolved alongside the system. A single, undated Word document with no change history is a red flag.
Fix: Store documentation in version-controlled repositories. Use timestamped commits. Maintain a changelog for each major documentation revision.
SME simplifications under Article 11(2)
Article 11(2) explicitly allows SMEs and startups to provide Annex IV elements in a simplified form. The European Commission is tasked with establishing a simplified technical documentation form tailored to the needs of small and micro enterprises.
As of April 2026, the Commission has not yet published this form. The practical approach in the interim:
- Cover all nine sections — the simplification applies to depth, not scope.
- Scale detail to system complexity — a simple classification tool does not need the same depth as a medical AI diagnostic system.
- Focus on substance over length — assessors evaluate whether you addressed the requirements, not page counts.
- Document what you genuinely know — it is better to state "we tested on a dataset of 5,000 records and found X" than to fabricate elaborate testing narratives.
Example: A five-person startup providing an AI system that prioritises customer support tickets (high-risk if used in essential services) can document its development process in 15–20 pages rather than the 80+ pages that a large medical AI provider might need — as long as every Annex IV section is addressed with honest, specific information.
Documentation as a living document
Annex IV documentation is not a deliverable you complete and archive. The AI Act requires it to be maintained throughout the system's lifecycle. Triggers for updates include:
- Model retraining or fine-tuning — update Sections 2, 4, and 5.
- New training data — update Sections 2 and 5.
- Change in intended purpose or deployment context — update Sections 1, 3, and 5.
- New identified risks — update Section 5.
- Performance degradation detected — update Sections 4 and 9.
- Regulatory guidance or harmonised standards published — update Section 7.
- Post-market incidents — update Sections 5 and 9.
Establish a review cadence: at minimum quarterly, or triggered by any of the events above.
Practical tips on tooling and version control
Documentation that lives in disconnected Word documents across email threads will not survive a conformity assessment. Practical approaches used by early adopters:
- Docs-as-code: Store documentation in Markdown or reStructuredText alongside your codebase, versioned in Git. Every documentation change is a commit with a timestamp and author.
- Structured templates: Use a consistent template mirroring the nine Annex IV sections. This ensures completeness and makes assessor review straightforward.
- Automated data capture: Pull training metadata, test results, and performance metrics directly from your ML pipeline into documentation templates. Tools like MLflow, Weights & Biases, or DVC can automate much of Section 2 and Section 4.
- Review workflows: Require sign-off on documentation changes, similar to code review. This creates an audit trail showing who approved what and when.
- Single source of truth: Avoid duplicating information across systems. If your risk register lives in a GRC tool, reference it from the documentation rather than copying it.
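The "automated data capture" idea needs no specific MLOps tool to get started. A minimal sketch that renders pipeline metrics (hypothetical values, matching the Section 3 example) into a documentation fragment:

```python
# Hypothetical metrics pulled from an ML pipeline run, rendered into a
# Section 4 fragment for the docs-as-code repository.
metrics = {
    "AUC-ROC (aggregate)": 0.87,
    "AUC-ROC (<25, thin credit file)": 0.79,
    "AUC-ROC (self-employed)": 0.81,
}

def render_section4(metrics, run_id):
    """Render a metrics dict as a markdown fragment tagged with its pipeline run."""
    lines = [f"Section 4: Performance metrics (pipeline run {run_id})", ""]
    lines += [f"- {name}: {value}" for name, value in metrics.items()]
    return "\n".join(lines)

print(render_section4(metrics, "run-2025-11-03"))
```

Because the fragment is generated and committed rather than hand-typed, the documented figures cannot silently diverge from what the pipeline actually measured.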
Preparation checklist
Use this to audit your readiness before starting formal documentation:
- System description and intended purpose defined precisely
- Architecture diagrams and component inventory prepared
- All third-party components identified with supplier documentation obtained
- Training data sources, selection criteria, and cleaning methods documented
- Bias assessment of training and testing data completed with disaggregated results
- Human oversight measures specified and tested
- Accuracy metrics defined with disaggregated performance data across relevant subgroups
- Cybersecurity threat model (AI-specific) completed with mitigations documented
- Risk management process documented with iteration evidence across the development lifecycle
- Test plans, test results, and test reports archived with dates and signatures
- Post-market monitoring plan drafted with thresholds and triggers
- Change management and version control procedures defined
- Documentation stored in version-controlled repository with audit trail
- SME simplification applicability assessed (if relevant)
Time and resource estimates by system complexity
- Simple systems: 40–60 hours
- Moderate systems: 60–100 hours
- Complex systems: 100–200+ hours
These figures assume design decisions were documented from the start. Retrospective documentation — reconstructing decisions, test results, and data provenance after the fact — typically takes 2–3x longer and is consistently the most expensive and error-prone path.
Connection to conformity assessment
Technical documentation is not an end in itself — it is the primary input to conformity assessment. The relationship is direct:
- Self-assessment (Annex VI): Your internal quality management system reviews the documentation against Articles 8–15. If the documentation is incomplete, the self-assessment cannot pass.
- Notified body assessment (Annex VII): The notified body examines your documentation in detail. Expect questions, requests for clarification, and follow-up audits. The quality of your documentation determines the speed and cost of the assessment.
- Declaration of conformity (Article 47): You cannot sign the declaration without a completed conformity assessment, and you cannot complete conformity assessment without complete documentation.
Next steps
- Classify your AI system to confirm whether Annex IV documentation is required.
- Review the full Annex IV text for the exact legal requirements.
- Use the checklist above to audit your current documentation gaps.
- Start with Section 1 (general description) and Section 2 (development process) — these are the most time-intensive.
- Review the full compliance checklist to see how documentation fits into the broader compliance programme.
Run the free AI Act assessment to confirm your system's risk classification and documentation obligations.
For the full legal text, see the complete AI Act guide.
Frequently asked questions
How detailed does Annex IV documentation need to be?
Detailed enough for an assessor — whether internal or a notified body — to verify that your system meets every requirement in Articles 8–15 without needing to ask you supplementary questions. The standard is not a page count; it is completeness and specificity. A 30-page document that addresses every section with concrete evidence is better than a 100-page document that uses generic language. The key test: could a qualified assessor who has never seen your system understand how it works, what risks it presents, and how you mitigated them, solely from reading the documentation?
Can I reuse documentation from ISO or other frameworks?
Partially. Existing documentation from ISO 42001 (AI management systems), ISO 27001 (information security), or IEC 62304 (medical device software) can provide building blocks, but none of these standards maps directly to all nine Annex IV sections. Use existing materials where they address the same topics, but be prepared to fill gaps — particularly around AI-specific requirements like disaggregated performance metrics, bias assessments, and AI-specific cybersecurity threats (data poisoning, adversarial examples).
What if my system uses a pre-trained model and the supplier will not share full documentation?
Document what the supplier has provided (model card, data sheet, performance benchmarks), what you requested but did not receive, and how you addressed the resulting documentation gaps. Conduct your own evaluation of the model's performance in your deployment context. Document the inherited risks and your mitigation strategy. A conformity assessor will evaluate whether your approach is reasonable given the information available — but a complete absence of supplier documentation is a significant risk factor that must be explicitly addressed.
Does the documentation need to be in a specific language?
The documentation must be drawn up in an official language of the Member State where the system is placed on the market or put into service. In practice, English is widely accepted by market surveillance authorities across the EU, but confirm with the relevant national authority. If you operate in multiple Member States, you may need translations of key sections.
How often must Annex IV documentation be updated?
There is no fixed schedule in the AI Act. The requirement is that documentation must be "kept up to date" throughout the system's lifecycle. In practice, updates should be triggered by any material change to the system (retraining, new data, new deployment context, identified incidents), any new risk information, and at regular review intervals (quarterly is a reasonable baseline). Every update should be version-controlled with a clear changelog.
What are the penalties for inadequate technical documentation?
Inadequate documentation of high-risk AI systems falls under the general high-risk violation category, carrying fines of up to EUR 15 million or 3% of global annual turnover, whichever is higher. For SMEs, the lower amount applies. Beyond fines, the practical consequence is that you cannot complete conformity assessment, which means you cannot legally place the system on the EU market. See the penalties and fines guide for the full breakdown.
Legalithm is an AI-assisted compliance workflow tool — not legal advice. Final compliance decisions should be reviewed by qualified legal counsel.