Annex XIII: Criteria for Classification of GPAI Models with Systemic Risk

In effect since 2 Aug 2025 · EUR-Lex verified Apr 2026

Annex XIII lists the criteria for classifying a GPAI model as having systemic risk under Article 51. It includes both quantitative indicators (notably the 10^25 FLOPs cumulative compute threshold that creates a rebuttable presumption) and qualitative criteria the AI Office considers when assessing high-impact capabilities. The Commission may update these criteria via delegated acts under Article 97 as technology evolves.

Who does this apply to?

  • Providers of GPAI models assessing whether they meet systemic-risk thresholds
  • The AI Office and the scientific panel of independent experts (applying and monitoring the criteria)
  • Downstream providers integrating GPAI models who need to know their systemic-risk status
  • Compliance teams tracking threshold changes via Commission delegated acts

Scenarios

A new frontier model is trained with cumulative compute exceeding 10^25 floating-point operations.

Presumed to have systemic risk under Annex XIII / Article 51(2). Provider must notify the AI Office and comply with Article 55.
Ref. Annex XIII + Art. 51(2)

A model is below 10^25 FLOPs but achieves state-of-the-art scores on reasoning and code generation benchmarks with broad deployment across the EU.

The AI Office may still designate systemic risk based on qualitative criteria (high-impact capabilities, reach, number of users) even without crossing the compute threshold.
Ref. Annex XIII + Art. 51(1)(b)

The Commission adopts a delegated act lowering the FLOPs threshold to 10^24 after advances in training efficiency.

Providers must re-assess against the updated criteria; models previously below threshold may now be captured.
Ref. Art. 97 + Annex XIII

What Annex XIII covers (in plain terms)

Annex XIII provides the assessment framework the AI Office uses to determine whether a GPAI model has high-impact capabilities and should be classified as systemic risk. The criteria include:

  • Number of parameters of the model
  • Quality or size of the dataset used for training (e.g. measured in tokens)
  • Amount of computation used for training the model (measured in FLOPs) — including the 10^25 FLOPs presumption threshold
  • Input and output modalities of the model (text, image, video, code, etc.)
  • Benchmarks and evaluations of the model's capabilities, including state-of-the-art performance
  • Reach on the internal market (presumed high at 10,000 or more registered business users established in the EU) and number of registered end-users
  • Any other indicator of high-impact capabilities

The 10^25 FLOPs threshold creates a rebuttable presumption: models above it are presumed systemic risk, but providers may argue otherwise. Models below it can still be designated if other criteria demonstrate equivalent capabilities.
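The decision flow just described can be sketched in a few lines of Python. This is an illustrative reading of Article 51, not official tooling: the function and constant names are invented, and only the 10^25 threshold value comes from the Act itself.

```python
FLOPS_PRESUMPTION_THRESHOLD = 1e25  # Article 51(2); updatable by delegated act

def classification_status(training_flops: float,
                          qualitative_designation: bool = False) -> str:
    """Sketch of a GPAI model's systemic-risk status under Article 51.

    qualitative_designation: True if the model has been designated under
    Article 51(1)(b) on the basis of qualitative Annex XIII criteria.
    """
    if training_flops >= FLOPS_PRESUMPTION_THRESHOLD:
        # Rebuttable presumption: the provider may still argue otherwise
        # in the Article 52 procedure.
        return "presumed systemic risk"
    if qualitative_designation:
        return "designated systemic risk"
    return "not classified"
```

For example, `classification_status(3e25)` returns `"presumed systemic risk"`, while a sub-threshold model returns `"not classified"` unless a qualitative designation has been made.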

The 10^25 FLOPs threshold — context

The 10^25 floating-point operations threshold was calibrated to frontier models at the time of legislative negotiations (roughly GPT-4-class training compute). Key considerations:

  • It is a rebuttable presumption, not a hard boundary
  • The Commission can update the threshold via delegated act as training efficiency evolves
  • Distillation, data quality improvements, and architecture advances may reduce the compute needed for equivalent capabilities—the threshold may under-capture risk over time
  • The AI Office can designate models below the threshold based on qualitative criteria

Providers should track both their absolute FLOPs and benchmark performance to assess classification risk.
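One simple way to track that exposure is to log how many orders of magnitude a model sits below the presumption threshold. A minimal sketch (the function name is illustrative; the threshold is a parameter precisely because a delegated act may change it):

```python
import math

def margin_to_threshold(training_flops: float,
                        threshold: float = 1e25) -> float:
    """Orders of magnitude between a model's cumulative training compute
    and the presumption threshold; negative once the threshold is crossed."""
    return math.log10(threshold) - math.log10(training_flops)

# A 10^24-FLOPs model sits 1.0 order of magnitude below the current
# threshold. If a delegated act lowered the threshold to 10^24, that
# margin would fall to zero.
```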

How Annex XIII connects to the rest of the Act

  • Article 51 — Uses Annex XIII criteria to define systemic risk; paragraph (2) establishes the FLOPs presumption.
  • Article 52 — Procedure for classification (notification, designation, rebuttal) based on the Annex XIII assessment.
  • Article 55 — Additional obligations triggered by systemic-risk classification.
  • Annex XI Section 2 — Documentation requirements triggered by classification (evaluation strategies, red teaming, architecture).
  • Article 97 — Delegated acts allowing the Commission to update Annex XIII criteria and thresholds.
  • Article 90 — Qualified alerts from the scientific panel of independent experts, which may draw on Annex XIII analysis.
  • Article 113 — Application dates (Chapter V applies from 2 August 2025).

Recitals (preamble) on EUR-Lex

The recitals in the same consolidated AI Act on EUR-Lex contextualise the 10^25 FLOPs calibration, the rebuttable presumption design, and the Commission's power to evolve the criteria. Use the official preamble on EUR-Lex; do not rely on unofficial recital lists without checking sequence and wording against the authentic text.

Compliance checklist

  • Calculate and document cumulative training compute (FLOPs) for each GPAI model release.
  • Track benchmark performance against state-of-the-art metrics across modalities.
  • Monitor Commission delegated acts for threshold updates to Annex XIII.
  • If above 10^25 FLOPs: prepare notification to the AI Office under Article 52.
  • If below threshold but with broad deployment: assess qualitative criteria proactively.
  • Document rebuttal arguments if you believe systemic-risk classification is not warranted despite threshold crossing.
  • Track the AI Office's published list of systemic-risk models for upstream dependencies.

Related annexes

  • Annex XI — GPAI technical documentation (Section 2 triggered by systemic-risk classification)

Frequently asked questions

Is the 10^25 FLOPs threshold permanent?

No. The Commission can update it via delegated act under Article 97 based on evolving technological benchmarks and state of the art.

Can a model below 10^25 FLOPs still be systemic risk?

Yes. Article 51(1)(b) allows the AI Office to designate based on equivalent capabilities or impact using qualitative Annex XIII criteria, even if the compute threshold is not crossed.

How do I calculate FLOPs?

FLOPs typically refers to the total number of floating-point operations used during training. For transformer models, common approximations exist based on parameter count, dataset size, and training steps. Document your methodology.
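One widely used approximation for dense transformer models, drawn from the scaling-law literature rather than from the Act itself, is roughly six floating-point operations per parameter per training token. A sketch, with an invented example model:

```python
def approx_training_flops(n_params: float, n_tokens: float) -> float:
    """Common 6*N*D rule of thumb for dense transformer pre-training:
    roughly 6 FLOPs per parameter per training token (forward plus
    backward pass). An estimate only; document your actual methodology."""
    return 6.0 * n_params * n_tokens

# Hypothetical example: a 70-billion-parameter model trained on
# 15 trillion tokens.
estimate = approx_training_flops(70e9, 15e12)
print(f"{estimate:.2e}")  # 6.30e+24, below the 1e25 presumption threshold
```

Note that this rule of thumb applies to dense architectures; mixture-of-experts models, retraining, and sparsity complicate the accounting, which is another reason to document the methodology used.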

Does fine-tuning compute count?

The Annex refers to 'cumulative amount of computation used for training.' Whether fine-tuning adds to the base model's FLOPs depends on interpretation—document your position and monitor AI Office guidance.
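Under the reading that fine-tuning does count, cumulative compute is simply a sum over all training runs. A hypothetical sketch; the run names and FLOPs figures are invented for illustration:

```python
# Hypothetical FLOPs per training run for one model lineage.
runs = {
    "pretraining": 9.6e24,
    "supervised_fine_tuning": 2.0e23,
    "rlhf": 5.0e23,
}

cumulative_flops = sum(runs.values())
# Pretraining alone (9.6e24) stays below 1e25, but the cumulative total
# (1.03e25) crosses the presumption threshold if, and only if,
# fine-tuning compute is counted. This is why the interpretive
# question matters in practice.
```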