AI-Assisted Accessibility Remediation with Mandatory Verification Gates

Large language models can draft alt text, ARIA labels, and heading-structure corrections far faster than a human can type them — but a model’s output is a suggestion, never a verified fix. This guide is part of Automated Remediation & Accessibility Fixing Patterns, and its core thesis is simple: treat every AI-proposed change as untrusted input that must pass an automated re-scan and a human review before it is allowed to merge. The danger is not that a model is wrong occasionally; it is that a model is confidently wrong and an unguarded pipeline will ship that confidence straight to production.

Key implementation targets:

Wrap any model-generated fix in a re-scan gate using axe-core before it can advance.
Require a human approval step — AI proposals are never auto-merged.
Validate ARIA correctness against WCAG 2.2 SC 4.1.2 and reject regressions that add new violations.
Record provenance so every accepted suggestion is traceable to the prompt and model that produced it.

Every AI-proposed fix must clear an automated re-scan and a human review; either gate can bounce it back to the model.

The Problem: Confident Output Is Not Verified Output

A model asked to label an icon button might return aria-label="Submit" for a control that actually deletes a record. It might invent alt text describing an image it never saw, or restructure headings in a way that reads plausibly but breaks the document outline. None of these failures are detectable by reading the diff alone — they require re-measuring the page against the same rules that flagged the original defect. The only safe architecture is one where the model never touches the merge button: it produces a candidate, and an automated tool plus a human decide whether the candidate is real.

This matters more for accessibility than for most code, because the failure mode is silent. A broken aria-label does not throw an exception or turn a test red on its own — it simply misinforms an assistive-technology user. That is exactly why the re-scan gate and the human gate are non-negotiable.

Key Implementation Targets

The pipeline has four moving parts: a generation step that prompts a model with the violating node and its context, a constraint layer that rejects malformed output before it ever reaches a browser, a re-scan that proves the fix removes the original violation and introduces no new one, and a human review surface where a person approves or rejects with full context. Skipping any one of these turns “AI-assisted” into “AI-unsupervised.”
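
The constraint layer can be a plain function that runs before any DOM is patched. The sketch below assumes the candidate shape { attribute, value } and the rules stated in the generation prompt in section 1; the name validateCandidate, the attribute allow-list, and the error strings are illustrative assumptions, not part of axe-core or any other library.

```javascript
// validate-candidate.js: hypothetical constraint layer, run before the re-scan
const ALLOWED_ATTRIBUTES = new Set(["aria-label", "alt"]); // assumed allow-list
const MAX_LENGTH = 80; // mirrors the "<= 80 chars" rule in the prompt

function validateCandidate(candidate) {
  if (candidate === null || typeof candidate !== "object") {
    return ["candidate is not an object"];
  }
  const errors = [];
  const { attribute, value } = candidate;
  if (!ALLOWED_ATTRIBUTES.has(attribute)) {
    errors.push(`attribute not allowed: ${attribute}`);
  }
  if (typeof value !== "string" || value.length === 0) {
    // This sketch rejects empty values; relax it if decorative alt="" is in scope
    errors.push("value must be a non-empty string");
    return errors;
  }
  if (value.length > MAX_LENGTH) {
    errors.push(`value exceeds ${MAX_LENGTH} characters`);
  }
  if (value !== value.trim()) {
    errors.push("value has leading or trailing whitespace");
  }
  if (attribute === "aria-label" && /\bbutton\b/i.test(value)) {
    errors.push('aria-label repeats the role word "button"');
  }
  return errors; // an empty array means the candidate may proceed
}
```

A candidate that returns a non-empty error list goes back to the generation step without ever being applied, so the browser-based re-scan is only spent on well-formed proposals.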

Prerequisites

axe-core >= 4.9, installed and configured for your framework, plus Playwright, @axe-core/playwright, and jsdom for the scripts in this guide
A test runner that can mount the patched component or serve the patched page
Node.js >= 20 for the validation scripts
A model endpoint reachable from CI via a secret-stored API key (provider-agnostic)
Branch protection so AI-authored branches cannot self-merge

1. Generate a Candidate, Never a Commit

The generation step must output structured data, not a patch applied in place. Prompt the model with the failing element, its accessible context, and an explicit instruction set, then capture the proposal as JSON for downstream validation.

// suggest.js: produces a candidate fix, applies nothing
import fs from "node:fs";
// Hypothetical module: wrap your provider's SDK so callModel(prompt) resolves to { text }
import { callModel } from "./model.js";

async function suggestFix(violation) {
  // violation: { html, target, ruleId, contextText }
  const prompt = [
    "You propose a single accessibility fix. Output JSON only:",
    '{ "attribute": "aria-label", "value": "..." }',
    "Rules: <= 80 chars, no leading/trailing space, do not include the",
    'word "button" in an aria-label, language must match the page.',
    `Element: ${violation.html}`,
    `Nearby text: ${violation.contextText}`,
  ].join("\n");

  const res = await callModel(prompt); // provider-agnostic wrapper
  // JSON.parse throws on malformed output; keep only the two expected fields
  // so the model cannot smuggle extra keys into the candidate
  const { attribute, value } = JSON.parse(res.text);
  // returned, not written to source
  return { attribute, value, ruleId: violation.ruleId, target: violation.target };
}

export async function suggestAll(violations) {
  const out = [];
  for (const v of violations) out.push(await suggestFix(v));
  fs.writeFileSync("candidates.json", JSON.stringify(out, null, 2));
  return out;
}

The deeper validation of these accessible-name proposals — length, redundant-role wording, and WCAG 2.2 SC 2.5.3 label-in-name — is covered in Using LLMs to Suggest ARIA Labels Safely.

2. Apply Candidates to a Throwaway Build

Candidates are applied to an ephemeral working copy, never to the source branch directly. This lets the re-scan run against a real rendered DOM without committing anything a human has not seen.

// apply.js — writes candidates into a scratch checkout only
import { JSDOM } from "jsdom";
import fs from "node:fs";

export function applyCandidates(html, candidates) {
  const dom = new JSDOM(html);
  const doc = dom.window.document;
  for (const c of candidates) {
    const el = doc.querySelector(c.target);
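    if (!el) {
      // stale selector: skip, do not guess, but leave a trace for the reviewer
      console.warn(`Skipped candidate with stale selector: ${c.target}`);
      continue;
    }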
    el.setAttribute(c.attribute, c.value);
  }
  return dom.serialize();
}

const html = fs.readFileSync("dist/index.html", "utf8");
const candidates = JSON.parse(fs.readFileSync("candidates.json", "utf8"));
fs.writeFileSync("dist/index.patched.html", applyCandidates(html, candidates));

3. The Re-Scan Gate

This is the automated half of the safety contract. The patched output is re-scanned and compared to the pre-fix baseline. The fix is only allowed forward if the original violation is gone and the new violation count has not risen. The full CI implementation of this gate — including ARIA attribute-validity checks — lives in Validating AI-Generated ARIA Fixes in CI.

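// gate.js: fails the build unless the fix is a net improvement
import { chromium } from "playwright";
import AxeBuilder from "@axe-core/playwright";

async function scan(url) {
  const browser = await chromium.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    // axe tags are per WCAG version, so the 2.1 rules need their own tags
    const { violations } = await new AxeBuilder({ page })
      .withTags(["wcag2a", "wcag2aa", "wcag21a", "wcag21aa", "wcag22aa"])
      .analyze();
    return violations;
  } finally {
    await browser.close(); // close the browser even if the scan throws
  }
}

const before = await scan("http://localhost:3000/index.html");
const after = await scan("http://localhost:3000/index.patched.html");

// Count failing nodes per rule so a gain in one rule cannot mask a loss in another
const countByRule = (violations) =>
  new Map(violations.map((v) => [v.id, v.nodes.length]));
const total = (byRule) => [...byRule.values()].reduce((a, b) => a + b, 0);
const beforeByRule = countByRule(before);
const afterByRule = countByRule(after);

const regressed = [...afterByRule].filter(
  ([id, n]) => n > (beforeByRule.get(id) ?? 0)
);

if (regressed.length > 0) {
  for (const [id, n] of regressed) {
    console.error(`AI fix added violations for ${id}: ${beforeByRule.get(id) ?? 0} -> ${n}`);
  }
  process.exit(1); // reject and loop back to suggestion stage
}
console.log(`Re-scan passed: ${total(beforeByRule)} -> ${total(afterByRule)}`);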
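
The gate's first condition, that the original violation is gone, can be checked per candidate rather than inferred from counts. The following is a minimal sketch under two assumptions: each candidate carries the ruleId and target copied from the axe result that triggered it, and applying the fix does not change the selector axe generates for that node. The helper names sameTarget and unresolvedCandidates are hypothetical.

```javascript
// resolved.js: hypothetical helper; lists candidates whose own violation still fires
function sameTarget(a, b) {
  // axe-core reports node.target as an array of selectors; a candidate may
  // store it as an array or a single string, so normalise before comparing
  return JSON.stringify([a].flat()) === JSON.stringify([b].flat());
}

function unresolvedCandidates(afterViolations, candidates) {
  return candidates.filter((c) =>
    afterViolations.some(
      (v) =>
        v.id === c.ruleId &&
        v.nodes.some((n) => sameTarget(n.target, c.target))
    )
  );
}
```

If unresolvedCandidates(after, candidates) returns anything, those candidates did not fix what they were generated for and should be bounced back to the suggestion stage even when no rule regressed.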

4. The Human Review Gate

The re-scan proves a fix is not worse; it cannot prove the label is correct. Only a human knows that the delete control should not say “Submit.” Surface each accepted candidate as a normal pull-request diff with the prompt, the model, and the before/after counts attached, and require a reviewer approval before the branch is mergeable. Never configure the AI’s service account as an auto-merge actor.
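
One way to attach that context is to build a provenance record per candidate and render it into the pull-request description. This is a sketch only: the field names and the functions buildProvenance and formatReviewNote are assumptions made for illustration, and the model identifier is whatever string your provider wrapper reports.

```javascript
// provenance.js: hypothetical record attached to each reviewed candidate
function buildProvenance({ candidate, prompt, model, beforeCount, afterCount }) {
  return {
    ruleId: candidate.ruleId,
    target: candidate.target,
    attribute: candidate.attribute,
    value: candidate.value,
    prompt, // stored verbatim so the suggestion can be audited later
    model,
    counts: { before: beforeCount, after: afterCount },
    createdAt: new Date().toISOString(),
  };
}

function formatReviewNote(record) {
  // Plain text suitable for a pull-request description or review comment
  return [
    `Rule: ${record.ruleId}`,
    `Target: ${[record.target].flat().join(" ")}`,
    `Proposed: ${record.attribute}="${record.value}"`,
    `Model: ${record.model}`,
    `Violations: ${record.counts.before} -> ${record.counts.after}`,
    "Prompt:",
    record.prompt,
  ].join("\n");
}
```

Storing the same record as a JSON artifact alongside candidates.json keeps the audit trail available after the branch is merged or deleted.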

Pipeline Integration

Run generation and the re-scan gate as a single CI job that emits candidates.json and a JUnit-style summary artifact. The gate’s process.exit(1) blocks the job; the human gate is a required reviewer on the branch. Upload the before/after axe JSON with actions/upload-artifact so reviewers see exactly what changed. Wire the job as a required check, following the patterns in Pull Request Gating & Branch Policies.

Troubleshooting & Flaky-Test Mitigation

Model output that fails JSON.parse should be retried with a stricter prompt; if the retry also fails, surface it as a failure rather than silently dropping it. If the re-scan count flickers between runs, the page is probably being scanned before hydration finishes; call await page.waitForLoadState("networkidle") before analyze(). Cache the pre-fix baseline scan per commit so the comparison is stable across reruns.
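
The retry-then-fail behaviour for malformed model output can be sketched as below. It assumes a callModel(prompt) wrapper that resolves to { text }, as in suggest.js; the function name suggestWithRetry, the default of two attempts, and the wording of the stricter instruction are illustrative choices.

```javascript
// retry.js: hypothetical wrapper that retries once with a stricter prompt,
// then fails loudly instead of dropping the violation
async function suggestWithRetry(callModel, basePrompt, maxAttempts = 2) {
  let prompt = basePrompt;
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await callModel(prompt);
    try {
      return JSON.parse(res.text);
    } catch (err) {
      lastError = err;
      // Tighten the instruction for the next attempt
      prompt = `${basePrompt}\nReturn a single JSON object and nothing else. No prose, no code fences.`;
    }
  }
  throw new Error(
    `Model returned invalid JSON after ${maxAttempts} attempts: ${lastError.message}`
  );
}
```

Because the final error is thrown rather than swallowed, the CI job fails visibly and the unfixed violation stays on the list for a human to handle.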

Common Pitfalls

Auto-merging AI branches. The single most dangerous misconfiguration. AI suggestions must always pass a human gate.
Trusting the diff over the re-scan. A plausible-looking label can still fail a rule or be semantically wrong; the re-scan catches the former, and only a human catches the latter.
Comparing total counts without per-rule detail. A fix can remove one violation while adding another, leaving the total unchanged. Compare rule IDs, not just totals.
Losing provenance. Without the originating prompt and model recorded, an incorrect accepted fix is impossible to audit later.

FAQ

Can I let the AI commit directly if the re-scan passes? No. The re-scan only confirms that no automated rule regressed. Many accessibility requirements cannot be verified by a machine, including whether a label’s meaning is correct, so a human must confirm it. Auto-commit defeats the entire safety model.

Which model should I use? This pipeline is provider-agnostic by design — the callModel wrapper isolates the vendor. The gates matter far more than the model: a weaker model behind strong gates is safer than a strong model with none.

What if the model proposes a structurally large change like heading restructuring? Constrain proposals to one attribute or one element per candidate. Large restructures should be split into many small candidates, each independently re-scanned and reviewed, so a single bad suggestion cannot hide inside a big diff.

Related

Automated Remediation & Accessibility Fixing Patterns: the parent section covering all fixing strategies.
Validating AI-Generated ARIA Fixes in CI — the runnable CI gate that enforces ARIA validity.
Using LLMs to Suggest ARIA Labels Safely — constraining accessible-name output before review.
