Validating AI-Generated ARIA Fixes in CI

When a model proposes an ARIA patch, the diff tells you what changed but not whether the change is correct or safe. This page — part of AI-Assisted Accessibility Remediation — gives you a concrete CI step that takes an AI-proposed patch, re-runs axe-core, confirms it introduces no new violations, verifies that every ARIA role and attribute is actually valid, and rejects any fix that makes the accessible name harder to compute. The model’s output is treated as a hostile input until it clears all four checks.

Baseline controls this page enforces:

  • Re-scan the patched DOM and block if any rule's violation count rises above its baseline.
  • Validate role/attribute combinations against allowed-attribute rules (WCAG 2.2 SC 4.1.2).
  • Reject patches that increase accessible-name ambiguity.
  • Emit a non-zero exit code so the gate blocks the pull request.

[Figure: Four-check ARIA patch validation gate. An AI ARIA patch passes through four sequential checks: (1) axe re-scan runs clean, (2) no new violations, (3) attributes valid per SC 4.1.2, (4) accessible name not ambiguous. Passing all four exits 0 and merges; any failed check exits 1 and rejects the patch.]
Four sequential checks gate every AI ARIA patch; a single failure exits non-zero and rejects the change.

Root Cause / Context

Standard accessibility scanners answer one question, “does this DOM violate a rule?”, but they do not answer “is this new attribute a regression relative to the previous DOM?” An AI patch can satisfy axe-core’s static rule set while still being wrong: it might add aria-label to an element that already has aria-labelledby, producing two competing name sources of which only aria-labelledby is used, or assign a role whose required attributes are absent. The default scan will not flag the delta in name-computation ambiguity, and where a role/aria-* pairing is technically permitted it assumes the pairing was intentional. Against AI output, neither assumption holds, so you need a differential gate layered on top of the raw scan.
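
To make the "two competing name sources" case concrete, here is a minimal sketch of the precedence order in the accessible name computation, reduced to four sources relevant to this gate. It is a simplification under stated assumptions: the full algorithm also handles hidden nodes, embedded controls, name from content, and recursion. The `node` shape and the `winningNameSource` helper are hypothetical and not part of the validator.

```javascript
// Simplified precedence: aria-labelledby beats aria-label, which beats
// native labelling (e.g. <label> or alt), which beats the title attribute.
// `node` is a plain object holding already-resolved text, not a DOM element.
const NAME_SOURCE_ORDER = ["ariaLabelledby", "ariaLabel", "nativeLabel", "title"];

const hasText = (node, key) => (node[key] || "").trim() !== "";

function winningNameSource(node) {
  // The first non-empty source in precedence order supplies the name.
  const winner = NAME_SOURCE_ORDER.find((key) => hasText(node, key));
  // Every other non-empty source does not contribute to the computed name.
  const ignored = NAME_SOURCE_ORDER.filter(
    (key) => key !== winner && hasText(node, key)
  );
  return { winner: winner ?? null, ignored };
}

// An AI patch adds aria-label to a button that already has aria-labelledby.
const patchedButton = { ariaLabelledby: "Save draft", ariaLabel: "Submit form" };
console.log(winningNameSource(patchedButton));
// winner is "ariaLabelledby"; "ariaLabel" appears under ignored
```

The point of the sketch is that the patch looks like an improvement in the diff, yet the new label never reaches the user, which is why check 4 rejects it.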

The second gap is attribute validity. WCAG 2.2 SC 4.1.2 (Name, Role, Value) requires that roles and states be valid and correctly applied. A model can hallucinate an attribute like aria-labeled (a misspelling) or apply aria-expanded to a non-expandable element. axe-core has rules for many of these, but a dedicated allowed-attribute check gives you a precise, fail-fast verdict scoped to exactly the nodes the AI touched.
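
A hallucinated attribute name and a real attribute on the wrong role are different failures, and reporting them separately makes rejections easier to triage. The sketch below assumes a list of WAI-ARIA 1.2 state and property names as understood here; verify it against the specification before relying on it. The `classifyAriaAttr` helper is hypothetical.

```javascript
// WAI-ARIA 1.2 state and property suffixes (assumption: check against the spec).
const KNOWN_ARIA = new Set(
  `activedescendant atomic autocomplete busy checked colcount colindex colspan
   controls current describedby details disabled dropeffect errormessage expanded
   flowto grabbed haspopup hidden invalid keyshortcuts label labelledby level live
   modal multiline multiselectable orientation owns placeholder posinset pressed
   readonly relevant required roledescription rowcount rowindex rowspan selected
   setsize sort valuemax valuemin valuenow valuetext`
    .split(/\s+/)
    .map((suffix) => `aria-${suffix}`)
);

// Separates "not a defined ARIA attribute" from "defined attribute", so the
// gate can label hallucinated names distinctly from misapplied ones.
function classifyAriaAttr(name) {
  if (!name.startsWith("aria-")) return "not-aria";
  return KNOWN_ARIA.has(name) ? "known" : "unknown";
}

console.log(classifyAriaAttr("aria-labeled"));    // "unknown" (misspelling)
console.log(classifyAriaAttr("aria-labelledby")); // "known"
```

A "known" result only means the name exists; whether it is allowed on a given role is still decided by the role allow-list.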

Configuration

The validator reads the patch manifest (patch.json), runs a fresh axe scan of the patched page, diffs the per-rule counts against a committed baseline, and independently verifies the ARIA attributes the model added. It exits non-zero on any failure.

// validate-aria-fix.js
import { chromium } from "playwright";
import AxeBuilder from "@axe-core/playwright";
import fs from "node:fs";

// Minimal allowed-attribute map for roles the AI is permitted to touch.
// Mirrors the ARIA-in-HTML allowed-attr concept (WCAG 2.2 SC 4.1.2).
const ROLE_ALLOWED = {
  button: ["aria-label", "aria-labelledby", "aria-pressed", "aria-expanded", "aria-disabled"],
  link: ["aria-label", "aria-labelledby", "aria-current"],
  checkbox: ["aria-label", "aria-labelledby", "aria-checked", "aria-required"],
  img: ["aria-label", "aria-labelledby"],
};
const GLOBAL_ARIA = ["aria-hidden", "aria-describedby"]; // allowed on any role

function attrIsValid(role, attr) {
  if (GLOBAL_ARIA.includes(attr)) return true;
  return (ROLE_ALLOWED[role] || []).includes(attr);
}

async function scan(url) {
  const browser = await chromium.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    await page.waitForLoadState("networkidle"); // avoid hydration races
    const { violations } = await new AxeBuilder({ page })
      .withTags(["wcag2a", "wcag2aa", "wcag22aa"])
      .analyze();
    return violations;
  } finally {
    await browser.close(); // release the browser even if the scan throws
  }
}

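// Map each violated rule id to the number of affected nodes.
function countByRule(violations) {
  const map = {};
  // Accumulate so a baseline that repeats a rule id is not undercounted.
  for (const v of violations) map[v.id] = (map[v.id] || 0) + v.nodes.length;
  return map;
}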

function fail(msg) {
  console.error(`REJECT: ${msg}`);
  process.exit(1); // blocks the CI job
}

// patch.json: [{ target, role, attribute, value, hasLabelledby }]
const patch = JSON.parse(fs.readFileSync("patch.json", "utf8"));
const baseline = countByRule(JSON.parse(fs.readFileSync("baseline-axe.json", "utf8")));

// Check 1 + 2: re-scan and per-rule regression diff.
const after = countByRule(await scan("http://localhost:3000/index.patched.html"));
for (const ruleId of Object.keys(after)) {
  const before = baseline[ruleId] || 0;
  if (after[ruleId] > before) {
    fail(`rule ${ruleId} increased ${before} -> ${after[ruleId]}`);
  }
}

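// Check 3: ARIA attribute validity (SC 4.1.2).
for (const p of patch) {
  if (p.attribute === "role") {
    // A newly assigned role must itself be on the allow-list;
    // anything else is escalated to a human instead of auto-approved.
    if (!Object.hasOwn(ROLE_ALLOWED, p.value)) {
      fail(`role="${p.value}" is outside the allow-list at ${p.target}`);
    }
  } else if (p.attribute.startsWith("aria-") && !attrIsValid(p.role, p.attribute)) {
    fail(`${p.attribute} not allowed on role="${p.role}" at ${p.target}`);
  }
}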

// Check 4: name-computation ambiguity. aria-label alongside an existing
// aria-labelledby creates two competing name sources.
for (const p of patch) {
  if (p.attribute === "aria-label" && p.hasLabelledby) {
    fail(`aria-label added where aria-labelledby exists at ${p.target}`);
  }
}

console.log("All checks passed.");
process.exit(0);
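
Because patch.json is untrusted, it may help to validate its shape before any check reads it; otherwise a missing `attribute` field makes `p.attribute.startsWith` throw instead of producing a clear rejection. The sketch below follows the field names in the validator's `patch.json` comment. The `validatePatchShape` helper and the sample entries are hypothetical.

```javascript
// Returns a list of human-readable problems; empty when every entry
// carries the fields the validator reads.
function validatePatchShape(patch) {
  if (!Array.isArray(patch)) return ["patch.json must contain an array"];
  const problems = [];
  patch.forEach((entry, i) => {
    if (entry === null || typeof entry !== "object") {
      problems.push(`entry ${i}: not an object`);
      return;
    }
    for (const field of ["target", "attribute", "value"]) {
      if (typeof entry[field] !== "string" || entry[field] === "") {
        problems.push(`entry ${i}: "${field}" must be a non-empty string`);
      }
    }
    // The existing role is needed to validate any aria-* attribute.
    if (entry.attribute !== "role" && typeof entry.role !== "string") {
      problems.push(`entry ${i}: "role" must be a string`);
    }
    if ("hasLabelledby" in entry && typeof entry.hasLabelledby !== "boolean") {
      problems.push(`entry ${i}: "hasLabelledby" must be a boolean`);
    }
  });
  return problems;
}

const samplePatch = [
  { target: "#save-btn", role: "button", attribute: "aria-label", value: "Save", hasLabelledby: true },
  { target: "#menu", role: "button", attribute: 42, value: "true" },
];
console.log(validatePatchShape(samplePatch));
// one problem, reported for the "attribute" field of entry 1
```

In the validator, this would run right after `JSON.parse`, with each reported problem passed to `fail`.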

Validation

Run the gate locally against a patch that is deliberately wrong to prove it rejects. A patch adding aria-label to a node that already carries aria-labelledby should fail check 4:

node validate-aria-fix.js
# REJECT: aria-label added where aria-labelledby exists at #save-btn
# echo $?  ->  1

A clean patch prints All checks passed. and exits 0, allowing the job to proceed to human review.

GitHub Actions Step

- name: Validate AI ARIA patch
  run: |
    npx serve dist --listen 3000 &   # serve patched build
    npx wait-on http://localhost:3000  # block until reachable
    node validate-aria-fix.js          # exits 1 on any rejected check
- uses: actions/upload-artifact@v4
  if: always()
  with:
    name: aria-validation
    path: |
      patch.json
      baseline-axe.json

Edge Cases & Conditional Guards

  • aria-busy regions: if the patched node sits inside an aria-busy="true" container, wait for it to flip to false before scanning, or the re-scan reads a transitional DOM.
  • Virtualized lists: a patch targeting a row that is unmounted at scan time produces a stale selector — treat a missing target as a rejection, never a silent pass.
  • Async name sources: if aria-labelledby points to a node rendered later, evaluate check 4 against the DOM after networkidle, when the labelledby reference is resolvable, rather than relying only on the hasLabelledby flag in patch.json.
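
The stale-selector guard can be sketched as a small helper that reports every patch target with no match in the scanned page. It assumes only that `page` exposes Playwright's `page.locator(selector).count()`; the `findMissingTargets` helper and the stub page are hypothetical and exist purely for illustration.

```javascript
// Returns the targets that match no element in the page.
async function findMissingTargets(page, patch) {
  const missing = [];
  for (const p of patch) {
    const matches = await page.locator(p.target).count();
    // Zero matches means the selector is stale: reject, never silently pass.
    if (matches === 0) missing.push(p.target);
  }
  return missing;
}

// Stub standing in for a Playwright Page; only "#save-btn" exists.
const stubPage = {
  locator: (selector) => ({
    count: async () => (selector === "#save-btn" ? 1 : 0),
  }),
};

findMissingTargets(stubPage, [{ target: "#save-btn" }, { target: "#row-9000" }])
  .then((missing) => console.log(missing));
// logs a single entry: "#row-9000"
```

In the validator this would need the page kept open after the axe scan, with each missing target passed to `fail`.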

Pipeline Impact

The validator’s process.exit(1) fails the CI job, which blocks the merge when the job is set as a required status check. The uploaded patch.json and baseline-axe.json artifacts give the human reviewer the proposed change and the baseline it was measured against, the evidence described in AI-Assisted Accessibility Remediation. Pair this gate with severity-aware thresholds from Progressive Threshold Management so a single new minor violation does not necessarily block an otherwise large net improvement, while criticals always do.
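
One way to express a severity-aware threshold is to budget new violations by axe's `impact` field, which takes the values "minor", "moderate", "serious", and "critical". The sketch below is an assumption about how such a policy could look, not the scheme from Progressive Threshold Management; the budget numbers, helper names, and sample data are hypothetical.

```javascript
// Hypothetical budget: how many NEW affected nodes each impact level tolerates.
const NEW_VIOLATION_BUDGET = { minor: 1, moderate: 0, serious: 0, critical: 0 };

// Sum affected nodes per axe impact level.
function countByImpact(violations) {
  const counts = { minor: 0, moderate: 0, serious: 0, critical: 0 };
  for (const v of violations) {
    if (v.impact in counts) counts[v.impact] += v.nodes.length;
  }
  return counts;
}

// Returns the impact levels whose increase exceeds the budget.
function overBudget(beforeViolations, afterViolations, budget = NEW_VIOLATION_BUDGET) {
  const before = countByImpact(beforeViolations);
  const after = countByImpact(afterViolations);
  return Object.keys(budget).filter(
    (impact) => after[impact] - before[impact] > budget[impact]
  );
}

// Illustrative data: three serious nodes fixed, one minor node introduced.
const beforeScan = [{ id: "color-contrast", impact: "serious", nodes: [{}, {}, {}] }];
const afterScan = [
  { id: "color-contrast", impact: "serious", nodes: [{}] },
  { id: "empty-heading", impact: "minor", nodes: [{}] },
];
console.log(overBudget(beforeScan, afterScan)); // [] : one new minor node fits the budget
```

A policy like this would replace the strict per-rule comparison in the validator, so it should be adopted deliberately rather than layered on silently.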

Common Pitfalls

  • Diffing totals instead of per-rule counts. A patch can remove one rule’s violations and add another’s at equal count; only a per-rule diff catches it.
  • Skipping the attribute-validity check because axe “would catch it.” axe covers many cases but not every role/attribute pairing your AI might invent — fail fast on an explicit allow-list.
  • Allowing stale selectors to pass. If the model’s target no longer exists, the absence of a violation is meaningless; reject it.
  • Forgetting the hydration wait. Scanning before networkidle produces unstable counts that make the regression diff flaky.
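
The first pitfall is easy to see with numbers. In the sketch below, a hypothetical patch fixes two `button-name` nodes and introduces two `aria-allowed-attr` nodes, so the totals match while one rule has regressed. The counts are invented for illustration; `regressedRules` mirrors the per-rule comparison in the validator.

```javascript
// Per-rule node counts before and after a hypothetical patch.
const baselineCounts = { "button-name": 2, "image-alt": 1 };
const patchedCounts = { "image-alt": 1, "aria-allowed-attr": 2 };

const total = (counts) => Object.values(counts).reduce((sum, n) => sum + n, 0);

// Rules whose count rose, including rules absent from the baseline.
function regressedRules(beforeCounts, afterCounts) {
  return Object.keys(afterCounts).filter(
    (ruleId) => afterCounts[ruleId] > (beforeCounts[ruleId] || 0)
  );
}

console.log(total(baselineCounts), total(patchedCounts)); // 3 3
console.log(regressedRules(baselineCounts, patchedCounts)); // [ 'aria-allowed-attr' ]
```

A totals-only gate would pass this patch; the per-rule diff rejects it.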

FAQ

Why diff against a committed baseline instead of scanning only the patched node? Because a fix can have non-local effects — changing a role can alter the accessible name of a parent or sibling. A full-page re-scan compared to a baseline catches collateral regressions that a node-scoped scan would miss.

Isn’t an allow-list of roles too restrictive? It is intentionally narrow. AI patches should only touch a small, well-understood set of interactive primitives. Anything outside the list is escalated to a human rather than auto-approved, which is the safe default for untrusted output.

How do I handle a legitimate fix that does add a violation elsewhere? That is exactly when the human gate matters. The CI step rejects it automatically; a reviewer can then split the patch or accept a documented, tracked exception rather than letting the regression slip through silently.