Using LLMs to Suggest ARIA Labels Safely

An unlabeled icon button is one of the most common accessibility defects, and it is tempting to hand the whole class of them to a language model. This page — part of AI-Assisted Accessibility Remediation — shows how to generate accessible-name suggestions (aria-label or alt) for unlabeled controls with an LLM, then constrain and validate that output before any human sees it. The model proposes; a deterministic validation layer filters; a person approves. Crucially, the structure here is provider-agnostic: nothing depends on a particular vendor’s API shape.

Baseline controls this page enforces:

  • Cap label length and strip whitespace before the suggestion is shown.
  • Reject redundant role words like “button” or “link” inside an aria-label.
  • Require the suggestion’s language to match the page’s lang.
  • Enforce WCAG 2.2 SC 2.5.3 (Label in Name) when a visible label exists.

Figure: Constrained LLM label-suggestion pipeline. Unlabeled control -> LLM (any provider) -> validation layer (length cap, no role words, language match, SC 2.5.3) -> human approves. If any constraint fails, the candidate is dropped.
The model only proposes; a deterministic validation layer enforces four constraints before a human ever approves the label.

Root Cause / Context

Models are good at producing fluent, plausible names, which is exactly the risk. A model handed a magnifier icon will happily return aria-label="Search button", embedding the role word “button” that screen readers already announce, so users hear “Search button button.” It may return an English label on a page whose lang="fr", so a screen reader reads the English text with French pronunciation rules. And when a control has visible text, WCAG 2.2 SC 2.5.3 (Label in Name) requires the accessible name to contain that visible text so speech-input users can target it; a model that paraphrases the visible label silently breaks voice control. None of these failures is caught by a generic spell-check; they need a purpose-built validation layer that knows the rules of accessible naming.

The default approach of “ask the model and apply the answer” fails because the failure is invisible in the rendered UI. The button still looks fine. Only an assistive-technology user experiences the defect, which is why the constraint layer must run automatically and reject bad candidates before a human is asked to approve anything.

Configuration

The structure separates three concerns: a thin provider-agnostic adapter, a prompt builder, and a pure validation function with no I/O. Swapping vendors means rewriting only the adapter.

// label-suggester.js

// 1. Provider-agnostic adapter. Implement this for whatever vendor you use.
//    It must return a plain string. Nothing downstream knows the vendor.
async function callModel(prompt) {
  const res = await fetch(process.env.LLM_ENDPOINT, {
    method: "POST",
    headers: {
      "content-type": "application/json",
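      authorization: `Bearer ${process.env.LLM_API_KEY}`, // read from a secret store
    },
    body: JSON.stringify({ prompt, max_output_tokens: 40 }),
  });
  // Fail loudly on HTTP errors instead of parsing an error body as a label.
  if (!res.ok) throw new Error(`LLM request failed: HTTP ${res.status}`);
  const data = await res.json();
  // "text" is an assumed response field; map your vendor's field here.
  return String(data.text ?? "").trim();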
}

// 2. Prompt builder. Pass the page language and any visible text so the
//    model has the context the validators will later enforce.
function buildPrompt({ html, visibleText, lang }) {
  return [
    "Suggest a concise accessible name for this control.",
    "Return ONLY the name text, no quotes, no punctuation at the end.",
    `Write it in this language code: ${lang}.`,
    visibleText
      ? `The control shows the visible text "${visibleText}". The name MUST contain it.`
      : "There is no visible text.",
    "Do not include the words button, link, or image.",
    `Control HTML: ${html}`,
  ].join("\n");
}

// 3. Pure validation layer. Returns { ok, reason }. No network, no DOM.
const ROLE_WORDS = ["button", "link", "image", "graphic", "img"];

export function validateLabel(candidate, { visibleText, lang, pageLang }) {
  const name = candidate.trim();
  if (!name) return { ok: false, reason: "empty" };
  if (name.length > 80) return { ok: false, reason: "too long (>80 chars)" };
  if (name !== candidate) return { ok: false, reason: "leading/trailing space" };

  const lower = name.toLowerCase();
  for (const w of ROLE_WORDS) {
    // whole-word match so "Submit" is fine but "Submit button" is not
    if (new RegExp(`\\b${w}\\b`, "i").test(lower)) {
      return { ok: false, reason: `redundant role word "${w}"` };
    }
  }

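  // BCP 47 language tags are case-insensitive, so compare them lowercased.
  // This checks the language the suggestion was requested in; it does not
  // detect the language of the text itself.
  if (String(lang).toLowerCase() !== String(pageLang).toLowerCase()) {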
    return { ok: false, reason: `lang ${lang} != page ${pageLang}` };
  }

  // WCAG 2.2 SC 2.5.3: accessible name must contain the visible label.
  if (visibleText && !lower.includes(visibleText.trim().toLowerCase())) {
    return { ok: false, reason: "visible text not in name (SC 2.5.3)" };
  }
  return { ok: true, reason: "passed" };
}

export async function suggest(control) {
  const candidate = await callModel(buildPrompt(control));
  const result = validateLabel(candidate, {
    visibleText: control.visibleText,
    lang: control.lang,
    pageLang: control.pageLang,
  });
  return { candidate, ...result }; // callers must drop results where ok is false
}

Validation

Unit-test the validation layer in isolation — it is pure, so it needs no model or browser. These assertions prove each constraint rejects what it should:

// label-suggester.test.js
import assert from "node:assert/strict"; // strict mode: equal() behaves like ===
import { validateLabel } from "./label-suggester.js";

const base = { visibleText: "", lang: "en", pageLang: "en" };

assert.equal(validateLabel("Search button", base).ok, false); // role word
assert.equal(validateLabel("a".repeat(81), base).ok, false);  // too long
assert.equal(validateLabel("Rechercher", { ...base, lang: "fr" }).ok, false); // lang mismatch
assert.equal(
  validateLabel("Save document", { ...base, visibleText: "Save" }).ok,
  true
); // contains visible text -> SC 2.5.3 satisfied
assert.equal(
  validateLabel("Store file", { ...base, visibleText: "Save" }).ok,
  false
); // paraphrased visible text -> SC 2.5.3 violation
console.log("validation layer OK");

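Run with node --test. Both files use ES module syntax (import/export), so depending on your Node version you may need "type": "module" in package.json or the .mjs extension. Every rejected candidate is discarded before the human-review surface, so reviewers only ever see names that already satisfy the rules. The downstream re-scan and ARIA validity gate is detailed in Validating AI-Generated ARIA Fixes in CI.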

Edge Cases & Conditional Guards

  • Mixed-language pages: when a control lives inside a subtree with its own lang, validate against the nearest lang ancestor, not the document root, or you will reject correct localized labels.
  • Icon-plus-text controls: if the control has both an icon and visible text, the visible text governs SC 2.5.3 — never let the model replace it with an icon description.
  • Empty visible text from CSS pseudo-content: text injected via ::before is often an icon-font glyph or decoration rather than a meaningful label, and DOM text scraping tends to miss or garble it; treat the control as unlabeled rather than trusting scraped pseudo-content.
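
The mixed-language guard above can be sketched as a small resolver that walks up from the control to the nearest element carrying a lang attribute. This is a minimal sketch, not part of the original listing: nearestLang is a hypothetical helper name, and it treats an empty lang attribute as absent, which is a simplification.

```javascript
// Hypothetical helper: resolve the effective language for a control by
// walking from the element itself up through its ancestors until one has a
// non-empty lang attribute. Works on DOM elements or on any object that
// exposes getAttribute() and parentElement.
function nearestLang(element, fallback = "") {
  for (let node = element; node; node = node.parentElement) {
    const value = node.getAttribute ? node.getAttribute("lang") : null;
    // Assumption: an empty lang attribute is treated as "not set".
    if (value && value.trim()) return value.trim().toLowerCase();
  }
  return fallback;
}
```

The result would be passed as pageLang to validateLabel, so a French label inside a lang="fr" subtree of an English page is checked against "fr" rather than the document root.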

Pipeline Impact

Because the validation layer is deterministic and pure, run it as a fast pre-filter step before any human-review job, emitting a JSON list of surviving candidates as a build artifact. Failed candidates are logged but do not block the build on their own — the gate that blocks merges is the human approval plus the Pull Request Gating & Branch Policies required check. This keeps the LLM cost low and the merge decision firmly with a person.
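
One way to produce that artifact, as a sketch under stated assumptions: each result is taken to have the shape returned by suggest plus an id field identifying the control. The id field and both function names are hypothetical and do not appear in the original listing.

```javascript
// Hypothetical helper: split results into reviewable and rejected lists.
// Each result is assumed to look like { id, candidate, ok, reason }.
function partitionResults(results) {
  const survivors = [];
  const rejected = [];
  for (const r of results) {
    (r.ok ? survivors : rejected).push(r);
  }
  return { survivors, rejected };
}

// Serialize only the survivors for the human-review job.
function toArtifact(results) {
  const { survivors, rejected } = partitionResults(results);
  for (const r of rejected) {
    // Logged for diagnostics only; a rejection does not fail the build.
    console.warn(`dropped ${r.id}: ${r.reason}`);
  }
  return JSON.stringify(
    survivors.map(({ id, candidate }) => ({ id, candidate })),
    null,
    2
  );
}
```

The returned string can be written to whatever artifact path your CI system collects; the review job then reads only that file and never sees a rejected candidate.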

Common Pitfalls

  • Applying labels without language validation. A right answer in the wrong language is still wrong; screen readers mispronounce it.
  • Allowing role words through. “Close button” produces a doubled role announcement; strip role words on a whole-word boundary.
  • Ignoring SC 2.5.3 when visible text exists. Paraphrasing a visible label breaks voice-control targeting even if the label reads well.
  • Coupling to a vendor SDK. Hard-coding a provider’s response shape makes the pipeline brittle; keep the adapter thin and the validators pure.
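
For the SC 2.5.3 pitfall, scraped visible text often carries stray line breaks or doubled spaces that would make a plain substring test reject a correct name. The following is a minimal sketch of a more tolerant containment check; normalize and containsVisibleLabel are hypothetical names, and whitespace collapsing is an assumption about how the visible text was scraped.

```javascript
// Collapse runs of whitespace and lowercase the text, so cosmetic
// differences in scraped visible text do not cause false rejections.
function normalize(text) {
  return text.replace(/\s+/g, " ").trim().toLowerCase();
}

// True when the visible label appears, contiguous and in order, inside the
// candidate accessible name.
function containsVisibleLabel(name, visibleText) {
  const label = normalize(visibleText);
  if (!label) return true; // nothing visible to match against
  return normalize(name).includes(label);
}
```

A paraphrase such as "Store file" for a control showing "Save" still fails, while "Save document" passes even if the scraped visible text was " Save\n".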

FAQ

Why constrain length to 80 characters? Long accessible names are fatiguing to hear and usually signal the model padded the answer with description that belongs in aria-describedby, not the name. The cap is a cheap proxy for “is this a name or a paragraph.”

Should the model ever set the label automatically if validation passes? No. Validation proves the name is well-formed, not that it is accurate for the control’s actual function. A human still confirms meaning, consistent with the safety model in AI-Assisted Accessibility Remediation.

How do I keep this provider-agnostic in practice? Confine every vendor-specific detail — endpoint, auth header, response field — to the callModel adapter. The prompt builder and validators take and return plain strings, so switching vendors is a one-function change.
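
As an illustration of that one-function change, the adapter can be passed in rather than referenced directly. This is a sketch only: createSuggester is a hypothetical factory, not part of the original listing, and the stub adapter stands in for any real vendor call.

```javascript
// Hypothetical factory: every collaborator is injected, so the
// vendor-specific adapter can be swapped, or stubbed in tests, without
// touching the prompt builder or the validators.
function createSuggester({ callModel, buildPrompt, validate }) {
  return async function suggest(control) {
    const candidate = await callModel(buildPrompt(control));
    return { candidate, ...validate(candidate, control) };
  };
}

// Stub adapter for tests: returns a canned string with no network call.
const fakeModel = async () => "Save document";
```

In production you would pass the real callModel, buildPrompt, and validateLabel; in tests, fakeModel makes the whole pipeline run offline and deterministically.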