Tracking Accessibility Violation Trends Across Sprints

A single scan answers “are we passing right now?” It cannot answer “are we getting better?” To show whether accessibility debt is shrinking sprint over sprint, you have to persist per-commit violation counts by severity and compute the deltas. This page is part of Reporting, Dashboards & Violation Tracking, and it covers appending counts to a time series and querying it for burn-down, regressions, and improvements.

  • Append a per-commit severity snapshot to a CSV or SQLite time series
  • Compute sprint-over-sprint deltas per severity bucket
  • Flag a regression when serious-or-worse counts increase versus the prior sprint
  • Render a simple burn-down of total violations over time
Trend tracking Each commit writes a severity snapshot to a time series; queries compute sprint deltas and a burn-down line. Commit snapshot severity counts Time series CSV / SQLite Sprint delta regression flag Burn-down total over time
Snapshots accumulate into a series that yields deltas and a burn-down.

Root Cause / Context

Most pipelines treat each scan as stateless: they fail or pass and discard the numbers. Trend questions are inherently stateful — a delta needs both the current count and a prior one — so without a persisted series you have nothing to subtract from. The fix is small: after each scan, append one row of severity counts keyed by commit and timestamp.

The second subtlety is what “a sprint” means in the data. Commits arrive continuously, so a sprint boundary is a date range, not a row. You compute a sprint figure by taking the last snapshot in each sprint window (the state you shipped) and comparing windows, which keeps mid-sprint churn from polluting the comparison. The severity buckets here are the same critical/serious/moderate/minor used in auto-fail vs warning workflows.

Configuration

This appender reads the normalized findings, counts them by severity, and writes one row to both a CSV (easy to chart anywhere) and a SQLite table (easy to query). It is idempotent per commit so retries do not double-count.

// trend.js — append a per-commit severity snapshot to CSV + SQLite
const fs = require('fs');
const Database = require('better-sqlite3');
const findings = require('./normalized.json');

const counts = { critical: 0, serious: 0, moderate: 0, minor: 0 };
for (const f of findings) counts[f.impact] = (counts[f.impact] || 0) + 1;
const total = Object.values(counts).reduce((a, b) => a + b, 0);

const row = {
  commit: process.env.GITHUB_SHA || 'local',
  ts: new Date().toISOString(),
  ...counts,
  total,
};

// --- CSV (create header once) ---
const CSV = 'trend.csv';
if (!fs.existsSync(CSV)) {
  fs.writeFileSync(CSV, 'commit,ts,critical,serious,moderate,minor,total\n');
}
fs.appendFileSync(CSV,
  `${row.commit},${row.ts},${row.critical},${row.serious},${row.moderate},${row.minor},${row.total}\n`);

// --- SQLite (idempotent per commit) ---
const db = new Database('trend.db');
db.exec(`CREATE TABLE IF NOT EXISTS trend (
  commit_sha TEXT PRIMARY KEY, ts TEXT,
  critical INT, serious INT, moderate INT, minor INT, total INT
)`);
db.prepare(`INSERT OR REPLACE INTO trend
  (commit_sha, ts, critical, serious, moderate, minor, total)
  VALUES (@commit, @ts, @critical, @serious, @moderate, @minor, @total)`).run(row);

console.log(`Snapshot: ${total} total (${counts.critical} critical, ${counts.serious} serious)`);

Because the time series must survive ephemeral runners, restore the prior trend.db from a cache or artifact before appending, then re-upload it. The same persisted series feeds the dashboards in visualizing WCAG compliance trends with Grafana.

- name: Restore trend series
  uses: actions/cache@v4
  with:
    path: trend.db
    key: a11y-trend          # single rolling key so each run reads the latest series
- name: Append trend snapshot
  if: always()
  run: node trend.js

Validation

Compute the sprint-over-sprint delta directly from the series. This query takes the most recent snapshot in each 14-day window and reports whether serious-or-worse counts rose (a regression) or fell (an improvement).

-- Sprint deltas: last snapshot per 14-day window, then diff vs the prior window
WITH per_sprint AS (
  SELECT CAST(julianday(ts) / 14 AS INT) AS sprint,
         critical + serious AS sev,
         total,
         ROW_NUMBER() OVER (
           PARTITION BY CAST(julianday(ts) / 14 AS INT) ORDER BY ts DESC
         ) AS rn
  FROM trend
)
SELECT sprint,
       sev,
       total,
       sev - LAG(sev) OVER (ORDER BY sprint) AS sev_delta,   -- >0 means regression
       total - LAG(total) OVER (ORDER BY sprint) AS total_delta
FROM per_sprint
WHERE rn = 1            -- the state shipped at the end of each sprint
ORDER BY sprint;

A positive sev_delta is a regression in serious-or-worse violations and should be surfaced to the team; a negative total_delta is your burn-down trending the right way.

Edge Cases & Conditional Guards

  • First sprint has no baseline: LAG returns NULL for the first window; treat NULL deltas as “no comparison yet” rather than zero.
  • Sparse commits: a sprint with no scans produces no row and the window is skipped; if you need continuity, carry the prior snapshot forward when charting.
  • Branch noise: feature branches inflate the series; filter to the default branch (WHERE branch = 'main') when computing official sprint figures, even though every run still records a snapshot.

Pipeline Impact

The append step uses if: always() so even a failed gate records the snapshot — you want the regression visible precisely on the builds that broke. The step does not change exit codes. Cache or artifact the trend.db/trend.csv between runs so the series is continuous; losing it resets every delta to NULL.

Common Pitfalls

  • Computing deltas across all branches at once, so a spike in a throwaway feature branch reads as a release regression.
  • Forgetting the per-commit primary key, so retried builds append duplicate snapshots and the burn-down jitters.
  • Comparing arbitrary adjacent commits instead of end-of-sprint states, which makes mid-sprint work-in-progress look like regressions.
  • Charting raw totals while rule coverage changes; adding new rules raises counts without any real regression — annotate the series when you expand coverage.

FAQ

CSV or SQLite for the time series? SQLite if you want the window-function delta queries shown here; CSV if you only need to hand the series to a charting tool. Writing both costs nothing and keeps options open.

How do I tell a real regression from added rule coverage? Record when you expand the rule set and annotate the series at that commit. A count increase that coincides with new rules is coverage, not regression; an increase with a stable rule set is a real regression in serious-or-worse counts.

What’s the right sprint window? Match it to your cadence — 14 days is common. The query parameterizes the window by dividing the Julian day by the sprint length, so change the divisor to fit a one-week or three-week sprint.