Tracking Accessibility Violation Trends Across Sprints
A single scan answers “are we passing right now?” It cannot answer “are we getting better?” To show whether accessibility debt is shrinking sprint over sprint, you have to persist per-commit violation counts by severity and compute the deltas. This page is part of Reporting, Dashboards & Violation Tracking, and it covers appending counts to a time series and querying it for burn-down, regressions, and improvements.
- Append a per-commit severity snapshot to a CSV or SQLite time series
- Compute sprint-over-sprint deltas per severity bucket
- Flag a regression when serious-or-worse counts increase versus the prior sprint
- Render a simple burn-down of total violations over time
Root Cause / Context
Most pipelines treat each scan as stateless: they fail or pass and discard the numbers. Trend questions are inherently stateful — a delta needs both the current count and a prior one — so without a persisted series you have nothing to subtract from. The fix is small: after each scan, append one row of severity counts keyed by commit and timestamp.
The second subtlety is what “a sprint” means in the data. Commits arrive continuously, so a sprint boundary is a date range, not a row. You compute a sprint figure by taking the last snapshot in each sprint window (the state you shipped) and comparing windows, which keeps mid-sprint churn from polluting the comparison. The severity buckets here are the same critical/serious/moderate/minor used in auto-fail vs warning workflows.
Configuration
This appender reads the normalized findings, counts them by severity, and writes one row to both a CSV (easy to chart anywhere) and a SQLite table (easy to query). It is idempotent per commit so retries do not double-count.
// trend.js — append a per-commit severity snapshot to CSV + SQLite
const fs = require('fs');
const Database = require('better-sqlite3');
const findings = require('./normalized.json');
const counts = { critical: 0, serious: 0, moderate: 0, minor: 0 };
for (const f of findings) counts[f.impact] = (counts[f.impact] || 0) + 1;
const total = Object.values(counts).reduce((a, b) => a + b, 0);
const row = {
commit: process.env.GITHUB_SHA || 'local',
ts: new Date().toISOString(),
...counts,
total,
};
// --- CSV (create header once) ---
const CSV = 'trend.csv';
if (!fs.existsSync(CSV)) {
fs.writeFileSync(CSV, 'commit,ts,critical,serious,moderate,minor,total\n');
}
fs.appendFileSync(CSV,
`${row.commit},${row.ts},${row.critical},${row.serious},${row.moderate},${row.minor},${row.total}\n`);
// --- SQLite (idempotent per commit) ---
const db = new Database('trend.db');
db.exec(`CREATE TABLE IF NOT EXISTS trend (
commit_sha TEXT PRIMARY KEY, ts TEXT,
critical INT, serious INT, moderate INT, minor INT, total INT
)`);
db.prepare(`INSERT OR REPLACE INTO trend
(commit_sha, ts, critical, serious, moderate, minor, total)
VALUES (@commit, @ts, @critical, @serious, @moderate, @minor, @total)`).run(row);
console.log(`Snapshot: ${total} total (${counts.critical} critical, ${counts.serious} serious)`);
Because the time series must survive ephemeral runners, restore the prior trend.db from a cache or artifact before appending, then re-upload it. The same persisted series feeds the dashboards in visualizing WCAG compliance trends with Grafana.
- name: Restore trend series
uses: actions/cache@v4
with:
path: trend.db
key: a11y-trend # single rolling key so each run reads the latest series
- name: Append trend snapshot
if: always()
run: node trend.js
Validation
Compute the sprint-over-sprint delta directly from the series. This query takes the most recent snapshot in each 14-day window and reports whether serious-or-worse counts rose (a regression) or fell (an improvement).
-- Sprint deltas: last snapshot per 14-day window, then diff vs the prior window
WITH per_sprint AS (
SELECT CAST(julianday(ts) / 14 AS INT) AS sprint,
critical + serious AS sev,
total,
ROW_NUMBER() OVER (
PARTITION BY CAST(julianday(ts) / 14 AS INT) ORDER BY ts DESC
) AS rn
FROM trend
)
SELECT sprint,
sev,
total,
sev - LAG(sev) OVER (ORDER BY sprint) AS sev_delta, -- >0 means regression
total - LAG(total) OVER (ORDER BY sprint) AS total_delta
FROM per_sprint
WHERE rn = 1 -- the state shipped at the end of each sprint
ORDER BY sprint;
A positive sev_delta is a regression in serious-or-worse violations and should be surfaced to the team; a negative total_delta is your burn-down trending the right way.
Edge Cases & Conditional Guards
- First sprint has no baseline:
LAGreturns NULL for the first window; treat NULL deltas as “no comparison yet” rather than zero. - Sparse commits: a sprint with no scans produces no row and the window is skipped; if you need continuity, carry the prior snapshot forward when charting.
- Branch noise: feature branches inflate the series; filter to the default branch (
WHERE branch = 'main') when computing official sprint figures, even though every run still records a snapshot.
Pipeline Impact
The append step uses if: always() so even a failed gate records the snapshot — you want the regression visible precisely on the builds that broke. The step does not change exit codes. Cache or artifact the trend.db/trend.csv between runs so the series is continuous; losing it resets every delta to NULL.
Common Pitfalls
- Computing deltas across all branches at once, so a spike in a throwaway feature branch reads as a release regression.
- Forgetting the per-commit primary key, so retried builds append duplicate snapshots and the burn-down jitters.
- Comparing arbitrary adjacent commits instead of end-of-sprint states, which makes mid-sprint work-in-progress look like regressions.
- Charting raw totals while rule coverage changes; adding new rules raises counts without any real regression — annotate the series when you expand coverage.
FAQ
CSV or SQLite for the time series? SQLite if you want the window-function delta queries shown here; CSV if you only need to hand the series to a charting tool. Writing both costs nothing and keeps options open.
How do I tell a real regression from added rule coverage? Record when you expand the rule set and annotate the series at that commit. A count increase that coincides with new rules is coverage, not regression; an increase with a stable rule set is a real regression in serious-or-worse counts.
What’s the right sprint window? Match it to your cadence — 14 days is common. The query parameterizes the window by dividing the Julian day by the sprint length, so change the divisor to fit a one-week or three-week sprint.
Related
- Reporting, Dashboards & Violation Tracking — the parent guide that defines the normalized snapshot.
- Visualizing WCAG Compliance Trends with Grafana — charting this series as live panels.
- Auto-Fail vs Warning Workflows — the severity buckets these trends are counted by.