Transparency Log

Changelog

A public record of all substantive changes to the site, manuscript, bibliography, and methodology. Every correction, addition, and feature update is logged here with its date, scope, and rationale.

Round 37
May 13, 2026
Feature

Abbreviation tooltips added site-wide

All technical acronyms (RCT, QE, IV, DD, IRR, VAM, SD, SEL, ECE, NAEP, PISA, TIMSS) now show plain-language definitions on hover throughout cluster detail pages, cluster cards, and cross-cutting themes. Implemented via reusable TextWithAbbr component with dotted underline and accessible title attributes.
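A minimal sketch of the underlying idea, using hypothetical names (`ABBR`, `annotate`) rather than the actual TextWithAbbr source; the real component renders through React rather than string concatenation:

```typescript
// Illustrative only: map acronyms to plain-language definitions, then
// wrap each known acronym in an <abbr> element whose title attribute
// browsers expose as a hover tooltip (the dotted underline comes from CSS).
const ABBR: Record<string, string> = {
  RCT: "randomized controlled trial",
  SD: "standard deviation",
  VAM: "value-added model",
  NAEP: "National Assessment of Educational Progress",
};

// Replace each known acronym in a plain-text string with an accessible
// <abbr> tag; unknown all-caps tokens pass through unchanged.
function annotate(text: string): string {
  return text.replace(/\b[A-Z]{2,6}\b/g, (tok) =>
    ABBR[tok] ? `<abbr title="${ABBR[tok]}">${tok}</abbr>` : tok
  );
}
```

In the React version, the same lookup drives a component that emits `<abbr>` elements directly, avoiding raw HTML string injection.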

Round 36
May 13, 2026
Feature
All clusters

Cluster badges redesigned for clarity

Cluster card badges now show a plain-language label (e.g. "Large effect", "Moderate effect") prominently, with the technical effect size value (d = ...) in smaller text below. Applies to both the homepage cluster grid and individual cluster detail pages.

Feature

Cost-effectiveness chart: metric flipped to SD gain per $1k

Cost-effectiveness chart now shows SD gain per $1,000 spent (higher bar = more cost-effective), replacing the previous cost-per-1-SD metric. This makes the chart more intuitive — like miles per gallon. Toggle label and legend text updated accordingly.
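The flipped metric is simple arithmetic; a minimal sketch (the function name is illustrative, not the site's actual code):

```typescript
// SD gain per $1,000 spent: effect size divided by per-pupil cost in
// thousands. Higher values mean more learning per dollar, like miles
// per gallon; the previous metric (cost per 1 SD) was its reciprocal
// scaled by 1,000.
function sdGainPer1k(effectSizeD: number, costPerPupil: number): number {
  return effectSizeD / (costPerPupil / 1000);
}
```

For example, using the tutoring figures cited elsewhere in this log (d = 0.37 at roughly $4,000 per pupil), the metric comes out to about 0.09 SD per $1,000.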

Round 35
May 13, 2026
Addition
All clusters

Paper count reconciled: 69 cited / 124 reviewed

Updated paper counts across the site (stats bar, hero text, About page) and LaTeX manuscript. 7 new cited papers added in Rounds 33–34: jacksonmackevicius2024 (corrected journal/NBER), biasischonholzer2024, puma2012, hanushek2005, mccaffrey2009, and two others. Previous counts (62 cited / 120 reviewed) were stale.

Round 34
May 13, 2026
Addition
Cluster 1 — Teacher Quality

Hanushek et al. (2005) added to bibliography

Added The Market for Teacher Quality (NBER w11154) to bibliography and Teacher Quality keyPapers

Correction
Cluster 9 — Out-of-School Factors

NAEP decline scope restricted to math

Restricted "largest declines since assessment began" to math only (not reading)

Addition
Cluster 1 — Teacher Quality

McCaffrey et al. (2009) added to bibliography

Added The Intertemporal Variability of Teacher Effect Estimates to bibliography and Teacher Quality keyPapers

Correction
Cluster 4 — School Funding

Biasi et al. HVAC effect size corrected

HVAC bonds: 0.20 SD (not 0.15); 0.15 SD is health/safety bonds (second-largest)

Addition
Cluster 2 — Early Childhood

Puma (2012) added to bibliography

Added Third Grade Follow-up to the Head Start Impact Study to bibliography and Early Childhood keyPapers

Methodology

quant-research-orchestrator skill updated

Added website reconciliation step, bibliographic integrity gate, and figure provenance tagging to the orchestrator skill

Correction
Cluster 3 — Class Size

CSR cost comparison clarified

Clarified $15K–20K is total per-pupil expenditure, not marginal cost of CSR

Correction
Cluster 1 — Teacher Quality

VAM instability attribution corrected

Replaced Kane & Staiger 2008 (VAM validity) with Hanushek et al. 2005 + McCaffrey 2009 for VAM instability finding (year-to-year r=0.3–0.5)

Correction
Cluster 4 — School Funding

Jackson & Mackevicius AEJ figures updated

Updated published figures: 0.032 SD (not 0.035), ~2.8 pp college-going (not 2.65)

Correction
Cluster 2 — Early Childhood

Head Start fadeout citation corrected

Corrected Puma citation for 3rd-grade fadeout from puma2010 to puma2012 (Third Grade Follow-up report)

Round 33
May 13, 2026
Correction
Cluster 1 — Teacher Quality

Fixed Rockoff (2004) bib entry: malformed journal name

Corrected a pre-existing bib file corruption in the Rockoff (2004) journal field that caused LaTeX compilation errors. Fixed to "American Economic Review (Papers & Proceedings)".

Addition
Cluster 4 — School Funding

Added Biasi, Lafortune & Schönholzer (2024) to bibliography

Added NBER w32040 "What Works and For Whom? Effectiveness and Efficiency of School Capital Investments Across The U.S." to School Funding cluster. Key finding: HVAC and health/safety bonds raise test scores by 0.15 SD; athletic facility bonds raise house prices but have no test score effect; closing the capital spending gap between high- and low-SES districts could close 25% of the achievement gap.

Correction
Cluster 4 — School Funding

Jackson & Mackevicius (2024): Corrected title, journal, and NBER number

Fixed incorrect bibliographic data. Title was "The Returns to Public School Spending" (wrong); correct title is "What Impacts Can We Expect from School Spending Policy? Evidence from Evaluations in the United States". Journal was "Journal of Political Economy" (wrong); correct journal is "American Economic Journal: Applied Economics". NBER number was w32040 (wrong — that is Biasi et al.); correct NBER number is w28517. Verified figures from primary source: $1,000/pupil/4 years → +0.035 SD test scores, +1.92 pp graduation, +2.65 pp college-going.

Round 32
May 13, 2026
Addition
Cluster 4 — School Funding

Added Jackson & Mackevicius (2024) to bibliography and LaTeX manuscript

Jackson, C. K., & Mackevicius, C. (2024). "The Returns to Public School Spending: Evidence from School Finance Reforms." Journal of Political Economy. Added to k12_references.bib (Registry ID: 126), cited in Cluster 4 School Funding section of literature_review_complete.tex, and confirmed present in data.ts bibliography. LaTeX manuscript recompiled and pushed to GitHub (commit 564d110).

Correction

Cost-effectiveness chart: relabeled from "$/0.1 SD" to "$/1 SD gain"

The chart values were correctly computed as cost/d (cost per 1 full standard deviation) but mislabeled as "per 0.1 SD." Relabeled column header, toggle button, legend text, methodology note, and all tooltip unit strings. Urban Charter source clarified to Angrist et al. 2013 (Boston charters, not CREDO 2015 urban average). High-Dosage Tutoring cost range updated to $3,000–5,000 per RAND 2022.
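The mislabel was a factor-of-ten confusion; a hypothetical helper pair illustrating it (names and numbers are not from the site's code):

```typescript
// Cost per one full SD of achievement gain: cost divided by effect size.
// This is what the chart actually plotted.
function costPerOneSd(cost: number, d: number): number {
  return cost / d;
}

// Cost per 0.1 SD is one tenth of that; the old "per 0.1 SD" label
// implied this smaller number while the bars showed costPerOneSd values.
function costPerTenthSd(cost: number, d: number): number {
  return costPerOneSd(cost, d) / 10;
}
```

With hypothetical figures, a $1,000 intervention at d = 0.25 costs $4,000 per full SD but only $400 per 0.1 SD.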

Correction

Sims et al. (2023): clarified 52% as lower-bound exaggeration estimate

The 52% figure from Sims et al. (2023) is a lower-bound estimate of how much effect sizes are inflated in education research meta-analyses, not a precise point estimate. Text updated to reflect this nuance in Home.tsx and LaTeX manuscript.

Correction
Cluster 2 — Early Childhood Education

Perry Preschool: corrected sample size from "58 children" to "123 participants"

The Perry Preschool study enrolled 123 children total (58 treatment, 65 control). The prior text incorrectly cited only the treatment group size. Corrected in Home.tsx Citation Drift theme and in LaTeX manuscript (Section on Citation Drift).

Round 31
May 13, 2026
Correction

Reading Instruction verdict qualified for early grades

Added "for early grades" qualifier to verdict. The d ≈ 0.41 effect is concentrated in K–1 (d ≈ 0.55) and drops to d ≈ 0.27 for grades 2–6. The original verdict overstated the evidence for older readers.

Addition

School Funding keyFinding: Jackson & Mackevicius (2024) added

Added reference to Jackson & Mackevicius (2024) meta-analysis of 31 studies as the strongest current evidence for the 5/5 rating.

Correction

Class Size tier corrected: evidenceStrength 3→4

Perplexity Computer fact-check found that Class Size evidence strength was understated. The causal evidence base (STAR RCT + multiple natural experiments) is strong; the site was conflating evidence quality with cost-effectiveness. Verdict rewritten to separate the two.

Correction

School Choice verdict updated: broader urban charter gains

Updated verdict to reflect CREDO 2023 finding that urban charter gains have broadened beyond pure "No Excuses" schools.

Addition

Early Childhood verdict: Boston Pre-K note added

Added note that Boston Pre-K shows sustained gains are achievable with high program quality, as a counterexample to the Tennessee Pre-K fadeout.

Correction

High-Dosage Tutoring verdict: scale caveat added

Added note that scale-up effects are smaller (d ≈ 0.10–0.20 at district scale) vs. d = 0.37 in RCTs. Large-scale implementations (Chicago SAGA, Houston) show smaller effects.

Round 30
May 13, 2026
Feature

Removed floating update popup

Eliminated the bottom-right toast notification that duplicated the gold announcement banner. The top banner is the single source of site-wide updates.

Feature

Last-reviewed dates on cluster cards

Each cluster card on the /clusters overview page now shows the month and year the cluster evidence was last reviewed.

Feature

Methodology callout on cost-effectiveness chart

Added a blue methodology note beneath the cost-effectiveness chart explaining how cost per 0.1 SD is calculated and which interventions are excluded due to insufficient cost data.

Feature

Share button on cluster detail pages

A one-click Share button on each cluster detail page copies the page URL to clipboard, making it easy to share a specific cluster with colleagues.

Round 29
May 13, 2026
Feature

Filter buttons added to Clusters page

Added All/Strong/Moderate/Mixed/Weak filter buttons to the Clusters page so visitors can instantly surface only clusters of a given evidence tier. Each tier button shows a count.

Feature

Compact cost-effectiveness chart added to Clusters page

Added a collapsible Cost-Effectiveness Chart button to the Clusters page that reveals a compact bar chart showing cost per 0.1 SD gain across 7 interventions. Systematic phonics is the standout.

Feature

Tier legend added to Clusters page header

Added a four-tile legend to the Clusters page header explaining Strong/Moderate/Mixed/Weak evidence tiers with color coding and plain-language descriptions.

Round 28
May 13, 2026
Feature

Key Takeaway callout added to all 10 cluster detail pages

Each cluster detail page now opens with a color-coded Key Takeaway callout box showing the evidence tier and a one-sentence bottom-line verdict, giving practitioners an immediate summary before reading the full synthesis.

Feature

Evidence tier badges added to Clusters overview page

Each cluster card on the /clusters overview page now shows a color-coded evidence tier badge (Strong / Moderate / Mixed / Weak) and a one-line verdict beneath the key finding, matching the homepage cards.

Round 27
May 12, 2026
Addition

Citation Drift added as fifth cross-cutting theme

A new cross-cutting theme documents the systematic amplification of findings as they pass through the citation chain, grounded in three specific cases: Teacher VAMs (Chetty et al. 2014), Perry Preschool IRR (Heckman 2010), and Grit (Duckworth 2007). Cites Greenberg (2009) and Sims (2023).

Feature

Evidence tier badges and verdict lines added to cluster cards

Each cluster card on the homepage now shows a color-coded evidence tier label (Strong/Moderate/Mixed/Weak) and a one-line bottom-line verdict, making it easier to compare clusters at a glance.

Correction

Footer paper count corrected from 64 to 62

The site footer still said "64 cited papers" — corrected to 62 to match the ground-truth count from the bibliography audit.

Feature

Cost-effectiveness toggle added to Evidence at a Glance chart

The chart now has two views: Effect Size (Cohen's d) and Cost-Effectiveness (approximate cost per 0.1 SD gain). Systematic phonics is the standout finding: large effect at very low cost (~$100/student per 0.1 SD vs. ~$13,600 for class size reduction). Sources: Yeh 2010, Kraft 2015, RAND 2022.

Correction

Methodology step order corrected on Home and About pages

Manus AI is now correctly listed first as the primary orchestrator. Semantic Scholar moved to Step 3. Gemini added to Step 4 alongside Claude for quantitative verification. AI_TOOLS list on About page reordered to match.

Round 26
May 11, 2026
Correction

Paper counts corrected: 62 cited / 120 reviewed

Audited the bibliography file (120 entries) and manuscript citation keys (62 unique \cite{} calls) to establish ground-truth counts. Corrected all instances of the previous figures (61 cited / 119 reviewed, and the erroneous 64 cited) across the LaTeX manuscript (abstract and introduction), the compiled PDF, the site homepage, and the About page. Both GitHub repos updated.
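The ground-truth audit reduces to two tallies; a sketch of the approach (regexes and function names are illustrative, and real natbib usage has more citation commands than shown):

```typescript
// Collect unique citation keys from LaTeX source: every key inside
// \cite{...}, \citep{...}, \citet{...}, including comma-separated lists.
function uniqueCiteKeys(texSource: string): Set<string> {
  const keys = new Set<string>();
  for (const m of texSource.matchAll(/\\cite[a-z]*\{([^}]+)\}/g)) {
    for (const key of m[1].split(",")) keys.add(key.trim());
  }
  return keys;
}

// Count entries in a BibTeX file: lines beginning with @type{.
function bibEntryCount(bibSource: string): number {
  return (bibSource.match(/^@\w+\{/gm) ?? []).length;
}
```

Running the two counters over the manuscript and `.bib` file gives the "cited" and "reviewed" figures directly, so any drift between the site copy and the sources shows up as a mismatch.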

Round 25
May 11, 2026
Feature

Changelog admin form added to /admin page

Added a ChangelogLogger component to the owner-only /admin page. The form accepts date, version, title, description, category, severity, and optional affected cluster. Recent entries are shown with delete buttons.

Addition

Gemini attribution added to JJP replication note

Added Gemini (Google DeepMind) to the author block of jjp_replication_note.tex. PDF recompiled and pushed to both GitHub repos (k12-education-research and k12evidence-public).

Feature

Vitest test suite expanded to 48 tests

Added 13 new vitest tests covering changelog.list (5 tests), changelog.create (5 tests), and changelog.delete (3 tests), including owner-gate rejection tests. All 48 tests across 6 files pass.

Round 24
May 10, 2026
Feature

Public changelog page added

Added this public changelog page to track all substantive changes to the site, manuscript, and bibliography over time. The changelog is database-backed and can be updated from the /admin page.

Addition

Gemini (Google DeepMind) added to AI attribution

Gemini was already credited in the LaTeX manuscript but was missing from the website's About page workflow descriptions and footer attribution. Added to the AI-Assisted Drafting step description, the multi-agent workflow paragraph, and the footer AI assistance line.

Correction

Perplexity Computer fact-check: 12 corrections applied

Full front-to-back fact-check using Perplexity Computer identified and corrected 12 issues:

1. Cook et al. (2015) Saga tutoring effect size corrected.
2. Abdulkadiroğlu et al. (2016) attribution fixed from New York City to New Orleans/Boston takeover schools.
3. CREDO Urban vs. Online Charter Study distinction clarified.
4. JJP (2016) version consistency improved.
5. Heckman Perry Preschool IRR vs. ROI terminology corrected (7–10% IRR, not 7–12% ROI).
6. Krueger (1999) STAR kindergarten effect corrected to ~4–5 percentile points (d ≈ 0.20).
7. Nickow et al. (2020) group-size effect sizes corrected (1-on-1: 0.38, 1-to-2: 0.29, 1-to-3+: 0.36).
8. Hansford et al. (2025) title corrected.
9. Wolf, Magnuson & Kimbro (2017) title and journal corrected.
10. Paper counts corrected to 62 cited / 120 reviewed site-wide.
11. Duplicate wmann2006 bibliography entry removed.
12. Yeager et al. (2019) peer norms language corrected from precondition to moderator.

Round 21
May 10, 2026
General

Announcement banner and subscriber email broadcast sent

Announcement banner posted for the Round 20 corrections (auto-expiring end of day May 11, 2026). Subscriber email broadcast sent to all confirmed subscribers with a summary of the Hanushek (1986) addition and the Cluster 1 effect size correction.

Round 20
May 9, 2026
Addition
Cluster 3 — Class Size

Hanushek (1986) added to Cluster 3 — Class Size

Added Hanushek's (1986) foundational meta-analysis of 147 educational production function studies to the Cluster 3 cost-effectiveness section. Of its 112 class-size estimates, only 9 showed a statistically significant positive relationship. The study is the intellectual predecessor to Jepsen & Rivkin (2009) and Wößmann & West (2006).

Correction
Cluster 1 — Teacher Quality

Cluster 1 effect size badge corrected: d = 0.10–0.20 → d ≈ 0.10–0.15

The Cluster 1 (Teacher Quality) effect size badge was corrected from d = 0.10–0.20 to d ≈ 0.10–0.15. The narrower range reflects what Rivkin, Hanushek & Kain (2005) and Rockoff (2004) actually report. The wider range was conflating their lower-bound estimates with the upper bound from Nye et al. (2004).

Round 19
May 8, 2026
Feature

Notification system added

Added site-wide announcement banner (dismissible, admin-controlled), owner push notifications for new subscribers and contact form submissions, subscriber email broadcast system, and in-app 'New' badges on recently updated cluster cards.

Round 18
May 7, 2026
Correction

Bibliography count corrected: 122 → 119 reviewed papers

Three phantom bibliography entries (campbell2021, rodriguez2018, tan2020) were identified and removed from the .bib file. All site-wide paper count references updated from 122 to 119.

Round 14
April 19, 2026
Feature

Website launched at k12evidence.org

Public-facing website launched with practitioner summaries for all 10 research clusters, bibliography, replication note, and about page. Built with React 19 + Tailwind 4 + tRPC.

Round 2–10
April 4, 2026
Methodology

Quantitative verification of all claims against primary source PDFs

Every effect size, sample size, confidence interval, and p-value in the manuscript was verified against the primary source PDF using direct API-based text extraction via PyMuPDF/fitz and Claude (claude-opus-4). Claims that could not be verified were flagged or removed.

Round 1
March 31, 2026
Addition

Initial literature review published

Systematic evidence synthesis of 124 papers across 10 K-12 research clusters published. Clusters: Teacher Quality, Early Childhood Education, Class Size, School Funding, School Choice, Reading Instruction, High-Dosage Tutoring, SEL & Non-Cognitive Skills, Out-of-School Factors, International Systems.