Changelog
A public record of all substantive changes to the site, manuscript, bibliography, and methodology. Every correction, addition, and feature update is logged here with its date, scope, and rationale.
Abbreviation tooltips added site-wide
All technical acronyms (RCT, QE, IV, DD, IRR, VAM, SD, SEL, ECE, NAEP, PISA, TIMSS) now show plain-language definitions on hover throughout cluster detail pages, cluster cards, and cross-cutting themes. Implemented via a reusable TextWithAbbr component with a dotted underline and accessible title attributes.
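The core of such a component is a lookup from acronym to definition, exposed as a title attribute. A minimal sketch (the map contents and function names here are illustrative, not the actual TextWithAbbr implementation):

```typescript
// Illustrative acronym-to-definition map; the real component covers
// the full list (RCT, QE, IV, DD, IRR, VAM, SD, SEL, ECE, NAEP, PISA, TIMSS).
const ABBR_DEFS: Record<string, string> = {
  RCT: "Randomized controlled trial",
  SD: "Standard deviation",
  VAM: "Value-added model",
  NAEP: "National Assessment of Educational Progress",
};

// Returns the props a component would spread onto an <abbr> element
// (title drives the hover tooltip), or null when the token is unknown.
function abbrProps(token: string): { title: string } | null {
  const def = ABBR_DEFS[token];
  return def ? { title: def } : null;
}
```

Using the native title attribute keeps the tooltip accessible to screen readers without extra ARIA wiring.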
Cluster badges redesigned for clarity
Cluster card badges now show a plain-language label (e.g. "Large effect", "Moderate effect") prominently, with the technical effect size value (d = ...) in smaller text below. Applies to both the homepage cluster grid and individual cluster detail pages.
Cost-effectiveness chart: metric flipped to SD gain per $1k
Cost-effectiveness chart now shows SD gain per $1,000 spent (higher bar = more cost-effective), replacing the previous cost-per-1-SD metric. This makes the chart more intuitive — like miles per gallon. Toggle label and legend text updated accordingly.
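The flip is a simple inversion of the old metric. A sketch with illustrative numbers (not values from the site's dataset):

```typescript
// Old metric: dollars per 1 full SD of gain (lower = better).
function costPerSD(costPerStudent: number, effectSD: number): number {
  return costPerStudent / effectSD;
}

// New metric: SD gain per $1,000 spent (higher = better),
// analogous to miles per gallon.
function sdGainPer1k(costPerStudent: number, effectSD: number): number {
  return (effectSD / costPerStudent) * 1000;
}
```

For example, an intervention costing $1,000 per student with d = 0.25 reads as $4,000 per SD on the old metric and 0.25 SD per $1,000 on the new one; the taller bar is now the better buy.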
Paper count reconciled: 69 cited / 124 reviewed
Updated paper counts across the site (stats bar, hero text, About page) and LaTeX manuscript. 7 new cited papers added in Rounds 33–34: jacksonmackevicius2024 (corrected journal/NBER), biasischonholzer2024, puma2012, hanushek2005, mccaffrey2009, and two others. Previous counts (62 cited / 120 reviewed) were stale.
Hanushek et al. (2005) added to bibliography
Added The Market for Teacher Quality (NBER w11154) to bibliography and Teacher Quality keyPapers
NAEP decline scope restricted to math
Restricted "largest declines since assessment began" to math only (not reading)
McCaffrey et al. (2009) added to bibliography
Added The Intertemporal Variability of Teacher Effect Estimates to bibliography and Teacher Quality keyPapers
Biasi et al. HVAC effect size corrected
HVAC bonds: 0.20 SD (not 0.15); 0.15 SD is health/safety bonds (second-largest)
Puma (2012) added to bibliography
Added Third Grade Follow-up to the Head Start Impact Study to bibliography and Early Childhood keyPapers
quant-research-orchestrator skill updated
Added website reconciliation step, bibliographic integrity gate, and figure provenance tagging to the orchestrator skill
CSR cost comparison clarified
Clarified $15K–20K is total per-pupil expenditure, not marginal cost of CSR
VAM instability attribution corrected
Replaced Kane & Staiger 2008 (VAM validity) with Hanushek et al. 2005 + McCaffrey 2009 for VAM instability finding (year-to-year r=0.3–0.5)
Jackson & Mackevicius AEJ figures updated
Updated published figures: 0.032 SD (not 0.035), ~2.8 pp college-going (not 2.65)
Head Start fadeout citation corrected
Corrected Puma citation for 3rd-grade fadeout from puma2010 to puma2012 (Third Grade Follow-up report)
Fixed Rockoff (2004) bib entry: malformed journal name
Corrected a pre-existing bib file corruption in the Rockoff (2004) journal field that caused LaTeX compilation errors. Fixed to "American Economic Review (Papers & Proceedings)".
Added Biasi, Lafortune & Schönholzer (2024) to bibliography
Added NBER w32040 "What Works and For Whom? Effectiveness and Efficiency of School Capital Investments Across The U.S." to School Funding cluster. Key finding: HVAC and health/safety bonds raise test scores by 0.15 SD; athletic facility bonds raise house prices but have no test score effect; closing the capital spending gap between high- and low-SES districts could close 25% of the achievement gap.
Jackson & Mackevicius (2024): Corrected title, journal, and NBER number
Fixed incorrect bibliographic data. Title was "The Returns to Public School Spending" (wrong); correct title is "What Impacts Can We Expect from School Spending Policy? Evidence from Evaluations in the United States". Journal was "Journal of Political Economy" (wrong); correct journal is "American Economic Journal: Applied Economics". NBER number was w32040 (wrong — that is Biasi et al.); correct NBER number is w28517. Verified figures from primary source: $1,000/pupil/4 years → +0.035 SD test scores, +1.92 pp graduation, +2.65 pp college-going.
Added Jackson & Mackevicius (2024) to bibliography and LaTeX manuscript
Jackson, C. K., & Mackevicius, C. (2024). "The Returns to Public School Spending: Evidence from School Finance Reforms." Journal of Political Economy. Added to k12_references.bib (Registry ID: 126), cited in Cluster 4 School Funding section of literature_review_complete.tex, and confirmed present in data.ts bibliography. LaTeX manuscript recompiled and pushed to GitHub (commit 564d110).
Cost-effectiveness chart: relabeled from "$/0.1 SD" to "$/1 SD gain"
The chart values were correctly computed as cost/d (cost per 1 full standard deviation) but mislabeled as "per 0.1 SD." Relabeled column header, toggle button, legend text, methodology note, and all tooltip unit strings. Urban Charter source clarified to Angrist et al. 2013 (Boston charters, not CREDO 2015 urban average). High-Dosage Tutoring cost range updated to $3,000–5,000 per RAND 2022.
Sims et al. (2023): clarified 52% as lower-bound exaggeration estimate
The 52% figure from Sims et al. (2023) is a lower-bound estimate of how much effect sizes are inflated in education research meta-analyses, not a precise point estimate. Text updated to reflect this nuance in Home.tsx and LaTeX manuscript.
Perry Preschool: corrected sample size from "58 children" to "123 participants"
The Perry Preschool study enrolled 123 children total (58 treatment, 65 control). The prior text incorrectly cited only the treatment group size. Corrected in Home.tsx Citation Drift theme and in LaTeX manuscript (Section on Citation Drift).
Reading Instruction verdict qualified for early grades
Added "for early grades" qualifier to verdict. The d ≈ 0.41 effect is concentrated in K–1 (d ≈ 0.55) and drops to d ≈ 0.27 for grades 2–6. The original verdict overstated the effect for older readers.
School Funding keyFinding: Jackson & Mackevicius (2024) added
Added reference to Jackson & Mackevicius (2024) meta-analysis of 31 studies as the strongest current evidence for the 5/5 rating.
Class Size tier corrected: evidenceStrength 3→4
Perplexity Computer fact-check found that Class Size evidence strength was understated. The causal evidence base (STAR RCT + multiple natural experiments) is strong; the site was conflating evidence quality with cost-effectiveness. Verdict rewritten to separate the two.
School Choice verdict updated: broader urban charter gains
Updated verdict to reflect CREDO 2023 finding that urban charter gains have broadened beyond pure "No Excuses" schools.
Early Childhood verdict: Boston Pre-K note added
Added note that Boston Pre-K shows sustained gains are achievable with high program quality, as a counterexample to the Tennessee Pre-K fadeout.
High-Dosage Tutoring verdict: scale caveat added
Added note that scale-up effects are smaller (d ≈ 0.10–0.20 at district scale) vs. d = 0.37 in RCTs. Large-scale implementations (Chicago SAGA, Houston) show smaller effects.
Removed floating update popup
Eliminated the bottom-right toast notification that duplicated the gold announcement banner. The top banner is the single source of site-wide updates.
Last-reviewed dates on cluster cards
Each cluster card on the /clusters overview page now shows the month and year the cluster evidence was last reviewed.
Methodology callout on cost-effectiveness chart
Added a blue methodology note beneath the cost-effectiveness chart explaining how cost per 0.1 SD is calculated and which interventions are excluded due to insufficient cost data.
Share button on cluster detail pages
A one-click Share button on each cluster detail page copies the page URL to clipboard, making it easy to share a specific cluster with colleagues.
Filter buttons added to Clusters page
Added All/Strong/Moderate/Mixed/Weak filter buttons to the Clusters page so visitors can instantly surface only clusters of a given evidence tier. Each tier button shows a count.
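The filtering logic behind the buttons reduces to two pure functions: one to select clusters by tier, one to compute the per-tier counts shown on each button. A minimal sketch (type and function names are illustrative, not the actual site code):

```typescript
type Tier = "Strong" | "Moderate" | "Mixed" | "Weak";
interface Cluster { name: string; tier: Tier; }

// "All" passes everything through; a tier button keeps only matches.
function filterClusters(clusters: Cluster[], tier: Tier | "All"): Cluster[] {
  return tier === "All" ? clusters : clusters.filter((c) => c.tier === tier);
}

// Counts per tier, used for the badge on each filter button.
function tierCounts(clusters: Cluster[]): Record<Tier, number> {
  const counts: Record<Tier, number> = { Strong: 0, Moderate: 0, Mixed: 0, Weak: 0 };
  for (const c of clusters) counts[c.tier] += 1;
  return counts;
}
```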
Compact cost-effectiveness chart added to Clusters page
Added a collapsible Cost-Effectiveness Chart button to the Clusters page that reveals a compact bar chart showing cost per 0.1 SD gain across 7 interventions. Systematic phonics is the standout.
Tier legend added to Clusters page header
Added a four-tile legend to the Clusters page header explaining Strong/Moderate/Mixed/Weak evidence tiers with color coding and plain-language descriptions.
Key Takeaway callout added to all 10 cluster detail pages
Each cluster detail page now opens with a color-coded Key Takeaway callout box showing the evidence tier and a one-sentence bottom-line verdict, giving practitioners an immediate summary before reading the full synthesis.
Evidence tier badges added to Clusters overview page
Each cluster card on the /clusters overview page now shows a color-coded evidence tier badge (Strong / Moderate / Mixed / Weak) and a one-line verdict beneath the key finding, matching the homepage cards.
Citation Drift added as fifth cross-cutting theme
A new cross-cutting theme documents the systematic amplification of findings as they pass through the citation chain, grounded in three specific cases: Teacher VAMs (Chetty et al. 2014), Perry Preschool IRR (Heckman 2010), and Grit (Duckworth 2007). Cites Greenberg (2009) and Sims (2023).
Evidence tier badges and verdict lines added to cluster cards
Each cluster card on the homepage now shows a color-coded evidence tier label (Strong/Moderate/Mixed/Weak) and a one-line bottom-line verdict, making it easier to compare clusters at a glance.
Footer paper count corrected from 64 to 62
The site footer still said "64 cited papers" — corrected to 62 to match the ground-truth count from the bibliography audit.
Cost-effectiveness toggle added to Evidence at a Glance chart
The chart now has two views: Effect Size (Cohen's d) and Cost-Effectiveness (approximate cost per 0.1 SD gain). Systematic phonics is the standout finding: large effect at very low cost (~$100/student per 0.1 SD vs. ~$13,600 for class size reduction). Sources: Yeh 2010, Kraft 2015, RAND 2022.
Methodology step order corrected on Home and About pages
Manus AI is now correctly listed first as the primary orchestrator. Semantic Scholar moved to Step 3. Gemini added to Step 4 alongside Claude for quantitative verification. AI_TOOLS list on About page reordered to match.
Paper counts corrected: 62 cited / 120 reviewed
Audited the bibliography file (120 entries) and manuscript citation keys (62 unique \cite{} calls) to establish ground-truth counts. Corrected all instances of the previous figures (61 cited / 119 reviewed, and the erroneous 64 cited) across the LaTeX manuscript (abstract and introduction), the compiled PDF, the site homepage, and the About page. Both GitHub repos updated.
Changelog admin form added to /admin page
Added a ChangelogLogger component to the owner-only /admin page. The form accepts date, version, title, description, category, severity, and optional affected cluster. Recent entries are shown with delete buttons.
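The fields the form accepts imply a simple entry shape with light validation before insert. A hypothetical sketch, assuming field names that may differ from the actual schema:

```typescript
// Hypothetical changelog-entry shape; field names are illustrative.
interface ChangelogEntry {
  date: string; // ISO date, e.g. "2026-05-11"
  version?: string;
  title: string;
  description: string;
  category: "correction" | "addition" | "feature";
  severity: "minor" | "major";
  affectedCluster?: string; // optional, per the form
}

// Minimal validation a create mutation might run before inserting.
function isValidEntry(e: ChangelogEntry): boolean {
  return (
    /^\d{4}-\d{2}-\d{2}$/.test(e.date) &&
    e.title.length > 0 &&
    e.description.length > 0
  );
}
```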
Gemini attribution added to JJP replication note
Added Gemini (Google DeepMind) to the author block of jjp_replication_note.tex. PDF recompiled and pushed to both GitHub repos (k12-education-research and k12evidence-public).
Vitest test suite expanded to 48 tests
Added 13 new vitest tests covering changelog.list (5 tests), changelog.create (5 tests), and changelog.delete (3 tests), including owner-gate rejection tests. All 48 tests across 6 files pass.
Public changelog page added
Added this public changelog page to track all substantive changes to the site, manuscript, and bibliography over time. The changelog is database-backed and can be updated from the /admin page.
Gemini (Google DeepMind) added to AI attribution
Gemini was already credited in the LaTeX manuscript but was missing from the website's About page workflow descriptions and footer attribution. Added to the AI-Assisted Drafting step description, the multi-agent workflow paragraph, and the footer AI assistance line.
Perplexity Computer fact-check: 12 corrections applied
Full front-to-back fact-check using Perplexity Computer identified and corrected 12 issues: (1) Cook et al. (2015) Saga tutoring effect size corrected; (2) Abdulkadiroğlu et al. (2016) attribution fixed from New York City to New Orleans/Boston takeover schools; (3) CREDO Urban vs. Online Charter Study distinction clarified; (4) JJP (2016) version consistency improved; (5) Heckman Perry Preschool IRR vs. ROI terminology corrected (7–10% IRR, not 7–12% ROI); (6) Krueger (1999) STAR kindergarten effect corrected to ~4–5 percentile points (d ≈ 0.20); (7) Nickow et al. (2020) group-size effect sizes corrected (1-on-1: 0.38, 1-to-2: 0.29, 1-to-3+: 0.36); (8) Hansford et al. (2025) title corrected; (9) Wolf, Magnuson & Kimbro (2017) title and journal corrected; (10) Paper counts corrected to 62 cited / 120 reviewed site-wide; (11) Duplicate wmann2006 bibliography entry removed; (12) Yeager et al. (2019) peer norms language corrected from precondition to moderator.
Announcement banner and subscriber email broadcast sent
Announcement banner posted for the Round 20 corrections (auto-expiring end of day May 11, 2026). Subscriber email broadcast sent to all confirmed subscribers with a summary of the Hanushek (1986) addition and the Cluster 1 effect size correction.
Hanushek (1986) added to Cluster 3 — Class Size
Added Hanushek's (1986) foundational meta-analysis of 147 educational production function studies to the Cluster 3 cost-effectiveness section. Of 112 class-size estimates, only 9 showed a statistically significant positive relationship — the intellectual predecessor to Jepsen & Rivkin (2009) and Wößmann & West (2006).
Cluster 1 effect size badge corrected: d = 0.10–0.20 → d ≈ 0.10–0.15
The Cluster 1 (Teacher Quality) effect size badge was corrected from d = 0.10–0.20 to d ≈ 0.10–0.15. The narrower range reflects what Rivkin, Hanushek & Kain (2005) and Rockoff (2004) actually report. The wider range was conflating their lower-bound estimates with the upper bound from Nye et al. (2004).
Notification system added
Added site-wide announcement banner (dismissible, admin-controlled), owner push notifications for new subscribers and contact form submissions, subscriber email broadcast system, and in-app 'New' badges on recently updated cluster cards.
Bibliography count corrected: 122 → 119 reviewed papers
Three phantom bibliography entries (campbell2021, rodriguez2018, tan2020) were identified and removed from the .bib file. All site-wide paper count references updated from 122 to 119.
Website launched at k12evidence.org
Public-facing website launched with practitioner summaries for all 10 research clusters, bibliography, replication note, and about page. Built with React 19 + Tailwind 4 + tRPC.
Quantitative verification of all claims against primary source PDFs
Every effect size, sample size, confidence interval, and p-value in the manuscript was verified against the primary source PDF using direct API-based text extraction via PyMuPDF/fitz and Claude (claude-opus-4). Claims that could not be verified were flagged or removed.
Initial literature review published
Systematic evidence synthesis of 124 papers across 10 K-12 research clusters published. Clusters: Teacher Quality, Early Childhood Education, Class Size, School Funding, School Choice, Reading Instruction, High-Dosage Tutoring, SEL & Non-Cognitive Skills, Out-of-School Factors, International Systems.