CaTCCH
The Problem
Existing content moderation tools focused on abusive language, misinformation, and calls for violence, but they did not measure how content influenced democratic health. CaTCCH expanded this scope by targeting civic harm, specifically:
Dehumanization – Content that stripped groups or individuals of dignity, reinforcing division.
Partisan Violence – Content that normalized or incited political violence.
While many classifiers labeled harmful content, they rarely measured its actual impact on user attitudes and behaviors. CaTCCH shifted the focus from simple content tagging to outcome-based evaluation, ensuring classifiers identified content that truly threatened democracy.
Our Approach: Building and Testing Smarter Classifiers
CaTCCH went beyond traditional moderation approaches by developing a rigorous testing framework that evaluated how content influenced real-world civic engagement.
Key Features:
Testing Classifier Effectiveness – Compared leading classifiers (e.g., Moral Foundations, Google Perspective, Toxic-bert) to see which best identified content that affected civic health.
Experimental Field Testing – Used browser-based experiments to measure how exposure to classified content changed user beliefs and behaviors over time.
LLM-Powered Innovations – Integrated AI-driven classification models (e.g., GPT-4) to improve detection of both harmful and prosocial content.
Outcome-Based Approach – Evaluated classifiers not just on accuracy, but on real-world civic impact, ensuring they reduced polarization without enabling censorship.
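To make the outcome-based comparison concrete, here is a minimal sketch of one way such an evaluation could be scored. It is not CaTCCH's actual method: the classifier names, the toy numbers, the `outcome_shift` measure, and the choice of Pearson correlation as the metric are all assumptions made for illustration. The idea is that each classifier scores the same set of content items, and classifiers are ranked by how closely their scores track a measured change in participant attitudes rather than by agreement with human labels.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no signal
    return cov / (var_x * var_y) ** 0.5


def rank_classifiers(scores_by_classifier, outcome_shift):
    """Rank classifiers by how well their scores track the measured outcome."""
    ranked = [
        (name, pearson(scores, outcome_shift))
        for name, scores in scores_by_classifier.items()
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: one entry per content item shown in an experiment.
# outcome_shift is an assumed measure of post-exposure attitude change.
outcome_shift = [0.1, 0.4, 0.0, 0.8, 0.3]
scores_by_classifier = {
    "classifier_a": [0.2, 0.5, 0.1, 0.9, 0.4],  # tracks the outcome closely
    "classifier_b": [0.9, 0.1, 0.8, 0.2, 0.5],  # flags items with little effect
}

ranking = rank_classifiers(scores_by_classifier, outcome_shift)
for name, correlation in ranking:
    print(f"{name}: {correlation:+.2f}")
```

In this toy data, the classifier whose scores rise and fall with the measured attitude change ranks first, even though both classifiers may look equally plausible when judged only against content labels. A real study would also need to account for sample size, confounders, and experimental design, none of which this sketch addresses.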
Why This Mattered
With AI-generated content flooding online spaces, the risk of toxic discourse scaling beyond the reach of human moderation was growing. CaTCCH helped platforms, researchers, and policymakers understand how content impacted democracy and how to intervene responsibly.
By moving beyond simple content labeling and toward measurable civic impact, CaTCCH ensured that online interventions actually strengthened democratic norms, rather than just removing problematic content.