Large Language Models (LLMs) show remarkable abilities but can perpetuate societal biases. This interactive visualisation explores how LLM responses change based on assigned demographic personas (Subject 'Alex' vs Responder 'Blake') in different dual-persona social scenarios, especially under power imbalances.
The heatmap displays Demographic Sensitivity (Cosine Distance) or Response Quality (Win Rate), calculated by comparing demographically-prompted responses against baseline non-demographically-prompted responses.
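For concreteness, here is a minimal sketch of how a Win Rate could be aggregated, assuming each scenario yields a pairwise judgment between the demographically-prompted response and the baseline; the judging procedure itself is an assumption, not reproduced from the paper.

```python
# A minimal sketch of Win Rate aggregation. Assumes each comparison yields a
# pairwise judgment: "demog" (demographically-prompted response wins),
# "baseline", or "tie". The judging step is hypothetical here.
from typing import Iterable

def win_rate(judgments: Iterable[str]) -> float:
    """Fraction of head-to-head comparisons won by the demographically-prompted response."""
    outcomes = list(judgments)
    if not outcomes:
        return 0.0
    wins = sum(1 for j in outcomes if j == "demog")
    return wins / len(outcomes)

# Example: 2 wins out of 4 comparisons -> 0.5
print(win_rate(["demog", "baseline", "demog", "tie"]))
```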
Crucially, focus not on the absolute values, but on the differences in these metrics between various demographic groups. Large disparities often indicate potential implicit biases, showing the model treats certain demographic combinations differently.
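As a hypothetical illustration of such a disparity, one simple numeric summary is the spread of a metric across groups; the paper's own comparison is visual (via the heatmap), so this spread statistic is only an example.

```python
# A hypothetical disparity measure: the max-min spread of a metric across
# demographic groups for one model/scenario. Illustrative only.
def disparity(metric_by_group: dict[str, float]) -> float:
    """Spread of a metric (e.g., cosine distance) across demographic groups."""
    values = list(metric_by_group.values())
    return max(values) - min(values)

# Example: groups with cosine distances 0.12, 0.31, 0.18 -> spread 0.19
print(round(disparity({"Group A": 0.12, "Group B": 0.31, "Group C": 0.18}), 2))
```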
Use the dropdowns to select the Model, Metric, and Power Disparity filter. Hover and click on heatmap cells for details.
Based on the paper: "Unmasking Implicit Bias..." (arXiv:2503.01532) | Download Research Data
This tool explores the potential implicit biases documented in that paper. LLMs responded in simulated social scenarios as a Responder ('Blake') to a Subject ('Alex') who was assigned different demographic identities (e.g., Race).
Demographic Sensitivity is computed as 1 - cosine_similarity(demog_prompted_response, non_demog_prompted_response). Higher values indicate greater semantic distance from the non-demographically-prompted response. Demographic combinations that yield low-cosine-distance responses are considered an LLM's 'default'.
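A minimal sketch of this computation is shown below, assuming responses are embedded with a sentence-transformer; the embedding model named here is an illustrative choice, not necessarily the paper's.

```python
# A minimal sketch of the Demographic Sensitivity (cosine distance) metric.
# The embedding model is an assumed, illustrative choice.
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def demographic_sensitivity(demog_response: str, baseline_response: str) -> float:
    """Cosine distance between demographically-prompted and baseline responses."""
    emb = embedder.encode([demog_response, baseline_response])
    return 1.0 - float(cosine_similarity(emb[0:1], emb[1:2])[0, 0])
```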
Note: The following are general trends observed across the models and scenarios studied, based on the study's overall dataset. Explore the heatmap and the interpretations below for model-specific details under different conditions.
Select controls below to view interpretations.