Unmasking Implicit Bias: Evaluating Persona-Prompted LLM Responses in Power-Disparate Social Scenarios

Large Language Models (LLMs) show remarkable abilities but can perpetuate societal biases. This interactive visualisation explores how LLM responses change based on assigned demographic personas (Subject 'Alex' vs Responder 'Blake') in different dual-persona social scenarios, especially under power imbalances.

The heatmap displays Demographic Sensitivity (Cosine Distance) or Response Quality (Win Rate), calculated by comparing demographically-prompted responses against baseline non-demographically-prompted responses.
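As a rough illustration of the two metrics named above, here is a minimal sketch. It assumes each response has already been turned into a numeric embedding vector and that each quality comparison against the baseline has been reduced to a win/no-win flag; the page does not say how either step is done, so the function names and input shapes below are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def win_rate(wins):
    """Return the fraction of True flags in a non-empty list of booleans.

    Assumption: True means the demographically-prompted response was
    preferred over the baseline response in one comparison.
    """
    wins = list(wins)
    if not wins:
        raise ValueError("at least one comparison is required")
    return sum(1 for w in wins if w) / len(wins)


# Toy inputs, not data from the study.
baseline_vec = [1.0, 0.0]
persona_vec = [0.0, 1.0]
print(cosine_distance(baseline_vec, persona_vec))  # orthogonal vectors -> 1.0
print(win_rate([True, False, True, True]))         # 3 of 4 -> 0.75
```

Under these assumptions, a larger cosine distance means the persona-prompted response drifted further from the baseline, and a win rate near 0.5 means the persona made little difference to judged quality.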

Crucially, focus not on the absolute values, but on the differences in these metrics between various demographic groups. Large disparities often indicate potential implicit biases, showing the model treats certain demographic combinations differently.
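One simple way to look at differences rather than absolute values is to compare each group's metric with the mean across groups and to report the gap between the highest and lowest group. The sketch below is only an illustration of that idea; the function name, the group labels, and the numbers are placeholders, not values or methods taken from the study.

```python
from statistics import mean


def disparity_summary(metric_by_group):
    """Summarise how far per-group metric values sit from one another.

    metric_by_group maps a group label to its average metric value.
    """
    if not metric_by_group:
        raise ValueError("at least one group is required")
    avg = mean(metric_by_group.values())
    lowest = min(metric_by_group, key=metric_by_group.get)
    highest = max(metric_by_group, key=metric_by_group.get)
    return {
        "lowest": lowest,
        "highest": highest,
        # Gap between the most and least favoured group.
        "gap": metric_by_group[highest] - metric_by_group[lowest],
        # Signed distance of each group from the cross-group mean.
        "deviation_from_mean": {g: v - avg for g, v in metric_by_group.items()},
    }


# Placeholder labels and values for illustration only.
summary = disparity_summary({"group_a": 0.5, "group_b": 0.25, "group_c": 0.75})
print(summary["gap"])                  # 0.5
print(summary["lowest"], summary["highest"])  # group_b group_c
```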

Use the dropdowns to select the Model, Metric, and Power Disparity filter. Hover and click on heatmap cells for details.

Based on the paper: "Unmasking Implicit Bias..." (arXiv:2503.01532)

About This Visualisation & Study

This tool explores the potential implicit biases examined in the paper cited above. In simulated social scenarios, LLMs responded as a Responder ('Blake') to a Subject ('Alex'), with each persona assigned a demographic identity along a given axis (e.g., Race).

Metrics Explored: Demographic Sensitivity (Cosine Distance) and Response Quality (Win Rate).

How to Use:

  1. Use dropdowns for Model, Metric, and Power Disparity.
  2. Optionally explore the other sections on this page.
  3. Heatmap shows average metric for Subject (Y-axis) vs Responder (X-axis) demographics.
  4. Hover over a cell for its value and count. Click a cell to open a pop-up with detailed examples.
  5. The table above the controls dynamically summarises heatmap extremes.
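The aggregation described in steps 3 to 5 can be sketched as follows, assuming the underlying data is a list of (subject identity, responder identity, metric value) records. That record layout, the function names, and the labels are hypothetical; the tool's actual implementation is not shown on this page.

```python
from collections import defaultdict


def build_heatmap(records):
    """Average the metric per (subject, responder) cell.

    Returns a dict mapping (subject, responder) -> (mean value, count).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for subject, responder, value in records:
        key = (subject, responder)
        sums[key] += value
        counts[key] += 1
    return {key: (sums[key] / counts[key], counts[key]) for key in sums}


def heatmap_extremes(cells):
    """Return the cell keys with the lowest and highest mean value."""
    if not cells:
        raise ValueError("heatmap has no cells")
    lowest = min(cells, key=lambda k: cells[k][0])
    highest = max(cells, key=lambda k: cells[k][0])
    return lowest, highest


# Placeholder identities and values for illustration only.
records = [
    ("subject_a", "responder_x", 0.5),
    ("subject_a", "responder_x", 1.0),
    ("subject_b", "responder_x", 0.25),
]
cells = build_heatmap(records)
print(cells[("subject_a", "responder_x")])  # (0.75, 2)
print(heatmap_extremes(cells))
```

Keeping the count next to each mean mirrors the hover details in step 4 and makes it easier to spot cells whose average rests on very few samples.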

Key Findings From The Paper

These are general trends observed across the models and scenarios studied:

Note: These are general findings based on the study's overall dataset. Explore the heatmap and specific interpretations below for model-specific details under different conditions.

Scenario Explorer

Demographic Axes and Identities

Heatmap Interpretations

Select controls below to view interpretations.
