Large Language Models (LLMs) show remarkable abilities but can perpetuate societal biases. This interactive visualisation explores how LLM responses change based on assigned demographic personas (Subject 'Alex' vs Responder 'Blake') in different dual-persona social scenarios, especially under power imbalances.
The heatmap displays Demographic Sensitivity (Cosine Distance) or Response Quality (Win Rate), calculated by comparing demographically-prompted responses against baseline non-demographically-prompted responses.
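For concreteness, here is a minimal sketch of how a Win Rate could be aggregated, assuming each scenario yields a pairwise judgment between the demographically-prompted response and the baseline; the judging procedure itself is an assumption, not reproduced from the paper.

```python
# A minimal sketch of Win Rate aggregation. Assumes each comparison yields a
# pairwise judgment: "demog" (demographically-prompted response wins),
# "baseline", or "tie". The judging step is hypothetical here.
from typing import Iterable

def win_rate(judgments: Iterable[str]) -> float:
    """Fraction of head-to-head comparisons won by the demographically-prompted response."""
    outcomes = list(judgments)
    if not outcomes:
        return 0.0
    wins = sum(1 for j in outcomes if j == "demog")
    return wins / len(outcomes)

# Example: 2 wins out of 4 comparisons -> 0.5
print(win_rate(["demog", "baseline", "demog", "tie"]))
```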
Crucially, focus not on the absolute values, but on the differences in these metrics between various demographic groups. Large disparities often indicate potential implicit biases, showing the model treats certain demographic combinations differently.
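As a hypothetical illustration of such a disparity, one simple numeric summary is the spread of a metric across groups; the paper's own comparison is visual (via the heatmap), so this spread statistic is only an example.

```python
# A hypothetical disparity measure: the max-min spread of a metric across
# demographic groups for one model/scenario. Illustrative only.
def disparity(metric_by_group: dict[str, float]) -> float:
    """Spread of a metric (e.g., cosine distance) across demographic groups."""
    values = list(metric_by_group.values())
    return max(values) - min(values)

# Example: groups with cosine distances 0.12, 0.31, 0.18 -> spread 0.19
print(round(disparity({"Group A": 0.12, "Group B": 0.31, "Group C": 0.18}), 2))
```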
Use the dropdowns to select the Model, Metric, and Power Disparity filter. Hover and click on heatmap cells for details.
Based on the paper: "Unmasking Implicit Bias..." (arXiv:2503.01532) | Download Research Data
This tool explores the potential implicit biases documented in that paper. LLMs responded in simulated social scenarios as a Responder ('Blake') to a Subject ('Alex') who was assigned different demographic identities (e.g., Race).
Demographic Sensitivity is computed as 1 - cosine_similarity(demog_prompted_response, non_demog_prompted_response). Higher values indicate greater semantic distance from the non-demographically-prompted response. Demographic combinations that yield low-cosine-distance responses are considered an LLM's 'default'.
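A minimal sketch of this computation is shown below, assuming responses are embedded with a sentence-transformer; the embedding model named here is an illustrative choice, not necessarily the paper's.

```python
# A minimal sketch of the Demographic Sensitivity (cosine distance) metric.
# The embedding model is an assumed, illustrative choice.
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def demographic_sensitivity(demog_response: str, baseline_response: str) -> float:
    """Cosine distance between demographically-prompted and baseline responses."""
    emb = embedder.encode([demog_response, baseline_response])
    return 1.0 - float(cosine_similarity(emb[0:1], emb[1:2])[0, 0])
```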
Note: The following are general trends observed across the models and scenarios studied, based on the study's overall dataset. Explore the heatmap and the interpretations below for model-specific details under different conditions.
Select controls below to view interpretations.