Polylogue: A Multi-Persona Debate Simulation Framework


Developed an automated evaluation pipeline that assesses the environmental and ethical alignment of large language models (Phi-4, Llama-3.2-8B), processing 12K+ model–persona interactions. Benchmarked >1,000 sustainability scenarios across four distinct personas, revealing a 35% decision-inconsistency rate and quantifying bias as a categorical association (Cramér’s V = 0.42) to inform fairness-aware model tuning.
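The pipeline code is not published here. The following is a minimal sketch under several assumptions. Each interaction is taken to be one (model, persona, scenario) prompt whose reply is reduced to a categorical decision. "Decision inconsistency" is read as the share of scenarios in which the personas disagree for the same model. The persona names, prompt template, `query` callback, and APPROVE/REJECT labels are hypothetical.

```python
from itertools import product
from typing import Callable

# Model labels follow the text; persona names are hypothetical placeholders.
MODELS = ["phi-4", "llama-3.2-8b"]
PERSONAS = ["persona_a", "persona_b", "persona_c", "persona_d"]


def build_prompt(persona: str, scenario: str) -> str:
    """Prefix a scenario with a persona instruction (illustrative template only)."""
    return f"You are {persona}. {scenario}\nAnswer with one word: APPROVE or REJECT."


def run_grid(scenarios: list[str], query: Callable[[str, str], str]) -> list[dict]:
    """Run every (model, persona, scenario) combination once.

    `query(model, prompt)` stands in for whatever inference call is used;
    its reply is normalised to an upper-case decision label.
    """
    records = []
    for model, persona, (sid, text) in product(MODELS, PERSONAS, enumerate(scenarios)):
        answer = query(model, build_prompt(persona, text)).strip().upper()
        records.append({"model": model, "persona": persona,
                        "scenario": sid, "decision": answer})
    return records


def inconsistency_rate(records: list[dict], model: str) -> float:
    """Fraction of scenarios where the personas do not all reach the same decision."""
    by_scenario: dict[int, set[str]] = {}
    for r in records:
        if r["model"] == model:
            by_scenario.setdefault(r["scenario"], set()).add(r["decision"])
    if not by_scenario:
        return 0.0
    split = sum(1 for decisions in by_scenario.values() if len(decisions) > 1)
    return split / len(by_scenario)


if __name__ == "__main__":
    def fake_query(model: str, prompt: str) -> str:
        # Deterministic stub: only persona_d rejects, and only the coal scenario.
        if "persona_d" in prompt and "coal" in prompt:
            return "REJECT"
        return "APPROVE"

    recs = run_grid(["Fund a coal plant upgrade?", "Install rooftop solar?"], fake_query)
    print(len(recs))                          # 2 models x 4 personas x 2 scenarios = 16
    print(inconsistency_rate(recs, "phi-4"))  # 0.5
```

Under this reading, a scenario counts as inconsistent as soon as a single persona deviates. A stricter or softer definition, such as majority disagreement, would change the reported rate.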

Technologies: Python, Large Language Models, Fairness Evaluation, Statistical Analysis

Key Achievements:

  • Processed 12K+ model–persona interactions
  • Benchmarked >1,000 sustainability scenarios
  • Identified a 35% decision-inconsistency rate across personas
  • Quantified bias as a categorical association (Cramér’s V = 0.42)
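Cramér’s V measures the strength of association between two categorical variables. It ranges from 0 (no association) to 1 (perfect association) and is computed as V = sqrt(χ² / (n · (min(r, c) − 1))), where r and c are the numbers of categories in each variable. The text does not say which variables were crossed. The sketch below assumes persona label versus decision label. The `cramers_v` helper and the sample data are hypothetical, and no small-sample bias correction is applied.

```python
import math
from collections import Counter


def cramers_v(pairs) -> float:
    """Cramér's V for (x, y) observations of two categorical variables.

    Builds the contingency table implicitly, accumulates Pearson's chi-squared
    statistic, then normalises by n * (min(r, c) - 1). No bias correction.
    """
    pairs = list(pairs)
    n = len(pairs)
    rows = sorted({x for x, _ in pairs})
    cols = sorted({y for _, y in pairs})
    if n == 0 or min(len(rows), len(cols)) < 2:
        return 0.0  # association is undefined with fewer than two categories

    counts = Counter(pairs)                 # observed cell counts
    row_tot = Counter(x for x, _ in pairs)  # marginal totals per persona
    col_tot = Counter(y for _, y in pairs)  # marginal totals per decision

    chi2 = 0.0
    for x in rows:
        for y in cols:
            expected = row_tot[x] * col_tot[y] / n
            chi2 += (counts[(x, y)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (min(len(rows), len(cols)) - 1)))


if __name__ == "__main__":
    # Hypothetical labels: persona_a mostly approves, persona_b mostly rejects.
    data = ([("persona_a", "APPROVE")] * 3 + [("persona_a", "REJECT")]
            + [("persona_b", "APPROVE")] + [("persona_b", "REJECT")] * 3)
    print(cramers_v(data))  # 0.5
```

SciPy provides the same statistic through `scipy.stats.contingency.association(table, method="cramer")` when a contingency table is already available. The pure-Python version above simply avoids the dependency.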