Polylogue: A Multi-Persona Debate Simulation Framework
Developed an automated evaluation pipeline that assesses the environmental and ethical alignment of large language models (Phi-4, Llama-3.2-8B), processing 12K+ model–persona interactions. Benchmarked >1,000 sustainability scenarios across 4 personas, revealing 35% decision inconsistency across personas and quantifying bias correlation (Cramér’s V = 0.42) to inform fairness-aware model tuning.
Technologies: Python, Large Language Models, Fairness Evaluation, Statistical Analysis
Key Achievements:
- Processed 12K+ model–persona interactions
- Benchmarked >1,000 sustainability scenarios
- Identified 35% decision inconsistency across personas
- Quantified bias correlation (Cramér’s V = 0.42)
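
The two headline metrics could be computed along the following lines. This is a minimal sketch, not the project's actual code. It assumes each interaction is stored as a hypothetical `(scenario, persona, decision)` tuple, that "decision inconsistency" means the share of scenarios where the personas do not all receive the same decision, and that Cramér's V measures the association between persona and decision. The persona and decision labels in the sample are made up for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def inconsistency_rate(records):
    """Share of scenarios whose decisions differ across personas (assumed definition)."""
    decisions_by_scenario = defaultdict(set)
    for scenario, _persona, decision in records:
        decisions_by_scenario[scenario].add(decision)
    if not decisions_by_scenario:
        return 0.0
    inconsistent = sum(len(d) > 1 for d in decisions_by_scenario.values())
    return inconsistent / len(decisions_by_scenario)


def cramers_v(records):
    """Cramér's V between persona and decision, without bias correction."""
    # Build the persona x decision contingency table and its margins.
    cells = Counter((persona, decision) for _scenario, persona, decision in records)
    row_totals, col_totals = Counter(), Counter()
    for (persona, decision), count in cells.items():
        row_totals[persona] += count
        col_totals[decision] += count
    total = sum(row_totals.values())
    k = min(len(row_totals), len(col_totals)) - 1
    if total == 0 or k == 0:
        return 0.0  # V is undefined for an empty or single-row/column table
    # Pearson chi-squared statistic over every cell, including empty ones.
    chi2 = 0.0
    for persona, row_n in row_totals.items():
        for decision, col_n in col_totals.items():
            expected = row_n * col_n / total
            chi2 += (cells.get((persona, decision), 0) - expected) ** 2 / expected
    return sqrt(chi2 / (total * k))


# Hypothetical sample: two scenarios, two personas.
sample = [
    ("s1", "activist", "approve"),
    ("s1", "executive", "approve"),
    ("s2", "activist", "approve"),
    ("s2", "executive", "reject"),
]
print(inconsistency_rate(sample))
print(round(cramers_v(sample), 3))
```

V ranges from 0 (decisions independent of persona) to 1 (decision fully determined by persona), so a value such as 0.42 would indicate a moderate association under these assumptions.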
