Detecting Online Polarization (SemEval-2026 Task 9)

Completed
NLP
Sep 2025 – Jan 2026

Overview

Multilingual polarization detection, framed as binary classification (Polarized vs. Non-Polarized) and evaluated with Macro-F1 so that both classes count equally regardless of how often each appears.
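As a rough illustration of why Macro-F1 matters under class imbalance, here is a minimal sketch in plain Python. It is not the task's official scorer; the function names and toy labels are made up for this example, and label 1 is assumed to stand for Polarized.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    # Convention assumed here: F1 is 0 when the class is never seen or predicted.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of per-class F1, so the rare class counts as much as the common one."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Toy data (hypothetical): 8 Non-Polarized, 2 Polarized,
# and a degenerate model that always predicts Non-Polarized.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.3f}")                    # 0.800
print(f"macro-F1 = {macro_f1(y_true, y_pred):.3f}")    # 0.444
```

The always-majority model looks acceptable on accuracy but scores poorly on Macro-F1, because its F1 on the Polarized class is zero.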

Problem

Online polarization is a growing societal concern, but detecting it automatically across multiple languages remains challenging due to linguistic diversity and class imbalance.

Solution

A multi-stage pipeline that progresses from TF-IDF baselines to multilingual transformer models (XLM-R, mDeBERTa-v3, RemBERT), using focal loss and weighted sampling to counter class imbalance and soft-voting ensembles to combine model predictions.
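For readers unfamiliar with focal loss, the sketch below shows the standard binary form, FL = -alpha_t * (1 - p_t)^gamma * log(p_t), for a single example in plain Python. The function name and the default values of `gamma` and `alpha` are assumptions chosen for illustration; the project's actual settings and framework code are not given on this page.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for one example.

    p: predicted probability of the positive class (assumed: Polarized).
    y: true label, 0 or 1.
    gamma: focusing parameter; larger values down-weight easy examples more.
    alpha: weight on the positive class; (1 - alpha) is used for the negative class.
    """
    p = min(max(p, eps), 1.0 - eps)           # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p            # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct prediction contributes far less than a confidently wrong one.
easy = binary_focal_loss(0.9, 1)
hard = binary_focal_loss(0.1, 1)
print(f"easy example loss: {easy:.6f}")
print(f"hard example loss: {hard:.6f}")
```

With `gamma=0` the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is a quick sanity check for any implementation.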

Highlights

  • TF-IDF + Logistic Regression baseline
  • Neural baselines: BiLSTM, BiLSTM + Attention with language identity features
  • Transformers: XLM-R (base/large), InfoXLM, mDeBERTa-v3, RemBERT
  • Imbalance strategies: focal loss, weighted sampling / inverse-frequency weighting
  • Ensembling: soft voting / weighted combination across models
  • Parameter-efficient tuning exploration (QLoRA-style) for compute constraints
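The soft-voting step listed above can be sketched roughly as follows: each model outputs a probability for the Polarized class, the probabilities are averaged (optionally with per-model weights), and the average is thresholded. The function name, the example probabilities, the weights, and the 0.5 threshold are all hypothetical and are not taken from the project.

```python
def soft_vote(prob_lists, weights=None, threshold=0.5):
    """Weighted average of per-model positive-class probabilities.

    prob_lists: one list of probabilities per model, all the same length.
    weights: optional per-model weights; defaults to equal weighting.
    Returns (averaged probabilities, predicted labels).
    """
    if weights is None:
        weights = [1.0] * len(prob_lists)
    if len(weights) != len(prob_lists):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    n_examples = len(prob_lists[0])
    averaged = [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_examples)
    ]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


# Hypothetical outputs of three models on three texts.
model_a = [0.9, 0.4, 0.2]
model_b = [0.7, 0.6, 0.1]
model_c = [0.8, 0.3, 0.4]

_, equal_labels = soft_vote([model_a, model_b, model_c])
_, weighted_labels = soft_vote([model_a, model_b, model_c], weights=[1, 4, 1])
print(equal_labels)     # [1, 0, 0]
print(weighted_labels)  # [1, 1, 0]
```

In the weighted call, trusting the second model more flips the prediction for the second text. In practice such weights would typically be chosen on a validation set, for example in proportion to each model's Macro-F1.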

Tech Stack

Python
HuggingFace Transformers
NLP
Multilingual ML
