ICFD-31k: A Large-Scale Dataset and Benchmark for Real-Time Conversational Fraud Detection

†Department of Information Technology, Dr. B.R. Ambedkar National Institute of Technology Jalandhar

Accepted to the IJCAI-ECAI 2026 main conference special track. Presentation scheduled in Bremen, Germany (Aug 15-21, 2026).

TL;DR

ICFD-31k introduces 31,000+ Indian English/Hinglish fraud-call transcripts with chunk-level streaming labels and slow-thinking rationales, plus RoBERTa baselines that reach 99.40 F1 in-domain and 92.97 F1 on unseen scam types.

Abstract

The proliferation of sophisticated telephone scams poses a significant societal and economic threat, particularly across the diverse linguistic contexts of a country like India. The lack of large-scale, publicly available datasets remains a critical barrier to research on robust, real-time countermeasures. To address this, the proposed work introduces ICFD-31k, the first Indian Conversational Fraud Dataset, a new benchmark containing over 31,000 realistic conversational transcripts. ICFD-31k comprises systematically generated content covering 10 distinct fraud umbrellas, ranging from financial impersonation to job scams. Its transcripts carry rich annotations: a final verdict, chunk-level streaming labels, and detailed slow-thinking rationales. A human-in-the-loop evaluation validates ICFD-31k's quality, achieving a Cohen's Kappa of 0.534 in support of annotation reliability. The work also introduces two fine-tuned RoBERTa-based models: M1 for non-streaming data and M2 for streaming data. Comprehensive experiments with these strong baselines (M1, M2) further demonstrate ICFD-31k's utility.
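For readers unfamiliar with the agreement statistic quoted in the abstract, Cohen's Kappa compares the observed agreement p_o between two raters with the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below is a minimal illustration of that formula. The function name and the two label lists are made-up toy values, not ICFD-31k annotations, and this is not the authors' evaluation script.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items on which both raters agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each class, multiply the two raters' marginal
    # frequencies, then sum over all classes.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    classes = set(count_a) | set(count_b)
    p_e = sum(count_a[c] * count_b[c] for c in classes) / (n * n)

    if p_e == 1.0:
        # Degenerate case: both raters used one identical class throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy data (hypothetical, for illustration only).
rater_1 = ["fraud", "fraud", "legit", "legit", "fraud", "legit", "fraud", "legit"]
rater_2 = ["fraud", "legit", "legit", "legit", "fraud", "fraud", "fraud", "legit"]
kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.3f}")
```

In this toy case the raters agree on 6 of 8 items (p_o = 0.75) while chance agreement is 0.5, so the statistic discounts half of the raw agreement.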

Highlights

31,000+ realistic conversational transcripts built for the Indian fraud landscape.
Bilingual coverage across English and Hinglish, including code-switching patterns.
Chunk-level streaming labels for real-time fraud detection experiments.
Slow-thinking rationales for explainability and auditability.
RoBERTa baselines for both static and streaming settings.
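
To make the idea of chunk-level streaming labels concrete, here is a minimal Python sketch of how a labeled transcript could be unrolled into growing prefixes for a streaming classifier. The `Chunk` structure, its field names, the 0/1 label convention, and the sample call are all assumptions made for illustration; the released ICFD-31k schema may differ.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class Chunk:
    """One segment of a call transcript (hypothetical structure)."""
    text: str
    label: int  # assumed convention: 1 = fraud evident by this chunk, 0 = not yet


def streaming_prefixes(chunks: List[Chunk]) -> Iterator[Tuple[str, int]]:
    """Yield (text heard so far, label at this chunk) for each chunk in order."""
    seen: List[str] = []
    for chunk in chunks:
        seen.append(chunk.text)
        # The model only ever sees the conversation up to the current chunk.
        yield " ".join(seen), chunk.label


# Invented example call, not taken from the dataset.
call = [
    Chunk("Hello, am I speaking with the account holder?", 0),
    Chunk("Your KYC has expired and your account will be blocked today.", 1),
    Chunk("Please share the OTP you just received.", 1),
]
examples = list(streaming_prefixes(call))
for prefix, label in examples:
    print(label, "|", prefix)
```

Each yielded pair is one training or evaluation example, so a three-chunk call produces three progressively longer inputs.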

Why it matters

Most fraud-detection datasets focus on transactions, emails, or monolingual contexts. This paper pushes toward a more deployable benchmark for live call settings, where the hard part is not just classification but early intervention under incomplete context.
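
One way to quantify "early intervention" is to measure how many chunks pass between the point where fraud first becomes evident and the point where a model raises an alarm. The Python sketch below illustrates that idea under stated assumptions: per-chunk fraud scores in [0, 1], 0/1 chunk labels, and a fixed threshold. The function names and the numbers are hypothetical, and this is not necessarily the metric reported in the paper.

```python
from typing import List, Optional


def first_alarm(scores: List[float], threshold: float = 0.5) -> Optional[int]:
    """Index of the first chunk whose score reaches the threshold, else None."""
    for i, score in enumerate(scores):
        if score >= threshold:
            return i
    return None


def detection_delay(
    scores: List[float], labels: List[int], threshold: float = 0.5
) -> Optional[int]:
    """Chunks elapsed between the first fraud-labeled chunk and the first alarm.

    Returns None when the call has no fraud-labeled chunk or the alarm never
    fires. A negative value means the alarm fired before fraud was labeled.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if 1 not in labels:
        return None
    onset = labels.index(1)
    alarm = first_alarm(scores, threshold)
    if alarm is None:
        return None
    return alarm - onset


# Invented scores and labels for a four-chunk call.
scores = [0.10, 0.40, 0.80, 0.90]
labels = [0, 1, 1, 1]
delay = detection_delay(scores, labels)
print("delay in chunks:", delay)
```

A smaller delay means the warning arrives sooner after the scam becomes evident, which is the practical goal in a live call.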

Release note

I will add camera-ready metadata and any public artifacts here once the final proceedings and release flow are ready.
