Accepted Apr 30, 2026

Accepted to IJCAI-ECAI 2026 · Special Track (Main Conference)

ICFD-31k: A Large-Scale Dataset and Benchmark for Real-Time Conversational Fraud Detection

Department of Information Technology, Dr. B.R. Ambedkar National Institute of Technology Jalandhar

Accepted to the IJCAI-ECAI 2026 main conference special track. Presentation scheduled in Bremen, Germany (Aug 15-21, 2026).

TL;DR

ICFD-31k introduces 31,000+ Indian English/Hinglish fraud-call transcripts with chunk-level streaming labels and slow-thinking rationales, plus RoBERTa baselines that reach 99.40 F1 in-domain and 92.97 F1 on unseen scam types.

Abstract

The proliferation of sophisticated telephone scams poses a significant societal and economic threat, particularly in a linguistically diverse country like India. Moreover, the lack of large-scale, publicly available datasets remains a critical barrier to research on robust, real-time countermeasures. To address this, the proposed work introduces ICFD-31k, the first Indian Conversational Fraud Dataset, a new benchmark containing over 31,000 realistic conversational transcripts. ICFD-31k comprises systematically generated content covering 10 distinct fraud umbrellas, ranging from financial impersonation to job scams. ICFD-31k transcripts feature rich annotations: a final verdict, chunk-level streaming labels, and detailed slow-thinking rationales. A human-in-the-loop evaluation validates the dataset's quality, yielding a Cohen's kappa of 0.534 (moderate agreement), which supports annotation reliability. The work also introduces two fine-tuned models based on RoBERTa: M1 for non-streaming data and M2 for streaming data. Comprehensive experiments with these strong baselines further demonstrate ICFD-31k's utility.
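For reference, Cohen's kappa corrects the observed agreement p_o between two annotators for the agreement p_e expected by chance, κ = (p_o − p_e) / (1 − p_e). The sketch below is a minimal, standard computation of that formula, assuming two annotators each assign one label per transcript; the label names and toy annotations are made up for illustration and are not ICFD-31k data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who labeled the same items in the same order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items on which both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every item; kappa is undefined (0/0).
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical annotations for six transcripts.
annotator_1 = ["fraud", "fraud", "safe", "safe", "fraud", "safe"]
annotator_2 = ["fraud", "safe", "safe", "safe", "fraud", "fraud"]
kappa = cohens_kappa(annotator_1, annotator_2)  # ≈ 0.333
```

Here the raters agree on 4 of 6 items (p_o ≈ 0.667), but balanced label frequencies give p_e = 0.5, so kappa is only about 0.33. On the widely used Landis and Koch scale, values from 0.41 to 0.60 are read as moderate agreement, the band into which the reported 0.534 falls.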

Highlights

  • 31,000+ realistic conversational transcripts built for the Indian fraud landscape.
  • Bilingual coverage across English and Hinglish, including code-switching patterns.
  • Chunk-level streaming labels for real-time fraud detection experiments.
  • Slow-thinking rationales for explainability and auditability.
  • RoBERTa baselines for both static and streaming settings.
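One way to picture chunk-level streaming labels is as a sequence of cumulative transcript prefixes, each paired with the label a model should output at that point in the call. The sketch below is a minimal illustration under stated assumptions: the fixed chunk size of two utterances, the `StreamingExample` and `make_streaming_examples` names, and the `safe` / `suspicious` / `fraud` label set are all hypothetical, and the page does not specify ICFD-31k's actual chunking scheme.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class StreamingExample:
    chunk_index: int  # 0-based index of the newest chunk included in `text`
    text: str         # cumulative transcript up to and including this chunk
    label: str        # per-chunk label (hypothetical label set)


def make_streaming_examples(
    utterances: Sequence[str],
    chunk_labels: Sequence[str],
    chunk_size: int = 2,
) -> List[StreamingExample]:
    """Group utterances into chunks and emit one cumulative-prefix example per chunk."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    n_chunks = (len(utterances) + chunk_size - 1) // chunk_size  # ceiling division
    if len(chunk_labels) != n_chunks:
        raise ValueError(f"expected {n_chunks} chunk labels, got {len(chunk_labels)}")
    examples = []
    for i in range(n_chunks):
        # A streaming model only sees the call up to the end of chunk i.
        prefix = utterances[: (i + 1) * chunk_size]
        examples.append(StreamingExample(i, "\n".join(prefix), chunk_labels[i]))
    return examples


# Toy, made-up code-switched call (not from ICFD-31k); the last chunk reveals the scam.
call = [
    "Caller: Hello sir, main aapke bank se bol raha hoon.",
    "Receiver: Haan, boliye.",
    "Caller: Your account will be blocked today.",
    "Receiver: Why would it be blocked?",
    "Caller: Please share the OTP you just received.",
]
examples = make_streaming_examples(call, ["safe", "suspicious", "fraud"])
```

Using cumulative prefixes rather than isolated chunks is one common design choice, because a scam's intent often becomes clear only from context built up over several turns; whether ICFD-31k labels prefixes or independent chunks is not stated on this page.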

Why it matters

Most fraud-detection datasets focus on transactions or emails, or are limited to monolingual contexts. This paper pushes toward a more deployable benchmark for live call settings, where the hard part is not just classification but early intervention under incomplete context.
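Early intervention can be framed as finding the first point in a call at which a streaming classifier is confident enough to raise an alert, using only what has been heard so far. The sketch below assumes a generic `score_prefix` callable that returns a fraud score in [0, 1]; `first_alert_chunk`, the keyword-based `toy_score`, and the 0.5 threshold are hypothetical stand-ins for illustration, not the paper's M2 model or its decision rule.

```python
from typing import Callable, Optional, Sequence


def first_alert_chunk(
    chunks: Sequence[str],
    score_prefix: Callable[[str], float],
    threshold: float = 0.5,
) -> Optional[int]:
    """Return the index of the first chunk whose cumulative prefix scores >= threshold, else None."""
    prefix = ""
    for i, chunk in enumerate(chunks):
        prefix = f"{prefix} {chunk}".strip()
        # Score only the part of the call heard so far; later chunks are unseen.
        if score_prefix(prefix) >= threshold:
            return i
    return None


# Hypothetical stand-in for a trained streaming classifier: a crude keyword score.
RED_FLAGS = ("otp", "blocked", "kyc", "urgent")


def toy_score(text: str) -> float:
    lowered = text.lower()
    hits = sum(flag in lowered for flag in RED_FLAGS)
    return min(1.0, hits / 2)


chunks = [
    "Hello sir, I am calling from your bank.",
    "Your KYC is pending and the account will be blocked.",
    "Please read out the OTP you just received.",
]
alert_at = first_alert_chunk(chunks, toy_score)  # alerts at chunk 1, before the OTP request
```

Reporting when the alert fires (here at chunk index 1 of 3), and not only whether it eventually fires, is what distinguishes the streaming setting from static, full-transcript classification.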

Release note

I will add camera-ready metadata and any public artifacts here once the final proceedings and release flow are ready.
