IDIOM - Intelligent Detection of Hate Speech


European Commission Grant Proposal

PROPOSAL REFERENCE: JUST-2018-IDIOM-652847

TOTAL BUDGET: €1,660,000

DURATION: 36 months

SUBMISSION DATE: 2018

LEAD ORGANIZATION & CONSORTIUM

LEAD PARTNER: University of Maribor, Faculty of Electrical Engineering and Computer Science (SLOVENIA)

  • Project Coordination

  • Machine Learning Architecture Development

  • Data Management Infrastructure

  • Personnel: Prof. Aleš Holobar (Project Manager)

CONSORTIUM MEMBERS (5 countries)

  • SLOVENIA

    University of Maribor - Lead partner (ML/AI/Project Management)

    Alcyone d.o.o. - SME (Cloud Infrastructure/API Development)

  • ISRAEL

    Talpiot College (Technology Integration/Field Testing)

  • JORDAN

    University of Jordan (Arabic Language Processing/Regional Coordination)

  • CYPRUS

    University of Cyprus (AI Systems/EU Coordination/Quality Assurance)

  • MOROCCO

    Morocco Forum for Tolerance NGO (Civil Society/Community Engagement)

PROJECT OVERVIEW

Challenge Addressed

Hate speech proliferates across digital platforms at unprecedented scale. Major platforms receive 500+ million posts daily, far more than human moderators can review, and current moderation response times average 18-24 hours. Multilingual hate speech detection remains critically underdeveloped, leaving non-English speakers inadequately protected. Vulnerable populations face systematic harassment with little effective recourse.

Innovation

IDIOM represents the first comprehensive European multilingual hate speech detection system combining:

  • Deep learning neural networks for 4 languages (English, Arabic, Hebrew, Slovenian)

  • Scalable REST API enabling enterprise platform integration

  • Open-source licensing ensuring unlimited adoption beyond project funding

  • Distributed regional development ensuring cultural and linguistic expertise

  • Continuous learning architecture enabling ongoing improvement

Expected Outcomes

  • Functional multilingual hate speech detection platform

  • 15+ platform integrations protecting 50,000+ users

  • 94-97% hate speech detection accuracy

  • 35-40% reduction in published hate speech on integrated platforms

  • 500+ researcher community advancing multilingual NLP

  • Open-source technology enabling community development

WORK PACKAGES SUMMARY

WORK PACKAGE 1: PROJECT MANAGEMENT & COORDINATION

Lead Partner: University of Maribor

Duration: 36 months

Budget: €120,000

Personnel: 18 person/months

Key Activities

  • Project kick-off meeting and team establishment

  • Continuous project coordination and oversight

  • Financial management and administration

  • Bi-weekly steering committee meetings

  • Annual consortium meetings (Months 12, 24)

  • Project management system administration

WORK PACKAGE 2: DATA COLLECTION & ANNOTATION

Lead Partner: University of Maribor

Duration: 24 months (intensive), ongoing

Budget: €280,000

Personnel: 85 person/months

Objectives

  • Integrate 8 external data sources

  • Collect 150,000+ hate speech examples

  • Create validated, labeled dataset for training

  • Ensure GDPR compliance and ethical data handling

Key Activities

  • Data source integration (Twitter, Reddit, 4Chan, YouTube, Facebook, news sites, forums, platform partnerships)

  • Automated data pre-processing and normalization

  • Annotation framework development (taxonomy, procedures, standards)

  • Human annotation (30-40 annotators; 150+ person-hours per 1,000 items)

  • Data quality validation and inter-annotator agreement

  • Dataset documentation and metadata

  • Secure data storage and access control
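
The inter-annotator agreement check listed above could be computed in several ways; the proposal does not name a statistic. As one common option, the following Python sketch computes Cohen's kappa for two annotators. The function name, the label values, and the sample data are illustrative assumptions, not project data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items where both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for six items (not taken from the IDIOM dataset).
annotator_1 = ["hate", "ok", "ok", "hate", "ok", "ok"]
annotator_2 = ["hate", "ok", "hate", "hate", "ok", "ok"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.3f}")
```

Kappa corrects raw agreement for agreement expected by chance, which matters when one label (here "ok") dominates the data. With 30-40 annotators, a multi-rater statistic such as Fleiss' kappa or Krippendorff's alpha may be a better fit.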

Key Milestones

  • M6: Data source integration complete; collectors operational

  • M12: 100,000 records collected and partially annotated

  • M18: 150,000 records collected and fully annotated

  • M24: Dataset v1.0 finalized; documentation complete
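
The annotation workload implied by the figures above can be estimated with simple arithmetic. The sketch below uses only numbers stated in this work package (150,000 items, 150 person-hours per 1,000 items, 30-40 annotators); since the proposal gives the first two as lower bounds, the results are lower bounds too. Variable names are illustrative.

```python
# Figures taken from the work package text (both stated as lower bounds).
TARGET_ITEMS = 150_000
PERSON_HOURS_PER_1000_ITEMS = 150

# Total effort: 150 blocks of 1,000 items, 150 person-hours per block.
total_person_hours = TARGET_ITEMS // 1000 * PERSON_HOURS_PER_1000_ITEMS

# Hours per annotator at each end of the 30-40 annotator range.
hours_per_annotator = {n: total_person_hours / n for n in (30, 40)}

print(f"Total annotation effort: {total_person_hours:,} person-hours")
for n, hours in hours_per_annotator.items():
    print(f"  {n} annotators -> {hours:,.1f} hours each")
```

At these rates the full dataset needs at least 22,500 person-hours of annotation, or roughly 560 to 750 hours per annotator.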

WORK PACKAGE 3: DEEP LEARNING MODEL DEVELOPMENT

Lead Partner: Bar-Ilan University

Duration: 18 months intensive, 36 months total

Budget: €380,000

Personnel: 50 person/months

Objectives

  • Design deep learning neural networks for language classification

  • Train hate speech detection networks for 4 languages

  • Achieve 94-97% precision in hate speech identification

  • Implement continuous learning system
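
This proposal states the 94-97% target as precision in some places and as accuracy in others. The two metrics can diverge considerably on imbalanced data, so the Python sketch below shows how each is derived from the same confusion counts. The class name and the counts are invented for illustration and are not project results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion counts, with hate speech as the positive class."""
    tp: int  # hate speech correctly flagged
    fp: int  # benign content wrongly flagged
    tn: int  # benign content correctly passed
    fn: int  # hate speech missed

    @property
    def precision(self) -> float:
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    @property
    def recall(self) -> float:
        actual = self.tp + self.fn
        return self.tp / actual if actual else 0.0

    @property
    def accuracy(self) -> float:
        total = self.tp + self.fp + self.tn + self.fn
        return (self.tp + self.tn) / total if total else 0.0


# Hypothetical evaluation on 1,000 posts, 115 of which are hate speech.
counts = ConfusionCounts(tp=95, fp=5, tn=880, fn=20)
print(f"precision={counts.precision:.3f}")
print(f"recall={counts.recall:.3f}")
print(f"accuracy={counts.accuracy:.3f}")
```

In this made-up example both precision and accuracy fall inside the 94-97% band or above it, while recall is noticeably lower, which is why reporting recall alongside the headline metric is usually worthwhile.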

Key Milestones

  • M9: Language classification architecture finalized

  • M12: Language classification network trained (>98% accuracy)

  • M18: All hate speech detection networks trained and validated

  • M22: Production models optimized for real-time inference

  • M24: Continuous learning system designed
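
The objectives describe a language classification network followed by per-language hate speech detection networks. One way such a two-stage design could be wired together is sketched below in Python. The function names, the language codes, and the toy stand-in models are assumptions made for illustration; the proposal does not specify the actual interfaces.

```python
from typing import Callable, Dict, Optional, Tuple

LanguageClassifier = Callable[[str], str]
HateDetector = Callable[[str], float]


def make_pipeline(
    classify_language: LanguageClassifier,
    detectors: Dict[str, HateDetector],
) -> Callable[[str], Tuple[str, Optional[float]]]:
    """Route each text to the detector that matches its detected language."""

    def score(text: str) -> Tuple[str, Optional[float]]:
        language = classify_language(text)
        detector = detectors.get(language)
        if detector is None:
            # No model for this language: report it rather than guess.
            return language, None
        return language, detector(text)

    return score


def toy_language_classifier(text: str) -> str:
    """Toy stand-in: script-based guess, NOT a trained classifier."""
    if any("\u0590" <= ch <= "\u05ff" for ch in text):
        return "he"
    if any("\u0600" <= ch <= "\u06ff" for ch in text):
        return "ar"
    return "en"


# Toy detectors returning fixed scores, standing in for trained networks.
pipeline = make_pipeline(
    toy_language_classifier,
    {"en": lambda text: 0.1, "he": lambda text: 0.2},
)
print(pipeline("hello"))
```

Returning `None` for an unsupported language keeps the failure visible to the caller, which suits an architecture meant to be extended with further languages.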

WORK PACKAGE 4: REST API DEVELOPMENT & INTEGRATION

Lead Partner: Alcyone d.o.o.

Duration: 18 months intensive, 36 months total

Budget: €320,000

Personnel: 45 person/months

Objectives

  • Design and implement scalable REST API

  • Achieve 1,000+ requests/second throughput

  • Maintain <100ms latency

  • Integrate with 8 pilot platforms

  • Provide comprehensive documentation
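
The throughput and latency targets together imply a concurrency requirement. By Little's law, the average number of requests in flight equals throughput multiplied by latency. The Python sketch below applies this to the stated targets; the function name is illustrative, and treating 100 ms as the average latency is an assumption, since the proposal gives it as an upper bound.

```python
def in_flight_requests(throughput_rps: float, latency_seconds: float) -> float:
    """Little's law: average concurrency = arrival rate * time in system."""
    return throughput_rps * latency_seconds


TARGET_THROUGHPUT_RPS = 1_000   # from the objectives above
TARGET_LATENCY_SECONDS = 0.100  # 100 ms, used here as a worst-case average

concurrency = in_flight_requests(TARGET_THROUGHPUT_RPS, TARGET_LATENCY_SECONDS)
print(f"Requests in flight at target load: about {concurrency:.0f}")
```

At the target load the service must sustain on the order of 100 concurrent requests, which gives a starting point for sizing worker pools and model-serving replicas.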

Key Milestones

  • M12: API architecture finalized

  • M15: Core API implementation complete

  • M18: API performance optimized

  • M20: Client libraries complete; integration testing finalized

  • M24: All pilot platforms live; performance data collected
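
To make the integration work more concrete, the Python sketch below shows what a request and response payload for such an API might look like. The field names, the language codes, and the validation rules are entirely hypothetical; the proposal does not define the API schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

# Assumed codes for the four priority languages named in the proposal.
SUPPORTED_LANGUAGES = {"en", "ar", "he", "sl"}


@dataclass
class DetectionRequest:
    """Hypothetical request body for a classification call."""
    text: str
    language: Optional[str] = None  # None asks the service to detect it

    def to_json(self) -> str:
        if not self.text.strip():
            raise ValueError("text must not be empty")
        if self.language is not None and self.language not in SUPPORTED_LANGUAGES:
            raise ValueError(f"unsupported language: {self.language}")
        return json.dumps(asdict(self), ensure_ascii=False)


@dataclass
class DetectionResponse:
    """Hypothetical response body: detected language and a score in [0, 1]."""
    language: str
    score: float

    @classmethod
    def from_json(cls, payload: str) -> "DetectionResponse":
        data = json.loads(payload)
        return cls(language=data["language"], score=float(data["score"]))


request_body = DetectionRequest(text="example post", language="en").to_json()
response = DetectionResponse.from_json('{"language": "en", "score": 0.12}')
print(request_body)
print(response)
```

Validating on the client side keeps malformed requests from counting against the throughput budget, and `ensure_ascii=False` keeps Arabic and Hebrew text readable in logs.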

WORK PACKAGE 5: DISSEMINATION, TRAINING & SUSTAINABILITY

Lead Partner: University of Cyprus

Duration: 24 months intensive, 36 months total

Budget: €280,000

Personnel: 35 person/months

Objectives

  • Deliver training for 85+ platform administrators and moderators

  • Publish 4+ peer-reviewed research papers

  • Present at 8+ international conferences

  • Develop comprehensive project website

  • Plan long-term sustainability

Key Milestones

  • M12: Website launched

  • M18: Publications submitted; training programs begin

  • M20: First policy brief released

  • M30: Training completed (200+ trained participants)

BUDGET BREAKDOWN

TOTAL PROJECT BUDGET: €1,660,000

Budget by Work Package
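
The per-package budgets and person-months given in the work package summary can be tallied against the project total. The Python sketch below uses only figures stated in this document; the variable names are illustrative.

```python
# Budgets (EUR) and person-months as listed for each work package.
WORK_PACKAGES = {
    "WP1": (120_000, 18),
    "WP2": (280_000, 85),
    "WP3": (380_000, 50),
    "WP4": (320_000, 45),
    "WP5": (280_000, 35),
}
TOTAL_BUDGET_EUR = 1_660_000

allocated_eur = sum(budget for budget, _ in WORK_PACKAGES.values())
person_months = sum(months for _, months in WORK_PACKAGES.values())
unattributed_eur = TOTAL_BUDGET_EUR - allocated_eur

for name, (budget, months) in WORK_PACKAGES.items():
    share = budget / TOTAL_BUDGET_EUR
    print(f"{name}: EUR {budget:>9,}  {months:>3} person-months  {share:.1%} of total")
print(f"Allocated to work packages: EUR {allocated_eur:,}")
print(f"Not attributed in this summary: EUR {unattributed_eur:,}")
print(f"Total effort: {person_months} person-months")
```

The five work packages listed sum to €1,380,000 and 233 person-months, so €280,000 of the €1,660,000 total is not attributed to a work package in this summary.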

Budget by Cost Category

Partner Cost Distribution

SUSTAINABILITY & LONG-TERM IMPACT

Post-Project Continuation Mechanisms

  • Platform Partnerships: 15+ digital platforms integrating IDIOM commit to licensing fees; projected revenue €400,000+ annually post-project

  • Research Funding: Consortium universities secure EU research funding (H2020, Horizon Europe); estimated €300,000+ annually available

  • Foundation Support: Civil society funding from human rights organizations; estimated €200,000+ annually

  • Government Support: MOSAIC25 regional cooperation framework provides institutional backing; estimated €150,000+ annually

  • Open-Source Community: MIT license enabling unlimited worldwide use; distributed development ensuring resilience

Projected Long-Term Impact (12-36 months post-completion)

Reach

  • Platform adoption: 50+ platforms (vs. 15 pilot)

  • User protection: 200+ million Europeans

  • Researcher community: 500+ institutions

  • Language support: 8+ additional languages via community

Research & Knowledge

  • Publications: 100+ papers citing IDIOM

  • Research data: Public multilingual hate speech datasets

  • Best practices: Templates for international tech governance

Policy Influence

  • EU Digital Services Act: IDIOM referenced as evidence-based solution

  • National regulations: Adoption into government infrastructure

  • International cooperation: Model for cross-border justice technology

Economic

  • Job creation: 100-150 positions in integration, support, development

  • Commercial ecosystem: Platforms and services built on IDIOM

  • Regional development: Tech sector growth in MOSAIC25 countries

Social

  • Protected communities: Reduced hate speech targeting minorities

  • Moderator wellbeing: Reduced trauma through AI assistance

  • Online safety: Faster response to harmful content

  • Digital rights: Evidence supporting equitable online environments

KEY METRICS & SUCCESS CRITERIA

Technical Performance

  • Model Accuracy: 94-97% precision hate speech detection

  • API Performance: 1,000+ requests/second at <100ms latency

  • System Uptime: 99.5%+

  • Language Coverage: 4 priority languages + expandable architecture
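
The uptime target translates directly into a downtime budget. The Python sketch below converts the 99.5% figure into allowed hours of downtime; the function name and the 30-day month are illustrative assumptions.

```python
HOURS_PER_YEAR = 365 * 24
HOURS_PER_30_DAY_MONTH = 30 * 24


def allowed_downtime_hours(uptime_percent: float, period_hours: float) -> float:
    """Hours of downtime permitted in a period at a given uptime percentage."""
    return (100.0 - uptime_percent) / 100.0 * period_hours


yearly = allowed_downtime_hours(99.5, HOURS_PER_YEAR)
monthly = allowed_downtime_hours(99.5, HOURS_PER_30_DAY_MONTH)
print(f"Allowed downtime per year: {yearly:.1f} hours")
print(f"Allowed downtime per 30-day month: {monthly:.1f} hours")
```

A 99.5% target permits about 43.8 hours of downtime per year, or about 3.6 hours per 30-day month, which bounds how long maintenance windows and model redeployments can take.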

Adoption & Reach

  • Platform Integrations: 15 minimum by Month 24

  • Trained Personnel: 85+ (35+ administrators; 50+ moderators)

  • User Protection: 50,000+ direct beneficiaries

  • Researcher Community: 500+ researchers engaged

Research & Knowledge

  • Publications: 4+ peer-reviewed papers in top venues

  • Conferences: 8+ international presentations

  • Open-source: GitHub repository with 500+ stars

  • Training: Training curriculum reaching 200+ participants

Sustainability

  • Identified Funding: €400,000+ annually (platforms + research + foundations)

  • Community Engagement: 200+ active open-source contributors

  • Institutional Commitment: Long-term funding pledges from partners

  • Governance Structure: Established for post-project continuation

Impact

  • Hate Speech Reduction: 35-40% on integrated platforms

  • Response Time: approximately 50-55% reduction (18-24 hours → 8-12 hours)

  • Policy Influence: 3+ policy briefs informing regulations

  • Regional Cooperation: Demonstrated cross-border tech governance model

ETHICAL CONSIDERATIONS

The IDIOM project addresses significant ethical challenges:

Algorithmic Bias & Fairness

  • Diverse development teams ensure cultural representation

  • Systematic fairness evaluation across demographic groups

  • Community feedback mechanisms identifying bias concerns

  • Transparent bias documentation and mitigation
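
One possible form of the fairness evaluation listed above is to compare false positive rates across groups, since wrongly flagging benign posts from one community more often than another is a typical bias pattern. The Python sketch below illustrates the calculation; the function name, the group labels, and the records are hypothetical, and the proposal does not specify which fairness metrics will be used.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each record: (group, is_hate_speech, was_flagged_by_model)
Record = Tuple[str, bool, bool]


def false_positive_rate_by_group(records: Iterable[Record]) -> Dict[str, float]:
    """Share of benign posts wrongly flagged, computed per group."""
    benign = defaultdict(int)
    wrongly_flagged = defaultdict(int)
    for group, is_hate, flagged in records:
        if not is_hate:
            benign[group] += 1
            if flagged:
                wrongly_flagged[group] += 1
    # Groups without any benign posts are omitted (rate undefined).
    return {group: wrongly_flagged[group] / n for group, n in benign.items()}


# Hypothetical evaluation records (not project data).
sample = [
    ("group_a", False, False),
    ("group_a", False, True),
    ("group_a", True, True),
    ("group_b", False, False),
    ("group_b", False, False),
    ("group_b", False, False),
    ("group_b", False, True),
]
rates = false_positive_rate_by_group(sample)
for group, rate in sorted(rates.items()):
    print(f"{group}: false positive rate {rate:.2f}")
```

A large gap between groups would be a concrete finding for the bias documentation and mitigation steps above.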

Privacy & Data Protection

  • GDPR compliance for all data handling

  • Informed consent and anonymization procedures

  • Secure storage and encryption

  • Data minimization principles

Freedom of Speech Protection

  • Clear legal definitions aligned with EU law

  • Conservative detection thresholds minimizing false positives

  • Human review before all moderation actions

  • Appeals mechanisms for contested decisions
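
The combination of conservative thresholds and mandatory human review can be pictured as a triage step: the model only selects posts for a review queue and never acts on its own. The Python sketch below illustrates the effect of the threshold; the function name, scores, and threshold values are assumptions for illustration only.

```python
from typing import Iterable, List, Tuple


def select_for_review(
    scored_posts: Iterable[Tuple[str, float]], threshold: float
) -> List[str]:
    """Return IDs of posts whose score meets the threshold.

    Selected posts go to human moderators; no action is taken automatically.
    """
    return [post_id for post_id, score in scored_posts if score >= threshold]


# Hypothetical model scores in [0, 1]; higher means more likely hate speech.
scored = [("p1", 0.97), ("p2", 0.62), ("p3", 0.91), ("p4", 0.15)]

conservative_queue = select_for_review(scored, threshold=0.90)
permissive_queue = select_for_review(scored, threshold=0.50)
print("threshold 0.90:", conservative_queue)
print("threshold 0.50:", permissive_queue)
```

A higher threshold sends fewer borderline posts to review, which lowers false positives but lets more genuine hate speech pass unflagged; the choice is a policy trade-off rather than a purely technical one.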

Equity & Inclusion

  • Multilingual approach ensuring linguistic equity

  • Civil society partnership ensuring marginalized voices heard

  • Community advisory boards for ongoing accountability

  • Transparent decision-making processes

Surveillance & Misuse Prevention

  • Open-source code enabling external oversight

  • Governance structure preventing authoritarian control

  • License restrictions on non-democratic deployments

  • Active monitoring of deployment contexts

CONSORTIUM EXPERTISE SUMMARY

Machine Learning & AI

  • University of Maribor: 15+ years international ML projects; 30+ researchers

  • Bar-Ilan University: Leading NLP research; 40+ researchers; 200+ publications

  • University of Cyprus: AI systems research; 7,000 students; 350 research projects

Multilingual NLP

  • Bar-Ilan University: Hebrew and Arabic language processing expertise

  • University of Jordan: Arabic dialect processing; Middle East regional knowledge

  • University of Maribor: Slovenian language; Central European perspective

Platform Integration & Deployment

  • Alcyone d.o.o.: 12 employees; enterprise solutions specialization

  • Talpiot College: Operational technology deployment; 25+ years experience

Civil Society & Community Engagement

  • Morocco Forum for Tolerance: 50+ staff; regional NGO networks

  • Lived experience with hate speech victims and communities

European Coordination

  • University of Cyprus: 12+ years managing EU research projects

  • All partners: Extensive H2020 and FP7 project experience

Combined Expertise

  • 100+ peer-reviewed publications in AI/NLP

  • 80+ international projects as leaders or key experts

  • 2,000+ years cumulative experience

  • Multi-generational team (established researchers + emerging talents)

CONCLUSION

The IDIOM project represents a significant investment in European technological leadership addressing a Union-wide challenge: online hate speech and its impact on vulnerable populations and democratic discourse.

Through this project, the European Union will:

  • Develop evidence-based solutions for digital platform governance

  • Advance multilingual AI research contributing to global knowledge

  • Demonstrate effective cross-border collaboration on justice technologies

  • Support vulnerable communities through faster, more equitable responses

  • Establish open-source infrastructure enabling long-term sustainable impact

  • Influence regulatory frameworks with empirical evidence

  • Build regional innovation capacity in MOSAIC25 countries

The project is technically sound, strategically aligned with EU priorities, and backed by a consortium of leading research institutions and civil society organizations across Europe, the Middle East, and North Africa. Sustainability mechanisms ensure continued impact long after project completion.

IDIOM demonstrates Europe's commitment to responsible innovation serving justice, security, and fundamental rights in the digital age.
