
HYPERDETEX WHITEPAPER

Version 1.0 - May 2026

1. Executive Summary

Overview

HyperDeteX is a decentralized platform for synthetic voice detection. Our production model was internally audited in April 2026 on a balanced 400-sample test set: it flags 100% of synthetic speech generated by ElevenLabs, OpenAI, Azure, Deepgram and Voxtral, and correctly classifies 99.48% of real voices from LibriSpeech and VoxPopuli. These results are strong but in-distribution only: they do not yet measure robustness to replay attacks, unseen TTS vendors, telephony audio, or non-English languages. The model is retrained continuously on community-contributed samples precisely to close that gap. A blockchain-based incentive layer on Base rewards contributors who expand and diversify this dataset.

Our Mission

To protect the authenticity of voice communication in the digital age by developing accessible and effective detection solutions, supported by an engaged community.

Our Vision

To become the global standard for synthetic voice detection, establishing a trust framework for digital voice communications.

Key Objectives

  • Preserve the audited 100% fake-detection and 99.48% real-classification rates across the 5 covered TTS vendors (ElevenLabs, OpenAI, Azure, Deepgram, Voxtral) via continuous community-driven retraining
  • Extend coverage to telephony audio, replay attacks, and additional languages (DE, FR, IT) — measured against public benchmarks
  • Scale the contributed dataset from the audit baseline to 10k+ samples by Q1 2027, balanced across languages, channels and TTS vendors
  • Reach sub-300 ms inference on 2 s audio chunks to make detection viable on streaming and live-call audio

2. Introduction

In an era where ElevenLabs, OpenAI, Azure Neural HD, Deepgram and Mistral Voxtral produce voices indistinguishable from real ones, existing detection methods relying on hand-crafted spectral features fail to generalize. HyperDeteX uses a deep learning model that operates directly on raw waveforms and, per our April 2026 internal audit, flags 100% of digital-to-digital synthetic speech from the five providers above and classifies 99.48% of real voices correctly on LibriSpeech + VoxPopuli. The same model is then continuously retrained on community-contributed samples to extend coverage to replay attacks, unseen vendors, telephony audio and multiple languages.

The model uses a frozen self-supervised backbone (rich acoustic representations learned from 960 hours of speech) topped by a lightweight, task-specific classification head. This keeps the system data-efficient and fast to retrain as new TTS engines emerge — every new community sample feeds the next training cycle without architectural changes. A blockchain incentive layer on Base rewards the community contributors who keep the dataset current.

Market Context

  • 5+ major commercial TTS APIs (ElevenLabs, OpenAI, Azure, Deepgram, Mistral) now produce near-human quality voices
  • Existing spectral-feature detectors become obsolete with each new TTS model release
  • No decentralized, continuously-retrained detection standard exists today

Innovation Focus

  • Raw-waveform detection via wav2vec2 (no hand-crafted features)
  • Parameter-efficient fine-tuning: only 0.21% of params trained
  • Community dataset expansion rewarded on-chain with DTX tokens
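To make the parameter-efficiency claim concrete, here is a back-of-the-envelope check. Both parameter counts below are illustrative assumptions (roughly wav2vec2-base scale), not figures from our audit:

```python
# Illustrative check of the "only 0.21% of params trained" figure.
# Both sizes are assumptions, not audited numbers.
FROZEN_BACKBONE_PARAMS = 95_000_000  # hypothetical wav2vec2-base-scale backbone
TRAINABLE_HEAD_PARAMS = 200_000      # hypothetical classification head

trainable_fraction = TRAINABLE_HEAD_PARAMS / (FROZEN_BACKBONE_PARAMS + TRAINABLE_HEAD_PARAMS)
print(f"{trainable_fraction:.2%}")  # 0.21%
```

Because only the head is trained, a retraining cycle touches a few hundred thousand weights rather than the full backbone, which is what keeps per-cycle retraining cheap.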

3. Problem Statement

Current Challenges

Modern TTS systems — ElevenLabs v3/Flash v2.5, OpenAI gpt-4o-mini-tts/tts-1-hd, Azure Neural HD, Deepgram Aura 2.0, Mistral VoxTral mini-2603 — now generate voices that fool human listeners and traditional MFCC/spectral detectors alike. Classifiers trained on one TTS engine generalize poorly to others, creating a perpetual cat-and-mouse dynamic. The absence of a shared, continuously updated dataset means every organization rebuilds detection from scratch.

Critical Issues

Security Threats
  • Voice-based authentication bypass
  • Social engineering attacks
  • Identity theft and impersonation
Technical Limitations
  • MFCC/spectral detectors don't generalize across TTS engines
  • No public balanced dataset spanning multiple modern TTS APIs
  • Centralized solution bottlenecks

Market Impact

  • $5B+ — Annual losses from voice fraud
  • 250% — Increase in deepfake incidents
  • 85% — Companies seeking solutions

4. Technical Solution

Architecture Overview

HyperDeteX leverages a proprietary deep learning architecture combining advanced neural networks optimized for voice authentication. Our multi-layer approach processes raw audio waveforms directly, extracting acoustic signatures that distinguish genuine human speech from AI-generated voices. The system employs a transfer learning strategy with selective fine-tuning, enabling rapid adaptation to emerging deepfake technologies while maintaining computational efficiency. Blockchain smart contracts handle dataset provenance, contributor rewards, and governance.

AI Detection Engine

  • Advanced neural architecture with multi-layer feature extraction
  • Direct raw audio processing without traditional preprocessing dependencies
  • Fast inference — verdict within seconds of submission
  • Efficient fine-tuning mechanism for rapid adaptation to new threats

Blockchain Integration

  • Smart contract verification
  • Decentralized storage system
  • Automated reward distribution
  • Immutable audit trail

v1 Audit — April 2026 (in-distribution scope)

  • Fake-as-fake rate: 100% (ElevenLabs · OpenAI · Azure · Deepgram · Voxtral)
  • Real-as-real rate: 99.48% (LibriSpeech + VoxPopuli)
  • Inference: 47 ms (GPU optimized · 1-3 s CPU)

Scope disclosure. These numbers are in-distribution on the 5 TTS vendors listed above. They do not yet cover replay attacks, voice cloning, unseen TTS engines, telephony audio or non-English speech. Public benchmark results will be published as the model is retrained on community-contributed samples (see §9 Roadmap).

5. Technical Sheet - Neural Network Model

5.1 AI Model Overview

HyperDeteX runs a continuously trained detection model, audited at 100% fake-detection and 99.48% real-classification on in-distribution samples across 5 TTS vendors. Rather than swapping architectures, the model is retrained on every cycle with new community-contributed samples, expanding coverage to replay attacks, unseen TTS engines, telephony audio, and non-English speech. The architecture is designed for rapid adaptation as new TTS engines emerge.

Core Capabilities

Processing Approach:

• Direct raw audio analysis

• Multi-layer feature extraction

• Contextual pattern recognition

Optimization:

• Efficient transfer learning

• Minimal retraining requirements

• Fast inference — verdict within seconds

5.2 Technical Approach

Our detection system employs a sophisticated multi-stage pipeline that analyzes raw audio signals through proprietary deep learning models. The approach combines modern neural architecture patterns with custom optimization techniques.

Processing Pipeline

1. Audio Preprocessing: direct waveform analysis without traditional feature extraction dependencies
2. Feature Learning: multi-layer neural networks extract acoustic signatures automatically
3. Pattern Recognition: advanced contextual analysis identifies synthetic voice characteristics
4. Classification: binary decision output with confidence scoring

5.3 System Architecture (Conceptual)

HyperDeteX Detection Pipeline


    ┌─────────────────────────────────────────────┐
    │           RAW AUDIO INPUT                   │
    │        Voice message or audio clip          │
    └─────────────────────────────────────────────┘
                       ↓
    ┌─────────────────────────────────────────────┐
    │      ACOUSTIC FEATURE EXTRACTION            │
    │   Multi-layer neural processing pipeline    │
    └─────────────────────────────────────────────┘
                       ↓
    ┌─────────────────────────────────────────────┐
    │      CONTEXTUAL PATTERN ANALYSIS            │
    │   Advanced deep learning architecture       │
    └─────────────────────────────────────────────┘
                       ↓
    ┌─────────────────────────────────────────────┐
    │      CLASSIFICATION LAYER                   │
    │   Binary decision with confidence scoring  │
    └─────────────────────────────────────────────┘
                       ↓
    ┌─────────────────────────────────────────────┐
    │              OUTPUT                         │
    │   REAL | FAKE  +  Confidence Score (%)      │
    └─────────────────────────────────────────────┘
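The classification and output stages above can be sketched as follows. The logistic score scale and the 0.5 decision threshold are illustrative assumptions, not the production model's calibration:

```python
import math

def verdict(logit: float, threshold: float = 0.5) -> tuple[str, float]:
    """Map a raw model score to a REAL/FAKE label plus a confidence percentage."""
    p_fake = 1.0 / (1.0 + math.exp(-logit))          # logistic score in [0, 1]
    label = "FAKE" if p_fake >= threshold else "REAL"
    confidence = p_fake if label == "FAKE" else 1.0 - p_fake
    return label, round(confidence * 100, 1)

print(verdict(2.0))   # ('FAKE', 88.1)
print(verdict(-3.0))  # ('REAL', 95.3)
```

The confidence reported to the user is the probability mass behind the chosen label, so a score near the threshold yields a verdict near 50% confidence.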
                  

5.4 Training & Optimization

Dataset Strategy

  • Balanced real vs. synthetic voice samples
  • Diverse TTS provider coverage
  • High-quality verified data sources
  • Continuous dataset expansion

Training Approach

  • Transfer learning with selective fine-tuning
  • Optimized for computational efficiency
  • Advanced data augmentation techniques
  • Rigorous validation protocols

5.5 Performance Analysis

5.5.1 Internal Audit — In-Distribution Metrics

  • Fake-as-fake rate (5 TTS): 100%
  • Real-as-real rate: 99.48%
  • ElevenLabs accuracy: 100%
  • OpenAI accuracy: 100%
  • Azure accuracy: 100%
  • Deepgram accuracy: 100%
  • Voxtral accuracy: 100%

What we can and cannot claim

Confirmed: 100% of digital-to-digital TTS from the 5 audited vendors is detected; real speech is classified at 99.48% on LibriSpeech + VoxPopuli; no blind spot per provider.

Not yet measured: replay attacks (microphone-played TTS), voice cloning (impersonation from short reference audio), unseen vendors, telephony audio, non-English speech, and public benchmarks. Results on these will be published as the model is retrained on community-contributed samples.

5.5.2 Model Reliability

Validation Performance

  • Balanced test set evaluation
  • False Positive Rate: < 2%
  • False Negative Rate: < 2%
  • High confidence predictions

Real-World Performance

  • Tested across diverse audio conditions
  • Robust to background noise
  • Language-agnostic architecture (multilingual validation pending)

5.5.3 System Performance

Inference Speed

1–3 sec

Result delivered within seconds of submission

Efficiency

Optimized

Minimal computational overhead

Scalability

Enterprise

Supports high-volume processing

5.6 Continuous Improvement

Model Evolution

  • Continuous learning from new deepfake techniques
  • Adaptive model updates via community contributions
  • Active learning for strategic data collection
  • Regular benchmark evaluations

Privacy-Preserving Training

  • Decentralized training architecture
  • Local model updates with global aggregation
  • Differential privacy guarantees
  • Secure multi-party computation

5.7 Scaling Challenges & Mitigation Strategies

Transparency Note: As we scale from controlled datasets to community-contributed data, we anticipate performance variations that are normal in production ML systems. This section outlines expected challenges and our mitigation strategies.

Challenge 1: Distribution Shift

v1 Baseline — In-Distribution Audit

  • Real voices: LibriSpeech + VoxPopuli, studio-quality 16 kHz
  • Synthetic: ElevenLabs, OpenAI, Azure, Deepgram, Voxtral
  • Digital-to-digital, controlled environment
  • Result: 100% fake / 99.48% real

v2 Target — Real-World, Voice Cloning & Telco

  • Replay attacks, phone recordings, compressed audio
  • Voice cloning attacks (impersonation from short reference audio)
  • Telephony narrowband + common phone codecs (G.711, 8 kHz)
  • Multilingual (DE/FR/IT/EN), unseen TTS vendors
  • Target: EER < 3% on public real-world benchmarks

Mitigation Strategy:

  • Continuous retraining with community-contributed replay, voice-cloning, telephony & multilingual samples
  • Channel-augmentation pipeline (room impulse responses, codecs, microphone profiles)
  • Public benchmark tracking as the dataset grows
  • Generalization measured on held-out vendors, languages, and codecs

Challenge 2: Label Noise & Quality Control

Community-contributed samples may contain labeling errors, affecting training quality.

Mitigation Strategy:

  • Multi-validator consensus mechanism (3+ validators per sample)
  • Automated quality scoring (confidence thresholds)
  • Outlier detection and flagging
  • Contributor reputation system (penalties for incorrect labels)
  • Active learning to prioritize high-uncertainty samples

Challenge 3: Concept Drift & Evolution

Deepfake technology evolves rapidly. TTS systems in 2026-2027 will be significantly more sophisticated than current models.

Mitigation Strategy:

  • Monthly model updates with latest synthetic voice samples
  • Adaptive learning pipeline to detect new TTS patterns
  • Partnership with TTS providers for early access to new models
  • Target: EER < 5% on held-out unseen TTS generations
  • Performance fluctuations are normal and expected

Expected Performance Timeline

Q1 2026 — Foundation: model trained & audited, Telegram bot live.
Q2 2026 — Launch: presale & DTX token launch, rewards on-chain (testnet → mainnet), community onboarding.
Q3 2026 — Platform: dashboard, continuous training improvements, web platform, new detection capabilities.
Q4 2026 — Next generation: next-generation model development, B2B pilot program, dataset expansion.
Q1 2027 — Recognition & scale: third-party performance audit, B2B program expansion, community growth milestones.

6. DTX Token Economics

Token Overview

DTX is an ERC-20 utility token deployed on Base L2 (Coinbase layer-2). It powers the HyperDeteX ecosystem by rewarding dataset contributors and enabling decentralized governance. A 5-day ladder presale starting at $0.01/DTX precedes a Uniswap V2 launch at $0.015/DTX (up to +50% upside for presale buyers).

Presale Structure

Up to 10% of total supply (15M DTX) is offered through a 5-day FOMO price ladder. Earlier days receive a better price; after Day 5 the price stabilizes until the presale closes.

Day              Price     Upside vs launch
Day 1            $0.010    +50%
Day 2            $0.011    +36%
Day 3            $0.012    +25%
Day 4            $0.013    +15%
Day 5 → close    $0.014    +7%
TGE / Launch     $0.015

Parameter                Value
Allocation               Up to 15,000,000 DTX (10% of supply)
Soft cap                 $20,000 USDC
Hard cap                 $125,000 USDC
Min / Max contribution   $100 / $2,500 per wallet
Payment                  USDC on Base L2

TGE unlock: 15% immediate at launch + 85% released linearly, second by second, on-chain over 6 months from TGE. No cliff, no waiting period — buyers can claim at any moment.
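The per-second linear unlock reduces to a simple closed-form function. The sketch below illustrates the schedule described above (a 180-day basis for "6 months" is an assumption); it is not the deployed contract code:

```python
SIX_MONTHS = 180 * 24 * 3600  # vesting duration in seconds (assumed 180-day basis)

def claimable(total: float, tge_bps: int, vest_seconds: int, elapsed: int) -> float:
    """DTX claimable `elapsed` seconds after TGE: TGE unlock + linear per-second vesting."""
    tge_unlock = total * tge_bps / 10_000
    linear_part = (total - tge_unlock) * min(max(elapsed, 0), vest_seconds) / vest_seconds
    return tge_unlock + linear_part

# Presale bucket: 15% at TGE + 85% linear over 6 months
print(claimable(15_000_000, 1_500, SIX_MONTHS, 0))                # 2250000.0
print(claimable(15_000_000, 1_500, SIX_MONTHS, SIX_MONTHS // 2))  # 8625000.0
```

At TGE the full 15% is claimable immediately; halfway through the period, half of the remaining 85% has vested on top of it.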

Token Allocation (150M)

Allocation               Share     DTX
Development Fund         32.5%     48.75M
Marketing                19%       28.50M
Community Rewards        18.25%    27.38M
Team & Advisors          13.25%    19.88M
Presale                  10%       15.00M
Reserve Fund             5%        7.50M
Pool Launch (LP seed)    2%        3.00M

Pool Launch is the maximum DTX reserved for LP seeding. The actual LP seed is sized to 30% of the raise at $0.015/DTX; any unused portion of the bucket is burned in full at launch(). The Reserve Fund splits into 10% (750K) sent to a liquid Trading Wallet (market-making inventory) and 90% (6.75M) on 12-month linear vesting.

Token Utility

  • Contributor Rewards: Proportional to dataset quality + provider multiplier
  • Governance: Token holders steer model priorities and community programs

Liquidity & Raise Allocation

The presale raise is split three ways at finalize(). The LP is locked for 12 months and opens at the launch price of $0.015/DTX:

LP receiver: 30% of raise → Uniswap V2 pair

Buyback wallet: ~9% of raise → price support post-launch

Treasury: ~61% of raise → operations, dev, marketing

DTX_LP = LP_USDC / $0.015

Example (hardcap $125K): $37.5K USDC + 2.5M DTX in LP → spot $0.015. The Pool Launch bucket holds 3M DTX, so 500K DTX are burned at launch() even at hardcap. The lower the raise, the larger the burn: natural deflation proportional to how far the raise falls short of hardcap.

Raise              LP USDC    DTX in LP    Burned at launch
$125K (hardcap)    $37,500    2,500,000    500,000
$75K               $22,500    1,500,000    1,500,000
$20K (softcap)     $6,000     400,000      2,600,000
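The LP sizing and burn mechanics reduce to a few lines; the constants below follow directly from the figures in this section:

```python
POOL_BUCKET = 3_000_000   # Pool Launch allocation, DTX
LAUNCH_PRICE = 0.015      # USDC per DTX
LP_SHARE = 0.30           # share of the raise that seeds the LP

def lp_seed(raise_usdc: float) -> tuple[float, float, float]:
    """Return (LP USDC, DTX paired in the LP, DTX burned from the Pool Launch bucket)."""
    lp_usdc = raise_usdc * LP_SHARE
    dtx_in_lp = lp_usdc / LAUNCH_PRICE
    return lp_usdc, dtx_in_lp, POOL_BUCKET - dtx_in_lp

for raise_usdc in (125_000, 75_000, 20_000):
    print(lp_seed(raise_usdc))
```

Running this reproduces the table above: a smaller raise pairs fewer DTX in the LP, so more of the fixed 3M bucket is burned.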

Buyback wallet ≈ tokensSold × 7.5% × $0.015. The market-making Trading Wallet is funded separately at TGE with 750K DTX (10% of the Reserve allocation, one-time, liquid).

Vesting Schedule

All linear schedules start at TGE (not at deploy) and unlock continuously, second by second, on-chain. No cliffs, no end-of-month wait — claimable balance grows on every Base block (~2s). No tradable allocation can move before launch().

Presale (15M DTX)

15% at TGE + 85% linear continuous over 6 months

Development Fund (48.75M DTX)

1% at TGE + 95% linear continuous over 12 months

Marketing (28.50M DTX)

1% at TGE + 95% linear continuous over 12 months

Team & Advisors (19.88M DTX)

0% at TGE + 100% linear continuous over 12 months, no cliff

Reserve (7.50M DTX)

Split at TGE: 750K (10%) sent immediately to a liquid Trading Wallet (market-making inventory, one-time). Remaining 6.75M (90%) linear continuous over 12 months from TGE.

Community Rewards (27.38M DTX)

Distributed only via an on-chain oracle that signs reward attestations as users contribute valid voice samples. The reward formula and any discretionary exception rules (airdrops, giveaways, partner allocations) are disclosed before the presale opens — including how, how much, and the maximum cap.

Pool Launch (3.00M DTX)

Used at launch() to seed the Uniswap LP at $0.015. The bucket is sized above the hardcap LP requirement (2.5M DTX); any surplus is burned in full at launch().

Token Metrics

  • Total Supply: 150M (fixed, non-inflationary, no mint)
  • Presale Entry: $0.010 (Day 1, 5-day ladder to $0.014)
  • Launch Price: $0.015 (+7% to +50% upside per tier)
  • Blockchain: Base (Coinbase L2)

7. Contribution Model

Participation Framework

The HyperDeteX contribution model is designed to continuously expand the dataset beyond the initial corpus the production model was trained on. Community submissions feed each retraining cycle, targeting the dimensions the model does not yet cover well: replay attacks, voice cloning, unseen TTS engines, telephony audio, and non-English speech. The production model evolves through continuous fine-tuning on the growing community dataset — moving from the wav2vec2 v1 baseline toward the XLS-R 300M + AASIST v2 target.

Contribution Types

  • 1.
    Voice Samples

    Submit short audio clips (real or AI-generated) via the Telegram bot — they feed the next retraining cycle

  • 2.
    Hard cases

    Samples the current model finds difficult are especially valuable — they directly improve detection accuracy after retraining

  • 3.
    Validation Work

    Participate in sample and model validation

  • 4.
    Network Operation

    Run nodes and maintain network infrastructure

Reward Mechanism

Rewards are dynamic: each accepted contribution earns a fraction of the remaining rewards pool, adjusted by the contributor's tier, the contribution type, and the sample quality score. This guarantees long-term sustainability — the pool decreases gradually rather than draining.

Typical reward per accepted sample: 2–30 DTX, depending on tier, type, and quality.

  • Tier multiplier: Basic / Advanced / Expert
  • Type multiplier: Voice / TTS / AI-Call / Bounty
  • Quality score (SQS): audio quality, originality, model uncertainty
  • Pool factor: scales down as pool depletes

Full formula and per-sample breakdown are disclosed in the Telegram bot after each accepted contribution.
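As an illustration only, a multiplicative reward function consistent with the factors listed above might look like the sketch below. The type multipliers match the Reward Multipliers table later in this section; the tier values and the base amount are hypothetical placeholders, not the disclosed production values:

```python
# Hypothetical multipliers -- placeholders, not the disclosed production values.
TIER = {"Basic": 1.0, "Advanced": 1.25, "Expert": 1.5}
TYPE = {"Voice": 1.2, "TTS": 1.0, "AI-Call": 1.5, "Bounty": 1.5}

def reward(base_dtx: float, tier: str, ctype: str, sqs: int,
           pool_remaining: float, pool_initial: float) -> float:
    """Reward = base x tier x type x quality x pool factor (illustrative shape only)."""
    quality = sqs / 10                              # SQS is scored 1-10
    pool_factor = pool_remaining / pool_initial     # scales down as the pool depletes
    return base_dtx * TIER[tier] * TYPE[ctype] * quality * pool_factor

# Fresh pool, SQS-9 live AI-Call from an Expert contributor:
print(round(reward(10, "Expert", "AI-Call", 9, 27_380_000, 27_380_000), 2))  # 20.25
```

The pool factor is what makes the schedule sustainable: as the 27.38M pool depletes, identical contributions earn progressively less.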

Sample Collection Modes

Contributors submit samples through three complementary channels, each producing a different kind of signal. The model gains the most from rarer, harder-to-collect signal — which is why live AI-Calls earn the largest multiplier.

1. Voice memo

User records a 2–15 s voice memo directly in the Telegram bot. Captured at device sample rate, transmitted via Telegram's Opus codec, server-side downsampled to 16 kHz.

Captures: real human voice · accents · consumer microphones

2. TTS upload

User generates a clip with any commercial TTS provider (ElevenLabs, OpenAI, Azure, Deepgram, Mistral, …) and uploads the digital file. Tracks the moving target of new synthetic engines.

Captures: emerging TTS vendors · pure digital synthesis

3. AI-Call (live) — NEW

User runs /call in the Telegram bot, joins a live WebRTC room with our AI agent (Realtime API). The full bidirectional conversation is streamed, server-side downsampled Opus → µ-law 8 kHz to match G.711 telephony, then processed by the model.

Captures: telephony 8 kHz · replay-realistic channel · live voice-cloning attempts
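The server-side G.711 conversion step can be illustrated in pure Python. This is a textbook µ-law encoder, not HyperDeteX's actual pipeline code, and the Opus decode and 8 kHz resampling stages are omitted:

```python
def linear_to_ulaw(sample: int) -> int:
    """Encode one 16-bit PCM sample as a G.711 mu-law byte (textbook algorithm)."""
    BIAS, CLIP = 0x84, 32635
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment (exponent): position of the highest set bit above bit 7.
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF  # mu-law bytes are bit-inverted

# Silence encodes to 0xFF, full-scale positive to 0x80:
print(hex(linear_to_ulaw(0)), hex(linear_to_ulaw(32767)))  # 0xff 0x80
```

Passing training audio through this lossy 8-bit companding step is precisely what makes AI-Call samples representative of real telephony channels.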

Reward Multipliers

Different contribution types apply different multipliers. Rarer or more valuable types (live AI-Calls, bounty submissions) earn proportionally more from the pool:

Contribution Type    Multiplier    What it captures
Voice memo           ×1.2          Real human voice via Telegram
TTS upload           ×1.0          Synthetic samples from any provider
AI-Call (live)       ×1.5          Live conversation with our AI agent
Bounty               ×1.5          Targeted samples requested by the team

Acceptance Criteria

  • 16 kHz audio, between 2 and 15 seconds
  • Clear voice, low background noise
  • Correct label (real or AI-generated)
  • No duplicates (checked via audio fingerprint)
  • Daily submission limit per wallet (anti-abuse)
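A minimal server-side gate over these criteria might look like the sketch below. It covers only the mechanical checks (rate, duration, duplicates); voice-clarity, noise, and rate-limit checks are omitted, the helper name is hypothetical, and the SHA-256 hash is a stand-in for a real perceptual audio fingerprint:

```python
import hashlib
import wave

def validate_sample(path: str, seen: set,
                    rate: int = 16_000, min_s: float = 2.0, max_s: float = 15.0):
    """Check a WAV upload against the acceptance criteria; returns (accepted, reason)."""
    with wave.open(path, "rb") as wf:
        if wf.getframerate() != rate:
            return False, "expected 16 kHz audio"
        duration = wf.getnframes() / wf.getframerate()
        if not (min_s <= duration <= max_s):
            return False, "duration outside 2-15 s"
        pcm = wf.readframes(wf.getnframes())
    fingerprint = hashlib.sha256(pcm).hexdigest()  # exact-duplicate check only
    if fingerprint in seen:
        return False, "duplicate"
    seen.add(fingerprint)
    return True, "accepted"
```

A production fingerprint would need to survive re-encoding and trimming, which a raw-PCM hash does not; the structure of the gate, however, stays the same.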

Quality Score (SQS)

Each accepted sample gets a quality score (1–10) that scales the final reward:

  • Audio quality — clear voice, correct duration, no clipping
  • Originality — no duplicate of previously seen samples
  • Model uncertainty — samples our model finds difficult earn more

Pipeline at a glance

  • Verdict latency: 1–3 s from upload to result (up to 30 s for AI-Calls)
  • Reward settlement: on-chain (EIP-712 signed, paid in DTX)
  • Rewards pool: 27.4M DTX reserved for contributors

8. Use Cases

Where Voice-Authenticity Detection Matters

Our continuously trained detection model (audited at 100% fake-detection / 99.48% real-classification on in-distribution samples, 1-3 s CPU inference) targets the growing problem space of synthetic-voice fraud and impersonation. As the model is retrained on community-contributed samples, coverage expands to telephony audio, replay-hardened detection, and multilingual speech — the dimensions where modern deepfake voice attacks are most damaging.

Industry Applications

Financial Services
  • • Voice authentication for transactions
  • • Fraud prevention in call centers
  • • Secure voice banking
  • • Customer verification
Enterprise Security
  • • Access control systems
  • • Remote work authentication
  • • Secure voice commands
  • • Meeting verification
Media & Content
  • • Content authenticity
  • • Deepfake detection
  • • Copyright protection
  • • Source verification

9. Technical Roadmap

Development Timeline

Our technical roadmap outlines the planned evolution of the HyperDeteX platform, focusing on continuous improvement of detection capabilities, scalability, and user experience.

Development Phases

Q1 2026 — Foundation
  • Model trained & audited
  • Telegram bot live
Q2 2026 — Launch (current)
  • Presale & DTX token launch
  • Rewards on-chain (testnet → mainnet)
  • Community onboarding
Q3 2026 — Platform
  • Continuous training improvements
  • Web platform launch
  • New detection capabilities
Q4 2026 — Next Generation
  • Next-generation model development
  • B2B pilot program — opening enterprise integrations
  • Dataset expansion
Q1 2027 — Recognition & Scale
  • Third-party performance audit
  • B2B program expansion
  • Community growth milestones

Development Priorities

  • 1.
    Security & Reliability

    Ensuring robust protection and system stability

  • 2.
    Scalability

    Supporting growing network demands

  • 3.
    User Experience

    Streamlining integration and usage

Research Focus

  • Advanced Detection Methods

    Exploring new AI architectures

  • Privacy Preservation

    Enhancing data protection

  • Network Optimization

    Improving system efficiency

10. Team

Leadership & Vision

HyperDeteX is led by a team of experts in artificial intelligence, blockchain technology, and cybersecurity. Our leadership combines deep technical expertise with extensive industry experience to drive innovation and sustainable growth.

Core Team

Founder & Technical Leadership
  • • AI/ML Voice AI Lead
  • • Blockchain Architecture Lead
  • • Full-Stack Developer
Marketing
  • • Strategic Partnerships Lead
  • • Community building
  • • Social Media Managment

11. Future Outlook

Vision for the Future

As voice technology continues to evolve, HyperDeteX is positioned to lead the next wave of innovation in synthetic voice detection and verification. Our vision extends beyond current capabilities to shape the future of secure voice communication.

Innovation Pipeline

Advanced Detection
  • Context-aware detection
  • Multi-modal verification
  • Telephony and codec-resilient inference
  • Replay attack hardening
Platform Evolution
  • Cross-chain interoperability
  • Advanced governance systems
  • Automated compliance tools
  • Enhanced reward mechanisms

Market Expansion

Industry Integration
  • IoT device integration
  • Smart city applications
  • Healthcare solutions
  • Government partnerships
Global Reach
  • Regional expansion
  • Language support
  • Cultural adaptation
  • Local partnerships

Growth Projections

  • Market Size: $5.6B by 2030
  • Samples collected: 100K+ (target 2030)
  • Partners: 500+ (global reach)

Closing Statement

HyperDeteX is positioned to capitalize on the explosive growth of the voice biometrics and deepfake detection market, projected to reach $5.6 billion by 2030 with a CAGR of 47.6%. As the global AI market expands to $2 trillion and voice authentication becomes standard across financial services, healthcare, and government sectors, HyperDeteX will serve as the critical infrastructure protecting against synthetic voice fraud. Through our decentralized approach and community-driven development, we are building the foundation for trusted voice communication in an AI-dominated future.