Datasets

Custom datasets, built to your spec.

Six categories we collect every week. None are off-the-shelf — every project is scoped, recorded, and QA'd to your requirements.

Multi-speaker conversations

Natural, overlapping dialogue between two or more speakers — diarized, turn-segmented, and tagged.

Speakers: 2–8 per session
Modality: Audio (optional video)
Format: Multi-channel WAV + JSON
Metadata: Diarization, overlaps, sentiment, demographics

Common use cases

  • Diarization & ASR
  • Conversational LLM RLHF
  • Voice agent evaluation
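
As a rough illustration of how a diarized session might be consumed, the sketch below parses a JSON sidecar and lists the spans where two different speakers talk at once. The field names (`session_id`, `segments`, `speaker`, `start`, `end`) and the sample values are assumptions made up for this example, not a published delivery schema; the actual layout is scoped per project.

```python
import json
from itertools import combinations

# Hypothetical sidecar JSON. Field names and values are illustrative only.
SIDECAR = """
{
  "session_id": "demo-001",
  "segments": [
    {"speaker": "A", "start": 0.0, "end": 4.2},
    {"speaker": "B", "start": 3.5, "end": 7.0},
    {"speaker": "A", "start": 7.5, "end": 9.0},
    {"speaker": "C", "start": 8.0, "end": 8.5}
  ]
}
"""


def find_overlaps(segments):
    """Return (speaker_1, speaker_2, start, end) for each pair of segments
    from different speakers whose time ranges intersect."""
    overlaps = []
    for a, b in combinations(segments, 2):
        if a["speaker"] == b["speaker"]:
            continue
        start = max(a["start"], b["start"])
        end = min(a["end"], b["end"])
        if start < end:  # non-empty intersection
            overlaps.append((a["speaker"], b["speaker"], start, end))
    return sorted(overlaps, key=lambda o: o[2])


session = json.loads(SIDECAR)
overlaps = find_overlaps(session["segments"])
for s1, s2, start, end in overlaps:
    print(f"{s1}/{s2} overlap from {start:.1f}s to {end:.1f}s")
```

The pairwise comparison is quadratic in the number of segments, which is fine for a single session; a sweep over sorted start times would be the usual choice for longer recordings.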

Single-speaker monologues

Long-form, expressive reads from a single speaker — controlled prompts, consistent capture.

Speakers: 1 per session
Modality: Audio
Format: WAV + transcript + phoneme alignment
Metadata: Emotion, pace, style, pitch contour

Common use cases

  • TTS & voice cloning
  • Prosody modeling
  • ASR fine-tuning

Audio + video

Synchronized multi-mic, multi-camera capture for multimodal research: viseme-, gaze-, and gesture-friendly.

Cameras: 1–4 angles, up to 4K
Audio: Lavalier + boom + ambient
Sync: Timecode-locked
Metadata: Visemes, gaze, gesture, scene context

Common use cases

  • Lip-sync & avatar models
  • Multimodal LLM grounding
  • Sign / gesture recognition
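
Timecode-locked capture means a frame-accurate timecode can be mapped to a position in each audio file. Here is a minimal sketch of that arithmetic, assuming non-drop-frame `HH:MM:SS:FF` timecode, 25 fps video, and 48 kHz audio. None of these rates are stated above; they are placeholder defaults, and the function name is hypothetical.

```python
def timecode_to_sample(timecode: str, fps: int = 25, sample_rate: int = 48_000) -> int:
    """Convert a non-drop-frame HH:MM:SS:FF timecode to an audio sample index.

    The default fps and sample_rate are assumptions for illustration.
    """
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if not 0 <= frames < fps:
        raise ValueError(f"frame field {frames} is out of range for {fps} fps")
    total_frames = ((hours * 60 + minutes) * 60 + seconds) * fps + frames
    # Integer arithmetic avoids float rounding; the result is exact
    # whenever sample_rate is divisible by fps (48000 / 25 = 1920).
    return total_frames * sample_rate // fps


# One minute and 12 frames into the take:
print(timecode_to_sample("00:01:00:12"))  # 2903040
```

Drop-frame timecode (used with 29.97 fps material) skips frame numbers and would need a different conversion.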

Multilingual

Native speakers across 40+ languages — collected by humans, never machine-translated or synthesized.

Coverage: 40+ languages
Speakers: Native, age & gender balanced
Format: Aligned transcripts in source language
Metadata: Language, region, native/L2 tag

Common use cases

  • Multilingual ASR/TTS
  • Translation evals
  • Low-resource language coverage

Multi-accent

Region-tagged accents within each language — from London RP to Lagos English to Chicano Spanish.

Granularity: Country + region tags
Speakers: Verified residence & background
Format: WAV + accent metadata
Metadata: Accent label, confidence, demographics

Common use cases

  • Bias & fairness audits
  • Robust ASR
  • Localized voice products

Metadata-rich

Every recording ships with structured metadata: speaker, environment, device, emotion, and consent.

Speaker: Demographics, voice profile
Environment: Room type, noise level, device
Annotation: Transcript, emotion, intent
Provenance: Consent record, capture date, geo

Common use cases

  • RLHF & preference data
  • Eval set construction
  • Compliance & audit
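
To make the four metadata groups concrete, the sketch below models one record per recording and filters a batch down to items that carry a consent record and sit under a noise ceiling. Every field name, the sample records, and the 45 dB threshold are hypothetical placeholders chosen for this example; the delivered schema depends on the project scope.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RecordingMetadata:
    """One record per recording. All field names are illustrative assumptions."""
    recording_id: str
    speaker_age_band: str              # Speaker: demographics
    room_type: str                     # Environment
    noise_level_db: float              # Environment
    device: str                        # Environment
    emotion: str                       # Annotation
    consent_record_id: Optional[str]   # Provenance; None means no consent on file
    capture_date: date                 # Provenance
    country: str                       # Provenance: geo


def audit_ready(records, max_noise_db=45.0):
    """Keep recordings that have a consent record and do not exceed the noise ceiling."""
    return [
        r for r in records
        if r.consent_record_id is not None and r.noise_level_db <= max_noise_db
    ]


# Made-up sample data for the demonstration.
records = [
    RecordingMetadata("rec-001", "25-34", "studio", 28.0, "lavalier", "neutral",
                      "consent-9f2", date(2024, 3, 1), "GB"),
    RecordingMetadata("rec-002", "35-44", "office", 52.5, "phone", "happy",
                      "consent-a17", date(2024, 3, 2), "NG"),
    RecordingMetadata("rec-003", "18-24", "home", 35.0, "laptop", "neutral",
                      None, date(2024, 3, 3), "US"),
]

kept = audit_ready(records)
print([r.recording_id for r in kept])  # ['rec-001']
```

Keeping provenance fields on the same record as the annotations is one way to let a compliance check run as a simple filter rather than a join against a separate consent log.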

Need something we haven't listed?

Custom is the default. Tell us what your model needs and we'll scope it.