Preview and Explore Data

Data Asset Exchange - TensorFlow Speech Commands

Sample Records

Core Word - On

Core Word - Off

Core Word - Yes

Core Word - No

Core Word - Up

Core Word - Down

Auxiliary Word - Bird

Background Noise

Metadata

Format: WAV
License: CC BY 4.0
Domain: Audio
Number of Records: Approximately 65,000 WAV files (64,727 across the train, validation, and test splits)
Data Split: Train - 51,094 audio clips, Validation - 6,798 audio clips, Test - 6,835 audio clips
Size: 1.49 GB
Data Origin: The audio clips were originally collected by Google and recorded by volunteers in uncontrolled locations around the world.
Dataset Version: Version 1 – March 17, 2020
Dataset Coverage:
  • Core words: Yes, No, Up, Down, Left, Right, On, Off, Stop, Go, Zero, One, Two, Three, Four, Five, Six, Seven, Eight, and Nine.
  • Auxiliary words: Bed, Bird, Cat, Dog, Happy, House, Marvin, Sheila, Tree, and Wow.
  • Background noise: doing_the_dishes, dude_miaowing, exercise_bike, pink_noise, running_tap, and white_noise.
Business Use Case:
  • Build voice recognition systems of the kind widely used in Internet of Things, automotive, security, and UX/UI applications.
  • Build voice-based search applications and voice-activated assistants.
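
The Data Split counts listed in the metadata can be turned into proportions with a few lines of arithmetic. The sketch below uses only the three counts stated above; the variable names are illustrative.

```python
# Clip counts per split, copied from the Data Split entry in the metadata.
split_counts = {"train": 51_094, "validation": 6_798, "test": 6_835}

# Total number of clips covered by the three splits.
total = sum(split_counts.values())

# Share of each split as a percentage, rounded to one decimal place.
shares = {name: round(100 * count / total, 1) for name, count in split_counts.items()}

print(total)   # 64727
print(shares)  # {'train': 78.9, 'validation': 10.5, 'test': 10.6}
```

The split is therefore roughly 80/10/10, and the total of 64,727 clips is what the metadata rounds to 65,000.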

Dataset Feature Definition

Feature: Audio clip folders (clip duration: one second)
Description:
  • 30 audio clip folders (20 core words, 10 auxiliary words). Each folder is named after the word spoken in its clips.
  • Each audio clip's file name contains the ID of the participant. For example, the file path `happy/3cfc6b3a_nohash_2.wav` indicates that the word spoken was "happy", the speaker's ID was "3cfc6b3a", and this is the third utterance (indicated by `2`) of that word by this speaker in the dataset.
  • The first utterance is indicated by `0` at the end of the file name.
  • The `nohash` section ensures that all utterances by a single speaker are sorted into the same data partition, which keeps very similar repetitions from producing unrealistically optimistic evaluation scores.

Feature: Background noise audio clip folder
Description: The `_background_noise_` folder contains a set of longer audio clips that are either recordings or mathematical simulations of noise. For more details, see `_background_noise_/README.md`.
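
The file-naming convention described above can be parsed mechanically. The following is a minimal sketch: `parse_clip_path` extracts the word, speaker ID, and utterance index from a path such as `happy/3cfc6b3a_nohash_2.wav`, and `assign_partition` illustrates one way a speaker-stable split could be derived by hashing only the part of the file name before `_nohash_`. Both function names, the `ClipInfo` type, the choice of SHA-1, and the default percentages are assumptions made for illustration; they are not necessarily the procedure used to produce the published train/validation/test split.

```python
import hashlib
import re
from pathlib import PurePosixPath
from typing import NamedTuple


class ClipInfo(NamedTuple):
    """Fields encoded in a clip path (hypothetical helper type)."""
    label: str            # folder name, i.e. the spoken word
    speaker_id: str       # participant ID before "_nohash_"
    utterance_index: int  # 0 for the first utterance, 1 for the second, ...


# Matches names such as "3cfc6b3a_nohash_2.wav".
_CLIP_NAME = re.compile(r"^(?P<speaker>[^_]+)_nohash_(?P<index>\d+)\.wav$")


def parse_clip_path(path: str) -> ClipInfo:
    """Split '<word>/<speaker>_nohash_<n>.wav' into its three parts."""
    p = PurePosixPath(path)
    match = _CLIP_NAME.match(p.name)
    if match is None:
        raise ValueError(f"unexpected clip name: {p.name!r}")
    return ClipInfo(
        label=p.parent.name,
        speaker_id=match["speaker"],
        utterance_index=int(match["index"]),
    )


def assign_partition(path: str, validation_pct: float = 10.0,
                     test_pct: float = 10.0) -> str:
    """Illustrative speaker-stable split (an assumption, not the official one).

    Everything from '_nohash_' onward is dropped before hashing, so every
    clip by the same speaker falls into the same bucket and partition.
    """
    speaker_key = re.sub(r"_nohash_.*$", "", PurePosixPath(path).name)
    digest = hashlib.sha1(speaker_key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # deterministic value in 0..99
    if bucket < validation_pct:
        return "validation"
    if bucket < validation_pct + test_pct:
        return "test"
    return "train"


print(parse_clip_path("happy/3cfc6b3a_nohash_2.wav"))
# ClipInfo(label='happy', speaker_id='3cfc6b3a', utterance_index=2)
```

Because the hash input ignores both the folder name and the utterance index, `assign_partition("happy/3cfc6b3a_nohash_0.wav")` and `assign_partition("bird/3cfc6b3a_nohash_2.wav")` always return the same partition, which is the property the `nohash` convention is meant to guarantee.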