Open acoustic models and speech data for German speech recognition

In the course of the BMBF project Dialog+, the LT and the Telecooperation groups have developed acoustic models for German distant speech recognition. These have been built with the open source software toolkits Sphinx and Kaldi. Unfortunately, the German data resources needed to train such acoustic models are rarely open source or easily accessible. We therefore decided to record our own German speech data corpus, which we have now released under an open source license (CC-BY). Pretrained models and the scripts to generate them are also available (see download links below) and are released under the same permissive CC-BY license.

 

The creation of this speech data corpus is supported by the BMBF project Dialog+.

  

Summary of collected data (March 2015)

  • Overall duration per microphone: about 36 hours (31 hrs train / 2.5 hrs dev / 2.5 hrs test)
  • Number of microphones: 3 (Microsoft Kinect, Yamaha, Samson)
  • Number of wave files per microphone: about 14500
  • Overall number of participants: 180 (130 male / 50 female)
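
As a rough sanity check on the summary figures above, the following minimal sketch derives a few quantities that are not stated directly: the total audio across the three listed microphones, the average length of a wave file, and the share of each split. All inputs are the approximate numbers from the summary; the function name and the dictionary layout are illustrative assumptions, not part of the released scripts, and the additional beamformed Kinect signal is not counted.

```python
# Approximate figures copied from the summary (March 2015).
HOURS_PER_MIC = {"train": 31.0, "dev": 2.5, "test": 2.5}
MICROPHONES = ["Microsoft Kinect", "Yamaha", "Samson"]
WAVE_FILES_PER_MIC = 14500  # "about 14500" in the summary


def corpus_stats():
    """Derive rough totals from the summary figures (hypothetical helper)."""
    hours_per_mic = sum(HOURS_PER_MIC.values())
    # Total audio if each of the three microphones contributes the same duration.
    total_hours = hours_per_mic * len(MICROPHONES)
    # Average length of one wave file in seconds.
    avg_seconds = hours_per_mic * 3600 / WAVE_FILES_PER_MIC
    # Fraction of the per-microphone audio that falls into each split.
    split_share = {name: h / hours_per_mic for name, h in HOURS_PER_MIC.items()}
    return {
        "hours_per_mic": hours_per_mic,
        "total_hours": total_hours,
        "avg_seconds_per_file": avg_seconds,
        "split_share": split_share,
    }


if __name__ == "__main__":
    stats = corpus_stats()
    print(f"Hours per microphone: {stats['hours_per_mic']:.1f}")
    print(f"Hours over all microphones: {stats['total_hours']:.1f}")
    print(f"Average file length: {stats['avg_seconds_per_file']:.1f} s")
    for name, share in stats["split_share"].items():
        print(f"{name}: {share:.1%}")
```

Under these assumptions the split durations add up to the stated 36 hours per microphone, and a single recording lasts roughly nine seconds on average.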

 

How does it differ from the freely available German Voxforge corpus?

  • We recorded all our speech data under controlled conditions: same room, same microphone distances, ...
  • We recorded with three microphones in parallel. An additional signal was recorded with beamforming and noise reduction enabled (Microsoft Kinect).
  • The data is curated to reduce speaking errors and artefacts.

Downloads: Find downloads now on our new page at UHH


 

 
