The TIMIT corpus of read speech was the first annotated speech database to be widely distributed, and it has an especially clear organization.
TIMIT was developed by a consortium including Texas Instruments and MIT, from which it derives its name.
The goal of this chapter is to answer the following questions:
Along the way, we will study the design of existing corpora, the typical workflow for creating a corpus, and the lifecycle of a corpus.
As in other chapters, there will be many examples drawn from practical experience managing linguistic data, including data that has been collected in the course of linguistic fieldwork, laboratory work, and web crawling.
Two sentences, read by all speakers, were designed to bring out dialect variation. The remaining sentences were chosen to be phonetically rich, involving all phones (sounds) and a comprehensive range of diphones (phone bigrams).
Therefore, many of the computational methods described in this book are applicable.
Moreover, notice that all of the data types included in the TIMIT corpus fall into the two basic categories of lexicon and text, which we will discuss below.
Any transformations of that artifact which involve human judgment, even something as simple as tokenization, are subject to later revision, so it is important to retain the source material in a form that is as close to the original as possible.
Figure: Structure of the Published TIMIT Corpus. The CD-ROM contains doc, train, and test directories at the top level; the train and test directories both have 8 sub-directories, one per dialect region; each of these contains further subdirectories, one per speaker; the contents of the directory for female speaker ...
A fourth feature of TIMIT is the hierarchical structure of the corpus.
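The hierarchy described here (train or test, then dialect region, then speaker, then individual files) can be made concrete with a small path parser. The following is only a sketch under stated assumptions: it assumes dialect-region directories are named dr1 through dr8, that the first letter of a speaker directory encodes the speaker's sex (f or m), and that each file name is an utterance identifier followed by an extension. The speaker ID fabc0 and the names TimitFile and parse_timit_path are hypothetical, chosen for illustration, and are not part of any published API.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class TimitFile:
    split: str            # "train" or "test"
    dialect_region: int   # 1..8, one directory per dialect region
    speaker: str          # speaker directory name, e.g. "fabc0" (hypothetical)
    sex: str              # "f" or "m"; ASSUMED to be the first letter of the speaker ID
    utterance: str        # file name without extension, e.g. "sa1"
    kind: str             # file extension, e.g. "wav"


def parse_timit_path(path: str) -> TimitFile:
    """Split a path like 'train/dr1/fabc0/sa1.wav' into its hierarchy levels."""
    parts = PurePosixPath(path.lower()).parts
    if len(parts) != 4:
        raise ValueError(f"expected split/region/speaker/file, got: {path!r}")
    split, region, speaker, filename = parts

    if split not in ("train", "test"):
        raise ValueError(f"unknown split: {split!r}")

    # ASSUMPTION: dialect-region directories are named 'dr' plus a digit.
    if not (region.startswith("dr") and region[2:].isdigit()):
        raise ValueError(f"unexpected dialect-region directory: {region!r}")
    region_number = int(region[2:])
    if not 1 <= region_number <= 8:
        raise ValueError(f"dialect region out of range: {region_number}")

    # ASSUMPTION: the speaker ID begins with 'f' or 'm'.
    if speaker[:1] not in ("f", "m"):
        raise ValueError(f"unexpected speaker directory: {speaker!r}")

    stem, _, extension = filename.rpartition(".")
    if not stem:
        raise ValueError(f"file name has no extension: {filename!r}")

    return TimitFile(split, region_number, speaker, speaker[0], stem, extension)


if __name__ == "__main__":
    print(parse_timit_path("train/dr1/fabc0/sa1.wav"))
```

Because every level of the directory tree carries meaning, a parser of this kind lets us select subsets of the corpus, such as all female speakers from one dialect region, without opening any files.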