English-Corpora: COCA [Davies] 1.1 billion word corpus of American English, 1990-2010. Compare to the BNC and ANC. Large, balanced, up-to-date, and freely available online.
Text corpus - Wikipedia In linguistics and natural language processing, a corpus (pl.: corpora) or text corpus is a dataset, consisting of natively digital and older, digitalized, language resources, either annotated or unannotated.
What is a corpus, and how is it used in NLP? - Medium In short, a corpus is a large set of language training data for statistical NLP applications. Here at BAVL, we have all the tools you need to collect and annotate text and voice data.
The Corpus | Open American National Corpus The OANC is a 15 million word (and growing) corpus of American English produced since 1990, all of which is in the public domain or otherwise free of usage and redistribution restrictions.
Definition and Examples of Corpora in Linguistics - ThoughtCo A corpus is a collection of language data used for research and learning about language. The Brown Corpus was the first major computer database of American English, created in the 1960s.
corpus - Wiktionary, the free dictionary corpus (plural corpora or corpuses or corpusses or (proscribed) corpi). A collection of writings, often on a specific topic, of a specific genre, from a specific demographic or a particular author, etc. Synonyms: collection, compilation, aggregation; see also Thesaurus: body.