• The Lancaster-Oslo/Bergen (LOB) Corpus is a one-million-word collection of British English texts which was compiled in the 1970s in collaboration between...
    2 KB (161 words) - 20:05, 14 June 2024
  • later corpora such as the Lancaster-Oslo-Bergen Corpus (British English from the early 1960s) and the Freiburg-Brown Corpus of American English (FROWN)...
    9 KB (1,056 words) - 13:22, 29 February 2024
  • The Enron Corpus is a database of over 600,000 emails generated by 158 employees of the Enron Corporation in the years leading up to the company's collapse...
    7 KB (712 words) - 16:21, 20 May 2024
  • The Oxford English Corpus (OEC) is a text corpus of 21st-century English, used by the makers of the Oxford English Dictionary and by Oxford University...
    4 KB (345 words) - 10:40, 19 November 2022
  • The Corpus of Contemporary American English (COCA) is a one-billion-word corpus of contemporary American English. It was created by Mark Davies, retired...
    9 KB (1,135 words) - 01:14, 12 September 2024
  • as three corpora: a corpus from the Survey of English Usage, the Lancaster-Oslo-Bergen Corpus (UK English), and the Brown Corpus (US English). In 1988...
    3 KB (292 words) - 20:35, 24 November 2023
  • The Cambridge International Corpus (CIC) is a collection of over 800 million words of real spoken and written English. The texts are stored in a database...
    8 KB (1,016 words) - 19:14, 29 May 2024
  • The British National Corpus (BNC) is a 100-million-word text corpus of samples of written and spoken English from a wide range of sources. The corpus covers British...
    31 KB (3,894 words) - 01:18, 14 June 2024
  • The Bergen Corpus of London Teenage Language (COLT) is a data set of samples of spoken English that was compiled in 1993 from tape recorded and transcribed...
    4 KB (361 words) - 17:32, 27 June 2022
  • The Quranic Arabic Corpus (Arabic: المدونة القرآنية العربية, romanized: al-modwana al-Qurʾāni al-ʿArabiyya) is an annotated linguistic resource consisting...
    6 KB (599 words) - 10:00, 22 April 2024