• Measuring Massive Multitask Language Understanding (MMLU) is a benchmark for evaluating the capabilities of language models. It consists of about 16,000...
    3 KB (372 words) - 06:17, 27 July 2024
  • focused on STEM subjects, it achieved 56.6% on the MATH benchmark and 63.47% on MMLU. It was released under the Apache 2.0 license. It has a context length of 32k...
    20 KB (2,017 words) - 14:06, 5 August 2024
  • translation. GPT-4o scored 88.7 on the Massive Multitask Language Understanding (MMLU) benchmark compared to 86.5 by GPT-4. Unlike GPT-3.5 and GPT-4, which rely...
    14 KB (1,554 words) - 04:34, 17 August 2024
  • accuracy of 67.5% on the Measuring Massive Multitask Language Understanding (MMLU) benchmark, which is 7 percentage points higher than Gopher's performance. Chinchilla was...
    7 KB (548 words) - 21:42, 7 August 2024
  • HellaSwag, WinoGrande, ARC, OpenBookQA, NaturalQuestions, TriviaQA, RACE, MMLU (Massive Multitask Language Understanding), BIG-bench hard, GSM8k, RealToxicityPrompts...
    14 KB (2,302 words) - 12:33, 16 July 2024
  • evaluated relative to each other through standardized task benchmarks like MMLU, MMMU, HumanEval, and GSM8K. Given that foundation models are multi-purpose...
    46 KB (5,057 words) - 20:58, 11 August 2024
  • translation. It scored 88.7% on the Massive Multitask Language Understanding (MMLU) benchmark compared to 86.5% by GPT-4. On July 18, 2024, OpenAI released...
    187 KB (16,190 words) - 17:50, 15 August 2024
  • human experts on the 57-subject Massive Multitask Language Understanding (MMLU) test, obtaining a score of 90%. Gemini Pro was made available to Google...
    44 KB (3,466 words) - 20:49, 15 August 2024
  • different evaluation datasets and tasks. Examples include GLUE, SuperGLUE, MMLU, BIG-bench, and HELM. OpenAI has released tools for running composite benchmarks...
    137 KB (12,428 words) - 19:43, 14 August 2024
  • qal/yqul "say", kan/ykun "be" (the only examples) II FeMMeL; FeMMLu yFeMMeL, yFeMMLu beddel/ybeddel "change" FeMMit, FeMMa yFeMMi werra/ywerri "show"...
    95 KB (7,933 words) - 21:01, 10 August 2024