• artificial intelligence, Measuring Massive Multitask Language Understanding (MMLU) is a benchmark for evaluating the capabilities of large language models...
    5 KB (394 words) - 00:32, 10 October 2024
  • translation. GPT-4o scored 88.7% on the Massive Multitask Language Understanding (MMLU) benchmark compared to GPT-4's 86.5%. Unlike GPT-3.5 and GPT-4, which rely...
    17 KB (1,787 words) - 13:48, 6 October 2024
  • subjects, achieving a score of 56.6% on the MATH benchmark and 63.47% on the MMLU benchmark. The model was produced in collaboration with Project Numina, and...
    21 KB (2,191 words) - 15:35, 9 October 2024
  • Neural scaling law
    MMLU performance vs AI scale...
    37 KB (4,931 words) - 17:41, 19 October 2024
  • HellaSwag, WinoGrande, ARC, OpenBookQA, NaturalQuestions, TriviaQA, RACE, MMLU (Massive Multitask Language Understanding), BIG-bench hard, GSM8k, RealToxicityPrompts...
    14 KB (2,215 words) - 12:58, 13 October 2024
  • evaluated relative to each other through standardized task benchmarks like MMLU, MMMU, HumanEval, and GSM8K. Given that foundation models are multi-purpose...
    46 KB (5,035 words) - 12:16, 20 October 2024
  • accuracy of 67.5% on the Measuring Massive Multitask Language Understanding (MMLU) benchmark, 7 percentage points higher than Gopher's performance. Chinchilla was...
    7 KB (615 words) - 22:23, 4 October 2024
  • translation. It scored 88.7% on the Massive Multitask Language Understanding (MMLU) benchmark compared to 86.5% by GPT-4. On July 18, 2024, OpenAI released...
    196 KB (16,898 words) - 01:13, 20 October 2024
  • Gemini (language model)
    human experts on the 57-subject Massive Multitask Language Understanding (MMLU) test, obtaining a score of 90%. Gemini Pro was made available to Google...
    44 KB (3,499 words) - 16:17, 16 October 2024
  • different evaluation datasets and tasks. Examples include GLUE, SuperGLUE, MMLU, BIG-bench, and HELM. OpenAI has released tools for running composite benchmarks...
    158 KB (13,501 words) - 05:44, 19 October 2024
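The MMLU scores quoted throughout these results are multiple-choice accuracies averaged over the benchmark's 57 subjects. As a minimal sketch of that kind of scoring (the sample records below are hypothetical and this is not the official evaluation harness):

```python
# Minimal sketch of MMLU-style scoring: accuracy over multiple-choice
# questions, with a per-subject breakdown. The sample data below is
# hypothetical; the real benchmark has ~14k questions across 57 subjects.
from collections import defaultdict

def mmlu_accuracy(records):
    """records: iterable of (subject, predicted_choice, correct_choice)."""
    per_subject = defaultdict(lambda: [0, 0])  # subject -> [correct, total]
    for subject, pred, gold in records:
        per_subject[subject][0] += int(pred == gold)
        per_subject[subject][1] += 1
    # Overall accuracy: total correct answers over total questions.
    overall = (sum(c for c, _ in per_subject.values())
               / sum(t for _, t in per_subject.values()))
    by_subject = {s: c / t for s, (c, t) in per_subject.items()}
    return overall, by_subject

records = [
    ("abstract_algebra", "A", "A"),
    ("abstract_algebra", "B", "C"),
    ("world_history", "D", "D"),
    ("world_history", "D", "D"),
]
overall, by_subject = mmlu_accuracy(records)
print(f"{overall:.1%}")  # 75.0%
```

Reported headline numbers (e.g. GPT-4o's 88.7%) correspond to the overall accuracy; some papers instead report the unweighted mean of the per-subject accuracies.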