MMLU Search Results

MMLU

Measuring Massive Multitask Language Understanding (MMLU) is a benchmark for evaluating the capabilities of language models. It consists of about 16,000...

3 KB (372 words) - 06:17, 27 July 2024

Mistral AI

focused on STEM subjects, it achieved 56.6% on the MATH benchmark and 63.47% on MMLU. It was released under the Apache 2.0 license. It has a context length of 32k...

20 KB (2,017 words) - 14:06, 5 August 2024

GPT-4o

translation. GPT-4o scored 88.7 on the Massive Multitask Language Understanding (MMLU) benchmark, compared with 86.5 for GPT-4. Unlike GPT-3.5 and GPT-4, which rely...

14 KB (1,554 words) - 04:34, 17 August 2024

Chinchilla (language model)

accuracy of 67.5% on the Measuring Massive Multitask Language Understanding (MMLU) benchmark, which is 7% higher than Gopher's performance. Chinchilla was...

7 KB (548 words) - 21:42, 7 August 2024

Language model

HellaSwag, WinoGrande, ARC, OpenBookQA, NaturalQuestions, TriviaQA, RACE, MMLU (Massive Multitask Language Understanding), BIG-bench hard, GSM8k, RealToxicityPrompts...

14 KB (2,302 words) - 12:33, 16 July 2024

Foundation model

evaluated relative to each other through standardized task benchmarks like MMLU, MMMU, HumanEval, and GSM8K. Given that foundation models are multi-purpose...

46 KB (5,057 words) - 20:58, 11 August 2024

OpenAI

translation. It scored 88.7% on the Massive Multitask Language Understanding (MMLU) benchmark, compared with 86.5% for GPT-4. On July 18, 2024, OpenAI released...

187 KB (16,190 words) - 17:50, 15 August 2024

Gemini (language model)

human experts on the 57-subject Massive Multitask Language Understanding (MMLU) test, obtaining a score of 90%. Gemini Pro was made available to Google...

44 KB (3,466 words) - 20:49, 15 August 2024

Large language model

different evaluation datasets and tasks. Examples include GLUE, SuperGLUE, MMLU, BIG-bench, and HELM. OpenAI has released tools for running composite benchmarks...

137 KB (12,428 words) - 19:43, 14 August 2024

Moroccan Arabic

qal/yqul "say", kan/ykun "be" (the only examples) II FeMMeL; FeMMLu yFeMMeL, yFeMMLu beddel/ybeddel "change" FeMMit, FeMMa yFeMMi werra/ywerri "show"...

95 KB (7,933 words) - 21:01, 10 August 2024