WikiWhy: Answering and Explaining Cause-and-Effect Questions

As large language models (LLMs) grow larger and more sophisticated, assessing their “reasoning” capabilities in natural language grows more challenging. Recent question answering (QA) benchmarks that attempt to assess reasoning are often limited by a narrow scope of covered situations and subject matters. We introduce WikiWhy, a QA dataset built around a novel auxiliary task: explaining why an answer is true in natural language. WikiWhy contains over 9,000 “why” question-answer-rationale triples, grounded on Wikipedia facts across a diverse set of topics. Each rationale is a set of supporting statements connecting the question to the answer. WikiWhy serves as a benchmark for the reasoning capabilities of LLMs because it demands rigorous explicit rationales for each answer to demonstrate the acquisition of implicit commonsense knowledge, which is unlikely to be easily memorized. GPT-3 baselines achieve only 38.7% human-evaluated correctness in the end-to-end answer & explain condition, leaving significant room for future improvements.

Read on arxiv.org/abs/2210.12152
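
The question-answer-rationale triples described above could be represented roughly as follows. This is only a minimal sketch: the class name `WhyTriple`, its field names, and the example record are assumptions made for illustration, and neither the schema nor the example comes from the WikiWhy release itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WhyTriple:
    """One hypothetical "why" record; field names are assumptions, not the dataset's schema."""
    question: str
    answer: str
    # The rationale is a set of supporting statements linking question and answer.
    rationale: List[str] = field(default_factory=list)

    def has_rationale(self) -> bool:
        """Return True when at least one supporting statement is present."""
        return len(self.rationale) > 0


# Invented example for illustration only; it is not taken from the dataset.
example = WhyTriple(
    question="Why did the river flood the town?",
    answer="Heavy rain fell for a week.",
    rationale=[
        "Sustained rain raises the water level of a river.",
        "A river that rises above its banks floods nearby land.",
    ],
)

print(example.has_rationale())
print(len(example.rationale))
```

Keeping the rationale as a list of separate statements, rather than one block of text, would make it possible to inspect each supporting step on its own.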
