While programming is one of the most broadly applicable skills in modern society, modern machine learning models still cannot code solutions to basic problems. Accurately assessing code generation performance is difficult, and there has been surprisingly little work on evaluating code generation in a way that is both flexible and rigorous. To meet this challenge, we introduce APPS, a benchmark for code generation. Unlike prior work in more restricted settings, our benchmark measures the ability of models to take an arbitrary natural language specification and generate Python code fulfilling this specification. Similar to how companies assess candidate software developers, we then evaluate models by checking their generated code on test cases. Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges. We fine-tune large language models on both GitHub and our training set, and we find that the prevalence of syntax errors is decreasing exponentially. Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems, so we find that machine learning models are beginning to learn how to code. As the social significance of automatic code generation increases over the coming years, our benchmark can provide an important measure for tracking advancements.

arxiv.org/pdf/2105.09938.pdf
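
The evaluation described in the abstract, running model-generated programs against held-out test cases, can be sketched roughly as follows. This is a minimal illustration rather than the APPS harness itself; the test-case format and the helper name `passes_test_cases` are assumptions made for the example.

```python
# Minimal sketch of test-case-based evaluation of generated code.
# The (stdin, expected_stdout) test-case format and the helper name are
# hypothetical; the APPS benchmark ships its own evaluation harness.
import subprocess


def passes_test_cases(solution_code: str,
                      test_cases: list[tuple[str, str]],
                      timeout: float = 4.0) -> float:
    """Run a generated Python program on each (stdin, expected_stdout) pair
    and return the fraction of test cases it passes."""
    passed = 0
    for stdin_text, expected in test_cases:
        try:
            result = subprocess.run(
                ["python3", "-c", solution_code],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            continue  # treat a timeout as a failed test case
        if result.returncode == 0 and result.stdout.strip() == expected.strip():
            passed += 1
    return passed / len(test_cases) if test_cases else 0.0


# Toy example: an "add two numbers" problem with two test cases.
generated = "a, b = map(int, input().split())\nprint(a + b)"
print(passes_test_cases(generated, [("1 2", "3"), ("10 -4", "6")]))  # 1.0
```

In practice a harness like this would also sandbox execution and cap memory, since model-generated code is untrusted.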

Categories: Data
