
StarCoder and StarCoderBase are Large Language Models for Code (Code LLMs) trained on permissively licensed data from GitHub, including 80+ programming languages, Git commits, GitHub issues, and Jupyter notebooks. Similar to LLaMA, we trained a ~15B parameter model for 1 trillion tokens. We then fine-tuned the StarCoderBase model for 35B Python tokens, resulting in a new model that we call StarCoder. We found that StarCoderBase outperforms existing open Code LLMs on popular programming benchmarks and matches or surpasses closed models such as code-cushman-001 from OpenAI (the original Codex model that powered early versions of GitHub Copilot). With a context length of over 8,000 tokens, the StarCoder models can process more input than any other open LLM, enabling a wide range of interesting applications. For example, by prompting the StarCoder models with a series of dialogues, we enabled them to act as a technical assistant. In addition, the models can be used to autocomplete code, make modifications to code via instructions, and explain a code snippet in natural language. We take several important steps towards a safe open model release, including an improved PII redaction pipeline and a novel attribution tracing tool, and we make StarCoder publicly available under an improved version of the OpenRAIL license. The updated license simplifies the process for companies to integrate the model into their products. We believe that with their strong performance, the StarCoder models will serve as a solid foundation for the community to use and adapt to their use cases and products.
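As a rough sketch of what code completion with the models can look like through the Hugging Face transformers library (the prompt and generation settings here are illustrative, the bigcode/starcoder checkpoint may require accepting its license on the Hub, and a ~15B parameter model needs substantial memory):

```python
# Illustrative sketch: autocompleting a function from its signature and
# docstring with StarCoder via transformers. Assumes the "bigcode/starcoder"
# checkpoint and hardware with enough memory for a ~15B parameter model.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# device_map="auto" needs the accelerate package installed.
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")

prompt = 'def print_hello_world():\n    """Print "Hello World!"."""\n'
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```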

We thoroughly evaluated StarCoder and several similar models on a variety of benchmarks. A popular Python benchmark is HumanEval, which tests whether the model can complete functions based on their signature and docstring. We found that both StarCoder and StarCoderBase outperform the largest models, including PaLM, LaMDA, and LLaMA, despite being significantly smaller. They also outperform CodeGen-16B-Mono and OpenAI's code-cushman-001 (12B) model. We also noticed a failure case: the model would produce "# Solution here" placeholder code, probably because that type of code is usually part of an exercise. To force the model to generate an actual solution, we added the prompt solutions/solution_1.py\n# Here is the correct implementation of the code exercise. This significantly increased the HumanEval score of StarCoder from 34% to over 40%, setting a new state-of-the-art result for open models.
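To make the prompting trick concrete, here is a minimal sketch; only the prefix text comes from the description above, while the build_prompt helper and the example problem are hypothetical illustrations:

```python
# Minimal sketch of the HumanEval prompting trick described above:
# prepend a filename-style prefix so the model writes a real solution
# instead of a "# Solution here" placeholder. build_prompt is a
# hypothetical helper, not part of any released evaluation harness.
PREFIX = (
    "solutions/solution_1.py\n"
    "# Here is the correct implementation of the code exercise\n"
)

def build_prompt(problem: str) -> str:
    """Prepend the solution-file prefix to a HumanEval-style problem."""
    return PREFIX + problem

# A HumanEval-style problem: a function signature plus docstring that the
# model is asked to complete.
problem = (
    "def add(a: int, b: int) -> int:\n"
    '    """Return the sum of a and b."""\n'
)
print(build_prompt(problem))
```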
