Using StarCoder for AI-Powered Auto Completion in the Theia IDE

January 21, 2025 | 5 min Read

StarCoder is an advanced open-source language model designed specifically for code generation and auto-completion tasks. Developed by BigCode, StarCoder is trained on a large, openly documented dataset of publicly available code, which makes its training data inspectable and supports accountability. It is particularly interesting for developers and organizations seeking more control over their AI-powered coding tools, thanks to its open-source nature, flexible deployment options, and strong performance across multiple programming languages.

AI code completion in the AI-powered Theia IDE based on StarCoder

In this post, we will explore how StarCoder can be integrated into the AI-powered Theia IDE, leveraging the capabilities of Theia AI—a flexible and extensible framework for building AI-driven tools and IDEs. The Theia IDE and its AI capabilities are openly available, offering features like customizable prompts, transparent LLM communication, and support for any LLM, including local deployments.

If you are not yet familiar with Theia AI or the AI-powered Theia IDE, see the Theia AI introduction and the AI Theia IDE overview, or download the AI-powered Theia IDE.

Why StarCoder is an Interesting Choice for an AI Coding Assistant

Many existing solutions rely on proprietary models with undisclosed training data, raising concerns about transparency and ethical or legal compliance. StarCoder addresses these concerns by using The Stack for training, a well-documented and transparent dataset of publicly available code. This dataset is open source, allowing users to inspect it and verify whether their code is included. Additionally, it adheres to open-source licenses, making it an attractive choice for developers prioritizing openness and clarity.

StarCoder also offers two interesting deployment-specific advantages:

  1. Available on Hugging Face Serverless: StarCoder is typically “warm” on Hugging Face’s hosting service, allowing developers to use it for free within certain rate limits.

  2. Local Deployment: Unlike many large-scale proprietary models, StarCoder is lightweight enough to be deployed locally, giving users the freedom to avoid sending parts of their code to third parties.

Despite its advantages, using models like StarCoder in IDEs can be challenging, as many AI-powered IDE products restrict users to proprietary LLM providers, lack support for self-hosted models, and use black-box prompts that are not adaptable to specific models. This makes it difficult to experiment with alternative models like StarCoder without specific extensions.

Theia AI and the AI-Powered Theia IDE: A Flexible Solution

Theia AI and the AI-powered Theia IDE (built on Theia AI) provide a truly open and flexible framework, addressing the limitations of traditional solutions. Here’s how they enable seamless integration with models like StarCoder:

  1. Support for Arbitrary LLM Providers: Users can connect to various LLM providers, including Hugging Face, self-hosted models, and locally deployed models (e.g., via Ollama or Llamafile).

  2. Model Selection for Specific Use Cases: Users can choose any LLM for each specific purpose, such as using StarCoder for code completion.

  3. Customizable Prompts and Settings: Both Theia AI and the AI-powered Theia IDE allow easy adaptation of prompts and relevant settings. In the Theia IDE, even end users can customize these configurations directly to make agents like code completion work with new models.

Steps to Integrate StarCoder into the AI-Powered Theia IDE

The video below demonstrates the integration steps of StarCoder into the Theia IDE. Here’s a summary of the process:

  1. Decide Where to Host the Model: Choose whether to use a hosted version on Hugging Face, deploy StarCoder locally, or self-host it on a custom server. In our case, we add the StarCoder model as a Llamafile running locally.

  2. Adapt the LLM Request Settings: Specify the request settings for your StarCoder model in Theia AI, such as the stop words that tell the model where a completion should end.

  3. Change the LLM for the Code Completion Agent: Select StarCoder as the model to be used for the code completion agent in the Theia IDE.

  4. Adapt the Prompt: Adjust the prompt to align with StarCoder’s specification for the code completion use case.

Setting up AI code completion in the AI-powered Theia IDE based on StarCoder

  5. Test the Code Completion: The following video shows StarCoder in action for AI-powered code completion in the Theia IDE. The example is deliberately simple, but it shows that the code completion can “guess” the intended content of a function just from its name, a capability that static code completion typically cannot achieve.
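To make steps 2 and 4 more concrete, here is a minimal Python sketch of what a StarCoder-style completion request could look like. It uses StarCoder's fill-in-the-middle tokens (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`) and its `<|endoftext|>` token as a stop word. The function names, the request field names (`prompt`, `max_tokens`, `temperature`, `stop`), and the chosen values are assumptions made for illustration; they are not the Theia AI settings schema or the API of any particular model server, so adapt them to your actual setup.

```python
import json

# StarCoder's fill-in-the-middle (FIM) special tokens.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"
END_OF_TEXT = "<|endoftext|>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Wrap the code before and after the cursor in StarCoder's FIM tokens."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def build_request(prefix: str, suffix: str, max_tokens: int = 64) -> dict:
    """Assemble a hypothetical request body (field names are illustrative)."""
    return {
        "prompt": build_fim_prompt(prefix, suffix),
        "max_tokens": max_tokens,
        "temperature": 0.2,  # arbitrary low value for more predictable completions
        "stop": [END_OF_TEXT, "\n\n"],  # stop words, as in step 2
    }


if __name__ == "__main__":
    before_cursor = "def fibonacci(n):\n    "
    after_cursor = "\n\nprint(fibonacci(10))\n"
    print(json.dumps(build_request(before_cursor, after_cursor), indent=2))
```

The model is expected to generate the text that belongs between the prefix and the suffix, which is why the prompt ends with the `<fim_middle>` token.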

Conclusion

Theia AI and the AI-powered Theia IDE exemplify how open and flexible tools can empower developers to experiment with and adopt new models like StarCoder. This flexibility enables not only tool providers building on Theia AI, but even end users to switch seamlessly to open-source and locally hosted models with minimal effort, ensuring transparency, control, and adaptability.

When integrating new models like StarCoder, you may encounter challenges such as imperfect results, or you may need additional adjustments for parsing responses and removing unwanted artifacts. However, the flexibility of Theia AI and the AI-powered Theia IDE ensures that you can quickly prototype a working solution and evaluate new models with a short turnaround time.
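As an illustration of the kind of response clean-up mentioned above, the following Python sketch truncates a raw completion at the first stop word and strips leftover special tokens. The helper name `clean_completion` and the exact token list are assumptions for this example and not part of Theia AI; which artifacts actually appear depends on the model and on how it is served.

```python
# Special tokens that a StarCoder-style model may leak into its output.
SPECIAL_TOKENS = (
    "<fim_prefix>",
    "<fim_suffix>",
    "<fim_middle>",
    "<fim_pad>",
    "<|endoftext|>",
)


def clean_completion(raw: str, stop_words=("<|endoftext|>",)) -> str:
    """Cut the raw model output at the earliest stop word and drop leftover tokens."""
    cut = len(raw)
    for stop in stop_words:
        if not stop:
            continue  # ignore empty stop words, which would match at index 0
        index = raw.find(stop)
        if index != -1:
            cut = min(cut, index)
    text = raw[:cut]
    # Remove any special tokens that remain before the cut position.
    for token in SPECIAL_TOKENS:
        text = text.replace(token, "")
    return text.rstrip()


if __name__ == "__main__":
    raw_output = "return a + b<|endoftext|>def unrelated():"
    print(clean_completion(raw_output))
```

A post-processing step like this is typically applied before the suggestion is shown in the editor, so that users never see model-specific markers.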

If you are then interested in optimizing Theia AI or the AI-powered Theia IDE for specific LLMs and use cases, Theia AI provides you with the toolkit to do so. You can do this for your own usage, for a custom tool you are developing, or even contribute your streamlined AI capability as part of the Theia IDE to the open source community. Eclipse Theia is a truly open project and highly appreciates contributions from the community. Start experimenting today and help shape the future of open, AI-powered IDEs!

For more details, check out our blog posts: Introducing AI Support in Theia IDE and Introducing Theia AI (the underlying framework for tool builders).

If you are interested in building custom AI-powered tools, EclipseSource provides consulting and implementation services backed by our extensive experience with successful AI tool projects. We also specialize in web- and cloud-based tools and support for popular platforms like Eclipse Theia and VS Code.

Get in touch with us to learn more.

Jonas, Maximilian & Philip

Jonas Helming, Maximilian Koegel and Philip Langer co-lead EclipseSource, specializing in consulting and engineering innovative, customized tools and IDEs, with a strong …