
JetBrains open-sources Mellum2 to enable private AI coding infrastructure

Author: Dave
8 min read

The Mellum2 open source coding model is a serious leap for teams who need fast, controllable AI at the infrastructure layer. JetBrains’ release brings a 12B-parameter, Mixture-of-Experts model purpose-built for agentic AI tasks and on-premises orchestration. Unlike tools welded to a SaaS API, Mellum2 puts the whole model—and every inference—under your own roof. If you've been waiting for an open, self-hostable AI genuinely optimized for engineering workflows instead of just a demo or copy-paste code, this is worth your attention.

What is Mellum2 and how does it differ from Mellum?

JetBrains Mellum2 is a 12-billion parameter open source coding model intended as a core engine for AI infrastructure, not just an autocomplete. It’s the direct successor to Mellum, the 4B-parameter model released by JetBrains in late 2024 as a proprietary, code-completion-only tool for JetBrains IDEs, and open-sourced a year later. With Mellum2, JetBrains is explicit: this isn’t just another plugin model or a ChatGPT knockoff. Mellum2 is open from day one, and its entire architecture and training objectives have shifted from line-by-line code completion to solid agentic infrastructure.

Per JetBrains’ own description, Mellum2 is designed as a “focal model”: fast, specialized, and meant to sit inside real engineering systems alongside frontier models like Claude, but optimized for the foundational work of routing, retrieval, context compression, and sub-agent orchestration. The model comes in three variants: a base model, an "instruct" variant tuned for direct Q&A, and a "thinking" variant for multi-step or agentic tasks that produces explicit reasoning traces as output.

The code completion use case remains, but Mellum2 now targets a broader range of infrastructure tasks, which suits teams running self-hosted, production-grade systems for anything from workflow orchestration to private retrieval-augmented generation.

Key differences:

  • Parameter size: Mellum2 jumps from 4B to 12B.
  • Open source: Mellum2 is open source at launch; Mellum was proprietary for nearly a year.
  • Scope: Expanded to agentic and infrastructure tasks (retrieval, routing, sub-agent orchestration) instead of just code completion.

[Figure: diagram contrasting Mellum2’s agentic infrastructure stack with Mellum’s narrow code-compl…]

The result: Mellum2 represents a clear pivot to infrastructure-grade, on-premises AI—moving far outside the boundaries of its predecessor.

How does Mellum2 enable on-premises AI deployment, and why does it matter?

Unlike hosted tools such as Anthropic’s Claude Code, Mellum2 requires no third-party API calls for inference. The full model (all 12B parameters) is released as open source, and you can run it entirely within your own infrastructure or private cloud. This is more than a licensing or cost benefit: for many high-trust engineering teams it is a security and governance requirement.

Teams using off-the-shelf SaaS models for AI orchestration face the twin risks of vendor lock-in (API pricing, access limits) and data privacy (code, tickets, internal workflow data traveling to external provider infrastructure). Mellum2 solves this by letting enterprises deploy and fine-tune the model on their chosen hardware, segment tasks, and enforce organization-specific privacy standards or compliance frameworks. No upstream calls, no leaking telemetry.

For infrastructure and SRE teams building agentic AI pipelines, it also means full visibility and control over tokens used, context sizes, and internal orchestration logic—critical for debugging, auditing, and cost management. While Mellum2 doesn’t claim the “breadth” of industry frontier models, it’s tuned to be fast and reliable for the real-world, repeatable infrastructure roles teams own end-to-end.

Concrete implications:

  • Deployment: Air-gapped clusters, private VPCs, or even bare metal.
  • Data governance: Full auditability and Python-level control of inputs/outputs.
  • Customization: Retain the ability to patch, fine-tune, or fork the model as needs evolve.
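
To make the auditability point concrete, here is a minimal sketch of an audit wrapper around a generation call. It assumes nothing about Mellum2's actual API: `fake_model` is a hypothetical stand-in for whatever function calls your locally hosted model, and the record fields are illustrative choices, not a standard.

```python
import hashlib
import json
import time
from typing import Callable, List


def audited(generate: Callable[[str], str], log: List[dict]) -> Callable[[str], str]:
    """Wrap a text-generation callable so every call leaves an audit record."""

    def wrapper(prompt: str) -> str:
        started = time.time()
        output = generate(prompt)
        log.append({
            "ts": started,
            "latency_s": round(time.time() - started, 4),
            # Hashes let auditors match records without storing raw code or tickets.
            "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
            "output_sha256": hashlib.sha256(output.encode("utf-8")).hexdigest(),
            "prompt_chars": len(prompt),
            "output_chars": len(output),
        })
        return output

    return wrapper


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for a call to a self-hosted model; not a Mellum2 API.
    return prompt.upper()


if __name__ == "__main__":
    records: List[dict] = []
    model = audited(fake_model, records)
    model("summarize the deploy log")
    print(json.dumps(records[0], indent=2))
```

Because the wrapper only sees a callable, the same pattern works whether the model sits behind a REST service or an in-process library. Storing hashes rather than raw text is one possible design choice; teams that need replayable audits would store the full payloads instead.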

The difference? Mellum2 brings the model home, literally—removing third-party dependencies from core AI engineering infrastructure.

What infrastructure tasks can Mellum2 perform?

Mellum2 is built for the next wave of “agentic” AI—systems where orchestration, retrieval, and delegation are primary workloads, not afterthoughts. Instead of being a stateless completion engine, Mellum2 is constructed for persistent, high-volume workflows where one AI model may coordinate many sub-agents or perform elastic context compression.

JetBrains highlights these core tasks:

  1. Routing: Mellum2 can function as the central router, delegating subtasks to in-house or external agents depending on task type or context.
  2. Retrieval pipelines: The model is tuned for high-throughput retrieval tasks, compressing and expanding context so that each working window carries more relevant information per token.
  3. Sub-agent task handling: Supports running multiple concurrent agent-style workloads, each with its own parallel context. This is directly applicable to code intelligence, documentation, CI/CD coordination, or incident response bots.
  4. Context compression: The Mixture-of-Experts (MoE) design activates only 2.5B of the 12B parameters per token, which cuts per-token compute and makes long contexts cheaper to process on limited hardware.
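
The MoE numbers above are worth working through, because they affect compute and memory differently. The sketch below uses only the two figures from the article (12B total, 2.5B active). The 16-bit weight precision and the "about 2 FLOPs per active parameter per token" rule of thumb are assumptions for illustration, not published Mellum2 specifications.

```python
TOTAL_PARAMS = 12e9    # total parameters, from the article
ACTIVE_PARAMS = 2.5e9  # parameters active per token, from the article


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters used for any single token."""
    return active / total


def weight_memory_gb(params: float, bytes_per_param: int) -> float:
    """Memory for the weights alone (no KV cache, no activations)."""
    return params * bytes_per_param / 1e9


def forward_flops_per_token(active: float) -> float:
    """Rule of thumb: about 2 FLOPs per active parameter per token."""
    return 2 * active


print(f"active fraction: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")
# Assumption: 16-bit weights, i.e. 2 bytes per parameter.
print(f"weights at 16-bit: {weight_memory_gb(TOTAL_PARAMS, 2):.0f} GB")
ratio = forward_flops_per_token(TOTAL_PARAMS) / forward_flops_per_token(ACTIVE_PARAMS)
print(f"compute vs. a dense 12B model: {ratio:.1f}x less per token")
```

The takeaway is that sparsity reduces per-token compute (roughly a fifth of the parameters do work on each token), but every expert still has to be loaded, so weight memory is sized by the full 12B.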

JetBrains’ official post describes Mellum2 as “lean and fast,” but built to excel at the coordination layer—picking which agent runs what, extracting or restructuring information for long-running tasks, or acting as a “focal” processing node at the heart of a larger agentic AI mesh.

Concretely, this enables setups where a codebase change triggers Mellum2 to:

  • Route ticket summaries to specialized code agents.
  • Orchestrate multi-step deployment flows via sub-agents.
  • Compress and surface only the relevant logs or docs per context window.
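
The routing role described above can be sketched in a few lines. This is an illustration under stated assumptions: `keyword_classifier` is a hypothetical placeholder for a model call that returns a route label, and the agent names are invented for the example. Nothing here reflects an actual Mellum2 interface.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]
Classifier = Callable[[str], str]


def keyword_classifier(task: str) -> str:
    """Stand-in for a model call that returns a route label.

    A real pipeline would prompt the self-hosted model here; a keyword
    rule keeps the sketch runnable without any model.
    """
    text = task.lower()
    if "deploy" in text or "rollback" in text:
        return "deploy"
    if "log" in text or "incident" in text:
        return "logs"
    return "code"


def make_router(classify: Classifier, handlers: Dict[str, Handler], default: str) -> Handler:
    """Build a function that sends each task to the handler for its label."""

    def route(task: str) -> str:
        label = classify(task)
        # Unknown labels fall back to the default handler rather than failing.
        handler = handlers.get(label, handlers[default])
        return handler(task)

    return route


# Hypothetical sub-agents; each would normally call its own service.
handlers: Dict[str, Handler] = {
    "deploy": lambda t: f"[deploy-agent] {t}",
    "logs": lambda t: f"[log-agent] {t}",
    "code": lambda t: f"[code-agent] {t}",
}
router = make_router(keyword_classifier, handlers, default="code")

if __name__ == "__main__":
    print(router("Please deploy build 42"))
```

Keeping the classifier injectable means the keyword rule can be swapped for a model-backed one without touching the dispatch logic, and the fallback keeps a malformed label from stalling the pipeline.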

The model isn’t designed to compete with GPT-5 on general reasoning, but to dominate the specialized middle-layer of engineering AI.

How to use Mellum2 today in your development workflow

Getting started with Mellum2 is refreshingly direct. JetBrains has fully open-sourced the model, including base, instruct, and thinking variants—each packaged for standard inference engines.

1. Clone and configure

Mellum2’s repository is available on JetBrains’ public GitHub (URL is not listed in the source, so search jetbrains/mellum2 or follow links from the official JetBrains Mellum2 announcement). Download the weights and pick an inference engine your stack supports—most standard engine backends are compatible.

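# The repository URL is not given in the source; set MELLUM2_REPO_URL
# (a placeholder name) from the official announcement first.
git clone "$MELLUM2_REPO_URL" mellum2
cd mellum2
# Follow README for weights download, e.g.:
python download_weights.py --variant instruct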

2. Stand up the inference service

Assuming a vanilla torch/onnx backend, a minimal container deploy might look like this (compose example):

# docker-compose.yml
services:
  mellum2-api:
    image: jetbrains/mellum2-inference:latest
    environment:
      - MODEL_VARIANT=instruct
      - MAX_CONTEXT=32000
    ports:
      - "8080:8080"
    volumes:
      - ./weights:/models/

Or launch with CLI:

python serve.py --variant instruct --port 8080 --weights /models

3. Integrate with existing infrastructure

Out of the box, Mellum2 provides RESTful endpoints similar to industry standards. For common IDE integration or orchestration:

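# The endpoint URL is not given in the source; MELLUM2_ENDPOINT is a
# placeholder for the address of your own deployment.
curl -X POST "$MELLUM2_ENDPOINT" \
     -H "Content-Type: application/json" \
     -d '{
           "prompt": "Write a bash script to backup a database.",
           "max_tokens": 200
         }'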
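
For pipelines written in Python, the same request can be built with the standard library alone. This is a sketch under assumptions: the payload fields mirror the curl example, but the endpoint path is not given in the source, so the caller must supply the full URL, and the response is decoded as generic JSON without assuming its shape.

```python
import json
import urllib.request


def build_request(url: str, prompt: str, max_tokens: int = 200) -> urllib.request.Request:
    """Build a JSON POST request with the same fields as the curl example."""
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(url: str, prompt: str, max_tokens: int = 200, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON reply (its shape is not assumed)."""
    request = build_request(url, prompt, max_tokens)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending makes the payload easy to inspect or log before anything leaves the process. A call would look like `complete(endpoint_url, "Write a bash script to backup a database.")`, where `endpoint_url` is whatever address your deployment exposes.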

For multi-agent orchestration, connect Mellum2 as the main router or context compression node in your task pipeline—either as a native Python subprocess or REST microservice.

4. Early use cases

  • Basic code suggestion in JetBrains IDEs (plugin triggers Mellum2 API).
  • Routing task requests between specialized bots (deployment manager, doc summarizer).
  • Compressing large logs for incident response (context slimming + targeted retrieval).
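
The log-compression use case usually starts with a cheap pre-filter before any model sees the text. The sketch below is a plain heuristic, not model-based compression: it keeps high-severity lines, drops lines that differ only in digits, and stops at a character budget. The severity keywords and the budget are assumptions to adapt to your own log format.

```python
import re
from typing import Iterable, List, Tuple

_DIGITS = re.compile(r"\d+")


def slim_logs(lines: Iterable[str], max_chars: int,
              keep: Tuple[str, ...] = ("ERROR", "WARN", "FATAL")) -> List[str]:
    """Keep high-signal lines, drop near-duplicates, and stop at a size budget."""
    seen = set()
    kept: List[str] = []
    used = 0
    for raw in lines:
        line = raw.rstrip("\n")
        if not any(level in line for level in keep):
            continue
        # Mask digits so lines differing only in timestamps or ids dedupe.
        key = _DIGITS.sub("#", line)
        if key in seen:
            continue
        if used + len(line) > max_chars:
            break
        seen.add(key)
        kept.append(line)
        used += len(line)
    return kept


if __name__ == "__main__":
    sample = [
        "12:00:01 INFO started",
        "12:00:02 ERROR db timeout after 30s",
        "12:00:05 ERROR db timeout after 31s",
        "12:00:06 WARN disk at 91%",
    ]
    for line in slim_logs(sample, max_chars=200):
        print(line)
```

The surviving lines can then be passed to the model as context. Counting characters rather than tokens is a simplification; a real deployment would budget with the model's own tokenizer.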

With the full model running in-house, you’re free to tune it by organization-specific guidelines and connect to your preferred scheduling, tracing, or monitoring stack.

[Figure: example workflow diagram with Mellum2 routing between sub-agents and retrieval services]

How does Mellum2 compare to Anthropic’s Claude Code and other AI coding models?

Mellum2 stands out for what it is—and what it isn’t. Where Claude Code and the like require persistent API dependence and give zero local control or source access, Mellum2 is open, forkable, and runs on your infrastructure. The trade-offs are stark:

| Model | Parameter size | Open source | Deployment | Scope |
|---|---|---|---|---|
| Mellum (old) | 4B | Yes (late) | On-premises | Code completion |
| Mellum2 | 12B (2.5B active) | Yes | On-premises | Infra/agentic tasks, routing |
| Claude Code | Not listed | No | API-only | Coding, hosted only |

  • Parameter size: Mellum2 (12B) is 3× the size of original Mellum (4B), and uses MoE to keep inference fast (2.5B parameters active per token). Claude Code’s size is not listed, but it’s only available as API, so infra deployment is off the table.
  • Scope: Mellum2 targets routing, retrieval, and agentic infrastructure roles that you host and control yourself; Claude Code is an agentic coding tool, but it runs against Anthropic’s hosted models rather than inside your own stack.
  • Open source: Mellum2 is fully open from launch—immediate control, inspection, and modification; Claude Code is closed-source, usage-limited, and external.
  • Latency/deployment: By hosting Mellum2 on your own infra (VPC, air-gapped server, or cloud VM), teams can optimize for latency, privacy, and orchestration—options unavailable with SaaS-bound models.

In summary: if you need custom routing, agentic orchestration, or private, auditable infrastructure AI—with no API leash—Mellum2 is positioned to fill that gap.

What is the future outlook for open-source AI models like Mellum2 in infrastructure?

Open-source models specialized for infrastructure tasks are rapidly becoming essential for serious engineering teams. Mellum2 captures the new demand: agentic orchestration, workload routing, and context-aware retrieval—without trading control for convenience. As more organizations realize the security, auditability, and customization wins from self-hosted AI, expect to see the Mellum2 pattern (fast, focused, open, MoE-tuned) echoed by upcoming projects in agentic infrastructure and toolchain automation.

The trend is shifting: frontier models define the outer limits, but “focal” models like Mellum2 will become the durable backbone inside CI/CD, incident response, data pipelines, and internal developer platforms. JetBrains has set an early precedent by open-sourcing this infrastructure-grade model immediately—making it likely that community forks, internal tuning, and ecosystem plugins will balloon over the next 12-18 months.

Ultimately, Mellum2 exemplifies the future: AI models purpose-built not just to generate code, but to power and orchestrate complex, trust-sensitive engineering infrastructure—on your terms, at full speed, under your roof.

Engineering teams have spent the last two years participating in the “rent” economy of hosted AI. Mellum2 marks a pivot: owning and specializing your own agentic AI is finally practical. If you want one layer that won't change with every new API, OTF persists as the reliable plumbing beneath whatever focal model—the kind Mellum2 now proudly represents—you plug in on top. Try standing Mellum2 up in your workflow this sprint. The difference between API lock-in and real infrastructure-grade autonomy begins here.
