mlx

Native Apple Metal GPU inference and training via mlx-c CGO bindings. Implements the go/inference Backend and TextModel contracts for Apple Silicon (M1–M4). Supports Gemma 3, Gemma 4 (dense and MoE), Qwen 2/3, and Llama 3, loaded from HuggingFace safetensors directories or GGUF checkpoints. Features include fused Metal kernels for RMSNorm, RoPE, and scaled dot-product attention; KV cache management; LoRA fine-tuning with AdamW; GRPO; distillation; and batch inference. Platform-restricted: the real backend builds on darwin/arm64 only, and a no-op stub compiles on every other platform so consumers don’t need build tags.
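Because the stub compiles everywhere, a caller may still want to know at runtime whether the real backend could be present. The following is a minimal sketch using only the standard library. `metalCapable` is a hypothetical helper, not part of go-mlx, which may expose its own availability check.

```go
package main

import (
	"fmt"
	"runtime"
)

// metalCapable is a hypothetical helper: it reports whether a GOOS/GOARCH
// pair matches the only platform the real backend is described as
// supporting (darwin/arm64).
func metalCapable(goos, goarch string) bool {
	return goos == "darwin" && goarch == "arm64"
}

func main() {
	if metalCapable(runtime.GOOS, runtime.GOARCH) {
		fmt.Println("darwin/arm64: the Metal backend can be used")
	} else {
		fmt.Printf("%s/%s: only the no-op stub is compiled in\n", runtime.GOOS, runtime.GOARCH)
	}
}
```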

go get dappco.re/go/mlx@latest

Through the inference contract:

import (
	"dappco.re/go/inference"
	_ "dappco.re/go/mlx" // blank import registers the "metal" backend
)

Direct API:

import "dappco.re/go/mlx"
Through the inference contract (recommended)

The default path is through go/inference: blank-import go-mlx, then call inference.LoadModel. The backend registers itself as "metal" and the auto-select picks it on Apple Silicon:

r := inference.LoadModel("/path/to/gemma-4-e2b/")
if !r.OK {
	return r
}
model := r.Value.(inference.TextModel)
for tok := range model.Generate(ctx, "Hello") {
	fmt.Print(tok.Text)
}
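The self-registration described above follows the same pattern as database/sql drivers: a package's init function runs as a side effect of the blank import and adds the backend to a shared registry. The sketch below illustrates that pattern with standard-library code only. `Backend`, `Register`, `autoSelect`, and `fakeBackend` are hypothetical stand-ins, and the selection rule (first available backend in name order) is an assumption, not the documented go/inference behaviour.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Backend is a hypothetical stand-in for the inference.Backend contract.
type Backend interface {
	Name() string
	Available() bool
}

// registry maps backend names to implementations.
var registry = map[string]Backend{}

// Register is what a backend package would call from its init function.
func Register(b Backend) { registry[b.Name()] = b }

// autoSelect returns the first available backend in name order
// (an assumed rule, chosen only to make the sketch deterministic).
func autoSelect() (Backend, error) {
	names := make([]string, 0, len(registry))
	for n := range registry {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		if b := registry[n]; b.Available() {
			return b, nil
		}
	}
	return nil, errors.New("no available backend registered")
}

// fakeBackend lets the sketch run without any GPU.
type fakeBackend struct {
	name string
	ok   bool
}

func (f fakeBackend) Name() string    { return f.name }
func (f fakeBackend) Available() bool { return f.ok }

func init() {
	// In a real backend package, this runs because of the blank import.
	Register(fakeBackend{name: "metal", ok: true})
	Register(fakeBackend{name: "cpu", ok: false})
}

func main() {
	b, err := autoSelect()
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("selected backend:", b.Name())
}
```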

Direct mlx.LoadModel — for advanced cases

Skip the contract layer when you need Metal-specific behaviour the generic TextModel doesn’t expose (KV-cache snapshot/fork, GRPO state, explicit memory plan):

model, err := mlx.LoadModel("/path/to/safetensors/model/")
if err != nil {
	return err
}
defer model.Close()

// Stream tokens
for tok := range model.GenerateStream(ctx, "prompt") {
	fmt.Print(tok.Text)
}

// One-shot
out, _ := model.Generate(ctx, "prompt", mlx.WithMaxTokens(256))
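Streaming loops like the one above usually need a way to stop early. The sketch below shows one common shape for that: a channel-backed stream that stops as soon as the context is cancelled. It is illustrative only. `Token`, `generateStream`, and `collect` are hypothetical, and the source does not say whether GenerateStream returns a channel or an iterator.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Token is a hypothetical stand-in for the streamed token type.
type Token struct{ Text string }

// generateStream emits one token per whitespace-separated word and stops
// early when ctx is cancelled, so the producer goroutine never leaks.
func generateStream(ctx context.Context, prompt string) <-chan Token {
	out := make(chan Token)
	go func() {
		defer close(out)
		for _, w := range strings.Fields(prompt) {
			select {
			case <-ctx.Done():
				return
			case out <- Token{Text: w + " "}:
			}
		}
	}()
	return out
}

// collect drains the stream and cancels it after limit tokens.
func collect(prompt string, limit int) string {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	var sb strings.Builder
	n := 0
	for tok := range generateStream(ctx, prompt) {
		sb.WriteString(tok.Text)
		n++
		if n == limit {
			cancel() // unblocks the producer, which then exits
			break
		}
	}
	return sb.String()
}

func main() {
	fmt.Println(collect("the quick brown fox", 2))
}
```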

LoadModelFromMedium accepts a go/io Medium so models can live on remote storage backends without copying to local disk first.
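The go/io Medium interface is not shown here, so the sketch below uses the standard library's fs.FS as an analogous abstraction: the loader reads through an interface and does not care whether the bytes come from local disk, memory, or a remote store. `modelConfig`, `readConfig`, and `demoFS` are hypothetical names. `model_type` is the field HuggingFace config.json files use to identify the architecture.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io/fs"
	"testing/fstest"
)

// modelConfig holds the single field this sketch reads from config.json.
type modelConfig struct {
	ModelType string `json:"model_type"`
}

// readConfig loads config.json from any fs.FS implementation.
// fs.FS stands in for Medium here; the real interface may differ.
func readConfig(fsys fs.FS) (modelConfig, error) {
	var cfg modelConfig
	data, err := fs.ReadFile(fsys, "config.json")
	if err != nil {
		return cfg, err
	}
	err = json.Unmarshal(data, &cfg)
	return cfg, err
}

// demoFS returns an in-memory file system that plays the remote store.
func demoFS() fs.FS {
	return fstest.MapFS{
		"config.json": &fstest.MapFile{Data: []byte(`{"model_type":"gemma3"}`)},
	}
}

func main() {
	cfg, err := readConfig(demoFS())
	if err != nil {
		fmt.Println("load failed:", err)
		return
	}
	fmt.Println("model type:", cfg.ModelType)
}
```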

go-mlx is intentionally split across many sub-packages so callers depend only on what they use. CGO compile time and binary size scale linearly with what’s actually imported:

| Sub-package | What it covers |
| --- | --- |
| mlx (root) | LoadModel, Generate, GenerateStream: the inference façade |
| mlx/adapter | inference.Backend implementation (NewMLXBackend) |
| mlx/compute | Frame-compute API for non-LLM Metal workloads: Session, PixelBuffer, RGB565→RGBA8, nearest scale, CRT filter, scanline, soften, sharpen |
| mlx/lora | LoRA fine-tuning with AdamW: NewLoRA(model, cfg) |
| mlx/agent | Agent memory + KV-snapshot index for stateful runs |
| mlx/gguf | GGUF checkpoint loader + quantiser (gguf_quantize, gguf_info) |
| mlx/safetensors | Safetensors I/O primitives |
| mlx/pack | ModelPack types: bundle weights + config + tokenizer |
| mlx/merge | Model merging: average, slerp, ties |
| mlx/kv | KV-cache snapshot, fork, restore |
| mlx/memory | Static memory-plan builder for predictable VRAM usage |
| mlx/eval | Eval Runner; slots into the go/ml scoring engine |
| mlx/bundle | State-bundle codec for exportable snapshots |
| mlx/probe | Capability probe runner |
| mlx/chat | Chat template formatters per model family |
| mlx/mlxlm | Python subprocess backend; CGO-free fallback |
| mlx/openai | OpenAI-compatible HTTP shim over a local Model |
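Of the merge strategies listed for mlx/merge, slerp is the least obvious. As a rough illustration, here is the textbook spherical linear interpolation formula applied to two flat weight vectors of equal length. It is a sketch of the general technique, not the mlx/merge implementation, and `slerp` is a hypothetical function name.

```go
package main

import (
	"fmt"
	"math"
)

// slerp interpolates between two equal-length vectors along the arc that
// joins them. When the vectors are nearly parallel (or one is zero) it
// falls back to plain linear interpolation.
func slerp(a, b []float64, t float64) []float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	cosTheta := dot / (math.Sqrt(na) * math.Sqrt(nb))
	cosTheta = math.Max(-1, math.Min(1, cosTheta)) // guard against rounding
	omega := math.Acos(cosTheta)

	wa, wb := 1-t, t // linear fallback weights
	if s := math.Sin(omega); s > 1e-6 {
		wa = math.Sin((1-t)*omega) / s
		wb = math.Sin(t*omega) / s
	}
	out := make([]float64, len(a))
	for i := range a {
		out[i] = wa*a[i] + wb*b[i]
	}
	return out
}

func main() {
	// Halfway between two orthogonal unit vectors.
	fmt.Println(slerp([]float64{1, 0}, []float64{0, 1}, 0.5))
}
```

Unlike a plain average, the halfway point between two orthogonal unit vectors keeps unit length, which is the usual motivation for slerp when merging weights.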

The split landed during the recent compute-first refactor; see the project memory and commit history for the phased lift order (gguf, lora, pack, merge, kv, …) that produced this layout.

Four training shapes ship today (supervised fine-tuning, LoRA, GRPO, and distillation), all on the same Metal kernels:

// Supervised fine-tuning
sft := mlx.NewSFT(model, mlx.SFTConfig{
	Dataset:   "train.jsonl",
	LR:        2e-5,
	BatchSize: 4,
	Epochs:    3,
})
result, _ := sft.Run(ctx)

// LoRA fine-tuning (low-rank adapter, no full-weight write)
lora := mlx.NewLoRA(model, &mlx.LoRAConfig{
	Rank:    16,
	Alpha:   32,
	Targets: []string{"q_proj", "k_proj", "v_proj", "o_proj"},
})
lora.Train(ctx, dataset)

// GRPO: preference / reward training
grpo := mlx.NewGRPO(model, mlx.GRPOConfig{ /* ... */ })
grpoResult, _ := grpo.Run(ctx)
mlx.NewGRPOCheckpointMetadata(path, cfg, grpoResult, update)

// Distillation
distill := mlx.NewDistill(model, teacher, mlx.DistillConfig{ /* ... */ })
distillResult, _ := distill.Run(ctx)
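To make the Rank and Alpha fields in the LoRA config concrete, the sketch below computes a forward pass under the standard LoRA formulation, y = W·x + (alpha/rank)·B·(A·x), where W stays frozen and only A and B are trained. Whether go-mlx applies exactly this scaling is an assumption. `matVec` and `loraForward` are hypothetical helpers running on the CPU with tiny matrices.

```go
package main

import "fmt"

// matVec multiplies an m×n matrix by a length-n vector.
func matVec(m [][]float64, v []float64) []float64 {
	out := make([]float64, len(m))
	for i, row := range m {
		for j, w := range row {
			out[i] += w * v[j]
		}
	}
	return out
}

// loraForward computes W·x + (alpha/rank)·B·(A·x).
// A is rank×in, B is out×rank; the rank is read from A's row count.
func loraForward(w, a, b [][]float64, alpha float64, x []float64) []float64 {
	rank := float64(len(a))
	y := matVec(w, x)
	delta := matVec(b, matVec(a, x))
	for i := range y {
		y[i] += alpha / rank * delta[i]
	}
	return y
}

func main() {
	w := [][]float64{{1, 0}, {0, 1}} // frozen 2×2 base weight
	a := [][]float64{{1, 1}}         // rank-1 down-projection (1×2)
	b := [][]float64{{0.5}, {0}}     // up-projection (2×1)
	fmt.Println(loraForward(w, a, b, 2, []float64{1, 2})) // prints [4 2]
}
```

With the values from the config above (Rank 16, Alpha 32), the same formula gives a scale factor of 2 on the adapter's contribution.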

Every training run writes checkpoint metadata and the canonical adapter files; LoadGRPOCheckpointMetadata, LoadDistillCheckpointMetadata, and LoadSFTCheckpointMetadata rehydrate them so a later load picks up exactly where the run left off.
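The on-disk metadata format is not described here. As a rough picture of the write-then-rehydrate cycle, the sketch below round-trips a small JSON record through a temporary directory. The struct, its fields, and the file name metadata.json are all hypothetical and chosen only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// checkpointMeta is a hypothetical metadata record; the real schema may
// contain different fields.
type checkpointMeta struct {
	Kind string  `json:"kind"`
	Step int     `json:"step"`
	LR   float64 `json:"lr"`
}

// saveMeta writes the record as indented JSON.
func saveMeta(path string, m checkpointMeta) error {
	data, err := json.MarshalIndent(m, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o644)
}

// loadMeta reads the record back from disk.
func loadMeta(path string) (checkpointMeta, error) {
	var m checkpointMeta
	data, err := os.ReadFile(path)
	if err != nil {
		return m, err
	}
	err = json.Unmarshal(data, &m)
	return m, err
}

// roundTrip saves m into a temporary directory and loads it again.
func roundTrip(m checkpointMeta) (checkpointMeta, error) {
	dir, err := os.MkdirTemp("", "ckpt")
	if err != nil {
		return checkpointMeta{}, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "metadata.json")
	if err := saveMeta(path, m); err != nil {
		return checkpointMeta{}, err
	}
	return loadMeta(path)
}

func main() {
	got, err := roundTrip(checkpointMeta{Kind: "sft", Step: 300, LR: 2e-5})
	fmt.Println(got, err)
}
```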

mlx/compute exposes Metal kernels for image and emulator workloads that aren’t language-model-shaped. Useful when you’ve already got the Metal context open for inference and want to amortise its cost across the rest of the pipeline:

sess, _ := mlx.NewSession()
defer sess.Close()
frame := sess.BeginFrame()
frame.Apply(mlx.KernelRGB565ToRGBA8{})
frame.Apply(mlx.KernelNearestScale{Factor: 2})
frame.Apply(mlx.KernelCRTFilter{Curve: 0.15})
out := sess.FinishFrame(frame)
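For reference, the first two kernels in that pipeline perform simple per-pixel operations. The sketch below is a plain CPU version of what an RGB565→RGBA8 expansion and a nearest-neighbour upscale compute, which can be handy as a test oracle. The function names are hypothetical, and the bit-replication rounding is one common choice; the Metal kernels may round differently.

```go
package main

import "fmt"

// rgb565ToRGBA8 expands one 16-bit RGB565 pixel to 8-bit RGBA. The high
// bits of each channel are replicated into the low bits so that a
// full-scale 5- or 6-bit value maps to 255.
func rgb565ToRGBA8(p uint16) [4]uint8 {
	r5 := uint8((p >> 11) & 0x1F)
	g6 := uint8((p >> 5) & 0x3F)
	b5 := uint8(p & 0x1F)
	return [4]uint8{
		r5<<3 | r5>>2,
		g6<<2 | g6>>4,
		b5<<3 | b5>>2,
		0xFF, // opaque alpha
	}
}

// nearestScale upsamples a w×h image by an integer factor: every output
// pixel copies the source pixel it falls inside.
func nearestScale(src []uint16, w, h, factor int) []uint16 {
	dw := w * factor
	dst := make([]uint16, dw*h*factor)
	for y := 0; y < h*factor; y++ {
		for x := 0; x < dw; x++ {
			dst[y*dw+x] = src[(y/factor)*w+x/factor]
		}
	}
	return dst
}

func main() {
	fmt.Println(rgb565ToRGBA8(0xF800))                  // pure red
	fmt.Println(nearestScale([]uint16{1, 2}, 2, 1, 2)) // 2×1 image scaled to 4×2
}
```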

mlx/mlxlm shells out to a Python mlx-lm subprocess instead of linking mlx-c. Slower, but compiles without the CGO toolchain — useful for dev environments where setting up Xcode is friction:

import "dappco.re/go/mlx/mlxlm"
backend := mlxlm.New("/path/to/python", "/path/to/model")
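How mlxlm talks to its subprocess is not documented here. As a sketch of the general idea, the code below builds (without running) a command line for the `mlx_lm.generate` module that ships with the Python mlx-lm package. `generateCmd` is a hypothetical helper, and the `--model`, `--prompt`, and `--max-tokens` flags should be checked against the installed mlx-lm version.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
)

// generateCmd builds, but does not start, an mlx-lm generation command.
// The flag names are assumptions to verify against your mlx-lm install.
func generateCmd(python, model, prompt string, maxTokens int) *exec.Cmd {
	return exec.Command(python, "-m", "mlx_lm.generate",
		"--model", model,
		"--prompt", prompt,
		"--max-tokens", strconv.Itoa(maxTokens))
}

func main() {
	cmd := generateCmd("/path/to/python", "/path/to/model", "Hello", 64)
	fmt.Println(cmd.Args)
	// A caller would then run cmd.Output() and treat stdout as the
	// completion; that step needs a working Python environment.
}
```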

Related packages:

  • go/inference: the Backend contract go-mlx implements
  • go/ml: scoring + agent orchestrator that drives mlx for eval
  • go/ai: facade that talks to mlx through the inference contract
  • go/store: DuckDB workspace for training metrics + checkpoint scores

github.com/dappcore/go-mlx — full source, the mlx-c CGO bindings, every training surface, and the compute API.