mlx
Native Apple Metal GPU inference and training via mlx-c CGO bindings.
Implements the go/inference Backend + TextModel contracts for Apple
Silicon (M1–M4). Supports Gemma 3, Gemma 4 (dense and MoE), Qwen 2/3,
and Llama 3, loaded from HuggingFace safetensors directories and GGUF
checkpoints. Features include fused Metal kernels for RMSNorm, RoPE, and
scaled dot-product attention; KV cache management; LoRA fine-tuning with
AdamW; GRPO; distillation; and batch inference. Platform-restricted:
darwin/arm64 only; a no-op stub compiles on every other platform so
consumers don’t need build tags.
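Because the stub compiles everywhere, a caller may still want to know at run time whether the real backend can be expected. The following is a minimal sketch of such a check using only the standard library; `metalSupported` is a hypothetical helper written for illustration and is not part of go-mlx.

```go
package main

import (
	"fmt"
	"runtime"
)

// metalSupported reports whether a GOOS/GOARCH pair matches the only
// platform this page lists for the real backend: darwin/arm64.
// Hypothetical helper for illustration; not a go-mlx function.
func metalSupported(goos, goarch string) bool {
	return goos == "darwin" && goarch == "arm64"
}

func main() {
	if metalSupported(runtime.GOOS, runtime.GOARCH) {
		fmt.Println("darwin/arm64: the Metal backend should be usable")
	} else {
		fmt.Println("other platform: only the no-op stub is compiled in")
	}
}
```

Taking the platform strings as parameters keeps the check testable on any machine.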
Install

    go get dappco.re/go/mlx@latest

Import

    import (
        "dappco.re/go/inference"
        _ "dappco.re/go/mlx" // blank import registers "metal" backend
    )

Direct API:

    import "dappco.re/go/mlx"

Two API shapes
Through the inference contract (recommended)

The default path is through go/inference: blank-import
go-mlx, then call inference.LoadModel. The backend registers itself as
"metal", and auto-selection picks it on Apple Silicon:

    r := inference.LoadModel("/path/to/gemma-4-e2b/")
    if !r.OK {
        return r
    }
    model := r.Value.(inference.TextModel)

    for tok := range model.Generate(ctx, "Hello") {
        fmt.Print(tok.Text)
    }

Direct mlx.LoadModel (for advanced cases)
Skip the contract layer when you need Metal-specific behaviour the
generic TextModel doesn’t expose (KV-cache snapshot/fork, GRPO state,
explicit memory plan):

    model, err := mlx.LoadModel("/path/to/safetensors/model/")
    if err != nil {
        return err
    }
    defer model.Close()

    // Stream tokens
    for tok := range model.GenerateStream(ctx, "prompt") {
        fmt.Print(tok.Text)
    }

    // One-shot
    out, _ := model.Generate(ctx, "prompt", mlx.WithMaxTokens(256))

LoadModelFromMedium accepts a go/io Medium so models can
live on remote storage backends without copying to local disk first.
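The blank-import registration used by the contract path follows a common Go pattern: a backend package registers itself in an `init` function, and the loader picks the first backend that reports itself available. The sketch below illustrates that pattern in a single file. Every name in it (`Backend`, `Register`, `autoSelect`, `staticBackend`) is a hypothetical stand-in; the real go/inference registry may be shaped differently.

```go
package main

import (
	"fmt"
	"sort"
)

// Backend is a hypothetical stand-in for a backend contract.
type Backend interface {
	Name() string
	Available() bool
}

// registry maps backend names to implementations.
var registry = map[string]Backend{}

// Register adds a backend under its own name. In a real multi-package
// layout this would be called from the backend package's init function,
// which is what makes a blank import sufficient.
func Register(b Backend) { registry[b.Name()] = b }

// autoSelect returns the first available backend in name order.
// Name order is an arbitrary, deterministic choice made for this sketch.
func autoSelect() (Backend, bool) {
	names := make([]string, 0, len(registry))
	for n := range registry {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		if b := registry[n]; b.Available() {
			return b, true
		}
	}
	return nil, false
}

// staticBackend is a toy backend whose availability is fixed at creation.
type staticBackend struct {
	name      string
	available bool
}

func (s staticBackend) Name() string    { return s.name }
func (s staticBackend) Available() bool { return s.available }

// init plays the role of the blank-imported package's init function.
func init() { Register(staticBackend{name: "metal", available: true}) }

func main() {
	if b, ok := autoSelect(); ok {
		fmt.Println("selected backend:", b.Name())
	} else {
		fmt.Println("no backend available")
	}
}
```

Keeping registration in `init` means consumers never reference the backend package by name, which is why the blank import is enough.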
Sub-packages: pick what you need

go-mlx is intentionally split across many sub-packages so callers depend only on what they use. CGO compile time and binary size scale linearly with what’s actually imported:
| Sub-package | What it covers |
|---|---|
| mlx (root) | LoadModel, Generate, GenerateStream: the inference façade |
| mlx/adapter | inference.Backend implementation (NewMLXBackend) |
| mlx/compute | Frame-compute API for non-LLM Metal workloads: Session, PixelBuffer, RGB565→RGBA8, nearest scale, CRT filter, scanline, soften, sharpen |
| mlx/lora | LoRA fine-tuning with AdamW: NewLoRA(model, cfg) |
| mlx/agent | Agent memory + KV-snapshot index for stateful runs |
| mlx/gguf | GGUF checkpoint loader + quantiser (gguf_quantize, gguf_info) |
| mlx/safetensors | Safetensors I/O primitives |
| mlx/pack | ModelPack types: bundle weights + config + tokenizer |
| mlx/merge | Model merging: average, slerp, ties |
| mlx/kv | KV-cache snapshot, fork, restore |
| mlx/memory | Static memory-plan builder for predictable VRAM usage |
| mlx/eval | Eval Runner that slots into the go/ml scoring engine |
| mlx/bundle | State-bundle codec for exportable snapshots |
| mlx/probe | Capability probe runner |
| mlx/chat | Chat template formatters per model family |
| mlx/mlxlm | Python subprocess backend, a CGO-free fallback |
| mlx/openai | OpenAI-compatible HTTP shim over a local Model |
The split landed during the recent compute-first refactor. See the project memory and commit history for the phased lift order (gguf, lora, pack, merge, kv, …) that produced this layout.
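To make the mlx/compute row more concrete, here is a CPU reference sketch of two of the operations it lists: RGB565→RGBA8 expansion and nearest-neighbour scaling. This is not the library's Metal kernel. The function names are hypothetical, and expanding channels by bit replication with a fully opaque alpha is an assumption about how such a conversion is typically done.

```go
package main

import "fmt"

// rgb565ToRGBA8 expands one 16-bit RGB565 pixel into 8-bit R, G, B, A.
// Channels are widened by replicating the high bits into the low bits,
// so full-scale 5-bit and 6-bit values map to 255. Alpha is set to 0xFF.
func rgb565ToRGBA8(p uint16) [4]uint8 {
	r5 := uint8((p >> 11) & 0x1F)
	g6 := uint8((p >> 5) & 0x3F)
	b5 := uint8(p & 0x1F)
	return [4]uint8{
		(r5 << 3) | (r5 >> 2),
		(g6 << 2) | (g6 >> 4),
		(b5 << 3) | (b5 >> 2),
		0xFF,
	}
}

// nearestScale enlarges a w*h row-major image by an integer factor,
// copying each source pixel into a factor*factor block.
func nearestScale(src []uint16, w, h, factor int) []uint16 {
	dw, dh := w*factor, h*factor
	dst := make([]uint16, dw*dh)
	for y := 0; y < dh; y++ {
		for x := 0; x < dw; x++ {
			dst[y*dw+x] = src[(y/factor)*w+x/factor]
		}
	}
	return dst
}

func main() {
	fmt.Println(rgb565ToRGBA8(0xF800)) // pure red in RGB565
	fmt.Println(nearestScale([]uint16{1, 2}, 2, 1, 2))
}
```

A reference like this can serve as a correctness oracle when checking GPU output pixel by pixel.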
Training surfaces
Three training shapes ship today, all on the same Metal kernels:

    // Supervised fine-tuning
    sft := mlx.NewSFT(model, mlx.SFTConfig{
        Dataset:   "train.jsonl",
        LR:        2e-5,
        BatchSize: 4,
        Epochs:    3,
    })
    result, _ := sft.Run(ctx)

    // LoRA fine-tuning (low-rank adapter, no full-weight write)
    lora := mlx.NewLoRA(model, &mlx.LoRAConfig{
        Rank:    16,
        Alpha:   32,
        Targets: []string{"q_proj", "k_proj", "v_proj", "o_proj"},
    })
    lora.Train(ctx, dataset)

    // GRPO: preference / reward training
    grpo := mlx.NewGRPO(model, mlx.GRPOConfig{ /* ... */ })
    grpoResult, _ := grpo.Run(ctx)
    mlx.NewGRPOCheckpointMetadata(path, cfg, grpoResult, update)

    // Distillation
    distill := mlx.NewDistill(model, teacher, mlx.DistillConfig{ /* ... */ })
    distillResult, _ := distill.Run(ctx)

Every training run writes checkpoint metadata plus the canonical adapter
files; LoadGRPOCheckpointMetadata / LoadDistillCheckpointMetadata /
LoadSFTCheckpointMetadata re-hydrate them so subsequent loads pick up
exactly where the run left off.
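For readers unfamiliar with what Rank and Alpha control in the LoRA configuration, the sketch below shows the conventional low-rank update, W' = W + (alpha / rank) · B·A, on tiny matrices. It assumes the usual alpha/rank scaling from the LoRA literature; this page does not say whether go-mlx scales the same way, and the helper names here are hypothetical.

```go
package main

import "fmt"

// loraScale is the conventional LoRA scaling factor: alpha / rank.
// Whether go-mlx applies exactly this factor is an assumption.
func loraScale(alpha, rank float64) float64 { return alpha / rank }

// matMul multiplies an (n x k) matrix by a (k x m) matrix.
func matMul(a, b [][]float64) [][]float64 {
	n, k, m := len(a), len(b), len(b[0])
	out := make([][]float64, n)
	for i := 0; i < n; i++ {
		out[i] = make([]float64, m)
		for j := 0; j < m; j++ {
			for t := 0; t < k; t++ {
				out[i][j] += a[i][t] * b[t][j]
			}
		}
	}
	return out
}

// applyLoRA returns W + (alpha/rank) * (B x A) without modifying W.
// B is (out x rank) and A is (rank x in), so rank is the row count of A.
func applyLoRA(w, b, a [][]float64, alpha float64) [][]float64 {
	scale := loraScale(alpha, float64(len(a)))
	delta := matMul(b, a)
	out := make([][]float64, len(w))
	for i := range w {
		out[i] = make([]float64, len(w[i]))
		for j := range w[i] {
			out[i][j] = w[i][j] + scale*delta[i][j]
		}
	}
	return out
}

func main() {
	// Scaling factor for the configuration shown above (Rank 16, Alpha 32).
	fmt.Println("scale:", loraScale(32, 16))

	w := [][]float64{{1, 0}, {0, 1}} // frozen base weight (2 x 2)
	b := [][]float64{{1}, {2}}       // adapter B (2 x 1), rank 1
	a := [][]float64{{3, 4}}         // adapter A (1 x 2), rank 1
	fmt.Println("merged:", applyLoRA(w, b, a, 2))
}
```

Only B and A are trained, which is why the adapter can be saved without writing the full weights.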
Frame compute: beyond LLMs

mlx/compute exposes Metal kernels for image and emulator workloads
that aren’t language-model-shaped. Useful when you’ve already got the
Metal context open for inference and want to amortise its cost across
the rest of the pipeline:

    sess, _ := mlx.NewSession()
    defer sess.Close()

    frame := sess.BeginFrame()
    frame.Apply(mlx.KernelRGB565ToRGBA8{})
    frame.Apply(mlx.KernelNearestScale{Factor: 2})
    frame.Apply(mlx.KernelCRTFilter{Curve: 0.15})
    out := sess.FinishFrame(frame)

CGO-free fallback: mlxlm
mlx/mlxlm shells out to a Python mlx-lm subprocess instead of
linking mlx-c. Slower, but compiles without the CGO toolchain, which is
useful for dev environments where setting up Xcode is friction:

    import "dappco.re/go/mlx/mlxlm"

    backend := mlxlm.New("/path/to/python", "/path/to/model")

Sibling packages
- go/inference: the Backend contract go-mlx implements
- go/ml: scoring + agent orchestrator that drives mlx for eval
- go/ai: facade that talks to mlx through the inference contract
- go/store: DuckDB workspace for training metrics + checkpoint scores
Source
github.com/dappcore/go-mlx: full source, the mlx-c CGO bindings, every training surface, and the compute API.