ml
The orchestration + evaluation layer above go/inference.
Pluggable backends (Apple Metal via go/mlx, managed
llama-server subprocesses, OpenAI-compatible HTTP), a concurrent scoring
engine that grades model outputs across heuristic / semantic / content /
standard-benchmark suites, 23 capability probes, GGUF model management,
and an SSH-based agent orchestrator that streams checkpoint evaluations
to InfluxDB + DuckDB.
Install
    go get dappco.re/go/ml@latest

Import

    import "dappco.re/go/ml"

Backends
Three pluggable backend shapes implement the shared Backend contract. Pick the one closest to where your model lives:

    // Apple Silicon, Metal
    adapter := ml.NewInferenceAdapter(metalModel, "gemma-4-e2b")

    // Managed llama-server subprocess
    llama := ml.NewLlamaBackend("./model.gguf", ml.WithContextLen(8192))

    // Any OpenAI-compatible HTTP endpoint (Ollama, vLLM, hosted APIs)
    http := ml.NewHTTPBackend("http://localhost:11434", "qwen3-8b")

Each backend exposes the same TextModel shape: NewLlamaTextModel
and NewHTTPTextModel wrap them in the go/inference TextModel
interface so they slot directly into the rest of the stack.
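To make the wrapping idea concrete, here is a minimal sketch of one contract with an adapter on top. The Backend and TextModel method sets, echoBackend, and NewTextModel are all hypothetical stand-ins invented for illustration; the real ml and go/inference interfaces may look different.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Backend is a hypothetical stand-in for the shared contract.
// The real ml.Backend method set may differ.
type Backend interface {
	Name() string
	Generate(ctx context.Context, prompt string) (string, error)
}

// TextModel is a hypothetical stand-in for the go/inference interface.
type TextModel interface {
	Complete(ctx context.Context, prompt string) (string, error)
}

// echoBackend is a toy backend that upper-cases the prompt.
type echoBackend struct{ name string }

func (b echoBackend) Name() string { return b.name }

func (b echoBackend) Generate(_ context.Context, prompt string) (string, error) {
	return strings.ToUpper(prompt), nil
}

// textModelAdapter wraps any Backend so it satisfies TextModel.
type textModelAdapter struct{ backend Backend }

func (a textModelAdapter) Complete(ctx context.Context, prompt string) (string, error) {
	return a.backend.Generate(ctx, prompt)
}

// NewTextModel plays the role the text gives to NewLlamaTextModel and
// NewHTTPTextModel (hypothetical name and signature).
func NewTextModel(b Backend) TextModel { return textModelAdapter{backend: b} }

// demo sends one prompt through the adapter.
func demo() (string, error) {
	model := NewTextModel(echoBackend{name: "echo"})
	return model.Complete(context.Background(), "hello")
}

func main() {
	out, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints "HELLO"
}
```

Because callers only see TextModel, swapping the toy backend for a subprocess or HTTP implementation would not change any downstream code.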
Capability probes
23 standardised probes measure what a model can actually do: tool use, structured output, multi-turn coherence, refusal calibration, code synthesis, and so on.

    result := ml.RunCapabilityProbes(ctx, backend)
    fmt.Printf("Score: %.2f (passed %d of %d probes)\n",
        result.Score, result.Passed, result.Total)

    // Full variant emits per-probe response + lets a callback observe each
    // step (useful for live UIs during long runs):
    result, responses := ml.RunCapabilityProbesFull(ctx, backend,
        func(p ml.Probe, r ml.CapResponseEntry) {
            log.Printf("[%s] %s — %s", p.ID, p.Title, r.Outcome)
        })

The companion RunContentProbes covers content-quality dimensions
(prose, summary, translation, structured rewrite) on the same shape.
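The run-with-callback pattern described above can be sketched without the package itself. Everything here (Probe, Result, Generate, RunProbes, and the two toy probes) is a hypothetical shape chosen for illustration, not the library's real types; the score is assumed to be the fraction of probes passed.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Probe is a hypothetical probe shape: a prompt plus a pass/fail check.
type Probe struct {
	ID     string
	Prompt string
	Check  func(response string) bool // true when the probe passes
}

// Result is a hypothetical summary; Score is Passed divided by Total.
type Result struct {
	Passed, Total int
	Score         float64
}

// Generate is a hypothetical model-call signature.
type Generate func(ctx context.Context, prompt string) (string, error)

// RunProbes runs each probe in order, calls observe after every step,
// and stops early if the context is cancelled.
func RunProbes(ctx context.Context, gen Generate, probes []Probe, observe func(p Probe, passed bool)) (Result, error) {
	res := Result{Total: len(probes)}
	for _, p := range probes {
		if err := ctx.Err(); err != nil {
			return res, err
		}
		out, err := gen(ctx, p.Prompt)
		passed := err == nil && p.Check(out)
		if passed {
			res.Passed++
		}
		if observe != nil {
			observe(p, passed)
		}
	}
	if res.Total > 0 {
		res.Score = float64(res.Passed) / float64(res.Total)
	}
	return res, nil
}

// demo runs two toy probes against a canned model that only handles JSON.
func demo() Result {
	gen := func(_ context.Context, prompt string) (string, error) {
		if strings.Contains(prompt, "JSON") {
			return `{"ok": true}`, nil
		}
		return "I cannot help with that.", nil
	}
	probes := []Probe{
		{ID: "structured-output", Prompt: "Reply in JSON", Check: func(r string) bool { return strings.HasPrefix(r, "{") }},
		{ID: "arithmetic", Prompt: "What is 2+2?", Check: func(r string) bool { return strings.Contains(r, "4") }},
	}
	res, _ := RunProbes(context.Background(), gen, probes, func(p Probe, passed bool) {
		fmt.Printf("[%s] passed=%t\n", p.ID, passed)
	})
	return res
}

func main() {
	res := demo()
	fmt.Printf("Score: %.2f (passed %d of %d probes)\n", res.Score, res.Passed, res.Total)
}
```

The callback receives each outcome as soon as it is known, which is what makes a live progress view possible during long runs.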
Scoring + persistence
Probe responses become checkpoint scores via the Judge, a separate model that grades each response against a rubric. Results stream to InfluxDB for time-series analysis and DuckDB for cross-checkpoint joins:

    judge := ml.NewJudge(ml.JudgeConfig{
        Backend: ml.NewHTTPBackend("...", "claude-opus-4-7"),
        Rubric:  "rubric/v1.yaml",
    })
    influx := ml.NewInfluxClient(...)

    ml.ScoreCapabilityAndPush(ctx, judge, influx, checkpoint, responses)
    ml.ScoreContentAndPush(ctx, judge, influx, checkpoint, runID, contentResponses)

The DuckDB tables checkpoint_scores and probe_results come from
go/store so any consumer with the Core can join scoring
data against arbitrary local data.
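As a rough illustration of concurrent judging, the sketch below grades responses with a bounded number of goroutines and averages the result. The Response, Score, and Judge types, the keyword-matching judge, and the ScoreAll and Mean helpers are hypothetical; a real judge would call a model with the rubric, and persistence is left out entirely.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Response and Score are hypothetical shapes for illustration.
type Response struct {
	ProbeID string
	Text    string
}

type Score struct {
	ProbeID string
	Value   float64 // 0.0 to 1.0
}

// Judge grades one response. A real judge would call a model with a rubric.
type Judge func(r Response) float64

// ScoreAll grades responses with at most `workers` concurrent judge calls
// and returns scores in the same order as the input.
func ScoreAll(judge Judge, responses []Response, workers int) []Score {
	if workers < 1 {
		workers = 1
	}
	scores := make([]Score, len(responses))
	sem := make(chan struct{}, workers) // bounds concurrency
	var wg sync.WaitGroup
	for i, r := range responses {
		wg.Add(1)
		sem <- struct{}{}
		go func(i int, r Response) {
			defer wg.Done()
			defer func() { <-sem }()
			// Each goroutine writes only its own slot, so no lock is needed.
			scores[i] = Score{ProbeID: r.ProbeID, Value: judge(r)}
		}(i, r)
	}
	wg.Wait()
	return scores
}

// Mean is a simple checkpoint-level aggregate over per-probe scores.
func Mean(scores []Score) float64 {
	if len(scores) == 0 {
		return 0
	}
	sum := 0.0
	for _, s := range scores {
		sum += s.Value
	}
	return sum / float64(len(scores))
}

// keywordJudge is a toy rubric: the fraction of keywords found in the text.
func keywordJudge(keywords []string) Judge {
	return func(r Response) float64 {
		hits := 0
		for _, k := range keywords {
			if strings.Contains(r.Text, k) {
				hits++
			}
		}
		return float64(hits) / float64(len(keywords))
	}
}

// demo scores two canned responses against a two-keyword rubric.
func demo() []Score {
	responses := []Response{
		{ProbeID: "summary", Text: "a short and accurate summary"},
		{ProbeID: "translation", Text: "an accurate rendering"},
	}
	return ScoreAll(keywordJudge([]string{"short", "accurate"}), responses, 2)
}

func main() {
	scores := demo()
	for _, s := range scores {
		fmt.Printf("%s %.2f\n", s.ProbeID, s.Value)
	}
	fmt.Printf("mean %.2f\n", Mean(scores))
}
```

Writing results by index keeps the output order stable regardless of which judge call finishes first, which matters if scores are later joined back to probe IDs.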
Agent orchestrator
Agent runs the eval loop end-to-end across a fleet of remote workers
over SSH: fetch a checkpoint, run probes locally on the worker, ship
responses back, score, persist, repeat.

    agent := ml.NewAgent(&ml.AgentConfig{
        Fleet:    []ml.WorkerSpec{ /* SSH targets + GPU specs */ },
        Backends: []ml.BackendSpec{ /* per-worker backend assignments */ },
        Cadence:  10 * time.Minute,
        OnReport: func(report ml.Report) { /* update dashboard */ },
    })

    ml.RunAgentLoop(agent.Config())

The orchestrator multiplexes the SSH transport so one local process can drive dozens of workers without per-host shell juggling.
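The fan-out-per-cadence loop can be sketched with plain goroutines. This is only an approximation under stated assumptions: Worker, Report, runCycle, and RunLoop are hypothetical names, the Evaluate function stands in for work that would really happen over SSH, and a maxCycles bound is added so the example terminates.

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"sync"
	"time"
)

// Worker is a hypothetical stand-in for a remote host. In a real
// orchestrator Evaluate would run probes over SSH.
type Worker struct {
	Host     string
	Evaluate func(ctx context.Context, checkpoint int) float64
}

// Report is a hypothetical per-cycle summary keyed by host.
type Report struct {
	Checkpoint int
	Scores     map[string]float64
}

// runCycle fans one checkpoint out to every worker and waits for all of them.
func runCycle(ctx context.Context, workers []Worker, checkpoint int) Report {
	rep := Report{Checkpoint: checkpoint, Scores: make(map[string]float64, len(workers))}
	var mu sync.Mutex
	var wg sync.WaitGroup
	for _, w := range workers {
		wg.Add(1)
		go func(w Worker) {
			defer wg.Done()
			s := w.Evaluate(ctx, checkpoint)
			mu.Lock() // the map is shared, so writes are serialised
			rep.Scores[w.Host] = s
			mu.Unlock()
		}(w)
	}
	wg.Wait()
	return rep
}

// RunLoop runs a cycle immediately, then once per cadence tick, until
// maxCycles is reached or ctx is cancelled.
func RunLoop(ctx context.Context, workers []Worker, cadence time.Duration, maxCycles int, onReport func(Report)) {
	ticker := time.NewTicker(cadence)
	defer ticker.Stop()
	for cycle := 1; cycle <= maxCycles; cycle++ {
		onReport(runCycle(ctx, workers, cycle))
		if cycle == maxCycles {
			return
		}
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

// demo drives two fake workers for three cycles at a 1ms cadence.
func demo() []Report {
	workers := []Worker{
		{Host: "gpu-a", Evaluate: func(_ context.Context, cp int) float64 { return float64(cp) }},
		{Host: "gpu-b", Evaluate: func(_ context.Context, cp int) float64 { return float64(cp) * 2 }},
	}
	var reports []Report
	RunLoop(context.Background(), workers, time.Millisecond, 3, func(r Report) {
		reports = append(reports, r)
	})
	return reports
}

func main() {
	for _, r := range demo() {
		hosts := make([]string, 0, len(r.Scores))
		for h := range r.Scores {
			hosts = append(hosts, h)
		}
		sort.Strings(hosts) // map order is random, so sort for stable output
		for _, h := range hosts {
			fmt.Printf("checkpoint %d %s %.0f\n", r.Checkpoint, h, r.Scores[h])
		}
	}
}
```

The report callback runs on the loop's own goroutine, so a dashboard update inside it needs no extra locking in this sketch.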
Adapters
InferenceAdapter is the bridge that turns a go/inference TextModel
into an ml.Backend. It is useful when you want to point the ml scoring
engine at any model registered through inference:

    ir := inference.LoadModel(path, inference.WithBackend("metal"))
    model := ir.Value.(inference.TextModel)
    backend := ml.NewInferenceAdapter(model, "gemma-4-e2b")

    result := ml.RunCapabilityProbes(ctx, backend)

Sibling packages
- go/inference: the local-backend contract ml adapts via NewInferenceAdapter
- go/mlx: Apple Silicon backend, the Metal default
- go/ai: facade above ml when consumers want chat ergonomics rather than scoring infrastructure
- go/store: DuckDB scoring tables ml.Score*AndPush writes to
Source
github.com/dappcore/go-ml: full source, the 23 probes, the scoring engine, the SSH agent orchestrator.