Apple Silicon ML, without Python.

Pure-Go bindings to Apple's MLX: 24+ model architectures, training and inference paths, and deployment as a single static binary. Linux CUDA is supported for selected workflows.

Pre-1.0. Source private today, available for review on request. Used in production by skiff for local inference.

What it does

Core MLX runtime — arrays, autograd, compile. Exposed as small Go packages (mlx, mlx/nn, mlx/compile).
Models and training — 24+ architectures across language, vision, multimodal. Full training, inference, and LoRA fine-tuning paths.
Quantization — AWQ, GPTQ, and DWQ. Run quantized 7B models on Apple Silicon at interactive speeds.
Production serving — HTTP server with graceful shutdown, bounded concurrency, health endpoints, and GPU trace capture.
cgo-free — Metal access through apple via purego. Single static binary. No C toolchain in the build.

mlx-go is the Go-native compute foundation for on-device inference on Apple Silicon. It is built on apple's Metal bindings, profiled with gputrace, and used by skiff for local inference. The product claim is simple: ship MLX-backed inference from Go without pulling a Python runtime into the deployment.

Source: private repo, available for review on request (travis@tmc.dev)

Docs: in progress

Contact: travis@tmc.dev