
OpenAI Model Craft Challenge: Parameter Golf

A frontier model-engineering challenge: train the strongest language model that fits in a 16 MB artifact and trains in under 10 minutes on 8×H100s. Submissions are evaluated by compression performance on the FineWeb validation set, measured tokenizer-agnostically in bits per byte.
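Bits per byte follows directly from a model's summed negative log-likelihood over raw text: normalizing by bytes rather than tokens is what makes the metric tokenizer-agnostic. A minimal sketch (the function name and interface are my own, not the official harness):

```python
import math

def bits_per_byte(total_nll_nats: float, num_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) over a text
    span into bits per byte of the underlying UTF-8 text.

    Dividing by raw byte count, not token count, makes the score
    comparable across models with different tokenizers.
    """
    return total_nll_nats / (num_bytes * math.log(2))
```

For instance, a model assigning ln 2 total nats to a 1-byte string scores exactly 1.0 bits per byte.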

Overview

Parameter Golf is inspired by NanoGPT Speedrunning, but shifts the optimization target to an explicitly parameter-constrained regime. The objective is to discover architectures and training methods that maximize capability under strict size and runtime limits.

You can think of this as an L(N)-style optimization problem: achieve the lowest possible loss at a fixed parameter budget, with no restrictions on architectural creativity.
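The L(N) framing can be made concrete with a Chinchilla-style parametric fit. The coefficients below are the published Chinchilla estimates, used purely for illustration; they are not numbers from this challenge:

```python
def scaling_law_loss(n_params: float, E: float = 1.69,
                     A: float = 406.4, alpha: float = 0.34) -> float:
    """Chinchilla-style data-unconstrained loss fit: L(N) = E + A / N^alpha.

    E is the irreducible loss; A and alpha govern how quickly loss
    falls as the parameter count N grows.
    """
    return E + A / n_params ** alpha
```

With N pinned by the artifact budget, improving L(N) means finding methods that effectively lower A and raise alpha rather than scaling N.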

What Makes It Interesting

Research Surface Area

  • Test-time compute and depth recurrence
  • Aggressive parameter tying and low-rank training
  • Quantization, QAT, bit-level model formats, tokenizer innovation
  • Long-context evaluation and system-level kernel optimizations
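As one concrete instance of the low-rank direction above: factorizing a dense d_in × d_out weight into two rank-r matrices cuts its parameter count from d_in·d_out to r·(d_in + d_out). A sketch with illustrative names:

```python
def low_rank_param_count(d_in: int, d_out: int, rank: int):
    """Parameter counts for a dense layer W versus its rank-r
    factorization W ≈ U @ V, with U of shape (d_in, rank) and
    V of shape (rank, d_out)."""
    dense = d_in * d_out
    factored = d_in * rank + rank * d_out
    return dense, factored
```

Hypothetically, a 768×768 projection factored at rank 64 drops from 589,824 to 98,304 parameters, a 6× reduction.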

Two Tracks of Exploration

  • Record submissions: must satisfy the official 10-minute / 8×H100 bound.
  • Non-record submissions: unlimited-compute explorations are still welcome as a source of ideas and breakthroughs.

Core Constraints & Rules

  • Total artifact budget: 16,000,000 bytes (decimal MB), including code + compressed model.
  • Evaluation and training must be self-contained; no external downloads or network calls during eval.
  • Leaderboard records should beat SOTA by at least 0.005 nats with strong statistical evidence.
  • Validation data cannot be leaked into training beyond allowed test-time training rules.
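Since the 16,000,000-byte cap is in decimal bytes, a pre-submission size check is trivial. This helper is my own sketch, not an official tool:

```python
import os

ARTIFACT_BUDGET_BYTES = 16_000_000  # decimal bytes, per the rules

def check_artifact(path: str) -> bool:
    """Return True iff the packaged submission (code + compressed
    model) fits within the 16,000,000-byte artifact budget."""
    size = os.path.getsize(path)
    print(f"{path}: {size:,} bytes "
          f"({size / ARTIFACT_BUDGET_BYTES:.1%} of budget)")
    return size <= ARTIFACT_BUDGET_BYTES
```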

Leaderboard Snapshot

  • 1.1194 · LeakyReLU² + Score-First TTT + Parallel Muon (abaybektursun, 2026-03-23)
  • 1.1228 · 11L EMA + GPTQ-lite + warmdown3500 (signalrush, 2026-03-22)
  • 1.1248 · 11L Partial RoPE + LN Scale + EMA + XSA4 (jfprincz, 2026-03-21)

Getting Started

The official workflow supports both local iteration (e.g., Apple Silicon / MLX) and cloud GPU scaling (e.g., RunPod H100 pods). Typical setup includes cloning the repo, creating a fresh Python environment, downloading cached FineWeb shards, and launching baseline training with torchrun or MLX scripts.

OpenAI is also sponsoring $1,000,000 in compute credits to help participants bootstrap experiments through a compute grant process.

Timeline & Participation

  • Challenge window: March 18 – April 30.
  • Participant form is optional, but useful for attribution and OpenAI outreach.
  • Top technical submissions may stand out to researchers and recruiters.