OpenAI launches Jalapeño, a custom AI chip built with Broadcom
Deploy Jalapeño chips in your data centers to achieve higher LLM inference performance per watt.
Evaluate Jalapeño chip specifications and plan procurement for upcoming LLM inference workloads.
Summary
OpenAI has announced Jalapeño, its first custom application-specific integrated circuit (ASIC), designed in partnership with Broadcom. The chip is built to accelerate large language model (LLM) inference for flagship products such as ChatGPT, Codex, the OpenAI API, and forthcoming agent-based services. Jalapeño uses a near-reticle-size die paired with 216 GB of HBM3E memory, delivers 7.1–7.4 TB/s of memory bandwidth, and reaches 10 PFLOPS of FP4 compute. The design-to-tape-out cycle took just nine months, an unusually short timeline for a high-performance ASIC, and it underscores OpenAI's push to own more of the AI stack and reduce its reliance on merchant GPU supply chains.
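To put the headline figures in context, the sketch below derives three back-of-the-envelope numbers from them: the roofline ridge point (FLOPs per byte at which compute, rather than memory, becomes the limit), the time to read the full HBM once, and a bandwidth-bound ceiling on single-stream decode speed. It assumes decimal units (1 TB = 1e12 bytes), treats the quoted values as peak figures, and uses a hypothetical 200-billion-parameter model stored at 4 bits per weight. None of these assumptions come from the announcement, and real throughput depends on batching, KV-cache traffic, and achieved utilization.

```python
# Back-of-the-envelope roofline figures from the specs quoted in the article.
# Assumption: decimal units (1 GB = 1e9 bytes, 1 TB = 1e12 bytes), peak values.
HBM_CAPACITY_BYTES = 216e9               # 216 GB of HBM3E
BANDWIDTH_RANGE_BPS = (7.1e12, 7.4e12)   # 7.1-7.4 TB/s memory bandwidth
PEAK_FP4_FLOPS = 10e15                   # 10 PFLOPS of FP4 compute


def ridge_point(peak_flops: float, bandwidth: float) -> float:
    """FLOPs per byte a kernel needs before compute becomes the bottleneck."""
    return peak_flops / bandwidth


def full_memory_sweep_seconds(capacity: float, bandwidth: float) -> float:
    """Time to read every byte of HBM once at the given bandwidth."""
    return capacity / bandwidth


def decode_tokens_per_second(weight_bytes: float, bandwidth: float) -> float:
    """Upper bound on single-stream decode if each token reads all weights once."""
    return bandwidth / weight_bytes


if __name__ == "__main__":
    # Hypothetical model (not from the article): 200e9 parameters at 0.5 byte each.
    hypothetical_weight_bytes = 200e9 * 0.5
    for bw in BANDWIDTH_RANGE_BPS:
        sweep_ms = full_memory_sweep_seconds(HBM_CAPACITY_BYTES, bw) * 1e3
        print(f"bandwidth {bw / 1e12:.1f} TB/s")
        print(f"  ridge point:    {ridge_point(PEAK_FP4_FLOPS, bw):.0f} FLOP/byte")
        print(f"  full HBM sweep: {sweep_ms:.1f} ms")
        print(f"  decode bound:   "
              f"{decode_tokens_per_second(hypothetical_weight_bytes, bw):.0f} tokens/s")
```

A ridge point above 1,000 FLOP/byte suggests that low-batch decode, which performs only a few FLOPs per weight byte read, would be limited by memory bandwidth rather than by the FP4 peak. That is consistent with the article's emphasis on reducing data movement.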
Early testing reportedly shows better performance per watt than the current state of the art, which would give OpenAI tighter control over compute economics and product behaviour. By designing the chip together with its memory, networking, scheduling, and deployment layers, the company can tailor inference pipelines to its specific workloads. The launch coincides with related industry moves. Qualcomm recently acquired Modular, which has confirmed that its Mojo language will remain open source, and NVIDIA's NeMo AutoModel boosts mixture-of-experts training throughput. SkyPilot's new Endpoints service further illustrates the trend toward unified inference platforms.
Strategically, Jalapeño could position OpenAI to offer on-premises or edge inference options, potentially lowering cloud costs and reducing latency for end users. The move signals a broader shift toward vertically integrated inference stacks beyond the traditional NVIDIA/CUDA ecosystem. By controlling the hardware, software, and deployment layers, OpenAI aims to ensure that its models run efficiently and reliably across diverse environments.
Key changes
- Jalapeño is OpenAI’s first Intelligence Processor designed from scratch for LLM inference
- Nine-month design-to-tape-out cycle, accelerated by OpenAI's own models
- Early testing shows better performance per watt than the current state of the art
- Architecture reduces data movement and balances compute, memory, and networking
- Integrates Broadcom's Tomahawk networking silicon
- Planned deployment at gigawatt scale with Microsoft and partners
- Jalapeño will support a multi‑generation compute platform for future LLMs