About Us
We build high-performance foundation models designed to run efficiently across a wide range of environments—from edge devices to large-scale deployments. Our work spans models from ~1B to 100B+ parameters across LLMs, diffusion models, and other modalities, with a strong focus on scalable training, efficient inference, and real-world deployment.
Role Overview
We are seeking a Senior-level (or higher) AI/ML engineer with deep expertise in systems and kernel development to lead efforts in optimizing low-level performance across our model stack. This role focuses on designing and implementing high-performance kernels that accelerate inference and training for highly efficient model architectures, including 1-bit and other compressed representations, across diverse hardware platforms.
Responsibilities
You will design, implement, and optimize high-performance kernels and low-level systems to maximize efficiency across a range of inference runtimes and hardware targets. Core responsibilities include:
- Designing and implementing custom kernels for model execution across GPUs and other accelerator hardware
- Optimizing inference performance for highly efficient model representations (e.g., 1-bit or quantized models); see the illustrative sketch after this list
- Improving throughput, latency, and memory efficiency across different inference runtimes and deployment environments
- Collaborating with model and systems teams to co-design architectures and execution strategies for maximum performance
- Profiling and debugging performance bottlenecks at the kernel, runtime, and system levels
- Translating advances in hardware-aware optimization into production-ready systems
- Providing technical leadership through mentoring, design reviews, and cross-functional collaboration
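To give a flavor of this work, here is a minimal, purely illustrative sketch (not code from our stack; the `binary_gemv` kernel, its bit-packed layout, and its naming are all hypothetical): a naive CUDA matrix-vector product over 1-bit weights packed 32 per 32-bit word, where a set bit encodes a weight of +1 and a clear bit encodes -1. A production kernel would tile, vectorize loads, use shared memory, and exploit popcount-style bit tricks when activations are quantized as well.

```cuda
#include <cuda_runtime.h>
#include <cstdint>

// Illustrative only: naive GEMV over 1-bit weights packed 32 per word.
// Bit b of packed[row][w] encodes weight +1 (bit set) or -1 (bit clear).
// Assumes cols is a multiple of 32. Hypothetical layout, not our codebase.
__global__ void binary_gemv(const uint32_t* __restrict__ packed, // [rows x cols/32]
                            const float* __restrict__ x,         // [cols]
                            float* __restrict__ y,               // [rows]
                            int rows, int cols) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= rows) return;

    int words = cols / 32;
    float acc = 0.0f;
    for (int w = 0; w < words; ++w) {
        uint32_t bits = packed[row * words + w];
        for (int b = 0; b < 32; ++b) {
            // Decode one 1-bit weight to {+1, -1} and accumulate.
            float sign = ((bits >> b) & 1u) ? 1.0f : -1.0f;
            acc += sign * x[w * 32 + b];
        }
    }
    y[row] = acc;
}
```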
Basic Qualifications
You bring deep experience in systems engineering and performance optimization for AI/ML workloads:
- 5–8+ years of experience in systems engineering, machine learning infrastructure, or related fields
- Strong programming skills in C/C++ and/or CUDA (or equivalent low-level languages), with a track record of shipping production-quality code
- Hands-on experience developing and optimizing kernels for GPUs or other accelerators
- Solid understanding of computer architecture, memory hierarchies, and parallel programming models
- Experience profiling and optimizing performance-critical systems
- Proven ability to mentor and lead other engineers
Preferred Qualifications
You have additional experience aligned with high-performance AI systems and hardware-aware optimization:
- Experience optimizing inference for quantized or compressed models (e.g., low-bit or 1-bit representations)
- Familiarity with modern inference runtimes and compiler stacks (e.g., TensorRT, TVM, Triton, XLA, or similar)
- Experience working across different hardware platforms (e.g., GPUs, CPUs, custom accelerators)
- Knowledge of numerical methods and trade-offs in reduced-precision computation
- Experience improving system-level performance, including kernel fusion, scheduling, and memory optimization (see the fusion sketch after this list)
- Contributions to performance-critical systems, open-source frameworks, or hardware-aware ML research
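As an example of what kernel fusion buys, consider the sketch below (again purely illustrative and hypothetical, assuming a row-major activation tensor with a per-column bias): fusing a bias add and a ReLU into one kernel keeps the intermediate value in registers instead of round-tripping it through device memory, roughly halving memory traffic for this memory-bound pair of ops.

```cuda
#include <cuda_runtime.h>

// Illustrative kernel-fusion sketch: bias add + ReLU in a single pass.
// Unfused, each op would read and write the full tensor from HBM;
// fused, the intermediate stays in a register.
__global__ void bias_relu_fused(const float* __restrict__ in,
                                const float* __restrict__ bias, // [cols]
                                float* __restrict__ out,
                                int n, int cols) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float v = in[i] + bias[i % cols]; // bias broadcast per column
    out[i] = v > 0.0f ? v : 0.0f;     // ReLU, no intermediate round-trip
}
```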
Ideal Candidate Profile
You have built and optimized kernels or low-level systems that significantly improve performance for large-scale AI workloads. You understand how model architecture, numerical representation, and hardware interact, and you know how to push systems to their limits. You care deeply about efficiency and performance, enjoy working close to the hardware/software boundary, and are comfortable leading efforts that span models, runtimes, and infrastructure.