About Us
We build high-performance foundation models designed to run efficiently across a wide range of environments—from edge devices to large-scale deployments. Our work spans models from ~1B to 100B+ parameters across LLMs, diffusion models, and other modalities, with a strong focus on scalable training, efficient inference, and real-world deployment.
Role Overview
We are seeking a Staff-level (or higher) AI/ML engineer to lead large-scale model training efforts. This role combines hands-on ownership of large training runs with responsibility for setting technical direction, mentoring engineers, and improving model quality and system performance across the organization.
Responsibilities
You will design, implement, and optimize distributed training systems for large-scale models across all major training phases. Core responsibilities include:
- Leading model development across pretraining, fine-tuning, and post-training stages
- Designing and improving data pipelines, including curation, filtering, deduplication, and dataset composition
- Improving training efficiency, scalability, and reliability across large distributed systems
- Optimizing model performance with respect to convergence, throughput, memory usage, and stability
- Translating cutting-edge research into robust, production-ready systems
- Providing technical leadership through mentoring, design reviews, and cross-functional collaboration
Basic Qualifications
You bring deep experience in large-scale AI/ML systems and strong fundamentals in modern model training:
- 8–10+ years of experience in machine learning or AI, or a strong publication record
- Strong Python programming skills with production-quality code
- Hands-on experience training large-scale models (multi-billion parameters)
- Solid understanding of optimization, distributed training, and training dynamics
- Experience with modern model training workflows (e.g., pretraining, fine-tuning, reinforcement learning approaches)
- Proven ability to mentor and lead other AI/ML engineers
Preferred Qualifications
You have additional experience with large-scale, high-performance AI/ML systems:
- Experience training very large models (tens to hundreds of billions of parameters)
- Familiarity with modern accelerator hardware (e.g., GPUs or TPUs) and distributed training frameworks
- Experience improving system performance, resource utilization, and training efficiency
- Exposure to deployment environments with real-world constraints (e.g., latency, cost, or hardware limitations)
- Experience with advanced optimization techniques and scaling strategies
- Contributions to research, publications, or open-source AI/ML systems
Ideal Candidate Profile
You have led or significantly contributed to training large models end-to-end, understand common failure modes in large-scale training systems, and know how to debug and improve them. You care about building efficient, reliable systems that work in real-world settings, enjoy mentoring others, and thrive at the intersection of research, engineering, and product.