Open Role

Postdoctoral Researcher - LLM Runtime

Own LLM Serving Runtime and KV Cache systems across scheduling, batching, profiling, and prototype hardware bring-up.

Responsibilities

  • Serve as the core owner of LLM Serving Runtime and KV Cache capabilities, from planning and design through implementation.
  • Build key mechanisms for the online LLM inference runtime, including request scheduling, batching, KV Cache management, long-context support, and performance optimization.
  • Drive end-to-end performance closure on real workloads, and improve runtime benchmarking, profiling, and performance-analysis methods.
  • Work with compiler, kernel, and silicon architecture teams to define and connect critical interfaces across the execution stack.
  • Advance bring-up, debugging, and iteration in prototype hardware environments so that prototypes mature into stable systems.
  • Capture reusable designs and engineering practices for long-term system evolution.

Requirements

  • PhD in computer science, electronic engineering, automation, mathematics, computational science, or a related field.
  • Strong systems-software foundation in areas such as operating systems, concurrency, memory management, distributed systems, or high-performance computing.
  • Strong engineering capability to independently design, implement, debug, and profile complex modules.
  • Familiarity with at least one mainstream deep-learning or LLM inference stack such as PyTorch, CUDA, Triton, vLLM, SGLang, TensorRT-LLM, or DeepSpeed.
  • Clear understanding of online LLM inference concepts including prefill / decode, KV Cache, attention, batching, long context, and multi-card deployment.
  • Strong abstraction skills, effective collaboration, and a high degree of ownership.