Work across AI accelerator benchmarks, simulators, compiler/runtime prototypes, operator mapping, and real-system validation.
Responsibilities
- Support AI accelerator benchmark design, evaluation-flow setup, and result analysis.
- Profile and analyze representative workloads such as LLM inference to identify bottlenecks.
- Design microbenchmarks to study memory access, utilization, latency, and related behavior.
- Help build simulator environments, configuration management, regression tests, trace parsers, and experiment frameworks.
- Explore compiler, runtime, operator mapping, scheduling, memory layout, tiling, and dataflow prototypes.
- Once hardware is available, support bring-up, functional validation, performance testing, and simulator-to-silicon comparison.
Requirements
- Major in computer science, electronic engineering, automation, software engineering, integrated circuits, AI, or a related field; graduate students preferred, strong undergraduates welcome.
- Solid computer-science fundamentals, plus strong interest or experience in at least one of the following: computer architecture, compilers/systems software, parallel or high-performance computing, AI inference systems, model deployment, or performance optimization.
- Good C/C++ or Python programming skills.
- Strong analytical, experimental, and technical-writing ability.
- Self-driven learner who can move quickly on early-stage projects despite uncertainty.
Nice to have
- Coursework or deep study in computer architecture, parallel architecture, storage systems, or compilers.
- Experience with benchmarking, profiling, trace analysis, or performance modeling.
- Familiarity with PyTorch, CUDA, TVM, MLIR, LLVM, Triton, ONNX Runtime, vLLM, or related tools.
- Experience with simulators, microbenchmarks, runtime systems, or kernel optimization.
- Practical understanding of LLM inference, KV cache, attention, and the memory hierarchy.
- Research experience, systems projects, or open-source contributions.