Work across AI accelerator benchmarks, simulators, compiler/runtime prototypes, operator mapping, and real-system validation.

Support AI accelerator benchmark design, evaluation-flow setup, and result analysis.
Profile and analyze representative workloads such as LLM inference to identify bottlenecks.
Design microbenchmarks to study memory access, utilization, latency, and related behavior.
Help build simulator environments, configuration management, regression tests, trace parsers, and experiment frameworks.
Explore compiler, runtime, operator mapping, scheduling, memory layout, tiling, and dataflow prototypes.
Once hardware is available, support bring-up, functional validation, performance testing, and simulator-to-silicon comparison.

Major in computer science, electronic engineering, automation, software engineering, integrated circuits, AI, or a related field; graduate students preferred, strong undergraduates welcome.
Solid computer-science fundamentals, plus strong interest or experience in at least one of the following: computer architecture, compilers or systems software, parallel or high-performance computing, AI inference systems, model deployment, or performance optimization.
Good C/C++ or Python programming skills.
Strong analytical, experimental, and technical-writing ability.
Self-driven learner who can move quickly amid the uncertainty of early-stage projects.

Coursework or deep study in computer architecture, parallel architecture, storage systems, or compilers.
Experience with benchmarking, profiling, trace analysis, or performance modeling.
Familiarity with PyTorch, CUDA, TVM, MLIR, LLVM, Triton, ONNX Runtime, vLLM, or related tools.
Experience with simulators, microbenchmarks, runtime systems, or kernel optimization.
Practical understanding of LLM inference, the KV cache, attention, and the memory hierarchy.
Research experience, systems projects, or open-source contributions.

AI Silicon Systems Software and Architecture Intern