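Sequence 01 - JD Requirements, CV Content, and Match Summary
This file is the entry point for all study. Do not run any code yet; first get a clear picture of how the JD and the CV relate.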
JD source:
- jd.txt points to the official NVIDIA posting System Software Architect, AI and GPU Networking / JR2013092
- The official job page shows that the role focuses on AI Networking Research, GPU Networking, Dynamo, NIXL, UCX, GPUNetIO, inference/model serving, runtime systems, communication libraries, AI workloads, C++/Python/CUDA, NCCL/UCX/MPI, Prefill/Decode, DP/TP/FSDP, and related topics.
1. Breakdown of core JD requirements
flowchart TB
JD[System Software Architect AI and GPU Networking]
JD --> Work[Responsibilities]
JD --> Need[Hard requirements]
JD --> Plus[Nice to have]
Work --> W1[Enhance NVIDIA GPU Networking for AI workloads]
Work --> W2[Dynamo / NIXL / UCX]
Work --> W3[Data movement for inference and model serving]
Work --> W4[Throughput / latency / memory efficiency]
Work --> W5[Runtime systems / communication libraries]
Work --> W6[NIXL / UCX / GPUNetIO enhancements]
Need --> N1[System architecture / AI systems architecture]
Need --> N2[Scaling AI / parallelism / training workloads]
Need --> N3[Algorithm design / system programming]
Need --> N4[Computer architecture / OS]
Need --> N5[Virtualization / networking / storage]
Need --> N6[Profiling and optimization]
Need --> N7[C++ / Python / CUDA or GPU programming]
Plus --> P1[Research track record]
Plus --> P2[CPU/GPU/memory/storage/networking architecture]
Plus --> P3[Deep learning frameworks]
Plus --> P4[NCCL / UCX / MPI]
Plus --> P5[Inference and training optimization]
Plus --> P6[Prefill / Decode / DP / TP / FSDP]
2. Breakdown of core CV content
flowchart TB
CV[Wenxuan CV]
CV --> AI[AI / LLM Systems]
CV --> Perf[Performance and Diagnostics]
CV --> Net[Networking and Data Movement]
CV --> GPU[GPU / Compute Diagnostics]
CV --> Sys[Systems Programming]
CV --> Arch[Architecture and Prototype]
CV --> HW[Hardware-adjacent Diagnostics]
AI --> AI1[Online inference]
AI --> AI2[Feature pipelines]
AI --> AI3[KV cache awareness]
AI --> AI4[Speculative decoding]
AI --> AI5[Batching / latency throughput tradeoff]
Perf --> PF1[Benchmarking]
Perf --> PF2[Hot-path analysis]
Perf --> PF3[Regression governance]
Perf --> PF4[Observability]
Net --> NT1[gRPC / Kafka paths]
Net --> NT2[Backpressure]
Net --> NT3[Routing / connection management]
Net --> NT4[Tail latency investigation]
GPU --> GP1[Vulkan / WebGPU]
GPU --> GP2[Nsight / RenderDoc]
GPU --> GP3[Throughput / utilization / memory behavior]
Sys --> SY1[C++ / Python / Bash / Linux]
Sys --> SY2[Native addon + SIMD]
Sys --> SY3[Diagnostic tools]
Arch --> AR1[System design]
Arch --> AR2[Prototype validation]
Arch --> AR3[Cross-team debugging]
HW --> HW1[Advantest semiconductor test software]
HW --> HW2[Measurement data acquisition]
HW --> HW3[Hardware/software joint debugging]
3. JD/CV match summary table
| JD skill | Matching CV content | Match | Risk | How to present it in the interview |
|---|---|---|---|---|
| AI inference / model serving | OKX AI risk-control, LLM strategy platform, online inference, KV cache, speculative decoding, batching | High | Needs to sound like runtime/system work rather than business AI | Frame it with request path, TTFT/TPOT/P99, prefill/decode, and KV cache |
| Data movement for inference | CV lists data-movement awareness and communication-path diagnosis | Medium | No production NIXL experience | Say honestly that you did not directly own NIXL in production, but can clearly explain KV/state movement and the data path |
| Dynamo / NIXL | CV lists actively deepening Dynamo/NIXL concepts | Medium-low | Likely to be probed on library details | Focus on purpose, scenarios, lifecycle, and NIXL vs NCCL |
| UCX | CV lists UCX concepts, Linux systems, communication diagnosis | Medium-low | No hands-on RDMA cluster experience | Present UCX as a transport abstraction and give a debug checklist |
| GPUNetIO | CV lists GPUNetIO concepts | Low | No hands-on DOCA/NVIDIA NIC experience | Present it only as concepts and architectural boundaries; do not claim production experience |
| NCCL / collectives | CV lists NCCL concepts, distributed AI systems | Medium | No multi-GPU measurements | Support it with all-reduce, algbw/busbw, and DP/TP/FSDP communication patterns |
| CUDA / GPU programming | CV has Vulkan/WebGPU/Nsight/RenderDoc; CUDA is being built up | Medium | Limited CUDA production experience | Explain that GPU profiling skills transfer, and add CUDA experiments |
| C++ / Python / Linux | Several roles involve C++/Python/Linux | High | Needs to be tied to concrete system paths | Emphasize C++/CUDA/Linux as the main low-level ground, with Python used for AI/benchmark work |
| Profiling optimization | perf/flamegraph/Nsight/RenderDoc/hot-path/regression | High | Needs to connect to GPU/AI networking | Frame it with critical path, timeline, kernel metrics, and communication metrics |
| System architecture/prototype | Strong Staff/Architect experience | High | Needs NVIDIA context | Use hypothesis -> microbenchmark -> E2E validation -> roadmap |
| Virtualization/networking/storage | Distributed services, gRPC/Kafka, routing/backpressure; storage is weaker | Medium | Virtualization/storage details may be thin | Focus on networking/OS/data path; do not overstretch on storage |
| Computer architecture/OS | C++/Linux/perf/SIMD/ATE/diagnostics | Medium | May be asked about cache, DMA, NUMA, PCIe | Study hardware data paths and GPU/NIC topology |
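
The first row of the table suggests framing inference work in terms of TTFT/TPOT/P99. As a minimal sketch of what those numbers mean, the snippet below computes them from per-request token timestamps. The `RequestTrace` type, its field names, and the nearest-rank percentile method are assumptions chosen for illustration; they are not taken from the JD, the CV, or any specific serving framework, and TPOT is computed here as the mean gap between output tokens after the first one.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTrace:
    """Hypothetical per-request trace; all times are in seconds."""
    arrival: float            # when the request reached the server
    token_times: List[float]  # completion time of each output token


def ttft(trace: RequestTrace) -> float:
    """Time to first token: arrival until the first output token."""
    return trace.token_times[0] - trace.arrival


def tpot(trace: RequestTrace) -> float:
    """Time per output token: mean gap between tokens after the first."""
    n = len(trace.token_times)
    if n < 2:
        return 0.0  # a single token has no inter-token gap
    return (trace.token_times[-1] - trace.token_times[0]) / (n - 1)


def percentile(values: List[float], p: int) -> float:
    """Nearest-rank percentile for an integer p in 1..100."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    traces = [
        RequestTrace(arrival=0.0, token_times=[0.5, 0.6, 0.7, 0.8]),
        RequestTrace(arrival=1.0, token_times=[1.2, 1.3, 1.4]),
    ]
    ttfts = [ttft(t) for t in traces]
    print("P99 TTFT (s):", percentile(ttfts, 99))
    print("TPOT per request (s):", [round(tpot(t), 3) for t in traces])
```

In an interview answer, TTFT is usually tied to the prefill phase and queueing, TPOT to the decode phase, and P99 to tail behavior across many requests, which is why the three are reported together.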
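4. Your core positioning
The safest positioning is not "I am already an NVIDIA AI networking expert" but rather:
I am an engineer with a systems/performance/AI infra background. My strengths are system architecture, performance diagnosis, diagnostic tooling, production-path optimization, AI inference integration, and communication-path analysis. For this role, what is most relevant is my ability to connect AI inference workloads, GPU profiling, communication/data movement, and prototype validation. NIXL/UCX/GPUNetIO is the NVIDIA-specific stack I am currently deepening, and I will use explicit data paths, benchmarks, and debug checklists to reduce ramp-up risk.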
5. The most important match diagram
flowchart LR
JD1[JD: AI inference/model serving] --> CV1[CV: LLM inference / KV cache / batching]
JD2[JD: data movement / NIXL] --> CV2[CV: communication-path diagnosis / runtime awareness]
JD3[JD: UCX / GPUNetIO / GPU networking] --> CV3[CV: networking / backpressure / Linux systems]
JD4[JD: NCCL / parallelism] --> CV4[CV: distributed AI concepts / performance diagnosis]
JD5[JD: CUDA / profiling] --> CV5[CV: Vulkan/WebGPU/Nsight/RenderDoc]
JD6[JD: architecture/prototype] --> CV6[CV: Staff/Architect / prototype / roadmap]
CV1 --> Gap1[Need: deeper runtime language]
CV2 --> Gap2[Need: NIXL lifecycle and KV movement]
CV3 --> Gap3[Need: UCX/RDMA/GPUDirect/GPUNetIO boundary]
CV4 --> Gap4[Need: collective semantics and benchmarks]
CV5 --> Gap5[Need: CUDA-specific experiments]
CV6 --> Strength[Strong: architecture interview advantage]
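6. Interview priorities
| Priority | Topic | Reason |
|---|---|---|
| P0 | AI inference/model serving | Core of the JD, a strong CV match, and the easiest place to expand on projects. |
| P0 | CUDA/Nsight/GPU performance | The JD explicitly requires profiling/GPU programming. |
| P0 | NCCL/parallelism | Almost certain to be asked for AI networking, and relevant to both training and inference. |
| P0 | UCX/RDMA/GPUDirect | The low-level GPU networking path; easy for interviewers to dig into. |
| P1 | NIXL/Dynamo | Named in the JD; you must be able to explain purpose and boundaries. |
| P1 | GPUNetIO | Named in the JD, but concepts and usage scenarios are enough. |
| P1 | CV project deep dive | The interviewer will most likely ask you to walk through your projects. |
| P2 | GR00T/Physical AI | A bonus, not the main line. |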
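
For the NCCL/parallelism topic, the algbw/busbw distinction is a common follow-up question. The sketch below uses the all-reduce convention documented for nccl-tests, where bus bandwidth is algorithm bandwidth scaled by 2(n-1)/n. The function name, parameter names, and sample numbers are illustrative assumptions, not measurements.

```python
from typing import Tuple


def allreduce_bandwidth(bytes_per_rank: float,
                        seconds: float,
                        n_ranks: int) -> Tuple[float, float]:
    """Return (algbw, busbw) in GB/s for one all-reduce.

    algbw is message size divided by elapsed time.
    busbw rescales algbw by 2*(n-1)/n so the figure can be compared
    against link bandwidth regardless of the number of ranks.
    """
    if seconds <= 0 or n_ranks < 1:
        raise ValueError("seconds must be positive and n_ranks >= 1")
    algbw = bytes_per_rank / seconds / 1e9
    busbw = algbw * 2 * (n_ranks - 1) / n_ranks
    return algbw, busbw


if __name__ == "__main__":
    # Made-up example: 1 GB per rank, 0.5 s, 8 ranks.
    algbw, busbw = allreduce_bandwidth(1e9, 0.5, 8)
    print(f"algbw={algbw:.2f} GB/s, busbw={busbw:.2f} GB/s")
```

The point to make in an interview is that algbw drops as ranks are added even when the hardware is fully used, while busbw stays comparable to the peak link rate, so busbw is the number to check against the expected NVLink or NIC bandwidth.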