Sequence 01 - JD Requirements, CV Content, and Master Matching Table

This file is the entry point for all study. Do not run any code yet; first get the relationship between the JD and the CV clear.

JD source:

jd.txt points to the official NVIDIA posting System Software Architect, AI and GPU Networking / JR2013092
The official job page shows that the role focuses on AI Networking Research, GPU Networking, Dynamo, NIXL, UCX, GPUNetIO, inference/model serving, runtime systems, communication libraries, AI workloads, C++/Python/CUDA, NCCL/UCX/MPI, Prefill/Decode, DP/TP/FSDP, and related topics.

1. JD core requirements breakdown

flowchart TB
    JD[System Software Architect AI and GPU Networking]

    JD --> Work[Responsibilities]
    JD --> Need[Hard requirements]
    JD --> Plus[Nice to have]

    Work --> W1[Enhance NVIDIA GPU Networking for AI workloads]
    Work --> W2[Dynamo / NIXL / UCX]
    Work --> W3[Data movement for inference and model serving]
    Work --> W4[Throughput / latency / memory efficiency]
    Work --> W5[Runtime systems / communication libraries]
    Work --> W6[NIXL / UCX / GPUNetIO enhancements]

    Need --> N1[System architecture / AI systems architecture]
    Need --> N2[Scaling AI / parallelism / training workloads]
    Need --> N3[Algorithm design / system programming]
    Need --> N4[Computer architecture / OS]
    Need --> N5[Virtualization / networking / storage]
    Need --> N6[Profiling and optimization]
    Need --> N7[C++ / Python / CUDA or GPU programming]

    Plus --> P1[Research track record]
    Plus --> P2[CPU/GPU/memory/storage/networking architecture]
    Plus --> P3[Deep learning frameworks]
    Plus --> P4[NCCL / UCX / MPI]
    Plus --> P5[Inference and training optimization]
    Plus --> P6[Prefill / Decode / DP / TP / FSDP]

2. CV core content breakdown

flowchart TB
    CV[Wenxuan CV]

    CV --> AI[AI / LLM Systems]
    CV --> Perf[Performance and Diagnostics]
    CV --> Net[Networking and Data Movement]
    CV --> GPU[GPU / Compute Diagnostics]
    CV --> Sys[Systems Programming]
    CV --> Arch[Architecture and Prototype]
    CV --> HW[Hardware-adjacent Diagnostics]

    AI --> AI1[Online inference]
    AI --> AI2[Feature pipelines]
    AI --> AI3[KV cache awareness]
    AI --> AI4[Speculative decoding]
    AI --> AI5[Batching / latency throughput tradeoff]

    Perf --> PF1[Benchmarking]
    Perf --> PF2[Hot-path analysis]
    Perf --> PF3[Regression governance]
    Perf --> PF4[Observability]

    Net --> NT1[gRPC / Kafka paths]
    Net --> NT2[Backpressure]
    Net --> NT3[Routing / connection management]
    Net --> NT4[Tail latency investigation]

    GPU --> GP1[Vulkan / WebGPU]
    GPU --> GP2[Nsight / RenderDoc]
    GPU --> GP3[Throughput / utilization / memory behavior]

    Sys --> SY1[C++ / Python / Bash / Linux]
    Sys --> SY2[Native addon + SIMD]
    Sys --> SY3[Diagnostic tools]

    Arch --> AR1[System design]
    Arch --> AR2[Prototype validation]
    Arch --> AR3[Cross-team debugging]

    HW --> HW1[Advantest semiconductor test software]
    HW --> HW2[Measurement data acquisition]
    HW --> HW3[Hardware/software joint debugging]

3. JD/CV master matching table

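JD skill	Corresponding CV content	Match	Risk	How to present it in the interview
AI inference / model serving	OKX AI risk-control, LLM strategy platform, online inference, KV cache, speculative decoding, batching	High	Needs to be told as runtime/system work, not business AI	Frame it with request path, TTFT/TPOT/P99, prefill/decode, and KV cache
Data movement for inference	CV mentions data-movement awareness and communication-path diagnosis	Medium	No NIXL production experience	Say honestly that you have not directly owned NIXL in production, but can explain KV/state movement and the data path clearly
Dynamo / NIXL	CV mentions actively deepening Dynamo/NIXL concepts	Medium-low	Likely to draw follow-up questions on library details	Focus on purpose, scenarios, lifecycle, and NIXL vs NCCL
UCX	CV mentions UCX concepts, Linux systems, communication diagnosis	Medium-low	No hands-on RDMA cluster experience	Explain UCX as a transport abstraction and give a debug checklist
GPUNetIO	CV mentions GPUNetIO concepts	Low	No hands-on DOCA/NVIDIA NIC experience	Present it only as concepts and architectural boundaries; do not pretend to have production experience
NCCL / collectives	CV mentions NCCL concepts, distributed AI systems	Medium	No multi-GPU measurements	Support it with all-reduce, algbw/busbw, and DP/TP/FSDP communication patterns
CUDA / GPU programming	CV has Vulkan/WebGPU/Nsight/RenderDoc; CUDA is being built up	Medium	CUDA production experience is not strong	Explain that GPU profiling skills transfer, and add CUDA experiments
C++ / Python / Linux	C++/Python/Linux appear across multiple roles	High	Needs to be grounded in system paths	Emphasize that C++/CUDA/Linux is the primary low-level working ground, with Python used for AI/benchmarks
Profiling optimization	perf/flamegraph/Nsight/RenderDoc/hot-path/regression	High	Needs to connect to GPU/AI networking	Frame it with critical path, timeline, kernel metrics, and communication metrics
System architecture/prototype	Strong Staff/Architect experience	High	Needs NVIDIA context	Use hypothesis -> microbenchmark -> E2E validation -> roadmap
Virtualization/networking/storage	Distributed services, gRPC/Kafka, routing/backpressure; storage is weaker	Medium	Virtualization/storage details may be thin	Focus on networking/OS/data path; do not overstretch on storage
Computer architecture/OS	C++/Linux/perf/SIMD/ATE/diagnostics	Medium	May be asked about cache, DMA, NUMA, PCIe	Study up on hardware data paths and GPU/NIC topology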
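
The TTFT/TPOT/P99 framing recommended in the first row is easier to use once the definitions are written out. The following is a minimal sketch, not taken from the JD or the CV: the function names, the list-of-timestamps input, and the nearest-rank percentile definition are all assumptions chosen for illustration, and real serving stacks may define these metrics slightly differently.

```python
import math


def ttft(request_start: float, token_times: list[float]) -> float:
    """Time to first token: delay from request arrival to the first output token."""
    if not token_times:
        raise ValueError("need at least one output token")
    return token_times[0] - request_start


def tpot(token_times: list[float]) -> float:
    """Time per output token: mean gap between consecutive tokens after the first."""
    if len(token_times) < 2:
        raise ValueError("need at least two output tokens")
    return (token_times[-1] - token_times[0]) / (len(token_times) - 1)


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile (an assumed definition; others interpolate)."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical timestamps in seconds for one request.
    start = 10.0
    tokens = [10.25, 10.30, 10.35]
    print("TTFT:", ttft(start, tokens))
    print("TPOT:", tpot(tokens))
    # P99 over a hypothetical set of per-request latencies.
    print("P99:", percentile([float(i) for i in range(1, 101)], 99))
```

Roughly speaking, TTFT is dominated by the prefill phase and TPOT by the decode phase, which is why the two are usually reported separately rather than as one end-to-end latency.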

4. Your core positioning

The safest positioning is not "I am already an NVIDIA AI networking expert", but rather:

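I am an engineer with a systems/performance/AI infra background. My strengths are system architecture, performance diagnosis, diagnostic tooling, production-path optimization, AI inference integration, and communication-path analysis. For this role, my most relevant contribution is connecting AI inference workloads, GPU profiling, communication/data movement, and prototype validation. NIXL/UCX/GPUNetIO is the NVIDIA-specific stack I am currently deepening, and I will use explicit data paths, benchmarks, and debug checklists to reduce ramp-up risk.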

5. The most important matching diagram

flowchart LR
    JD1[JD: AI inference/model serving] --> CV1[CV: LLM inference / KV cache / batching]
    JD2[JD: data movement / NIXL] --> CV2[CV: communication-path diagnosis / runtime awareness]
    JD3[JD: UCX / GPUNetIO / GPU networking] --> CV3[CV: networking / backpressure / Linux systems]
    JD4[JD: NCCL / parallelism] --> CV4[CV: distributed AI concepts / performance diagnosis]
    JD5[JD: CUDA / profiling] --> CV5[CV: Vulkan/WebGPU/Nsight/RenderDoc]
    JD6[JD: architecture/prototype] --> CV6[CV: Staff/Architect / prototype / roadmap]

    CV1 --> Gap1[Need: deeper runtime language]
    CV2 --> Gap2[Need: NIXL lifecycle and KV movement]
    CV3 --> Gap3[Need: UCX/RDMA/GPUDirect/GPUNetIO boundary]
    CV4 --> Gap4[Need: collective semantics and benchmarks]
    CV5 --> Gap5[Need: CUDA-specific experiments]
    CV6 --> Strength[Strong: architecture interview advantage]

6. Interview priorities

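Priority	Topic	Reason
P0	AI inference/model serving	Core of the JD, strong CV match, easiest to expand into project stories.
P0	CUDA/Nsight/GPU performance	The JD explicitly requires profiling/GPU programming.
P0	NCCL/parallelism	A must-ask topic for AI networking, and relevant to both training and inference.
P0	UCX/RDMA/GPUDirect	The low-level path of GPU networking; easy for an interviewer to drill into.
P1	NIXL/Dynamo	Named in the JD; you must be able to explain purpose and boundaries.
P1	GPUNetIO	Named in the JD, but concepts and usage scenarios are enough.
P1	CV project deep dive	The interviewer will very likely ask you to walk through projects.
P2	GR00T/Physical AI	Nice to have, not the main line.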
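
Since NCCL/parallelism sits at P0 and the matching table leans on algbw/busbw, it helps to have the arithmetic at hand. The sketch below follows the convention used by the nccl-tests benchmarks, where algorithm bandwidth is message size over time and bus bandwidth scales it by a per-collective factor. The function names and the sample numbers are hypothetical, and only three collectives are covered.

```python
def algbw_gbps(size_bytes: int, seconds: float) -> float:
    """Algorithm bandwidth in GB/s: message size divided by elapsed time."""
    return size_bytes / seconds / 1e9


# Per-collective correction factors (nccl-tests convention) that turn algbw
# into busbw, so results are comparable across different rank counts.
BUS_FACTOR = {
    "all_reduce": lambda n: 2 * (n - 1) / n,
    "all_gather": lambda n: (n - 1) / n,
    "reduce_scatter": lambda n: (n - 1) / n,
}


def busbw_gbps(collective: str, size_bytes: int, seconds: float, n_ranks: int) -> float:
    """Bus bandwidth in GB/s for the given collective and rank count."""
    if n_ranks < 2:
        raise ValueError("a collective needs at least two ranks")
    factor = BUS_FACTOR[collective](n_ranks)
    return algbw_gbps(size_bytes, seconds) * factor


if __name__ == "__main__":
    # Hypothetical measurement: 1 GB all-reduce across 8 ranks in 10 ms.
    size, elapsed, ranks = 1_000_000_000, 0.01, 8
    print("algbw GB/s:", algbw_gbps(size, elapsed))
    print("busbw GB/s:", busbw_gbps("all_reduce", size, elapsed, ranks))
```

The point worth making in an interview is that busbw, not algbw, is the number to compare against link speed, because the factor accounts for how much data each rank actually has to send and receive.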