Sequence 03 - CV Project Deep Dives and Follow-Up Preparation

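The interviewer will very likely ask you to walk through your projects. Present each project in terms of system capabilities, not business features.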

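Use the same template for every project:

What was the background?
What was the system architecture?
What were you responsible for?
What was the key bottleneck?
How did you locate it?
What design or optimization did you make?
How did you verify the effect?
What pitfalls did you hit?
How does it relate to the NVIDIA JD?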

1. Project Overview Map

flowchart TB
    CV[CV Projects] --> P1[AI Risk-Control and LLM Strategy Platform]
    CV --> P2[Rust + Vulkan Rendering Pipeline]
    CV --> P3[Diamond Renderer / WebGPU Compute]
    CV --> P4[High-Concurrency Communication Systems]
    CV --> P5[Advantest Semiconductor ATE Software]
    CV --> P6[NRI Quant / Compute / OpenCV Pipelines]

    P1 --> JD1[AI inference / serving / observability]
    P2 --> JD2[GPU profiling / memory / workload behavior]
    P3 --> JD3[GPU compute / numerical behavior]
    P4 --> JD4[networking / backpressure / tail latency]
    P5 --> JD5[HW/SW joint debugging / diagnostics]
    P6 --> JD6[C++/Python compute pipeline / correctness]

2. Project 1: AI Risk-Control and LLM Strategy Platform

2.1 How to Present It

This is the project closest to the JD. Do not just say you "built AI risk control"; frame it as an inference/runtime/system project:

This project built an AI risk-control and LLM strategy platform that connected feature generation, online inference, retrieval-assisted analysis, anomaly detection, and strategy execution. My focus went beyond model integration to the production inference path, observability, replayability, correctness validation, and performance-sensitive workflow design.

2.2 System Diagram

flowchart LR
    Event[Production Events] --> Feature[Feature Generation]
    Feature --> Queue[Queue / Stream]
    Queue --> Infer[Online Inference]
    Infer --> LLM[LLM / Retrieval / Agent]
    LLM --> Strategy[Strategy Engine]
    Strategy --> Action[Decision / Risk Action]
    Action --> Feedback[Effectiveness Feedback]
    Feedback --> Eval[Evaluation / Replay]
    Eval --> Feature

    Infer --> Obs[Metrics / Traces / Logs]
    LLM --> Obs
    Strategy --> Obs

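2.3 Likely Follow-Up Questions

Follow-up question	What to answer
How is the online inference path designed?	Request entry, feature generation, model invocation, strategy execution, feedback loop.
How is latency controlled?	Caching, batching, async processing, rate limiting, graceful degradation, critical-path profiling.
How is correctness verified?	Replay, shadow traffic, A/B tests, metric alignment, replay of anomalous samples.
How does the LLM part go to production?	Retrieval, agent workflow, output validation, strategy-side guardrails.
How does this relate to NVIDIA inference?	Same concerns: serving path, latency/throughput, KV/cache, observability.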
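
The replay and shadow checks in the table can be made concrete with a small comparison harness. The following is a minimal sketch, not the platform's actual tooling: `replay_compare`, `ReplayReport`, the `id` field on each event, and the numeric-score assumption are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple


@dataclass
class ReplayReport:
    """Summary of one replay run (hypothetical structure)."""
    total: int = 0
    # Each mismatch is (event_id, baseline_score, candidate_score).
    mismatches: List[Tuple[object, float, float]] = field(default_factory=list)

    @property
    def mismatch_rate(self) -> float:
        # Avoid dividing by zero when no events were replayed.
        return len(self.mismatches) / self.total if self.total else 0.0


def replay_compare(
    events: Iterable[dict],
    baseline: Callable[[dict], float],
    candidate: Callable[[dict], float],
    tolerance: float = 1e-6,
) -> ReplayReport:
    """Feed the same recorded events to both scoring paths and
    collect every event whose scores differ by more than `tolerance`."""
    report = ReplayReport()
    for event in events:
        report.total += 1
        base_score = baseline(event)
        cand_score = candidate(event)
        if abs(base_score - cand_score) > tolerance:
            report.mismatches.append((event["id"], base_score, cand_score))
    return report
```

In an interview, the point to draw from this is that the mismatching event IDs become the "anomalous samples" to replay and inspect, and the mismatch rate is the metric to align before promoting the candidate path.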

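2.4 Reframe in JD Language

Upgrade the project vocabulary as follows:

business AI risk control -> production inference path
log monitoring -> observability and replayability
strategy effectiveness -> end-to-end validation
slow requests -> TTFT/P99/queueing/debug
LLM calls -> serving runtime and scheduling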
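
To back up the "slow requests -> TTFT/P99/queueing" framing, it helps to be able to show how a single request's latency is split and how a tail percentile is computed. The sketch below assumes hypothetical per-request timestamps in milliseconds (`enqueue_ms`, `start_ms`, `first_token_ms`, `done_ms`) and measures TTFT from enqueue time; the nearest-rank percentile is one common definition, not the only one.

```python
import math
from typing import Dict, Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest value such that at least
    p percent of the samples are less than or equal to it."""
    if not values:
        raise ValueError("percentile of empty sequence")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def breakdown(request: Dict[str, float]) -> Dict[str, float]:
    """Split one request's latency into queueing, time to first token,
    and total time. Field names are assumptions for illustration."""
    return {
        "queue_ms": request["start_ms"] - request["enqueue_ms"],
        "ttft_ms": request["first_token_ms"] - request["enqueue_ms"],
        "total_ms": request["done_ms"] - request["enqueue_ms"],
    }
```

Computing `percentile(..., 99)` separately over `queue_ms` and `ttft_ms` shows whether the tail comes from waiting in the queue or from the model call itself, which is the distinction an interviewer usually probes.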

3. Project 2: Rust + Vulkan Rendering Pipeline

3.1 How to Present It

This is your main project for connecting to CUDA/GPU profiling.

I built a Rust + Vulkan rendering pipeline and used RenderDoc/Nsight to study GPU workload behavior, resource utilization, memory behavior, and performance-quality tradeoffs. Although it was not CUDA production work, the transferable skills are GPU timeline analysis, resource bottleneck isolation, memory/layout reasoning, and profiling-driven optimization.

3.2 System Diagram

flowchart TB
    Scene[Scene Data] --> CPU[CPU Scene Prep]
    CPU --> Upload[Buffer / Texture Upload]
    Upload --> GPU[GPU Pipeline]
    GPU --> Pass1[Geometry Pass]
    GPU --> Pass2[Lighting Pass]
    GPU --> Pass3[Post Process]
    GPU --> Present[Present]

    GPU --> Metrics[GPU Time / Resource Usage / Memory Behavior]
    Metrics --> Tools[RenderDoc / Nsight]
    Tools --> Optimization[Culling / LOD / Layout / Pass Optimization]

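3.3 Likely Follow-Up Questions

Follow-up question	Direction of the answer
What transfers from Vulkan to CUDA?	GPU execution, memory locality, async commands, profiling, resource bottlenecks.
How do you locate a GPU bottleneck?	Start with the timeline, then narrow down to the pass/kernel, and finally look at memory/resource/stall behavior.
What do you look at in Nsight?	GPU timeline, draw/dispatch, copy, sync, GPU idle.
Where is the gap versus CUDA?	CUDA kernel metrics are finer-grained; I still need to build up Nsight Compute, warp-level analysis, and memory coalescing.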
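
The "timeline first, then pass" step can be illustrated with a small ranking over per-pass GPU timings. This is a sketch under stated assumptions: the timings are taken to be already exported from a profiler into a plain dictionary, and `rank_passes` and the pass names are hypothetical; it does not call any RenderDoc or Nsight API.

```python
from typing import Dict, List, Tuple


def rank_passes(
    frame_ms: float, pass_ms: Dict[str, float]
) -> Tuple[List[Tuple[str, float, float]], float]:
    """Rank passes by GPU time and estimate idle time in the frame.

    Returns (ranked, idle_ms), where each ranked entry is
    (pass_name, milliseconds, share_of_frame).
    """
    if frame_ms <= 0:
        raise ValueError("frame_ms must be positive")
    busy_ms = sum(pass_ms.values())
    # Time in the frame not covered by any measured pass; clamp at zero
    # because overlapping passes can make the sum exceed the frame time.
    idle_ms = max(frame_ms - busy_ms, 0.0)
    ranked = sorted(pass_ms.items(), key=lambda item: item[1], reverse=True)
    return [(name, ms, ms / frame_ms) for name, ms in ranked], idle_ms
```

The largest share tells you which pass to open in the profiler next, and a large idle figure suggests the bottleneck is upload or synchronization rather than shader work.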

3.4 Do Not Overstate

Do not say:

I have CUDA kernel production ownership.

Say instead:

My production GPU-adjacent work was Vulkan/WebGPU rather than CUDA kernel ownership, but the profiling and bottleneck isolation mindset is transferable. I am closing the CUDA-specific gap with focused CUDA/Nsight experiments.

4. Project 3: Diamond Renderer / WebGPU Compute

4.1 How to Present It

This project is a good way to show that you can handle GPU compute, numerical behavior, and performance-quality tradeoffs.

flowchart LR
    Input[Geometry / Material] --> Shader[WebGPU Shader]
    Shader --> Physics[Refraction / Dispersion / TIR]
    Physics --> Render[Real-time Result]
    Render --> Tradeoff[Quality vs Performance]
    Tradeoff --> Profile[GPU Timing / Visual Validation]

Interview connection points:

JD point	Project connection
GPU programming models	WebGPU shader/compute mental model
performance-quality tradeoff	real-time rendering constraints
profiling	GPU timing and workload analysis
numerical behavior	optics simulation correctness

5. Project 4: High-Concurrency Communication Systems

This is your key non-GPU project for connecting to AI networking.

flowchart TB
    Client[Clients] --> Gateway[Gateway / Connection Management]
    Gateway --> Router[Routing]
    Router --> Service[Backend Services]
    Service --> Queue[Kafka / Queue]
    Queue --> Worker[Workers]
    Worker --> Result[Response / Events]

    Gateway --> Metrics[Latency / P99 / Connection Count]
    Queue --> Backpressure[Backpressure]
    Metrics --> Debug[Hot-path / Tail Latency Debug]

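Likely follow-up questions:

Follow-up question	What to cover
How do you support 100k connections?	Connection lifecycle, event loop/thread model, backpressure, routing, observability.
How do you guarantee sub-50ms?	Critical path, queueing, hot path, native acceleration, metric breakdown.
How does this relate to GPU networking?	Both are about the data path, tail latency, backpressure, transport, and observability; only the hardware layer differs.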
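
Backpressure is easier to explain with a concrete mechanism. The sketch below shows one simple form, a bounded inbox that rejects new work when full instead of letting the queue grow without limit. `BoundedInbox` and its method names are hypothetical and are not taken from the original system; a real gateway might instead block the producer or slow reads from the socket.

```python
import queue
from typing import Any, List


class BoundedInbox:
    """Fixed-capacity inbox that sheds load when full (illustrative)."""

    def __init__(self, capacity: int) -> None:
        self._queue: "queue.Queue[Any]" = queue.Queue(maxsize=capacity)
        self.rejected = 0  # Count of items refused because the inbox was full.

    def offer(self, item: Any) -> bool:
        """Try to enqueue without blocking; return False if full."""
        try:
            self._queue.put_nowait(item)
            return True
        except queue.Full:
            self.rejected += 1
            return False

    def drain(self, max_items: int) -> List[Any]:
        """Remove up to `max_items` items in FIFO order."""
        items: List[Any] = []
        while len(items) < max_items:
            try:
                items.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return items
```

Bounding the queue caps the queueing delay any accepted request can see, which is what keeps tail latency under control; the `rejected` counter is the signal to expose in metrics so overload is visible rather than silent.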

6. Project 5: Advantest Semiconductor ATE Software

This is your project for connecting to HW/SW joint debugging.

flowchart LR
    TestPlan[Test Plan] --> Control[ATE Control Software]
    Control --> Device[Device Under Test]
    Device --> Measure[Measurement Data]
    Measure --> Diagnose[Diagnostic Tooling]
    Diagnose --> Engineer[HW / Validation Engineer]
    Engineer --> Fix[Workflow / Calibration / Software Fix]

Interview connections:

JD point	Project connection
hardware features	Analyzing anomalous results together with hardware engineers
correctness-sensitive systems	measurement correctness / repeatability
diagnostics	structured debugging / issue closure
system architecture	test workflow orchestration

7. Project 6: NRI Quant / Compute / OpenCV Pipelines

This is your project for connecting to C++/Python compute pipelines, correctness, and traceability.

flowchart TB
    Data[Raw Data] --> Clean[Cleaning]
    Clean --> Feature[Feature Computation]
    Feature --> Model[Model / Backtest]
    Model --> Sim[Execution Simulation]
    Sim --> Eval[Evaluation]
    Eval --> Report[Reporting / Traceability]

    Image[Image Data] --> OpenCV[OpenCV Pipeline]
    OpenCV --> Classify[Classification / Recognition]

Interview connections:

JD point	Project connection
Python/C++	compute pipelines
algorithm design	feature/model/evaluation
correctness	traceability/backtesting
performance	batch computation / numerical processing

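8. Master Template for Presenting a Project

Present every project in this format:

1. Context:
   What production or system problem the project solved.

2. Architecture:
   Data flow, control flow, key modules.

3. My role:
   What I was responsible for, without overstating.

4. Bottleneck:
   What the performance, correctness, stability, or observability problem was.

5. Approach:
   How I located the problem, designed the fix, and verified it.

6. Result:
   Metrics, stability, reproducibility, engineering benefit.

7. NVIDIA relevance:
   How it relates to AI inference, GPU profiling, networking/data movement, and architecture.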