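Sequence 03 - CV Project Walkthroughs and Deep-Dive Preparation
The interviewer will almost certainly ask you to walk through your projects. Frame each project by system capability, not by business domain.
Unified template:
What is the background?
What is the system architecture?
What were you responsible for?
What was the key bottleneck?
How did you locate it?
What design or optimization did you do?
How did you validate the result?
What pitfalls did you hit?
How does it relate to the NVIDIA JD?
1. Project overview diagram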
flowchart TB
CV[CV Projects] --> P1[AI Risk-Control and LLM Strategy Platform]
CV --> P2[Rust + Vulkan Rendering Pipeline]
CV --> P3[Diamond Renderer / WebGPU Compute]
CV --> P4[High-Concurrency Communication Systems]
CV --> P5[Advantest Semiconductor ATE Software]
CV --> P6[NRI Quant / Compute / OpenCV Pipelines]
P1 --> JD1[AI inference / serving / observability]
P2 --> JD2[GPU profiling / memory / workload behavior]
P3 --> JD3[GPU compute / numerical behavior]
P4 --> JD4[networking / backpressure / tail latency]
P5 --> JD5[HW/SW joint debugging / diagnostics]
P6 --> JD6[C++/Python compute pipeline / correctness]
2. Project 1: AI Risk-Control and LLM Strategy Platform
2.1 How to present it
This is the project closest to the JD. Don't just say you "did AI risk control"; present it as an inference/runtime/system project:
This project built an AI risk-control and LLM strategy platform that connected feature generation, online inference, retrieval-assisted analysis, anomaly detection, and strategy execution. My focus was not only model integration but also the production inference path, observability, replayability, correctness validation, and performance-sensitive workflow design.
2.2 System diagram
flowchart LR
Event[Production Events] --> Feature[Feature Generation]
Feature --> Queue[Queue / Stream]
Queue --> Infer[Online Inference]
Infer --> LLM[LLM / Retrieval / Agent]
LLM --> Strategy[Strategy Engine]
Strategy --> Action[Decision / Risk Action]
Action --> Feedback[Effectiveness Feedback]
Feedback --> Eval[Evaluation / Replay]
Eval --> Feature
Infer --> Obs[Metrics / Traces / Logs]
LLM --> Obs
Strategy --> Obs
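2.3 Likely deep-dive questions
| Follow-up question | What to cover |
|---|---|
| How is the online inference path designed? | Request intake, feature generation, model invocation, strategy execution, feedback loop. |
| How do you control latency? | Caching, batching, async execution, rate limiting, graceful degradation, critical-path profiling. |
| How do you validate correctness? | Replay, shadow traffic, A/B tests, metric alignment, replay of anomalous samples. |
| How does the LLM part go to production? | Retrieval, agent workflow, output validation, strategy-side guardrails. |
| How does this relate to NVIDIA inference? | Same concerns: serving path, latency/throughput, KV/cache, observability. |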
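The replay item in the correctness row can be made concrete with a small sketch. It assumes recorded production requests are stored as `(request_id, features, expected_score)` tuples and that the candidate model is a plain function. `replay_diff`, `toy_model`, the record layout, and the tolerance are hypothetical names chosen for illustration, not the project's actual interfaces.

```python
from typing import Callable, Iterable

# Hypothetical record layout: (request_id, features, expected_score).
Record = tuple[str, dict[str, float], float]


def replay_diff(
    recorded: Iterable[Record],
    candidate: Callable[[dict[str, float]], float],
    tol: float = 1e-6,
) -> list[tuple[str, float, float]]:
    """Re-run recorded requests through `candidate` and return those
    whose score differs from the recorded one by more than `tol`."""
    mismatches = []
    for request_id, features, expected in recorded:
        actual = candidate(features)
        if abs(actual - expected) > tol:
            mismatches.append((request_id, expected, actual))
    return mismatches


def toy_model(features: dict[str, float]) -> float:
    # Stand-in for the real model: a fixed linear score.
    return 0.5 * features["amount"] + 0.25 * features["velocity"]


recorded = [
    ("r1", {"amount": 2.0, "velocity": 4.0}, 2.0),
    ("r2", {"amount": 1.0, "velocity": 0.0}, 0.9),  # stale expectation
]

for request_id, expected, actual in replay_diff(recorded, toy_model):
    print(f"{request_id}: expected {expected}, got {actual}")
```

In an interview, the point to draw out is that replay turns "the new version looks fine" into a list of specific request IDs that changed, which can then be triaged one by one.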
2.4 Reframing in JD language
Upgrade the project vocabulary:
Business AI risk control -> production inference path
Log monitoring -> observability and replayability
Strategy effectiveness -> end-to-end validation
Slow requests -> TTFT/P99/queueing/debug
LLM calls -> serving runtime and scheduling
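To back up the "slow requests -> TTFT/P99/queueing" reframing, it helps to show how one request's latency splits into queue wait, time to first token, and total time. The sketch below assumes each request carries four timestamps from a single clock. `RequestTrace` and its field names are hypothetical, and the percentile uses the nearest-rank definition (other definitions interpolate and give slightly different values).

```python
import math
from dataclasses import dataclass


@dataclass
class RequestTrace:
    # Timestamps in milliseconds from one clock; field names are hypothetical.
    enqueued: float
    started: float
    first_token: float
    finished: float

    @property
    def queue_wait(self) -> float:
        return self.started - self.enqueued

    @property
    def ttft(self) -> float:
        # Time to first token, measured from arrival (includes queueing).
        return self.first_token - self.enqueued

    @property
    def total(self) -> float:
        return self.finished - self.enqueued


def percentile(values, q):
    """Nearest-rank percentile: the smallest sample such that at least
    q percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("percentile of empty sequence")
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(traces, q=99):
    """Return the q-th percentile of each latency component."""
    return {
        "queue_wait": percentile([t.queue_wait for t in traces], q),
        "ttft": percentile([t.ttft for t in traces], q),
        "total": percentile([t.total for t in traces], q),
    }
```

If the P99 of `queue_wait` accounts for most of the P99 of `ttft`, the problem is admission or scheduling rather than model execution, which is the kind of split the JD vocabulary is pointing at.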
3. Project 2: Rust + Vulkan Rendering Pipeline
3.1 How to present it
This is your main project for connecting to CUDA/GPU profiling.
I built a Rust + Vulkan rendering pipeline and used RenderDoc/Nsight to study GPU workload behavior, resource utilization, memory behavior, and performance-quality tradeoffs. Although it was not CUDA production work, the transferable skills are GPU timeline analysis, resource bottleneck isolation, memory/layout reasoning, and profiling-driven optimization.
3.2 System diagram
flowchart TB
Scene[Scene Data] --> CPU[CPU Scene Prep]
CPU --> Upload[Buffer / Texture Upload]
Upload --> GPU[GPU Pipeline]
GPU --> Pass1[Geometry Pass]
GPU --> Pass2[Lighting Pass]
GPU --> Pass3[Post Process]
GPU --> Present[Present]
GPU --> Metrics[GPU Time / Resource Usage / Memory Behavior]
Metrics --> Tools[RenderDoc / Nsight]
Tools --> Optimization[Culling / LOD / Layout / Pass Optimization]
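3.3 Likely deep-dive questions
| Follow-up question | Answer direction |
|---|---|
| What transfers from Vulkan to CUDA? | GPU execution, memory locality, async commands, profiling, resource bottlenecks. |
| How do you locate a GPU bottleneck? | Start with the timeline, then drill into the pass/kernel, and finally look at memory/resource/stall. |
| What do you look at in Nsight? | GPU timeline, draw/dispatch, copy, sync, GPU idle. |
| What is the gap relative to CUDA? | CUDA kernel metrics are finer-grained; still need to fill in Nsight Compute, warps, coalescing. |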
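The "timeline first, then pass/kernel" order can be illustrated with a small ranking step. The sketch assumes per-pass GPU durations have already been exported from a profiler capture into `(frame_index, pass_name, gpu_ms)` tuples. That export format and the `rank_passes` helper are hypothetical; neither RenderDoc nor Nsight is claimed to produce this layout directly.

```python
from collections import defaultdict


def rank_passes(samples):
    """samples: iterable of (frame_index, pass_name, gpu_ms).

    Returns [(pass_name, mean_ms_per_frame, share_of_gpu_time)],
    most expensive pass first.
    """
    totals = defaultdict(float)
    frames = set()
    for frame, name, gpu_ms in samples:
        totals[name] += gpu_ms
        frames.add(frame)
    if not frames:
        return []
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [
        (
            name,
            total / len(frames),
            total / grand_total if grand_total else 0.0,
        )
        for name, total in ranked
    ]


# Illustrative numbers only, not measurements from the project.
samples = [
    (0, "geometry", 2.0), (0, "lighting", 5.0), (0, "post", 1.0),
    (1, "geometry", 2.0), (1, "lighting", 5.0), (1, "post", 1.0),
]

for name, mean_ms, share in rank_passes(samples):
    print(f"{name:10s} {mean_ms:5.2f} ms  {share:6.1%}")
```

Only after the dominant pass is known does it make sense to open the finer-grained memory, resource, or stall counters for that pass.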
3.4 Do not overstate
Don't say:
I have CUDA kernel production ownership.
Say instead:
My production GPU-adjacent work was Vulkan/WebGPU rather than CUDA kernel ownership, but the profiling and bottleneck isolation mindset is transferable. I am closing the CUDA-specific gap with focused CUDA/Nsight experiments.
4. Project 3: Diamond Renderer / WebGPU Compute
4.1 How to present it
This project is a good fit for showing that you can handle GPU compute, numerical behavior, and performance-quality tradeoffs.
flowchart LR
Input[Geometry / Material] --> Shader[WebGPU Shader]
Shader --> Physics[Refraction / Dispersion / TIR]
Physics --> Render[Real-time Result]
Render --> Tradeoff[Quality vs Performance]
Tradeoff --> Profile[GPU Timing / Visual Validation]
Interview connection points:
| JD point | Project connection |
|---|---|
| GPU programming models | WebGPU shader/compute mental model |
| performance-quality tradeoff | real-time rendering constraints |
| profiling | GPU timing and workload analysis |
| numerical behavior | optics simulation correctness |
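For the numerical-behavior row, a CPU-side reference for refraction and total internal reflection (TIR) is a simple thing to validate shader output against. The sketch below applies Snell's law directly. It is not the renderer's shader code, and the refractive index of about 2.417 for diamond is a commonly quoted value used here as an assumption.

```python
import math


def critical_angle(n_inside, n_outside):
    """Incidence angle (radians) beyond which light leaving the denser
    medium is totally internally reflected; None if TIR cannot occur."""
    if n_inside <= n_outside:
        return None
    return math.asin(n_outside / n_inside)


def refract(theta_i, n1, n2):
    """Snell's law: n1 * sin(theta_i) = n2 * sin(theta_t).

    Returns the refraction angle in radians, or None when the ray
    undergoes total internal reflection.
    """
    s = n1 * math.sin(theta_i) / n2
    if s > 1.0:
        return None
    return math.asin(s)


N_DIAMOND = 2.417  # assumed value, roughly the index for yellow light
N_AIR = 1.0

theta_c = critical_angle(N_DIAMOND, N_AIR)
print(f"critical angle: {math.degrees(theta_c):.2f} degrees")

# A ray hitting the inside of a facet at 30 degrees exceeds the
# critical angle, so it stays inside the stone.
print(refract(math.radians(30.0), N_DIAMOND, N_AIR))
```

The small critical angle is why most rays bounce several times inside a diamond, and it gives a concrete invariant to test: every ray beyond that angle must reflect and never refract.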
5. Project 4: High-Concurrency Communication Systems
This is your key non-GPU project for connecting to AI networking.
flowchart TB
Client[Clients] --> Gateway[Gateway / Connection Management]
Gateway --> Router[Routing]
Router --> Service[Backend Services]
Service --> Queue[Kafka / Queue]
Queue --> Worker[Workers]
Worker --> Result[Response / Events]
Gateway --> Metrics[Latency / P99 / Connection Count]
Queue --> Backpressure[Backpressure]
Metrics --> Debug[Hot-path / Tail Latency Debug]
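Interview deep-dive questions:
| Follow-up question | What to cover |
|---|---|
| How do you support 100k connections? | Connection lifecycle, event loop/thread model, backpressure, routing, observability. |
| How do you guarantee sub-50ms? | Critical path, queueing, hot path, native acceleration, metric breakdown. |
| How does this relate to GPU networking? | Both are about the data path, tail latency, backpressure, transport, and observability; only the hardware layer differs. |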
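Backpressure appears in both the diagram and the table, so a minimal demonstration is worth having ready. The sketch uses a bounded `asyncio.Queue`: when the queue is full, `put` suspends the producer until the consumer catches up. The producer/consumer shape, the `None` sentinel, and the capacity of 8 are illustrative assumptions, not the production design.

```python
import asyncio


async def producer(queue, n):
    for i in range(n):
        # Suspends here whenever the queue is full: this is the backpressure.
        await queue.put(i)
    await queue.put(None)  # sentinel: no more items


async def consumer(queue, out, max_depth):
    while True:
        # Track the deepest queue seen, to show that it stays bounded.
        max_depth[0] = max(max_depth[0], queue.qsize())
        item = await queue.get()
        if item is None:
            break
        await asyncio.sleep(0)  # stand-in for real work; yields to the producer
        out.append(item)


async def run(n=100, capacity=8):
    queue = asyncio.Queue(maxsize=capacity)
    out, max_depth = [], [0]
    await asyncio.gather(
        producer(queue, n),
        consumer(queue, out, max_depth),
    )
    return out, max_depth[0]


if __name__ == "__main__":
    items, depth = asyncio.run(run())
    print(len(items), "items processed, max queue depth", depth)
```

The design choice to discuss is bounded versus unbounded queues: an unbounded queue hides overload as growing memory and growing tail latency, while a bounded one pushes the slowdown back to the sender where it can be rate-limited or shed.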
6. Project 5: Advantest Semiconductor ATE Software
This is your project for connecting to HW/SW joint debugging.
flowchart LR
TestPlan[Test Plan] --> Control[ATE Control Software]
Control --> Device[Device Under Test]
Device --> Measure[Measurement Data]
Measure --> Diagnose[Diagnostic Tooling]
Diagnose --> Engineer[HW / Validation Engineer]
Engineer --> Fix[Workflow / Calibration / Software Fix]
Interview connections:
| JD point | Project connection |
|---|---|
| hardware features | Analyzing anomalous results together with hardware engineers |
| correctness-sensitive systems | measurement correctness / repeatability |
| diagnostics | structured debugging / issue closure |
| system architecture | test workflow orchestration |
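The "measurement correctness / repeatability" row can be illustrated with a basic repeatability check: take repeated readings of the same quantity and compare their spread against a limit. This is a generic sketch using the sample standard deviation. The `repeatability` helper, the readings, and the limits are hypothetical and do not reflect any actual ATE procedure.

```python
import statistics


def repeatability(readings, limit):
    """Return (mean, sample standard deviation, passed) for repeated
    readings of one quantity; `passed` is True when the spread is
    within `limit` (same unit as the readings)."""
    if len(readings) < 2:
        raise ValueError("need at least two readings")
    mean = statistics.mean(readings)
    spread = statistics.stdev(readings)
    return mean, spread, spread <= limit


# Illustrative readings in volts, not real measurement data.
readings = [1.00, 1.01, 0.99, 1.00]
mean, spread, ok = repeatability(readings, limit=0.01)
print(f"mean={mean:.4f} V  stdev={spread:.4f} V  within limit: {ok}")
```

When the check fails, the debugging question becomes whether the variation comes from the device, the fixture, or the software sequencing, which is where the joint HW/SW analysis starts.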
7. Project 6: NRI Quant / Compute / OpenCV Pipelines
This is your project for connecting to C++/Python compute pipelines, correctness, and traceability.
flowchart TB
Data[Raw Data] --> Clean[Cleaning]
Clean --> Feature[Feature Computation]
Feature --> Model[Model / Backtest]
Model --> Sim[Execution Simulation]
Sim --> Eval[Evaluation]
Eval --> Report[Reporting / Traceability]
Image[Image Data] --> OpenCV[OpenCV Pipeline]
OpenCV --> Classify[Classification / Recognition]
Interview connections:
| JD point | Project connection |
|---|---|
| Python/C++ | compute pipelines |
| algorithm design | feature/model/evaluation |
| correctness | traceability/backtesting |
| performance | batch computation / numerical processing |
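Traceability in a compute or backtest pipeline usually comes down to being able to say exactly which configuration and which data produced a given report. One common way to do that, sketched below as an assumption rather than a description of the original pipeline, is to attach a deterministic fingerprint to each run. `run_fingerprint` and the config keys are hypothetical.

```python
import hashlib
import json


def run_fingerprint(config, data_bytes):
    """Deterministic SHA-256 fingerprint of a run's configuration and
    input data, suitable for stamping onto reports."""
    # Canonical JSON: sorted keys and fixed separators, so that key
    # order in the dict does not change the fingerprint.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256()
    digest.update(canonical.encode("utf-8"))
    digest.update(b"\x00")  # separator between config and data digest
    digest.update(hashlib.sha256(data_bytes).digest())
    return digest.hexdigest()


config_a = {"window": 20, "fee_bps": 1.5}
config_b = {"fee_bps": 1.5, "window": 20}  # same settings, different key order
data = b"2024-01-02,100.0\n2024-01-03,101.5\n"

print(run_fingerprint(config_a, data))
print(run_fingerprint(config_a, data) == run_fingerprint(config_b, data))
```

Two reports with the same fingerprint should be bit-for-bit reproducible; if they are not, the nondeterminism is in the code path rather than in the inputs, which narrows the correctness investigation.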
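8. Master template for presenting a project
Present every project in this format:
1. Context:
What production/system problem the project solves.
2. Architecture:
Data flow, control flow, key modules.
3. My role:
What I was responsible for, without overstating.
4. Bottleneck:
What the performance/correctness/stability/observability problem was.
5. Approach:
How I located the problem, designed the solution, and validated it.
6. Result:
Metrics, stability, reproducibility, engineering gains.
7. NVIDIA relevance:
How it relates to AI inference, GPU profiling, networking/data movement, and architecture.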