Manager, AI Networking Performance Research and Analysis job opportunity at NVIDIA.



Date posted: 19 days ago
NVIDIA Manager, AI Networking Performance Research and Analysis
Experience: 5+ years
Pattern: full-time

AI Networking Performance Research and Analysis

Degree: General
Location: Yokneam, Israel

NVIDIA is seeking a highly skilled and versatile Performance Research and Analysis Manager to join our Performance Group. This role will drive end-to-end performance strategy and execution for next-generation NVIDIA NIC, Switch, and Networking technologies, spanning the full lifecycle from pre-silicon performance modelling (simulation and emulation) through bring-up, validation, and GA readiness.

The ideal candidate will lead cross-functional performance efforts across multiple teams to evaluate and optimize low-level networking and offload capabilities, including Storage acceleration, Security protocols, NIC pipeline and steering mechanisms, Switch performance, and end-to-end AI Networking cluster-level performance for AI workloads, distributed training, and inference jobs. In addition, this role will play a key leadership part in building scalable telemetry frameworks, performance dashboards, and job-level monitoring solutions to enable continuous performance tracking and root cause analysis across NVIDIA supercomputing environments. The position also includes deep ownership of competitive benchmarking and performance analysis.

You will work closely with a wide range of NVIDIA hardware and software platforms, including HCAs, DPUs, switches, CPUs, GPUs, and full system architectures, across multiple networking stacks and performance-critical software layers.

What you'll be doing:

- Lead performance research and evaluation of advanced networking technologies supporting AI workloads, including LLM training and inference at supercomputing scale.
- Define end-to-end performance test plans and methodology for next-generation Networking HW and networking technologies, including performance expectations and target KPIs.
- Drive benchmarking, profiling, reporting, and deep performance characterization of networking workloads and offload features.
- Collaborate closely with simulation, architecture, chip-design, firmware, and software teams to assess performance tradeoffs and identify bottlenecks.
- Perform deep root cause analysis (RCA) for performance gaps and stability issues, and drive cross-team mitigation plans.
- Develop and enhance performance analysis tools, automation frameworks, and scalable methodologies for cluster-level performance evaluation.
- Own performance observability efforts, including telemetry pipelines, dashboards, and job-level performance analytics.

What we need to see:

- B.Sc. in Computer Science or Software Engineering.
- 5+ years of experience with high-performance Networking technologies (RDMA, Storage, Security, OVS, MPI).
- 3+ years as an engineering team manager.
- Demonstrated performance analysis skills and methodologies.
- Experience with cluster-level performance, telemetry, NICs, DPUs, switches, and GPUs.
- Fast self-learning capabilities with strong analytical and problem-solving skills.
- Programming languages: Python, Bash, and C/C++.
- Experience with Linux OS distros.
- Team player and a leader with good communication and interpersonal skills.

Ways to stand out from the crowd:

- Deep system-level architecture knowledge (Intel / AMD / ARM CPUs, NVIDIA GPUs, HCA/DPU architecture, memory subsystems, PCIe, storage, NVLink).
- Strong expertise in RDMA networking performance and AI communication stacks (e.g., NCCL).
- Proven experience analysing AI workload communication patterns and benchmarking distributed LLM training workloads at scale.
- Experience designing telemetry frameworks, monitoring pipelines, and performance dashboards for large clusters.
- Familiarity with modern AI tooling, including performance-driven agents, automation pipelines, and RAG-based applications.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

#LI-Hybrid
