Senior HPC and AI Networking Performance Research and Analysis Engineer job opportunity at NVIDIA.



Date posted: 18 days ago
NVIDIA Senior HPC and AI Networking Performance Research and Analysis Engineer
Experience: 6+ years
Pattern: Remote

Degree: General
Location: Germany, Remote

NVIDIA is looking for a talented Performance Research and Analysis Engineer to join our Performance group. The ideal candidate will profile and analyze AI workloads on large-scale GPU and CPU clusters for distributed deep learning LLM training and inference, focusing on communication patterns, collective communication, RDMA, networking, and system performance. You will work with many types of HW platforms, such as HCAs, switches, CPUs, GPUs, and systems, as well as with various SW layers and features. You will work with simulators and develop performance analysis tools and methodologies to dive deeply into the details and understand performance expectations, limitations, and bottlenecks as part of the root cause analysis of these jobs.

What you'll be doing:
- Experiment with and research AI workloads and DL models specifically tailored for large-scale deep learning LLM training on NVIDIA supercomputers, with a focus on high-performance networking.
- Benchmark, profile, and analyze performance to find bottlenecks and identify areas for improvement and optimization, with a strong emphasis on networking aspects.
- Implement performance analysis tools.
- Collaborate with many teams, from HW to SW, to provide performance analysis insights.
- Define performance test plans, set performance expectations for new technologies and solutions, and work to reach the performance targets and limits.

What we need to see:
- B.Sc. in Computer Science or Software Engineering.
- 6+ years of experience with high-performance networking (RDMA, MPI, NCCL).
- Demonstrated performance analysis skills and methodologies.
- Experience with NVIDIA GPUs, the CUDA library, and deep learning frameworks such as TensorFlow or PyTorch, combined with expertise in networking collective communication libraries (such as NCCL) and protocols (such as RoCE and RDMA).
- Ability to learn quickly and independently, with strong analytical and problem-solving skills.
- Programming languages: Python, Bash, and C.
- Experience with Linux OS distros.
- Team player with good communication and interpersonal skills.

Ways to stand out from the crowd:
- In-depth knowledge of and experience with AI workload benchmarking for distributed LLM training, CUDA, and NCCL libraries.
- In-depth system knowledge and understanding (Intel / AMD / ARM CPUs, NVIDIA GPUs, HCA, memory, PCI).
- Knowledge of congestion control algorithms.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
