AI Inference Engineer job opportunity at Quadric Inc.



Posted: 2025-11-24
Quadric Inc. AI Inference Engineer
Experience: 5+ years
Type: Full-time

Degree: General
Location: Burlingame, United States

Quadric has created an innovative general-purpose neural processing unit (GPNPU) architecture. Quadric's co-optimized software and hardware is targeted to run neural network (NN) inference workloads in a wide variety of edge and endpoint devices, ranging from battery-operated smart-sensor systems to high-performance automotive or autonomous vehicle systems. Unlike other NPUs or neural network accelerators in the industry today, which can only accelerate a portion of a machine learning graph, the Quadric GPNPU executes both NN graph code and conventional C++ DSP and control code.

Role

The AI Inference Engineer at Quadric is the key bridge between the world of AI/LLM models and Quadric's unique platform. The AI Inference Engineer will (1) port AI models to the Quadric platform, (2) optimize model deployment for efficient inference, and (3) profile and benchmark model performance. This senior technical role demands deep knowledge of AI model algorithms, system architecture, and AI toolchains/frameworks.

This California Bay Area role follows a hybrid schedule, with at least two in-office days per week at our Burlingame office, the ability to commute regularly, and occasional additional onsite days as needed based on team and business priorities. The team and company also gather periodically for onsite meetings and offsite events, which are valued opportunities to connect, collaborate, and align.

Responsibilities

- Quantize, prune, and convert models for deployment
- Port models to the Quadric platform using the Quadric toolchain
- Optimize inference deployment for latency and speed
- Benchmark and profile model performance and accuracy
- Collaborate across related areas of the AI inference stack to support team and business priorities
- Develop tools to scale and speed up deployment
- Improve the SDK and runtime
- Provide technical support and documentation to customers and the developer community

Qualifications

- Bachelor's or Master's degree in Computer Science and/or Electrical Engineering
- 5+ years of experience with AI/LLM model inference and deployment frameworks/tools
- Experience with model quantization (PTQ, QAT) and related tools
- Experience with model accuracy measurement
- Experience with model inference performance profiling
- Experience with at least one of the following frameworks: onnxruntime, PyTorch, vLLM, huggingface-transformers, neural-compressor, llama.cpp
- Proficiency in C/C++ and Python
- Strong problem-solving, debugging, and communication skills
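For candidates unfamiliar with the quantization work mentioned above, the core idea behind post-training quantization (PTQ) can be sketched in a few lines. This is a toy, framework-free illustration of symmetric per-tensor int8 quantization, not Quadric's toolchain or any specific framework API:

```python
# Toy sketch of symmetric per-tensor int8 post-training quantization.
# Real toolchains add per-channel scales, calibration, and fused kernels.

def quantize_int8(values):
    """Map floats to int8 codes using a single symmetric scale."""
    scale = max(abs(v) for v in values) / 127 or 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float values from int8 codes."""
    return [c * scale for c in codes]

weights = [0.8, -1.2, 0.05, 0.0]
codes, scale = quantize_int8(weights)
approx = dequantize(codes, scale)
max_err = max(abs(a - b) for a, b in zip(weights, approx))
print(codes)     # [85, -127, 5, 0]
print(max_err <= scale)  # quantization error bounded by one step
```

The accuracy-measurement and profiling duties listed above amount to tracking how errors like `max_err` propagate through a full network and how the int8 representation affects latency on the target hardware.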
