Hello! I am Saurabh Kumar Singh, a Senior Consultant at Deloitte, Bangalore. I am an AI/ML application developer with 5+ years of experience in end-to-end development and deployment, with strong hands-on experience across GenAI, Computer Vision, and ML infrastructure.

I specialize in developing end-to-end GenAI RAG engines and document processing pipelines for enterprise applications, incorporating Knowledge Graphs, Vector Databases, and NLP techniques. I am also proficient in ML model optimization and production deployment, including LLM fine-tuning, Triton Inference Server optimization, model distillation and quantization, and scalable inference pipelines on cloud-native platforms.

What I Do

🤖

Generative AI & LLMs

Building enterprise RAG engines, AI agents, and chatbots using OpenAI, Gemini, and open-source LLMs with LangChain and LlamaIndex.

📊

ML Optimization & Serving

Triton Inference Server, model distillation/quantization, ONNX/TensorRT optimization, and scalable inference pipelines on cloud-native platforms.

👁️

Computer Vision & Edge AI

Intelligent Video Analytics on NVIDIA DeepStream & Jetson. Object detection, tracking, and real-time inference at scale.

🔗

Knowledge Graphs & Vector DBs

Designing hybrid retrieval systems with Neo4j, ChromaDB, Pinecone, and graph-augmented RAG for enterprise search.

Tech Stack

Python, PyTorch, Transformers, BERT, LangChain, LlamaIndex, OpenAI / Gemini APIs, RAG, Triton Inference Server, TensorRT / ONNX, CUDA, DeepStream, FastAPI, Docker, GCP / AWS, ClearML / MLflow, VectorDBs, Knowledge Graphs

Let's Connect