Saurabh Kumar Singh

Hey there! I’m Saurabh Kumar Singh 😊.

I work as a Senior Consultant @ Deloitte, Bangalore. I completed my B.Tech in Computer Science and Engineering @ IIIT Naya Raipur 🎓👨‍🎓.

AI/ML Application Developer with 5+ years of experience in cutting-edge AI technologies, skilled in end-to-end development and deployment, with strong hands-on work across GenAI, Computer Vision, and ML infrastructure.
Specialized in developing end-to-end GenAI RAG engines and Document Processing pipelines for enterprise applications, incorporating Knowledge Graphs, Vector Databases, and NLP techniques with OpenAI/Gemini APIs and open-source LLMs, employing Prompt Engineering best practices.
Proficient in ML model optimization and production deployment — including LLM fine-tuning, Triton Inference Server optimization, model distillation/quantization, and scalable inference pipelines on cloud-native platforms.
Hands-on experience in developing Intelligent Video Analytics applications and building production-grade services solving complex computer vision and machine learning problems with NVIDIA’s technology stack.

Research Interests

Generative AI & LLM Application Development
ML Model Optimization & Inference Serving
Intelligent Video Analytics
Deep Learning and Computer Vision

Work Experience

Deloitte (Senior Consultant), Bangalore (Aug 2024 — Present)
- Retail Client Staffing (Apr 2025 – Present): Contributing to feature development and production support for a large-scale ML pipeline with direct impact on listing and search relevance. Implemented NLP model inference optimization on Triton Inference Server, and worked on Computer Vision and NLP model development, Triton Inference Server deployment, and performance benchmarking. Built and maintained backend pipelines on cloud-native infrastructure. Supporting other stakeholders across the team.
- Document Processing & RAG System (Tire Manufacturing Client, Sep 2024 – Mar 2025): Developed and refined FastAPI-based backend services for an advanced Document Processing and RAG system, delivering key services including UserService, FileUpload, S3 Monitoring/Ingestion, Chat, and Feedback. Integrated NLP techniques (keyword extraction, semantic chunking, and advanced evaluation metrics) to automate extraction, indexing, and summarization of diverse departmental documents. Stabilized the system through UAT and accelerated production readiness.
- Created two capability showcase POCs for a GenAI chatbot service handling unstructured documents.
JK Tech (Senior Consultant - Data Science), Bangalore (Jan 2023 — Jul 2024)
- JIVA-EKE: Developed end-to-end GenAI RAG engines for enterprise search applications on structured and unstructured data, utilizing llama-index agents, tools, and retrievers, integrated with Knowledge Graphs and VectorDBs.
- JEKA: Developed a Persona-based RAG platform leveraging Prompt engineering techniques along with OpenAI APIs and Open Source LLMs for context-aware responses.
- JARVIS: An accelerated ML development platform based on ClearML, ensuring faster ROI for AIML projects.
- Led the LLM R&D Group, exploring the latest GenAI developments.
Mavenir (R&D Engineer - IVA), Bangalore (Apr 2021 — Jan 2023)
- IVA Platform: Played a pivotal role in developing the AI-backend for an IVA Platform built on NVIDIA’s technology stack, writing DeepStream Python applications to address object detection challenges and enhance video analytics.
- Conducted model training with custom data using Transfer Learning (TAO) for DetectNetV2, YOLOv4, SSD, and Darknet architectures.
Euclid Innovations (AI Software Developer), Hyderabad (Jan 2021 — Apr 2021)
- Worked on end-to-end Dockerization of the project, researching cutting-edge technology by NVIDIA and Facebook, building POCs and working with DeepStream on high-end dGPU systems.
- Reduced latency by implementing H.264 encoding of inference frames from JPEG frames and optimizing the existing pipeline.
Smartcow.ai (IVA Engineer), Hyderabad (Jan 2020 — Oct 2020)
- Collaborated within a team to develop multiple Intelligent Video Analytics applications for Jetson devices and dGPUs.
- Demonstrated proficiency in GStreamer, glib, OpenCV, CUDA, TensorRT, and nvprof to optimize video processing and analysis.

Publications

Automatic Generation of Chest X-Ray Medical Imaging Reports using LSTM-CNN — International Conference on DSMLAI, Windhoek, Namibia (Aug 2022)
Affordable AI at the Edge on NVIDIA’s Jetson Ecosystem — Major Project Thesis, IIIT Naya Raipur (July 2020)

Technical Skills

Languages: Python, C/C++, Bash
ML & AI: PyTorch, TensorFlow, Transformers, BERT, LLMs, PEFT/qLoRA, Model Distillation/Quantization, ONNX/TensorRT, Computer Vision, Object Detection (YOLOv4/v8, DetectNetV2, SSD)
MLOps & Serving: Triton Inference Server, TorchServe, ClearML, MLflow, Docker, FastAPI, REST API
GenAI & RAG: Langchain, llama-index, VectorDBs, Knowledge GraphDBs, OpenAI/Gemini APIs, Prompt Engineering, Semantic Chunking
Infrastructure: NVIDIA DeepStream, TLT/TAO, GStreamer, CUDA, NVIDIA Jetson, GCP, AWS, Cloud-native Deployment

Scholastic Achievements

2019 Industry Academia Meet-2019, IIITNR (Secured Runner-up position)
2015 Medhavi Chhatra Samman (Secured 8th Rank in State in Intermediate)

In my spare time, I love to:

Play online multiplayer mobile games 🎮 like Call Of Duty 💣 and Clash Of Clans 💎.
Travel to new places and connect with new people. 🧳
Explore food and music, and go on solo bike rides. 🎧 + 😋 + 🏍️
Write about what I have learned, so I am making a habit of writing blogs.