Hey there! I’m Saurabh Kumar Singh 😊.
I work as a Senior Consultant @ Deloitte, Bangalore. I completed my B.Tech in Computer Science and Engineering @ IIIT Naya Raipur 🎓👨‍🎓.
- AI/ML Application Developer with 5+ years of experience in end-to-end development and deployment, with strong hands-on work across GenAI, Computer Vision, and ML infrastructure.
- Specialized in developing end-to-end GenAI RAG engines and Document Processing pipelines for enterprise applications, incorporating Knowledge Graphs, Vector Databases, and NLP techniques with OpenAI/Gemini APIs and open-source LLMs, employing Prompt Engineering best practices.
- Proficient in ML model optimization and production deployment — including LLM fine-tuning, Triton Inference Server optimization, model distillation/quantization, and scalable inference pipelines on cloud-native platforms.
- Hands-on experience in developing Intelligent Video Analytics applications and building production-grade services solving complex computer vision and machine learning problems with NVIDIA’s technology stack.
Research Interests
- Generative AI & LLM Application Development
- ML Model Optimization & Inference Serving
- Intelligent Video Analytics
- Deep Learning and Computer Vision
Work Experience
- Deloitte (Senior Consultant), Bangalore (Aug 2024 – Present)
  - Catalog Content Quality Scoring (Retail Client, Apr 2025 – Present): Developing new features and resolving production bugs for a Catalog Content Quality Scoring system with high impact on search capability. Owned end-to-end deployment of the scoring services and implemented inference optimization for NLP models using Triton Inference Server. Worked end-to-end on the Image Quality Scoring Service and the Content Mismatch Scoring Service, including model building, Triton deployment, and performance benchmarking. Built end-to-end backend pipelines and deployed them on a cloud-native platform.
  - Document Processing & RAG System (Tire Manufacturing Client, Sep 2024 – Mar 2025): Developed and refined FastAPI-based backend services for an advanced Document Processing and RAG system, delivering key services including UserService, FileUpload, S3 Monitoring/Ingestion, Chat, and Feedback. Integrated NLP techniques (keyword extraction, semantic chunking, and advanced evaluation metrics) to automate extraction, indexing, and summarization of diverse departmental documents. Stabilized the system through UAT and accelerated production readiness.
  - Created two capability showcase POCs for a GenAI chatbot service handling unstructured documents.
- JK Tech (Senior Consultant - Data Science), Bangalore (Jan 2023 – Jul 2024)
  - JIVA-EKE: Developed end-to-end GenAI RAG engines for enterprise search applications on structured and unstructured data, utilizing llama-index agents, tools, and retrievers, integrated with Knowledge Graphs and VectorDBs.
  - JEKA: Developed a persona-based RAG platform leveraging prompt engineering techniques along with OpenAI APIs and open-source LLMs for context-aware responses.
  - JARVIS: An accelerated ML development platform based on ClearML, aimed at faster ROI for AI/ML projects.
  - Led the LLM R&D Group, exploring the latest GenAI developments.
- Mavenir (R&D Engineer - IVA), Bangalore (Apr 2021 – Jan 2023)
  - IVA Platform: Played a pivotal role in developing the AI backend for an IVA Platform built on NVIDIA’s technology stack, writing DeepStream Python applications to address object detection challenges and enhance video analytics.
  - Conducted model training with custom data using Transfer Learning (TAO) for DetectNetV2, YOLOv4, SSD, and Darknet architectures.
- Euclid Innovations (AI Software Developer), Hyderabad (Jan 2021 – Apr 2021)
  - Worked on end-to-end Dockerization of the project, researching cutting-edge technology by NVIDIA and Facebook, building POCs, and working with DeepStream on high-end dGPU systems.
  - Reduced latency by implementing H.264 encoding of inference frames from JPEG frames and optimizing the existing pipeline.
- Smartcow.ai (IVA Engineer), Hyderabad (Jan 2020 – Oct 2020)
  - Collaborated within a team to develop multiple Intelligent Video Analytics applications for Jetson devices and dGPUs.
  - Used GStreamer, GLib, OpenCV, CUDA, TensorRT, and nvprof to optimize video processing and analysis.
Publications
- Automatic Generation of Chest X-Ray Medical Imaging Reports using LSTM-CNN — International Conference on DSMLAI, Windhoek, Namibia (Aug 2022)
- Affordable AI at the Edge on NVIDIA’s Jetson Ecosystem — Major Project Thesis, IIIT Naya Raipur (July 2020)
Technical Skills
- Languages: Python, C/C++, Bash
- ML & AI: PyTorch, TensorFlow, Transformers, BERT, LLMs, PEFT/qLoRA, Model Distillation/Quantization, ONNX/TensorRT, Computer Vision, Object Detection (YOLOv4/v8, DetectNetV2, SSD)
- MLOps & Serving: Triton Inference Server, TorchServe, ClearML, MLflow, Docker, FastAPI, REST API
- GenAI & RAG: LangChain, llama-index, VectorDBs, Knowledge GraphDBs, OpenAI/Gemini APIs, Prompt Engineering, Semantic Chunking
- Infrastructure: NVIDIA DeepStream, TLT/TAO, GStreamer, CUDA, NVIDIA Jetson, GCP, AWS, Cloud-native Deployment
Scholastic Achievements
- 2019: Industry Academia Meet 2019, IIITNR (secured runner-up position)
- 2015: Medhavi Chhatra Samman (secured 8th rank in the state in Intermediate)
In my spare time, I love to:
- Play online multiplayer mobile games 🎮 like Call of Duty 💣 and Clash of Clans 💎.
- Travel to new places and connect with new people. 🧳
- Explore food, music, and solo bike riding. 🎧 + 😋 + 🏍️
- Write about what I have learned, so I am making a habit of writing blogs.