Hey there! I’m Saurabh Kumar Singh 😊.

I work as a Senior Consultant @ Deloitte, Bangalore. I completed my B.Tech in Computer Science and Engineering @ IIIT Naya Raipur 🎓👨‍🎓.

  • AI/ML Application Developer with 5+ years of experience in end-to-end development and deployment, with strong hands-on work across GenAI, Computer Vision, and ML infrastructure.
  • Specialized in developing end-to-end GenAI RAG engines and Document Processing pipelines for enterprise applications, incorporating Knowledge Graphs, Vector Databases, and NLP techniques with OpenAI/Gemini APIs and open-source LLMs, employing Prompt Engineering best practices.
  • Proficient in ML model optimization and production deployment — including LLM fine-tuning, Triton Inference Server optimization, model distillation/quantization, and scalable inference pipelines on cloud-native platforms.
  • Hands-on experience in developing Intelligent Video Analytics applications and building production-grade services solving complex computer vision and machine learning problems with NVIDIA’s technology stack.

Research Interests

  • Generative AI & LLM Application Development
  • ML Model Optimization & Inference Serving
  • Intelligent Video Analytics
  • Deep Learning and Computer Vision

Work Experience


  • Deloitte (Senior Consultant), Bangalore (Aug 2024 — Present)
    • Catalog Content Quality Scoring (Retail Client, Apr 2025 to Present): Developing new features and resolving production bugs for a Catalog Content Quality Scoring system that contributes to enhancing search capability. Owned end-to-end deployment of the scoring services and optimized NLP model inference using Triton Inference Server. Worked on the Image Quality Scoring Service and Content Mismatch Scoring Service end-to-end, including model building, Triton deployment, and performance benchmarking. Built end-to-end backend pipelines and deployed them on a cloud-native platform.
    • Document Processing & RAG System (Tire Manufacturing Client, Sep 2024 to Mar 2025): Developed and refined FastAPI-based backend services for an advanced Document Processing and RAG system, delivering key services including UserService, FileUpload, S3 Monitoring/Ingestion, Chat, and Feedback. Integrated NLP techniques (keyword extraction, semantic chunking, and advanced evaluation metrics) to automate extraction, indexing, and summarization of diverse departmental documents. Stabilized the system through UAT and accelerated production readiness.
    • Created two capability showcase POCs for a GenAI chatbot service handling unstructured documents.
  • JK Tech (Senior Consultant - Data Science), Bangalore (Jan 2023 — Jul 2024)
    • JIVA-EKE: Developed end-to-end GenAI RAG engines for enterprise search applications on structured and unstructured data, utilizing llama-index agents, tools, and retrievers, integrated with Knowledge Graphs and VectorDBs.
    • JEKA: Developed a Persona-based RAG platform leveraging Prompt engineering techniques along with OpenAI APIs and Open Source LLMs for context-aware responses.
    • JARVIS: An accelerated ML development platform based on ClearML, ensuring faster ROI for AIML projects.
    • Led the LLM R&D Group, exploring the latest GenAI developments.
  • Mavenir (R&D Engineer - IVA), Bangalore (Apr 2021 — Jan 2023)
    • IVA Platform: Played a pivotal role in developing the AI-backend for an IVA Platform built on NVIDIA’s technology stack, writing DeepStream Python applications to address object detection challenges and enhance video analytics.
    • Conducted model training with custom data using Transfer Learning (TAO) for DetectNetV2, YoloV4, SSD, and Darknet architectures.
  • Euclid Innovations (AI Software Developer), Hyderabad (Jan 2021 — Apr 2021)
    • Worked on end-to-end Dockerization of the project, researched technologies from NVIDIA and Facebook, built POCs, and worked with DeepStream on high-end dGPU systems.
    • Reduced latency by encoding inference frames as H.264 instead of JPEG frames and by optimizing the existing pipeline.
  • Smartcow.ai (IVA Engineer), Hyderabad (Jan 2020 — Oct 2020)
    • Collaborated within a team to develop multiple Intelligent Video Analytics applications for Jetson devices and dGPUs.
    • Used GStreamer, GLib, OpenCV, CUDA, TensorRT, and nvprof to optimize video processing and analysis.

Publications

  • Automatic Generation of Chest X-Ray Medical Imaging Reports using LSTM-CNN — International Conference on DSMLAI, Windhoek, Namibia (Aug 2022)
  • Affordable AI at the Edge on NVIDIA’s Jetson Ecosystem — Major Project Thesis, IIIT Naya Raipur (July 2020)

Technical Skills

  • Languages: Python, C/C++, Bash
  • ML & AI: PyTorch, TensorFlow, Transformers, BERT, LLMs, PEFT/QLoRA, Model Distillation/Quantization, ONNX/TensorRT, Computer Vision, Object Detection (YOLOv4/v8, DetectNetV2, SSD)
  • MLOps & Serving: Triton Inference Server, TorchServe, ClearML, MLflow, Docker, FastAPI, REST API
  • GenAI & RAG: LangChain, llama-index, VectorDBs, Knowledge GraphDBs, OpenAI/Gemini APIs, Prompt Engineering, Semantic Chunking
  • Infrastructure: NVIDIA DeepStream, TLT/TAO, GStreamer, CUDA, NVIDIA Jetson, GCP, AWS, Cloud-native Deployment

Scholastic Achievements

  • 2019: Industry Academia Meet-2019, IIITNR (Secured Runner-up position)
  • 2015: Medhavi Chhatra Samman (Secured 8th rank in the state in Intermediate)

In my spare time, I love to:

  • Play online multiplayer mobile games 🎮 like Call of Duty 💣 and Clash of Clans 💎.
  • Travel to new places and connect with new people. 🧳
  • Explore food and music, and go on solo bike rides. 🎧 + 😋 + 🏍️
  • Write about what I learn, so I am making a habit of blogging.