Data Engineer @ Infosys Limited, Pune, India August 2021 – October 2022
Designed and developed ETL pipelines in Azure Data Factory to extract data from SAP ERP and CRM sources. Transformed the data with PySpark on Azure Databricks, applying complex business logic before loading it into Azure Data Warehouse. Integrated the processed data into Power BI for Business Intelligence (BI) reporting. Streamlined data pipelines, cutting ETL completion time from 17 hours to 5 hours, a more than 3x speedup. Collaborated within an Agile team, consistently delivering sprint targets.
Software Developer @ Kamusi Project, USA (Remote) August 2020 – July 2021
Developed Python and Cypher scripts to clean and import data from various universities into a Neo4j graph database, managing over 7,000 interconnected languages for translation. Redesigned the database schema and optimized search queries to execute in under 15 ms, a more than 65x performance improvement. Ensured smooth operation of Linux servers and automated system snapshot backups and recovery with Bash scripts. Migrated the development environment from Vagrant to Docker containers.
Deep Learning Research Intern @ AjnaLens, Mumbai, India June 2020 – July 2020
Integrated TensorFlow models into Android apps using TensorFlow Lite to prototype and benchmark deep learning image recognition models. Researched Java and C++ optimizations to improve on-device inference performance of machine learning models on embedded hardware, significantly reducing memory usage.
SecureRAG: Self-Hosted AI Agent for Secure Document Querying | LangChain, Python July 2024 – Ongoing
Developed a fully self-hosted LLM system using the Llama 3.1 8B model to query a knowledge base of sensitive documents via Retrieval-Augmented Generation (RAG). The system runs efficiently on a single GPU with accuracy comparable to OpenAI’s GPT-3.5 Turbo, enabling companies, development teams, and research groups to interact securely with confidential documents without relying on third-party cloud services, while remaining scalable and customizable for long-term use.
AWS-Powered Data Pipeline: Analyzing the Steam Dataset | Python, Pandas, AWS Services July 2024 – Ongoing
Developed a complete ETL pipeline using AWS Glue to process a large Steam dataset, extracting key metrics and correlations. Used AWS S3 to securely store large dataset files and DynamoDB to manage structured data, ensuring scalable and efficient data handling. Implemented AWS Lambda functions to query DynamoDB and return dynamic data visualizations via API Gateway. Orchestrated these AWS services into an end-to-end cloud solution while uncovering insights into user buying habits through interactive Plotly Express visualizations.
Grand Finalist in Smart India Hackathon | Python, NumPy, Pandas, TensorFlow, TensorFlow.js June 2020
Developed a Chrome extension using TensorFlow.js to measure student engagement in online lectures, reaching the Grand Finale of the Smart India Hackathon. The extension adhered to strict user privacy protocols, ensuring webcam data never left the user’s system. Optimized the machine learning model for real-time inference at 60 frames per second, achieving smooth performance even on older hardware without significant system impact.