Data Engineer @ Infosys Limited, Pune, India August 2021 – October 2022
Designed and developed ETL pipelines in Azure Data Factory to extract data from SAP ERP and CRM sources. Transformed the data with PySpark on Azure Databricks, applying complex business logic before loading it into Azure Data Warehouse. Integrated the processed data into Power BI for Business Intelligence (BI) reporting. Streamlined data pipelines, cutting ETL completion time from 17 hours to 5 hours, a more than 3x speedup. Collaborated within an Agile team, consistently delivering sprint targets.
Software Developer @ Kamusi Project, USA (Remote) August 2020 – July 2021
Developed Python and Cypher scripts to clean and import data from various universities into a Neo4j graph database, managing over 7,000 interconnected languages for translation. Redesigned the database schema and optimized search queries to execute in under 15 ms, a more than 65x performance improvement. Ensured smooth operation of Linux servers and automated system snapshot backups and recovery with Bash scripts. Migrated the development environment from Vagrant to Docker containers.
Deep Learning Research Intern @ AjnaLens, Mumbai, India June 2020 – July 2020
Integrated TensorFlow models into Android apps using TensorFlow Lite to prototype and benchmark deep learning image recognition models. Researched Java and C++ optimizations to improve on-device inference performance of machine learning models on embedded hardware, significantly reducing memory usage.
SecureRAG: Self-Hosted AI Agent for Secure Document Querying | LangChain, Python July 2024 – Ongoing
Developed a fully self-hosted LLM system using the Llama 3.1 8B model to query a knowledge base of sensitive documents via Retrieval-Augmented Generation (RAG). The system runs efficiently on a single GPU with accuracy comparable to OpenAI’s GPT-3.5 Turbo, enabling companies, development teams, and research groups to interact securely with confidential documents without relying on third-party cloud services, while remaining scalable and customizable for long-term use.
AWS-Powered Data Pipeline: Analyzing the Steam Dataset | Python, Pandas, AWS Services July 2024 – Ongoing
Developed a complete ETL pipeline using AWS Glue to process a large Steam dataset, extracting key metrics and correlations. Used AWS S3 to securely store large dataset files and DynamoDB to manage structured data, ensuring scalable and efficient data handling. Implemented AWS Lambda functions to query DynamoDB and return dynamic data visualizations via API Gateway. Orchestrated these AWS services into an end-to-end cloud solution while uncovering insights into user buying habits through interactive Plotly Express visualizations.
Grand Finalist in Smart India Hackathon | Python, NumPy, Pandas, TensorFlow, TensorFlow.js June 2020
Developed a Chrome extension using TensorFlow.js to measure student engagement in online lectures, reaching the Grand Finale of the Smart India Hackathon. The extension adhered to strict user privacy protocols, ensuring webcam data never left the user’s system. Optimized the machine learning model for real-time inference at 60 frames per second, achieving smooth performance even on older hardware without significant system impact.