Hi, I'm Ronak 👋
I build scalable data pipelines, real-time analytics systems, and AI-infused solutions.

About

I recently graduated from Northeastern University, where I focused on building data-driven systems at scale and applying analytics and GenAI to solve real business problems.

At Accenture and Metco Scientific, I led the development of cloud-based ETL pipelines, automated complex data workflows, and delivered interactive dashboards that improved decision-making across QA, compliance, and operations.

My work blends Data Engineering, Analytics, and AI, from real-time pipelines in Databricks to modular modeling with dbt and Snowflake, and now production-ready GenAI and retrieval-augmented generation (RAG) use cases.

Outside of work, I enjoy building with open-source LLM stacks, exploring tools like LangChain and AutoGen, and prototyping AI-infused data workflows.

Skills

Python
SQL
PySpark
Apache Spark
Databricks
Airflow
dbt
AWS
GCP
Azure
Snowflake
Redshift
Synapse
Delta Lake
Parquet
LangChain
RAG
HuggingFace
scikit-learn
MySQL
PostgreSQL
MongoDB
Power BI
Tableau
QuickSight

My Projects

Check out my latest work

Streaming, predicting, automating: here’s a glimpse at what I’ve been building lately.

Projects

Data_Projection

Built an event-driven data pipeline on AWS that captures changes in DynamoDB and streams them in real time using EventBridge Pipes, Kinesis, and Firehose. Data was enriched and stored in S3, then cataloged with Glue for downstream analytics via Athena.

DynamoDB
EventBridge Pipes
Kinesis Streams
Lambda
Firehose
S3
Glue
Athena
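
As a rough illustration of the enrichment step (not the project's actual code), the sketch below assumes the Lambda runs as a Firehose data-transformation function. The `enrich` helper and the `processed_at` field are hypothetical; the `recordId` / `result` / `data` response shape is the contract Firehose expects from a transformation Lambda.

```python
import base64
import json
from datetime import datetime, timezone


def enrich(item: dict) -> dict:
    """Add a processing timestamp; the field name is a hypothetical example."""
    enriched = dict(item)
    enriched["processed_at"] = datetime.now(timezone.utc).isoformat()
    return enriched


def lambda_handler(event, context):
    """Decode each Firehose record, enrich it, and re-encode it."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # Newline-delimited JSON keeps the S3 objects easy to query
            line = json.dumps(enrich(payload)) + "\n"
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(line.encode("utf-8")).decode("utf-8"),
            })
        except (ValueError, TypeError):
            # Malformed payloads are flagged instead of failing the whole batch
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Marking bad records as `ProcessingFailed` rather than raising lets the remaining records in the batch continue on to S3.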

Event-Driven Dataflow

Built a real-time event-driven data pipeline using SQS, EventBridge Pipes, and AWS Lambda to ingest, enrich, and transform Airbnb booking stream data. Final data is written to S3 and made queryable through CSV exports.

SQS
EventBridge Pipes
Lambda
Python
S3
CI/CD
CodeBuild
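
A minimal sketch of what an enrichment Lambda behind EventBridge Pipes might look like, assuming the pipe hands the function a batch (a list) of SQS messages whose `body` is a JSON string. The booking fields `start_date` and `end_date` and the derived `nights` value are hypothetical, chosen only to show the pattern.

```python
import json
from datetime import date


def enrich_booking(booking: dict) -> dict:
    """Derive the stay length from two ISO dates (hypothetical field names)."""
    start = date.fromisoformat(booking["start_date"])
    end = date.fromisoformat(booking["end_date"])
    return {**booking, "nights": (end - start).days}


def lambda_handler(event, context):
    """Parse each SQS message body and return the enriched bookings."""
    return [enrich_booking(json.loads(message["body"])) for message in event]
```

Whatever the function returns is what the pipe forwards to its target, so the enrichment step stays a pure transform with no side effects.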
Financial Document Summarization with RAG

Built a RAG-based chatbot that summarizes financial documents (10-K, 10-Q) by combining retrieval techniques with LLMs. Integrated multiple models (GPT-3.5, LLaMA 2, Gemma 1.1) and evaluated outputs on metrics like faithfulness, context recall, and answer relevancy. Designed prompt templates, UI components, and comparison experiments to reduce hallucinations and improve factual consistency.

Python
LangChain
LLMs (GPT-3.5, LLaMA 2, Gemma 1.1)
RAG
Streamlit
ChromaDB
Prompt Engineering
PDF Parsing
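
To make the retrieve-then-prompt idea concrete, here is a dependency-free sketch. It swaps in a simple word-count cosine similarity where the project would use embeddings with ChromaDB and LangChain, so treat it as an illustration of the flow rather than the project's implementation. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase word counts, used here as a stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, tokenize(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Ground the model in retrieved text to discourage hallucinated answers."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say so.\n"
        f"Context:\n{joined}\n"
        f"Question: {question}"
    )
```

The explicit "say so" instruction in the template is one common way to trade a little recall for better faithfulness.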
Churn Prediction

Developed a machine learning pipeline to predict customer churn using Random Forest and XGBoost models. Cleaned and engineered features from historical data, evaluated performance using AUC and F1 scores, and built an interactive dashboard to visualize churn risk across segments.

Python
pandas
scikit-learn
XGBoost
Random Forest
Power BI
EDA
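
For readers unfamiliar with the two evaluation metrics, the sketch below computes F1 and AUC from scratch. It is a plain-Python illustration, not the project's code; in practice `sklearn.metrics` provides `f1_score` and `roc_auc_score` for the same purpose.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (churn) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """Chance that a random churner is scored above a random non-churner."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    # Ties between a positive and a negative score count as half a win
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

AUC judges how well the model ranks customers by risk regardless of threshold, while F1 judges the hard churn / no-churn decisions at one chosen threshold, which is why the two are usually reported together.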
DoorDash Lambda Pipeline

Created a serverless data pipeline triggered by file uploads in S3. A Lambda function filters delivery events and writes clean JSON to a target S3 bucket. SNS sends notifications on success, and the entire pipeline is deployed via CI/CD using CodeBuild.

AWS Lambda
S3
SNS
CI/CD
CodeBuild
Python
pandas
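
The core of such a function can be sketched in two small helpers, shown here without the boto3 calls so the logic is easy to test. The `status == "delivered"` rule and the JSON-lines input format are assumptions made for illustration; the `Records[].s3.bucket.name` and `Records[].s3.object.key` paths follow the standard S3 event notification structure.

```python
import json
from urllib.parse import unquote_plus


def s3_objects(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification."""
    return [
        # Object keys arrive URL-encoded, so decode them before use
        (r["s3"]["bucket"]["name"], unquote_plus(r["s3"]["object"]["key"]))
        for r in event.get("Records", [])
    ]


def filter_delivered(raw: str) -> str:
    """Keep delivered orders from a JSON-lines string ('status' is hypothetical)."""
    rows = [json.loads(line) for line in raw.splitlines() if line.strip()]
    kept = [row for row in rows if row.get("status") == "delivered"]
    return "\n".join(json.dumps(row) for row in kept)
```

A real handler would read each object with boto3, pass its contents through `filter_delivered`, write the result to the target bucket, and publish to SNS on success.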

EdTech DataFlow

Built a modern ELT pipeline for an EdTech platform that showcases transformation and orchestration best practices using Snowflake, dbt Cloud, dbt Core, and Dagster. The project includes modular, documented data models, automated workflows, and lineage tracking, and it supports CI/CD with both managed and open-source orchestration.

Snowflake
dbt Cloud
dbt Core
Dagster
Python
SQL
CI/CD
Cron Scheduling
Delhivery Analytics Dashboard

Developed a real-time analytics dashboard for Delhivery's supply chain operations using Snowflake and Tableau. Migrated raw CSVs into a cloud data warehouse, modeled the data using a star schema, and built a UI that visually mirrors Delhivery's official website. The dashboard covers revenue, order flow, defect rates, and shipping cost analytics, all fully filterable and exportable.

Snowflake
Tableau
Microsoft Excel
Star Schema
Sales Analytics & Strategic Planning

Designed an end-to-end sales analytics solution using Power BI and Microsoft SQL Server based on a fictional enterprise case. Migrated from static Excel reports to dynamic dashboards built with DirectQuery, enabling real-time tracking of sales KPIs, product performance, and customer insights. The project also modeled a 2021 sales budget and aligned all outputs to stakeholder user stories.

Microsoft SQL Server
T-SQL
Power BI
DirectQuery
Excel/CSV
Data Modeling
Star Schema
GAIA Model Evaluation Tool

Built a Streamlit-based evaluation tool to benchmark OpenAI models using the GAIA dataset. The app enables users to select test cases, query models, compare results with ground-truth answers, collect feedback, and re-evaluate modified steps. Features include persistent feedback storage, interactive charts for visualizing outcomes, and secure API key handling via environment variables.

Streamlit
OpenAI API
Python
SQLite
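
Two pieces of this workflow are easy to sketch: reading the API key from the environment and comparing a model answer with the ground truth. The helpers below are hypothetical and the comparison is a simplified normalized exact match, not GAIA's official scorer. `OPENAI_API_KEY` is the variable name the OpenAI SDK conventionally reads.

```python
import os
import re


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Read the key from the environment so it never lives in source code."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set {var_name} in the environment before running")
    return key


def normalize(answer: str) -> str:
    """Lowercase, collapse whitespace, and drop trailing periods."""
    return re.sub(r"\s+", " ", answer.strip().lower()).rstrip(".")


def is_correct(model_answer: str, ground_truth: str) -> bool:
    """Simplified exact-match check after normalization."""
    return normalize(model_answer) == normalize(ground_truth)
```

Failing fast when the key is missing gives users a clear message at startup instead of an authentication error on the first model query.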
Raw to Ready: dbt + Snowflake Modeling

Designed a modular dbt pipeline in Snowflake to transform raw e-commerce data into analytics-ready data marts. Implemented a layered architecture (Raw → Staging → Marts), with separate Snowflake databases for each layer. Used source blocks, CTEs, and dbt configs to build scalable models and document the entire data flow using DAGs and lineage graphs.

dbt
Snowflake
SQL
GitHub
Amazon Sales Analytics

Built an interactive sales and inventory analytics dashboard in Power BI using Amazon sales data. Applied Power Query for data cleaning, DAX for KPI logic, and designed multi-page dashboards covering orders, products, returns, and regional sales. Includes dynamic filters, custom tooltips, and drill-downs to support data-driven decision-making.

Power BI
Power Query
DAX
Data Modeling
Dashboard Design

Get in Touch

Whether you have a question or just want to say hi, feel free to drop a message below.