My Projects Portfolio

Explore my data science and machine learning projects showcasing skills in data analysis, visualization, predictive modeling, and AI-driven solutions.

Movie Profitability Analysis

Analyzed 3,000+ movies (1980-2022) to identify profitability factors and created an interactive Power BI dashboard for visualizing trends by genre, studio, and budget.

Python SQL Power BI

Project Details

This comprehensive analysis of the film industry examined over 3,000 movies spanning four decades to identify key factors that predict box office success. The project involved cleaning and integrating data from multiple sources using Python (Pandas) and SQL, creating an interactive dashboard that helps stakeholders visualize profitability trends across different dimensions.

Technologies Used: Python, Pandas, SQL, Power BI, Matplotlib, Seaborn
Credit Card Default Prediction

Built a machine learning pipeline that predicts credit card defaults with a ROC-AUC of 0.77 and created a visualization dashboard highlighting customer risk segments.

Python Scikit-learn EDA

Project Details

This project developed a machine learning pipeline to predict credit card defaults using the UCI dataset. After extensive data preprocessing and feature engineering, the model achieved a ROC-AUC of 0.77. The accompanying dashboard lets financial institutions visualize customer risk segments and target interventions accordingly.
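
To make the headline metric concrete: a ROC-AUC of 0.77 (77%) means that a randomly chosen defaulter receives a higher predicted risk than a randomly chosen non-defaulter about 77% of the time. The sketch below computes ROC-AUC from that pairwise definition using only the standard library. The function name `roc_auc` and the sample labels and scores are illustrative assumptions, not the project's code or data.

```python
def roc_auc(y_true, y_score):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = default) and predicted default probabilities.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]
print(round(roc_auc(labels, scores), 3))
```

The pairwise loop is quadratic, which is fine for illustration; on a full dataset a library routine such as Scikit-learn's `roc_auc_score` would be the usual choice.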

Technologies Used: Python, Scikit-learn, Pandas, Matplotlib, XGBoost, Tableau
Hospital Readmission Prediction

Developed ML models to predict 30-day hospital readmissions for diabetic patients and demonstrated potential cost savings of $450,000 per 1,000 patients.

Python Machine Learning Healthcare

Project Details

This healthcare analytics project integrated data from multiple sources to create a unified dataset of over 100,000 patient records. Using statistical analysis and machine learning techniques, the project identified significant predictors of hospital readmission and developed models to predict 30-day readmissions for diabetic patients. The resulting interactive dashboard lets hospital administrators identify high-risk patients and implement targeted interventions.

Technologies Used: Python, Scikit-learn, SQL, RandomForest, Tableau, Pandas
Supermarket Sales Analyzer

Analyzed 50,000+ transactions to identify sales patterns and product correlations and created Tableau dashboards with business recommendations that could increase sales by 23%.

Tableau SQL Python

Project Details

This retail analytics project examined over 50,000 supermarket transactions to identify sales patterns, seasonal trends, and product correlations. Using market basket analysis and machine learning (Scikit-learn), the project built interactive Tableau dashboards showcasing key performance metrics. The resulting business recommendations could increase overall sales by 23% through targeted merchandising strategies.
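
Product correlations of this kind are commonly measured with market basket analysis, which this project lists among its technologies. The sketch below shows the three standard measures (support, confidence, lift) for item pairs in plain Python. The function name `pair_rules`, the `min_support` threshold, and the sample baskets are hypothetical and stand in for the real transaction data.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.2):
    """Return pairwise association rules sorted by lift (highest first)."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))              # ignore duplicate items in a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                      # share of baskets with both items
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]  # P(y in basket | x in basket)
            lift = confidence / (item_counts[y] / n)
            rules.append({
                "antecedent": x,
                "consequent": y,
                "support": support,
                "confidence": confidence,
                "lift": lift,
            })
    return sorted(rules, key=lambda r: r["lift"], reverse=True)


# Made-up baskets for illustration only.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
    ["milk"],
]
for rule in pair_rules(baskets, min_support=0.4):
    print(rule)
```

A lift above 1 suggests the two items appear together more often than independence would predict, which is the kind of signal a merchandising recommendation could build on.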

Technologies Used: Tableau, SQL, Python, Pandas, Scikit-learn, Market Basket Analysis
E-commerce Customer Behavior Analysis

Analyzed 100,000+ transactions across 32,000+ products and identified 4 high-value customer segments accounting for 70% of revenue.

Power BI Python SQL

Project Details

This e-commerce analytics project analyzed over 100,000 customer transactions across 32,000+ products to identify distinct customer segments and purchasing patterns. The interactive Power BI dashboard revealed 4 high-value customer segments accounting for 70% of revenue. The targeted retention strategies could raise the repeat purchase rate from 3.4% to 5.2%, an estimated $2.4M-$2.9M in additional annual revenue.
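
Segmentation work like this often starts from an RFM table (recency, frequency, monetary value), which appears in this project's technology list. The sketch below derives the three values per customer from raw orders. The function name `rfm_table`, the tuple layout of an order, and the sample orders are assumptions made for illustration; the source does not describe how its four segments were derived.

```python
from datetime import date


def rfm_table(orders, today):
    """Build recency/frequency/monetary values per customer.

    orders: iterable of (customer_id, order_date, amount) tuples.
    """
    table = {}
    for customer, order_date, amount in orders:
        row = table.setdefault(
            customer, {"last": order_date, "frequency": 0, "monetary": 0.0}
        )
        row["last"] = max(row["last"], order_date)   # most recent purchase
        row["frequency"] += 1                        # number of orders
        row["monetary"] += amount                    # total spend
    return {
        customer: {
            "recency_days": (today - row["last"]).days,
            "frequency": row["frequency"],
            "monetary": row["monetary"],
        }
        for customer, row in table.items()
    }


# Made-up orders for illustration only.
orders = [
    ("c1", date(2024, 1, 5), 120.0),
    ("c1", date(2024, 3, 1), 80.0),
    ("c2", date(2023, 11, 20), 40.0),
]
rfm = rfm_table(orders, today=date(2024, 3, 31))
print(rfm)
```

Customers with low recency, high frequency, and high monetary value would be natural candidates for a high-value segment; the cut-offs are a business decision.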

Technologies Used: Power BI, Python, SQL, Customer Segmentation, RFM Analysis, Cohort Analysis
Plant Disease Detection

Plant Disease Detection with CNNs

Developed a deep learning model using convolutional neural networks to detect and classify plant diseases from leaf images with 94% accuracy.

TensorFlow Computer Vision CNN

Project Details

This computer vision project developed a convolutional neural network (CNN) to automatically detect and classify plant diseases from leaf images. The model was trained on the PlantVillage dataset, which contains about 54,000 images across 38 classes of healthy and diseased leaves, and achieved 94% accuracy on unseen test data. The deployment included a web application that lets farmers upload images and receive instant disease diagnoses, potentially saving crops worth thousands of dollars through early intervention.

Technologies Used: TensorFlow, Keras, CNN, Transfer Learning, Flask, Image Augmentation
Pneumonia Detection from Chest X-rays

Developed a MobileNetV2-based network with 94% accuracy in pneumonia classification from X-ray images, optimized for resource and data constraints, and created an interactive application with explainability features.

TensorFlow Python Healthcare

Project Details

Developed a neural network achieving 94% accuracy in pneumonia classification from X-ray images. Implemented transfer learning with a MobileNetV2 architecture optimized for resource constraints, and used 3x data augmentation to overcome limited training data. Engineered custom data augmentation techniques that improved model generalization by 11%. Created an interactive Streamlit web application with model explainability features.

Technologies Used: Python, TensorFlow, CNN, MobileNetV2, Jupyter, Pandas, Matplotlib, Google Colab, Streamlit
Supply Chain Optimization for Small Businesses

An end-to-end data science solution that helps small businesses optimize their inventory management through demand forecasting, inventory modeling, and interactive visualizations, potentially reducing costs by up to 30% while improving service levels.

LSTM Python Time Series

Project Details

This project presents a comprehensive supply chain optimization system designed to help small businesses make data-driven inventory decisions. While large enterprises often have sophisticated inventory management systems, small businesses frequently rely on intuition and basic heuristics, leading to inefficiencies and unnecessary costs. This solution bridges that gap by providing advanced analytics capabilities without requiring specialized knowledge.
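
As a rough illustration of what a data-driven inventory decision can look like, the sketch below computes a reorder point with safety stock using the textbook formula. It assumes independent, roughly normal daily demand and a fixed lead time; the function name `reorder_point`, the service-level factor `z`, and the sample demand figures are hypothetical, and the source does not state which inventory model the project uses.

```python
import math
from statistics import mean, stdev


def reorder_point(daily_demand, lead_time_days, z=1.65):
    """Stock level at which to reorder.

    reorder point = expected demand over the lead time + safety stock,
    where safety stock = z * std(daily demand) * sqrt(lead time).
    z = 1.65 corresponds to roughly a 95% service level.
    """
    avg = mean(daily_demand)
    sd = stdev(daily_demand)                       # sample standard deviation
    safety_stock = z * sd * math.sqrt(lead_time_days)
    return avg * lead_time_days + safety_stock


# Made-up daily unit sales for one product.
demand = [12, 9, 15, 11, 10, 14, 13]
print(round(reorder_point(demand, lead_time_days=4), 1))
```

In a fuller system the average and spread of demand would come from the forecasting step rather than from raw history.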

Technologies Used: Python, pandas, scikit-learn, statsmodels, Dash, plotly, Git
Customer Churn Prediction

Telecom Customer Churn Prediction

Built an ensemble machine learning model to predict customer churn with 91% accuracy, identifying key factors driving customer attrition.

XGBoost Feature Engineering Ensemble Learning

Project Details

This telecommunications analytics project leveraged customer data to build a predictive model for identifying customers at risk of churning. Through extensive feature engineering and ensemble learning techniques, the model achieved 91% accuracy in predicting customer churn. The SHAP analysis revealed key factors driving customer attrition, enabling targeted retention campaigns that could save the company an estimated $3.2M annually.

Technologies Used: Python, Scikit-learn, XGBoost, Random Forest, SHAP, Feature Engineering, Tableau
Sentiment Analysis

Social Media Sentiment Analysis Project for Product Feedback

Smartphone Sentiment Analyzer: An end-to-end data science project that collects Twitter data to analyze and visualize consumer sentiment toward competing smartphone brands with interactive dashboards and NLP-powered insights.

NLP API Topic Modeling

Project Details

This natural language processing project analyzed over 500,000 product reviews to classify sentiment and extract key product attributes mentioned by customers. Using BERT fine-tuning and topic modeling techniques, the system achieved 89% accuracy in sentiment classification and identified trending product issues and strengths. The resulting interactive dashboard lets product managers track sentiment trends and prioritize product improvements based on customer feedback.

Technologies Used: Python, NLTK, Twitter (X) API, LDA, TF-IDF, Streamlit