Projects

1. Loan Default Risk Model – Heritage Credit Union

Developed a cost-sensitive binary classification model to assess loan default risk using historical borrower data. Conducted exploratory data analysis and feature engineering to identify key risk drivers, then trained and compared multiple models, including logistic regression, decision trees, random forest, and gradient boosting, using cross-validation. Optimized the classification threshold based on the business cost trade-off between false negatives (missed defaults) and false positives (good loans flagged for review) to minimize total financial loss and align the volume of flagged loans with underwriting capacity. The final model supports risk-based loan review decisions and is projected to reduce annual default losses by approximately $450K.

Tools: Python, Pandas, scikit-learn, Logistic Regression, Decision Trees, Random Forest, Gradient Boosting, Cross-Validation
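The cost-based threshold selection described above can be sketched as follows. This is a minimal illustration, not the project's code: the per-error costs (COST_FN, COST_FP), the 0.01 to 0.99 threshold grid, the max_flagged capacity parameter, and the toy data are all hypothetical assumptions.

```python
# Hypothetical per-loan costs for illustration only; not the project's actual figures.
COST_FN = 10_000.0  # false negative: a loan that defaults but was not flagged
COST_FP = 500.0     # false positive: a good loan flagged for manual review


def total_cost(y_true, scores, threshold, cost_fn=COST_FN, cost_fp=COST_FP):
    """Total cost when loans with predicted default probability >= threshold are flagged."""
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return fn * cost_fn + fp * cost_fp


def best_threshold(y_true, scores, max_flagged=None, cost_fn=COST_FN, cost_fp=COST_FP):
    """Grid-search thresholds 0.01..0.99 for the lowest total cost.

    If max_flagged is given (a hypothetical stand-in for underwriting capacity),
    thresholds that would flag more loans than that are skipped.
    """
    candidates = [i / 100 for i in range(1, 100)]
    if max_flagged is not None:
        candidates = [t for t in candidates
                      if sum(s >= t for s in scores) <= max_flagged]
    if not candidates:
        raise ValueError("no threshold satisfies the capacity limit")
    # min() returns the first (lowest) threshold among ties
    return min(candidates, key=lambda t: total_cost(y_true, scores, t, cost_fn, cost_fp))


# Toy validation-set labels (1 = defaulted) and predicted probabilities.
y_true = [0, 0, 1, 1, 0, 1, 0, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.6, 0.7, 0.05]
print(best_threshold(y_true, scores))                 # 0.21
print(best_threshold(y_true, scores, max_flagged=3))  # 0.41
```

In practice the scores would be a fitted model's predicted probabilities on held-out data (for example, from scikit-learn's `predict_proba`), and the two costs would come from the business rather than being assumed.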

2. Demo AI Image Authenticity Detector

Developed an interactive, AI-powered demo web application that lets users upload images and estimates the likelihood that the content was AI-generated or digitally manipulated. Designed a user-friendly interface that handles image upload and processing and presents the analysis results clearly. Focused on usability and the practical application of AI concepts to address real-world concerns about deepfakes and falsified media.

Tools: HTML, CSS, JavaScript, AI Concepts, Image Analysis

3. PrecisionParts Manufacturing Big Data Pipeline

Designed and implemented an end-to-end big data pipeline for PrecisionParts Manufacturing's supply chain and manufacturing analytics platform using Hadoop HDFS, PySpark, Kafka, Spark Structured Streaming, and Apache Airflow. Built a scalable multi-layer architecture supporting batch processing, real-time machine telemetry streaming, workflow orchestration, and automated data quality monitoring. Analyzed production defects, inventory risk, supplier performance, and equipment sensor anomalies to identify operational inefficiencies, supplier-quality correlations, and predictive maintenance opportunities, delivering actionable insights to improve manufacturing reliability and supply chain performance. Developed collaboratively as a team project and deployed on Docker-based infrastructure.

Tools: Hadoop HDFS, PySpark, Apache Kafka, Spark Structured Streaming, Apache Airflow, Docker, Python, Big Data Analytics, Real-Time Data Processing

4. Student Debt & Career Outcomes Research

Developed a Tableau-based research project analyzing how student debt impacts career outcomes and return on education in the United States. Cleaned and integrated datasets from FRED, College Scorecard, BLS, and IPEDS using Tableau Prep Builder, then built interactive visualizations to examine trends in student loan growth, the relationship between debt and earnings after graduation, and wage-to-debt ratios across majors. The project also analyzed how education level affects unemployment, compared outcomes across public, private nonprofit, and for-profit institutions, evaluated returns of popular vs. less common majors, and identified the top and bottom fields by ROI. Findings show that while higher education improves earnings and employment stability, financial outcomes vary significantly by major, institution type, and debt levels.

Tools: Tableau, Tableau Prep Builder, Data Cleaning, Data Visualization, Data Analysis

5. Distributed Data Pipeline & Scalable Data Architecture (ETL, Sharding & Cassandra)

Designed and implemented a scalable data pipeline integrating multiple operational databases into a centralized data warehouse. Built and deployed an automated, Docker-based ETL process to extract, transform, and load data from distributed PostgreSQL systems, including sharded sales databases. Enhanced the pipeline to merge data across shards and keep it consistent across sources. Extended the architecture with Cassandra, using denormalized, query-specific data models to support high-performance access patterns. Evaluated distributed SQL solutions such as Citus as options for improving scalability, real-time analytics, and overall system performance.

Tools: SQL, PostgreSQL, Cassandra, Docker, ETL Pipelines, Python, Data Warehousing, Distributed Systems
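One way to picture the cross-shard merge step is the plain-Python sketch below. It assumes each row carries a primary key and a last-updated timestamp and resolves duplicates with a latest-version-wins rule; the shard names, column names, and that policy are hypothetical and not taken from the project, which ran its ETL against PostgreSQL in Docker.

```python
from datetime import datetime


def merge_shards(shards, key="order_id", version="updated_at"):
    """Combine rows from several shards into one row per key.

    Hypothetical policy for illustration: if a key appears on more than one
    shard (e.g. a row copied during rebalancing), the copy with the latest
    `version` timestamp wins. Each output row records its source shard.
    """
    merged = {}
    for shard_name, rows in shards.items():
        for row in rows:
            current = merged.get(row[key])
            if current is None or row[version] > current[version]:
                merged[row[key]] = {**row, "source_shard": shard_name}
    # Deterministic output order makes the warehouse load easier to compare
    return sorted(merged.values(), key=lambda r: r[key])


# Hypothetical sample rows from two sales shards.
shards = {
    "sales_shard_1": [
        {"order_id": 1, "amount": 20.0, "updated_at": datetime(2024, 1, 1)},
        {"order_id": 2, "amount": 30.0, "updated_at": datetime(2024, 1, 2)},
    ],
    "sales_shard_2": [
        {"order_id": 2, "amount": 35.0, "updated_at": datetime(2024, 1, 5)},
        {"order_id": 3, "amount": 12.5, "updated_at": datetime(2024, 1, 3)},
    ],
}
for row in merge_shards(shards):
    print(row["order_id"], row["amount"], row["source_shard"])
# 1 20.0 sales_shard_1
# 2 35.0 sales_shard_2
# 3 12.5 sales_shard_2
```

Recording the source shard on every merged row keeps the consistency check auditable: a conflicting key can be traced back to where each version came from.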

6. Optimization of Retail Performance – Business Analytics & Visualization

Analyzed a large retail transaction dataset to uncover sales trends, customer behavior, and profitability drivers across products, regions, and time periods. Built interactive Tableau dashboards to visualize revenue performance, geographic distribution, customer segmentation, and product hierarchies. Conducted diagnostic and predictive analytics, including sales forecasting and what-if scenario analysis, to evaluate the potential impact of improved customer ratings. Transformed analytical findings into actionable business insights to support inventory planning, marketing strategy, and revenue growth decisions.

Tools: Tableau, Business Analytics, Data Visualization, Forecasting, What-If Analysis

7. Vehicle Pricing Analysis Across Regions & Engine Types (R)

Performed a statistical analysis of used-vehicle pricing data to evaluate how prices vary across geographic regions and engine types. Conducted rigorous data preprocessing, random sampling, and factor conversion before applying inferential statistical techniques. Used variance tests, one-way and two-way ANOVA, and post hoc Tukey comparisons to identify statistically significant price differences while controlling for multiple categorical factors. Interpreted the results in a market and business context.

Tools: R, RStudio, Statistical Analysis, ANOVA, Hypothesis Testing, Data Sampling
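The project was done in R; to keep all examples in one language, the sketch below computes the one-way ANOVA F statistic in Python directly from its definition, F = (SS_between / (k - 1)) / (SS_within / (n - k)). The price groups are invented toy data, and the sketch stops at the F statistic (no p-value or Tukey step).

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    k is the number of groups and n the total number of observations.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 groups and more observations than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group variation: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variation: spread of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical prices (in $1,000s) grouped by engine type; not the project's data.
prices = {
    "gas": [21, 22, 23],
    "diesel": [24, 25, 26],
    "hybrid": [27, 28, 29],
}
print(one_way_anova(list(prices.values())))  # (27.0, 2, 6)
```

In R, the same test is typically run with `aov()` and followed by `TukeyHSD()` for pairwise comparisons; in Python, `scipy.stats.f_oneway` returns the F statistic together with its p-value.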

8. Business Performance – Data Analysis & Modeling

Analyzed multidimensional business performance data to evaluate sales, profitability, market share, and employee productivity across products and regions. Performed revenue and margin analysis, target vs. actual comparisons, and regional performance benchmarking. Built dynamic Excel models to support pricing optimization, profitability analysis, and scenario-based decision-making, translating quantitative results into business insights and recommendations.

Tools: Microsoft Excel, Data Modeling, Financial Analysis, Scenario Analysis

9. Loan Default Prediction – Data Analysis & Modeling

Conducted a thorough analysis to predict loan default risk using historical borrower data. Performed exploratory data analysis to uncover key patterns and potential risk drivers, and segmented customers using k-means clustering. Developed and evaluated predictive models using decision trees and neural networks, then selected the best-performing model.

Tools: Python, Pandas, scikit-learn, k-means, Decision Trees, Neural Networks
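k-means segmentation alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The minimal pure-Python sketch below shows that loop (Lloyd's algorithm); the two borrower features and their values are hypothetical, not the project's data.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means: returns (labels, centroids) for a list of numeric tuples."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]  # random initial centers
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster where it was
        if new_centroids == centroids:  # converged: assignments will not change
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical borrowers: (debt-to-income ratio, credit utilization), both on a 0-1 scale.
borrowers = [(0.10, 0.20), (0.12, 0.25), (0.15, 0.18),
             (0.55, 0.80), (0.60, 0.85), (0.58, 0.90)]
labels, centers = kmeans(borrowers, k=2)
print(labels)  # the first three and last three borrowers fall into different segments
```

With scikit-learn (listed in the tools), the equivalent is `sklearn.cluster.KMeans(n_clusters=2, n_init=10)`; features are usually standardized first because k-means relies on Euclidean distance.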

10. Database Design and SQL Implementation

Designed and implemented a normalized relational database for a fictional makeup store business to support customer, product, and order management. Developed a complete database schema in third normal form (3NF) and created an Entity-Relationship Diagram (ERD) to clearly define entities, attributes, keys, and relationships. Wrote and tested SQL covering full CRUD operations and advanced analytical queries, including customer behavior analysis, sales performance reporting, ranking, and cumulative metrics to support data-driven decision-making.

Tools: SQL, Database Design, ERD Modeling, Normalization, Microsoft SQL Server
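Ranking and cumulative metrics like those described above are usually written with SQL window functions. The sketch below uses Python's built-in sqlite3 module so it runs without a database server (window functions require SQLite 3.25 or later). The project itself used Microsoft SQL Server, and the two-table schema and sample rows here are hypothetical stand-ins for the store's actual 3NF design.

```python
import sqlite3

# Hypothetical, simplified schema; the project's actual tables are not reproduced here.
SCHEMA = """
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    order_date  TEXT NOT NULL,
    total       REAL NOT NULL
);
"""

# Ranking: customers ordered by total spend (ties share a rank).
RANK_CUSTOMERS = """
SELECT c.name,
       SUM(o.total) AS revenue,
       RANK() OVER (ORDER BY SUM(o.total) DESC) AS revenue_rank
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.name
ORDER BY revenue_rank, c.name;
"""

# Cumulative metric: running revenue total by order date.
RUNNING_REVENUE = """
SELECT order_date,
       SUM(total) AS daily_revenue,
       SUM(SUM(total)) OVER (ORDER BY order_date) AS cumulative_revenue
FROM orders
GROUP BY order_date
ORDER BY order_date;
"""


def run_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "Ana"), (2, "Ben"), (3, "Cara")])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                     [(1, 1, "2024-01-01", 50.0), (2, 2, "2024-01-01", 30.0),
                      (3, 1, "2024-01-02", 20.0), (4, 3, "2024-01-03", 70.0)])
    ranks = conn.execute(RANK_CUSTOMERS).fetchall()
    running = conn.execute(RUNNING_REVENUE).fetchall()
    conn.close()
    return ranks, running


ranks, running = run_demo()
print(ranks)    # [('Ana', 70.0, 1), ('Cara', 70.0, 1), ('Ben', 30.0, 3)]
print(running)  # [('2024-01-01', 80.0, 80.0), ('2024-01-02', 20.0, 100.0), ('2024-01-03', 70.0, 170.0)]
```

`RANK()` and `SUM(...) OVER (ORDER BY ...)` are standard SQL window syntax that SQL Server also supports, so queries of this shape should carry over with little or no change.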

11. AMD Stock Pitch (May 2026)

Conducted an in-depth equity research and valuation analysis on Advanced Micro Devices (AMD), evaluating the company’s growth in AI, data centers, and high-performance computing through financial modeling, DCF valuation, Monte Carlo simulations, and technical analysis. Analyzed AMD’s competitive positioning against NVIDIA and Intel, assessed revenue growth, profitability, risk factors, and market sentiment indicators, and developed a professional investment thesis with a Hold/Trim recommendation based on valuation sensitivity, execution risk, and future AI infrastructure demand trends.

Tools: Financial Analysis, DCF Valuation, Monte Carlo Simulation, Statistical Analysis, Technical Analysis
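A Monte Carlo DCF repeats a discounted-cash-flow calculation many times with uncertain inputs drawn at random, then reports the spread of resulting values. The sketch below assumes a five-year explicit forecast, a Gordon-growth terminal value, normally distributed growth, and a uniformly distributed discount rate. Every input is an illustrative placeholder, not a figure from the AMD analysis.

```python
import random
import statistics


def dcf_value(fcf0, growth, discount, terminal_growth, years=5):
    """PV of `years` of free cash flow growing at `growth`, plus a Gordon-growth terminal value."""
    if discount <= terminal_growth:
        raise ValueError("discount rate must exceed terminal growth rate")
    value, fcf = 0.0, fcf0
    for t in range(1, years + 1):
        fcf *= 1 + growth
        value += fcf / (1 + discount) ** t
    terminal = fcf * (1 + terminal_growth) / (discount - terminal_growth)
    return value + terminal / (1 + discount) ** years


def monte_carlo_dcf(fcf0, n_sims=10_000, seed=42, growth_mu=0.15, growth_sd=0.05,
                    discount_range=(0.08, 0.11), terminal_growth=0.03, years=5):
    """Sample growth ~ Normal(mu, sd) and discount ~ Uniform(range); summarize the values.

    All default parameters are hypothetical placeholders.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    sims = [
        dcf_value(fcf0, rng.gauss(growth_mu, growth_sd),
                  rng.uniform(*discount_range), terminal_growth, years)
        for _ in range(n_sims)
    ]
    cuts = statistics.quantiles(sims, n=20)  # 19 cut points at 5% steps
    return {"p5": cuts[0], "median": statistics.median(sims), "p95": cuts[-1]}


# Illustrative inputs only (free cash flow in arbitrary units), not AMD estimates.
summary = monte_carlo_dcf(fcf0=100.0)
print({k: round(v, 1) for k, v in summary.items()})
```

Turning the result into a per-share value would also require adjusting for net debt and dividing by shares outstanding. The p5 to p95 band is the kind of range a valuation-sensitivity argument draws on.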