Projects

Ethereum Price Forecasting with Machine Learning

An Application of Time Series Regression Models and Neural Networks

Time series analysis, modeling, and forecasting of Ethereum prices in Python using ARIMA models and LSTM RNNs, evaluated on RMSE. Changepoint analysis identified structural breaks; the rolling window size was set to the median regime length, and the series was differenced for stationarity. Forecasts are one-step-ahead and out-of-sample over a rolling window, produced first from the ETH series alone and then with Granger-causal exogenous drivers added.
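The evaluation loop described above can be sketched in plain Python. This is a minimal illustration, not the project's code: a least-squares AR(1) on the differenced series stands in for the ARIMA and LSTM models, and the names `median_regime_length`, `fit_ar1`, and `rolling_one_step_rmse` are hypothetical. The breakpoints are assumed to be regime end indices, with the last one equal to the series length.

```python
import math
import statistics


def median_regime_length(breakpoints):
    """Median length of the regimes delimited by sorted end indices.

    Assumption: `breakpoints` lists the end index of each regime,
    and the final entry equals the length of the series.
    """
    starts = [0] + list(breakpoints[:-1])
    lengths = [end - start for start, end in zip(starts, breakpoints)]
    return int(statistics.median(lengths))


def difference(series):
    """First difference: d[t] = y[t+1] - y[t]."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(window):
    """Least-squares AR(1) without intercept: d[t] ~ phi * d[t-1].

    Stand-in model for illustration only; the project used ARIMA and LSTM.
    """
    num = sum(a * b for a, b in zip(window, window[1:]))
    den = sum(a * a for a in window[:-1])
    return num / den if den else 0.0


def rolling_one_step_rmse(prices, window_size):
    """Refit on each rolling window, forecast one step ahead, return RMSE."""
    diffs = difference(prices)
    squared_errors = []
    for end in range(window_size, len(diffs)):
        window = diffs[end - window_size:end]  # only data known at forecast time
        phi = fit_ar1(window)
        predicted_diff = phi * window[-1]
        predicted_price = prices[end] + predicted_diff  # undo the differencing
        actual_price = prices[end + 1]
        squared_errors.append((predicted_price - actual_price) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Each forecast uses only observations inside the window that ends at the forecast origin, which is what keeps the evaluation out of sample.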

exog_results

Topics: Time Series, Forecasting, Cryptocurrency, ARIMA, LSTM, RNN, Structural Breaks, Stationarity, Exogenous drivers, Granger Causality

Toolkit: Python, Jupyter, NumPy, Pandas, Matplotlib, Seaborn, SciPy, Ruptures, FBProphet, scikit-learn, Statsmodels, TensorFlow, Keras, Hyperopt, Hyperas

Predicting Residential House Prices

Regularized Linear Regression & Tree Based Ensemble Modeling with Ordinal Variables

Residential house price prediction using the Ames, Iowa Housing Market Dataset, with a focus on ordinal variable treatment. Preprocessing covered EDA, outlier handling, missing-value treatment, and feature engineering/variable transformation. Ordinal data was treated as (1) all categorical, (2) all continuous, or (3) a mix of categorical and continuous. Models were regularized linear regression (L1/L2/elastic net), random forests, and gradient boosted decision trees (XGBoost). Results were evaluated on RMSE (primary metric) and model runtime (secondary).
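The difference between the categorical and continuous treatments of an ordinal variable can be shown with a small sketch. The five-level quality scale and the function names below are illustrative assumptions, not taken from the project notebook.

```python
# Assumed quality scale, ordered worst to best (illustrative only).
QUALITY_LEVELS = ["Po", "Fa", "TA", "Gd", "Ex"]


def encode_continuous(values, levels=QUALITY_LEVELS):
    """Treat the ordinal variable as continuous: map each level to its rank."""
    rank = {level: position for position, level in enumerate(levels)}
    return [rank[value] for value in values]


def encode_categorical(values, levels=QUALITY_LEVELS):
    """Treat the ordinal variable as categorical: one indicator per level."""
    return [[1 if value == level else 0 for level in levels] for value in values]


sample = ["TA", "Ex", "Po"]
ranks = encode_continuous(sample)      # one numeric column, order preserved
dummies = encode_categorical(sample)   # five 0/1 columns, order discarded
```

The rank encoding keeps the ordering but assumes equal spacing between levels; the indicator encoding drops the ordering and adds one column per level. The mixed treatment applies one or the other per variable.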

results table

Notebook, Slides

Topics: data preprocessing, visualization, feature engineering, machine learning, regression

Toolkit: Python, Jupyter, NumPy, Pandas, Matplotlib, Seaborn, SciPy, SKLearn, XGBoost

Reuters-21578 Text Classification

NLP using Unsupervised Learning Methods for Article Classification

NLP-focused project using unsupervised learning methods to classify topics for articles in the Reuters-21578 Dataset. Articles were loaded and cleaned, and classes inspected. Feature sets were created and text was vectorized using tf-idf. Clustering algorithms (k-means, spectral, mean-shift, affinity propagation) grouped articles by topic under two forms of dimension reduction (LSA and UMAP), evaluated against ground-truth clusters using the adjusted Rand index (ARI). Supervised classification algorithms (logistic regression, XGBoost, KNN, random forest) were then applied and evaluated on cross-validated accuracy.
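For reference, the ARI used to score clusterings against ground truth can be computed from a contingency table of pair counts. The sketch below is a plain-Python illustration with a hypothetical function name; a library implementation such as scikit-learn's `adjusted_rand_score` would normally be used instead.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    n = len(labels_true)
    # Contingency counts: items sharing a (true label, predicted label) pair.
    contingency = Counter(zip(labels_true, labels_pred))
    true_sizes = Counter(labels_true)
    pred_sizes = Counter(labels_pred)

    # Number of item pairs grouped together under each view.
    together_both = sum(comb(count, 2) for count in contingency.values())
    together_true = sum(comb(count, 2) for count in true_sizes.values())
    together_pred = sum(comb(count, 2) for count in pred_sizes.values())

    total_pairs = comb(n, 2)
    expected = together_true * together_pred / total_pairs if total_pairs else 0.0
    maximum = 0.5 * (together_true + together_pred)
    if maximum == expected:
        # Degenerate case (e.g. a single cluster in both labelings).
        return 1.0
    return (together_both - expected) / (maximum - expected)
```

The score is 1 when the two partitions agree up to a renaming of cluster labels, close to 0 for random assignments, and can be negative when agreement is worse than chance.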

nn_clusters

xgb results lr results

Notebook

Topics: text cleaning, tokenization, vectorization, dimensionality reduction, machine learning, clustering, classification

Toolkit: Python, NumPy, Pandas, Matplotlib, Seaborn, NLTK, SciPy, SKLearn, XGBoost, RegEx, UMAP