Projects
Ethereum Price Forecasting with Machine Learning
An Application of Time Series Regression Models and Neural Networks
- Time series analysis, modeling, and forecasting of Ethereum prices using ARIMA models and LSTM RNNs, implemented in Python and evaluated on RMSE. Changepoint analysis was performed, the window size was set to the median regime length, and the series was differenced for stationarity. Forecasts are one-step-ahead and out-of-sample using a rolling window, produced first with the ETH series alone and then with Granger-causal exogenous drivers.
Topics: Time Series, Forecasting, Cryptocurrency, ARIMA, LSTM, RNN, Structural Breaks, Stationarity, Exogenous drivers, Granger Causality
Toolkit: Python, Jupyter, NumPy, Pandas, Matplotlib, Seaborn, SciPy, Ruptures, FBProphet, scikit-learn, Statsmodels, TensorFlow, Keras, Hyperopt, Hyperas
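The rolling-window, one-step-ahead forecasting scheme described for this project can be sketched roughly as follows. This is a minimal illustration, not the project's code: a least-squares AR(1) fit on the differenced series stands in for the ARIMA and LSTM models, the function names `rolling_one_step_forecast` and `rmse` are hypothetical, the window of 60 is an arbitrary value rather than a median regime length, and the price series is a synthetic random walk rather than ETH data.

```python
import numpy as np


def rolling_one_step_forecast(prices, window):
    """Return one-step-ahead out-of-sample forecasts from a rolling window.

    For each target index t, a no-intercept AR(1) model is fitted by least
    squares to the differenced prices in the `window` observations ending at
    t - 1, then used to forecast the price at t. `window` must be at least 3.
    """
    prices = np.asarray(prices, dtype=float)
    forecasts = []
    for t in range(window, len(prices)):
        history = prices[t - window:t]       # only data strictly before t
        diffs = np.diff(history)             # first difference for stationarity
        x, y = diffs[:-1], diffs[1:]         # lagged and current differences
        denom = np.dot(x, x)
        phi = np.dot(x, y) / denom if denom > 0 else 0.0
        next_diff = phi * diffs[-1]          # forecast of the next difference
        forecasts.append(history[-1] + next_diff)  # undo the differencing
    return np.array(forecasts)


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Synthetic random walk standing in for the ETH price series (not real data).
rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(size=300))

window = 60  # assumed value for illustration only
preds = rolling_one_step_forecast(prices, window)
score = rmse(prices[window:], preds)
print(f"forecasts: {len(preds)}, RMSE: {score:.3f}")
```

Because each forecast uses only observations before its target date, the RMSE is a genuine out-of-sample score. Swapping in another model only changes the fitting step inside the loop.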
Predicting Residential House Prices
Regularized Linear Regression & Tree Based Ensemble Modeling with Ordinal Variables
- Residential house price prediction using the Ames, Iowa Housing Market Dataset, with a focus on the treatment of ordinal variables. Work included EDA, handling of outliers and missing values, and feature engineering/variable transformation. Ordinal data was treated as (1) all categorical, (2) all continuous, and (3) a mix of categorical and continuous. Models were regularized linear regression (L1/L2/elastic net), random forests, and gradient boosted decision trees (XGBoost). Results were evaluated on RMSE (primary metric) and model runtime (secondary).
Topics: data preprocessing, visualization, feature engineering, machine learning, regression
Toolkit: Python, Jupyter, NumPy, Pandas, Matplotlib, Seaborn, SciPy, SKLearn, XGBoost
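The difference between treating an ordinal variable as continuous and as categorical can be illustrated with a small sketch. It assumes the five-level quality coding used for several Ames columns (Po, Fa, TA, Gd, Ex, worst to best). The helper names and the sample values are hypothetical and are not taken from the project.

```python
import numpy as np

# Ordered quality levels, worst to best (assumed Ames-style coding).
QUALITY_LEVELS = ["Po", "Fa", "TA", "Gd", "Ex"]


def encode_as_continuous(values, levels=QUALITY_LEVELS):
    """Map each level to its rank (0 = worst), preserving the order."""
    rank = {level: i for i, level in enumerate(levels)}
    return np.array([rank[v] for v in values], dtype=float).reshape(-1, 1)


def encode_as_categorical(values, levels=QUALITY_LEVELS):
    """One-hot encode each level, discarding the order."""
    ranks = encode_as_continuous(values, levels).astype(int).ravel()
    return np.eye(len(levels))[ranks]


# Hypothetical values for a single quality column.
sample = ["TA", "Gd", "Ex", "TA", "Fa"]

as_number = encode_as_continuous(sample)    # one column of ranks
as_dummies = encode_as_categorical(sample)  # one 0/1 column per level

print(as_number.ravel())
print(as_dummies)
```

The continuous encoding gives a linear model a single slope per variable and assumes equal spacing between levels. The one-hot encoding lets each level have its own effect at the cost of more columns. A mixed treatment would choose one of the two encoders per column.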
Reuters-21578 Text Classification
NLP using Unsupervised Learning Methods for Article Classification
- NLP-focused project applying unsupervised learning methods to classify article topics in the Reuters-21578 Dataset. Articles were loaded and cleaned, and the classes inspected. Feature sets were created and the text vectorized using tf-idf. Clustering algorithms (k-means, spectral, mean-shift, affinity propagation) grouped articles by topic under two forms of dimensionality reduction (LSA and UMAP), and were evaluated against ground-truth clusters using the adjusted Rand index (ARI). Supervised classification algorithms (logistic regression, XGBoost, KNN, random forest) were then applied and evaluated on cross-validated accuracy.
Topics: text cleaning, tokenization, vectorization, dimensionality reduction, machine learning, clustering, classification
Toolkit: Python, NumPy, Pandas, Matplotlib, Seaborn, NLTK, SciPy, SKLearn, XGBoost, RegEx, UMAP
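The ARI used to score the clusterings against ground-truth topics can be computed from pair counts, as in the sketch below. This is an illustrative pure-Python version with a hypothetical function name and made-up labels. In practice scikit-learn's `sklearn.metrics.adjusted_rand_score` computes the same quantity.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items.

    1.0 means the partitions are identical up to label names; values near
    0.0 mean agreement is about what random labeling would give.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")

    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: partitions agree trivially

    # Pairs of items placed together in both, in truth, and in prediction.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings use a single cluster
    return (sum_cells - expected) / (max_index - expected)


# Made-up topic labels and cluster assignments for six articles.
topics = ["earn", "earn", "acq", "acq", "crude", "crude"]
clusters = [2, 2, 0, 0, 1, 1]  # same grouping, different label names
print(adjusted_rand_index(topics, clusters))
```

ARI suits this evaluation because cluster ids are arbitrary: only whether pairs of articles are grouped together matters, not which integer a cluster receives.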