Brian McGuckin
Data Scientist


Projects

Ethereum Price Forecasting with Machine Learning

An Application of Time Series Regression Models and Neural Networks

Figure: exog_results

Notebook, Slides

Topics: Time Series, Forecasting, Cryptocurrency, ARIMA, LSTM, RNN, Structural Breaks, Stationarity, Exogenous Drivers, Granger Causality

Toolkit: Python, Jupyter, NumPy, Pandas, Matplotlib, Seaborn, SciPy, Ruptures, FBProphet, scikit-learn, Statsmodels, TensorFlow, Keras, Hyperopt, Hyperas

Predicting Residential House Prices

Regularized Linear Regression & Tree Based Ensemble Modeling with Ordinal Variables

Figure: results table

Notebook, Slides

Topics: data preprocessing, visualization, feature engineering, machine learning, regression

Toolkit: Python, Jupyter, NumPy, Pandas, Matplotlib, Seaborn, SciPy, SKLearn, XGBoost

Reuters-21578 Text Classification

NLP using Unsupervised Learning Methods for Article Classification

An NLP-focused project that uses unsupervised learning methods to classify the topics of articles in the Reuters-21578 dataset. Articles were loaded and cleaned, and the classes were inspected. Feature sets were created and the text was vectorized using tf-idf. Clustering algorithms (k-means, spectral, mean-shift, affinity propagation) grouped articles by topic under two forms of dimensionality reduction (LSA and UMAP). The clusters were evaluated against the ground-truth topic labels using the adjusted Rand index (ARI). Supervised classification algorithms (logistic regression, XGBoost, KNN, random forest) were then applied and evaluated on cross-validated accuracy.
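The workflow described above can be illustrated with a minimal sketch. This is not the project's notebook code: the toy corpus, the labels, the helper name `cluster_articles`, and all parameter values are assumptions chosen for illustration, and only one clustering algorithm (k-means), one reduction method (LSA via truncated SVD), and one classifier (logistic regression) are shown.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# Hypothetical toy corpus standing in for cleaned Reuters-21578 articles.
docs = [
    "wheat harvest exports rise as grain prices climb",
    "corn and wheat shipments fall after poor grain harvest",
    "grain traders expect higher wheat and corn exports",
    "farmers report record corn harvest and grain sales",
    "crude oil prices jump as opec cuts output",
    "opec ministers agree to raise crude oil production",
    "oil refiners cut crude imports as prices rise",
    "crude oil output falls after opec meeting",
]
# Hypothetical ground-truth topics: 0 = grain, 1 = crude.
labels = [0, 0, 0, 0, 1, 1, 1, 1]


def cluster_articles(texts, n_clusters, n_components=2, seed=0):
    """Vectorize with tf-idf, reduce with LSA, then cluster with k-means."""
    lsa = make_pipeline(
        TfidfVectorizer(stop_words="english"),
        TruncatedSVD(n_components=n_components, random_state=seed),
        Normalizer(copy=False),  # unit-length rows suit Euclidean k-means
    )
    reduced = lsa.fit_transform(texts)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return km.fit_predict(reduced)


# Unsupervised step: compare cluster assignments with the known topics.
cluster_ids = cluster_articles(docs, n_clusters=2)
ari = adjusted_rand_score(labels, cluster_ids)
print(f"ARI: {ari:.3f}")

# Supervised step: cross-validated accuracy of a tf-idf + logistic regression pipeline.
clf = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    LogisticRegression(max_iter=1000),
)
cv_accuracy = cross_val_score(clf, docs, labels, cv=3, scoring="accuracy").mean()
print(f"Cross-validated accuracy: {cv_accuracy:.3f}")
```

ARI is invariant to how cluster ids are numbered, which is why the k-means output can be scored directly against the topic labels without matching cluster numbers to classes first.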

Figure: nn_clusters

Figures: XGBoost results, logistic regression results

Notebook

Topics: text cleaning, tokenization, vectorization, dimensionality reduction, machine learning, clustering, classification

Toolkit: Python, NumPy, Pandas, Matplotlib, Seaborn, NLTK, SciPy, SKLearn, XGBoost, RegEx, UMAP