Anime Recommendation System

Built a recommendation system for anime using user-based collaborative filtering. Performed exploratory data analysis on the dataset and used cosine similarity to find similar users based on watch history and ratings. Added my own data to the dataset and used the model to generate recommendations for myself.
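
A minimal sketch of user-based collaborative filtering with cosine similarity, written in plain Python for illustration rather than taken from the project. The titles, user names, ratings and the `recommend` helper are all hypothetical, and treating unrated titles as zeros is an assumption.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two users' rating dicts (title -> rating).

    Titles a user has not rated are treated as 0 (an assumption of this sketch).
    """
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(target, ratings, k=2, n=3):
    """Suggest up to n unseen titles, scored by the k most similar users."""
    mine = ratings[target]
    neighbours = sorted(
        ((cosine_similarity(mine, other), user)
         for user, other in ratings.items() if user != target),
        reverse=True,
    )[:k]
    scores, weights = {}, {}
    for sim, user in neighbours:
        if sim <= 0:
            continue
        for title, rating in ratings[user].items():
            if title not in mine:
                # Similarity-weighted sum of the neighbours' ratings
                scores[title] = scores.get(title, 0.0) + sim * rating
                weights[title] = weights.get(title, 0.0) + sim
    predicted = {t: scores[t] / weights[t] for t in scores}
    return sorted(predicted, key=predicted.get, reverse=True)[:n]

# Hypothetical toy data: user -> {title: rating out of 10}
ratings = {
    "me": {"Title A": 9, "Title B": 8},
    "u1": {"Title A": 9, "Title B": 7, "Title C": 10},
    "u2": {"Title A": 2, "Title D": 9},
    "u3": {"Title A": 8, "Title B": 8, "Title E": 6},
}
print(recommend("me", ratings))
```

With this toy data the two nearest neighbours are `u3` and `u1`, so only titles they have rated and `me` has not are candidates.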

Tools Used: SKLearn, Matplotlib, Seaborn, Numpy, Pandas

View Project

MLB Ticket Price Prediction

Scraped MLB season ticket prices for the San Francisco Giants and New York Mets. Used machine learning techniques such as Lasso Regression, Decision Trees and Random Forest to build models for future price prediction, and selected the best model based on its performance on out-of-sample data.

Tools Used: Selenium, MongoDB, Stats, Statsmodels, SKLearn, Pandas

View Project

Classification of Chest X-Ray Images

Compared the performance of a simple CNN against a transfer learning model for classifying chest X-rays as COVID-19, viral pneumonia, lung opacity or normal. The InceptionResNetV2 model reached a recall of 88% for COVID-19 patients and a precision of 84% for normal patients, which supports its usability in a real-world scenario.
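
For reference, per-class precision and recall can be computed from true and predicted labels as in the sketch below. The labels and predictions are made-up toy values, not the project's results, and `precision_recall` is a hypothetical helper standing in for what a library such as SKLearn provides.

```python
def precision_recall(y_true, y_pred, label):
    """Precision and recall for one class in a multi-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    # Precision: of everything predicted as `label`, how much was correct
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything that truly is `label`, how much was found
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical toy labels, for illustration only
y_true = ["covid", "covid", "covid", "covid", "normal", "normal", "normal", "pneumonia"]
y_pred = ["covid", "covid", "covid", "normal", "normal", "normal", "covid", "pneumonia"]

for label in ("covid", "normal"):
    p, r = precision_recall(y_true, y_pred, label)
    print(f"{label}: precision={p:.2f} recall={r:.2f}")
```

High recall on the COVID-19 class means few infected patients are missed, while high precision on the normal class means a "normal" prediction is rarely wrong.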

Tools Used: TensorFlow/Keras, SKLearn, CV2, Matplotlib, Seaborn, Numpy, Pandas

View Project

Gender Prediction of Twitter Profiles

Used the descriptions of Twitter profiles to build a machine learning model that predicts the gender of the user. Performed EDA on the data to spot trends and built a function to clean the text data before vectorizing it. Implemented Random Forest, Logistic Regression, Naive Bayes and XGBoost for classification. Concluded from the EDA and the performance of the models that Twitter profiles have shifted away from gender stereotypes, allowing for more individual expression.
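
A rough sketch of what cleaning and vectorizing a profile description can look like. This is a plain-Python stand-in, not the project's NLTK/SKLearn pipeline: the stopword list, the cleaning rules and the `clean_text` and `vectorize` helpers are assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (a real pipeline would use a fuller one)
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "i", "my", "is", "for"}

def clean_text(text):
    """Lowercase, strip links, mentions, hashtags and non-letters, drop stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # links
    text = re.sub(r"[@#]\w+", " ", text)       # mentions and hashtags
    text = re.sub(r"[^a-z\s]", " ", text)      # punctuation, digits, emoji
    return [word for word in text.split() if word not in STOPWORDS]

def vectorize(tokens, vocabulary):
    """Bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

tokens = clean_text("Lover of Coffee & the #Giants! https://t.co/xyz @mlb")
print(tokens)
print(vectorize(tokens, ["coffee", "tea", "lover"]))
```

The resulting count vectors are the kind of numeric input that classifiers such as Logistic Regression or Naive Bayes can be trained on.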

Tools Used: NLTK, SKLearn, WordCloud, Matplotlib, Seaborn, Numpy, Pandas

View Project

Classification of Hand-Written Characters

Built a deep learning model to classify images of hand-written letters and numbers from the Extended MNIST dataset. Created a neural network class from scratch and used it to build a simple ANN that classified images of 0s and 1s from the dataset with an accuracy of 97%. Compared the performance of ANN and CNN models built using Keras with TensorFlow; the CNN model gave the best accuracy, 87%. Used this model to classify images of my own handwriting with 70% accuracy.

Tools Used: TensorFlow/Keras, SKLearn, PIL, SciPy, Matplotlib, Seaborn, Numpy, Pandas

View Project

Prediction of Video Game Sales

Used machine learning regression models to predict global sales of video game titles based on historical data on user and critic ratings. Performed EDA on the dataset and used Principal Component Analysis to reduce dimensionality. The Gradient Boosted Decision Tree model gave the best R-squared value, 0.65, and the platform of the game, user count and critic score were the most important features.
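
As a reminder of what the R-squared figure measures, the sketch below computes it by hand as one minus the ratio of residual to total sum of squares. The sales numbers are invented for illustration and `r_squared` is a hypothetical helper, not the project's code.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total variation
    return 1 - ss_res / ss_tot

# Hypothetical global sales (in millions) and model predictions
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print(round(r_squared(actual, predicted), 2))
```

A value of 1.0 means the predictions match the data exactly, and 0.0 means the model does no better than always predicting the mean.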

Tools Used: SKLearn, Matplotlib, Seaborn, Numpy, Pandas

View Project