A portfolio of data science projects covering machine learning, deep learning, and predictive analytics.
This project forecast asset prices 30 minutes ahead from historical market data using Long Short-Term Memory (LSTM) recurrent neural networks. Two architectures were compared, a multivariate vector-output LSTM and an encoder-decoder LSTM, and the encoder-decoder model was selected because it performed better. The modeling pipeline covered data preprocessing, normalization, sequence generation for time-series input, model construction, training, and evaluation. The final model captured short-term temporal dependencies well and produced reliable forecasts. The work is relevant to real-time financial modeling, algorithmic trading, and investment decision-making.
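The normalization and sequence-generation steps can be illustrated with a minimal sketch. It is not the project's actual code: the helper names `min_max_scale` and `make_windows`, the window sizes, and the price values are all assumptions made up for illustration.

```python
from typing import List, Tuple


def min_max_scale(values: List[float]) -> List[float]:
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant series carries no scale information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def make_windows(
    series: List[float], n_in: int, n_out: int
) -> Tuple[List[List[float]], List[List[float]]]:
    """Slide over the series, pairing n_in past steps with the next n_out steps."""
    inputs, targets = [], []
    last_start = len(series) - n_in - n_out
    for start in range(last_start + 1):
        split = start + n_in
        inputs.append(series[start:split])
        targets.append(series[split:split + n_out])
    return inputs, targets


# Made-up prices, purely illustrative.
prices = [100.0, 101.0, 103.0, 102.0, 105.0, 104.0]
scaled = min_max_scale(prices)
X, y = make_windows(scaled, n_in=3, n_out=2)

for window, target in zip(X, y):
    print(window, "->", target)
```

Each input window is paired with the steps that immediately follow it, which is the shape a multi-step forecaster such as an encoder-decoder LSTM expects. In a real pipeline the scaling bounds would normally be computed on the training split only, so that future prices do not leak into the inputs.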
Built a machine learning model to classify restaurants as having good or bad ratings using features such as cuisine type, cost, location, and user feedback. The process included exploratory data analysis, data preprocessing, feature engineering, model building, and performance evaluation. Multiple algorithms were tested, and the most effective was selected based on accuracy and reliability. The project offered insights into factors influencing restaurant ratings and customer satisfaction.
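Selecting a model by accuracy can be sketched as follows. The candidate names, labels, and predictions are hypothetical and do not reflect which algorithms the project tested or how they scored; the sketch only shows the comparison step on a held-out set.

```python
from typing import Dict, List


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical held-out labels: 1 = good rating, 0 = bad rating.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]

# Hypothetical predictions from three candidate models on the same held-out set.
candidate_predictions: Dict[str, List[int]] = {
    "logistic_regression": [1, 0, 1, 0, 0, 1, 1, 0],
    "decision_tree": [1, 0, 0, 0, 0, 1, 1, 1],
    "random_forest": [1, 0, 1, 1, 0, 1, 1, 0],
}

scores = {
    name: accuracy(y_true, preds) for name, preds in candidate_predictions.items()
}
best_model = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("selected:", best_model)
```

Accuracy alone can mislead when good and bad ratings are imbalanced, so a comparison like this is often paired with precision, recall, or a confusion matrix.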
Created a deep convolutional neural network model for classifying images of cats and dogs using transfer learning with a pre-trained MobileNetV2 backbone. Applied preprocessing techniques and fine-tuned the model to enhance generalization on unseen data. The project demonstrated strong model performance and offers valuable insights for future image recognition and computer vision applications.
Conducted an exploratory analysis of about 10,000 movies from The Movie Database (TMDb), covering a period from 1960 to 2015. The project involved data wrangling, cleaning, and exploration to uncover trends in revenue, runtime, genres, and voting behavior. Visualizations and descriptive statistics were used to answer targeted questions, offering insights into factors that may contribute to a film’s success. The analysis highlights key characteristics shared by high-performing movies and provides a data-driven look into the evolution of the film industry.
Performed an exploratory analysis on a dataset containing over 183,000 individual rides from a San Francisco Bay Area bike-sharing program in February 2019. The analysis included univariate, bivariate, and multivariate techniques to uncover patterns in ride duration, user types, and time-of-day usage. Key relationships between rider demographics, trip timing, and weekday trends were visualized and interpreted to provide insights into usage behaviors and operational efficiency for bike-sharing systems.
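A multivariate summary of this kind, ride duration by user type, weekday, and hour, can be sketched with the standard library alone. The field names `start_time`, `user_type`, and `duration_sec`, and the four sample rides, are assumptions for illustration rather than rows from the dataset.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Made-up rides; field names are assumed, not taken from the project.
rides = [
    {"start_time": "2019-02-01 08:15:00", "user_type": "Subscriber", "duration_sec": 600},
    {"start_time": "2019-02-01 08:40:00", "user_type": "Subscriber", "duration_sec": 480},
    {"start_time": "2019-02-02 14:05:00", "user_type": "Customer", "duration_sec": 1500},
    {"start_time": "2019-02-02 15:30:00", "user_type": "Customer", "duration_sec": 1800},
]


def summarize(rides):
    """Group rides by (user type, weekday, start hour) and summarize duration."""
    groups = defaultdict(list)
    for ride in rides:
        started = datetime.strptime(ride["start_time"], "%Y-%m-%d %H:%M:%S")
        key = (ride["user_type"], WEEKDAYS[started.weekday()], started.hour)
        groups[key].append(ride["duration_sec"] / 60)  # seconds to minutes
    return {
        key: {"rides": len(minutes), "median_min": median(minutes)}
        for key, minutes in groups.items()
    }


summary = summarize(rides)
for key, stats in sorted(summary.items()):
    print(key, stats)
```

The median is used instead of the mean because ride durations tend to be right-skewed, with a few very long trips pulling the average upward.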
Wrangled and analyzed the tweet archive of the popular Twitter account @dog_rates (WeRateDogs) using a structured data pipeline. This included gathering data from multiple sources, assessing quality and structure, cleaning for consistency, merging datasets, storing in a tidy format, and performing exploratory analysis and visualizations. The analysis focused on identifying the most common dog breeds featured, as well as those associated with the highest average retweets and likes. Visualizations revealed engagement trends across breeds, offering insights into audience preferences and viral patterns on social media.
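The merge-then-aggregate step behind the breed comparison can be sketched as below. The record layout, the field names (`tweet_id`, `retweets`, `likes`, `breed`), and all values are hypothetical stand-ins for the cleaned archive and the breed predictions.

```python
from collections import defaultdict

# Made-up records standing in for the cleaned tweet archive.
archive = [
    {"tweet_id": 1, "retweets": 120, "likes": 900},
    {"tweet_id": 2, "retweets": 80, "likes": 400},
    {"tweet_id": 3, "retweets": 300, "likes": 2100},
    {"tweet_id": 4, "retweets": 50, "likes": 250},
]

# Made-up breed predictions; tweet 4 has none and is dropped by the join.
predictions = [
    {"tweet_id": 1, "breed": "golden_retriever"},
    {"tweet_id": 2, "breed": "pug"},
    {"tweet_id": 3, "breed": "golden_retriever"},
]


def engagement_by_breed(archive, predictions):
    """Inner-join tweets to breeds on tweet_id, then average engagement per breed."""
    breed_of = {p["tweet_id"]: p["breed"] for p in predictions}
    totals = defaultdict(lambda: {"tweets": 0, "retweets": 0, "likes": 0})
    for tweet in archive:
        breed = breed_of.get(tweet["tweet_id"])
        if breed is None:
            continue  # no breed prediction for this tweet
        totals[breed]["tweets"] += 1
        totals[breed]["retweets"] += tweet["retweets"]
        totals[breed]["likes"] += tweet["likes"]
    return {
        breed: {
            "tweets": t["tweets"],
            "avg_retweets": t["retweets"] / t["tweets"],
            "avg_likes": t["likes"] / t["tweets"],
        }
        for breed, t in totals.items()
    }


stats = engagement_by_breed(archive, predictions)
ranked = sorted(stats, key=lambda b: stats[b]["avg_retweets"], reverse=True)
print(ranked)
```

Keeping the tweet count next to each average makes it easy to spot breeds whose high engagement rests on only one or two tweets.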