Web Scraping & Image Classification of Maritime Vessels in Open Seas

This project uses open-source information to derive insights into commercial maritime activity. Its objective is to create a web scraping and machine learning application that automates obtaining images from social media platforms by geolocation and identifying the ships in them.

Model Type

Neural Network

Software/Tools/Libraries
Python, Pandas, NumPy, Tweepy, PyTorch, AWS SageMaker, SQL, Twitter API, Instagram API
Authors
Tivon Johnson, Claudio Escudero, Himanvesh Maddina, Zeeshan Raza, James Worrall, Yiiang Xu

Overview

The goal of this project is to build a data mining pipeline that uses computer vision to automate the search for open-source photos of commercial maritime vessels in an area of interest. The application applies web scraping and machine learning to obtain images from social media by geolocation and to detect and identify ships within them. It pulls posts from Twitter and other open data sources, applies computer vision models to find Tweets that contain images of ships, and presents the results in a view ranked by confidence score.
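
The ranked view described above can be illustrated with a minimal sketch. It assumes each post has already been scored by a ship detector. The ScoredPost record, its field names, and the 0.5 threshold are hypothetical and not taken from the project code.

```python
from dataclasses import dataclass


@dataclass
class ScoredPost:
    """Hypothetical record for one scraped post after model inference."""
    post_id: str
    image_url: str
    ship_confidence: float  # assumed detector score in [0, 1]


def rank_posts(posts, min_confidence=0.5):
    """Drop posts below the threshold, then sort highest confidence first."""
    kept = [p for p in posts if p.ship_confidence >= min_confidence]
    return sorted(kept, key=lambda p: p.ship_confidence, reverse=True)


# Illustrative data only; the IDs, URLs, and scores are made up.
posts = [
    ScoredPost("a1", "https://example.com/a1.jpg", 0.62),
    ScoredPost("b2", "https://example.com/b2.jpg", 0.97),
    ScoredPost("c3", "https://example.com/c3.jpg", 0.12),
]

ranked = rank_posts(posts)
for post in ranked:
    print(post.post_id, post.ship_confidence)
```

Filtering before sorting keeps low-confidence images out of the analyst's view entirely. A real deployment would likely tune the threshold against labeled data.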

Project Goals
  • Build a data mining pipeline that allows an analyst to search for open-source photos in an area
    of interest
  • Apply computer vision to detect ships within the photos
  • Apply advanced analytics to determine which of the photos most likely shows the ship the
    analyst is looking for
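
One simple way to express an "area of interest" is a latitude/longitude bounding box. The sketch below assumes posts carry a longitude/latitude pair in decimal degrees. The BoundingBox class, the filter_by_area helper, and the example coordinates are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical area of interest in decimal degrees.

    Assumes the box does not cross the antimeridian (west <= east).
    """
    west: float
    south: float
    east: float
    north: float

    def contains(self, lon, lat):
        """Return True if the point lies inside or on the edge of the box."""
        return self.west <= lon <= self.east and self.south <= lat <= self.north


def filter_by_area(posts, box):
    """Keep only posts whose (lon, lat) falls inside the box.

    Each post is assumed to be a dict with 'lon' and 'lat' keys;
    posts without coordinates are skipped.
    """
    return [
        p for p in posts
        if p.get("lon") is not None
        and p.get("lat") is not None
        and box.contains(p["lon"], p["lat"])
    ]


# Illustrative box and posts; the coordinates are approximate and made up.
area = BoundingBox(west=103.6, south=1.15, east=104.1, north=1.35)
geo_posts = [
    {"id": "p1", "lon": 103.85, "lat": 1.26},
    {"id": "p2", "lon": -74.0, "lat": 40.7},
    {"id": "p3", "lon": None, "lat": None},
]
in_area = filter_by_area(geo_posts, area)
print([p["id"] for p in in_area])
```
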
Requirements
  • Web scraping to collect data from Twitter/Instagram
  • A machine learning model based on object detection and hashtags
  • Utilization of Optical Character Recognition (OCR) software
  • Pre-trained models to categorize and label images
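
The hashtag requirement could be approached as a lightweight text signal computed alongside object detection. The sketch below is an assumption about how such a signal might look, not the project's method. The MARITIME_TAGS set and the hashtag_score function are hypothetical.

```python
import re

# Hypothetical keyword set; a real list would come from domain analysts.
MARITIME_TAGS = {"ship", "vessel", "cargoship", "containership",
                 "tanker", "shipspotting"}

# Matches a '#' followed by one or more word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the set of lowercased hashtags (without '#') found in text."""
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}


def hashtag_score(text):
    """Fraction of a post's distinct hashtags that are maritime-related.

    Returns 0.0 when the post has no hashtags.
    """
    tags = extract_hashtags(text)
    if not tags:
        return 0.0
    return len(tags & MARITIME_TAGS) / len(tags)


example = "Sunset at the port #ShipSpotting #tanker #sunset"
print(extract_hashtags(example))
print(hashtag_score(example))
```

A score like this could be combined with the detector's confidence, for example as an additional feature, but how the two are weighted is left open here.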