
PROJECTS

Machine Learning, Deep Learning, Natural Language Processing, Computer Vision, Distributed Big Data Analytics, Information Retrieval, Recommendation Engine


COVID-19 (SARS-COV-2) PANDEMIC ANALYSIS AND MODELING [PYTHON]

Algorithm & Techniques: Data Analysis, Data Science

Language: Python

Technology: Keras

Tools: Anaconda


Date: January - June 2020

- Analyzed COVID-19 spread across geographical levels (region/country, state/province and county) at daily granularity and developed visualizations for all countries with 9+ derived features.
- Worked toward a prediction model for pandemic spread, starting at country level and then extending to state and county level.

CANCER DETECTION FROM MICROSCOPIC TISSUE IMAGES WITH DEEP LEARNING

Algorithm & Techniques: Image Classification, Deep Learning, Convolutional Neural Network (CNN), Transfer Learning, Medical Imaging.

Language: Python

Technology: Keras

Tools: Anaconda


Date: November 2018 - April 2019

Description:

- Detected cancer in histopathologic (microscopic tissue) images with Google’s “NASNetLarge” model and attained a testing accuracy (F1 score) of 93.72% and a loss of 0.30 on a cancer dataset of 277K images (6.5GB+).

- Trained the model fully from scratch and experimented with adding multiple custom layers to the final output.

PNEUMONIA DETECTION FROM CHEST X-RAY IMAGES WITH DEEP LEARNING

Algorithm & Techniques: Image Classification, Deep Learning, Convolutional Neural Network (CNN), Transfer Learning, Medical Imaging.

Language: Python

Technology: Keras

Tools: Anaconda

Date: September - December 2018

Description:

- Detected pneumonia in around 6K chest X-ray images (1.15GB) by training a custom deep convolutional neural network (CNN) fully from scratch, and separately by fine-tuning the pretrained model “InceptionV3”.

- With the custom deep CNN, attained a testing accuracy (F1 score) of 89.53%, recall of 95.48% and precision of 88.37%; with InceptionV3, attained a testing accuracy (F1 score) of 83.44% and a loss of 0.42.

- For fine-tuning InceptionV3, froze the first few layers and trained the last two inception layers.

MALARIA PARASITE DETECTION IN THIN BLOOD SMEAR IMAGES WITH PRETRAINED CONVOLUTIONAL NEURAL NETWORK (NASNETMOBILE)

Algorithm & Techniques: Image Classification, Deep Learning, Convolutional Neural Network (CNN), Transfer Learning, Medical Imaging.

Language: Python

Technology: Keras

Tools: Anaconda


Date: February - May 2019

Description:

- Detected malaria parasites in thin blood smear images, collected from malaria screening research by the National Institutes of Health (NIH), with deep learning (convolutional neural network), specifically by retraining the pretrained model NASNetMobile completely from scratch.
- Before feeding data into the model, preprocessed and augmented the image dataset of 27,558 images (337MB) with random flips, rotations and shears.
- After loading the pretrained model NASNetMobile, applied global max pooling, global average pooling and a flatten layer to the output of the pretrained model and concatenated the results. Also added dropout and batch normalization layers for regularization before the final output layer, a dense layer with softmax activation, and compiled with the Adam optimizer (learning rate 0.0001), accuracy metric and categorical cross-entropy loss.
- Trained for 10 iterations and attained a training accuracy of 96.47% and loss (categorical cross-entropy) of 0.1026, and a validation accuracy of 95.46% and loss of 0.1385.
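
The softmax output layer and categorical cross-entropy loss mentioned above can be illustrated with a small library-free sketch. The two-class setup and the logit values are assumptions chosen for illustration; the project itself used Keras, which computes both internally.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot_target, probs, eps=1e-12):
    """Loss for one sample: -sum(t_i * log(p_i))."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot_target, probs))


# Hypothetical logits for one image in an assumed two-class problem.
probs = softmax([2.0, 0.0])
loss = categorical_crossentropy([1, 0], probs)
```

The loss shrinks toward 0 as the probability assigned to the true class approaches 1, which is why it is reported alongside accuracy in the results above.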

ARTIST IDENTIFICATION FROM ARTWORKS WITH DEEP LEARNING

Algorithm & Techniques: Image Classification, Deep Learning, Convolutional Neural Network (CNN), Transfer Learning.

Language: Python

Technology: Keras, PyTorch

Tools: Anaconda, Kaggle

Date: March - September 2019

Description:

- Identified artists from their artworks with deep learning (convolutional neural network), specifically by retraining the pretrained model "InceptionResNetV3" completely from scratch.
- Before feeding data into the model, preprocessed and augmented the image dataset of 8,446 images (2GB) by 50 different artists with random horizontal flips, rotations, and width and height shifts.
- After loading the pretrained model "InceptionResNetV3", added a 2D global average pooling layer and a dense layer with 512 units, followed by batch normalization and dropout layers for regularization, with an activation applied only to the dense layer. Finally, added the output layer, a dense layer with softmax activation, and compiled with the Adam optimizer (learning rate 0.0001), accuracy metric and categorical cross-entropy loss.
- Trained for 15 iterations and attained a training accuracy of 98.36% and loss (categorical cross-entropy) of 0.0820, and a validation accuracy of 78.75% and loss of 0.9093.

MOVIE REVENUE & RATING PREDICTION FROM IMDB MOVIE DATA

Algorithm & Techniques: Machine Learning, Supervised Learning, Regression Analysis


Language: Python

Technology: scikit-learn

Tools: Anaconda

Date: October - December 2016

Description:

- Developed a regression model for predicting revenue and ratings from 5,000 movies and attained a regression error (Mean Squared Error) of 0.0005 on a scale of 1 for revenue after 5-fold cross-validation.

- Conducted preprocessing and feature extraction (28 numerical, textual and categorical features).

- Performed data analysis, visualization, feature extraction, cleaning (missing values, anomalies), preprocessing (rescaling, normalization, feature transformation with one-hot encoding) and trained with cross-validation.
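
The 5-fold cross-validated Mean Squared Error reported above can be illustrated with a library-free sketch. The single-feature least-squares model, the contiguous (unshuffled) folds and the toy data are assumptions made for brevity; the project itself used scikit-learn on 28 features.

```python
def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b on a single feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x if var_x else 0.0
    return a, mean_y - a * mean_x


def mse(y_true, y_pred):
    """Mean of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_val_mse(xs, ys, k=5):
    """Average held-out MSE over k folds."""
    scores = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [a * xs[i] + b for i in fold]
        scores.append(mse([ys[i] for i in fold], preds))
    return sum(scores) / len(scores)


# Toy data rescaled to [0, 1], mirroring the "scale of 1" used for the error.
xs = [i / 9 for i in range(10)]
ys = [0.5 * x + 0.1 for x in xs]
cv_error = cross_val_mse(xs, ys)
```

Each sample is used for validation exactly once, so the averaged error is less dependent on a single lucky or unlucky split.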

WEB RETRIEVAL & SEARCH ENGINE IMPLEMENTATION FOR UNIVERSITY WEB DOMAIN

Algorithm & Techniques: Search Engine, Search Relevance, Information Retrieval, Vector Space Model, Cosine Similarity


Language: Python

Technology: Django

Tools: Anaconda

Date: August - December 2017

Description:

- Developed an end-to-end web retrieval engine based on the vector space model for the University of Memphis and evaluated performance with 10,000 web pages and documents (text, pdf, docx and pptx) from the university domain.
- Used TF-IDF vector space model and cosine similarity function for web page matching and ranking.
- Developed modules - web crawler (with memory), text preprocessor (preprocess, tokenize, stem from raw HTML/docs), page indexer, page relevance ranker and performance evaluator (F1, precision, recall).
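
The TF-IDF weighting and cosine-similarity ranking described above can be sketched as follows. This is a minimal illustration, not the project's code: the whitespace tokenizer (no stemming), the smoothed IDF formula and the three sample pages are all assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (the real engine also stemmed)."""
    return text.lower().split()


def vectorize(tokens, idf):
    """Sparse TF-IDF vector: term -> term frequency * inverse document frequency."""
    return {term: count * idf[term] for term, count in Counter(tokens).items() if term in idf}


def build_index(docs):
    """Return (idf, vectors) for docs given as {doc_id: text}."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    doc_freq = Counter(term for tokens in tokenized.values() for term in set(tokens))
    idf = {term: math.log(len(docs) / df) + 1.0 for term, df in doc_freq.items()}
    vectors = {doc_id: vectorize(tokens, idf) for doc_id, tokens in tokenized.items()}
    return idf, vectors


def cosine(u, v):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search(query, idf, vectors, top_k=3):
    """Rank documents by cosine similarity to the query, dropping zero scores."""
    q = vectorize(tokenize(query), idf)
    ranked = sorted(((cosine(q, vec), doc_id) for doc_id, vec in vectors.items()), reverse=True)
    return [(doc_id, score) for score, doc_id in ranked[:top_k] if score > 0]


# Hypothetical pages standing in for crawled university content.
docs = {
    "admissions": "graduate admissions deadlines and requirements",
    "cs": "computer science graduate courses",
    "parking": "campus parking permits and maps",
}
idf, vectors = build_index(docs)
results = search("graduate courses", idf, vectors)
```

Here the "cs" page ranks first because it matches both query terms, "admissions" follows with one match, and "parking" is dropped for having no overlap.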

MOVIE RECOMMENDATION ENGINE USING USER BASED COLLABORATIVE FILTERING

Algorithm & Techniques: Recommendation Engine, Recommendation Systems, Collaborative Filtering 


Language: C++, Python

Technology: NA

Tools: Sublime Text

Date: February - April 2017

Description:

- Developed a user-based movie recommender system by implementing user-user collaborative filtering with runtime and space complexity optimization, with separate implementations in C++ and Python.
- Used the Netflix movie dataset with 100K user records.
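
User-user collaborative filtering can be sketched in a few lines. This is an illustrative version under stated assumptions: cosine similarity between rating vectors, a similarity-weighted average over the k nearest neighbours, and a tiny hypothetical ratings table. It does not reproduce the runtime and space optimizations of the project.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two users' {movie: rating} dictionaries."""
    dot = sum(a[m] * b[m] for m in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(ratings, user, movie, k=2):
    """Similarity-weighted average rating from the k most similar users who rated the movie."""
    neighbours = [
        (cosine_similarity(ratings[user], other_ratings), other_ratings[movie])
        for other, other_ratings in ratings.items()
        if other != user and movie in other_ratings
    ]
    neighbours = sorted(neighbours, reverse=True)[:k]
    total_weight = sum(sim for sim, _ in neighbours)
    if total_weight == 0:
        return None  # no similar user has rated this movie
    return sum(sim * rating for sim, rating in neighbours) / total_weight


def recommend(ratings, user, top_n=2):
    """Unseen movies ordered by predicted rating, highest first."""
    seen = set(ratings[user])
    candidates = {m for r in ratings.values() for m in r} - seen
    scored = [(predict(ratings, user, m), m) for m in candidates]
    scored = [(score, m) for score, m in scored if score is not None]
    return [m for _, m in sorted(scored, reverse=True)[:top_n]]


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ann": {"Alien": 5, "Brazil": 4, "Casablanca": 1},
    "bob": {"Alien": 5, "Brazil": 5, "Dune": 5, "Emma": 1},
    "cat": {"Alien": 1, "Casablanca": 5, "Emma": 5, "Dune": 2},
}
suggestions = recommend(ratings, "ann")
```

Because "ann" rates movies much like "bob", his high rating for "Dune" dominates the prediction and it is recommended ahead of "Emma".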

RESTAURANT RECOMMENDATION SYSTEM USING RELATIONAL DATABASE

Algorithm & Techniques: Recommendation System, Relational Database

Language: Python

Technology: MySQL, Django

Tools: Anaconda


Date: October - December 2017

Description:

- Implemented a restaurant recommendation system based on user information (e.g. location, cuisine preference) and restaurant information (location, cuisine, ratings, reviews).
- Included features to derive review effectiveness and user trustworthiness from available data.

TOXIC COMMENT IDENTIFICATION / CLASSIFICATION

Algorithm & Techniques: Machine Learning, Supervised Learning, Classification, Natural Language Processing, Text Classification, Text Analysis

Language: Python

Technology: scikit-learn, NLTK

Tools: Anaconda

Date: August - September 2018

Description:

- Classified around 130,000 text comments (34MB) into the categories "Toxic", "Severe Toxic", "Obscene", "Threat", "Insult", "Identity Hate", "Any of the Above" and "None of the Above".
- Used features from the AAAI 2018 paper "Anatomy of Online Hate: Developing a Taxonomy and Machine Learning Models for Identifying and Classifying Hate in Online News Media" by Salminen, Almerekhi.
- Built pipelines for machine learning model training for reading file, creating training testing dataset, preprocessing, extracting features, and training and evaluation in grid search approach for multiple models.
- Generated visualization and aggregated report on the performance of various models.
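
The grid search approach mentioned above can be illustrated without any library: every combination of hyperparameter values is tried and the best-scoring one is kept. The keyword-count classifier, its two hyperparameters, the lexicon and the four validation comments are all hypothetical stand-ins; the project trained scikit-learn models on far richer features.

```python
from itertools import product


def make_keyword_classifier(lexicon, threshold, lowercase):
    """Toy model: flag a comment when it contains at least `threshold` lexicon words."""
    def classify(text):
        words = text.lower().split() if lowercase else text.split()
        return int(sum(word in lexicon for word in words) >= threshold)
    return classify


def accuracy(classify, samples):
    """Fraction of (text, label) pairs the classifier gets right."""
    return sum(classify(text) == label for text, label in samples) / len(samples)


def grid_search(param_grid, samples, lexicon):
    """Evaluate every parameter combination and return (best_score, best_params)."""
    names = sorted(param_grid)
    best_score, best_params = -1.0, None
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = accuracy(make_keyword_classifier(lexicon, **params), samples)
        if score > best_score:  # keep the first combination on ties
            best_score, best_params = score, params
    return best_score, best_params


# Hypothetical validation set: 1 = toxic, 0 = not toxic.
validation = [
    ("you IDIOT stupid", 1),
    ("have a nice day", 0),
    ("stupid idiot", 1),
    ("that was a stupid mistake but thanks", 0),
]
grid = {"threshold": [1, 2], "lowercase": [False, True]}
best_score, best_params = grid_search(grid, validation, lexicon={"idiot", "stupid"})
```

Running the same loop over several model families and collecting each best score gives the kind of aggregated comparison report described above.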

REGRESSION MODELING FOR HOUSING PRICE PREDICTION

Algorithm & Techniques: Machine Learning, Supervised Learning, Regression


Language: Python

Technology: scikit-learn, NLTK

Tools: Anaconda

Date: August - September 2018

Description:

- Built regression model for predicting housing price using 79 numerical and categorical features with regression error (Mean Squared Error) of 0.000685 on a scale of 1.

- Built pipelines for machine learning (regression) model training with preprocessing (normalization, label encoding of categorical features), feature extraction, training and evaluation in a grid search approach for multiple regression models, with visualization and an aggregated report on performance.
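
The two preprocessing steps named above, normalization and label encoding, can be sketched as follows. Min-max scaling to [0, 1] is an assumption (it matches an error reported "on a scale of 1", but the source does not name the method), and the sample values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale numbers linearly so the smallest maps to 0 and the largest to 1."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # constant column carries no information
    return [(v - low) / (high - low) for v in values]


def label_encode(categories):
    """Map each distinct category to an integer, in sorted order."""
    mapping = {category: index for index, category in enumerate(sorted(set(categories)))}
    return [mapping[c] for c in categories], mapping


# Hypothetical housing columns.
prices = min_max_normalize([100, 150, 200])
materials, mapping = label_encode(["brick", "wood", "brick"])
```

Label encoding keeps one column per categorical feature, unlike one-hot encoding, at the cost of implying an order between categories that may not exist.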

RENT-A-BIKE WEB APPLICATION

Algorithm & Techniques: Software Development, Web Application Development, Agile Development, Share App, Ride Share App, Model-View-Controller (MVC)

Language: Ruby on Rails, JavaScript, HTML, CSS

Technology: MVC, Bootstrap, Git

Tools: Sublime Text, Virtual Box, GitHub

Date: January - April 2017

Description:

Developed, as part of a team, a web application for renting, sharing, selling and buying bikes, with chat and map features. The project was aimed at university students.

- Used MVC Architecture and CRUD operations on Rails platform. 

Source code

IMAGE RECOGNITION USING DEEP CONVOLUTIONAL NEURAL NETWORK

Algorithm & Techniques: Image Classification, Deep Learning, Convolutional Neural Network (CNN), Transfer Learning.

Language: Python

Technology: Keras, TensorFlow

Tools: Anaconda

Date: September - December 2018

Description:

- Developed image classification tools using a deep convolutional neural network built from scratch with Keras and, separately, the pretrained model “InceptionV3” fine-tuned with new class labels.

- Trained on multiple datasets: Flower dataset (testing accuracy - 85.68%, 5 species, 4.5K images, 228MB), 10 Monkey Species dataset (validation accuracy - 97.06%, 553MB) and Dog Breed dataset (testing accuracy - 76.41%, 120 classes, 10.2K images, 344MB).

SERVER-CLIENT CHAT APPLICATION [JAVA, ANDROID, TCP/IP]

Algorithm & Techniques: Client-Server Communication, TCP/IP Networking

Language: Java, Android

Technology: TCP/IP

Tools: Android Studio

Date: August 2015

Description:

- Developed TCP/IP based chat server and client application where multiple clients can chat simultaneously.

- Built client application for both Android and desktop platform.
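
The multi-client relay idea can be sketched as below. The project was written in Java; this Python version is only an illustration of the pattern, assuming one thread per client and a broadcast of every message to all other connected clients. The class and method names are hypothetical.

```python
import socket
import threading


class ChatServer:
    """Accepts many TCP clients and relays each message to all the others."""

    def __init__(self, host="127.0.0.1", port=0):
        # Port 0 asks the operating system for any free port.
        self._listener = socket.create_server((host, port))
        self.address = self._listener.getsockname()
        self._clients = []
        self._lock = threading.Lock()

    def start(self):
        """Run the accept loop in a background thread."""
        threading.Thread(target=self._accept_loop, daemon=True).start()

    def client_count(self):
        with self._lock:
            return len(self._clients)

    def close(self):
        self._listener.close()

    def _accept_loop(self):
        while True:
            try:
                conn, _ = self._listener.accept()
            except OSError:
                return  # listener was closed
            with self._lock:
                self._clients.append(conn)
            threading.Thread(target=self._serve, args=(conn,), daemon=True).start()

    def _serve(self, conn):
        """Read from one client until it disconnects, relaying what it sends."""
        try:
            while True:
                data = conn.recv(4096)
                if not data:
                    break
                self._broadcast(data, sender=conn)
        except OSError:
            pass
        finally:
            with self._lock:
                if conn in self._clients:
                    self._clients.remove(conn)
            conn.close()

    def _broadcast(self, data, sender):
        # Copy the target list under the lock, then send without holding it.
        with self._lock:
            targets = [c for c in self._clients if c is not sender]
        for client in targets:
            try:
                client.sendall(data)
            except OSError:
                pass  # a broken client is cleaned up by its own _serve thread
```

A client connects with `socket.create_connection(server.address)` after `server.start()`; anything it sends is delivered to every other connected client, which is what lets several users chat simultaneously.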

Source code