
ANJANA TIHA

Research Software Engineer, University of Pennsylvania, USA.
Specialized in Machine Learning, Deep Learning, Natural Language Processing, Computer Vision, and Distributed Big Data Analytics.


RESEARCH

Machine Learning, Deep Learning, Natural Language Processing, Computer Vision, Distributed Big Data Analytics, Information Retrieval, Recommendation Engine


GENERATIVE OPEN DOMAIN CHATBOT APPLICATION WITH DEEP LEARNING

Algorithm & Techniques: Machine Learning, Deep Learning, Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), Bidirectional LSTM, Sequence to Sequence (Seq2Seq), Beam Search, Neural Attention Mechanism


Language: Python

Technology: TensorFlow, PyQt

Tools: Anaconda, Linux

Date: January 2018 - May 2018

Description:

- Developed a generative, open-domain conversational agent (human vs. AI) using the state-of-the-art Sequence-to-Sequence (Seq2Seq) architecture, attaining a validation perplexity of 46.82 and a BLEU score of 10.6.

- Trained the encoder-decoder Seq2Seq model entirely from scratch, then further optimized the Recurrent Neural Network based model with bidirectional LSTM cells, a neural attention mechanism, and beam search.

- Used the preprocessed Cornell Movie Subtitle Corpus as training data, PyQt for the chat interface (GUI), and an untrained instance of Google's Neural Machine Translation (NMT) model for the Seq2Seq module.
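
For readers unfamiliar with beam search, the decoding technique named above, here is a minimal sketch. It is illustrative only and is not the project's code: the toy probability table `TABLE`, the `<s>` and `</s>` tokens, and the function names are all assumptions made for the example, and the scoring uses plain summed log-probabilities with no length normalization.

```python
import math

# Hypothetical toy "model": next-token probabilities keyed by the last token.
TABLE = {
    "<s>": {"hi": 0.6, "hello": 0.4},
    "hi": {"</s>": 0.5, "there": 0.5},
    "hello": {"</s>": 0.9, "there": 0.1},
    "there": {"</s>": 1.0},
}


def next_token_probs(sequence):
    """Stand-in for a trained decoder step (assumption for this sketch)."""
    return TABLE[sequence[-1]]


def beam_search(step_fn, start, end, beam_width=2, max_len=10):
    """Return the highest-scoring token sequence found with a fixed-width beam."""
    beams = [([start], 0.0)]  # (sequence, summed log-probability)
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = [
            (seq + [tok], score + math.log(p))
            for seq, score in beams
            for tok, p in step_fn(seq).items()
            if p > 0
        ]
        if not candidates:
            break
        # Keep only the best `beam_width` expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == end:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # include hypotheses cut off by max_len
    return max(finished, key=lambda c: c[1])[0]


greedy = beam_search(next_token_probs, "<s>", "</s>", beam_width=1)
wider = beam_search(next_token_probs, "<s>", "</s>", beam_width=2)
```

With a beam width of 1 the search is greedy and commits to the locally best first token; a wider beam keeps the alternative alive long enough to find the more probable complete reply.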

DISTRIBUTED MACHINE LEARNING FOR BIOMARKERS DETECTION FROM WEARABLE SENSOR BIG DATA

Algorithm & Techniques: Machine Learning, Distributed Machine Learning, Classification, Supervised Learning, Mobile Health, Big Data Analytics

Language: Python

Technology: Apache Spark, scikit-learn, Git, GitHub

Tools: IntelliJ Idea, Linux

Date: January 2017 - April 2017

Description:

- Developed a Machine Learning (ML) module for training ML models on multiple clusters with Apache Spark.

- Developed a Grid & Random Grid Search CV module to optimize training time and hyperparameter search.

- Detected biomarkers (psychological stress) from large-scale streaming data (accelerometer, ECG, respiration rate) collected by multimodal wearable sensors, with an initial F1 score of 87% using an SVM with a radial kernel.
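
The difference between grid search and random grid search can be sketched in a few lines. This is a simplified, single-machine illustration and not the project's Spark module: the function names, the `C`/`gamma` grid values, and the toy scoring function are assumptions for the example. In practice the scoring function would be a cross-validated metric such as F1.

```python
import itertools
import random


def grid_candidates(param_grid):
    """Enumerate every combination of the listed parameter values."""
    keys = sorted(param_grid)
    return [
        dict(zip(keys, values))
        for values in itertools.product(*(param_grid[k] for k in keys))
    ]


def random_candidates(param_grid, n_iter, seed=0):
    """Sample a subset of the grid without replacement to cut training time."""
    candidates = grid_candidates(param_grid)
    rng = random.Random(seed)
    return rng.sample(candidates, min(n_iter, len(candidates)))


def best_params(candidates, score_fn):
    """Return the candidate with the highest score."""
    return max(candidates, key=score_fn)


# Hypothetical grid for an SVM with a radial kernel.
grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1]}


def toy_score(params):
    """Placeholder for a cross-validated score (assumption for this sketch)."""
    return -((params["C"] - 1) ** 2) - (params["gamma"] - 0.1) ** 2


full = grid_candidates(grid)
sampled = random_candidates(grid, n_iter=3)
best = best_params(full, toy_score)
```

Because each candidate is scored independently, the evaluations are natural units of work to spread across a cluster.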

SURVEY ON MACHINE LEARNING BASED PHYSICAL ACTIVITY RECOGNITION METHODS FROM SENSOR DATA

Date: December 2016 – February 2017

- Conducted research on machine learning based algorithms for physical activity recognition (e.g., walking, running, eating, and drinking) from multimodal wearable sensor data.

Source code

DISTRIBUTED BIG DATA APPLICATION FOR LARGE SCALE US STOCK MARKET DATA ANALYSIS

Algorithm & Techniques: Financial Analysis, Stock Market Analysis, Anomaly Detection, Distributed Big Data Analytics, Big Data Analytics, Big Data, Data Analytics, FinTech

Language: Java, Python 

Technology: Apache Spark, Maven, Git

Tools: IntelliJ Idea, Linux

Date: May 2017 - August 2017

Description:

- Developed a framework for processing and analyzing 7 years of historical US stock market data (50 TB) at nanosecond granularity from 13 US exchanges on multiple clusters with Apache Spark.

- Added support for extracting information from binary files based on field specifications, covering multiple years and file formats.

- Conducted multi-market analysis (for market dominance detection) and anomaly detection (for the Flash Crash day).

- Proposed using unsupervised learning (clustering) on large-scale unlabeled stock market data for anomaly detection and general market analysis.

ECONOMIC MODEL DEVELOPMENT FOR COVID-19 PANDEMIC WITH MACHINE LEARNING

Algorithm & Techniques: Machine Learning, Data Analytics, Economics

Language: Python 

Technology: 

Tools: Anaconda

Date: July 2020 - September 2020

Description:

- Analyzed 20 years of country-level economic data and applied machine learning algorithms to develop an economic model of the COVID-19 pandemic.

Source Code

Would you like to learn more about my research projects?


PROJECTS

Machine Learning, Deep Learning, Natural Language Processing, Computer Vision, Distributed Big Data Analytics, Information Retrieval, Recommendation Engine


MOVIE REVENUE & RATING PREDICTION FROM IMDB MOVIE DATA

Algorithm & Techniques: Machine Learning, Supervised Learning, Regression Analysis


Language: Python

Technology: scikit-learn

Tools: Anaconda

Date: October 2016 - December 2016

Description:

- Developed a regression model for predicting revenue and ratings from 5,000 movies, attaining a Mean Squared Error of 0.0005 on a scale of 1 for revenue after 5-fold cross-validation.

- Conducted preprocessing and feature extraction (28 numerical, textual, and categorical features).

- Performed data analysis, visualization, feature extraction, cleaning (missing values, anomalies), and preprocessing (rescaling, normalization, and feature transformation with one-hot encoding), and trained with cross-validation.

WEB RETRIEVAL & SEARCH ENGINE IMPLEMENTATION FOR UNIVERSITY WEB DOMAIN

Algorithm & Techniques: Search Engine, Search Relevance, Information Retrieval, Vector Space Model, Cosine Similarity


Language: Python

Technology: Django

Tools: Anaconda

Date: August 2017 - December 2017

Description:

- Developed an end-to-end web retrieval engine for the University of Memphis based on the vector space model, and evaluated its performance on 10,000 web pages and documents (text, pdf, docx, and pptx) from the university domain.
- Used TF-IDF vector space model and cosine similarity function for web page matching and ranking.
- Developed modules - web crawler (with memory), text preprocessor (preprocess, tokenize, stem from raw HTML/docs), page indexer, page relevance ranker and performance evaluator (F1, precision, recall).
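
A minimal sketch of TF-IDF weighting with cosine-similarity ranking, the approach described above. It is not the project's implementation: the whitespace tokenizer, the smoothed IDF formula, the function names, and the three sample documents are assumptions chosen to keep the example short (the real pipeline also stems tokens and parses HTML and document files).

```python
import math
from collections import Counter


def tokenize(text):
    """Very rough tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_tfidf(docs):
    """Return one sparse TF-IDF vector per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumption): common terms keep a small positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()} for tokens in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return indices of matching documents, most similar first."""
    vectors, idf = build_tfidf(docs)
    counts = Counter(tokenize(query))
    q_vec = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scored = [(cosine(q_vec, v), i) for i, v in enumerate(vectors)]
    return [i for s, i in sorted(scored, key=lambda p: (-p[0], p[1])) if s > 0]


# Hypothetical page texts standing in for crawled university pages.
pages = [
    "graduate admissions deadline",
    "campus parking permit",
    "graduate research assistantships",
]
results = rank("graduate admissions", pages)
```

Pages that share no term with the query score zero and are left out of the ranking, so `results` lists only the pages that mention at least one query term.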

MOVIE RECOMMENDATION ENGINE USING USER BASED COLLABORATIVE FILTERING

Algorithm & Techniques: Recommendation Engine, Recommendation Systems, Collaborative Filtering 


Language: C++, Python

Technology: NA

Tools: Sublime Text

Date: February 2017 - April 2017

Description:

- Developed a user-based movie recommender system by implementing user-user collaborative filtering with runtime and space complexity optimizations, with separate implementations in C++ and Python.
- Used the Netflix movie dataset with 100K user records.
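
User-user collaborative filtering can be sketched as follows. This is a small illustration under stated assumptions, not the project's optimized code: cosine similarity over raw ratings, a similarity-weighted average of the `k` nearest neighbours, the function names, and the tiny `ratings` table are all choices made for the example.

```python
import math


def cosine_sim(a, b):
    """Cosine similarity between two users' rating dicts (movie -> rating)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def predict(ratings, user, movie, k=2):
    """Predict a rating as the similarity-weighted mean of the k nearest raters."""
    neighbours = [
        (cosine_sim(ratings[user], other), other[movie])
        for name, other in ratings.items()
        if name != user and movie in other
    ]
    neighbours = sorted(neighbours, key=lambda p: p[0], reverse=True)[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return None  # no similar user has rated this movie
    return sum(sim * rating for sim, rating in neighbours) / total


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "a": {"m1": 5, "m2": 4},
    "b": {"m1": 5, "m2": 4, "m3": 2},
    "c": {"m1": 1, "m3": 5},
}
estimate = predict(ratings, "a", "m3")
```

User `a` rates movies almost exactly like user `b`, so the estimate for `m3` lands much closer to `b`'s rating than to `c`'s.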

RESTAURANT RECOMMENDATION SYSTEM USING RELATIONAL DATABASE

Algorithm & Techniques: Recommendation System, Relational Database

Language: Python

Technology: MySQL, Django

Tools: Anaconda


Date: October 2017 - December 2017

Description:

- Implemented a restaurant recommendation system based on user information (e.g., location, cuisine preference) and restaurant information (location, cuisine, ratings, reviews).
- Included features to derive review effectiveness and user trustworthiness from available data.

TOXIC COMMENT IDENTIFICATION / CLASSIFICATION

Algorithm & Techniques: Machine Learning, Supervised Learning, Classification, Natural Language Processing, Text Classification, Text Analysis

Language: Python

Technology: scikit-learn, NLTK

Tools: Anaconda

Date: August 2018 - September 2018

Description:

- Classified around 130,000 text comments (34 MB) into the categories "Toxic", "Severe Toxic", "Obscene", "Threat", "Insult", "Identity Hate", "Any of the Above", and "None of the Above".
- Used features from the AAAI 2018 paper "Anatomy of Online Hate: Developing a Taxonomy and Machine Learning Models for Identifying and Classifying Hate in Online News Media" by Salminen, Almerekhi.
- Built machine learning training pipelines covering file reading, train/test dataset creation, preprocessing, feature extraction, and grid-search-based training and evaluation for multiple models.
- Generated visualization and aggregated report on the performance of various models.

REGRESSION MODELING FOR HOUSING PRICE PREDICTION

Algorithm & Techniques: Machine Learning, Supervised Learning, Regression


Language: Python

Technology: scikit-learn, NLTK

Tools: Anaconda

Date: August 2018 - September 2018

Description:

- Built a regression model for predicting housing prices using 79 numerical and categorical features, with a Mean Squared Error of 0.000685 on a scale of 1.

- Built pipelines for regression model training with preprocessing (normalization, label encoding of categorical features), feature extraction, and grid-search-based training and evaluation for multiple regression models, with visualization and an aggregated performance report.

IMAGE RECOGNITION USING DEEP CONVOLUTIONAL NEURAL NETWORK

Algorithm & Techniques: Image Classification, Deep Learning, Convolutional Neural Network (CNN), Transfer Learning.

Language: Python

Technology: Keras, TensorFlow

Tools: Anaconda

Date: September 2018 - December 2018

Description:

- Developed image classification tools using a deep Convolutional Neural Network built from scratch with Keras and, separately, the pretrained InceptionV3 model fine-tuned with new class labels.

- Trained on multiple datasets: Flower dataset (testing accuracy 85.68%, 5 species, 4.5K images, 228 MB), 10 Monkey Species (validation accuracy 97.06%, 553 MB), and Dog Breed dataset (testing accuracy 76.41%, 120 classes, 10.2K images, 344 MB).

SERVER-CLIENT CHAT APPLICATION TECHNOLOGY

Algorithm & Techniques: Machine Learning, Supervised Learning, Classification, Natural Language Processing, Text Classification, Text Analysis

Language: Java, Android

Technology: TCP/IP

Tools: Android Studio

Date: August 2015

Description:

- Developed a TCP/IP-based chat server and client application that lets multiple clients chat simultaneously.

- Built the client application for both Android and desktop platforms.

Source code
