Molecules Bio

RESEARCH

Machine Learning, Deep Learning, Natural Language Processing, Computer Vision, Distributed Big Data Analytics, Information Retrieval, Recommendation Engine


GENERATIVE OPEN DOMAIN CHATBOT APPLICATION WITH DEEP LEARNING

Algorithm & Techniques: Machine Learning, Deep Learning, Recurrent Neural Network (RNN), LSTM, Bidirectional LSTM, Sequence to Sequence (Seq2Seq), Beam Search, Neural Attention Mechanism


Language: Python

Technology: TensorFlow, PyQT

Tools: Anaconda, Linux


Date: January - May 2018

Description:

- Developed a generative, open-domain conversational agent (human vs. AI) using the state-of-the-art Sequence-to-Sequence (Seq2Seq) architecture, attaining a validation perplexity of 46.82 and a BLEU score of 10.6.

- Trained an encoder-decoder Seq2Seq model entirely from scratch, then further optimized the Recurrent Neural Network based model with bidirectional LSTM cells, a neural attention mechanism, and beam search.

- Used the preprocessed Cornell Movie Subtitle Corpus as training data, PyQT for the chat interface (GUI), and an untrained Google Neural Machine Translation (NMT) model for the Seq2Seq module.
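
For readers unfamiliar with the metric and decoding strategy named above, here is a minimal plain-Python sketch of perplexity and beam search. It assumes perplexity is the exponential of the average per-token negative log-likelihood (natural log). The `toy_step` model, its tokens, and all function names are hypothetical illustrations rather than the project's code, and the search omits length normalization for brevity.

```python
import math
from typing import Callable, Dict, List, Tuple

def perplexity(token_log_probs: List[float]) -> float:
    """exp of the average negative log-likelihood per token (natural log)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def beam_search(
    step_fn: Callable[[List[str]], Dict[str, float]],
    start: str,
    end: str,
    beam_width: int = 3,
    max_len: int = 10,
) -> Tuple[List[str], float]:
    """Return the highest-scoring (tokens, log-probability) hypothesis found."""
    beams: List[Tuple[List[str], float]] = [([start], 0.0)]
    finished: List[Tuple[List[str], float]] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = []
        for tokens, score in beams:
            for tok, p in step_fn(tokens).items():
                if p > 0.0:
                    candidates.append((tokens + [tok], score + math.log(p)))
        if not candidates:
            break
        # Keep only the beam_width best expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == end:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # include unfinished hypotheses if max_len was hit
    return max(finished, key=lambda c: c[1])

def toy_step(tokens: List[str]) -> Dict[str, float]:
    """Hypothetical next-token distribution that depends only on the last token."""
    table = {
        "<s>": {"hi": 0.6, "hello": 0.4},
        "hi": {"there": 0.5, "</s>": 0.5},
        "hello": {"</s>": 1.0},
        "there": {"</s>": 1.0},
    }
    return table[tokens[-1]]

best_tokens, best_score = beam_search(toy_step, "<s>", "</s>", beam_width=2)
greedy_tokens, greedy_score = beam_search(toy_step, "<s>", "</s>", beam_width=1)
```

In this toy model, a beam width of 1 (greedy decoding) commits to "hi" and ends with total probability 0.3, while a width of 2 keeps "hello" alive and finds the 0.4 path. Under the natural-log assumption, a perplexity of 46.82 would correspond to an average loss of roughly 3.85 nats per token.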

DISTRIBUTED MACHINE LEARNING FOR BIOMARKERS DETECTION FROM WEARABLE SENSOR BIG DATA

Algorithm & Techniques: Machine Learning, Distributed Machine Learning, Classification, Supervised Learning, Mobile Health, Big Data Analytics

Language: Python

Technology: Apache Spark, scikit-learn, Git, GitHub

Tools: IntelliJ Idea, Linux

Date: January - April 2017

Description:

- Developed a machine learning (ML) module for training ML models across multiple clusters with Apache Spark.

- Developed a grid search and randomized grid search cross-validation (CV) module to optimize training time and hyperparameter search.

- Detected biomarkers (psychological stress) from large-scale streaming data (accelerometer, ECG, respiration rate) collected by multimodal wearable sensors, achieving an initial F1 score of 87% with an SVM using a radial (RBF) kernel.
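
The difference between grid search and randomized search mentioned above can be sketched in plain Python. This is an illustration under stated assumptions, not the project's module: the search space for `C` and `gamma`, the function names, and `toy_score` are all hypothetical, and `toy_score` is a stand-in for what would in practice be a cross-validated F1 score of an RBF-kernel SVM.

```python
import itertools
import math
import random
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Params = Dict[str, Any]

def grid_candidates(space: Dict[str, List[Any]]) -> Iterator[Params]:
    """Enumerate every combination in the search space (exhaustive grid)."""
    keys = sorted(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

def random_candidates(space: Dict[str, List[Any]], n_iter: int, seed: int = 0) -> List[Params]:
    """Sample n_iter combinations at random (with replacement, for simplicity)."""
    rng = random.Random(seed)
    keys = sorted(space)
    return [{k: rng.choice(space[k]) for k in keys} for _ in range(n_iter)]

def search(candidates: Iterable[Params], score_fn: Callable[[Params], float]) -> Tuple[Params, float]:
    """Score each candidate and return the best (params, score) pair."""
    return max(((p, score_fn(p)) for p in candidates), key=lambda pair: pair[1])

# Hypothetical search space for an RBF-kernel SVM.
space = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1]}

def toy_score(params: Params) -> float:
    """Placeholder objective peaking at C=10, gamma=0.01 (not a real CV score)."""
    return -(math.log10(params["C"]) - 1) ** 2 - (math.log10(params["gamma"]) + 2) ** 2

grid = list(grid_candidates(space))
best_grid_params, best_grid_score = search(grid, toy_score)
sampled = random_candidates(space, n_iter=5, seed=42)
best_random_params, best_random_score = search(sampled, toy_score)
```

The grid evaluates all 12 combinations, while the randomized variant evaluates only the 5 it samples, trading a possibly worse optimum for less training time.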

SURVEY ON MACHINE LEARNING BASED PHYSICAL ACTIVITY RECOGNITION METHODS FROM SENSOR DATA

Date: December 2016 - February 2017

- Conducted research on machine learning algorithms for physical activity recognition (e.g., walking, running, eating, and drinking) from multimodal wearable sensor data.

Source code

DISTRIBUTED BIG DATA APPLICATION FOR LARGE SCALE US STOCK MARKET DATA ANALYSIS

Algorithm & Techniques: Financial Analysis, Stock Market Analysis, Anomaly Detection, Distributed Big Data Analytics, Big Data Analytics, Big Data, Data Analytics, FinTech

Language: Java, Python 

Technology: Apache Spark, Maven, Git

Tools: IntelliJ Idea, Linux

Date: May - July 2017

Description:

- Developed a framework for processing and analyzing 7 years of historical US stock market data (50TB) at nanosecond granularity from 13 US exchanges, running on multiple clusters with Apache Spark.

- Added support for extracting information from binary files based on field specifications, covering multiple years and file formats.

- Conducted multi-market analysis (for market dominance detection) and anomaly detection (for the Flash Crash day).

- Proposed using unsupervised learning (clustering) on large-scale unlabeled stock market data for anomaly detection and general market analysis.
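
The spec-driven extraction from binary files described above might look something like the following sketch, which uses Python's standard `struct` module. The record layout `SPEC_V1`, its field names, and the little-endian byte order are hypothetical assumptions for illustration; the actual exchange file formats are not given here.

```python
import struct
from typing import Any, Dict, Iterator, List, Tuple

FieldSpec = List[Tuple[str, str]]  # (field name, struct format code)

# Hypothetical fixed-width record layout; a different year or file format
# would supply its own spec rather than new parsing code.
SPEC_V1: FieldSpec = [
    ("timestamp_ns", "Q"),  # unsigned 64-bit nanosecond timestamp
    ("symbol", "4s"),       # 4-byte ticker symbol
    ("price", "d"),         # 64-bit float
    ("size", "I"),          # unsigned 32-bit share count
]

def build_struct(spec: FieldSpec, byte_order: str = "<") -> struct.Struct:
    """Compile a field spec into a struct.Struct (no padding with '<')."""
    return struct.Struct(byte_order + "".join(code for _, code in spec))

def iter_records(data: bytes, spec: FieldSpec) -> Iterator[Dict[str, Any]]:
    """Yield one dict per fixed-width record found in data."""
    record = build_struct(spec)
    names = [name for name, _ in spec]
    # iter_unpack raises struct.error if len(data) is not a multiple of record.size.
    for values in record.iter_unpack(data):
        yield dict(zip(names, values))

# Build two sample records in memory and parse them back.
packer = build_struct(SPEC_V1)
raw = packer.pack(1, b"AAPL", 187.5, 100) + packer.pack(2, b"MSFT", 410.25, 50)
records = list(iter_records(raw, SPEC_V1))
```

Keeping the layout in data rather than in code is one way a single parser could handle several formats; whether the original framework took this approach is not stated.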

ECONOMIC MODEL DEVELOPMENT FOR COVID-19 PANDEMIC WITH MACHINE LEARNING

Algorithm & Techniques: Machine Learning, Data Analytics, Economics

Language: Python 

Technology: 

Tools: Anaconda

Date: July - September 2020

Description:

- Conducted analysis toward an economic model of the COVID-19 pandemic using 20 years of country-level economic data, and applied machine learning algorithms.

Source code

Would you like to learn more about my research projects?
