Addressing Quora Duplicated Questions using NLP

Abstract

This project is the final graduation (capstone) project for the Udacity Machine Learning Engineer Nanodegree. The background is as follows: Quora has hundreds of millions of users, so the same questions are inevitably asked more than once, and the site may contain many duplicate questions. Quora uses a random forest algorithm [1], which is built on decision trees, to determine whether two questions are duplicates. Our goal is likewise to predict, for each pair of questions, whether the two are duplicates. Doing so would make Quora's questions more streamlined and give the community higher-quality questions and answers. Please see the full paper for details. The critical techniques I used include general analysis, TF-IDF, and XGBoost.
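To make the TF-IDF step concrete, here is a minimal sketch of how a similarity score for a question pair could be computed. It is not the code from the full paper: the helper names (`tokenize`, `idf_weights`, `tfidf_vector`, `pair_similarity`), the smoothed IDF formula, and the toy questions are all assumptions chosen for illustration. A score like this could serve as one input feature to a classifier such as XGBoost.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_weights(questions):
    """Compute a smoothed inverse document frequency for every word.

    Assumption: idf(w) = log((1 + n) / (1 + df(w))) + 1, one common
    smoothing variant; the original project may use a different one.
    """
    n = len(questions)
    doc_freq = Counter()
    for question in questions:
        doc_freq.update(set(tokenize(question)))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in doc_freq.items()}


def tfidf_vector(text, idf):
    """Map a question to a sparse {word: tf * idf} vector.

    Words missing from the IDF table get weight 0.
    """
    term_freq = Counter(tokenize(text))
    return {w: c * idf.get(w, 0.0) for w, c in term_freq.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pair_similarity(q1, q2, idf):
    """TF-IDF cosine similarity of a question pair."""
    return cosine(tfidf_vector(q1, idf), tfidf_vector(q2, idf))


if __name__ == "__main__":
    # Toy questions invented for this sketch, not taken from the Quora data.
    corpus = [
        "How do I learn Python?",
        "What is the best way to learn Python?",
        "How far is the Moon from Earth?",
    ]
    idf = idf_weights(corpus)
    print(pair_similarity(corpus[0], corpus[1], idf))
    print(pair_similarity(corpus[0], corpus[2], idf))
```

In this toy setup the first pair, which shares the content words "learn" and "Python", scores higher than the second pair, which shares only "how". On its own such a score is a weak signal, which is why it would normally be combined with other features before classification.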

Date
Jan 7, 2019 12:00 AM
Dengpan Yuan
Columbia@MSCS

I’m an MSCS student at Columbia University@SEAS.
