Yet another text classification competition

All credit goes to my teammates: my team placed 22nd out of 4,037 teams in the Quora Insincere Questions Classification competition. The key challenge was to weed out insincere questions on Quora while keeping training and inference time below the 4-GPU-hour limit.

I deserve no credit on this one, as my contribution was trivial. My attempts at character embeddings never paid off, as the gain in performance wasn't worth the slowdown in training. Please refer to the nice write-up and the kernel by Ryan Chesler for our solution.


Read on →

A new paradigm of neural networks based entirely on Attention

RNNs have been the state-of-the-art approach to modeling sequences. They align the symbol positions of the input and output sequences and generate a sequence of hidden states $h_t$ as a function of the previous hidden state $h_{t-1}$ and the input for position $t$. This inherently sequential nature precludes parallelization within training examples.
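To make the sequential bottleneck concrete, here is a minimal NumPy sketch of a vanilla RNN forward pass (the names `rnn_forward`, `W_h`, `W_x`, and `b` are illustrative, not from any particular library). The loop cannot be parallelized across time steps because step $t$ consumes the hidden state from step $t-1$:

```python
import numpy as np

def rnn_forward(x, W_h, W_x, b):
    """Vanilla RNN: h_t = tanh(W_h @ h_{t-1} + W_x @ x_t + b).

    x: (seq_len, input_dim) input sequence.
    Returns the hidden states as an array of shape (seq_len, hidden_dim).
    """
    h = np.zeros(W_h.shape[0])     # h_0
    states = []
    for x_t in x:                  # inherently sequential: step t needs h_{t-1}
        h = np.tanh(W_h @ h + W_x @ x_t + b)
        states.append(h)
    return np.stack(states)
```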

In the paper Attention Is All You Need, Google researchers proposed the Transformer model architecture, which eschews recurrence and instead relies entirely on an attention mechanism to draw global dependencies between input and output. While it achieves state-of-the-art performance on Machine Translation, its applicability is much broader.
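The core building block is scaled dot-product attention, $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}(QK^T/\sqrt{d_k})V$, which lets every position attend to every other position in a single matrix product. A minimal NumPy sketch (single head, no masking, so only a small slice of the paper's full architecture):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, from "Attention Is All You Need".

    Q, K, V: (seq_len, d_k) query, key, and value matrices.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # all pairwise interactions at once
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of values
```

Unlike the RNN loop above, this computes the interactions between all positions in parallel.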

P.S. The blog post by Jay Alammar has awesome illustrations explaining the Transformer. The blog post by Harvard NLP also provides a notebook-style explanation with a working implementation.

Read on →

This week I joined the flock as a Machine Learning Engineer at Twitter Cortex, where I will be developing and deploying state-of-the-art Deep Learning and NLP models to automate content understanding. Super excited to see what the future holds!

Probabilistic Models for Segmenting and Labeling Sequence Data

Conditional Random Field (CRF) is a probabilistic graphical model that excels at modeling and labeling sequence data, with wide applications in NLP, computer vision, and even biological sequence modeling. At ICML 2011, it received the "Test of Time" award for best 10-year paper, as time and hindsight proved it to be a seminal machine learning model. It is a shame that I didn't know much about CRFs until now, but better late than never!
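As a reminder of the model's form (the standard linear-chain CRF in textbook notation, not anything specific to this post), the conditional probability of a label sequence $y$ given observations $x$ is

$$p(y \mid x) = \frac{1}{Z(x)} \exp\left( \sum_{t=1}^{T} \sum_{k} \lambda_k f_k(y_{t-1}, y_t, x, t) \right),$$

where the $f_k$ are feature functions, the $\lambda_k$ are learned weights, and $Z(x)$ normalizes by summing the same exponential over all possible label sequences.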

A reading summary of the following paper: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data (Lafferty, McCallum & Pereira, 2001).

Read on →

Learning and Reflection from the Avito Demand Prediction Challenge

After three competitions, I felt I was ready to push for my first Gold Medal. Along with my four teammates, I took on the challenge of predicting demand for classified ads. We fought tooth and nail till the last moment and were in the Gold zone on the public leaderboard. While we ended up missing the Gold Medal narrowly, by just one spot, it was overall a great learning experience. I look forward to making progress toward a real gold medal in the future.

TL;DR A failed attempt to push for my first Gold Medal on Kaggle. My team ranked in the top 1% but missed the Gold Medal narrowly by one spot!


Read on →

Batch Normalization, Layer Normalization and Why They Work

Reading notes / survey of three papers related to Batch Normalization
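As a quick refresher before diving in (a minimal NumPy sketch of the batch-norm transform at training time; `gamma` and `beta` are the learned scale and shift, and this omits the running statistics used at inference):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then scale and shift.

    x: (batch_size, num_features) activations.
    gamma, beta: (num_features,) learned parameters.
    """
    mean = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta
```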

Read on →

Learning and Reflection from the Jigsaw Toxic Comment Classification Challenge

I had lots of fun at my last Kaggle competition, the Mercari Price Suggestion Challenge. Without a second thought, I dived right into the Toxic Comment Classification Challenge to further practice my NLP skills.

To get a different experience, I decided to team up instead of going solo. It turned out great, as I learned a ton from my teammates Thomas, Konrad and Song, who have been doing this much longer than I have. Unknowingly, I had put myself in the best situation for learning: being the least experienced team member.

TL;DR The Jigsaw Toxic Comment Classification Challenge is the most nail-biting competition I have participated in. I am ecstatic that my team ranked in the top 1% out of 4,500+ teams!


Read on →