# Resolving Gendered Pronouns

## Learning from the Gendered Pronoun Resolution Challenge

With the goal of learning PyTorch and getting more hands-on experience with transfer learning via pre-trained language models, I took part in the Gendered Pronoun Resolution competition on Kaggle. The learning alone was well worth it, and I placed 30th solo out of 800+ teams with limited time invested.

# Flagging Insincere Questions on Quora

## Yet another text classification competition

All credit to my teammates: our team placed 22nd out of 4,037 in the Quora Insincere Questions Classification competition. The key challenge was to weed out insincere questions on Quora while keeping training and inference time below the 4-GPU-hour limit.

I deserve no credit on this one, as my contribution was trivial. My attempts at character embeddings never paid off, as the gain in performance wasn't worth the slowdown in training. Please refer to the nice write-up and the kernel by Ryan Chesler for our solution.

# The Transformer

## A new paradigm of neural networks based entirely on Attention

RNNs have been the state-of-the-art approach to modeling sequences. They align the symbol positions of the input and output sequences and generate a sequence of hidden states $h_t$, each computed as a function of the previous hidden state $h_{t-1}$ and the input at position $t$. This inherently sequential nature precludes parallelization.
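To make the sequential bottleneck concrete, here is a minimal NumPy sketch of a vanilla RNN forward pass (my own illustration, not code from the paper; the weight shapes and tanh nonlinearity are standard assumptions). The loop cannot be parallelized across time steps because each $h_t$ needs $h_{t-1}$:

```python
import numpy as np

def rnn_forward(xs, W_h, W_x, b):
    """Vanilla RNN: h_t = tanh(W_h @ h_{t-1} + W_x @ x_t + b)."""
    h = np.zeros(W_h.shape[0])
    hs = []
    for x in xs:  # inherently sequential: step t needs h_{t-1}
        h = np.tanh(W_h @ h + W_x @ x + b)
        hs.append(h)
    return np.stack(hs)

rng = np.random.default_rng(0)
xs = rng.normal(size=(5, 3))            # 5 time steps, input dim 3
W_h = rng.normal(size=(4, 4)) * 0.1     # hidden-to-hidden weights
W_x = rng.normal(size=(4, 3)) * 0.1     # input-to-hidden weights
hs = rnn_forward(xs, W_h, W_x, np.zeros(4))
print(hs.shape)  # (5, 4): one hidden state per time step
```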

In the paper Attention Is All You Need, Google researchers proposed the Transformer model architecture, which eschews recurrence and instead relies entirely on an attention mechanism to draw global dependencies between input and output. While it achieves state-of-the-art performance on machine translation, its applications are much broader.
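The core operation is scaled dot-product attention, $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}(QK^T/\sqrt{d_k})V$, as defined in the paper. A minimal NumPy sketch (shapes and random inputs are my own illustration):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(5, 8)) for _ in range(3))  # 5 positions, dim 8
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (5, 8)
```

Note that every position attends to all positions in a single matrix multiply, which is what makes the Transformer parallelizable across the sequence, unlike an RNN.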

P.S. The blog post by Jay Alammar has awesome illustrations explaining the Transformer. The blog post on Harvard NLP also provides a notebook-style walkthrough with a working implementation.

This week I joined the flock as a Machine Learning Engineer at Twitter Cortex, where I will be developing and deploying state-of-the-art Deep Learning and NLP models to automate content understanding. Super excited to see what the future holds!

# Conditional Random Field

## Probabilistic Models for Segmenting and Labeling Sequence Data

Conditional Random Field (CRF) is a probabilistic graphical model that excels at modeling and labeling sequence data, with wide applications in NLP, computer vision, and even biological sequence modeling. At ICML 2011, the original paper received the “Test of Time” award for best 10-year paper, as time and hindsight proved it to be a seminal machine learning model. It is a shame that I didn't know much about CRFs until now, but better late than never!

Reading summaries of the following paper:

# Word Embeddings

## Learning and Reflection from the Avito Demand Prediction Challenge

After three competitions, I felt I was ready to push for my first Gold Medal. My four teammates and I took on the challenge of predicting demand for classified ads. We fought tooth and nail until the last moment and were in the Gold zone on the public leaderboard. While we ended up missing the Gold Medal narrowly by one spot, it was overall a great learning experience. I look forward to making progress toward a real gold medal in the future.

TL;DR: A failed attempt to push for my first Gold Medal on Kaggle. My team ranked in the top 1% but missed the Gold Medal narrowly by one spot!

# On Normalization Layers

## Batch Normalization, Layer Normalization and Why They Work

Reading notes / survey of three papers related to Batch Normalization
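As a quick refresher before the notes, here is the standard batch-normalization transform at training time (a minimal NumPy sketch of the formulation from the original Batch Normalization paper; the toy data and parameter names are my own illustration):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then scale and shift."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, unit variance per feature
    return gamma * x_hat + beta            # learnable scale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(32, 4))  # batch of 32, 4 features
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(np.allclose(y.mean(axis=0), 0, atol=1e-6))  # True
```

Layer Normalization applies the same transform but computes the statistics over the features of each example (`axis=-1`) rather than over the batch, which is why it works for variable batch sizes and recurrent models.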