Posts

Feb 19, 2019
Exploring BERT's Vocabulary

Deep contextualized word representations have taken word representation to the next level by assigning word vectors to words in context - typically a sentence - instead of assigning a vector to each word type. ELMO and BERT are the most popular and successful examples of these embeddings. The authors of BERT released several versions of BERT pretrained on massive amounts of data, including a multilingual version which supports 104 languages in a single model.
Dec 27, 2018
Masking attention weights in PyTorch

Attention has become ubiquitous in sequence learning tasks such as machine translation. We most often have to deal with variable length sequences but we require each sequence in the same batch (or the same dataset) to be equal in length if we want to represent them as a single tensor. Padding shorter sentences to the same length as the longest one in the batch is the most common solution for this problem.
Mar 19, 2016
Counting words in different programming languages II.

I started a word counting challenge a few months ago and it received a lot more interest than I had expected:
Nov 26, 2015
Counting words in different programming languages

Creating word frequency lists is an easy task in most programming languages but how easy it is exactly? And what are the performance trade-offs? We played around with our favorite programming languages and got surprising results. The experiment is still going, you can participate too. And please do.

Posts

Exploring BERT's Vocabulary

Masking attention weights in PyTorch

Counting words in different programming languages II.

Counting words in different programming languages