Deep contextualized word representations have taken word representation to the next level by assigning word vectors to words in context - typically a sentence - instead of assigning a vector to each word type. ELMO and BERT are the most popular and successful examples of these embeddings. The authors of BERT released several versions of BERT pretrained on massive amounts of data, including a multilingual version which supports 104 languages in a single model.
Attention has become ubiquitous in sequence learning tasks such as machine translation. We most often have to deal with variable length sequences but we require each sequence in the same batch (or the same dataset) to be equal in length if we want to represent them as a single tensor. Padding shorter sentences to the same length as the longest one in the batch is the most common solution for this problem.
I started a word counting challenge a few months ago and it received a lot more interest than I had expected:
Creating word frequency lists is an easy task in most programming languages but how easy it is exactly? And what are the performance trade-offs? We played around with our favorite programming languages and got surprising results. The experiment is still going, you can participate too. And please do.
subscribe via RSS