Attention has become ubiquitous in sequence learning tasks such as machine translation. We most often have to deal with variable length sequences but we require each sequence in the same batch (or the same dataset) to be equal in length if we want to represent them as a single tensor. Padding shorter sentences to the same length as the longest one in the batch is the most common solution for this problem.
I started a word counting challenge a few months ago and it received a lot more interest than I had expected:
Creating word frequency lists is an easy task in most programming languages but how easy it is exactly? And what are the performance trade-offs? We played around with our favorite programming languages and got surprising results. The experiment is still going, you can participate too. And please do.
subscribe via RSS