Large nnet is all you need - no matter which architecture. I've pitched something similar to @jekbradbury: Just scale up our QRNNs and you'd likely get similar performance to transformers.
@RichardSocher @jekbradbury if only it were that easy
@RichardSocher @jekbradbury throwback
@RichardSocher @jekbradbury This is an interesting opinion and I share the same views on size. But I cannot understand the unreasonable effectiveness of decoder-only models vs. less performant encoder-decoder systems with regard to generation capabilities. Has anyone investigated that? Tips/pointers 😉
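For readers unfamiliar with the distinction behind that question, a minimal sketch of the structural difference may help. This is purely illustrative (no learned weights, just attention masks): a decoder-only model applies one causal mask over the concatenated source-plus-target sequence, while an encoder-decoder model attends bidirectionally over the source and adds cross-attention. The function names and shapes here are assumptions for the sake of the example, not any particular library's API.

```python
import numpy as np

def decoder_only_mask(src_len, tgt_len):
    """Causal mask over the concatenated [src; tgt] sequence.
    Entry (i, j) is True if position i may attend to position j."""
    n = src_len + tgt_len
    return np.tril(np.ones((n, n), dtype=bool))

def encoder_decoder_masks(src_len, tgt_len):
    """Encoder self-attention: fully bidirectional over the source.
    Decoder self-attention: causal over the target.
    Cross-attention: every target position sees the whole source."""
    enc = np.ones((src_len, src_len), dtype=bool)
    dec = np.tril(np.ones((tgt_len, tgt_len), dtype=bool))
    cross = np.ones((tgt_len, src_len), dtype=bool)
    return enc, dec, cross

m = decoder_only_mask(3, 2)
enc, dec, cross = encoder_decoder_masks(3, 2)
# In the decoder-only case the source tokens themselves are causally
# masked: the first source position sees only itself, while the
# encoder-decoder's encoder lets it see the full source.
print(m[0].sum(), enc[0].sum())  # 1 vs 3 visible source positions
```

One concrete hypothesis people raise about the question above: despite this masking asymmetry on the input, decoder-only models generate surprisingly well at scale, which is exactly the puzzle the tweet is asking about.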