
Thanks for the link!

I think that these models have to learn to use their parameters efficiently, and the best way to do that is to 'evolve' (yes, a bad word for it) structures over pretraining time. Unfortunately, they don't have a way to access these structures 'from the inside'. I hope this new approach lets us boost performance in a more experimentally rigorous way.




I think the recurrence is a consequence of using residual connections, which seem to keep the representation consistent across layers.
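A toy sketch of that intuition, not the setup from the linked work: it pushes a random vector through a stack of random, untrained tanh layers, once with a skip connection and once without, and compares the cosine similarity between consecutive layers. The names (`run_layers`, `cosine`), the width, the depth, and the weight scale are all assumptions picked for illustration.

```python
import numpy as np

def run_layers(x, weights, residual):
    """Return the hidden state after each layer, starting with the input."""
    states = [x]
    for W in weights:
        update = np.tanh(W @ states[-1])
        # With a residual connection the layer only adds to the running state;
        # without one, the layer output replaces the state entirely.
        states.append(states[-1] + update if residual else update)
    return states

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
dim, depth = 64, 8  # hypothetical width and depth

# Random untrained weights (an assumption; real models are trained).
weights = [rng.normal(scale=0.5 / np.sqrt(dim), size=(dim, dim)) for _ in range(depth)]
x = rng.normal(size=dim)

res_states = run_layers(x, weights, residual=True)
plain_states = run_layers(x, weights, residual=False)

# Similarity between each layer's state and the next one.
residual_sims = [cosine(a, b) for a, b in zip(res_states, res_states[1:])]
plain_sims = [cosine(a, b) for a, b in zip(plain_states, plain_states[1:])]

print("with residual:   ", np.round(residual_sims, 2))
print("without residual:", np.round(plain_sims, 2))
```

In this toy the residual stack keeps consecutive states closely aligned, while the plain stack's states are close to unrelated from one layer to the next. It says nothing about trained models, only that the skip connection by itself pushes in that direction.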


