Hacker News

Great work and love the detailed breakdown. This is kind of tangential, but it reminded me of this work: https://arxiv.org/pdf/2310.12973 (Frozen Transformers in Language Models are Effective Visual Encoder Layers).

The paper puts forward an interesting hypothesis: these LLM-derived transformer layers can "refine" any set of learned tokens, even ones from a different modality. I wonder if what you're seeing here is related?
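To make the paper's setup concrete, here's a toy sketch of the core idea: a frozen transformer block (standing in for a pretrained LLM layer) sandwiched between two small trainable linear projections that map visual tokens into and out of the LLM's hidden size. All names, dimensions, and the random weights are illustrative, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class FrozenTransformerBlock:
    """Single-head self-attention + MLP; weights are fixed ("frozen").
    Random weights here stand in for a real pretrained LLM layer."""
    def __init__(self, d):
        s = 1.0 / np.sqrt(d)
        self.Wq, self.Wk, self.Wv, self.Wo = (rng.normal(0, s, (d, d)) for _ in range(4))
        self.W1 = rng.normal(0, s, (d, 4 * d))
        self.W2 = rng.normal(0, s, (4 * d, d))

    def __call__(self, x):                       # x: (n_tokens, d)
        q, k, v = x @ self.Wq, x @ self.Wk, x @ self.Wv
        attn = softmax(q @ k.T / np.sqrt(x.shape[-1]))
        x = x + attn @ v @ self.Wo               # residual around attention
        return x + np.maximum(x @ self.W1, 0) @ self.W2  # residual around MLP

d_visual, d_llm, n_tokens = 32, 64, 16
W_in  = rng.normal(0, 0.02, (d_visual, d_llm))   # trainable in the paper's setup
W_out = rng.normal(0, 0.02, (d_llm, d_visual))   # trainable in the paper's setup
frozen = FrozenTransformerBlock(d_llm)           # kept frozen during training

visual_tokens = rng.normal(size=(n_tokens, d_visual))
refined = frozen(visual_tokens @ W_in) @ W_out
print(refined.shape)
```

Only `W_in` and `W_out` would receive gradients during training; the block itself is never updated, which is what makes the "refinement" claim surprising.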



