I don't think they are generations, but rather samples from The Pile that are semantically close to the input.
Actually, as far as I can tell, the RETRO architecture itself isn't trained in this article. It focuses more on how to build the retrieval system: a fast KNN index over all of The Pile.
This is great for speed, and maybe the window size could also be increased since the model is so small, but what about the quality of the generated text? Does quality drop with a 20x smaller model?
How many chunks do you retrieve? The paper shows best results at k=1 and then at k>50.
I saw the graph. Bits-per-byte isn't clear to me (for example, lower loss doesn't always mean better accuracy when you train a classifier). What I need to know is whether the generated text has the same "literary" qualities, and whether it can compete with GPT-3 in few-shot mode on as many tasks.
In other words, can we rely on it instead of GPT-3 in realistic scenarios, or is it only good at achieving low BpB?
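For what it's worth, bits-per-byte is just the model's cross-entropy loss re-normalized so that models with different tokenizers are comparable: convert the average per-token loss from nats to bits, then divide by the average number of UTF-8 bytes per token. A minimal sketch (the specific numbers below are made up for illustration, not taken from the paper):

```python
import math

def bits_per_byte(loss_nats_per_token: float, bytes_per_token: float) -> float:
    """Convert average cross-entropy (nats/token) to bits per UTF-8 byte."""
    bits_per_token = loss_nats_per_token / math.log(2)  # nats -> bits
    return bits_per_token / bytes_per_token

# Hypothetical numbers: 2.0 nats/token with a tokenizer averaging 4 bytes/token
print(round(bits_per_byte(2.0, 4.0), 4))  # ≈ 0.7213
```

So it measures compression of raw text, which is exactly why it doesn't directly answer questions about "literary" quality or few-shot task performance.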
I imagine that GPT-3's ability to complete prompts it hasn't seen will make it quite a bit more versatile than using the Pile. It's the interpolation between the data points that makes deep learning so powerful.
This obviously still has plenty of use cases: you could pull a batch of similar text to calculate statistics on, or perhaps augment existing data for training something else.
One realistic use case would be to simply present a search engine interface enabling you to find "interesting" text snippets alongside metadata like book author/title matching your description, perhaps for fiction enthusiasts or what have you.
I guess an interesting way to translate this technique to text-to-image would be to retrieve an image from a database that matches the text query (via CLIP) and feed that plus noise into a diffusion model that only does a few denoising iterations (and maybe no CLIP guidance). That would be a lot faster than from-scratch diffusion.
(Another way could be to redo the architecture to include an “inspired by this image” input, which is queried from an image server at inference time.) Anyone have other ideas?
This was one of the motivations for `clip-retrieval`, a faiss index over the CLIP embeddings (CLIP ViT-L/14 to be precise) for all the captions/images in the LAION5B-Aesthetic dataset.
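The core of such an index is just nearest-neighbor search in the shared CLIP embedding space: embed the text query, compare it against the precomputed image embeddings, and return the top hits. Here's a minimal numpy sketch of the lookup; at LAION scale, faiss replaces this brute-force dot product with an approximate index, and the embeddings below are random stand-ins rather than real CLIP outputs:

```python
import numpy as np

def top_k(query_emb: np.ndarray, image_embs: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k most similar rows by cosine similarity."""
    q = query_emb / np.linalg.norm(query_emb)
    ims = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = ims @ q                 # cosine similarity per image
    return np.argsort(-sims)[:k]   # best-first

# Stand-in data: 1000 fake 768-d "CLIP" embeddings and a query near row 42
rng = np.random.default_rng(0)
image_embs = rng.standard_normal((1000, 768)).astype(np.float32)
query = image_embs[42] + 0.01 * rng.standard_normal(768).astype(np.float32)
print(top_k(query, image_embs)[0])  # -> 42, the near-duplicate
```

The search itself has no semantics; everything interesting happens in the embedding model, which is why literal-description queries can miss images the model never associated with that phrasing.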
I tried "a man with shopping bags stopping a tank" and was hoping to get the Tiananmen Tank Man, but I'm having no luck with variations either.
EDIT: it does contain a blurry picture of the tank man and some LEGO re-enactments when I query "tiananmen tank man", but I was hoping it would more intelligently deduce the picture from the description.
"Please don't complain about tangential annoyances—things like article or website formats, name collisions, or back-button breakage. They're too common to be interesting."
OK, I didn't know this was against the rules; I see it on here often. Small quibble: difficulty actually reading the submission doesn't seem completely tangential.
On macOS (display: 15.4-inch, 2880 × 1800), it's really difficult to read. I set the font to `400 1.2rem/1.5 "Fira Sans", sans-serif` and the color to #111 in dev tools, which is way better for readability.
(Sidenote: is Fira Sans a font installed by default on Linux systems? I'm on macOS and don't have it, and I don't see a font embed anywhere in your source code. That might be the issue: falling back to 'sans-serif' at weight 200 is way too faint.)