It's not an outlier. You will see more than a 26x improvement if you try this on an even deeper LLM with more layers. The deepest model they have applied it to has 30 billion parameters.

Edit: I apologize. The table was cut off on mobile and I didn't see that they sneaked in GPU+CPU offloading for the 25x result.