Big-O is only strictly meaningful in terms of asymptotic behavior; as soon as you start talking about specific values of N, you have to care about constant factors.
In this case, the constant factors will probably be quite different because the goal is to make better use of CPU cache.
So, in Big-O term, this only trying to optimize only small dataset (4 <= N <= 16).