Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> For the purpose of this series of articles I'll be using the O(f(N)) to mean that f(N) is an upper bound (worst case) of the time it takes to accomplish a task accessing N bytes of memory (or, equivalently, N number of equally sized elements).

That's not really valid; it's not how algorithmic analysis works. The author's conclusion for what is happening and why is correct, but I believe he is confused about how to get there.

Simply, when doing complexity analysis on an algorithm, one must always count an operation. It's not okay to point to the time taken for an implementation and say "That's our function." It is a function, but it's a function of time, not a count of how many operations are performed at given sizes of N.

However, he is correct that naive analysis of arrays and linked lists will result in this odd behavior: arrays will tend to outperform lists on real systems. The problem with the naive analysis is in what it counts. For example, on an insert, a naive analysis will count the number of elements accessed in the structure. That's naive because it assume all accesses are the same - which is what he's getting at with the "myth of RAM". Because of the memory hierarchy, they are not all equal.

But the correct response is not to give up counting operations and look at time, the correct response is to find the right thing to count. And the right thing to count is basically going to be last level cache misses - the operations that force one to go to memory. If you do that, then you will find that the operations you are counting will correlate much better to the actual time spent.

In some places, the author gets this mostly correct: "You can also use Big-O to analyze the time it takes to access a piece of memory as a function of the amount of memory you are regularly accessing." That's fine, as you're counting memory accesses.

In other places, it's not correct: "That I use Big O to analyze time and not operations is important." You can't count time, only operations. You want to count the operations that correlate with your actual running time, but the entire point of good analysis is to find those operations. You can't just shortcut it, only measure time, and then call it algorithmic analysis.

The author gets a lot right, but despite the lengthy discussion, I think he still has some confusions about algorithm complexity analysis.

For the record, these lessons should be familiar to anyone who has done serious performance analysis of computer systems, either on their own, or in the context of a course that focused on systems or architecture.



Fwiw theoretical computer scientists do this kind of analysis as well, while the post makes it sound like they don't. The equal-cost model of RAM operations is only one model used in asymptotic algorithm analysis, and there are other cost models with other properties. The introduction of this paper gives a decent concise overview of some of them: http://www.cs.cmu.edu/~rwh/papers/iolambda/short.pdf




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: