I can’t find where the 4% number comes from, but I expect it comes from a microbenchmark that does little more than make futex calls. Reason: if it came from a benchmark that represents real-world use, that workload would have to spend at least 4% of its time in futex calls. If that were the case, somebody would have noticed it long ago and put work into fixing it.
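To put rough numbers on that reasoning, here is a small Amdahl's-law sketch. It assumes the 4% figure means a 4% reduction in total runtime, which is my reading and not something the source states; the function names and the speedup factors are made up for illustration.

```python
import math

def runtime_saved(futex_fraction, futex_speedup):
    """Fraction of total runtime saved when only the futex part gets faster.

    futex_fraction: share of total runtime spent in futex calls (0..1)
    futex_speedup:  how many times faster the futex part becomes (> 1)
    """
    return futex_fraction * (1.0 - 1.0 / futex_speedup)

def min_futex_fraction(target_saving, futex_speedup):
    """Share of runtime that must be futex time to reach target_saving."""
    return target_saving / (1.0 - 1.0 / futex_speedup)

# Hypothetical speedup factors; math.inf models futex calls becoming free.
for speedup in (1.5, 2.0, 10.0, math.inf):
    needed = min_futex_fraction(0.04, speedup)
    print(f"futex {speedup}x faster -> needs {needed:.1%} of runtime in futex")
```

Even if the futex calls became free, the workload would still need 4% of its runtime in them to save 4% overall; for any finite speedup it needs more.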
From the LKML post, it's neither from a real-world case nor from a specially constructed futex benchmark, but from an existing kernel benchmark that someone thought would be interesting to run against the new call.
So no, this won’t give you 4% in general.