I sat through a Xeon Phi presentation at university, about how it would revolutionize the university's "supercomputer". I left shortly after; did Phi come to nothing?
It was too difficult to get decent performance from Xeon Phi for general use cases. A few apps could make it work, e.g. PGS bought up all the old stock for a big geophysics system.
Omni-Path went the way InfiniBand is going. Ethernet has caught up and surpassed the speeds, so using proprietary technology with fewer features isn't that attractive anymore.
But not faster than L3 cache bandwidth. Some cards can DMA to L3 cache. Granted, eventually it's flushed to main RAM, so it might not help too much in the end.