And plain old overlapped I/O can lock pages in memory. With direct I/O, devices can DMA straight into these buffers, so in theory any device could provide similar functionality and use completion ports for notification.
(or polling, since the potentially high "interrupt rate" of completion notifications is one of the bottlenecks that motivated RIO)
Mapping pages is relatively expensive. The cost isn't the mapping itself but its side effect: after such an update the CPU has to flush at least part of the TLB. With overlapped I/O, the kernel has to do that for every I/O request.
With RIO, and now io_uring, the kernel maps buffers into both the kernel and user address spaces just once, at initial setup, and then reuses the same buffers for many I/O operations.