IMO the under-discussed risk here is that sites will start serving different content to verified crawlers vs real users. You're already seeing it with known search bots getting sanitized views. If your agent's context comes from a crawl the site knows is going to an AI, you have no guarantee it matches what a human sees, and that data quality problem won't surface until your agent starts acting on selectively curated information.
This could go wrong on so many levels.
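A minimal sketch of how you might spot-check for this kind of divergence, assuming you can fetch the same URL twice with different User-Agent headers (the fetch itself is elided here; the sample HTML and function names are illustrative, not from any real site or library):

```python
import hashlib
import re

def content_fingerprint(html: str) -> str:
    """Crudely strip tags, normalize whitespace, and hash the visible text."""
    text = re.sub(r"<[^>]+>", " ", html)        # naive tag removal, fine for a spot-check
    text = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(text.encode()).hexdigest()

def views_diverge(human_html: str, crawler_html: str) -> bool:
    """True if the two responses don't render the same visible text."""
    return content_fingerprint(human_html) != content_fingerprint(crawler_html)

# Hypothetical bodies returned for the same URL under a browser UA
# vs. a declared-crawler UA:
human_view   = "<p>Price: $49 <b>limited offer</b></p>"
crawler_view = "<p>Price: $49</p>"

print(views_diverge(human_view, crawler_view))  # → True
```

Obviously dynamic pages make naive hashing noisy (timestamps, ads, CSRF tokens), so in practice you'd compare something more stable than raw text, but even a crude check like this would flag the sanitized-for-bots case.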