The truth of it is that it’s just not possible (with currently existing technology/ML architectures) to create a truly autonomous taxi without HD maps. Everyone in the robotaxi industry knows this - even Tesla builds HD maps, they just don’t call them that.
My knowledge only comes from Karpathy's talks about this (which are great, worth watching if you haven't seen them).
I found his and Tesla's arguments convincing for the general case. That doesn't mean that the narrow cases aren't super cool or valuable (I signed up for this Cruise thing in SF).
I just think that if the software is unable to make decisions based on visual data alone without up to date high resolution maps it'll never achieve true FSD in the general case (not geo locked). You'll end up trapped in a local max otherwise because there are just too many conditions in the real world that vary (and the world is too large to economically map fast enough for that approach). You have to solve the vision problem.
I don't know enough to comment on the approach differences beyond that, but my understanding was that Tesla did not rely on the same stuff that Waymo and Cruise require (largely Lidar and these high resolution maps).
> I just think that if the software is unable to make decisions based on visual data alone without up to date high resolution maps it'll never achieve true FSD in the general case (not geo locked). You'll end up trapped in a local max otherwise because there are just too many conditions in the real world that vary.
My contention is that there’s no way to actually solve for the general case with currently existing technology. The amount of novelty in the real world is too great for any system to account for it without disambiguating via HD maps or remote support.
>You have to solve the vision problem.
This isn’t a vision problem specifically - even if you had LIDAR and high resolution imaging radar and 8 A100s on every Tesla, “true generalized self driving” wouldn’t be achievable without HD maps with our current understanding of Machine Learning.
>My understanding was that Tesla did not rely on the same stuff that Waymo and Cruise require.
Tesla maps individual traffic light elements, stop signs, and lane markings, but will attempt to drive even if the area isn’t mapped.
Disparities in FSD performance between areas are largely attributable to some areas being better mapped than others - the mapping data has a huge effect on its performance. There are key elements of the driving task (including recognizing and reacting to every single type of sign other than a stop sign) that FSD can’t do on its own and relies entirely on maps for.
Novelty isn’t nearly as big of a problem as you might think. One of Waymo’s famous videos showed someone on an electric scooter chasing a duck in the middle of the street. That’s very odd behavior, but the car followed the rather simple option of just not hitting them and going forward when possible.
Cars really don’t need to identify what something is, just its location and movement, which is a vastly easier problem. A trash can rolling down the street can be treated just like an oil drum doing the same thing, etc.
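To make the "location and movement only" idea concrete, here's a rough sketch. It is purely illustrative and not how Waymo or anyone else actually does it: the `Track` type, the constant-velocity assumption, the 5 second horizon, and the 2 m safety radius are all made up for the example. Note that the tracked object has no class label at all, so a trash can and an oil drum on the same trajectory get the same answer.

```python
import math
from dataclasses import dataclass


@dataclass
class Track:
    """Hypothetical tracked object: position (m) and velocity (m/s). No class label."""
    x: float
    y: float
    vx: float
    vy: float


def min_separation(ego: Track, other: Track, horizon_s: float = 5.0) -> float:
    """Smallest ego-to-object distance within the horizon, assuming constant velocity."""
    # Relative position and velocity of the object as seen from the car.
    rx, ry = other.x - ego.x, other.y - ego.y
    vx, vy = other.vx - ego.vx, other.vy - ego.vy
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0.0:
        t = 0.0  # no relative motion, so the distance never changes
    else:
        # Time of closest approach, clamped to [0, horizon_s].
        t = max(0.0, min(horizon_s, -(rx * vx + ry * vy) / speed_sq))
    return math.hypot(rx + vx * t, ry + vy * t)


def should_yield(ego: Track, other: Track, safety_radius_m: float = 2.0) -> bool:
    """Yield if the object is predicted to come within the (assumed) safety radius."""
    return min_separation(ego, other) < safety_radius_m
```

With the car at the origin doing 10 m/s, something rolling across the lane from `Track(30, -3, 0, 1)` triggers a yield, while something parked 10 m off to the side at `Track(30, 10, 0, 0)` does not. A real planner would obviously need uncertainty, object extent, and non-constant motion.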
> Cars really don’t need to identify what something is, just its location and movement, which is a vastly easier problem. A trash can rolling down the street can be treated just like an oil drum doing the same thing, etc.
You’d think that, until you encounter something like a turn restriction sign with a bizarre conditional restriction that it’s never seen before. At which point the car needs to OCR the text, parse the semantic meaning, and apply it to the scene.
Right by my house I have a four-lane (on one side) intersection with a traffic signal. Each of the lanes goes straight ahead. However, each lane has its own traffic light, and when the rotation reaches that direction, it alternates: the two leftmost straight lanes get red while the two rightmost get green, and then they switch (because very shortly after the intersection there is a quick lane reduction to two lanes).
I can't imagine how AI would _correctly_ see four straight arrowed lights in front of it in the intersection, some of which are red, some are green. Humans of course recognize that they correlate to the lanes, but this is a more esoteric case for AI to assimilate.
And now we’re already making concessions about the car’s abilities.
There are 10 MPH speed limit signs on Market Street in SF that specify in incredibly small text “when behind trolleys”. Assuming we take your approach, the car will just always go down Market at 10 MPH.
Imagine if it’s a negative turn restriction - i.e., it’s permitting turns except during certain hours and conditions. Now the car is treating it as always permitted and turning into traffic. An edge case, but something it’s going to encounter in the real world.
And now you’re moving the goalposts. We are talking about extreme edge cases in some random small town, not common signs in a major city. They can always get updates on what some random sign in some random location means. As long as they’re safe and don’t block traffic, that’s all that’s needed.
Also, negative restrictions can again default to full restrictions. Permitting a car to, say, park in a snow lane doesn’t require a car to park in the snow lane.
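A minimal sketch of what "default to full restrictions" could look like, assuming a hypothetical `TurnRule` that holds the hours during which a turn is prohibited. None of these names come from a real system. The point is only that a sign the car saw but could not parse is treated as "never allowed" rather than "always allowed".

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional


@dataclass
class TurnRule:
    """Hypothetical parsed form of a sign: turns are prohibited during this window."""
    prohibited_from: time
    prohibited_until: time


def turn_allowed(rule: Optional[TurnRule], now: time, sign_seen: bool) -> bool:
    """Return True only when the turn is known to be permitted right now."""
    if not sign_seen:
        return True  # no sign at all, so no restriction applies
    if rule is None:
        return False  # sign seen but not understood: assume fully restricted
    start, end = rule.prohibited_from, rule.prohibited_until
    if start <= end:
        prohibited = start <= now < end
    else:
        # Window wraps past midnight, e.g. 22:00 to 06:00.
        prohibited = now >= start or now < end
    return not prohibited
```

So `turn_allowed(None, time(12), sign_seen=True)` is `False` even at noon. The cost of this default is the one raised about the Market Street sign: the car is sometimes more restricted than it has to be.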
I don’t think I’m moving the goalposts - we were discussing whether autonomous driving (which I take to mean L4-L5 driving without the need for a human in the loop) is possible without geofences or HD maps. “Edge cases in some random small town” are exactly the sort of thing you need to worry about without a geofence.
Not to mention these sorts of edge cases are way more common in large cities than small towns - one of the examples I gave was down a central avenue in San Francisco.
>They can always get updates on what some random sign in some random location means. As long as they’re safe and don’t block traffic, that’s all that’s needed.
What if it truly fails to parse the sign accurately and does something illegal or dangerous? What does sending an update out look like? Does a human take a look at a crop of the sign and review it? Why not just map it in that case?
It’s not a question of parsing a known sign; even extremely complex rules can be encoded. Further, that process can take place from a photo of the sign uploaded by the car, which is then encoded into the rules. The general case is stopping and having a remote driver slowly tell the car what to do.
An unknown sign in a place without cellphone reception is about the only case where it really needs to just figure it out on its own rather than simply avoid causing an accident.
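Roughly, the fallback order described here could be sketched like this. The `Action` names, the `rulebook` dict, and the `has_connectivity` flag are all invented for illustration and don't reflect any real vehicle stack:

```python
from enum import Enum


class Action(Enum):
    APPLY_ENCODED_RULE = "apply the rule already encoded for this sign"
    ASK_REMOTE_OPERATOR = "stop safely, upload a photo, wait for a remote driver"
    DECIDE_LOCALLY = "no reception: decide on its own and avoid collisions"


def handle_sign(sign_id: str, rulebook: dict, has_connectivity: bool) -> Action:
    """Pick a fallback for a sign, in the order described above (hypothetical)."""
    if sign_id in rulebook:
        return Action.APPLY_ENCODED_RULE  # known sign, however complex its rule
    if has_connectivity:
        return Action.ASK_REMOTE_OPERATOR  # unknown sign, but help is reachable
    return Action.DECIDE_LOCALLY  # unknown sign and no cellphone reception
```

Only the last branch needs the car to work things out by itself, which is the narrow case being argued about.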
> What if it truly fails to parse a sign accurately and does something illegal or dangerous?
Not much, people regularly disobey traffic signs especially ones with complex instructions. Don’t hit stuff or jump in front of another car is generally enough.
> Further, that process can take place from a photo of the sign uploaded by the car, which is then encoded into the rules. The general case is stopping and having a remote driver slowly tell the car what to do.
So you’re now agreeing that you need some level of remote support to handle edge cases like this?
>An unknown sign in a place without cellphone reception is about the only case where it really needs to just figure it out on its own rather than simply avoid causing an accident.
Yes, and again, this is the sort of thing you actually need to worry about when trying to come up with generalized self driving solution.
> Not much, people regularly disobey traffic signs especially ones with complex instructions. Don’t hit stuff or jump in front of another car is generally enough.
What if it misinterprets a one way sign at night when there’s no other signal that it’s turning on to a one way lane and it suddenly finds itself traveling opposite the direction of traffic for a long period before encountering another car? You have to consider all of these edge cases when talking about a generalized solution.
Maybe you still disagree with me in spirit, but do you see how, when we really look at edge cases, you have to fall back to some level of remote operation or mapping?
> So you’re now agreeing that you need some level of remote support to handle edge cases like this?
As a bootstrap step, yes. After that, no, just regular updates for new traffic rules and such. You can’t make a purely offline self driving system that doesn’t get updated for 30 years, because laws change. But presumably a non-geofenced self driving car is going to be tested by driving on every road, either directly or via someone’s mapping project.
> What if it misinterprets a one way sign at night when there’s no other signal that it’s turning on to a one way lane and it suddenly finds itself traveling opposite the direction of traffic for a long period before encountering another car? You have to consider all of these edge cases when talking about a generalized solution.
You mean in some location without maps? There are a finite number of roads in the world and they don’t change that quickly. If you’re worried that the AI is going to, say, end up on an ice road that melts, sure, that’s the kind of thing that happens once. But the threshold isn’t perfection, it’s ~30,000 dead people per year in the US. Beat that and you win.