NASA Uses RISC-V Vector Spec to Soup Up Space Computers (eetimes.com)
147 points by JoachimS on Nov 22, 2022 | 38 comments


NASA probably stands to save a lot of duplicated development time if they could use a single ISA for everything from their tiny MCUs to large HPC chips. It would also help with vendor dependency. If a vendor doesn't work out, just move to another RISC-V vendor.

Why not ARM then? Because NASA can use their own implementations in FPGA without paying a bunch of extra money for the privilege.


>If a vendor doesn't work out, just move to another RISC-V vendor.

It doesn't really work that way. NASA needs two main things that most vendors can't supply: a radiation-hardened chip and extremely strict quality control. An FPGA implementation is much slower than an ASIC, but ESA did it with the LEON3 processor.

Radiation hardening can come in a variety of forms, from redundant designs (3-core voting, EDAC) to manufacturing processes (silicon-on-insulator, gate structures) and even packaging (high-density ceramic). These have very little overlap with commercial applications, so vendor choice is still very limited.
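Since EDAC comes up here: on memory it usually means an error-correcting code. A minimal sketch, assuming an extended Hamming(8,4) SEC-DED code over 4-bit values for readability (real rad-hard memory controllers typically protect wider words); `encode` and `decode` are hypothetical names, not any vendor's API.

```python
from typing import List, Optional, Tuple


def encode(nibble: int) -> List[int]:
    """Encode 4 data bits as an 8-bit extended Hamming (SEC-DED) codeword.

    Index 0 holds the overall parity bit; indices 1..7 use the classic
    Hamming(7,4) layout: p1 p2 d0 p4 d1 d2 d3.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]  # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # covers positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # covers positions 4, 5, 6, 7
    code = [0, p1, p2, d[0], p4, d[1], d[2], d[3]]
    code[0] = sum(code) % 2  # make total parity even so double errors are detectable
    return code


def decode(code: List[int]) -> Tuple[Optional[int], str]:
    """Return (data, status); status is "ok", "corrected", or "uncorrectable"."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # a single flipped bit makes this equal to its position
    parity_odd = sum(code) % 2 == 1
    fixed = list(code)
    if syndrome == 0 and not parity_odd:
        status = "ok"
    elif parity_odd:
        fixed[syndrome] ^= 1  # syndrome 0 here means the parity bit itself flipped
        status = "corrected"
    else:
        return None, "uncorrectable"  # two flips: detected, but not fixable
    data = fixed[3] | (fixed[5] << 1) | (fixed[6] << 2) | (fixed[7] << 3)
    return data, status
```

A single flipped bit is repaired transparently; two flipped bits are detected but not repaired. In practice memory is also scrubbed periodically (read, corrected, written back) so single upsets don't pile up into uncorrectable double errors.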


It seems like it also wouldn't be enough to just declare the FPGA itself hardened. The design might need to be certified for a specific setup, since even small design changes could move logic around significantly and change its ability to recover from an SEU.


I dunno, it's a bit fuzzy; I don't recall general layout issues when it came to rad stuff. If you are in an environment where you are that worried about an upset, you're likely using a rad-tolerant FPGA. The parent comment mentions 3-core voting, which is a good callout. IME a lot (most?) rad-hard FPGAs have a sort of voting logic baked into the flip-flops; it's sort of transparent to the synthesizer. They are usually immune to upsets up to a certain particle energy. Transients are a different story, and RAM and clock resources also require some consideration.
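A rough software model of that flip-flop voting, assuming each register bit is simply triplicated and fed through a bitwise 2-of-3 majority voter. `TMRRegister` is hypothetical; on a rad-hard FPGA this lives in the fabric and, as noted above, is mostly invisible to the synthesizer.

```python
from typing import List


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 vote: each output bit follows at least two of the inputs."""
    return (a & b) | (a & c) | (b & c)


class TMRRegister:
    """Hypothetical model of a register whose every flip-flop is triplicated."""

    def __init__(self, value: int = 0, width: int = 32) -> None:
        self.mask = (1 << width) - 1
        self.copies: List[int] = [value & self.mask] * 3

    def upset(self, which: int, bit: int) -> None:
        """Simulate a single-event upset flipping one bit in one copy."""
        self.copies[which] ^= 1 << bit

    def read(self) -> int:
        voted = majority(*self.copies)
        # Write the voted value back to all copies so an upset doesn't linger
        # long enough to pair up with a second upset on the same bit.
        self.copies = [voted] * 3
        return voted
```

Writing the voted value back is the design choice that matters: without it, an upset in one copy could sit there until a second upset hits the same bit in another copy, and then the vote goes the wrong way.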


> Why not ARM then? Because NASA can use their own implementations in FPGA without paying a bunch of extra money for the privilege.

It would not surprise me if ARM were to offer NASA such a license rather cheaply, possibly even for free. The marketing/PR/etc benefits of "NASA uses our technology" likely outweigh any potential revenue – especially if the choice comes down to "NASA uses our technology for free" versus "NASA uses a competing technology instead".


There's a lot of money in the space industry. You can't just drop a commercial core into an FPGA that's destined for space and not expect any issues. Even if ARM offered a free perpetual license, significant work would still be needed to apply reliability and radiation-tolerance features to the design. Plus, NASA would want a wide range of features available for cores. A small FPGA might not have enough resources for things like floating-point operations, but a large one would. NASA would need access to multiple architectures to ensure they can fit the core to the mission.

Also, NASA doesn't do a lot of their work themselves. They buy cores from vendors, or buy boxes from vendors that have cores made in-house or by third-party vendors. Would all of these vendors have the same open access to the ARM license? Instead, NASA can just say "we want to use RISC-V for everything", and all of the vendors can save some money by not having to fold a bunch of license costs into the final price. NASA could even come out with their own RISC-V cores that vendors can easily drop into their FPGAs, giving NASA the peace of mind that every system they get will have a common architecture.

There is also the benefit of commonality in the interconnects; it's a pain in the ass having to add bridges to connect older cores to newer cores. If NASA says they want to use modern AXI4 for everything, vendors may be more inclined to update their cores to be more compatible.

Even though the space industry generates a lot of money, it's nothing compared to the revenue that ARM gets from consumer electronics. NASA can contribute to RISC-V development in a way that far exceeds what kind of influence they would have with ARM.

Overall, NASA benefits from a common architecture for a variety of reasons along with the benefit of not paying the ARM license multiple times for some missions.


I doubt ARM needs any extra PR, having captured the entirety of the mobile market.


Apparently they're not in a great place: first the SoftBank acquisition, then the Nvidia deal falling through, and now trying to pump up the valuation for an IPO.

I'd agree that NASA using ARM would probably be neutral for them, but NASA using SiFive is decidedly not neutral for their valuation. ARM has sort of been banking on the talking point that RISC-V is not mature enough for common use yet. A core being mature enough for NASA sends the opposite message pretty clearly.


I'm not sure how that's possible, especially with something like Tesla having a $500B valuation. If Tesla disappears, by and large nothing changes. If ARM goes under, the entire modern world screeches to a halt within a year.

Though it is surprising that they're even still a private company at this point.


Their valuation is somewhere between $25B and $35B. https://www.bloomberg.com/news/articles/2022-02-09/arm-s-ipo...

They won't go under, but apparently SoftBank will lose money on the whole thing if it's at the low end of that range.

And yeah, valuations can be about investor hype more than any basis in reality. And Tesla has almost as much hype as NFTs did a year ago.


ARM is much more than just smartphone chips, and their embedded market share is under serious threat from RISC-V.


They're as much under threat from RISC-V as the Nvidia hegemony is from Intel Arc. Hardly any at all.


I would think most, if not all, ARM use by NASA would be through their suppliers. Would they get a special provision in their ARM license? Would they bother doing the administrative work needed to report “we created x million CPUs, of which y thousand went to NASA”?


NASA has a general directive to support multiple platforms when possible. When I was there, we had to support multiple OS targets (Green Hills, RTEMS, VxWorks, if I recall correctly). I don’t think the agency will ever settle on a single architecture, let alone a single vendor (of course), for all microprocessors. But I could certainly be wrong, and I hope I am in this case.


I should note, this was more about vendors than platforms.


>Why not ARM then?

Putting aside other advantages (open ISA, free license, ecosystem, no vendor lock-in), there are technical reasons.

RISC-V is simpler. This matters a lot when you already have the complexity of radiation tolerance to put up with. Simplicity also lends itself to high assurance, through verification and formal proofs.

In practice, we see how SiFive and Andes offerings beat ARM's own in performance and power efficiency, with a major area advantage (sometimes less than half!).


NASA isn't exactly a high-volume customer. I'm pretty sure that the royalties for using ARM aren't going to materially change the cost of materials for their rockets.


I'm not sure if this still tracks, but a while ago there were only a handful of "space certified" CPUs available. Most of them were RISC-based when I dug into this before.


Soup up originated in the U.S. in the late 19th century, though it wasn't widely used until the 20th century. Its exact origins are unknown, but it could be short for supercharge, or it might come from a horse-racing slang term for injecting horses with narcotics meant to make them run faster.

Interesting. I never saw it spelled like that before. Looks like no one knows how it should be spelled, so all variations are fine.


> I never saw it spelled like that before. Looks like no one knows how it should be spelled, so all variations are fine.

Prior to your comment, I was unaware that there was any other way to spell it.

I very much get the impression that "soup up" is the more common spelling, and (by some measures at least) the more standard. The online version of Merriam-Webster knows of the "soup up" spelling only.

https://www.merriam-webster.com/dictionary/soup%20up


Fascinating how performance-enhancing drug slang shifted from "soup" to "juice" and most recently "gear".


I was once told that the term originated in the south, among track racers, back in the early days of cars. Apparently there was some kind of engine modification that involved the use of a soup can. If you had modded your car for racing, you had literally “souped it up”.

Wish I could remember the details; was very young when I was told this.


That's really interesting. I knew the phrase was related to cars, but not knowing its commonly accepted spelling, I assumed "supe'd up" meant a supercharger upgrade. Curious to see what this can modification is like.


Your comment entirely stole the show. Forgot why I even opened the comment section.


Probably useful to remember this post was written by SiFive's VP of AppEngineering.


All the discussion here seems focused on mitigation of upsets. What about total dose? Clever architectures and error correction don't mitigate that. Effective shielding is impractically heavy. You really need total dose hardness at the process level and that severely limits your choices and performance.


Are these hardened CPUs? Do they run two in parallel executing the same code, like previous CPUs?


I think they are working with Microchip on the rad-hardened CPUs. Microchip acquired Microsemi, which used to make these kinds of processors.

https://www.nasa.gov/press-release/nasa-awards-next-generati...


I thought you needed three, and you go with whichever two match?


No, it's not about finding a quorum; it's about checking for bit flips generally. You just need dual-redundant CPUs in lockstep.

"The FFB is at the heart of the Ingenuity helicopter’s avionics package. Two TI TMS570 Hercules microcontroller SoCs, originally aimed at automotive applications, operate as Ingenuity’s low-level flight controller (FC). Each TI microcontroller incorporates a dual-core, ARM Cortex-R5F processor operating in lockstep. The FFB also incorporates ECC-protected Flash memory and RAM. The dual-redundant lockstep processors and ECC memory provide some protection against radiation-induced soft errors such as SEUs (single event upsets)." https://www.eejournal.com/article/an-fpga-flies-on-mars/
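To make the detection-only nature of lockstep concrete, here is a toy sketch, assuming each "core" is just a step function over an integer state and the comparator checks state after every cycle (real lockstep parts compare outputs in hardware). `run_lockstep`, `toy_step`, and `inject_at` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def run_lockstep(step: Callable[[int, int], int], inputs: List[int],
                 inject_at: Optional[int] = None) -> Tuple[List[int], Optional[int]]:
    """Run two copies of `step` on identical inputs and compare every cycle.

    Returns (released_outputs, fault_cycle); fault_cycle is None if the cores agreed.
    `inject_at` is a hypothetical knob that flips one state bit in core B.
    """
    state_a = state_b = 0
    released: List[int] = []
    for cycle, x in enumerate(inputs):
        state_a = step(state_a, x)
        state_b = step(state_b, x)
        if cycle == inject_at:
            state_b ^= 1  # simulated single-event upset in core B
        if state_a != state_b:
            # The comparator knows *that* something went wrong, not *which* core
            # is right, so the only safe move is to stop and reset both.
            return released, cycle
        released.append(state_a)  # only compared results leave the core pair
    return released, None


def toy_step(state: int, x: int) -> int:
    """Stand-in for 'execute one instruction': a 16-bit running hash."""
    return (state * 31 + x) & 0xFFFF
```

When the cores disagree, the sketch stops and reports the cycle; it has no way to pick a winner, which is why the answer to a mismatch is a reset rather than a vote.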


To be clear, this setup is different from just two CPU cores in lockstep with a reset in case a fault is detected. It's also different from triple redundancy/voting, where there is no reset but continuous operation.

This is two cores times two, managed by an FPGA. If one SOC (one pair of cores running in lockstep) detects a fault, the flight controller is hot-swapped to the other SOC (also one pair of cores running in lockstep) while the first SOC is rebooted.
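A toy model of that two-by-two arrangement, assuming the supervising FPGA works in discrete ticks and a reboot takes a fixed number of them. `LockstepSoC`, `Supervisor`, and `reboot_ticks` are hypothetical names; this is not the actual Ingenuity FPGA logic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LockstepSoC:
    """One SoC = one pair of cores running in lockstep, modelled as a single unit."""
    name: str
    healthy: bool = True
    reboot_left: int = 0  # supervisor ticks until a reboot finishes


class Supervisor:
    """Hypothetical model of the FPGA that picks which SoC acts as flight controller."""

    def __init__(self, reboot_ticks: int = 2) -> None:
        self.socs: List[LockstepSoC] = [LockstepSoC("A"), LockstepSoC("B")]
        self.active = 0
        self.reboot_ticks = reboot_ticks

    def tick(self, fault: bool = False) -> Optional[str]:
        """Advance one control period; `fault` means the active pair saw a mismatch.

        Returns the name of the SoC now flying, or None if neither is available.
        """
        # 1. Rebooting SoCs make progress and come back healthy when done.
        for soc in self.socs:
            if not soc.healthy:
                soc.reboot_left -= 1
                if soc.reboot_left <= 0:
                    soc.healthy = True
        # 2. A mismatch takes the active SoC offline and starts its reboot.
        if fault:
            failed = self.socs[self.active]
            failed.healthy = False
            failed.reboot_left = self.reboot_ticks
        # 3. Hot-swap to the standby SoC if the active one is down and the standby is up.
        standby = 1 - self.active
        if not self.socs[self.active].healthy and self.socs[standby].healthy:
            self.active = standby
        current = self.socs[self.active]
        return current.name if current.healthy else None
```

One choice worth noting: once the rebooted SoC comes back, this sketch keeps flying on the standby instead of swapping back, since every swap is a chance for something to go wrong. If both SoCs are down at once it returns None, which a real system would have to map to some safe behavior.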


I still don't understand how you can check for bit-flips without a quorum? What do you do when one CPU says there was a bit flip and the other one doesn't?


> What do you do when one CPU says there was a bit flip and the other one doesn't?

Reset and erase processor internal state that was corrupted. Startup times are typically one of the metrics controlled for in these systems.


Indeed, Ingenuity can do it in under 10 ms IIRC, which is imperceptible (to us), and it can do it mid-flight too.


Dual redundancy gets you error detection without correction, whereas triple redundancy with majority voting can get you error detection with correction when one copy fails. Dual-redundant detection works as long as the two copies don't fail identically: if both fail in the same way you won't be able to detect it, but you basically accept the risk that it's an improbable event. The "correction" in the dual-redundant case is typically a reset into some known state rather than trying to correct and continue operation.


I also thought this. If you've just got two, how would you know which of them is the correct path?


For some applications, not necessary. You can simply detect the error, abort the operation, and retry it from the beginning. If the difference was due to a transient bit flip (due to radiation/etc), odds are high it will work the next time. Obviously that costs some recovery time, but if the application isn't overly time-critical, you can live with it. You also need to make sure your code can handle being interrupted and restarted at any point, even if it was half-way through mutating some data structure.
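A minimal sketch of that detect/abort/retry pattern, assuming the two redundant executions are modelled as two calls to the same function and that state is a plain dict. `run_checked`, `bump`, and the `upset` hook are hypothetical. The point is the last sentence of the comment: every attempt works on a private copy, so an abort halfway through a mutation never leaves the real state half-written.

```python
import copy
from typing import Callable, Optional


def run_checked(update: Callable[[dict], dict], state: dict, max_attempts: int = 3,
                upset: Optional[Callable[[int], bool]] = None) -> dict:
    """Compute `update(state)` twice, compare, and retry from scratch on mismatch.

    `upset` is a hypothetical fault-injection hook: given the attempt number,
    it says whether to corrupt the second result.
    """
    for attempt in range(max_attempts):
        # Each attempt starts from a private copy, so an aborted attempt can
        # never leave `state` half-mutated.
        first = update(copy.deepcopy(state))
        second = update(copy.deepcopy(state))
        if upset is not None and upset(attempt):
            second["__flipped__"] = True  # simulated transient error
        if first == second:
            return first  # commit: the caller swaps in the new state in one step
    raise RuntimeError("results never agreed; escalate (e.g. reset to a known state)")


def bump(s: dict) -> dict:
    """Example update: increment a counter in place on the copy it is given."""
    s["count"] += 1
    return s
```

Usage is `state = run_checked(bump, state)`: rebinding `state` only after the results agree is the "commit". On real hardware that step would need to be atomic too, e.g. a pointer swap or a double-buffered record with a valid flag (an assumption about how one might do it, not a description of any specific flight system).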


Fabricating and successfully validating these things must be an absolute nightmare.



