The only reliable way to get things fixed with these large companies is to have a direct point of contact. I have one. You can find my email in my profile; ping me and I will try to get a human in the loop for you.
It reminds me quite a bit of broad-phase collision engines for 2D physics/games. You could probably find some additional clever optimisations for the lookup/overlap step (better than kd-trees) if you dive into those. Not that it matters too much. Very cool.
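For the curious: the usual broad-phase trick in those engines is a uniform grid hash rather than a kd-tree, since it rebuilds in O(n) and overlap queries are cheap. A rough sketch (the cell size and the AABB tuple format are illustrative assumptions, not from any particular engine):

```python
from collections import defaultdict

CELL = 64  # grid cell size; tune to roughly the typical object size


def cells(box):
    """Yield every grid cell an AABB (x0, y0, x1, y1) touches."""
    x0, y0, x1, y1 = box
    for cx in range(int(x0 // CELL), int(x1 // CELL) + 1):
        for cy in range(int(y0 // CELL), int(y1 // CELL) + 1):
            yield (cx, cy)


def overlaps(a, b):
    """Exact AABB-vs-AABB overlap test."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def broad_phase(boxes):
    """Return overlapping index pairs, hashing each box into grid cells
    so only boxes sharing a cell are compared against each other."""
    grid = defaultdict(list)
    pairs = set()
    for i, box in enumerate(boxes):
        for cell in cells(box):
            for j in grid[cell]:
                if overlaps(boxes[j], box):
                    pairs.add((j, i))
            grid[cell].append(i)
    return pairs
```

Performance degrades when object sizes vary wildly (one huge box touches many cells), which is when trees start to win again.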
There used to be a joint online project to compute these tables in a SETI@home-style distributed system. Everyone who contributed their CPU cycles could use the tables. And yeah, around 2005-2008.
I've had to test out various networked filesystems this year for a few use cases (satellite/geo) at multi-petabyte scale. Some of my thoughts:
* JuiceFS - Works well, but for high performance it has limited use where privacy concerns matter; the open-source version is slower. The choice of metadata backend really matters if you are tuning for latency.
* Lustre - Heavily optimised for latency, but gets very expensive if you need more bandwidth, since pricing is tiered and tied to volume sizes. Managed offerings are available pretty much everywhere.
* EFS - Surprisingly good these days, but still insanely expensive. Useful for small amounts of data (a few terabytes).
* FlexFS - An interesting beast. It wins decisively on bandwidth per dollar, but loses slightly on latency-sensitive operations. Great if you have petabyte-scale data and need to process it in parallel, but it struggles with tooling that does many small unbuffered writes.
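On the JuiceFS metadata-backend point: the backend is fixed when the volume is formatted, so latency tuning starts there. A rough sketch with Redis as the metadata engine (the bucket URL, Redis address, and volume name below are placeholders, and this assumes the open-source build):

```shell
# Format a JuiceFS volume: metadata lives in Redis, file data in S3.
# Hostnames, bucket, and names are illustrative only.
juicefs format \
    --storage s3 \
    --bucket https://mybucket.s3.amazonaws.com \
    redis://redis-host:6379/1 \
    myvolume

# Mount it; metadata operations hit Redis, data reads/writes go to S3.
juicefs mount redis://redis-host:6379/1 /mnt/jfs
```

Picking a metadata engine with low round-trip latency to the clients matters more than raw object-store throughput for small-file workloads.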
Did you happen to look into CephFS? CERN (the folks who operate the Large Hadron Collider) uses it to store ~30PB of scientific data, and their analysis cluster serves ~30GB/s of reads.
Sure, so the use case I have requires elastic storage and elastic compute, and CephFS really isn't a good fit in a cloud environment for that; it would get prohibitively expensive.