Well, Claude 3.5 can do translation from one language to another in a fairly com...

Animats · on July 30, 2024

> But, this isn't just about rewriting code from one language to another. It's about reverse engineering complex information out of the code, which may not be immediately visible in it, and then finding a way to make it "safe" according to Rust's type system. Where's the training data for that? It'd be really hard even for skilled humans.

That might not be too bad.

A combination of a formal system and an LLM might work here. Suppose we see a C function

   void somefn(char* buf, int n);

First question: is "buf" a pointer to an array, or a pointer to a single char? That can be answered by looking at what the function does with "buf", and what callers pass to it.

If it's an array, how big is it? We don't have enough info to know that yet. But a reasonable guess, and one that an LLM might make, is that the length of buf is "n".

Following that assumption, it's reasonable to translate this to Rust as

   fn somefn(buf: &[u8])

and, if n is needed within the function, use

   buf.len()
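A minimal sketch of what that translated function could look like. The body is hypothetical, since nothing here says what "somefn" actually does; it returns a wrapping byte sum (the C original returns void) purely so there is something to observe.

```rust
// Hypothetical body: the thread does not say what somefn does.
// The length travels with the slice, so no separate `n` parameter is needed.
fn somefn(buf: &[u8]) -> u32 {
    let n = buf.len(); // replaces the C parameter `int n`
    let mut sum: u32 = 0;
    for i in 0..n {
        // buf[i] is bounds checked, and i < n == buf.len() always holds here.
        sum = sum.wrapping_add(u32::from(buf[i]));
    }
    sum
}
```

A caller that used to pass a pointer and a count now passes only the slice, e.g. `somefn(&data[..])`.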

The next step is to validate that guess. The run-time approach is to write all calls to "somefn" with

   assert!(buf.len() == n);
   somefn(buf, n);
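One way to stage that check during migration is a small shim at each call site that still has the old count in hand. This is only a sketch: `call_somefn_checked` is a made-up name, and the `somefn` body is a stand-in.

```rust
// Stand-in for the translated function; the real body is unknown.
fn somefn(buf: &[u8]) -> usize {
    buf.len()
}

// Hypothetical migration shim: the caller still has the C-style count `n`,
// so we test the guess "n is the length of buf" before dropping `n`.
fn call_somefn_checked(buf: &[u8], n: usize) -> usize {
    assert!(
        buf.len() == n,
        "length guess violated: buf.len() = {}, n = {}",
        buf.len(),
        n
    );
    somefn(buf)
}
```

If the assert never fires under tests or fuzzing, that is evidence (not proof) that the guess holds for that call site.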

Maybe formal methods can prove the assert true, and we can take it out. Or if a SAT solver or a fuzz tester can generate a counterexample, we know that the guess was wrong and this has to be done the hard way, as

   fn somefn(buf: &[u8], n: usize)

implying more subscript checks inside "somefn".
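A sketch of that fallback, assuming "n" turns out to be a count that may differ from the buffer length. The name `somefn_with_n` and the byte-sum body are hypothetical; the point is only that the relationship between "n" and the buffer is now checked inside the function.

```rust
// Hypothetical fallback: `n` is kept as a separate parameter because the
// guess n == buf.len() was refuted. Sums the first n bytes, or returns
// None if n runs past the end of buf.
fn somefn_with_n(buf: &[u8], n: usize) -> Option<u32> {
    // get(..n) is the checked form of buf[..n]; it yields None when n > buf.len().
    let prefix = buf.get(..n)?;
    Some(prefix.iter().map(|&b| u32::from(b)).sum())
}
```

Returning `Option` is a design choice for the sketch; indexing with `buf[..n]` would panic instead, which is closer to the "subscript check" wording.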

The idea is to recognize common C idioms and do clean translations to Rust for them. This should handle a high percentage of cases.

mike_hearn · on July 31, 2024

Yes, this is similar to what IntelliJ does for Java->Kotlin. Do a first pass that's extremely non-idiomatic and mechanical, then do lots of automated refactoring to bring it closer to idiomatic.
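Applied to the C-to-Rust case in this thread, the two stages might look roughly like the sketch below. Both functions are invented for illustration (counting zero bytes); the first mimics what a mechanical pass tends to emit, the second is what a later refactoring could reach once "n is the length of buf" has been established.

```rust
/// First pass: mechanical and non-idiomatic, raw pointer plus C-style count.
/// Safety: `buf` must point to at least `n` readable bytes.
unsafe fn count_zeros_mechanical(buf: *const u8, n: i32) -> i32 {
    let mut count = 0;
    let mut i = 0;
    while i < n {
        // Unchecked pointer arithmetic, exactly as in the C original.
        if unsafe { *buf.offset(i as isize) } == 0 {
            count += 1;
        }
        i += 1;
    }
    count
}

/// After refactoring: the slice carries its own length and no unsafe is needed.
fn count_zeros_idiomatic(buf: &[u8]) -> usize {
    buf.iter().filter(|&&b| b == 0).count()
}
```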

But if you're going to do it that way, the right place to start is probably to a safer form of C++ not Rust. That way code can be ported file-at-a-time or even function-at-a-time, and so you'll have a chance to run the assertions in the context of the original code. Which of course may not have good test coverage, as C codebases often don't, so you'll have to be testing your assertions in production.

Animats · on July 31, 2024

> But if you're going to do it that way, the right place to start is probably to a safer form of C++ not Rust.

There's something to be said for that. You're going to need at least an internal representation that's a safe C/C++.

pjmlp · on July 31, 2024

> Make std::vector[] properly bounds checked

Most compilers do have flags to turn this on, which I use all the time.

The issue is the "performance trumps safety" culture that pushes back against using them.

TinkersW · on July 30, 2024

std::vector [] has had bounds checking since forever if you set the correct compiler flag. Since they aren't using it, this is a choice; presumably they prefer the speed gain.

mike_hearn · on July 30, 2024

You mean _GLIBCXX_DEBUG? It's got some issues. Linux only, it doesn't always work [1] and it's all or nothing. What's really needed is the ability to selectively opt out on a per-instantiation level so very hot paths can keep the needed performance whilst all the rest gets opted into safety checks.
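For comparison, and only as a sketch in Rust rather than C++: the "checked by default, opt out on the hot path" shape looks roughly like this there, though the opt-out is per call site rather than per instantiation. Both function names are made up for illustration; `get` and `get_unchecked` are the standard slice methods.

```rust
/// Default path: every access is bounds checked; out of range yields None.
fn read_checked(v: &[u32], i: usize) -> Option<u32> {
    v.get(i).copied()
}

/// Hot path: one up-front check guards the loop, then per-element checks are skipped.
fn sum_first_n_hot(v: &[u32], n: usize) -> u32 {
    assert!(n <= v.len(), "n exceeds slice length");
    let mut total = 0u32;
    for i in 0..n {
        // SAFETY: i < n <= v.len() by the assert above.
        total = total.wrapping_add(unsafe { *v.get_unchecked(i) });
    }
    total
}
```

The opt-out is visible in the source as an `unsafe` block, so the rest of the code stays on the checked path.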

Microsoft has this:

https://learn.microsoft.com/en-us/cpp/standard-library/safe-...

but it doesn't seem to actually make std::vector[] safe.

It's frustrating that low hanging fruit like this doesn't get harvested.

[1] "although there are precondition checks for some string operations, e.g. operator[], they will not always be run when using the char and wchar_t specializations (std::string and std::wstring)."

TinkersW · on July 31, 2024

With MSVC you can use _CONTAINER_DEBUG_LEVEL=1 to get a fast bounds check that can be used in release builds. Or just use it in development to catch errors.

mike_hearn · on July 31, 2024

Interesting, thanks. It seems the reason I couldn't find anything on that is that it's internal only and not a feature you're actually meant to use?

https://github.com/microsoft/STL/issues/586

> We talked about this at the weekly maintainer meeting and decided that we're not comfortable enough with the (lack of) design of this feature to begin documenting it for wide usage.

pjmlp · on July 31, 2024

What you want should be _ITERATOR_DEBUG_LEVEL instead; that is the public macro for bounds checking configuration.

Calavar · on July 30, 2024

As far as I am aware, the standard doesn't mandate bounds checking for std::vector::operator[] and probably never will for backwards compatibility reasons. Most standard library implementations have opt-out std::vector[] bounds checking in unoptimized builds, but not in optimized builds.

I tried a toy example with GCC [1], Clang [2], and MSVC [3], and none of them emit bounds checks with basic optimization flags.

[1] https://godbolt.org/z/W5e3n5oWM

[2] https://godbolt.org/z/Pe8nPPvEd

[3] https://godbolt.org/z/YTdv3nabn

TinkersW · on July 31, 2024

As I said, you need the correct flag set. With MSVC, use _CONTAINER_DEBUG_LEVEL=1; it can be used in release builds. They have had this feature since 2010 or so, though the flag name has changed.

pjmlp · on July 31, 2024

The correct name is _ITERATOR_DEBUG_LEVEL.

pjmlp · on July 31, 2024

Add a "#define _ITERATOR_DEBUG_LEVEL 1" on top for VC++.