Here are some spe
cific tasks that could be good starter projects:- Test generation for C and Rust
- A C test execution framework that runs under Valgrind (or similar) to detect memory violations
- Create an AST that represents C, Rust, and higher-level concepts like collections
- Analyze C code to identify low-level implementations of high-level concepts. One example is nodes with next pointers that are better represented as a lists. Another is resizing of arrays that is better represented as a resizeable vector. There are many other examples.
- Lifetime analysis. This may be static or dynamic. It indicates when it is safe to deallocate memory.
- Shape analysis. If a data structure is a tree or DAG rather than a (possibly cyclic) graph, and there are no external pointers into it, then reference counting can be used, rather than explicit deallocation.
I have already pitched these to a potential undergraduate student; he will let me know next week what he wants to do.
-Mike
I generally agree with OpenBLAS---great in (nearly) all ways---except
doesn't it have a bunch of Fortran? (29% according to GitHub)
Perhaps it's simple enough to break out the Fortran so we're only
concerned with independent C code, or maybe it's easy enough to
translate Fortran, or maybe a very clean FFI that we're just
conceding to be OK with for the purposes of a starter example.
Tom Anderson <tom@cs.washington.edu> writes:
> Thank Amit for the recap. To amplify a couple things:
>
> Gilbert mentioned potentially recruiting some students to work on the
> project, which sounds great. To do that we need a message and what works
> for DARPA is not always what works for students. One way to put this is
> what's the most interesting part of this topic? I'd guess its lifting -
> inferring intent from code, in a way that's useful. Transpilation is one
> use case but not the only one. Can we extract a spec from code? I think
> that would be super useful.
Indeed, a good paper might be "just" be extracting a spec that could be
used for, e.g., testing or verification of a port to a new
language/platform/etc. I can see uses for this beyond automatically
translating for Rust, including manually translating to Rust (or
something else), evaluating different implementations of the same
protocol (this is actually a real real problem with 6lowpan
implementations that are just not compatible), etc.
>
> Second, my experience with using ML is that you *really* need to break the
> problem down into solvable sub-pieces. One can think of this as curricular
> learning - how do you get better at something? Start with something simple
> and add complexity as you get better at it. If the challenge is, let's try
> to infer intent, let's start with some code where that's relatively easy,
> and add complexity as we go. One reason I suggest openblas is (1) it uses
> convenience pointers (2) its functional (3) it has a spec for each function
> in comments.
>
> So it should be relatively simple (lol) to set as an initial subproblem -
> hoist openblas into a spec, then produce a verus program as output - rust
> code plus verus spec, that we use verus to prove correct so we don't need
> test generation. of course if you think the interesting part is elsewhere
> then maybe that's not a good initial step and if so lets discuss. I think
> driver transpilation is super useful, but for everyone I talk to about
> this, I get back that they wouldn't even know where to start.
>
>
> On Thu, Nov 7, 2024 at 10:53 AM Amit Levy <amit@amitlevy.com> wrote:
>
>> Hi all (Mike, Tom, Baris, Gilbert, Haoran),
>>
>> After submitting the DARPA proposal, Tom suggested we start thinking and
>> doing what we would either want to have done by the time funds arrive or
>> what we want to have done for an NSF large in 10 months or so.
>>
>> Haoran has already started looking at this problem, so would be great to
>> concentrate on some useful starting examples, and keep posted on any
>> progress.
>>
>> I suggested some libraries I'm particularly interested in:
>>
>> - TPM2, the reference implementation for the trusted platform module
>> specification.
>>
>> - OpenThread or LwIP -- both network stacks used as "userspace" libraries
>>
>> - LVGL GUI library
>>
>> - BoringSSL/aws-lc -- we have work verifying the safety of the assembly
>> against a Rust function parameter, the aws-lc people _want_ to have
>> a version that is entirely Rust and it's likely relatively
>> straightforward to port (e.g., there mostly aren't any complex data
>> structures or anything.
>>
>> Three of the four have pretty robust test suites.
>>
>> Tom helpfully pointed out that some of these are likely too high a bar
>> to start with, and that we really want to start with things that are
>> fairly simple and almost certainly do not have bugs in them.
>>
>> -Amit
>> _______________________________________________
>> Harvest mailing list -- harvest@lists.betterbytes.org
>> To unsubscribe send an email to harvest-leave@lists.betterbytes.org
>>
_______________________________________________
Harvest mailing list -- harvest@lists.betterbytes.org
To unsubscribe send an email to harvest-leave@lists.betterbytes.org
_______________________________________________
Harvest mailing list -- harvest@lists.betterbytes.org
To unsubscribe send an email to harvest-leave@lists.betterbytes.org