Hi all (Mike, Tom, Baris, Gilbert, Haoran), After submitting the DARPA proposal, Tom suggested we start thinking and doing what we would either want to have done by the time funds arrive or what we want to have done for an NSF large in 10 months or so. Haoran has already started looking at this problem, so would be great to concentrate on some useful starting examples, and keep posted on any progress. I suggested some libraries I'm particularly interested in: - TPM2, the reference implementation for the trusted platform module specification. - OpenThread or LwIP -- both network stacks used as "userspace" libraries - LVGL GUI library - BoringSSL/aws-lc -- we have work verifying the safety of the assembly against a Rust function parameter, the aws-lc people _want_ to have a version that is entirely Rust and it's likely relatively straightforward to port (e.g., there mostly aren't any complex data structures or anything. Three of the four have pretty robust test suites. Tom helpfully pointed out that some of these are likely too high a bar to start with, and that we really want to start with things that are fairly simple and almost certainly do not have bugs in them. -Amit
Thank Amit for the recap. To amplify a couple things: Gilbert mentioned potentially recruiting some students to work on the project, which sounds great. To do that we need a message and what works for DARPA is not always what works for students. One way to put this is what's the most interesting part of this topic? I'd guess its lifting - inferring intent from code, in a way that's useful. Transpilation is one use case but not the only one. Can we extract a spec from code? I think that would be super useful. Second, my experience with using ML is that you *really* need to break the problem down into solvable sub-pieces. One can think of this as curricular learning - how do you get better at something? Start with something simple and add complexity as you get better at it. If the challenge is, let's try to infer intent, let's start with some code where that's relatively easy, and add complexity as we go. One reason I suggest openblas is (1) it uses convenience pointers (2) its functional (3) it has a spec for each function in comments. So it should be relatively simple (lol) to set as an initial subproblem - hoist openblas into a spec, then produce a verus program as output - rust code plus verus spec, that we use verus to prove correct so we don't need test generation. of course if you think the interesting part is elsewhere then maybe that's not a good initial step and if so lets discuss. I think driver transpilation is super useful, but for everyone I talk to about this, I get back that they wouldn't even know where to start. On Thu, Nov 7, 2024 at 10:53 AM Amit Levy <amit@amitlevy.com> wrote:
Hi all (Mike, Tom, Baris, Gilbert, Haoran),
After submitting the DARPA proposal, Tom suggested we start thinking and doing what we would either want to have done by the time funds arrive or what we want to have done for an NSF large in 10 months or so.
Haoran has already started looking at this problem, so would be great to concentrate on some useful starting examples, and keep posted on any progress.
I suggested some libraries I'm particularly interested in:
- TPM2, the reference implementation for the trusted platform module specification.
- OpenThread or LwIP -- both network stacks used as "userspace" libraries
- LVGL GUI library
- BoringSSL/aws-lc -- we have work verifying the safety of the assembly against a Rust function parameter, the aws-lc people _want_ to have a version that is entirely Rust and it's likely relatively straightforward to port (e.g., there mostly aren't any complex data structures or anything.
Three of the four have pretty robust test suites.
Tom helpfully pointed out that some of these are likely too high a bar to start with, and that we really want to start with things that are fairly simple and almost certainly do not have bugs in them.
-Amit _______________________________________________ Harvest mailing list -- harvest@lists.betterbytes.org To unsubscribe send an email to harvest-leave@lists.betterbytes.org
I generally agree with OpenBLAS---great in (nearly) all ways---except doesn't it have a bunch of Fortran? (29% according to GitHub) Perhaps it's simple enough to break out the Fortran so we're only concerned with independent C code, or maybe it's easy enough to translate Fortran, or maybe a very clean FFI that we're just conceding to be OK with for the purposes of a starter example. Tom Anderson <tom@cs.washington.edu> writes:
Thank Amit for the recap. To amplify a couple things:
Gilbert mentioned potentially recruiting some students to work on the project, which sounds great. To do that we need a message and what works for DARPA is not always what works for students. One way to put this is what's the most interesting part of this topic? I'd guess its lifting - inferring intent from code, in a way that's useful. Transpilation is one use case but not the only one. Can we extract a spec from code? I think that would be super useful.
Indeed, a good paper might be "just" be extracting a spec that could be used for, e.g., testing or verification of a port to a new language/platform/etc. I can see uses for this beyond automatically translating for Rust, including manually translating to Rust (or something else), evaluating different implementations of the same protocol (this is actually a real real problem with 6lowpan implementations that are just not compatible), etc.
Second, my experience with using ML is that you *really* need to break the problem down into solvable sub-pieces. One can think of this as curricular learning - how do you get better at something? Start with something simple and add complexity as you get better at it. If the challenge is, let's try to infer intent, let's start with some code where that's relatively easy, and add complexity as we go. One reason I suggest openblas is (1) it uses convenience pointers (2) its functional (3) it has a spec for each function in comments.
So it should be relatively simple (lol) to set as an initial subproblem - hoist openblas into a spec, then produce a verus program as output - rust code plus verus spec, that we use verus to prove correct so we don't need test generation. of course if you think the interesting part is elsewhere then maybe that's not a good initial step and if so lets discuss. I think driver transpilation is super useful, but for everyone I talk to about this, I get back that they wouldn't even know where to start.
On Thu, Nov 7, 2024 at 10:53 AM Amit Levy <amit@amitlevy.com> wrote:
Hi all (Mike, Tom, Baris, Gilbert, Haoran),
After submitting the DARPA proposal, Tom suggested we start thinking and doing what we would either want to have done by the time funds arrive or what we want to have done for an NSF large in 10 months or so.
Haoran has already started looking at this problem, so would be great to concentrate on some useful starting examples, and keep posted on any progress.
I suggested some libraries I'm particularly interested in:
- TPM2, the reference implementation for the trusted platform module specification.
- OpenThread or LwIP -- both network stacks used as "userspace" libraries
- LVGL GUI library
- BoringSSL/aws-lc -- we have work verifying the safety of the assembly against a Rust function parameter, the aws-lc people _want_ to have a version that is entirely Rust and it's likely relatively straightforward to port (e.g., there mostly aren't any complex data structures or anything.
Three of the four have pretty robust test suites.
Tom helpfully pointed out that some of these are likely too high a bar to start with, and that we really want to start with things that are fairly simple and almost certainly do not have bugs in them.
-Amit _______________________________________________ Harvest mailing list -- harvest@lists.betterbytes.org To unsubscribe send an email to harvest-leave@lists.betterbytes.org
Here are some specific tasks that could be good starter projects: - Test generation for C and Rust - A C test execution framework that runs under Valgrind (or similar) to detect memory violations - Create an AST that represents C, Rust, and higher-level concepts like collections - Analyze C code to identify low-level implementations of high-level concepts. One example is nodes with next pointers that are better represented as a lists. Another is resizing of arrays that is better represented as a resizeable vector. There are many other examples. - Lifetime analysis. This may be static or dynamic. It indicates when it is safe to deallocate memory. - Shape analysis. If a data structure is a tree or DAG rather than a (possibly cyclic) graph, and there are no external pointers into it, then reference counting can be used, rather than explicit deallocation. I have already pitched these to a potential undergraduate student; he will let me know next week what he wants to do. -Mike On Fri, Nov 8, 2024 at 11:13 AM Amit Levy <amit@amitlevy.com> wrote:
I generally agree with OpenBLAS---great in (nearly) all ways---except doesn't it have a bunch of Fortran? (29% according to GitHub)
Perhaps it's simple enough to break out the Fortran so we're only concerned with independent C code, or maybe it's easy enough to translate Fortran, or maybe a very clean FFI that we're just conceding to be OK with for the purposes of a starter example.
Tom Anderson <tom@cs.washington.edu> writes:
Thank Amit for the recap. To amplify a couple things:
Gilbert mentioned potentially recruiting some students to work on the project, which sounds great. To do that we need a message and what works for DARPA is not always what works for students. One way to put this is what's the most interesting part of this topic? I'd guess its lifting - inferring intent from code, in a way that's useful. Transpilation is one use case but not the only one. Can we extract a spec from code? I think that would be super useful.
Indeed, a good paper might be "just" be extracting a spec that could be used for, e.g., testing or verification of a port to a new language/platform/etc. I can see uses for this beyond automatically translating for Rust, including manually translating to Rust (or something else), evaluating different implementations of the same protocol (this is actually a real real problem with 6lowpan implementations that are just not compatible), etc.
Second, my experience with using ML is that you *really* need to break
problem down into solvable sub-pieces. One can think of this as curricular learning - how do you get better at something? Start with something simple and add complexity as you get better at it. If the challenge is, let's
to infer intent, let's start with some code where that's relatively easy, and add complexity as we go. One reason I suggest openblas is (1) it uses convenience pointers (2) its functional (3) it has a spec for each function in comments.
So it should be relatively simple (lol) to set as an initial subproblem - hoist openblas into a spec, then produce a verus program as output - rust code plus verus spec, that we use verus to prove correct so we don't need test generation. of course if you think the interesting part is elsewhere then maybe that's not a good initial step and if so lets discuss. I
driver transpilation is super useful, but for everyone I talk to about this, I get back that they wouldn't even know where to start.
On Thu, Nov 7, 2024 at 10:53 AM Amit Levy <amit@amitlevy.com> wrote:
Hi all (Mike, Tom, Baris, Gilbert, Haoran),
After submitting the DARPA proposal, Tom suggested we start thinking and doing what we would either want to have done by the time funds arrive or what we want to have done for an NSF large in 10 months or so.
Haoran has already started looking at this problem, so would be great to concentrate on some useful starting examples, and keep posted on any progress.
I suggested some libraries I'm particularly interested in:
- TPM2, the reference implementation for the trusted platform module specification.
- OpenThread or LwIP -- both network stacks used as "userspace"
the try think libraries
- LVGL GUI library
- BoringSSL/aws-lc -- we have work verifying the safety of the assembly against a Rust function parameter, the aws-lc people _want_ to have a version that is entirely Rust and it's likely relatively straightforward to port (e.g., there mostly aren't any complex data structures or anything.
Three of the four have pretty robust test suites.
Tom helpfully pointed out that some of these are likely too high a bar to start with, and that we really want to start with things that are fairly simple and almost certainly do not have bugs in them.
-Amit _______________________________________________ Harvest mailing list -- harvest@lists.betterbytes.org To unsubscribe send an email to harvest-leave@lists.betterbytes.org
Harvest mailing list -- harvest@lists.betterbytes.org To unsubscribe send an email to harvest-leave@lists.betterbytes.org
This all sounds great Mike! Thanks.
On Nov 8, 2024, at 11:29 AM, Michael Ernst <mernst@cs.washington.edu> wrote:
Here are some specific tasks that could be good starter projects: Test generation for C and Rust A C test execution framework that runs under Valgrind (or similar) to detect memory violations Create an AST that represents C, Rust, and higher-level concepts like collections Analyze C code to identify low-level implementations of high-level concepts. One example is nodes with next pointers that are better represented as a lists. Another is resizing of arrays that is better represented as a resizeable vector. There are many other examples. Lifetime analysis. This may be static or dynamic. It indicates when it is safe to deallocate memory. Shape analysis. If a data structure is a tree or DAG rather than a (possibly cyclic) graph, and there are no external pointers into it, then reference counting can be used, rather than explicit deallocation. I have already pitched these to a potential undergraduate student; he will let me know next week what he wants to do.
-Mike
On Fri, Nov 8, 2024 at 11:13 AM Amit Levy <amit@amitlevy.com <mailto:amit@amitlevy.com>> wrote:
I generally agree with OpenBLAS---great in (nearly) all ways---except doesn't it have a bunch of Fortran? (29% according to GitHub)
Perhaps it's simple enough to break out the Fortran so we're only concerned with independent C code, or maybe it's easy enough to translate Fortran, or maybe a very clean FFI that we're just conceding to be OK with for the purposes of a starter example.
Tom Anderson <tom@cs.washington.edu <mailto:tom@cs.washington.edu>> writes:
Thank Amit for the recap. To amplify a couple things:
Gilbert mentioned potentially recruiting some students to work on the project, which sounds great. To do that we need a message and what works for DARPA is not always what works for students. One way to put this is what's the most interesting part of this topic? I'd guess its lifting - inferring intent from code, in a way that's useful. Transpilation is one use case but not the only one. Can we extract a spec from code? I think that would be super useful.
Indeed, a good paper might be "just" be extracting a spec that could be used for, e.g., testing or verification of a port to a new language/platform/etc. I can see uses for this beyond automatically translating for Rust, including manually translating to Rust (or something else), evaluating different implementations of the same protocol (this is actually a real real problem with 6lowpan implementations that are just not compatible), etc.
Second, my experience with using ML is that you *really* need to break the problem down into solvable sub-pieces. One can think of this as curricular learning - how do you get better at something? Start with something simple and add complexity as you get better at it. If the challenge is, let's try to infer intent, let's start with some code where that's relatively easy, and add complexity as we go. One reason I suggest openblas is (1) it uses convenience pointers (2) its functional (3) it has a spec for each function in comments.
So it should be relatively simple (lol) to set as an initial subproblem - hoist openblas into a spec, then produce a verus program as output - rust code plus verus spec, that we use verus to prove correct so we don't need test generation. of course if you think the interesting part is elsewhere then maybe that's not a good initial step and if so lets discuss. I think driver transpilation is super useful, but for everyone I talk to about this, I get back that they wouldn't even know where to start.
On Thu, Nov 7, 2024 at 10:53 AM Amit Levy <amit@amitlevy.com <mailto:amit@amitlevy.com>> wrote:
Hi all (Mike, Tom, Baris, Gilbert, Haoran),
After submitting the DARPA proposal, Tom suggested we start thinking and doing what we would either want to have done by the time funds arrive or what we want to have done for an NSF large in 10 months or so.
Haoran has already started looking at this problem, so would be great to concentrate on some useful starting examples, and keep posted on any progress.
I suggested some libraries I'm particularly interested in:
- TPM2, the reference implementation for the trusted platform module specification.
- OpenThread or LwIP -- both network stacks used as "userspace" libraries
- LVGL GUI library
- BoringSSL/aws-lc -- we have work verifying the safety of the assembly against a Rust function parameter, the aws-lc people _want_ to have a version that is entirely Rust and it's likely relatively straightforward to port (e.g., there mostly aren't any complex data structures or anything.
Three of the four have pretty robust test suites.
Tom helpfully pointed out that some of these are likely too high a bar to start with, and that we really want to start with things that are fairly simple and almost certainly do not have bugs in them.
-Amit _______________________________________________ Harvest mailing list -- harvest@lists.betterbytes.org <mailto:harvest@lists.betterbytes.org> To unsubscribe send an email to harvest-leave@lists.betterbytes.org <mailto:harvest-leave@lists.betterbytes.org>
_______________________________________________ Harvest mailing list -- harvest@lists.betterbytes.org <mailto:harvest@lists.betterbytes.org> To unsubscribe send an email to harvest-leave@lists.betterbytes.org <mailto:harvest-leave@lists.betterbytes.org>
Harvest mailing list -- harvest@lists.betterbytes.org To unsubscribe send an email to harvest-leave@lists.betterbytes.org
Having spent too much time working on BLAS implementations recently, I would recommend against focusing on BLAS. Most any good implementation will make heavy use of meta-programming, whether in the form of the C macro system, or C++ templates. Halfway reasonable BLAS implementations will almost certainly have to be written in unsafe Rust due to a need to avoid the overhead of array-bounds checking. In short, while BLAS has a simple spec, the implementations are very far from simple, or idiomatic C programs. I suspect it would be counter-productive by leading us down a variety of side paths. In our meeting on Wednesday (Mike, Haoran, myself) we discussed the following targets: * ESA libmCS (on the DARPA timeline) https://gitlab.com/gtd-gmbh/libmcs * libyaml (on the DARPA timeline) https://github.com/yaml/libyaml note: we can probably also find a number of different JSON parsers written in C * GTK (important GUI toolkit) https://github.com/GNOME/gtk * Cairo (a 2d drawing library that GTK relies on) source code here: https://cairographics.org/download/
On Nov 8, 2024, at 11:13 AM, Amit Levy <amit@amitlevy.com> wrote:
I generally agree with OpenBLAS---great in (nearly) all ways---except doesn't it have a bunch of Fortran? (29% according to GitHub)
Perhaps it's simple enough to break out the Fortran so we're only concerned with independent C code, or maybe it's easy enough to translate Fortran, or maybe a very clean FFI that we're just conceding to be OK with for the purposes of a starter example.
Tom Anderson <tom@cs.washington.edu> writes:
Thank Amit for the recap. To amplify a couple things:
Gilbert mentioned potentially recruiting some students to work on the project, which sounds great. To do that we need a message and what works for DARPA is not always what works for students. One way to put this is what's the most interesting part of this topic? I'd guess its lifting - inferring intent from code, in a way that's useful. Transpilation is one use case but not the only one. Can we extract a spec from code? I think that would be super useful.
Indeed, a good paper might be "just" be extracting a spec that could be used for, e.g., testing or verification of a port to a new language/platform/etc. I can see uses for this beyond automatically translating for Rust, including manually translating to Rust (or something else), evaluating different implementations of the same protocol (this is actually a real real problem with 6lowpan implementations that are just not compatible), etc.
Second, my experience with using ML is that you *really* need to break the problem down into solvable sub-pieces. One can think of this as curricular learning - how do you get better at something? Start with something simple and add complexity as you get better at it. If the challenge is, let's try to infer intent, let's start with some code where that's relatively easy, and add complexity as we go. One reason I suggest openblas is (1) it uses convenience pointers (2) its functional (3) it has a spec for each function in comments.
So it should be relatively simple (lol) to set as an initial subproblem - hoist openblas into a spec, then produce a verus program as output - rust code plus verus spec, that we use verus to prove correct so we don't need test generation. of course if you think the interesting part is elsewhere then maybe that's not a good initial step and if so lets discuss. I think driver transpilation is super useful, but for everyone I talk to about this, I get back that they wouldn't even know where to start.
On Thu, Nov 7, 2024 at 10:53 AM Amit Levy <amit@amitlevy.com> wrote:
Hi all (Mike, Tom, Baris, Gilbert, Haoran),
After submitting the DARPA proposal, Tom suggested we start thinking and doing what we would either want to have done by the time funds arrive or what we want to have done for an NSF large in 10 months or so.
Haoran has already started looking at this problem, so would be great to concentrate on some useful starting examples, and keep posted on any progress.
I suggested some libraries I'm particularly interested in:
- TPM2, the reference implementation for the trusted platform module specification.
- OpenThread or LwIP -- both network stacks used as "userspace" libraries
- LVGL GUI library
- BoringSSL/aws-lc -- we have work verifying the safety of the assembly against a Rust function parameter, the aws-lc people _want_ to have a version that is entirely Rust and it's likely relatively straightforward to port (e.g., there mostly aren't any complex data structures or anything.
Three of the four have pretty robust test suites.
Tom helpfully pointed out that some of these are likely too high a bar to start with, and that we really want to start with things that are fairly simple and almost certainly do not have bugs in them.
-Amit _______________________________________________ Harvest mailing list -- harvest@lists.betterbytes.org To unsubscribe send an email to harvest-leave@lists.betterbytes.org
_______________________________________________ Harvest mailing list -- harvest@lists.betterbytes.org To unsubscribe send an email to harvest-leave@lists.betterbytes.org
participants (4)
-
Amit Levy
-
Gilbert Bernstein
-
Michael Ernst
-
Tom Anderson