This Week's Progress:
*
Tried running c2rust on ESA libmCS
*
A math library that does not contain complicated data structures
*
This is the first milestone given by TRACTOR
*
Running `configure` generates a Makefile; running `bear -- make` then generates a `compile_commands.json`, which c2rust reads to decide which files to transpile and under which configuration
*
c2rust runs the C preprocessor before performing transpilation, which means that:
*
Macros that accommodate different platforms are expanded, so the transpiled code only works for the platform on which the preprocessor was run and for the configuration chosen when running `configure`
*
Function-like macros are expanded, which makes the transpiled Rust code less readable
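The portability loss described above can be illustrated with a small hand-written Rust sketch (not actual c2rust output). It assumes a hypothetical C source that selects the high 32 bits of a double through an endianness `#if`; the macro name `HIGH_WORD` and the function name `high_word` are illustrative and do not come from libmCS. After preprocessing, only one branch would survive, whereas Rust's `cfg` attributes could carry both:

```rust
// Hypothetical C source (illustration only, not taken from libmCS):
//   #if defined(LITTLE_ENDIAN_HOST)
//   #define HIGH_WORD(x) (((uint32_t *)&(x))[1])
//   #else
//   #define HIGH_WORD(x) (((uint32_t *)&(x))[0])
//   #endif

/// Little-endian targets: the high word is stored in the last four bytes.
/// Both definitions stay in the source; rustc compiles only the matching one.
#[cfg(target_endian = "little")]
fn high_word(x: f64) -> u32 {
    let b = x.to_ne_bytes();
    u32::from_ne_bytes([b[4], b[5], b[6], b[7]])
}

/// Big-endian targets: the high word is stored in the first four bytes.
#[cfg(target_endian = "big")]
fn high_word(x: f64) -> u32 {
    let b = x.to_ne_bytes();
    u32::from_ne_bytes([b[0], b[1], b[2], b[3]])
}
```

Idiomatic Rust would simply write `(x.to_bits() >> 32) as u32`, which does not depend on endianness. The sketch only shows that a platform conditional can be carried over rather than baked in at preprocessing time.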
*
Two possible research sub-topics related to macro expansion:
*
Configure-make-macro is a commonly used C project management paradigm: running `configure` and expanding macros generates the code that targets a specific environment and prunes the rest. Preprocessing may substitute platform-specific primitives, or even swap out large chunks of implementation spanning multiple functions. For project-level C-to-Rust translation, it is important to keep this portability. What would be an idiomatic project management paradigm for Rust, and how do we migrate to it from a C project?
*
Without access to generics, C programmers often use macros to implement what is effectively a generic function. During preprocessing, such macros are pasted into the source at each use site. This makes the transpiled Rust code hard to read and loses the information that these sites share a meaningful structure. Ideally, such macros should be transpiled into Rust generic functions. Here are some possible solutions:
*
Identify and promote these macros to C functions before preprocessing
*
Pros: transparent to the preprocessor, fits in C grammar so it does not need extra IR
*
Cons: Programmers may have chosen a macro over a function precisely because a "generic" function in C requires void * and extra parameters, and a lifted function would likely need them too. Later transpilation steps may need extra effort to turn these void *-style functions into generic functions.
*
Identify these macros and tag them as generics
*
Pros: later transpilation steps can easily pick up this tag to minimize information loss and produce higher-quality generic functions
*
Cons: requires extra IR above C to support this tag, requires extra preprocessor and C parser support
*
Expand them as-is and try to merge them later
*
Pros: needs no effort on the preprocessor and C parser side
*
Cons: loses a lot of information; identifying sections that came from the same macro may be a harder task on the transpiler end (maybe leaving a heuristic-style attribute tag can simplify this?)
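To make the target of these three options concrete, here is a hand-written sketch (not c2rust output). It assumes a hypothetical C macro `MAX` used at two call sites with different types; all function names are illustrative. The `*_expanded` functions show the kind of duplicated code that expand-then-transpile leaves behind, and `max_of` shows the generic function that a macro-aware pipeline could aim for.

```rust
// Hypothetical C source, for illustration only:
//   #define MAX(a, b) ((a) > (b) ? (a) : (b))

/// Expanded form: the macro body is inlined at an `int` call site.
fn clamp_floor_expanded(x: i32, floor: i32) -> i32 {
    if x > floor { x } else { floor }
}

/// Expanded form: the same macro body inlined at a `double` call site.
/// Nothing in the code records that it shares an origin with the one above.
fn longest_expanded(a: f64, b: f64) -> f64 {
    if a > b { a } else { b }
}

/// Desired form: one generic function standing in for the macro.
fn max_of<T: PartialOrd>(a: T, b: T) -> T {
    if a > b { a } else { b }
}

/// The two call sites rewritten against the generic function.
fn clamp_floor(x: i32, floor: i32) -> i32 {
    max_of(x, floor)
}

fn longest(a: f64, b: f64) -> f64 {
    max_of(a, b)
}
```

The trait bound (`PartialOrd` here) is exactly the information that a void *-based C function cannot express and that a "generic" tag on the macro would need to help recover.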
*
Wrote a working set size statistics script for Jenga
*
Rohan proposed replacing "% of memory as fast tier" with "% of working set size as fast tier" for our re-submission
*
The script takes a trace, splits it into equal-length intervals, counts how many pages are accessed at least once in each interval (or how many pages account for most of the accesses), and takes the max/average over all intervals as the working set size
*
The script takes several (possibly controversial) parameters that may need further discussion
*
Granularity: should we use 30k or 2M as an interval length?
*
A shorter interval assumes that the machine migrates pages fast enough to keep up with changes in the working set
*
Should we use max or average?
*
Should we count all pages accessed in an interval, or, to avoid the case where many pages are accessed only a few times, count only the most-accessed pages, adding them in rank order until they cover e.g. 90% of all accesses in the interval?
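The computation and its three parameters can be sketched as follows. This is a minimal illustration, not the actual script: it assumes the trace is a sequence of page numbers and that the interval length is measured in number of accesses (the notes do not state the unit). The names `interval_wss`, `max_wss`, and `avg_wss` are hypothetical.

```rust
use std::collections::HashMap;

/// Working set size, in pages, of each interval of `trace`.
/// `interval_len` is the number of accesses per interval (assumed unit);
/// the last interval may be shorter if the trace length is not a multiple.
/// `coverage` is in (0, 1]: 1.0 counts every page touched in the interval,
/// 0.9 counts the hottest pages until they cover 90% of its accesses.
fn interval_wss(trace: &[u64], interval_len: usize, coverage: f64) -> Vec<usize> {
    assert!(interval_len > 0 && coverage > 0.0 && coverage <= 1.0);
    trace
        .chunks(interval_len)
        .map(|chunk| {
            // Count accesses per page within this interval.
            let mut counts: HashMap<u64, usize> = HashMap::new();
            for &page in chunk {
                *counts.entry(page).or_insert(0) += 1;
            }
            // Rank pages from most to least accessed.
            let mut freq: Vec<usize> = counts.into_values().collect();
            freq.sort_unstable_by(|a, b| b.cmp(a));
            // Add pages in rank order until the coverage target is met.
            let target = coverage * chunk.len() as f64;
            let (mut covered, mut pages) = (0usize, 0usize);
            for f in freq {
                if covered as f64 >= target {
                    break;
                }
                covered += f;
                pages += 1;
            }
            pages
        })
        .collect()
}

/// Largest per-interval working set size (0 for an empty trace).
fn max_wss(sizes: &[usize]) -> usize {
    sizes.iter().copied().max().unwrap_or(0)
}

/// Mean per-interval working set size (0.0 for an empty trace).
fn avg_wss(sizes: &[usize]) -> f64 {
    if sizes.is_empty() {
        return 0.0;
    }
    sizes.iter().sum::<usize>() as f64 / sizes.len() as f64
}
```

For example, calling `interval_wss(&trace, 10, 1.0)` and then `max_wss` or `avg_wss` on the result corresponds to the "all pages" variant with the max or average aggregation; lowering `coverage` switches to the "most-accessed pages" variant.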