svadev AT lists.siebelschool.illinois.edu
Subject: Svadev mailing list
List archive
- From: John Criswell <criswell AT illinois.edu>
- To: leckie peng <pengjingjian1016 AT gmail.com>
- Cc: svadev AT cs.uiuc.edu
- Subject: Re: [svadev] Redundent Check Elimination
- Date: Wed, 26 Sep 2012 10:48:39 -0500
- List-archive: <http://lists.cs.uiuc.edu/pipermail/svadev/>
- List-id: <svadev.cs.uiuc.edu>
- Organization: University of Illinois
On 9/26/12 12:48 AM, leckie peng wrote:
2012/9/26 John Criswell <criswell AT illinois.edu>
On 9/25/12 5:26 AM, leckie peng wrote:
Hi, No, I haven't embarked on implementing it yet. As I had
said, I didn't have much experience with LLVM. Once get
familiar with it, I probably would try it out.
Have you read the design document that Ott used as the basis of his work (attached here for reference)? If it doesn't differ significantly from Ott's work, then it may be better to try out Ott's work on other memory safety techniques like SoftBound (which is included in the SAFECode source code) or to implement something like PNaCL with the instrumentation.
Redundant check
elimination is also complicated by multi-threading (some
other thread can free an object). I am inclined to make
SAFECode work properly even if the program has races, so
eliminating redundant checks shouldn't permit a memory
safety error if some other thread frees a memory object.
Yes, heap objects are difficult to analyse, which is
further complicated by multi-threading. I am pretty
conservative in calculating the kill sets right now. Here is
my semilattice design for an intra-procedural analysis.
The data flow values are those in 2^S, where S is the set
of all pointers in the function. Namely the data flow value is
a subset of S, the members of it represent pointers that are
"safe" at one specific program point. This is a forward
analysis and obviously the meet operator would be intersection
operation on OUTs of all parent basic blocks, the transfer
function of a basic block would be OUT = gen U ( IN - kill ).
The definitions of gen and kill sets are as follows.
gen = {p : p is downward-exposedly dereferenced}. That
means if p is dereferenced and not modified till the end of
the BB, it is guaranteed to be safe. Because if it's not safe
when dereferencing it, we would add a check to make it safe.
However it would be difficult to decide whether a pointer is
modified when precedure calls come into the picture.
A few comments: 1. Keep in mind that the LLVM variables are in SSA form, so they are never modified. Modification only makes sense if you're also analyzing memory objects (which are not in SSA form and can therefore be modified using the LLVM store instruction). 2. Instead of checking dereferences, I think you want to check that the pointer has already been checked. 3. In general, SAFECode inserts all checks first and then optimizations remove them. Therefore, you wouldn't do the analysis to see where a check is needed; you would do the analysis to determine which checks to remove. This enables us to use libLTO to run SAFECode optimizations unconditionally. 4. As a nitpick, you should follow separation of concerns: there should be one pass that does the analysis and another pass that removes the run-time checks. kill = {p : p is downward-exposedly modified}. If p is
modified and not dereferenced till the end of the BB, we may
kill it, since it is potentially unsafe. There can be
optimizations here, e.g., if p is assigned another safe
pointer or undeallocable memory to it, we don't have to kill
it then.
A call to free() or a call to a function that can free the pointer should cause the pointer to be placed into the kill set. A pointer arithmetic operation or other operation should also place the value in the kill set. For example, call loadcheck (%p); %q = gep %p, 0, 1, %x, %y The %p variable is safe because it has been checked; the value %q should be considered unsafe (because it has not been checked). This turns out to a classic bitvector data flow analysis,
which means it can be efficiently implemented with bitvector
and bitwise operation(in our case, the AND operator as meet
operator). Since it's distributive, we can use the classic
iterate-utill-convergent algorithm to solve it.
The only issue here is that SSA variables have only a single assignment; the bit-vector may get rather large. It may be better to do some sort of sparse analysis that takes advantage of the SSA variables.
if (condition) check1 else check2 fi check3 If both check1 and check2 make check3 redundant, your approach can remove check3. The algorithm Ott used cannot do this, although I'm not sure if the situation occurs very often in practice. I wonder if it can subsume cases caused by domination?
Yes, it should. The question is whether your approach would, in practice, find additional optimization opportunities that Ott's analysis does not find. In theory, it should. In practice, I'm not sure that it will unless it also handles values stored in memory or is inter-procedural. Second, your approach
may be easier to adapt into an inter-procedural analysis.
It's not clear to me whether the algorithm Ott used can be
modified to be an inter-procedural analysis.
Third, if you used alias analysis, you could probably handle values that are the result of an LLVM load instruction. I don't think Ott's algorithm could be extended to handle that; it's tuned for SSA values. All of this said, I am beginning to think that an analysis which determines whether a pointer can be freed would probably be a more important first step in improving SAFECode's performance. There are a number of optimizations (the fast check optimization, redundant check elimination, loop hoisting) that could benefit from knowing that the memory object won't be deallocated by the current thread or another thread. Yes, such an analysis would be very useful to those kinds
of optimizations. I haven't got a clue about it, do you have
any idea?
I haven't put much thought into it (since I don't have time to write it myself). However, it seems that a standard data-flow or SSA-based analysis could do it. The analysis should calculate, for every pointer, whether that pointer could have been freed since the last run-time load or store check. If you're interested in working on the freed pointer analysis, you should spend some time thinking about it. My thoughts on it have so far been cursory. In any event, if you'd like to write another redundant check elimination pass and see if it improves performance, we'd be happy to include it if it improves performance. I recommend emailing the list with a design so that we can provide feedback on it (in particular, I'd like to know about your semilattice design and how you calculate your safe/kill sets). BTW, will you be at the LLVM Developer's meeting? They're having a hacking lab, and it looks like Santosh and I will be there to do some SAFECode hacking during the lab. No, I see no chance of being there. Good luck to you guys
anyway. :)
Thanks! -- John T. |
Attachment:
proposal.pdf
Description: Adobe PDF document
- [svadev] Redundent Check Elimination, leckie peng, 09/25/2012
- Re: [svadev] Redundent Check Elimination, John Criswell, 09/25/2012
- Re: [svadev] Redundent Check Elimination, leckie peng, 09/26/2012
- Re: [svadev] Redundent Check Elimination, John Criswell, 09/26/2012
- Re: [svadev] Redundent Check Elimination, leckie peng, 09/26/2012
- Re: [svadev] Redundent Check Elimination, John Criswell, 09/25/2012
Archive powered by MHonArc 2.6.16.