svadev AT lists.siebelschool.illinois.edu

Subject: Svadev mailing list

List archive

Re: [svadev] Redundent Check Elimination

From: leckie peng <pengjingjian1016 AT gmail.com>
To: John Criswell <criswell AT illinois.edu>
Cc: svadev AT cs.uiuc.edu
Subject: Re: [svadev] Redundent Check Elimination
Date: Wed, 26 Sep 2012 13:48:47 +0800
List-archive: <http://lists.cs.uiuc.edu/pipermail/svadev/>
List-id: <svadev.cs.uiuc.edu>

2012/9/26 John Criswell <criswell AT illinois.edu>

On 9/25/12 5:26 AM, leckie peng wrote:

Hi,

I am a MSc candidate and very interested in the open project that creates common infrastructure for memory safety tools. To finish my degree I've been working on it for a while and read some of the literature.

Interesting. Does your approach do anything differently from the approach that Ott implemented, and if so, what are the implications? Have you tested your approach on various memory safety enforcing techniques (like ASan, SoftBound, SAFECode, etc.)? It'd be nice to get a second opinion on the design.

No, I haven't embarked on implementing it yet. As I had said, I didn't have much experience with LLVM. Once get familiar with it, I probably would try it out.

As for eliminating redundant checks, which is among the goals of this project, I was considering if we can employ data flow analysis to transfer the "safety" of the memory address along CFG. Namely, if one memory access has been checked, it's possible that accesses to the same location down the execution paths don't need to be checked if the "safety" of the location can be transferred there. This is a simple idea and can be implemented with classic bit vector data flow analysis.

Yes, you could implement redundant check elimination that way. However, you need to be careful to calculate the kill sets correctly. You need to make sure that the memory object to which the pointer points is not deallocated on any path between the two checks. This is why Ott's optimization restricts itself to eliminating the fast version of the checks (the fast version is only used on globals, stack objects, and sometimes heap objects that are proven not to be deallocated).

Redundant check elimination is also complicated by multi-threading (some other thread can free an object). I am inclined to make SAFECode work properly even if the program has races, so eliminating redundant checks shouldn't permit a memory safety error if some other thread frees a memory object.

Yes, heap objects are difficult to analyse, which is further complicated by multi-threading. I am pretty conservative in calculating the kill sets right now. Here is my semilattice design for an intra-procedural analysis.

The data flow values are those in 2^S, where S is the set of all pointers in the function. Namely the data flow value is a subset of S, the members of it represent pointers that are "safe" at one specific program point. This is a forward analysis and obviously the meet operator would be intersection operation on OUTs of all parent basic blocks, the transfer function of a basic block would be OUT = gen U ( IN - kill ). The definitions of gen and kill sets are as follows.

gen = {p : p is downward-exposedly dereferenced}. That means if p is dereferenced and not modified till the end of the BB, it is guaranteed to be safe. Because if it's not safe when dereferencing it, we would add a check to make it safe. However it would be difficult to decide whether a pointer is modified when precedure calls come into the picture.

kill = {p : p is downward-exposedly modified}. If p is modified and not dereferenced till the end of the BB, we may kill it, since it is potentially unsafe. There can be optimizations here, e.g., if p is assigned another safe pointer or undeallocable memory to it, we don't have to kill it then.

This turns out to a classic bitvector data flow analysis, which means it can be efficiently implemented with bitvector and bitwise operation(in our case, the AND operator as meet operator). Since it's distributive, we can use the classic iterate-utill-convergent algorithm to solve it.

It seems to me that the optimizations implemented in ASan and those by Ott Tinn didn't use this approach, although they can eliminate a large part of redundant checks by now, by using much analysis. This data flow approach, however, seems to me more elegant and straightforward. I don't have much experience with Compiler Techniques, it would be great to have your opinions.

I think there are three advantages of your approach. First, it can remove redundant checks in which the redundancy isn't caused by domination. For example, the code:

if (condition)
check1
else
check2
fi

check3

If both check1 and check2 make check3 redundant, your approach can remove check3. The algorithm Ott used cannot do this, although I'm not sure if the situation occurs very often in practice.

I wonder if it can subsume cases caused by domination?

Second, your approach may be easier to adapt into an inter-procedural analysis. It's not clear to me whether the algorithm Ott used can be modified to be an inter-procedural analysis.

Third, if you used alias analysis, you could probably handle values that are the result of an LLVM load instruction. I don't think Ott's algorithm could be extended to handle that; it's tuned for SSA values.

All of this said, I am beginning to think that an analysis which determines whether a pointer can be freed would probably be a more important first step in improving SAFECode's performance. There are a number of optimizations (the fast check optimization, redundant check elimination, loop hoisting) that could benefit from knowing that the memory object won't be deallocated by the current thread or another thread.

Yes, such an analysis would be very useful to those kinds of optimizations. I haven't got a clue about it, do you have any idea?

In any event, if you'd like to write another redundant check elimination pass and see if it improves performance, we'd be happy to include it if it improves performance. I recommend emailing the list with a design so that we can provide feedback on it (in particular, I'd like to know about your semilattice design and how you calculate your safe/kill sets).

BTW, will you be at the LLVM Developer's meeting? They're having a hacking lab, and it looks like Santosh and I will be there to do some SAFECode hacking during the lab.

No, I see no chance of being there. Good luck to you guys anyway. :)

-- John T.

Best regards.

--Jian Peng

_______________________________________________
svadev mailing list
svadev AT cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/svadev

--
------------------------------------------------
Jian Peng
Key Laboratory of Service Computing Technology and System, Ministry of Education
Key Laboratory of Cluster and Grid Computing, Province of Hubei
Huazhong University of Science and Technology
Wuhan, 430074, China
Email:pengjingjian1016 AT gmail.com
------------------------------------------------

[svadev] Redundent Check Elimination, leckie peng, 09/25/2012
- Re: [svadev] Redundent Check Elimination, John Criswell, 09/25/2012
  - Re: [svadev] Redundent Check Elimination, leckie peng, 09/26/2012
    - Re: [svadev] Redundent Check Elimination, John Criswell, 09/26/2012