svadev AT lists.siebelschool.illinois.edu
Subject: Svadev mailing list
List archive
- From: John Criswell <criswell AT illinois.edu>
- To: Daniel Huang <dehuang AT fas.harvard.edu>
- Cc: "<svadev AT cs.illinois.edu>" <svadev AT cs.illinois.edu>
- Subject: Re: [svadev] run-time check clarification
- Date: Thu, 31 Jan 2013 12:53:10 -0600
- List-archive: <http://lists.cs.uiuc.edu/pipermail/svadev/>
- List-id: <svadev.cs.uiuc.edu>
- Organization: University of Illinois
|
On 1/31/13 11:05 AM, Daniel Huang
wrote:
Hi Svadev,
I don't understand how to read some of the code
generated by the SAFEcode compiler. The below is an excerpt of
emitted code.
Let's back up a minute. There are two types of run-time checks on memory objects: 1) poolcheck, which really should be called lscheck (i.e., Load/Store Check) 2) boundscheck, which really should be called gepcheck In the current SAFECode/SVA design, a poolcheck() determines whether a pointer is located in an allocated memory object belonging to the specified pool. It is placed before load and store instructions (as well as certain atomic operations) to ensure that a potentially type-unsafe pointer is pointing into a memory object of the correct pool. A boundscheck() determines whether a pointer arithmetic operation stays within the referent memory object. It takes a pool and two pointers: the base pointer of a GEP and the result of a GEP. A boundscheck() will first determine if the base pointer is pointing into an allocated memory object belonging to the pool. If it does, it will then verify that the result pointer is somewhere within the same memory object. In a nutshell, poolchecks verify that loads and stores are safe, and boundschecks ensure that pointers don't go jumping from one memory object to another. Type-safe load/store elimination can remove poolchecks on type-safe pointers as long as automatic pool allocation is used. Type-safe accesses never cause memory errors unless the pointer is a dangling pointer, and the pool allocation ensures that a dangling pointer dereference on a type-safe pointer won't cause a type violation. struct.foo = { i32; float }
PDa: struct.foo
1 %call = call i8* @poolalloc([92 x i8*]* %PDa,
8)
2 %x1 = bitcast i8* %call to i32*
3 %PDa5 = bitcast [92 x i8*]* %PDa to i8*
4 call void @poolcheckui_debug(i8* %PDa5, i8*
%call, ...)
5 store i32 42, i32* %x1
As for this wonderful bit of code, I think some of the confusion you're having is due to LLVM optimizations, and some of it is due to implementation details of SAFECode. If we remove the SAFECode checks, we get the following: 1 %call = call i8* @poolalloc([92 x i8*]* %PDa, 8)
2 %x1 = bitcast i8* %call to i32*
5 store i32 42, i32* %x1
This is probably what the code looked like after optimization but before SAFECode instrumented it. My best guess is that the code originally had a GEP %call, 0, 0 like you would expect but that an optimizer removed it. Therefore, SAFECode sees no GEP; it only sees a store, and it places a poolcheck() on the store (this check could be optimized away, but it doesn't appear to be removed at present). Looking at the SAFECode instrumentation: 3 %PDa5 = bitcast [92 x i8*]* %PDa to i8*
4 call void @poolcheckui_debug(i8* %PDa5, i8*
%call, ...)
The poolcheck() functions are polymorphic in that they will check a pointer of any type. However, we're in C, so polymorphism is handled by casting everything to void *. LLVM uses i8* for void *. That's why the checked pointer is cast to i8* before it is passed to poolcheck(). Additionally, to free the compiler from having to know the exact type of pool handles in the run-time, pool handles are passed as void * as well. That's why %PDa is casted. We could have made the compiler understand the type of a pool handle, but then if we change the pool handle structure in the run-time, we have to update the compiler. In line 1, we call poolalloc for 8 bytes of memory
in pool PDa. Thus, I deduce that pool PDa holds objects of
type struct.foo. In line 2, we cast the result of the
allocation to an i32*. At this point, I'm confused. Why is
this a bitcast instead of a getelementptr 0, 0, the first 0
saying we want the first memory object, and the second saying
the first element of the struct? I agree that per-LLVM
semantics, this is "safe", but wouldn't we do a getelementptr
if I was accessing the second element of the struct instead of
the first? In other words, why is SAFEcode doing some
special-casing if we are accessing the first element of a
struct? Line 4 is also confusing to me. We call poolcheck, but
I would have expected the run-time boundscheck. I understand
poolcheck(pool, ptr) to mean, check that ptr points into pool
at the correct alignment for the type stored in pool (in this
case struct.foo). I understand boundscheck(pool, ptr, ptrO) to
mean, check that ptr points into pool at the correct alignment
for the type stored in pool. Moreover, check that ptrO is at
the correct offset, pointing to the correct type if we offset
the appropriate amount from the type stored in the pool. As an
aside, don't all pool descriptors have type [92 x i8*]*, so
why don't all SAFEcode library functions accept that type
instead of an i8* in an argument position expecting a pool?
I expected the code to look like this (I've just
shown the delta):
2 %x1 = getelementptr %call i64 0, i32 0
2.5 %x1.casted = bitcast i32* %x1 to i8*
4 call void @boundscheckui_debug(i8* %PDa5, i8*
%call, i8* %x1.casted, ....)
Given my understanding of the run-time checks, I
would say that the emitted code will execute "safely". The
only benefit of emitting the top code instead of the bottom
one is that the top may be slightly faster, due to boundscheck
being more expensive than a poolcheck. The bitcast vs.
getelementptr should reduce to the same thing during code
generation.
If the explanation of the top code seems
reasonable, I'll update my formalism of poolcheck to
implicitly handle sub-typing. However, I would argue that the
bitcast in line 2 really should be a getelementptr, since I
expect bitcast mostly to be used because LLVM does not have
parametric polymorphism, not to index into an aggregate data
structure.
Now that I think about it, your formalism will need to have the same definition of "type-safe" as DSA does in order to work on the real implementation. I believe DSA will consider this code type safe because it accesses every offset with the same type (i.e., offset 0 is treated as an int). In case you weren't aware, DSA more or less ignores the allocation type of a memory object. DSA infers types using GEPs (to find offsets) and loads and stores (to infer the types at those offsets). Its type results provide a "flat" layout of a data structure; the type is a set of <offset,type> pairs in which the same offset can appear more than once. In this case, DSA will report that the memory object has one field at offset 0 that is an i32. It won't report any type at offset 4 because the second float field is never used in a load or store. Does this answer your questions, or have I just confused you more? :) -- John T. Any clarification would be appreciated. This is
really important if I want as "complete" a verified
type-checker as possible.
Regards,
Dan
_______________________________________________ svadev mailing list svadev AT cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/svadev |
- [svadev] run-time check clarification, Daniel Huang, 01/31/2013
- Re: [svadev] run-time check clarification, John Criswell, 01/31/2013
- Re: [svadev] run-time check clarification, Daniel Huang, 01/31/2013
- Re: [svadev] run-time check clarification, John Criswell, 01/31/2013
Archive powered by MHonArc 2.6.16.