Re: [svadev] safecode tests

From: geremy condra <debatem1 AT gmail.com>
To: John Criswell <criswell AT illinois.edu>
Cc: svadev AT cs.uiuc.edu
Subject: Re: [svadev] safecode tests
Date: Wed, 26 Oct 2011 14:39:58 -0700
List-archive: <http://lists.cs.uiuc.edu/pipermail/svadev>
List-id: <svadev.cs.uiuc.edu>

On Tue, Oct 25, 2011 at 2:57 PM, John Criswell <criswell AT illinois.edu> wrote:
> On 10/25/11 1:20 PM, geremy condra wrote:
>>
>> On Tue, Oct 25, 2011 at 8:58 AM, John Criswell <criswell AT illinois.edu> wrote:
>>>
>>> On 10/24/11 6:32 PM, Matthew Wala wrote:
>>>>
>>>> It looks like you've configured everything okay. As far as I know, no
>>>> one's been running the tests in the mem_safety directory recently, so
>>>> it's not surprising SAFECode isn't catching everything.
>>>>
>>>> There are at least two reasons I can think of why a lot of the double
>>>> frees are going unnoticed. First, going through the runtime library
>>>> source code, it seems that the function checkForBadFrees is currently
>>>> disabled in the debug runtime (I'm not sure why and I don't know much
>>>> about that part of the code, so something else might be going on).
>>>
>>> That's correct. Looking over the revision history, I found out why I
>>> disabled it. Quoting from the commit log of Revision 136632:
>>>
>>> "Do not report errors for bad frees; we need complete vs. incomplete
>>> versions of pool_unregister() to report errors accurately."
>>>
>>> The short answer is that several enhancements need to be made to get it
>>> to work without generating false positives.
>>>
>>> For the curious, there are two types of checks in SAFECode: incomplete
>>> and complete. When SAFECode transforms a program one compilation unit at
>>> a time within Clang, it does not know everything about the program
>>> (because it can't do whole-program analysis), and so it inserts
>>> incomplete checks. Incomplete checks try their best to detect memory
>>> safety errors, but sometimes they conservatively allow operations to
>>> proceed if they can't find the memory object in question in the lookup
>>> tables. The assumption is that the memory object isn't registered
>>> because it was allocated by external code.
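To make the incomplete/complete distinction concrete, here is a minimal C sketch. It is not SAFECode's run-time code: register_object(), find_object() and check_access() are hypothetical names, and a linear scan over a small array stands in for SAFECode's lookup tables. The two kinds of check differ only in what they do when the pointer's object is not found.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical registry of objects allocated by instrumented code.
 * SAFECode uses splay trees; a fixed array keeps this sketch short. */
#define MAX_OBJECTS 64

typedef struct {
  uintptr_t base;
  size_t size;
} object_t;

static object_t objects[MAX_OBJECTS];
static size_t num_objects = 0;

/* Record an allocation made by instrumented code. */
static bool register_object(const void *base, size_t size) {
  if (num_objects == MAX_OBJECTS)
    return false;
  objects[num_objects].base = (uintptr_t)base;
  objects[num_objects].size = size;
  num_objects++;
  return true;
}

/* Return the registered object containing addr, or NULL if there is none. */
static const object_t *find_object(const void *addr) {
  uintptr_t a = (uintptr_t)addr;
  for (size_t i = 0; i < num_objects; i++)
    if (a >= objects[i].base && a - objects[i].base < objects[i].size)
      return &objects[i];
  return NULL;
}

typedef enum { CHECK_OK, CHECK_ERROR } check_result;

/* Check an access of len bytes at addr.
 * Incomplete (complete == false): an unknown pointer is assumed to come
 * from external code and is allowed.
 * Complete (complete == true): analysis showed the pointer can only point
 * to registered objects, so an unknown pointer is an error. */
static check_result check_access(const void *addr, size_t len, bool complete) {
  const object_t *obj = find_object(addr);
  if (obj == NULL)
    return complete ? CHECK_ERROR : CHECK_OK;
  uintptr_t offset = (uintptr_t)addr - obj->base;
  return (len <= obj->size - offset) ? CHECK_OK : CHECK_ERROR;
}
```

An overflow of a registered object is caught either way; only the "object not found" case depends on whether the check is complete.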
>>>
>>> SAFECode's version of libLTO (which is now available but not yet
>>> documented in the Install Guide) will do whole-program analysis and
>>> convert incomplete checks to complete checks when appropriate. It will
>>> use DSA to figure out which pointers always point to memory objects
>>> allocated within the program and which pointers can point to memory
>>> objects allocated by external library code. Checks on the former
>>> pointers will be changed to complete checks; these checks will raise a
>>> SAFECode run-time error if they can't find the memory object to which
>>> the pointer points.
>>>
>>> So, there are three issues here:
>>>
>>> 1) We need to have complete and incomplete versions of the checks for
>>> invalid frees. SAFECode was flagging false positives on real programs
>>> because checkForBadFrees() always acted like a complete check.
>>>
>>> 2) We need to get libLTO polished and ready. Many of SAFECode's checks
>>> just aren't valuable without it.
>>>
>>> 3) We should modify SAFECode's libLTO to transform the program to use
>>> the automatic pool allocation memory allocator. This allocator is
>>> tolerant of invalid frees (i.e., it will detect them and ignore them).
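A minimal C sketch of points 1) and 3), under stated assumptions: pool_t, pool_alloc(), pool_free() and checked_free() are hypothetical names, not SAFECode or Poolalloc APIs. The pool remembers the blocks it handed out, so it can detect an invalid free and ignore it; whether the bad free is also reported depends on whether the check is complete.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_SLOTS 32

/* Hypothetical pool that remembers every block it has handed out. */
typedef struct {
  void *live[POOL_SLOTS]; /* NULL marks an unused slot */
} pool_t;

/* Allocate a block and remember it; NULL if the pool is full or malloc fails. */
static void *pool_alloc(pool_t *pool, size_t size) {
  for (size_t i = 0; i < POOL_SLOTS; i++) {
    if (pool->live[i] == NULL) {
      pool->live[i] = malloc(size);
      return pool->live[i];
    }
  }
  return NULL;
}

/* Free p if this pool handed it out. Anything else (a double free, or a
 * pointer the pool never allocated) is detected and ignored. */
static bool pool_free(pool_t *pool, void *p) {
  if (p == NULL)
    return true; /* free(NULL) is a no-op */
  for (size_t i = 0; i < POOL_SLOTS; i++) {
    if (pool->live[i] == p) {
      pool->live[i] = NULL;
      free(p);
      return true;
    }
  }
  return false;
}

typedef enum { FREE_OK, FREE_IGNORED, FREE_ERROR } free_result;

/* Incomplete check: an unknown pointer may belong to external code, so the
 * bad free is not reported. Complete check: analysis showed the pointer can
 * only come from this pool, so the bad free is an error. */
static free_result checked_free(pool_t *pool, void *p, bool complete) {
  if (pool_free(pool, p))
    return FREE_OK;
  return complete ? FREE_ERROR : FREE_IGNORED;
}
```

In this sketch an unrecognized pointer is never passed to free(), even by the incomplete check; a real incomplete check would still have to decide what to do with memory that external code allocated.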
>>
>> Very interesting, thanks for the information. Can you elaborate on
>> what other checks should or do need whole-program analysis?
>> Information on how to take advantage of the modifications you've made
>> to libLTO (apologies for my noobness) would also be very helpful.
>
> SAFECode performs three kinds of checks: checks on loads and stores, checks
> on GEPs (pointer arithmetic), and checks on indirect function calls. The
> first two checks require whole-program analysis in order to determine
> whether all memory objects to which the pointer points are allocated within
> and manipulated by the instrumented program. If they are, then SAFECode
> knows that any errors it finds are real errors. Otherwise, an error might
> be due to a pointer that was passed in from external code which SAFECode has
> not analyzed and instrumented.
>
> Indirect function call checks are pretty similar. Without the whole
> program, you can't compute an accurate and complete call graph, so you
> can't know every valid target of an indirect call.
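A minimal C sketch of an indirect call check, assuming a call-graph analysis has already produced the set of functions one call site may call. valid_targets, indirect_call_ok() and checked_call() are hypothetical names; the point is that the check is only as trustworthy as that set, which is why it needs the whole program.

```c
#include <stdbool.h>
#include <stddef.h>

typedef int (*handler_fn)(int);

static int add_one(int x) { return x + 1; }
static int negate(int x) { return -x; }
static int twice(int x) { return 2 * x; }

/* Hypothetical call-graph result for one indirect call site: the only
 * functions it may call. twice() is deliberately left out. */
static const handler_fn valid_targets[] = { add_one, negate };

/* Is fp one of the targets the analysis allowed for this call site? */
static bool indirect_call_ok(handler_fn fp) {
  for (size_t i = 0; i < sizeof valid_targets / sizeof valid_targets[0]; i++)
    if (valid_targets[i] == fp)
      return true;
  return false;
}

/* Make the indirect call only if the target passes the check. */
static bool checked_call(handler_fn fp, int arg, int *result) {
  if (!indirect_call_ok(fp))
    return false;
  *result = fp(arg);
  return true;
}
```

If the program were analyzed one compilation unit at a time, a function defined elsewhere could be a legitimate target and still be missing from the set, so a failed check could not be trusted.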
>
> Whole-program analysis is useful for optimizations, too. With automatic
> pool allocation and DSA's type inference capability, we can remove run-time
> checks on loads and stores (which doesn't break sound analysis; you can read
> about that in the PLDI 2006 paper). When we have the whole program, we can
> change checks that use splay-tree lookups into checks that do no lookup
> (this cannot be done when a check checks a pointer from a global variable
> defined in another compilation unit).
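A minimal C sketch of the "no lookup" optimization, under assumptions: when the analysis knows which object a pointer belongs to (here a global array defined in the same compilation unit), the inserted check can compare against that object's bounds directly instead of searching a splay tree. fast_bounds_check() and store_to_table() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Check that the len bytes starting at addr lie inside [base, base + size).
 * No table lookup: the caller supplies the object's bounds. */
static bool fast_bounds_check(uintptr_t base, size_t size,
                              uintptr_t addr, size_t len) {
  return addr >= base && addr - base <= size && len <= size - (addr - base);
}

/* A global defined in this compilation unit, so its bounds are known
 * when the check is inserted. */
static int table[8];

/* What an instrumented store to table[i] might look like after the
 * optimization. The address is computed as an integer so that an
 * out-of-range i never forms an invalid pointer. */
static bool store_to_table(size_t i, int value) {
  uintptr_t addr = (uintptr_t)table + i * sizeof table[0];
  if (!fast_bounds_check((uintptr_t)table, sizeof table, addr, sizeof table[0]))
    return false;
  table[i] = value;
  return true;
}
```

If table were defined in another compilation unit, its size would not be known here, which matches the limitation mentioned above.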
>
> To make SAFECode usable, we split it up into two components: a conservative
> set of passes in Clang that insert incomplete checks and a set of whole
> program transforms that modify/optimize the checks within libLTO. If you
> don't use libLTO, some of the checks become weaker (load/store checks and
> indirect function call checks do very little), but you can still catch quite
> a few errors (practically any buffer overflow error). Many or all of the
> bugs we caught in the Linux kernel (the SOSP 2007 paper) were caught with
> incomplete GEP checks.
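A minimal C sketch of a GEP (pointer arithmetic) check, assuming the bounds of the source pointer's object have already been looked up; the lookup is omitted, and bounds_t and gep_check_ok() are hypothetical names. Addresses are handled as uintptr_t so the sketch never forms an out-of-bounds pointer itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bounds of a memory object, as a run-time lookup would return them. */
typedef struct {
  uintptr_t base;
  size_t size;
} bounds_t;

/* obj holds the bounds found for the pointer the arithmetic started from,
 * or NULL if the lookup failed; dst is the result of the arithmetic.
 * The result may point one past the end, since C allows forming that
 * pointer; dereferencing it is left to the load/store checks. */
static bool gep_check_ok(const bounds_t *obj, uintptr_t dst, bool complete) {
  if (obj == NULL)
    return !complete; /* unknown object: only an incomplete check lets it pass */
  return dst >= obj->base && dst - obj->base <= obj->size;
}
```

Even with complete == false, arithmetic that walks off a known object is rejected; the incomplete check only gives up when it cannot find the object at all, which is consistent with buffer overflows being caught without libLTO.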
>
> For publications on SAFECode and its techniques, you can take a look at our
> publications page at http://sva.cs.illinois.edu/pubs.html. You may also be
> interested in the Memory Safety Menagerie
> (http://sva.cs.illinois.edu/menagerie/) which catalogs various memory safety
> techniques.
>
> The SAFECode libLTO will probably work on most programs and is nearly ready
> to go. It's just too slow on one of our test cases (OpenSSH ssh client),
> and so I've been a little reluctant to suggest that people use it. You can
> find it in safecode/tools/LTO, and to install it, just follow the directions
> for regular libLTO for Linux (http://llvm.org/docs/GoldPlugin.html) or copy
> it into /usr/lib on Mac OS X (just be sure to back up the old
> /usr/lib/libLTO.dylib!).

OK, I think I'm starting to understand a little better; thanks a ton
for taking the time to write all of this out. I'm going to play around
with libLTO and see if I can falsify that hypothesis before I come
back with too many more questions.

>>
>>>> Also a number of those tests use indirect function calls to
>>>> free(), which I think SAFECode doesn't handle....
>>>
>>> This is a good point. There was a transform pass called
>>> RaiseAllocationsPass that ensured that all calls to malloc() and free()
>>> were direct calls. I don't recall if this was an LLVM pass or a Poolalloc
>>> pass, but we should get it working again with LLVM 3.0.
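A source-level C illustration of why indirect calls to free() can slip past the free checks, and what turning them into direct calls buys. This is not RaiseAllocationsPass itself: instrumented_free() is a hypothetical stand-in for a direct call to free() that SAFECode has instrumented, and the run-time comparison in release_raised() only models what a compile-time transform would resolve statically.

```c
#include <stdlib.h>

typedef void (*dealloc_fn)(void *);

/* Counts how often the (hypothetical) free() check ran. */
static int free_checks_run = 0;

/* Stand-in for an instrumented direct call to free(). */
static void instrumented_free(void *p) {
  free_checks_run++; /* a real check would validate p here */
  free(p);
}

/* Before: free() is reached through a function pointer, so a pass that only
 * instruments direct calls to free() never sees this call. */
static void release_indirect(dealloc_fn fn, void *p) { fn(p); }

/* After, conceptually: once the callee is known to be free(), the call
 * becomes a direct, instrumented call. */
static void release_raised(dealloc_fn fn, void *p) {
  if (fn == free)
    instrumented_free(p);
  else
    fn(p);
}
```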
>>>
>>> Time to go file some bug reports...
>>
>> Is it possible to get links to these? I'd like to try to follow along
>> at home, if it isn't a problem.
>
> Sure! All SAFECode bugs are in the LLVM Bug Database. These particular
> bugs are PR#11230 (http://llvm.org/bugs/show_bug.cgi?id=11230) and
> PR#11231 (http://llvm.org/bugs/show_bug.cgi?id=11231).

Awesome, thank you. I'm a little surprised to see that there are only
17 bugs filed against safecode, but I'll try to get a sense of
development workflow from what I see.

Thanks again,
Geremy Condra

> -- John T.
>
>>
>> Thanks again!
>> Geremy Condra
>
>

[svadev] safecode tests, geremy condra, 10/24/2011
- Re: [svadev] safecode tests, Matthew Wala, 10/24/2011
  - Re: [svadev] safecode tests, John Criswell, 10/25/2011
    - Re: [svadev] safecode tests, geremy condra, 10/25/2011
      - Re: [svadev] safecode tests, John Criswell, 10/25/2011
        - Re: [svadev] safecode tests, geremy condra, 10/26/2011