XFI
===

Administrivia.
  Lab 2 will be out soon (Thursday or Friday).
  We've updated the list of topics to be closer to reality; papers coming soon.

What's the goal of this paper?
  Allow an application to safely use binary code modules in its address space.
  Safety defined by properties P1..P7 in the paper.
  How does it differ from buffer overflow protection techniques (last lecture)?
    Module code: mostly-trusted vs untrusted.

What are the safety goals that XFI strives to provide?
  Module should not corrupt system state (e.g. stack pointer, eflags, ...).
  Module should not corrupt arbitrary memory.
  Module should not read arbitrary memory (sometimes).
  Module should not invoke arbitrary functions, or system calls.

What should the module be allowed to do under XFI?
  Module should be able to call system support functions
    (perhaps XFI variants of malloc, thread_create, file_open, ...).
  System should be able to invoke the XFI module and get results out.
    Called "entry points" in the paper.
  Module should be able to read, write some memory, use a stack, etc.

Where would we want something like this?
  Third-party plugins in any application, which might not be too trustworthy.
    Device drivers in an OS kernel.
    Media codecs in a media player application.
    Browser plug-ins (Flash, Acrobat, ...).
  Hardening existing applications from attacks.
    As you've seen in lab 1, easy to write code that has bugs.
    Could use XFI to treat the application's own code as a module.
    Ensures one module (regardless of who wrote it) cannot corrupt others.
  Running untrusted programs you might download from the Internet?
    Perhaps not as applicable: no real need to share address spaces.
    Could run the untrusted program as a separate process.
      Page tables provide protection.
    Bigger problem: preventing the process from touching your files, network?
      (A problem for later lectures.)

Why do we need something like XFI to do this?
  Can we use hardware protection mechanisms instead?
    Might be too expensive, in terms of performance overheads.
      Hardware protection (page tables, user-vs-kernel mode) slow to switch.
    Might not be available inside the OS kernel (kernel code not confined).
  Can we use a high-level language?
    If your application is written in Java, don't really need XFI.
    Javascript is an example of where this works (running in the browser).
    Doesn't work for legacy code (vs XFI, which can rewrite binaries!).
    Doesn't work in the kernel (at least if the kernel isn't in a high-level lang).
    Exception: restricted languages actually work well for some purposes.
      For example, BPF (Berkeley packet filter) language, used by tcpdump.
  So, ideal use case: kernel, legacy code, lots of transitions.

How does XFI work at a high level?
  All in software; not going to use any hardware protection (see above).
  XFI performs static analysis on code to ensure that it behaves properly.
    Memory accesses are to its own data.
    Calls are to legit module functions or allowed host system functions.
    No privileged instructions (syscall, change page tables, halt, ...).
  For code where analysis can't guarantee safety, insert a software check.
    What are the situations where this is needed?
      Computed (indirect) jumps, computed memory accesses.

Checking computed jumps: CFI (control flow integrity).
  Why do we need to check computed jumps?
    1. We had a goal -- module should not call arbitrary system code.
       Does it suffice to check that module only jumps within its code?
    2. Need to ensure code doesn't jump into the "middle" of an instruction.
       x86: 25 CD 80 00 00 means 'AND %eax, $0x80cd'.
       If you jump to the 'CD 80', it means 'int $0x80': invoke a Linux syscall.
    3. Need to ensure code doesn't jump past our inserted checks.
  Need to verify that computed jumps (and static jumps -- easy) are "OK":
    Always go to the start of an instruction.
    Never go between a check and the subsequent code.
    Thus, can disassemble code linearly and be sure you saw all the code.
  In fact, XFI is more restrictive: only a few places can be jumped to.
    Compute a call graph of all possible indirect jumps ahead of time.
    For the entire module, figure out what functions it might call indirectly.
      How to do this?  Conservative: any function whose address was taken.
  How does XFI check that this call graph is followed at runtime?
    Pick some constant ID 'C' to represent "computed jumps".
    Put the constant at computed jump targets.
    Check the constant before doing a computed jump.
    Cute trick: encode the constant in a "prefetchnta" (almost-nop) instruction.
      Why?  Simpler checks, no possibility of interpreting it as some other opcode.
    Naive impl:
      before:
        caller:  call *ebx
      after:
        caller:  eax := mem[ebx-4]
                 if eax != C: goto cfierr
                 call *ebx
        ...
                 C            (constant placed just before the target)
        target:  ...
    More precisely, see Figure 2 in the paper (rough C sketch below).
  Pitfalls that XFI has to avoid with this constant ID?
    Can you jump to the check (if eax != C), since that contains C?
      The check cannot contain the constant itself (so, store C-1 and add 1).
    What if C accidentally occurs in an instruction?  Verifier rejects.
    What if the module writes C to its data or stack?  Non-executable memory.
    What if another module (or the host) contains C?  Check target is module code.
    What if the module tries to modify its code?  Code is read-only.
  Does this ensure the module obeys the CFG?
    Not quite, but that doesn't matter for XFI's guarantees.
  Other computed jumps: what happens with a return?
    A separate stack keeps the return address (and other "address-less" variables).
    No possibility of corruption, meaning no need to check the CFI label.
    More on this later.

Checking computed memory access: general form of SFI (software fault isolation).
  Inline checks: Figure 3 in the paper (rough C sketch below).
    Assumes CFI (cannot bypass the check).
  Fastpath vs. slowpath memory: optimization, if we can guess the likely target.
    What's on the slow path?  Checking an array of other allowed regions.

Why does XFI require two stacks?
  Need to ensure return addresses, stack pointers, etc. are not corrupted.
  So why do we need a second stack?
    Applications use computed accesses for stack variables.
    XFI cannot efficiently protect individual elements on a single stack.
      (Would need lots of memory region entries in the slow-path table.)
    Easier to put all the pieces XFI doesn't care about on another stack.
      In particular, any stack object whose address is computed (sketch below).
  On x86, XFI uses the %ebp register for this second stack.

Does XFI prevent buffer overflows?
  No, buffer overflows still exist, but they cannot corrupt the return address.
  Could corrupt a function pointer (heap or allocation stack).
    Could then jump to another legitimate indirect call target.
    Doesn't matter from XFI's perspective: module vs. rest of system.

How does XFI guarantee that the module can't tamper with the scoped stack?
  Verifier checks for no direct memory accesses to the scoped stack.
  Runtime checks prohibit computed accesses to the scoped stack.
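
A minimal C sketch of the CFI label check above, assuming a made-up CFI_LABEL
constant and cfi_error() handler.  The real XFI check is inline machine code,
the label is encoded inside a prefetchnta instruction at the target, and the
comparison is arranged (store C-1, add 1) so the check itself never contains C:

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    #define CFI_LABEL 0x12345678u       /* stand-in for the module-wide constant C */

    typedef int (*module_fn)(int);

    static void cfi_error(void)         /* stand-in for XFI's failure handler */
    {
        abort();
    }

    int checked_indirect_call(module_fn target, int arg)
    {
        uint32_t label;

        /* Read the 4 bytes just before the target; every legitimate
           computed-jump target in the module has the constant embedded there.
           (Casting a function pointer like this is non-portable, but fine
           for a sketch on x86.) */
        memcpy(&label, (const unsigned char *)(uintptr_t)target - 4, sizeof label);

        if (label != CFI_LABEL)
            cfi_error();

        return target(arg);             /* only now do the indirect call */
    }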
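
A minimal C sketch of the memory-range guard (mrguard) of Figure 3, with a
fast path and a slow path.  The fastpath bounds and the allowed[] table stand
in for state that XFI's loader would patch in, and memory_violation() is a
made-up handler; the comparisons use len <= hi - addr to avoid the
integer-overflow pitfall mentioned later in these notes:

    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>

    struct region { uintptr_t lo, hi; };    /* half-open range [lo, hi) */

    static struct region fastpath;          /* the guessed likely target region    */
    static struct region allowed[16];       /* other permitted regions (slow path) */
    static size_t n_allowed;

    static void memory_violation(void)      /* stand-in for XFI's failure handler */
    {
        abort();
    }

    void mrguard(uintptr_t addr, size_t len)
    {
        /* Fast path: a single range check; ideally most accesses end here.
           "len <= hi - addr" avoids computing addr + len, which could overflow. */
        if (addr >= fastpath.lo && addr <= fastpath.hi &&
            len <= fastpath.hi - addr)
            return;

        /* Slow path: scan the table of other allowed regions. */
        for (size_t i = 0; i < n_allowed; i++)
            if (addr >= allowed[i].lo && addr <= allowed[i].hi &&
                len <= allowed[i].hi - addr)
                return;

        memory_violation();                 /* [addr, addr+len) is not allowed */
    }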
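
A hypothetical illustration of the two-stack split.  In XFI-rewritten code the
rewriter, not the programmer, decides which frame slots go where; the rule is
that address-less locals stay on the scoped stack and anything whose address
is computed moves to the allocation stack:

    #include <stdio.h>

    int count_spaces(void)
    {
        int n = 0;        /* address never taken -> stays on the scoped stack,
                             next to the return address and saved registers   */
        char buf[64];     /* address is computed (passed to fgets, indexed)
                             -> placed on the allocation stack                */

        if (fgets(buf, sizeof buf, stdin))
            for (int i = 0; buf[i] != '\0'; i++)
                if (buf[i] == ' ')
                    n++;

        /* An overflow of buf could clobber other allocation-stack data (say,
           a function pointer), but not the return address on the scoped stack. */
        return n;
    }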

How does the verifier work?
  Break up module code into "basic blocks" -- no jumps to or from the middle.
  Set of "verification states" before and after every instruction.
  Verifier checks that, assuming the "before" state holds, the "after" state holds too.
  Example: Figure 4 in the paper.

       { origSSP = SSP+8 }
       { valid[SSP, SSP+8) }
       { retaddr = mem[SSP+4] }
       { origASP = mem[SSP] }
       { valid[ASP-32, ASP) }
    0: mrguard(eax, 0, 8)
       { valid[EAX, EAX+8) }
    1: edx := mem[eax]
    2: mem[eax+4] := edx
    3: eax := mem[asp-4]
       remove { valid[EAX, EAX+8) }
    4: pop asp          ## asp := mem[ssp]; ssp := ssp+4
    5: ret              ## ssp := ssp+4; jmp mem[ssp-4]

  Using the verification states, ensure memory accesses are safe, CFI checks, ...
  How does the verifier trust instruction sequences like mrguard or the CFI check?
    These are hard-coded into the verifier as "known-good".
  How do these verification states get put together?
    States after I1 and before I2 have to match, if I2 follows I1.
    States at a call instruction must match the states at the start of the function.
    States at a computed jump/call must match the states at every target.
    Intuitively: check that these states "fit together" like a jigsaw puzzle.
    What happens if, say, instruction 4 was replaced with 'JMP 1'?

What are the verification states at function call boundaries (call, return)?
    { origSSP = SSP + 0 }
    { origASP = ASP + 0 }
    { retaddr = mem[SSP] }
    { valid[SSP, SSP+4) }
  Guarantees that, if the function returns, retaddr, SSP, and ASP are OK.

What happens if either stack grows too big?
  Could start overlapping other memory (recent X vulnerability!).
  XFI: verification states include validity of the stack, e.g., valid[SSP, SSP+n).
  To grow ASP, it suffices to call mrguard (+ update the verification state).
  To grow SSP, need some special instruction.  See Figure 5.

What happens if a module function call doesn't return?
  XFI doesn't guarantee liveness, as described in the paper.
  Application would need some plan for what to do.
  Could we use verification states to guarantee liveness, at least sometimes?
    What would be the new boundary conditions for module entry/exit?
    What would have to be the verification state for computed jumps?

What happens if a module function crashes (e.g. fails some XFI software check)?
  Presumably invoke some error-handling routine.
  Not clear what the caller is going to do; probably need a custom plan.
    XFI itself doesn't know how to make the module return something sensible.
    Possibly use Windows exceptions?

What runtime components does XFI require?
  Need the slow path for mrguard().
  Loader: invokes the verifier, patches up fast-path mrguard checks,
    patches up calls to allowed host functions (some external policy).
  Need wrappers for entry/exit out of the module (sketch at the end of these notes).
    Entry wrapper: allocate the ASP, add permissions for the arguments.
    Exit wrapper: needed because external code doesn't have a CFI label,
      and cannot be statically checked by the verifier.

What are the key performance penalties that XFI imposes?
  Cache pressure.
  Extra instructions for inline checks (computed memory refs and calls).
    Also, the slow path for the above checks.
    Ideally, most memory accesses hit the fast path.
  Read protection is quite expensive (but, the authors argue, not always needed).

How might you compromise a system that uses XFI?  (I.e., what's trusted?)
  Bugs in the host system runtime: make unexpected calls, unexpected return values.
  Potentially buffer overflows in host system functions exposed to the XFI module.
  Integer overflow in reloc records for the fast-path check (A+L, B-H in Fig. 3)?
    The paper says mrguard is careful about integer overflows.
  Try to find a bug in the verifier.
  Exploit DMA attacks (in a device driver).

Where would XFI not work so well?
  JITted code.
  Lots of accesses to shared data structures.

Other interesting tidbits:
  Module authentication.
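
A rough C sketch of a host-side entry wrapper (referenced from the
runtime-components section above).  The paper describes the wrapper's duties
-- allocate an allocation stack, grant access to the arguments, call the
verified entry point -- but not a concrete API, so xfi_grant_region(), the
entry signature, and the stack size here are all invented:

    #include <stddef.h>
    #include <stdlib.h>

    typedef int (*xfi_entry)(void *asp, void *arg, size_t arg_len);

    /* Stand-ins for the real runtime: add/remove a region in the module's
       slow-path table of allowed memory. */
    static void xfi_grant_region(void *p, size_t len)  { (void)p; (void)len; }
    static void xfi_revoke_region(void *p, size_t len) { (void)p; (void)len; }

    int call_module_entry(xfi_entry entry, void *arg, size_t arg_len)
    {
        size_t asp_size = 64 * 1024;            /* arbitrary allocation-stack size */
        char *asp_base = malloc(asp_size);
        if (asp_base == NULL)
            return -1;

        xfi_grant_region(asp_base, asp_size);   /* module may use its own stack     */
        xfi_grant_region(arg, arg_len);         /* ...and the caller's argument buf */

        int ret = entry(asp_base + asp_size, arg, arg_len);   /* stack grows down */

        xfi_revoke_region(arg, arg_len);
        xfi_revoke_region(asp_base, asp_size);
        free(asp_base);
        return ret;
    }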