Backtracking intrusions
=======================
Notes by Nickolai Zeldovich for 6.983

what's the problem this paper is trying to address?
  machine compromises are unavoidable
  admins want to figure out what went wrong and recover

how does backtracker help?
  helps the admin look at dependencies between processes, files on their machine
  prototypical workflow:
    admin finds something isn't right (corrupted file, unexpected process)
      the "detection point"
    uses the tool to figure out where that file/process came from
      must pick some time window, based on a guess + available logs
    looks at the information to guess how the attacker got in
      the "entry point"
    admin figures out what service was vulnerable?
      might need a replay facility to determine something more specific
      e.g. look at network packet logs to find the initial attack packets?

how would the admin detect that something isn't right?
  tripwire points to a modified file (e.g. Athena dialups do this)
  network analysis tools point to a process sending unexpected traffic
  false positives are a big problem

is this useful?  what would you do once you've found the entry point?
  maybe prevent other similar compromises, or audit other machines
  assess the potential damage the attacker could have done?
  does not directly help with recovery
    although perhaps one can track forward and tell what files were affected?

do we need this tool?  what would you do without it?
  look at disk state, system logs, network traffic logs
  disk state might not have enough history to figure out what happened
  system logs might only include initial network requests
  system logs might also be deleted or falsified
  network traffic logs might be encrypted (SSL, SSH)

what does backtracker log?
  objects: processes, files, file names
    why track files and file names separately?
  events: file read/write, rename/create/unlink, process fork/exec/kill/debug
  how does it identify the objects?
    process: pid + version#
    file: device + inode + version#
    file name: pathname
    [why the version#?]

what counts as a dependency?
  process->process: fork, exec, signals, debug, ..
  process->file: write, utime
  process->filename: create, unlink, rename, ..
  file->process: read, stat
  filename->process: open, readdir, ..
  affecting vs controlling
    "high-control" vs "low-control" events

what about missing files?
  attacker might remove some config files that enable protection measures

is it enough to track file names?
  can rename/link/... files into a temporary directory
  then somehow trick another process into renaming that dir
  files in the new dir don't seem to have come from the attacker
  maybe a more accurate model: file names in each dir, named by inode

actually implemented: processes and files
  process fork/exec, file read/write/mmap, network recv (see the sketch below)
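
aside: a rough sketch (mine, not the paper's code) of how this implemented
event set could map to dependency edges between versioned objects; the
backtrack() helper previews the time-pruned backward pass over the log
discussed under "how to make the dependency graph manageable?" below.
Names, the simplified edge rules, and the timestamps are all illustrative
assumptions.

  from dataclasses import dataclass

  @dataclass(frozen=True)
  class Proc:
      pid: int
      version: int        # bumped when the pid is reused

  @dataclass(frozen=True)
  class File:
      device: int
      inode: int
      version: int        # bumped on each write, so later readers don't
                          # pick up dependencies on earlier writers

  # dependency edges point from cause to effect: (source, sink, time)
  edges = []

  def log_event(event, subject, obj, t):
      """Map one logged event to a dependency edge.
      event: 'fork', 'exec', 'read', 'mmap', 'recv', or 'write'
      subject: the Proc making the system call; obj: the other object."""
      if event in ('fork', 'exec'):            # process -> process
          edges.append((subject, obj, t))
      elif event in ('read', 'mmap', 'recv'):  # file/socket -> process
          edges.append((obj, subject, t))
      elif event == 'write':                   # process -> file
          edges.append((subject, obj, t))

  def backtrack(detection_obj, detection_time):
      """Keep only edges that could have affected the detection point:
      scan the log newest-to-oldest; an edge matters only if its sink is
      already flagged and the edge happened before that sink's threshold."""
      flagged = {detection_obj: detection_time}
      keep = []
      for source, sink, t in sorted(edges, key=lambda e: e[2], reverse=True):
          if sink in flagged and t <= flagged[sink]:
              keep.append((source, sink, t))
              # source can matter only via events up to time t
              if t > flagged.get(source, float('-inf')):
                  flagged[source] = t
      return keep

  # example: sshd (pid 311) forks a shell (pid 902), which writes a file
  sshd, sh = Proc(311, 1), Proc(902, 1)
  passwd = File(device=8, inode=1234, version=7)
  log_event('fork', sshd, sh, t=100)
  log_event('write', sh, passwd, t=105)
  print(backtrack(passwd, detection_time=200))   # both edges survive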
how does the VM-based backtracker avoid any guest OS changes?
  things the EventLogger needs to know:
    notification of events [intercept system calls]
    what object is being accessed [track version#s, peek inside fd/inode]
    what process is running [look at kernel memory to find the current PID]

why the limitations with memory-mapped files?
  expensive to intercept every memory access
  hard to figure out when a memory-mapped file has been unmapped entirely
  so assume that a process will keep accessing memory-mapped files until it exits

why a VM?
  convenience: few guest changes
  isolation: compromises hopefully contained within the VM
    handy for a honeypot setup
  security: logging happens in the host
  compatible with ReVirt

what is ReVirt?
  allows going back in time and then replaying execution
  logging mode:
    make a checkpoint
    log all inputs to the virtual machine (interrupts, packets, keyboard, etc.)
  recovery mode:
    go to the last checkpoint
    deliver all inputs at the same instruction again
  challenge: determinism
    time
    rdtsc() etc.
  handy for BackTracker because one can refine what OS events to log for backtracking

how does all this information help the admin track back the intrusion?
  backtracker presents a dependency graph

how to make the dependency graph manageable?
  only include objects, events related to the detection point
    use time to avoid dependencies on events that happened later
  ignore well-known objects that many procs read/write (utmp, mtab, .history)
  filter out low-control events
  hide read-only files
    seems like a special case of a more general principle?
  assume the attacker came in from the network (socket)
    ignore any graph branches that had no socket deps within the time window
  collapse multiple nodes into one
    look for clusters of nodes that only interact with each other
    common case: shell executes commands that don't read/write anything
  several detection points
    not evaluated at all?  sounds potentially promising..

how could an attacker elude backtracker?
  use "low-control" events
  use events that backtracker does not monitor
  go indirect via the network (e.g. after stealing some passwords)
    admin can track back to the second login via the network
    can the admin find the point where the password was stolen?
    backtracker mostly focused on corrupted files, not leaked files
  compromise the OS kernel
    can be hard to prevent once the attacker gets root
  compromise the event logger
    could be relatively hard to do with a VM-based design
  intertwine attack actions with other normal events
    write attack code to /var/log/utmp, then execute it?
    read lots of files modified by others before doing the attack
      hides the true cause among irrelevant/benign dependencies
  prolong the intrusion, since it's hard to track back over long time periods
    wait for a system update that replaces various system binaries?
    wait for /etc/passwd to be legitimately changed?

does backtracker live up to its goals?
  easy to use?
    can't really tell: need to stare at the graph, change filters, make guesses
    some evaluation: table 1
      why does the dependency graph have many fewer objects/events than the log?
    can you tell what the attack was from figures 6, 7, 8?
    they were able to track back real intrusions on their honeypot
  reliable/secure?
    might be OK for simple compromises
    a determined attacker can likely bypass backtracker
  practical?
    lots of things already run in VMs
      could imagine Amazon's EC2 or other VM-hosting services providing this
    specific to a particular OS version
      would need monitoring hooks for every OS kernel version
      or modify the kernel to export the right data (like paravirtualization)
    not clear if kernel compromises are really that hard
    9% CPU overhead, 1.2 GB/day storage

other interesting things you could do in a VM?
  auditing kernel data structure integrity
  replay for intrusion analysis
  Overshadow

do you really need a VM?
  probably enough to have some form of append-only log
  once the kernel is compromised, will not get reliable events
  perhaps send events over the network to a logging server
  could use TPM with late launch to sign events, but the attacker could remove the log
    (but at least the log, if not removed, can be authenticated)
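
aside: a minimal sketch (my illustration, not from the paper) of a
hash-chained append-only event log of the sort hinted at here: each record's
hash covers the previous head, so a remote logging server holding the latest
head can detect tampering with already-shipped records, though a compromised
kernel can still stop logging or drop the unshipped tail.

  import hashlib
  import json

  class AppendOnlyLog:
      def __init__(self):
          self.entries = []              # list of (record_bytes, chain_head)
          self.head = b"\x00" * 32       # head of an empty chain

      def append(self, record):
          data = json.dumps(record, sort_keys=True).encode()
          # new head covers the old head, so past entries can't be rewritten
          self.head = hashlib.sha256(self.head + data).digest()
          self.entries.append((data, self.head))
          return self.head               # e.g. periodically shipped off-host

      @staticmethod
      def verify(entries, expected_head):
          h = b"\x00" * 32
          for data, _ in entries:
              h = hashlib.sha256(h + data).digest()
          return h == expected_head

  # example: the host-side logger appends each observed event
  log = AppendOnlyLog()
  log.append({"t": 1001, "event": "exec", "pid": 4242, "path": "/bin/sh"})
  head = log.append({"t": 1002, "event": "write", "pid": 4242, "inode": 31337})
  assert AppendOnlyLog.verify(log.entries, head)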