CSE4303 Introduction to Computer Security (Lecture 18)

Because I was not paying full attention during this lecture, this note was generated by AI as a continuation of the previous lecture note. I have kept this warning because the note was created by AI.

Software security

Overview

Outline

  • Context
  • Prominent software vulnerabilities and exploits
  • Buffer overflows
    • Background: C code, compilation, memory layout, execution
    • Baseline exploit
    • Challenges
    • Defenses, countermeasures, counter-countermeasures

Buffer overflows

All programs are stored in memory

  • The process’s view of memory is that it owns all of it.
  • For a 32-bit process, the virtual address space runs from:
    • 0x00000000
    • to 0xffffffff
  • In reality, these are virtual addresses.
    • The OS and CPU map them to physical addresses.

The instructions themselves are in memory

  • Program text is also stored in memory.
  • The slide shows instructions such as:
    0x4c2  sub  $0x224,%esp
    0x4c1  push %ecx
    0x4bf  mov  %esp,%ebp
    0x4be  push %ebp
  • Important point:
    • code and data are both memory-resident
    • control flow therefore depends on values stored in memory

Data’s location depends on how it’s created

  • Static initialized data example
static const int y = 10;
  • Static uninitialized data example
static int x;
  • Command-line arguments and environment are set when the process starts.
  • Stack data appears when functions run.
int f() { int x; ... }
  • Heap data appears at runtime.
malloc(sizeof(long));
  • Summary from the slide
    • Known at compile time
      • text
      • initialized data
      • uninitialized data
    • Set when process starts
      • command line and environment
    • Runtime
      • stack
      • heap

We are going to focus on runtime attacks

  • Stack and heap grow in opposite directions.
  • Compiler-generated instructions adjust the stack size at runtime.
  • The stack pointer tracks the active top of the stack.
  • Repeated push instructions place values onto the stack.
  • The slides use the sequence:
    • push 1
    • push 2
    • push 3
    • return
  • Heap allocation is apportioned by the OS and managed in-process by malloc.
  • The lecture says: focusing on the stack for now.
0x00000000                                                        0xffffffff
Heap --------------------------------->  <--------------------------------- Stack

Stack layout when calling functions

Questions asked on the slide:

  • What do we do when we call a function?
    • What data need to be stored?
    • Where do they go?
  • How do we return from a function?
    • What data need to be restored?
    • Where do they come from?

Example used in the slide:

void func(char *arg1, int arg2, int arg3) {
    char loc1[4];
    int loc2;
    int loc3;
}

Important layout points:

  • Arguments are pushed in reverse order of code.
  • Local variables are pushed in the same order as they appear in the code.
  • The slide then introduces two unknown slots between locals and arguments.

Accessing variables

Example:

void func(char *arg1, int arg2, int arg3) {
    char loc1[4];
    int loc2;
    int loc3;
    ...
    loc2++;
    ...
}

Question from the slide:

  • Where is loc2?

Step-by-step answer developed in the slides:

  • Its absolute address is undecidable at compile time.
  • We do not know exactly where loc2 is in absolute memory.
  • We do not know how many arguments there are in general.
  • But loc2 is always a fixed offset before the frame metadata.
  • This motivates the frame pointer.

Definitions from the slide:

  • Stack frame
    • the current function call’s region on the stack
  • Frame pointer
    • %ebp
  • Example answer
    • loc2 is at -8(%ebp)

Notation

  • %ebp
    • a memory address stored in the frame-pointer register
  • (%ebp)
    • the value at memory address %ebp
    • like dereferencing a pointer

The slide sequence then shows:

pushl %ebp
movl %esp, %ebp
  • Meaning:
    • first save the old frame pointer on the stack
    • then set the new frame pointer to the current stack pointer

Returning from functions

Example caller:

int main() { ... func("Hey", 10, -3); ... }

Questions from the slides:

  • How do we restore %ebp?
  • How do we resume execution at the correct place?

Slide answers:

  • Push %ebp before locals.
  • Set %ebp to current %esp.
  • Set %ebp to (%ebp) at return.
  • Push next %eip before call.
  • Set %eip to 4(%ebp) at return.

Stack and functions: Summary

  • Calling function
    • push arguments onto the stack in reverse order
    • push the return address
      • the address of the instruction that should run after control returns
    • jump to the function’s address
  • Called function
    • push old frame pointer %ebp onto the stack
    • set frame pointer %ebp to current %esp
    • push local variables onto the stack
    • access locals as offsets from %ebp
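  • Returning function
    • reset previous stack frame
      • %ebp = (%ebp)
    • jump back to return address
      • %eip = 4(%ebp), where the offset is taken from the returning function's own frame pointer (its value before the reset above)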

Quick overview (again)

  • Buffer
    • contiguous block of memory holding elements of a given data type
    • common in C
      • all strings are buffers of char
  • Overflow
    • put more into the buffer than it can hold
  • Question
    • where does the extra data go?
  • Slide answer
    • now that we know memory layouts, we can reason about where the overwrite lands

A buffer overflow example

Example 1 from the slide:

void func(char *arg1) {
    char buffer[4];
    strcpy(buffer, arg1);
    ...
}

int main() {
    char *mystr = "AuthMe!";
    func(mystr);
    ...
}

Step-by-step effect shown in the slides:

  • Initial stack region includes:
    • buffer
    • saved %ebp
    • saved %eip
    • &arg1
  • First 4 bytes copied:
    • A u t h
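  • The remaining 4 bytes spill past the buffer into the saved %ebp:
    • M e ! \0
  • Because strcpy keeps copying until it sees \0, bytes go past the end of the buffer.
  • In the example, upon return:
    • %ebp becomes 0x0021654d (the bytes 4d 65 21 00 read as a little-endian word)
  • Result:
    • segmentation fault
    • shown as SEGFAULT (0x00216551) in the slide sequence, which is the corrupted %ebp plus 4, the address where the return address is read from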

A buffer overflow example: changing control data vs. changing program data

Example 2 from the slide:

void func(char *arg1) {
    int authenticated = 0;
    char buffer[4];
    strcpy(buffer, arg1);
    if (authenticated) { ... }
}

int main() {
    char *mystr = "AuthMe!";
    func(mystr);
    ...
}

Step-by-step effect shown in the slides:

  • Initial stack contains:
    • buffer
    • authenticated
    • saved %ebp
    • saved %eip
    • &arg1
  • Overflow writes:
    • A u t h into buffer
    • M e ! \0 into authenticated
  • Result:
    • code still runs
    • user now appears “authenticated”

Important lesson:

  • A buffer overflow does not need to crash.
  • It may silently change program data or logic.

gets vs fgets

Unsafe function shown in the slide:

void vulnerable() { char buf[80]; gets(buf); }

Safer version shown in the slide:

void safe() { char buf[80]; fgets(buf, 64, stdin); }

Even safer pattern from the next slide:

void safer() { char buf[80]; fgets(buf, sizeof(buf), stdin); }

User-supplied strings

  • In the toy examples, the strings are constant.
  • In reality they come from users in many ways:
    • text input
    • packets
    • environment variables
    • file input
  • Validating assumptions about user input is extremely important.

What’s the worst that could happen?

Using:

char buffer[4]; strcpy(buffer, arg1);
  • strcpy will let you write as much as you want until a \0.
  • If attacker-controlled input is long enough, the memory past the buffer becomes “all ours” from the attacker’s perspective.
  • That raises the key question from the slide:
    • what could you write to memory to wreak havoc?

Code injection

  • Title-only transition slide.
  • It introduces the move from accidental overwrite to deliberate attacker payloads.

High-level idea

Example used in the slide:

void func(char *arg1) { char buffer[4]; sprintf(buffer, arg1); ... }

Two-step plan shown in the slides:

    1. Load my own code into memory.
    2. Somehow get %eip to point to it.

The slide sequence draws this as:

  • vulnerable buffer on stack
  • attacker-controlled bytes placed in memory
  • %eip redirected toward those bytes

This is nontrivial

  • Pulling off this attack requires getting a few things really right, and some things only sorta right.
  • The lecture says to think about what is tricky about the attack.
  • Main security idea:
    • the key to defending it is to make the hard parts really hard

Challenge 1: Loading code into memory

  • The attacker payload must be machine-code instructions.
    • already compiled
    • ready to run
  • We have to be careful in how we construct it.
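    • It cannot contain any all-zero (\0) bytes.
      • otherwise string routines such as strcpy and sprintf stop copying at the first \0; input routines such as gets and scanf stop at a newline or whitespace instead, so those bytes must be avoided when such a routine does the copying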
    • It cannot make use of the loader.
      • because we are injecting the bytes directly
    • It cannot use the stack.
      • because we are in the process of smashing it
  • The lecture then gives the name:
    • shellcode

What kind of code would we want to run?

  • Goal: full-purpose shell
    • code to launch a shell is called shellcode
    • it is nontrivial to write shellcode that works as injected code
      • no zeroes
      • cannot use the stack
      • no loader dependence
    • there are many shellcodes already written
    • there are even competitions for writing the smallest shellcode
  • Goal: privilege escalation
    • ideally, attacker goes from guest or non-user to root

Shellcode

High-level C version shown in the slides:

#include <stdio.h>   /* NULL */
#include <unistd.h>  /* execve */

int main() {
    char *name[2];
    name[0] = "/bin/sh";
    name[1] = NULL;
    execve(name[0], name, NULL);
}

Assembly version shown in the slides:

xorl %eax, %eax
pushl %eax
pushl $0x68732f2f
pushl $0x6e69622f
movl %esp, %ebx
pushl %eax
...

Machine-code bytes shown in the slides:

"\x31\xc0"
"\x50"
"\x68""//sh"
"\x68""/bin"
"\x89\xe3"
"\x50"
...

Important point from the slide:

  • those machine-code bytes can become part of the attacker’s input

Challenge 2: Getting our injected code to run

  • We cannot insert a fresh “jump into my code” instruction.
  • We must use whatever code is already running.

Hijacking the saved %eip

  • Strategy:
    • overwrite the saved return address
    • make it point into the injected bytes
  • Core idea:
    • when the function returns, the CPU loads the overwritten return address into %eip

Question raised by the slides:

  • But how do we know the address?

Failure mode shown in the slide sequence:

  • if the guessed address is wrong, the CPU tries to execute data bytes
  • this is most likely not valid code
  • result:
    • invalid instruction
    • CPU “panic” / crash

Challenge 3: Finding the return address

  • If we do not have the code, we may not know how far the buffer is from the saved %ebp.
  • One approach:
    • try many different values
  • Worst case:
    • 2^32 possible addresses on 32-bit
    • 2^64 possible addresses on 64-bit
  • But without address randomization:
    • the stack always starts from the same fixed address
    • the stack grows, but usually not very deeply unless heavily recursive

Improving our chances: nop sleds

  • nop is a single-byte instruction.
  • Definition:
    • it does nothing except move execution to the next instruction
  • NOP sled idea:
    • put a long sequence of nop bytes before the real malicious code
    • now jumping anywhere in that region still works
    • execution slides down into the payload

Why this helps:

  • it increases the chance that an approximate address guess still succeeds
  • the slides explicitly state:
    • now we improve our chances of guessing by a factor of #nops
[padding][saved return address guess][nop nop nop ...][malicious code]

Putting it all together

  • Payload components shown in the slides:
    • padding
    • guessed return address
    • NOP sled
    • malicious code
  • Constraint noted by the lecture:
    • input has to start wherever the vulnerable gets / similar function begins writing

Buffer overflow defense #1: use secure bounds-checking functions

  • User-level protection
  • Replace unbounded routines with bounded ones.
  • Prefer memory-safe languages where possible:
    • Java
    • Rust
    • etc.

Buffer overflow defense #2: Address Space Layout Randomization (ASLR)

  • Randomize starting address of program regions.
  • Goal:
    • prevent attacker from guessing / finding the correct address to put in the return-address slot
  • OS-level protection

Buffer overflow counter-technique: NOP sled

  • Counter-technique against uncertain addresses
  • By jumping somewhere into a wide sled, exact address knowledge becomes less necessary

Buffer overflow defense #3: Canary

  • Put a guard value between vulnerable local data and control-flow data.
  • If overflow changes the canary, the program can detect corruption before returning.
  • OS-level / compiler-assisted protection in the lecture framing

Buffer overflow defense #4: No-execute bits (NX)

  • Mark the stack as not executable.
  • Requires hardware support.
  • OS / hardware-level protection

Buffer overflow counter-technique: ret-to-libc and ROP

  • Code in the C library is already stored at consistent addresses.
  • Attacker can find code in the C library that has the desired effect.
    • possibly heavily fragmented
  • Then return to the necessary address or addresses in the proper order.
  • This is the motivation behind:
    • ret-to-libc
    • Return-Oriented Programming (ROP)

We will continue from defenses / exploitation follow-ups in the next lecture.
