CSE4303 Introduction to Computer Security (Lecture 17)

Due to my lack of attention, this lecture note was generated by AI as a continuation of the previous lecture note. I kept this warning because the note was created by AI.

Software security

Administrative notes

Project details

  • Project plan
    • Thursday, 4/9 at the end of class
    • 5%
  • Written document and presentation recording
    • Thursday, 4/30 at 11:30 AM
    • 15%
  • View peer presentations and provide feedback
    • Wednesday, 5/6 at 11:59 PM
    • 5%

Upcoming schedule

  • This week (3/20)
    • software security lecture
    • studio
    • some time for studio on Tuesday
  • Next week (4/6)
    • fuzzing
    • some time to discuss project ideas
  • 4/13
    • Web security
  • 4/20
    • Privacy and ethics overview
    • time to work on projects
    • course wrap-up

Overview

Outline

  • Context
  • Prominent software vulnerabilities and exploits
  • Buffer overflows
    • Background: C code, compilation, memory layout, execution
    • Baseline exploit
    • Challenges
    • Defenses, countermeasures, counter-countermeasures

Sources:

  • SEED lab book
  • Gilbert/Tamassia book
  • Slides from Bryant/O’Hallaron (CMU), Dan Boneh (Stanford), Michael Hicks (UMD)

Context

Context: computing stack (informal)

| Layer                         | Example                                      |
| ----------------------------- | -------------------------------------------- |
| Application                   | web server, standalone app                   |
| Compiler / assembler          | gcc, clang                                   |
| OS: syscalls                  | execve(), setuid(), write(), open(), fork()  |
| OS: processes, memory layout  | Linux virtual memory layout                  |
| Architecture (ISA, execution) | x86, x86_64, ARM                             |
| Hardware                      | Intel Skylake processor                      |

  • User control is strongest near the application / compiler level.
  • System control becomes more important as we move down toward OS, architecture, and hardware.

Prominent software vulnerabilities and exploits

Software security: categories

  • Race conditions
  • Privilege escalation
  • Path traversal
  • Environment variable modification
  • Language-specific vulnerabilities
    • Format string attack
    • Buffer overflows

Buffer Overflows (BoFs)

  • A buffer overflow is a bug that affects low-level code, typically in C and C++, with significant security implications.
  • Normally, a program with this bug will simply crash.
  • But an attacker who controls the input can often steer the bug into something much worse:
    • Steal private information
      • e.g. Heartbleed
    • Corrupt valuable information
    • Run code of the attacker’s choice
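To make the bug concrete, here is a minimal sketch in C. The function names `vulnerable_copy` and `safe_copy` and the 16-byte buffer are hypothetical, chosen only for illustration. `strcpy` copies until it reaches the terminating NUL without checking the destination size, so a long enough input writes past the end of `buf`. That is undefined behavior, and it is exactly the kind of out-of-bounds write an attacker tries to control. The bounded version uses `snprintf`, which never writes more than `size` bytes and always NUL-terminates.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical vulnerable function: strcpy() performs no bounds check,
   so any input of 16 or more bytes writes past the end of buf
   (undefined behavior). Never call this with untrusted input. */
void vulnerable_copy(const char *input) {
    char buf[16];
    strcpy(buf, input);
    printf("%s\n", buf);
}

/* Bounded alternative: snprintf() writes at most size bytes, including
   the terminating NUL. Returns 1 if the input was truncated (or an
   encoding error occurred), 0 if it fit completely. */
int safe_copy(char *buf, size_t size, const char *input) {
    int needed = snprintf(buf, size, "%s", input);
    return needed < 0 || (size_t)needed >= size;
}
```

Truncation is only one policy; a caller could instead use the return value to reject oversized input outright.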

Application behavior

  • Slide contains a figure only.
  • Intended point: normal application behavior can become attacker-controlled if input handling is unsafe.

BoFs: why do we care?

Critical systems in C/C++

  • Most OS kernels and utilities
    • fingerd
    • X windows server
    • shell
  • Many high-performance servers
    • Microsoft IIS
    • Apache httpd
    • nginx
    • Microsoft SQL Server
    • MySQL
    • redis
    • memcached
  • Many embedded systems
    • Mars rover
    • industrial control systems
    • automobiles

A successful attack on these systems can be particularly dangerous.

Morris Worm

  • Slide contains a figure / historical reference only.
  • It is included as an example of how memory-corruption vulnerabilities mattered in practice.

Why do we still care?

  • The slide references the NVD (National Vulnerability Database) vulnerability search page.
  • Why the drop in reported buffer overflows?
    • Memory-safe languages
      • Rust
      • Go
    • Stronger defenses
    • Fuzzing
      • find bugs before release
    • Change in development practices
      • code review
      • static analysis tools
      • related engineering improvements

MITRE Top 25 2025

Buffer overflows

Outline

  • System Basics
    • Application memory layout
    • How does function call work under the hood
      • 32-bit x86 only
      • 64-bit x86_64 similar, but with important differences
  • Buffer overflow
    • Overwriting the return address pointer
    • Point it to injected shellcode

Buffer Overflows (BoFs)

  • A 2-minute version first, then the full version with all the background

Process memory layout: virtual address space

Process memory layout: function calls

Process memory layout: compromised frame

Computer System

High-level examples used in the slide:

C:
car *c = malloc(sizeof(car));
c->miles = 100;
c->gals = 17;
float mpg = get_mpg(c);
free(c);

Java:
Car c = new Car();
c.setMiles(100);
c.setGals(17);
float mpg = c.getMPG();

Assembly-language example used in the slide:

get_mpg:
    pushq %rbp
    movq %rsp, %rbp
    ...
    popq %rbp
    ret

  • The same computation can be viewed at multiple levels:
    • C / Java source
    • assembly language
    • machine code
    • operating system context
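A self-contained version of the slide's C snippet, as a sketch. The slide shows only the field names `miles` and `gals`, so the `car` struct layout (two `int` fields) and the wrapper function `demo_mpg` are assumptions. Compiling it with `gcc -Og -S` lets you inspect the assembly the compiler actually generates for `get_mpg`.

```c
#include <stdlib.h>

/* Assumed definition of the slide's car type: two integer fields. */
typedef struct {
    int miles;
    int gals;
} car;

/* Miles per gallon. The casts make this floating-point division
   rather than integer division. */
float get_mpg(const car *c) {
    return (float)c->miles / (float)c->gals;
}

/* Hypothetical wrapper running the slide's sequence:
   heap-allocate, fill in fields, compute, free. */
float demo_mpg(void) {
    car *c = malloc(sizeof(car));
    if (c == NULL) {
        return -1.0f; /* allocation failed */
    }
    c->miles = 100;
    c->gals = 17;
    float mpg = get_mpg(c);
    free(c);
    return mpg;
}
```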

Little Theme 1: Representation

  • All digital systems represent everything as 0s and 1s.
    • The 0 and 1 are really two different voltage ranges in wires.
    • Or magnetic positions on a disk, hole depths on a DVD, or even DNA.
  • “Everything” includes:
    • numbers
      • integers and floating point
    • characters
      • building blocks of strings
    • instructions
      • directives to the CPU that make up a program
    • pointers
      • addresses of data objects stored in memory
  • These encodings are stored throughout the computer system.
    • registers
    • caches
    • memories
    • disks
  • They all need addresses.
    • find an item
    • find a place for a new item
    • reclaim memory when data is no longer needed
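One way to see that the bits themselves carry no type is to reinterpret a float's bytes as an integer. This sketch assumes a 32-bit IEEE 754 `float` (true on x86 and x86-64); `float_bits` and `char_code` are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "assumes a 32-bit float");

/* Return the raw bit pattern of a float by copying its bytes into a
   uint32_t. memcpy avoids the strict-aliasing problems of a pointer cast. */
uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* A character is also just a small integer: its character code. */
int char_code(char c) {
    return (unsigned char)c;
}
```

For example, `float_bits(1.0f)` is 0x3F800000, the same bit pattern as the integer 1065353216; only the interpretation differs.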

Little Theme 2: Translation

  • There is a big gap between how we think about programs / data and the 0s and 1s of computers.
  • We need languages to describe what we mean.
  • These languages must be translated one level at a time.
  • Example point from the slide:
    • we know Java as a programming language
    • but we must work down to the 0s and 1s of computers
    • we try not to lose anything in translation
    • we encounter Java bytecode, C, assembly, and machine code

Little Theme 3: Control Flow

  • How do computers orchestrate everything they are doing?
  • Within one program:
    • How are if/else, loops, and switches implemented?
    • How do we track nested procedure calls?
    • How do we know what to do upon return?
  • At the operating-system level:
    • library loading
    • sharing system resources
      • memory
      • I/O
      • disks
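Compilers implement loops with the primitives the hardware offers: conditional branches and unconditional jumps. The sketch below (function names hypothetical) writes one loop twice: once as an ordinary `while` loop, and once in a "goto form" that roughly mirrors the branch-and-jump structure a compiler emits. Real compilers often rearrange the test (for example, placing the condition at the bottom of the loop), so treat this as a model rather than exact output.

```c
/* Sum 1..n with an ordinary while loop. */
long sum_while(long n) {
    long total = 0;
    long i = 1;
    while (i <= n) {
        total += i;
        i++;
    }
    return total;
}

/* The same loop rewritten with goto, modeling how a while loop
   lowers to a conditional branch plus an unconditional jump. */
long sum_goto(long n) {
    long total = 0;
    long i = 1;
loop:
    if (!(i <= n)) {
        goto done;  /* conditional branch out of the loop */
    }
    total += i;
    i++;
    goto loop;      /* unconditional jump back to the test */
done:
    return total;
}
```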

HW/SW Interface: Code / Compile / Run Times

  • Code time
    • user program in C
    • .c file
  • Compile time
    • C compiler
    • assembler
  • Run time
    • executable .exe file
    • hardware executes it
  • Note from slide:
    • the compiler and assembler are themselves just programs developed using this same process

Assembly Programmer’s View

  • Programmer-visible CPU / memory state
    • Program counter
      • address of next instruction
      • called RIP in x86-64
    • Named registers
      • heavily used program data
      • together called the register file
    • Condition codes
      • store status information about most recent arithmetic operation
      • used for conditional branching
  • Memory
    • byte-addressable array
    • contains code and user data
    • includes the stack for supporting procedures
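As a rough model of what the condition codes record, the sketch below computes four of the x86 flags (CF, ZF, SF, OF) for a 32-bit addition in plain C. The struct and function names are hypothetical. Real hardware sets these flags as a side effect of the `add` instruction itself, and a following conditional jump (such as `je`, which tests ZF, or `jo`, which tests OF) reads them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of four x86 condition codes after a 32-bit add. */
typedef struct {
    bool cf; /* carry flag: unsigned overflow out of bit 31 */
    bool zf; /* zero flag: result is 0 */
    bool sf; /* sign flag: top bit of the result */
    bool of; /* overflow flag: signed (two's-complement) overflow */
} cond_codes;

cond_codes add32_flags(uint32_t a, uint32_t b, uint32_t *result) {
    uint32_t r = a + b; /* unsigned arithmetic wraps mod 2^32 (well defined) */
    cond_codes f;
    f.cf = r < a;
    f.zf = r == 0;
    f.sf = (r >> 31) & 1u;
    /* Signed overflow: both operands share a sign, and the result's
       sign differs from it. */
    f.of = ((~(a ^ b) & (a ^ r)) >> 31) & 1u;
    *result = r;
    return f;
}
```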

Turning C into Object Code

  • Code in files p1.c and p2.c
  • Compile with:
gcc -Og p1.c p2.c -o p
  • Notes from the slide
    • -Og uses basic optimizations
    • resulting machine code goes into file p
  • Translation chain
    • C program -> assembly program -> object program -> executable program
  • Associated tools
    • compiler
    • assembler
    • linker
    • static libraries (.a)
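A minimal single-file stand-in for the slide's p1.c / p2.c example. The file name `sum.c` and function `sum3` are hypothetical. The comment lists standard gcc and objdump invocations that stop the pipeline at each stage, so each intermediate artifact can be inspected.

```c
/* sum.c (hypothetical example file)
 *
 *   gcc -Og -S sum.c    C -> assembly, written to sum.s
 *   gcc -Og -c sum.c    C -> relocatable object code, written to sum.o
 *   objdump -d sum.o    disassemble the object code
 *
 * The linker then combines sum.o with other object files (one of which
 * must define main) and any needed libraries to produce the executable.
 */
long sum3(long a, long b, long c) {
    return a + b + c;
}
```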

Machine Instruction Example

  • C code
*dest = t;
  • Meaning
    • store value t where designated by dest
  • Assembly
movq %rsi, (%rdx)
  • Interpretation
    • move 8-byte value to memory
    • operands
      • t is in register %rsi
      • dest is in register %rdx
      • *dest means memory M[%rdx]
  • Object code
0x400539: 48 89 32
  • It is a 3-byte instruction stored at address 0x400539.
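The following sketch shows a C function that a compiler could plausibly translate into that exact instruction. The function name `store_t` and its unused first parameter are hypothetical; the parameter order is chosen so that, under the x86-64 System V calling convention used on Linux, `t` arrives in `%rsi` and `dest` in `%rdx`. The exact output depends on the compiler and flags, so check it with `gcc -Og -S` or `objdump -d` rather than trusting the comment.

```c
/* Under the System V x86-64 ABI the first three integer/pointer
   arguments arrive in %rdi, %rsi, %rdx, so t is in %rsi and dest
   is in %rdx. */
void store_t(long unused, long t, long *dest) {
    (void)unused;   /* present only to shift t and dest into %rsi/%rdx */
    *dest = t;      /* expected to compile to: movq %rsi, (%rdx) */
}
```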

IA32 Registers - 32 bits wide

  • General-purpose register families shown in the slide
    • %eax, %ax, %ah, %al
    • %ecx, %cx, %ch, %cl
    • %edx, %dx, %dh, %dl
    • %ebx, %bx, %bh, %bl
    • %esi, %si
    • %edi, %di
    • %esp, %sp
    • %ebp, %bp
  • Roles highlighted in the slide
    • accumulate
    • counter
    • data
    • base
    • source index
    • destination index
    • stack pointer
    • base pointer
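The overlapping names are views of one register. This sketch models them with bit masks for `%eax` (helper names hypothetical): reading `%ax`, `%ah`, or `%al` returns a slice of the same 32 bits. `%ecx`, `%edx`, and `%ebx` follow the same pattern, while `%esi`, `%edi`, `%esp`, and `%ebp` have only the 16-bit forms listed above.

```c
#include <stdint.h>

/* %ax: the low 16 bits of %eax. */
uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }

/* %ah: bits 8..15 of %eax (the high byte of %ax). */
uint8_t reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }

/* %al: the low 8 bits of %eax. */
uint8_t reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }
```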

Data Sizes

  • Slide is primarily a figure summarizing common integer widths and sizes.

Assembly Data Types

  • “Integer” data of 1, 2, 4, or 8 bytes
    • data values
    • addresses / untyped pointers
  • No aggregate types such as arrays or structures at the assembly level
    • just contiguous bytes in memory
  • Two common syntaxes
    • AT&T
      • used in the course, slides, textbook, GNU tools
    • Intel
      • used in Intel documentation and Intel tools
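  • Need to know which syntax you are reading because the operand order is reversed.
    • AT&T: source first, destination last, e.g. movq %rsi, (%rdx)
    • Intel: destination first, source last, e.g. mov QWORD PTR [rdx], rsi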
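AT&T mnemonics encode the operand size in a one-letter suffix, which ties the 1, 2, 4, and 8-byte integer sizes above to instruction names such as `movb`, `movw`, `movl`, and `movq`. A small lookup sketch (function name hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* AT&T size suffixes: b = byte (1), w = word (2), l = long / double
   word (4), q = quad word (8). Returns '?' for any other size. */
char att_suffix(size_t bytes) {
    switch (bytes) {
    case 1: return 'b';
    case 2: return 'w';
    case 4: return 'l';
    case 8: return 'q';
    default: return '?';
    }
}
```

On x86-64, pointers are 8 bytes, which is why address manipulation typically uses `q`-suffixed instructions.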

Three Basic Kinds of Instructions

  • Transfer data between memory and register
    • load
      • %reg = Mem[address]
    • store
      • Mem[address] = %reg
  • Perform arithmetic on register or memory data
    • examples: addition, shifting, bitwise operations
  • Control flow
    • unconditional jumps to / from procedures
    • conditional branches
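Even a short C function exercises all three kinds. In this sketch (function name `add_clamped` hypothetical), the comments label which kind of instruction each line typically becomes. The exact instructions depend on the compiler, which may, for example, replace the branch with a conditional move (`cmov`).

```c
/* Add x to *p, clamping negative results to 0. */
void add_clamped(long *p, long x) {
    long v = *p;    /* load:       %reg = Mem[p]      */
    v = v + x;      /* arithmetic: add on a register  */
    if (v < 0) {    /* control:    conditional branch */
        v = 0;
    }
    *p = v;         /* store:      Mem[p] = %reg      */
}
```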

Abstract Memory Layout

From high addresses to low addresses:
  • Stack: local variables, procedure context
  • Dynamic data: heap (new / malloc)
  • Static data: globals / static variables
  • Literals: large constants such as strings
  • Instructions
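To observe this layout on a real machine, the sketch below records one address from each region; the names `region_addrs`, `sample_addrs`, `global_counter`, and `marker` are hypothetical. On typical Linux and macOS systems the stack address is the highest and heap addresses lie above static data, but the exact values change from run to run because of address space layout randomization (ASLR), and the ordering is a convention of the OS and linker, not something the C language guarantees. Addresses are converted to `uintptr_t` because comparing pointers to unrelated objects with `<` is undefined in C.

```c
#include <stdint.h>
#include <stdlib.h>

int global_counter = 1;         /* static data (typically .data) */

static void marker(void) {}     /* lives in the instructions (.text) */

typedef struct {
    uintptr_t stack, heap, global, literal, code;
} region_addrs;

/* Record one address from each region of the process. */
region_addrs sample_addrs(void) {
    int local = 0;                     /* stack */
    int *heap = malloc(sizeof *heap);  /* heap; NULL if allocation fails */
    const char *lit = "hello";         /* string literal (typically .rodata) */
    region_addrs r;
    r.stack = (uintptr_t)&local;
    r.heap = (uintptr_t)heap;
    r.global = (uintptr_t)&global_counter;
    r.literal = (uintptr_t)lit;
    /* Function pointer to integer: implementation-defined, but
       meaningful on mainstream platforms. */
    r.code = (uintptr_t)&marker;
    free(heap);
    return r;
}
```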

The ELF File Format

  • ELF = Executable and Linkable Format
  • One of the most widely used binary object formats
  • ELF is architecture-independent
  • ELF file types
    • Relocatable
      • must be fixed by the linker before execution
    • Executable
      • ready for execution
    • Shared
      • shared libraries with linking information
    • Core
      • core dumps created when a program terminates with a fault
  • Tools mentioned on slide
    • readelf
    • file
    • objdump -D
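As a small illustration of the format, every ELF file begins with an identification block (`e_ident`) whose first four bytes are the magic number 0x7f 'E' 'L' 'F' and whose fifth byte (`EI_CLASS`) records whether the object is 32-bit or 64-bit. This sketch (function name `elf_bits` hypothetical) checks those bytes in a buffer. On Linux, `<elf.h>` provides the corresponding constants (`ELFMAG`, `EI_CLASS`, `ELFCLASS32`, `ELFCLASS64`), and `readelf -h` or `file` reports the same information for a real binary.

```c
#include <stddef.h>

/* Inspect the start of a file image. Returns 32 or 64 for a 32-bit
   (ELFCLASS32 = 1) or 64-bit (ELFCLASS64 = 2) ELF object, or 0 if the
   bytes do not begin with a recognizable ELF identification. */
int elf_bits(const unsigned char *buf, size_t len) {
    if (len < 5 || buf[0] != 0x7f || buf[1] != 'E' ||
        buf[2] != 'L' || buf[3] != 'F') {
        return 0;
    }
    switch (buf[4]) {  /* e_ident[EI_CLASS] */
    case 1: return 32;
    case 2: return 64;
    default: return 0;
    }
}
```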

Process Memory Layout (32-bit x86 machine)

  • This slide is primarily a diagram.
  • Key idea: a 32-bit x86 process has a standard virtual memory layout with code, static data, heap, and stack arranged in distinct regions.

We continue with the concrete runtime layout and the actual overflow mechanics in Lecture 18.
