5. Error Handling, Strings, Buffer Overflow

SoftSys Lecture Notes, Wednesday 2/17/21

Announcements

  • The class website has moved to a new URL: https://softsys.olin.edu
    • Please use this link from now on. (It’s been updated on Canvas.)
  • There is a new link for notes on all readings (see Canvas for the link).
    • It’s hugely helpful to us if you’re willing to share your notes.
    • You will be credited if we publish a new book for the course.
  • Homework 2 is due tonight at 11:59 pm EST.

UNIX Fun

Let’s start off with a few fun UNIX tricks.

ls -t lists files sorted by modification time, with the most recently modified first.

If you’ve got a large folder with many files and want to see the most recently changed few:

ls -t | head

macOS uses open to open any file with its default application. Most macOS programs are stored in /Applications.

So if you have a Mac and want to see it spin up every application at once:

open /Applications/*

Don’t do this on other people’s machines without permission - it will probably cause their machine to run very slowly until enough applications are closed.

If you have spaces in filenames, you can put quotes around them:

mv "OK Boomer.docx" ok_boomer.docx

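Double quotes don’t protect every character, though: Bash still interprets some special characters inside them, such as ! (history expansion) and $ (variable expansion). In that case, you can use single quotes, which turn off all special meanings: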

ls '~'

or escape a single character with a backslash (here, to remove a file named '):

rm \'

Sleepsort is a joke sorting algorithm popularized on 4chan.

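#!/usr/bin/env bash

# Sleep for N seconds, then print N: smaller numbers wake up (and print) first.
function f() {
    sleep "$1"
    echo "$1"
}

# Launch one background job per command-line argument.
while [ -n "$1" ]
do
    f "$1" &
    shift
done

# Wait for every background job to finish before exiting.
wait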

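It’s not very efficient: the running time depends on the largest value rather than on the number of inputs, so sorting 1000 takes over 16 minutes. It also breaks down for values that are very close together, such as very small floats, because the differences in sleep time are smaller than the timing jitter of process scheduling.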

Quiz 1 Recap

The average score was 7.3 - pretty good.

Make sure you are precise in your solutions. Many of you discussed whether a string pointer could be modified, but the answer depends on what you mean by “modified.” If you define a variable

char *string_pointer = "String pointer";

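Then string_pointer itself can be modified: you can assign it a new string literal so that it points somewhere else. However, the characters of the string literal "String pointer" cannot be modified; attempting to write to them is undefined behavior and often crashes the program, because literals are typically stored in read-only memory. Make sure you describe distinctions like this clearly.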
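Here is a minimal sketch of the distinction, assuming a C99 compiler. The helper names reassign_pointer and edit_array_copy are hypothetical, made up for this illustration; the commented-out line marks the write that is not allowed.

```c
#include <string.h>

/* Hypothetical helper: reassigning a char * to a new literal is allowed. */
const char *reassign_pointer(void) {
  char *string_pointer = "String pointer";
  string_pointer = "Another literal";  /* OK: the pointer now points elsewhere */
  /* string_pointer[0] = 'a';  <- compiles, but writing into a string
     literal is undefined behavior (often a segfault). */
  return string_pointer;
}

/* Hypothetical helper: an array initialized from a literal gets its own
   writable copy of the characters. out must hold at least 13 bytes. */
void edit_array_copy(char *out) {
  char string_array[] = "String array";  /* 12 chars + '\0' = 13 bytes */
  string_array[0] = 's';                 /* OK: the array's storage is writable */
  strcpy(out, string_array);
}
```

The array version works because char string_array[] = "..." copies the literal’s characters into the array’s own storage, while a char * only stores the literal’s address.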

Try not to rely on the compiler too much. For questions that asked you to describe a code snippet’s output or error, a good number of you wrote that the snippet wouldn’t run outside a main function (which is true of all of them and not the focus of the question), or pasted the compiler error or program output without further explanation. Instead, think about what the code is doing at a higher level, and explain the error in those terms.

Error Handling

One of the answers to Quiz 1 was this function:

/* Returns -1 for an unknown operator, but -1 can also be a real result. */
int operate(int x, int y, char c) {
  switch(c) {
    case '+':
      return x + y;
    case '*':
      return x * y;
    default:
      return -1;
  }
}

The problem is that -1 is both the error code and a perfectly valid result: operate(2, -3, '+') also returns -1, so a caller can’t tell a failed operation from a real answer. This isn’t a great design.

C functions return a single value, so there’s no direct way to return both a result and a success flag as you might in other languages (short of bundling them into a struct). Instead, you can pass the function a pointer to a variable where it should store the result, and use the function’s return value to indicate whether it succeeded:

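/* Stores the result in *p and returns 1 on success.
   Returns 0 for an unknown operator and leaves *p untouched. */
int operate(int x, int y, char c, int *p) {
  switch(c) {
    case '+':
      *p = x + y;
      return 1;
    case '*':
      *p = x * y;
      return 1;
    default:
      return 0;
  }
}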

This is a common pattern used in C, so it’s good to get used to this technique.
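As a sketch of the caller’s side of this pattern, the function below (print_operation is a hypothetical name, not from the quiz) checks the return value before it trusts the result stored through the pointer. It repeats operate() from above so the block compiles on its own.

```c
#include <stdio.h>

/* operate() as defined above: the result goes through p, and the return
   value says whether the operator was recognized. */
int operate(int x, int y, char c, int *p) {
  switch(c) {
    case '+':
      *p = x + y;
      return 1;
    case '*':
      *p = x * y;
      return 1;
    default:
      return 0;
  }
}

/* Hypothetical caller: check the return value before trusting *p. */
int print_operation(int x, int y, char c) {
  int result;
  if (!operate(x, y, c, &result)) {
    fprintf(stderr, "unknown operator '%c'\n", c);
    return 0;
  }
  printf("%d %c %d = %d\n", x, c, y, result);
  return 1;
}
```

With this version, print_operation(2, -3, '+') prints 2 + -3 = -1 and reports success, a case the -1-as-error version could not distinguish from a failure.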

Strings

In Chapter 2.5 of Head First C, you saw how to use string.h for various convenient string functions, and how to store arrays of strings as two-dimensional arrays or as an array of character pointers.

Let’s start off with an exercise to explore a few aspects of C and strings.

Exercise

Write a short test program in C (like Hello World), and make sure you include string.h.

Then run gcc -E on your file to see the output of the preprocessor. The line markers in that output (lines beginning with #) show where the actual string.h file is located on your system.

Open that file in a text editor (remember that Atom is preinstalled on the course VM). What do you notice about the implementation of the various string functions you saw in Chapter 2.5?

At what point of compilation do you think the body of the string functions is being included in your program?

Now go through the functions below and write brief documentation of (1) what each function takes as input, (2) what it does, and (3) what it returns. A general web search or a look through the existing course materials may help. Personally, I prefer cppreference, which is geared towards C++ but also documents the C standard library. Feel free to add your documentation here collaboratively.

  • strcat
    • Args: char *dest, const char *src
    • Appends src to the end of dest: src[0] overwrites the null terminator of dest, and copying continues up to and including the null terminator of src. dest must have room for both strings plus one null terminator.
    • Returns: dest (the same pointer that was passed in, not a copy).
  • strlen
    • Args: const char *str
    • Counts the characters in str before its null terminator.
    • Returns: the length as a size_t (the null terminator is not counted).
  • strcpy
    • Args: char *dest, const char *src
    • Copies the string pointed to by src, including the null terminator, to the buffer pointed to by dest. dest must be large enough to hold it.
    • Returns: dest, a pointer to the destination string.
  • strstr
    • Args: const char *haystack, const char *needle
    • Finds the first occurrence of the substring needle in the string haystack. The terminating null bytes are not compared.
    • Returns: a pointer to the beginning of the located substring, or NULL if the substring is not found.
  • strcmp
    • Args: const char *s1, const char *s2, pointers to the two strings to compare
    • Compares the two strings character by character (lexicographically).
    • Returns: an int that is negative if s1 sorts before s2, zero if they are equal, and positive if s1 sorts after s2.
  • strchr
    • Args: const char *str, int c
    • Finds the first occurrence of the character c (converted to char) in the string str.
    • Returns: a pointer to the located character, or NULL if the character is not found.
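To see several of these functions together, here is a small sketch. join_words and string_demo are hypothetical helpers written for this example, and the 32-byte buffer is an arbitrary size chosen to be comfortably large.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies first into dest, then appends second.
   Assumes dest has room for both strings plus the null terminator. */
char *join_words(char *dest, const char *first, const char *second) {
  strcpy(dest, first);   /* dest = first, including its '\0' */
  strcat(dest, second);  /* second[0] overwrites that '\0'; second's '\0' ends dest */
  return dest;           /* like strcpy and strcat, return dest itself */
}

void string_demo(void) {
  char buf[32];  /* arbitrary size, ample for this example */
  join_words(buf, "soft", "sys");

  printf("%s has length %zu\n", buf, strlen(buf));                   /* softsys has length 7 */
  printf("\"sys\" starts at index %td\n", strstr(buf, "sys") - buf);  /* index 4 */
  printf("strchr(buf, 'x') is %s\n",
         strchr(buf, 'x') ? "non-NULL" : "NULL");                     /* NULL */
  printf("strcmp(\"apple\", \"banana\") is %s\n",
         strcmp("apple", "banana") < 0 ? "negative" : "not negative"); /* negative */
}
```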

Includes

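When you write #include <file.h>, the preprocessor searches a default set of system directories for that file and pastes its contents in place of the directive. (With #include "file.h", it checks the current file’s directory first.)

Usually, you include header files, which contain declarations rather than implementations: function signatures (what type each function returns and what types it takes as parameters), plus types, macros, and constants.

The implementations usually live elsewhere, in other source files or in compiled libraries such as the C standard library, and the linker has to find them.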
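Here is a single-file sketch of the declaration/definition split, using hypothetical functions add and add_three. In a real project the declaration would sit in a header and the definition in a separate .c file or a library.

```c
/* A header like string.h contributes declarations like this one:
   the return type and parameter types, but no body. */
int add(int a, int b);

/* Once the declaration is visible, the compiler accepts calls to add()... */
int add_three(int a, int b, int c) {
  return add(add(a, b), c);
}

/* ...and the definition can live anywhere the linker can find it: later in
   this file (as here), in another .c file, or in a compiled library. */
int add(int a, int b) {
  return a + b;
}
```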

Note on Header Files

  • Headers separate a function’s interface from its implementation. When an implementation changes but its declaration doesn’t, only the source file containing that implementation needs to be recompiled (and the program relinked), not the entire cascading set of files that depend on it.
  • If header files included implementations, every file that includes them would have to be recompiled whenever the implementation of even one function changed.

The Right-to-Left Rule

Quick question: how do you know what type char *lots_of_strings[] is?

One reliable way to figure out the type of a variable is to use the right-to-left rule.

Starting at the variable name, proceed outward to the right and read the symbols to figure out the beginning of the type: the [] tells you that lots_of_strings is an array.

Then, once you’ve reached the right end, return to the variable name and proceed left to figure out the rest of the type: lots_of_strings is an array of pointers to chars.
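The same rule distinguishes the two ways of storing arrays of strings from Chapter 2.5. Here is a sketch; count_strings is a hypothetical helper added for illustration.

```c
#include <stddef.h>

/* Right to left: lots_of_strings is an array ([]) of pointers (*) to char.
   Each element points at a read-only string literal. */
char *lots_of_strings[] = {"one", "two", "three"};

/* Right to left: string_grid is an array of 3 arrays of 6 chars.
   Each row is a writable copy, padded with '\0' ("three" needs all 6). */
char string_grid[3][6] = {"one", "two", "three"};

/* Hypothetical helper: element count, computed from the array's size. */
size_t count_strings(void) {
  return sizeof lots_of_strings / sizeof lots_of_strings[0];
}
```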

The String Library

The string.h library is fairly low-level - the functions are small, simple, and fast, and they have little to no safeguards, like error handling or memory bounds checking.

This causes a few problems in practice. Many common operations aren’t included in string.h, so you often end up writing your own convenience functions, and it’s easy to introduce subtle bugs when you do.
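As an example of such a convenience function, here is a sketch of a bounds-checked append. append_checked is a hypothetical name, not part of string.h; it assumes dest already holds a null-terminated string within dest_size bytes, and it uses the return-value error-handling pattern from earlier. The + 1 for the null terminator is exactly the kind of detail that is easy to get subtly wrong.

```c
#include <string.h>

/* Hypothetical convenience function (not part of string.h): appends src to
   dest only if the result fits in a buffer of dest_size bytes.
   Returns 1 on success, 0 if it would overflow (dest is left unchanged). */
int append_checked(char *dest, size_t dest_size, const char *src) {
  size_t dest_len = strlen(dest);
  size_t src_len = strlen(src);

  /* Need room for both strings plus the null terminator. */
  if (dest_len + src_len + 1 > dest_size) {
    return 0;
  }
  memcpy(dest + dest_len, src, src_len + 1);  /* + 1 copies src's '\0' too */
  return 1;
}
```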

Buffer Overflow

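Misusing string.h functions, or writing buggy string-handling code of your own, can lead to buffer overflow. Essentially, if you try to copy a string into a buffer that is too small for it, functions like strcpy and strcat will just let you do it: they keep writing past the end of the buffer into whatever memory comes next.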
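To make this concrete, here is a sketch contrasting an unchecked copy with a bounded one. Both functions are hypothetical examples, and the 8-byte buffer is an arbitrary size chosen to make the limit easy to hit. greet_unsafe is shown only for comparison and should never receive input longer than 7 characters.

```c
#include <stdio.h>
#include <string.h>

/* Vulnerable (hypothetical example): strcpy has no idea buf holds only 8
   bytes, so a name of 8 or more characters writes past the end of buf into
   neighboring stack memory. Never pass it untrusted input. */
void greet_unsafe(const char *name) {
  char buf[8];
  strcpy(buf, name);  /* no bounds check */
  printf("Hello, %s\n", buf);
}

/* Safer (hypothetical example): snprintf writes at most sizeof buf bytes,
   always null-terminates, and truncates instead of overflowing.
   Returns 1 if name fit, 0 if it was truncated or an error occurred. */
int greet_safe(const char *name) {
  char buf[8];
  int needed = snprintf(buf, sizeof buf, "%s", name);
  printf("Hello, %s\n", buf);
  return needed >= 0 && (size_t)needed < sizeof buf;
}
```

snprintf returns the length the full output would have had, which is how greet_safe detects truncation.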

So what issues does this cause? To get a sense of this, you should have an idea of how the runtime stack works. You might remember this stack from when we talked about the memory layout.

Each time you call a function, the machine creates a stack frame, which is a region of the stack that tracks information about that function call, such as its local variables, the arguments it passes to the functions it calls, and other temporary space. Calling a function pushes a new frame onto the stack, while returning from a function pops a frame off the stack.

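Each stack frame also holds a return address, which tells the machine where to continue once the function has finished executing (usually the point in the calling function right after the call). On common architectures such as x86, the stack grows toward lower addresses, so a function’s local buffers sit below its return address, and writing past the end of a buffer moves toward it. If you copy a very long string into a very small local buffer, you can write enough data to overwrite this return address.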

If you cleverly set the return address, you can get the function to return and jump to anywhere you tell it to. This might be a function in the program that would otherwise be inaccessible, a function from the C standard library, or some assembly code that you wrote yourself. Attackers often use this to spawn a shell, which lets them run whatever commands they like.

Buffer overflow is still a commonly exploited bug. If you find this interesting, consider doing a project on it.

For Next Time

  • Finish Homework 2, due tonight.
  • Read Think OS, Chapter 3 and do the reading quiz on Canvas.
  • Start Homework 2.5, due next Wednesday 2/24.