From a21e6832dd4bddad0f55bf22f2d75e5eaab43543 Mon Sep 17 00:00:00 2001 From: Remzi Arpaci-Dusseau Date: Thu, 1 Feb 2018 14:54:05 -0600 Subject: [PATCH] Initial cut at shell; missing lots of stuff --- README.md | 6 +- processes-shell/README.md | 364 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 368 insertions(+), 2 deletions(-) create mode 100644 processes-shell/README.md diff --git a/README.md b/README.md index bec9fad..11a6ef3 100644 --- a/README.md +++ b/README.md @@ -18,11 +18,13 @@ Realize the best thing you can do to learn to program in any environment is to program **a lot**. These small projects are only the beginning of that journey; you'll have to do more on your own to truly become proficient. -* [Unix Utilities](https://github.com/remzi-arpacidusseau/ostep-projects/tree/master/initial-utilities) +* [Unix Utilities](https://github.com/remzi-arpacidusseau/ostep-projects/tree/master/initial-utilities) (cat, grep, zip/unzip) +* Sort (text-based) +* Sort (binary) ### Processes and Scheduling -* Shell +* [Shell] ### Virtual Memory diff --git a/processes-shell/README.md b/processes-shell/README.md new file mode 100644 index 0000000..733821c --- /dev/null +++ b/processes-shell/README.md @@ -0,0 +1,364 @@ + +# Unix Shell + +In this project, you'll build a simple Unix shell. The shell is the heart of +the command-line interface, and thus is central to the Unix/C programming +environment. Mastering use of the shell is necessary to become proficient in +this world; knowing how the shell itself is built is the focus of this +project. + +There are three specific objectives to this assignment: + +* To further familiarize yourself with the Linux programming environment. +* To learn how processes are created, destroyed, and managed. +* To gain exposure to the necessary functionality in shells. + +## Overview + +In this assignment, you will implement a *command line interpreter (CLI)* or, +as it is more commonly known, a *shell*. The shell should operate in this +basic way: when you type in a command (in response to its prompt), the shell +creates a child process that executes the command you entered and then prompts +for more user input when it has finished. + +The shells you implement will be similar to, but simpler than, the one you run +every day in Unix. You can find out which shell you are running by typing +**echo $SHELL**] at a prompt. You may then wish to look at the man pages for +the shell you are running (probably bash) to learn more about all of the +functionality that can be present. For this project, you do not need to +implement too much functionality. + +## Program Specifications## + +### Basic Shell: WiSH + +Your basic shell, called **wish**, is basically an interactive loop: it +repeatedly prints a prompt `wish> ` (note the space after the +greater-than sign), parses the input, executes the command specified on that +line of input, and waits for the command to finish. This is repeated until the +user types `exit`. The name of your final executable should be `wish`: + +``` +prompt> ./wish +``` + +You should structure your shell such that it creates a new process for each +new command (note that there are a few exceptions to this, which we discuss +below). There are two advantages of creating a new process. First, it protects +the main shell process from any errors that occur in the new command. Second, +it allows for concurrency; that is, multiple commands can be started and +allowed to execute simultaneously. + +Your basic shell should be able to parse a command, and run the program +corresponding to the command. For example, if the user types `ls -la /tmp`, +your shell should run the program `/bin/ls` with all the given arguments. + +You might be wondering how the shell knows to run `/bin/ls` (which means the +program binary `ls` is found in the directory `/bin`) when you type `ls`. The +shells knows this thanks to a **path** variable that the user sets. The path +variable contains the list of all directories to search, in order, when the +user types a command. We'll learn more about how to deal with the path below. + +**Important:** Note that the shell itself does not *implement* `code ls` or +really many other commands at all. All it does is find those executables in +one of the directories specified by `path` and create a new process to +run them. More on this below. + +## Built-in Commands + +Whenever your shell accepts a command, it should check whether the command is +a **built-in command** or not. If it is, it should not be executed like other +programs. Instead, your shell will invoke your implementation of the built-in +command. For example, to implement the `exit` built-in command, you simply +call `exit(0);` in your C program. + +So far, you have added your own `exit` built-in command. Most Unix shells have +many others such as `cd`, `pwd`, etc. In this project, you should implement +`exit`, `cd`, `pwd`, and `path`. + +The formats for `exit`, `cd`, and `pwd` are: + +``` +[optional-space]exit[optional-space] +[optional-space]pwd[optional-space] +[optional-space]cd[optional-space] +[optional-space]cd[oneOrMoreSpace]dir[optional-space] +``` + +When you run `cd` (without arguments), your shell should change the working +directory to the path stored in the $HOME environment variable. Use the call +`getenv("HOME")` in your `wish` source code to obtain this value. + +You do not have to support tilde (~). Although in a typical Unix shell you +could go to a user's directory by typing `cd ~username`, in this project you +do not have to deal with tilde. You should treat it like a common character, +i.e., you should just pass the whole word (e.g. "~username") to chdir(), and +chdir will return an error. + +Basically, when a user types `pwd`, you simply call getcwd(), and show the +result. When a user changes the current working directory (e.g. \"cd +somepath\"), you simply call chdir(). Hence, if you run your shell, and then +run pwd, it should look like this: + +``` +% cd +% pwd +/afs/cs.wisc.edu/u/m/j/username +% echo $PWD +/u/m/j/username +% ./wish +wish> pwd +/afs/cs.wisc.edu/u/m/j/username +``` + +The format of the `path` built-in command is: +``` + [optionalSpace]path[oneOrMoreSpace]dir[optionalSpace] (and possibly more directories, space separated) +``` + +A typical usage would be like this: + +``` +wish> path /bin /usr/bin +``` + +By doing this, your shell will know to look in `/bin` and `/usr/bin` +when a user types a command, to see if it can find the proper binary to +execute. If the user sets path to be empty, then the shell should not be able +to run any programs unless XXX (but built-in commands, such as path, should +still work). + +## Redirection + +Many times, a shell user prefers to send the output of his/her program to a +file rather than to the screen. Usually, a shell provides this nice feature +with the `>` character. Formally this is named as redirection of standard +output. To make your shell users happy, your shell should also include this +feature, but with a slight twist (explained below). + +For example, if a user types `ls -la /tmp > output`, nothing should be printed +on the screen. Instead, the standard output of the `ls` program should be +rerouted to the `output.out` file. In addition, the standard error output of +the file should be rerouted to the file `output.err` (the twist is that this +is a little different than standard redirection). + +If the `output.out` or `output.err` files already exists before you run your +program, you should simple overwrite them (after truncating). If the output +file is not specified (e.g., the user types `ls >` without a file), you should +print an error message and not run the program `ls`. + +Here are some redirections that should **not** work: +``` +ls > out1 out2 +ls > out1 out2 out3 +ls > out1 > out2 +``` + +Note: don't worry about redirection for built-in commands (e.g., we will +not test what happens when you type `path /bin > file`). + +## Parallel Commands + +Your shell will also allow the user to launch parallel commands. + + +## Program Errors + +**The one and only error message.** You should print this one and only error +message whenever you encounter an error of any type: + +``` + char error_message[30] = "An error has occurred\n"; + write(STDERR_FILENO, error_message, strlen(error_message)); +``` + +The error message should be printed to stderr (standard error). Also, +do not add whitespaces or tabs or extra error messages. + +There is a difference between errors that your shell catches and those that +the program catches. Your shell should catch all the syntax errors specified +in this project page. If the syntax of the command looks perfect, you simply +run the specified program. If there is any program-related errors (e.g., +invalid arguments to `ls` when you run it, for example), let the program +prints its specific error messages in any manner it desires (e.g., could be +stdout or stderr). + +## White Spaces + +The `>` operator will be separated by spaces. Valid input may include the +following: + +``` +wish> ls +wish> ls > a +wish> ls > a +``` + +But not this (it is ok if this works, it just doesn't have to): + +``` +wish> ls>a +``` + + +## Defensive Programming and Error Messages + +Defensive programming is good for you, so do it! It is also required. Your +program should check all parameters, error-codes, etc. before it trusts +them. In general, there should be no circumstances in which your C program +will core dump, hang indefinitely, or prematurely terminate. Therefore, your +program must respond to all input in a reasonable manner; by "reasonable", +we mean print the error message (as specified in the next paragraph) and +either continue processing or exit, depending upon the situation. + +Since your code will be graded with automated testing, you should print this +*one and only error message* whenever you encounter an error of any type: + +``` + char error_message\[30\] = \"An error has occurred\\n\"; + write(STDERR_FILENO, error_message, strlen(error_message)); + +For this project, the error message should be printed to **stderr**. Also, do +not attempt to add whitespaces or tabs or extra error messages. + +You should consider the following situations as errors; in each case, your +shell should print the error message to stderr and exit gracefully: + +* An incorrect number of command line arguments to your shell program. + +For the following situation, you should print the error message to +stderr and continue processing: + +* A command does not exist or cannot be executed. +* A very long command line (over 128 bytes). + +Your shell should also be able to handle the following scenarios below, which +are *not errors.* + +* An empty command line. +* Multiple white spaces on a command line. + +## Hints + +Writing your shell in a simple manner is a matter of finding the relevant +library routines and calling them properly. To simplify things for you in +this assignment, we will suggest a few library routines you may want to use to +make your coding easier. You are free to use these routines if you want or to +disregard our suggestions. To find information on these library routines, look +at the manual pages.] + +### Basic Shell + +**Parsing:** For reading lines of input, once again check out `getline()`. To +open a file and get a handle with type `FILE *`, look into `fopen()`. Be sure +to check the return code of these routines for errors! You may find the +`strtok()` routine useful for parsing the command line (i.e., for extracting +the arguments within a command separated by whitespaces). + +**Executing Commands:** Look into `fork`, `exec`, and `wait/waitpid`. See the +man pages for these functions, and also read [book chapter](http://www.ostep.org/cpu-api.pdf). + +You will note that there are a variety of commands in the `exec` family; for +this project, you must use `execv`. You should **not** use the `system()` +library function call to run a command. Remember that if `execv()` is +successful, it will not return; if it does return, there was an error (e.g., +the command does not exist). The most challenging part is getting the +arguments correctly specified. The first argument specifies the program that +should be executed, with the full path specified; this is +straight-forward. The second argument, `char *argv[]` matches those +that the program sees in its function prototype: + +```c +int main(int argc, char *argv[]); +``` + +Note that this argument is an array of strings, or an array of +pointers to characters. For example, if you invoke a program with: + +``` +foo 205 535 +``` + +Assuming that you find `foo` in directory `/bin` (or elsewhere in the defined +path), then argv[0] = "/bin/foo", argv[1] = "205" and argv[2] = "535". + +Important: the list of arguments must be terminated with a NULL pointer; in +our example, this means argv[3] = NULL. We strongly recommend that you +carefully check that you are constructing this array correctly! + +### Built-in Commands + +For the `exit` built-in command, you should simply call `exit()` from within +your source code. The corresponding shell process will exit, and the parent +(i.e. your shell) will be notified. + +For managing the current working directory, you should use `getenv(), +`chdir()`, and `getcwd()`. The `getenv()` call is useful when you want to go +to your HOME directory. The `getcwd()` call is useful to know the current +working directory, i.e., if a user types `pwd`, you simply call `getcwd()` and +use those results. Finally, `chdir` is useful for moving to different +directories. For more information on these topics, read the man pages or the +Advanced Unix Programming book (Chapters 4 and 7) or look around online. + +### Redirection + +Redirection is relatively easy to implement. For example, to redirect standard +output to a file, just use `close()` on stdout, and then `open()` on a +file. More on this below. + +With a file descriptor, you can perform read and write to a file. Maybe in +your life so far, you have only used `fopen()`, `fread()`, and `fwrite()` for +reading and writing to a file. Unfortunately, these functions work on `FILE +*`, which is more of a C library support; the file descriptors are hidden. + +To work on a file descriptor, you should use `open()`, `read()`, and `write()` +system calls. These functions perform their work by using file descriptors. +To understand more about file I/O and file descriptors you can read the +Advanced Unix Programming book (Chapter 3) (specifically, 3.2 to 3.5, 3.7, +3.8, and 3.12), or just read the man pages. Before reading forward, at this +point, you should become more familiar file descriptors. + +The idea of redirection is to make the stdout descriptor point to your output +file descriptor. First of all, let's understand the STDOUT_FILENO file +descriptor. When a command `ls -la /tmp` runs, the `ls` program prints its +output to the screen. But obviously, the ls program does not know what a +screen is. All it knows is that the screen is basically pointed by the +STDOUT_FILENO file descriptor. In other words, you could rewrite +`printf("hi");` in this way: `write(STDOUT_FILENO, "hi", 2);`. + +To check if a particular file exists in a directory, use the `stat()` system +call. For example, when the user types `ls`, and path is set to include both +`/bin` and `/usr/bin`, try `stat("/bin/ls")`. If that fails, try +`stat("/usr/bin/ls")`. If that fails too, print the **only error message**. + +### Miscellaneous Hints + +Remember to get the **basic functionality** of your shell working before +worrying about all of the error conditions and end cases. For example, first +get a single command running (probably first a command with no arguments, such +as `ls`). Then try adding more arguments. + +Next, try working on multiple commands. Make sure that you are correctly +handling all of the cases where there is miscellaneous white space around +commands or missing commands. Next, add built-in commands. Finally, add +redirection support. + +We strongly recommend that you check the return codes of all system +calls from the very beginning of your work. This will often catch +errors in how you are invoking these new system calls. And, it's just good +programming sense. + +Beat up your own code! You are the best (and in this case, the +only) tester of this code. Throw lots of junk at it and make sure the +shell behaves well. Good code comes through testing -- you must run +all sorts of different tests to make sure things work as +desired. Don't be gentle -- other users certainly won't be. Break it +now so we don't have to break it later. + +Keep versions of your code. More advanced programmers will use a source +control system such as git. Minimally, when you get a piece of functionality +working, make a copy of your .c file (perhaps a subdirectory with a version +number, such as v1, v2, etc.). By keeping older, working versions around, you +can comfortably work on adding new functionality, safe in the knowledge you +can always go back to an older, working version if need be. +