From 271c1346ebebe116c56a3d9588eab7d8432fd33d Mon Sep 17 00:00:00 2001
From: Remzi Arpaci-Dusseau
Date: Mon, 5 Feb 2018 10:45:56 -0600
Subject: [PATCH] Next rev of shell project

---
 processes-shell/README.md | 377 +++++++++++++--------------------------
 1 file changed, 124 insertions(+), 253 deletions(-)

diff --git a/processes-shell/README.md b/processes-shell/README.md
index c5b8b81..c3135d9 100644
--- a/processes-shell/README.md
+++ b/processes-shell/README.md
@@ -39,130 +39,147 @@ should be `wish`:
 
 ```
 prompt> ./wish
+wish>
+```
+
+At this point, `wish` is running, and ready to accept commands. Type away!
+
+The mode above is called *interactive* mode, and allows the user to type
+commands directly. The shell also supports a *batch mode*, which instead reads
+input from a batch file and executes commands from therein. Here is how you
+run the shell with a batch file named `batch.txt`:
+
+```
+prompt> ./wish batch.txt
 ```
 
 You should structure your shell such that it creates a new process for each
-new command (note that there are a few exceptions to this, which we discuss
-below). Your basic shell should be able to parse a command and run the
-program corresponding to the command. For example, if the user types `ls
--la /tmp`, your shell should run the program `/bin/ls` with the given
-arguments `-la` and `/tmp`.
+new command (the exception are *built-in commands*, discussed below). Your
+basic shell should be able to parse a command and run the program
+corresponding to the command. For example, if the user types `ls -la /tmp`,
+your shell should run the program `/bin/ls` with the given arguments `-la` and
+`/tmp` (how does the shell know to run `/bin/ls`? It's something called the
+shell **path**; more on this below).
 
-You might be wondering how the shell knows to run `/bin/ls` (which means the
-program binary `ls` is found in the directory `/bin`) when you type `ls`. The
-shells knows this thanks to a **path** variable that the user sets. The path
-variable contains the list of all directories to search, in order, when the
-user types a command. We'll learn more about how to deal with the path below.
+## Structure
 
-**Important:** Note that the shell itself does not *implement* `ls` or really
-many other commands at all (it does implement a few, called *built-ins*,
-described further below). All it does is find those executables in one of the
-directories specified by `path` and create a new process to run them. More on
-this below.
 
+### Basic Shell
 
+The shell is very simple (conceptually): it runs in a while loop, repeatedly
+asking for input to tell it what command to execute. It then executes that
+command. The loop continues indefinitely, until the user types the built-in
+command `exit`, at which point it exits. That's it!
 
+For reading lines of input, you should use `getline()`. This allows you to
+obtain arbitrarily long input lines with ease. Generally, the shell will be
+run in *interactive mode*, where the user types a command (one at a time) and
+the shell acts on it. However, your shell will also support *batch mode*, in
+which the shell is given an input file of commands; in this case, the shell
+should not read user input (from `stdin`) but rather from this file to get the
+commands to execute.
 
+To parse the input line into constituent pieces, you might want to use
+`strtok()`. Read the man page (carefully) for more details.
 
+To execute commands, look into `fork()`, `exec()`, and `wait()/waitpid()`.
+See the man pages for these functions, and also read the relevant [book
+chapter](http://www.ostep.org/cpu-api.pdf) for a brief overview.
 
-## Built-in Commands
+You will note that there are a variety of commands in the `exec` family; for
+this project, you must use `execv`. You should **not** use the `system()`
+library function call to run a command. Remember that if `execv()` is
+successful, it will not return; if it does return, there was an error (e.g.,
+the command does not exist). The most challenging part is getting the
+arguments correctly specified.
+
+### Paths
+
+In our example above, the user typed `ls` but the shell knew to execute the
+program `/bin/ls`. How does your shell know this?
+
+It turns out that the user must specify a **path** variable to describe the
+set of directories to search for executables; the set of directories that
+comprise the path are sometimes called the *search path* of the shell. The
+path variable contains the list of all directories to search, in order, when
+the user types a command.
+
+**Important:** Note that the shell itself does not *implement* `ls` or other
+commands (except built-ins). All it does is find those executables in one of
+the directories specified by `path` and create a new process to run them.
+
+To check if a particular file exists in a directory and is executable,
+consider the `access()` system call. For example, when the user types `ls`,
+and path is set to include both `/bin` and `/usr/bin`, try `access("/bin/ls",
+X_OK)`. If that fails, try "/usr/bin/ls". If that fails too, it is an error.
+
+### Built-in Commands
 
 Whenever your shell accepts a command, it should check whether the command is
 a **built-in command** or not. If it is, it should not be executed like other
 programs. Instead, your shell will invoke your implementation of the built-in
 command. For example, to implement the `exit` built-in command, you simply
-call `exit(0);` in your C program.
+call `exit(0);` in your wish source code, which then will exit the shell.
 
-So far, you have added your own `exit` built-in command. Most Unix shells have
-many others such as `cd`, `pwd`, etc. In this project, you should implement
-`exit`, `cd`, `pwd`, and `path`.
+In this project, you should implement `exit`, `cd`, `pwd`, and `path` as
+built-in commands.
 
-The formats for `exit`, `cd`, and `pwd` are:
+* `exit`: When the user types `exit`, your shell should simply call the `exit`
+  system call with 0 as a parameter. It is an error to pass any arguments to
+  `exit`.
 
-```
-[optional-space]exit[optional-space]
-[optional-space]pwd[optional-space]
-[optional-space]cd[optional-space]
-[optional-space]cd[oneOrMoreSpace]dir[optional-space]
-```
+* `cd`: `cd` always take one argument (0 or >1 args should be signaled as an
+error). To change directories, use the `chdir()` system call with the argument
+supplied by the user; if `chdir` fails, that is also an error.
 
-When you run `cd` (without arguments), your shell should change the working
-directory to the path stored in the $HOME environment variable. Use the call
-`getenv("HOME")` in your `wish` source code to obtain this value.
+* `pwd`: When a user types `pwd`, your shell should call getcwd() and show the
+result. It is an error to pass any arguments to `pwd`.
 
-You do not have to support tilde (~). Although in a typical Unix shell you
-could go to a user's directory by typing `cd ~username`, in this project you
-do not have to deal with tilde. You should treat it like a common character,
-i.e., you should just pass the whole word (e.g. "~username") to chdir(), and
-chdir will return an error.
+* `path`: The `path` command takes 0 or more arguments, with each argument
+  separated by whitespace from the others. A typical usage would be like this:
+  `wish> path /bin /usr/bin`, which would add `/bin` and `/usr/bin` to the
+  search path of the shell. If the user sets path to be empty, then the shell
+  should not be able to run any programs (except built-in commands).
 
-Basically, when a user types `pwd`, you simply call getcwd(), and show the
-result. When a user changes the current working directory (e.g. \"cd
-somepath\"), you simply call chdir(). Hence, if you run your shell, and then
-run pwd, it should look like this:
+### Redirection
 
-```
-% cd
-% pwd
-/afs/cs.wisc.edu/u/m/j/username
-% echo $PWD
-/u/m/j/username
-% ./wish
-wish> pwd
-/afs/cs.wisc.edu/u/m/j/username
-```
-
-The format of the `path` built-in command is:
-```
-    [optionalSpace]path[oneOrMoreSpace]dir[optionalSpace] (and possibly more directories, space separated)
-```
-
-A typical usage would be like this:
-
-```
-wish> path /bin /usr/bin
-```
-
-By doing this, your shell will know to look in `/bin` and `/usr/bin`
-when a user types a command, to see if it can find the proper binary to
-execute. If the user sets path to be empty, then the shell should not be able
-to run any programs unless XXX (but built-in commands, such as path, should
-still work).
-
-## Redirection
-
-Many times, a shell user prefers to send the output of his/her program to a
-file rather than to the screen. Usually, a shell provides this nice feature
-with the `>` character. Formally this is named as redirection of standard
+Many times, a shell user prefers to send the output of a program to a file
+rather than to the screen. Usually, a shell provides this nice feature with
+the `>` character. Formally this is named as redirection of standard
 output. To make your shell users happy, your shell should also include this
 feature, but with a slight twist (explained below).
 
 For example, if a user types `ls -la /tmp > output`, nothing should be printed
 on the screen. Instead, the standard output of the `ls` program should be
-rerouted to the `output.out` file. In addition, the standard error output of
-the file should be rerouted to the file `output.err` (the twist is that this
+rerouted to the file `output`. In addition, the standard error output of
+the file should be rerouted to the file `output` (the twist is that this
 is a little different than standard redirection).
 
-If the `output.out` or `output.err` files already exists before you run your
-program, you should simple overwrite them (after truncating). If the output
-file is not specified (e.g., the user types `ls >` without a file), you should
-print an error message and not run the program `ls`.
+If the `output` file exists before you run your program, you should simple
+overwrite them (after truncating it).
 
-Here are some redirections that should **not** work:
-```
-ls > out1 out2
-ls > out1 out2 out3
-ls > out1 > out2
-```
+The exact format of redirection is a command (and possibly some arguments)
+followed by the redirection symbol followed by a filename. Multiple
+redirection operators or multiple files to the right of the redirection sign
+are errors.
 
 Note: don't worry about redirection for built-in commands (e.g., we will
 not test what happens when you type `path /bin > file`).
 
-## Parallel Commands
+### Parallel Commands
 
-Your shell will also allow the user to launch parallel commands.
+Your shell will also allow the user to launch parallel commands. This is
+accomplished with the ampersand operator as follows:
+
+```
+wish> cmd1 & cmd2 args1 args2 & cmd3 args1
+```
+
+In this case, instead of running `cmd1` and then waiting for it to finish,
+your shell should run `cmd1`, `cmd2`, and `cmd3` (each with whatever arguments
+the user has passed to it).
 
-## Program Errors
+### Program Errors
 
 **The one and only error message.** You should print this one and only error
 message whenever you encounter an error of any type:
@@ -172,8 +189,8 @@ message whenever you encounter an error of any type:
     write(STDERR_FILENO, error_message, strlen(error_message));
 ```
 
-The error message should be printed to stderr (standard error). Also,
-do not add whitespaces or tabs or extra error messages.
+The error message should be printed to stderr (standard error), as shown
+above.
 
 There is a difference between errors that your shell catches and those that
 the program catches. Your shell should catch all the syntax errors specified
@@ -183,178 +200,32 @@ invalid arguments to `ls` when you run it, for example), let the program
 prints its specific error messages in any manner it desires (e.g., could be
 stdout or stderr).
 
-## White Spaces
-
-The `>` operator will be separated by spaces. Valid input may include the
-following:
-
-```
-wish> ls
-wish> ls > a
-wish> ls > a
-```
-
-But not this (it is ok if this works, it just doesn't have to):
-
-```
-wish> ls>a
-```
-
-
-## Defensive Programming and Error Messages
-
-Defensive programming is good for you, so do it! It is also required. Your
-program should check all parameters, error-codes, etc. before it trusts
-them. In general, there should be no circumstances in which your C program
-will core dump, hang indefinitely, or prematurely terminate. Therefore, your
-program must respond to all input in a reasonable manner; by "reasonable",
-we mean print the error message (as specified in the next paragraph) and
-either continue processing or exit, depending upon the situation.
-
-Since your code will be graded with automated testing, you should print this
-*one and only error message* whenever you encounter an error of any type:
-
-```
-    char error_message\[30\] = \"An error has occurred\\n\";
-    write(STDERR_FILENO, error_message, strlen(error_message));
-```
-
-For this project, the error message should be printed to **stderr**. Also, do
-not attempt to add whitespaces or tabs or extra error messages.
-
-You should consider the following situations as errors; in each case, your
-shell should print the error message to stderr and exit gracefully:
-
-* An incorrect number of command line arguments to your shell program.
-
-For the following situation, you should print the error message to
-stderr and continue processing:
-
-* A command does not exist or cannot be executed.
-* A very long command line (over 128 bytes).
-
-Your shell should also be able to handle the following scenarios below, which
-are *not errors.*
-
-* An empty command line.
-* Multiple white spaces on a command line.
-
-## Hints
-
-Writing your shell in a simple manner is a matter of finding the relevant
-library routines and calling them properly. To simplify things for you in
-this assignment, we will suggest a few library routines you may want to use to
-make your coding easier. You are free to use these routines if you want or to
-disregard our suggestions. To find information on these library routines, look
-at the manual pages.]
-
-### Basic Shell
-
-**Parsing:** For reading lines of input, once again check out `getline()`. To
-open a file and get a handle with type `FILE *`, look into `fopen()`. Be sure
-to check the return code of these routines for errors! You may find the
-`strtok()` routine useful for parsing the command line (i.e., for extracting
-the arguments within a command separated by whitespaces).
-
-**Executing Commands:** Look into `fork`, `exec`, and `wait/waitpid`. See the
-man pages for these functions, and also read [book chapter](http://www.ostep.org/cpu-api.pdf).
-
-You will note that there are a variety of commands in the `exec` family; for
-this project, you must use `execv`. You should **not** use the `system()`
-library function call to run a command. Remember that if `execv()` is
-successful, it will not return; if it does return, there was an error (e.g.,
-the command does not exist). The most challenging part is getting the
-arguments correctly specified. The first argument specifies the program that
-should be executed, with the full path specified; this is
-straight-forward. The second argument, `char *argv[]` matches those
-that the program sees in its function prototype:
-
-```c
-int main(int argc, char *argv[]);
-```
-
-Note that this argument is an array of strings, or an array of
-pointers to characters. For example, if you invoke a program with:
-
-```
-foo 205 535
-```
-
-Assuming that you find `foo` in directory `/bin` (or elsewhere in the defined
-path), then argv[0] = "/bin/foo", argv[1] = "205" and argv[2] = "535".
-
-Important: the list of arguments must be terminated with a NULL pointer; in
-our example, this means argv[3] = NULL. We strongly recommend that you
-carefully check that you are constructing this array correctly!
-
-### Built-in Commands
-
-For the `exit` built-in command, you should simply call `exit()` from within
-your source code. The corresponding shell process will exit, and the parent
-(i.e. your shell) will be notified.
-
-For managing the current working directory, you should use `getenv(),
-`chdir()`, and `getcwd()`. The `getenv()` call is useful when you want to go
-to your HOME directory. The `getcwd()` call is useful to know the current
-working directory, i.e., if a user types `pwd`, you simply call `getcwd()` and
-use those results. Finally, `chdir` is useful for moving to different
-directories. For more information on these topics, read the man pages or the
-Advanced Unix Programming book (Chapters 4 and 7) or look around online.
-
-### Redirection
-
-Redirection is relatively easy to implement. For example, to redirect standard
-output to a file, just use `close()` on stdout, and then `open()` on a
-file. More on this below.
-
-With a file descriptor, you can perform read and write to a file. Maybe in
-your life so far, you have only used `fopen()`, `fread()`, and `fwrite()` for
-reading and writing to a file. Unfortunately, these functions work on `FILE
-*`, which is more of a C library support; the file descriptors are hidden.
-
-To work on a file descriptor, you should use `open()`, `read()`, and `write()`
-system calls. These functions perform their work by using file descriptors.
-To understand more about file I/O and file descriptors you can read the
-Advanced Unix Programming book (Chapter 3) (specifically, 3.2 to 3.5, 3.7,
-3.8, and 3.12), or just read the man pages. Before reading forward, at this
-point, you should become more familiar file descriptors.
-
-The idea of redirection is to make the stdout descriptor point to your output
-file descriptor. First of all, let's understand the STDOUT_FILENO file
-descriptor. When a command `ls -la /tmp` runs, the `ls` program prints its
-output to the screen. But obviously, the ls program does not know what a
-screen is. All it knows is that the screen is basically pointed by the
-STDOUT_FILENO file descriptor. In other words, you could rewrite
-`printf("hi");` in this way: `write(STDOUT_FILENO, "hi", 2);`.
-
-To check if a particular file exists in a directory, use the `stat()` system
-call. For example, when the user types `ls`, and path is set to include both
-`/bin` and `/usr/bin`, try `stat("/bin/ls")`. If that fails, try
-`stat("/usr/bin/ls")`. If that fails too, print the **only error message**.
-
 ### Miscellaneous Hints
 
 Remember to get the **basic functionality** of your shell working before
 worrying about all of the error conditions and end cases. For example, first
 get a single command running (probably first a command with no arguments, such
-as `ls`). Then try adding more arguments.
+as `ls`).
 
-Next, try working on multiple commands. Make sure that you are correctly
-handling all of the cases where there is miscellaneous white space around
-commands or missing commands. Next, add built-in commands. Finally, add
-redirection support.
+Next, add built-in commands. Then, try working on redirection. Finally, think
+about parallel commands. Each of these requires a little more effort on
+parsing, but each should not be too hard to implement.
 
-We strongly recommend that you check the return codes of all system
-calls from the very beginning of your work. This will often catch
-errors in how you are invoking these new system calls. And, it's just good
-programming sense.
+At some point, you should make sure your code is robust to white space of
+various kinds, including spaces (` `) and tabs (`\t`). In general, the user
+should be able to put variable amounts of white space before and after
+commands, arguments, and various operators; however, the operators
+(redirection and parallel commands) do not require whitespace.
 
-Beat up your own code! You are the best (and in this case, the
-only) tester of this code. Throw lots of junk at it and make sure the
-shell behaves well. Good code comes through testing -- you must run
-all sorts of different tests to make sure things work as
-desired. Don't be gentle -- other users certainly won't be. Break it
-now so we don't have to break it later.
+Check the return codes of all system calls from the very beginning of your
+work. This will often catch errors in how you are invoking these new system
+calls. It's also just good programming sense.
+
+Beat up your own code! You are the best (and in this case, the only) tester of
+this code. Throw lots of junk at it and make sure the shell behaves well. Good
+code comes through testing -- you must run all sorts of different tests to
+make sure things work as desired. Don't be gentle -- other users certainly
+won't be. Break it now so we don't have to break it later.
 
 Keep versions of your code. More advanced programmers will use a source
 control system such as git. Minimally, when you get a piece of functionality