Next rev of shell project

2018-02-05 10:45:56 -06:00
parent ec810a344b
commit 271c1346eb
1 changed files with 124 additions and 253 deletions
--- a/processes-shell/README.md
+++ b/processes-shell/README.md
@@ -39,130 +39,147 @@ should be `wish`:

 ```
 prompt> ./wish
+wish> 
+```
+
+At this point, `wish` is running, and ready to accept commands. Type away!
+
+The mode above is called *interactive mode*, and allows the user to type
+commands directly. The shell also supports a *batch mode*, which instead reads
+input from a batch file and executes the commands in it. Here is how you
+run the shell with a batch file named `batch.txt`:
+
+```
+prompt> ./wish batch.txt
 ```

 You should structure your shell such that it creates a new process for each
-new command (note that there are a few exceptions to this, which we discuss
-below).  Your basic shell should be able to parse a command and run the
-program corresponding to the command.  For example, if the user types `ls
-la /tmp`, your shell should run the program `/bin/ls` with the given
-arguments `-la` and `/tmp`.
+new command (the exceptions are *built-in commands*, discussed below).  Your
+basic shell should be able to parse a command and run the program
+corresponding to the command.  For example, if the user types `ls -la /tmp`,
+your shell should run the program `/bin/ls` with the given arguments `-la` and
+`/tmp` (how does the shell know to run `/bin/ls`? It's something called the
+shell **path**; more on this below).

-You might be wondering how the shell knows to run `/bin/ls` (which means the
-program binary `ls` is found in the directory `/bin`) when you type `ls`. The
-shells knows this thanks to a **path** variable that the user sets. The path
-variable contains the list of all directories to search, in order, when the
-user types a command. We'll learn more about how to deal with the path below.
+## Structure

-**Important:** Note that the shell itself does not *implement* `ls` or really
-many other commands at all (it does implement a few, called *built-ins*,
-described further below). All it does is find those executables in one of the
-directories specified by `path` and create a new process to run them. More on
-this below.
+### Basic Shell

+The shell is very simple (conceptually): it runs in a while loop, repeatedly
+asking for input to tell it what command to execute. It then executes that
+command. The loop continues indefinitely, until the user types the built-in
+command `exit`, at which point it exits. That's it!

+For reading lines of input, you should use `getline()`. This allows you to
+obtain arbitrarily long input lines with ease. Generally, the shell will be
+run in *interactive mode*, where the user types a command (one at a time) and
+the shell acts on it. However, your shell will also support *batch mode*, in
+which the shell is given an input file of commands; in this case, the shell
+should not read user input (from `stdin`) but rather from this file to get the
+commands to execute.

+To parse the input line into constituent pieces, you might want to use
+`strtok()`. Read the man page (carefully) for more details.
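
As a rough sketch of the tokenizing step, assuming tokens are separated only by spaces, tabs, and newlines, something like the following could work. The name `split_line` and the `max_args` limit are illustrative and not part of the project specification. Note that `strtok()` modifies the buffer it is given and treats a run of delimiters as a single separator.

```c
#include <string.h>

/* Hypothetical helper: split `line` in place on spaces, tabs, and newlines.
 * Stores up to max_args - 1 token pointers in argv, followed by a NULL.
 * Returns the number of tokens stored. */
int split_line(char *line, char *argv[], int max_args)
{
    int argc = 0;
    char *tok = strtok(line, " \t\n");
    while (tok != NULL && argc < max_args - 1) {
        argv[argc++] = tok;
        tok = strtok(NULL, " \t\n");
    }
    argv[argc] = NULL;  /* execv() expects a NULL-terminated array */
    return argc;
}
```

This sketch does not split operators such as `>` or `&` that are written without surrounding whitespace; that case needs additional parsing.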

+To execute commands, look into `fork()`, `exec()`, and `wait()/waitpid()`.
+See the man pages for these functions, and also read the relevant [book
+chapter](http://www.ostep.org/cpu-api.pdf) for a brief overview.

-## Built-in Commands
+You will note that there are a variety of commands in the `exec` family; for
+this project, you must use `execv`. You should **not** use the `system()`
+library function call to run a command.  Remember that if `execv()` is
+successful, it will not return; if it does return, there was an error (e.g.,
+the command does not exist). The most challenging part is getting the
+arguments correctly specified. 
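
One possible shape for the fork/exec/wait sequence is sketched below. `run_and_wait` is a hypothetical helper name, and the exit code 127 used when `execv()` fails is only a convention borrowed from common shells, not a requirement of this project. The argument array passed to `execv()` must end with a `NULL` pointer.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program at `path` with a NULL-terminated
 * argument array and wait for it to finish. Returns the child's exit
 * status, or -1 if fork()/waitpid() failed or the child did not exit
 * normally. */
int run_and_wait(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(path, argv);
        _exit(127);   /* reached only if execv() failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For `ls -la /tmp`, the array would hold `"ls"`, `"-la"`, `"/tmp"`, and then `NULL`, with `path` set to the full path of the program.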
+
+### Paths
+
+In our example above, the user typed `ls` but the shell knew to execute the
+program `/bin/ls`. How does your shell know this?
+
+It turns out that the user must specify a **path** variable to describe the
+set of directories to search for executables; the set of directories that
+comprise the path is sometimes called the *search path* of the shell. The
+path variable contains the list of all directories to search, in order, when
+the user types a command. 
+
+**Important:** Note that the shell itself does not *implement* `ls` or other
+commands (except built-ins). All it does is find those executables in one of
+the directories specified by `path` and create a new process to run them.
+
+To check if a particular file exists in a directory and is executable,
+consider the `access()` system call. For example, when the user types `ls`,
+and path is set to include both `/bin` and `/usr/bin`, try `access("/bin/ls",
+X_OK)`. If that fails, try `/usr/bin/ls`. If that fails too, it is an error.
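
The lookup described above might be sketched as follows, assuming the search path is kept as an array of directory strings. The helper name `find_executable` and its parameters are hypothetical.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: look for `cmd` in each directory of `dirs`, in order.
 * On success, writes the full path into `out` and returns 0; returns -1 if
 * no directory contains an executable file with that name. */
int find_executable(const char *cmd, char *const dirs[], int ndirs,
                    char *out, size_t outlen)
{
    for (int i = 0; i < ndirs; i++) {
        int n = snprintf(out, outlen, "%s/%s", dirs[i], cmd);
        if (n < 0 || (size_t)n >= outlen)
            continue;               /* path would not fit; skip it */
        if (access(out, X_OK) == 0)
            return 0;               /* found an executable candidate */
    }
    return -1;
}
```

With an empty path (`ndirs == 0`) the loop never runs, so no program can be found, which matches the behavior expected when the user clears the path.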
+
+### Built-in Commands

 Whenever your shell accepts a command, it should check whether the command is
 a **built-in command** or not. If it is, it should not be executed like other
 programs. Instead, your shell will invoke your implementation of the built-in
 command. For example, to implement the `exit` built-in command, you simply
-call `exit(0);` in your C program.
+call `exit(0);` in your wish source code, which then will exit the shell.

-So far, you have added your own `exit` built-in command. Most Unix shells have
-many others such as `cd`, `pwd`, etc.  In this project, you should implement
-`exit`, `cd`, `pwd`, and `path`.
+In this project, you should implement `exit`, `cd`, `pwd`, and `path` as
+built-in commands.

-The formats for `exit`, `cd`, and `pwd` are:
+* `exit`: When the user types `exit`, your shell should simply call the `exit`
+  system call with 0 as a parameter. It is an error to pass any arguments to
+  `exit`. 

-```
-[optional-space]exit[optional-space]
-[optional-space]pwd[optional-space]
-[optional-space]cd[optional-space]
-[optional-space]cd[oneOrMoreSpace]dir[optional-space]
-```
+* `cd`: `cd` always takes one argument (0 or >1 args should be signaled as an
+  error). To change directories, use the `chdir()` system call with the
+  argument supplied by the user; if `chdir()` fails, that is also an error.

-When you run `cd` (without arguments), your shell should change the working
-directory to the path stored in the $HOME environment variable. Use the call
-`getenv("HOME")` in your `wish` source code to obtain this value.
+* `pwd`: When a user types `pwd`, your shell should call `getcwd()` and show
+  the result. It is an error to pass any arguments to `pwd`.

-You do not have to support tilde (~). Although in a typical Unix shell you
-could go to a user's directory by typing `cd ~username`, in this project you
-do not have to deal with tilde. You should treat it like a common character,
-i.e., you should just pass the whole word (e.g. "~username") to chdir(), and
-chdir will return an error.
+* `path`: The `path` command takes 0 or more arguments, with each argument
+  separated by whitespace from the others. A typical usage would be like this:
+  `wish> path /bin /usr/bin`, which would add `/bin` and `/usr/bin` to the
+  search path of the shell. If the user sets path to be empty, then the shell
+  should not be able to run any programs (except built-in commands).
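
As an illustration of how small a built-in can be, here is a minimal sketch of `cd`, assuming the command line has already been split into an argument count and array. The name `builtin_cd` is hypothetical; it returns -1 so that the caller can print the error message.

```c
#include <unistd.h>

/* Hypothetical helper: implement the `cd` built-in.
 * argv[0] is "cd"; exactly one further argument is required.
 * Returns 0 on success and -1 on any error. */
int builtin_cd(int argc, char *argv[])
{
    if (argc != 2)
        return -1;                  /* 0 or more than 1 argument */
    return chdir(argv[1]) == 0 ? 0 : -1;
}
```

Because `chdir()` changes the working directory of the calling process, this has to run in the shell itself rather than in a forked child.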

-Basically, when a user types `pwd`, you simply call getcwd(), and show the
-result. When a user changes the current working directory (e.g. \"cd
-somepath\"), you simply call chdir(). Hence, if you run your shell, and then
-run pwd, it should look like this:
+### Redirection

-```
-% cd
-% pwd
-/afs/cs.wisc.edu/u/m/j/username
-% echo $PWD
-/u/m/j/username
-% ./wish
-wish> pwd
-/afs/cs.wisc.edu/u/m/j/username
-```
-
-The format of the `path` built-in command is:
-```
-    [optionalSpace]path[oneOrMoreSpace]dir[optionalSpace] (and possibly more directories, space separated)
-```
-
-A typical usage would be like this:
-
-```
-wish> path /bin /usr/bin
-```
-
-By doing this, your shell will know to look in `/bin` and `/usr/bin`
-when a user types a command, to see if it can find the proper binary to
-execute. If the user sets path to be empty, then the shell should not be able
-to run any programs unless XXX (but built-in commands, such as path, should
-still work).
-
-## Redirection
-
-Many times, a shell user prefers to send the output of his/her program to a
-file rather than to the screen. Usually, a shell provides this nice feature
-with the `>` character. Formally this is named as redirection of standard
+Many times, a shell user prefers to send the output of a program to a file
+rather than to the screen. Usually, a shell provides this nice feature with
+the `>` character. Formally, this is called redirection of standard
 output. To make your shell users happy, your shell should also include this
 feature, but with a slight twist (explained below).

 For example, if a user types `ls -la /tmp > output`, nothing should be printed
 on the screen. Instead, the standard output of the `ls` program should be
-rerouted to the `output.out` file. In addition, the standard error output of
-the file should be rerouted to the file `output.err` (the twist is that this
+rerouted to the file `output`. In addition, the standard error output of
+the program should be rerouted to the file `output` (the twist is that this
 is a little different than standard redirection).

-If the `output.out` or `output.err` files already exists before you run your
-program, you should simple overwrite them (after truncating).  If the output
-file is not specified (e.g., the user types `ls >` without a file), you should
-print an error message and not run the program `ls`.
+If the `output` file exists before you run your program, you should simply
+overwrite it (after truncating it).

-Here are some redirections that should **not** work:
-```
-ls > out1 out2
-ls > out1 out2 out3
-ls > out1 > out2
-```
+The exact format of redirection is a command (and possibly some arguments)
+followed by the redirection symbol followed by a filename. Multiple
+redirection operators or multiple files to the right of the redirection sign
+are errors.
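
One common way to implement this is with `open()` and `dup2()` in the child process, between `fork()` and `execv()`. The sketch below is only an assumption about how you might structure it: `redirect_output` is a hypothetical helper, and the `0644` permission mode is an arbitrary choice.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper, meant to be called in the child after fork() and
 * before execv(): send both stdout and stderr to `filename`, creating the
 * file if needed and truncating it if it already exists.
 * Returns 0 on success and -1 on failure. */
int redirect_output(const char *filename)
{
    int fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (dup2(fd, STDOUT_FILENO) < 0 || dup2(fd, STDERR_FILENO) < 0) {
        close(fd);
        return -1;
    }
    if (fd > STDERR_FILENO)
        close(fd);   /* the duplicates keep the file open */
    return 0;
}
```

Since both descriptors refer to the same open file, output written to stdout and stderr ends up interleaved in one file rather than overwriting each other.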

 Note: don't worry about redirection for built-in commands (e.g., we will
 not test what happens when you type `path /bin > file`).

-## Parallel Commands
+### Parallel Commands

-Your shell will also allow the user to launch parallel commands. 
+Your shell will also allow the user to launch parallel commands. This is
+accomplished with the ampersand operator as follows:
+
+```
+wish> cmd1 & cmd2 args1 args2 & cmd3 args1
+```
+
+In this case, instead of running `cmd1` and then waiting for it to finish,
+your shell should run `cmd1`, `cmd2`, and `cmd3` (each with whatever arguments
+the user has passed to it).
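
A minimal sketch of this idea is to start every child first and only then wait. It assumes (this is not stated above) that the shell waits for all of the children before reading the next command. The helper name `run_parallel` and its parameters are hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start every command before waiting for any of them.
 * paths[i] is the program for command i and argvs[i] is its NULL-terminated
 * argument array. Returns the number of children that were started. */
int run_parallel(char *paths[], char **argvs[], int n)
{
    int started = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            execv(paths[i], argvs[i]);
            _exit(127);          /* reached only if execv() failed */
        }
        if (pid > 0)
            started++;
    }
    for (int i = 0; i < started; i++)
        wait(NULL);              /* reap children in whatever order they finish */
    return started;
}
```

The key difference from the sequential case is that no `wait()` call happens inside the first loop.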


-## Program Errors
+### Program Errors

 **The one and only error message.** You should print this one and only error
 message whenever you encounter an error of any type:
@@ -172,8 +189,8 @@ message whenever you encounter an error of any type:
    write(STDERR_FILENO, error_message, strlen(error_message)); 
 ```

-The error message should be printed to stderr (standard error). Also, 
-do not add whitespaces or tabs or extra error messages.
+The error message should be printed to stderr (standard error), as shown
+above. 

 There is a difference between errors that your shell catches and those that
 the program catches. Your shell should catch all the syntax errors specified
@@ -183,178 +200,32 @@ invalid arguments to `ls` when you run it, for example), let the program
 prints its specific error messages in any manner it desires (e.g., could be
 stdout or stderr).

-## White Spaces
-
-The `>` operator will be separated by spaces.  Valid input may include the
-following:
-
-```
-wish> ls
-wish> ls > a
-wish>    ls    > a
-```
-
-But not this (it is ok if this works, it just doesn't have to):
-
-```
-wish> ls>a
-```
-
-
-## Defensive Programming and Error Messages
-
-Defensive programming is good for you, so do it! It is also required. Your
-program should check all parameters, error-codes, etc. before it trusts
-them. In general, there should be no circumstances in which your C program
-will core dump, hang indefinitely, or prematurely terminate. Therefore, your
-program must respond to all input in a reasonable manner; by "reasonable",
-we mean print the error message (as specified in the next paragraph) and
-either continue processing or exit, depending upon the situation. 
-
-Since your code will be graded with automated testing, you should print this
-*one and only error message* whenever you encounter an error of any type:
-
-```
-    char error_message\[30\] = \"An error has occurred\\n\";
-    write(STDERR_FILENO, error_message, strlen(error_message)); 
-```
-
-For this project, the error message should be printed to **stderr**.  Also, do
-not attempt to add whitespaces or tabs or extra error messages.
-
-You should consider the following situations as errors; in each case, your
-shell should print the error message to stderr and exit gracefully:
-
-* An incorrect number of command line arguments to your shell program.
-
-For the following situation, you should print the error message to
-stderr and continue processing:
-
-*  A command does not exist or cannot be executed.
-*  A very long command line (over 128 bytes).
-
-Your shell should also be able to handle the following scenarios below, which
-are *not errors.*
-
-* An empty command line.
-* Multiple white spaces on a command line.
-
-## Hints
-
-Writing your shell in a simple manner is a matter of finding the relevant
-library routines and calling them properly.  To simplify things for you in
-this assignment, we will suggest a few library routines you may want to use to
-make your coding easier. You are free to use these routines if you want or to
-disregard our suggestions. To find information on these library routines, look
-at the manual pages.]
-
-### Basic Shell
-
-**Parsing:** For reading lines of input, once again check out `getline()`. To
-open a file and get a handle with type `FILE *`, look into `fopen()`. Be sure
-to check the return code of these routines for errors!  You may find the
-`strtok()` routine useful for parsing the command line (i.e., for extracting
-the arguments within a command separated by whitespaces).  
-
-**Executing Commands:** Look into `fork`, `exec`, and `wait/waitpid`.  See the
-man pages for these functions, and also read [book chapter](http://www.ostep.org/cpu-api.pdf).
-
-You will note that there are a variety of commands in the `exec` family; for
-this project, you must use `execv`. You should **not** use the `system()`
-library function call to run a command.  Remember that if `execv()` is
-successful, it will not return; if it does return, there was an error (e.g.,
-the command does not exist). The most challenging part is getting the
-arguments correctly specified. The first argument specifies the program that
-should be executed, with the full path specified; this is
-straight-forward. The second argument, `char *argv[]` matches those
-that the program sees in its function prototype: 
-
-```c
-int main(int argc, char *argv[]);
-```
-
-Note that this argument is an array of strings, or an array of
-pointers to characters. For example, if you invoke a program with:
-
-```
-foo 205 535 
-```
-
-Assuming that you find `foo` in directory `/bin` (or elsewhere in the defined
-path), then argv[0] = "/bin/foo", argv[1] = "205" and argv[2] = "535".
-
-Important: the list of arguments must be terminated with a NULL pointer; in
-our example, this means argv[3] = NULL. We strongly recommend that you
-carefully check that you are constructing this array correctly! 
-
-### Built-in Commands
-
-For the `exit` built-in command, you should simply call `exit()` from within
-your source code.  The corresponding shell process will exit, and the parent
-(i.e. your shell) will be notified.
-
-For managing the current working directory, you should use `getenv(),
-`chdir()`, and `getcwd()`. The `getenv()` call is useful when you want to go
-to your HOME directory. The `getcwd()` call is useful to know the current
-working directory, i.e., if a user types `pwd`, you simply call `getcwd()` and
-use those results. Finally, `chdir` is useful for moving to different
-directories. For more information on these topics, read the man pages or the
-Advanced Unix Programming book (Chapters 4 and 7) or look around online.
-
-### Redirection
-
-Redirection is relatively easy to implement. For example, to redirect standard
-output to a file, just use `close()` on stdout, and then `open()` on a
-file. More on this below.
-
-With a file descriptor, you can perform read and write to a file. Maybe in
-your life so far, you have only used `fopen()`, `fread()`, and `fwrite()` for
-reading and writing to a file. Unfortunately, these functions work on `FILE
-*`, which is more of a C library support; the file descriptors are hidden. 
-
-To work on a file descriptor, you should use `open()`, `read()`, and `write()`
-system calls. These functions perform their work by using file descriptors.
-To understand more about file I/O and file descriptors you can read the
-Advanced Unix Programming book (Chapter 3) (specifically, 3.2 to 3.5, 3.7,
-3.8, and 3.12), or just read the man pages. Before reading forward, at this
-point, you should become more familiar file descriptors.
-
-The idea of redirection is to make the stdout descriptor point to your output
-file descriptor. First of all, let's understand the STDOUT_FILENO file
-descriptor.  When a command `ls -la /tmp` runs, the `ls` program prints its
-output to the screen. But obviously, the ls program does not know what a
-screen is. All it knows is that the screen is basically pointed by the
-STDOUT_FILENO file descriptor. In other words, you could rewrite
-`printf("hi");` in this way: `write(STDOUT_FILENO, "hi", 2);`.
-
-To check if a particular file exists in a directory, use the `stat()` system
-call. For example, when the user types `ls`, and path is set to include both
-`/bin` and `/usr/bin`, try `stat("/bin/ls")`. If that fails, try
-`stat("/usr/bin/ls")`. If that fails too, print the **only error message**.
-
 ### Miscellaneous Hints

 Remember to get the **basic functionality** of your shell working before
 worrying about all of the error conditions and end cases. For example, first
 get a single command running (probably first a command with no arguments, such
-as `ls`). Then try adding more arguments.
+as `ls`). 

-Next, try working on multiple commands.  Make sure that you are correctly
-handling all of the cases where there is miscellaneous white space around
-commands or missing commands. Next, add built-in commands. Finally, add
-redirection support. 
+Next, add built-in commands. Then, try working on redirection. Finally, think
+about parallel commands. Each of these requires a little more effort on
+parsing, but each should not be too hard to implement.

-We strongly recommend that you check the return codes of all system
-calls from the very beginning of your work. This will often catch
-errors in how you are invoking these new system calls. And, it's just good
-programming sense.
+At some point, you should make sure your code is robust to white space of
+various kinds, including spaces (` `) and tabs (`\t`). In general, the user
+should be able to put variable amounts of white space before and after
+commands, arguments, and various operators; however, the operators
+(redirection and parallel commands) do not require whitespace.

-Beat up your own code! You are the best (and in this case, the
-only) tester of this code. Throw lots of junk at it and make sure the
-shell behaves well. Good code comes through testing -- you must run
-all sorts of different tests to make sure things work as
-desired. Don't be gentle -- other users certainly won't be. Break it
-now so we don't have to break it later.
+Check the return codes of all system calls from the very beginning of your
+work. This will often catch errors in how you are invoking these new system
+calls. It's also just good programming sense.
+
+Beat up your own code! You are the best (and in this case, the only) tester of
+this code. Throw lots of junk at it and make sure the shell behaves well. Good
+code comes through testing -- you must run all sorts of different tests to
+make sure things work as desired. Don't be gentle -- other users certainly
+won't be. Break it now so we don't have to break it later.

 Keep versions of your code. More advanced programmers will use a source
 control system such as git. Minimally, when you get a piece of functionality