Next rev of shell project

2018-02-05 10:45:56 -06:00
parent ec810a344b
commit 271c1346eb
1 changed files with 124 additions and 253 deletions
--- a/processes-shell/README.md
+++ b/processes-shell/README.md
@@ -39,130 +39,147 @@ should be `wish`:
 ```
 prompt> ./wish
 wish> 
 ```
 At this point, `wish` is running, and ready to accept commands. Type away!
 The mode above is called *interactive* mode, and allows the user to type
 commands directly. The shell also supports a *batch mode*, which instead reads
 input from a batch file and executes commands from therein. Here is how you
 run the shell with a batch file named `batch.txt`:
 ```
 prompt> ./wish batch.txt
 ```
 You should structure your shell such that it creates a new process for each
-new command (note that there are a few exceptions to this, which we discuss
+new command (the exceptions are *built-in commands*, discussed below).  Your
-below).  Your basic shell should be able to parse a command and run the
+basic shell should be able to parse a command and run the program
-program corresponding to the command.  For example, if the user types `ls
+corresponding to the command.  For example, if the user types `ls -la /tmp`,
-la /tmp`, your shell should run the program `/bin/ls` with the given
+your shell should run the program `/bin/ls` with the given arguments `-la` and
-arguments `-la` and `/tmp`.
+`/tmp` (how does the shell know to run `/bin/ls`? It's something called the
 shell **path**; more on this below).
-You might be wondering how the shell knows to run `/bin/ls` (which means the
+## Structure
 program binary `ls` is found in the directory `/bin`) when you type `ls`. The
 shell knows this thanks to a **path** variable that the user sets. The path
 variable contains the list of all directories to search, in order, when the
 user types a command. We'll learn more about how to deal with the path below.
-**Important:** Note that the shell itself does not *implement* `ls` or really
+### Basic Shell
 many other commands at all (it does implement a few, called *built-ins*,
 described further below). All it does is find those executables in one of the
 directories specified by `path` and create a new process to run them. More on
 this below.
 The shell is very simple (conceptually): it runs in a while loop, repeatedly
 asking for input to tell it what command to execute. It then executes that
 command. The loop continues indefinitely, until the user types the built-in
 command `exit`, at which point it exits. That's it!
 For reading lines of input, you should use `getline()`. This allows you to
 obtain arbitrarily long input lines with ease. Generally, the shell will be
 run in *interactive mode*, where the user types a command (one at a time) and
 the shell acts on it. However, your shell will also support *batch mode*, in
 which the shell is given an input file of commands; in this case, the shell
 should not read user input (from `stdin`) but rather from this file to get the
 commands to execute.
 To parse the input line into constituent pieces, you might want to use
 `strtok()`. Read the man page (carefully) for more details.
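The parsing step can be sketched as follows. This is only an illustration, not the required design: the helper name `split_line` and the `MAX_ARGS` limit are hypothetical, and it assumes the line was read with `getline()` into a writable buffer (since `strtok()` modifies its input).

```c
#include <string.h>

#define MAX_ARGS 64  /* hypothetical limit chosen for this sketch */

/* Split `line` in place on spaces, tabs, and newlines.
 * Stores up to MAX_ARGS - 1 tokens in argv, NULL-terminates the array,
 * and returns the number of tokens stored. */
int split_line(char *line, char *argv[MAX_ARGS]) {
    int argc = 0;
    char *tok = strtok(line, " \t\n");
    while (tok != NULL && argc < MAX_ARGS - 1) {
        argv[argc++] = tok;
        tok = strtok(NULL, " \t\n");
    }
    argv[argc] = NULL;  /* execv() expects a NULL-terminated array */
    return argc;
}
```

An empty line yields a count of 0, which the main loop can simply skip. The tokens point into the original buffer, so that buffer must stay alive while `argv` is in use.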
 To execute commands, look into `fork()`, `exec()`, and `wait()/waitpid()`.
 See the man pages for these functions, and also read the relevant [book
 chapter](http://www.ostep.org/cpu-api.pdf) for a brief overview.
-## Built-in Commands
+You will note that there are a variety of commands in the `exec` family; for
 this project, you must use `execv`. You should **not** use the `system()`
 library function call to run a command.  Remember that if `execv()` is
 successful, it will not return; if it does return, there was an error (e.g.,
 the command does not exist). The most challenging part is getting the
 arguments correctly specified. 
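A minimal sketch of the `fork()` / `execv()` / `waitpid()` pattern described above. The function name `run_command` and the exit status 127 for a failed `execv()` are assumptions made for this example; a real `wish` would print its error message at that point.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program at argv[0] (a full path) in a child process and wait
 * for it. Returns the child's exit status, or -1 if fork/waitpid failed
 * or the child did not exit normally. */
int run_command(char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;             /* fork failed */
    }
    if (pid == 0) {
        execv(argv[0], argv);  /* returns only if it failed */
        _exit(127);            /* hypothetical "exec failed" status */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, calling it with `{"/bin/ls", "-la", "/tmp", NULL}` would run `ls -la /tmp` and return once `ls` exits. The child uses `_exit()` rather than `exit()` so that it does not flush stdio buffers inherited from the parent.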
 ### Paths
 In our example above, the user typed `ls` but the shell knew to execute the
 program `/bin/ls`. How does your shell know this?
 It turns out that the user must specify a **path** variable to describe the
 set of directories to search for executables; the set of directories that
 comprise the path are sometimes called the *search path* of the shell. The
 path variable contains the list of all directories to search, in order, when
 the user types a command. 
 **Important:** Note that the shell itself does not *implement* `ls` or other
 commands (except built-ins). All it does is find those executables in one of
 the directories specified by `path` and create a new process to run them.
 To check if a particular file exists in a directory and is executable,
 consider the `access()` system call. For example, when the user types `ls`,
 and path is set to include both `/bin` and `/usr/bin`, try `access("/bin/ls",
 X_OK)`. If that fails, try "/usr/bin/ls". If that fails too, it is an error.
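The search described above might look like the sketch below. The name `find_executable` and its parameters are hypothetical; it assumes the shell keeps its path as an array of directory strings.

```c
#include <stdio.h>
#include <unistd.h>

/* Search dirs[0..ndirs-1], in order, for an executable named cmd.
 * On success, writes the full path into out and returns 0.
 * Returns -1 if no directory contains an executable with that name
 * (the contents of out are then unspecified). */
int find_executable(const char *cmd, char *const dirs[], int ndirs,
                    char *out, size_t outlen) {
    for (int i = 0; i < ndirs; i++) {
        int n = snprintf(out, outlen, "%s/%s", dirs[i], cmd);
        if (n < 0 || (size_t)n >= outlen) {
            continue;  /* candidate path would not fit; skip it */
        }
        if (access(out, X_OK) == 0) {
            return 0;  /* first match wins */
        }
    }
    return -1;
}
```

With an empty path (`ndirs == 0`) the loop never runs and the lookup fails, which matches the rule that an empty path means no external program can be run.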
 ### Built-in Commands
 Whenever your shell accepts a command, it should check whether the command is
 a **built-in command** or not. If it is, it should not be executed like other
 programs. Instead, your shell will invoke your implementation of the built-in
 command. For example, to implement the `exit` built-in command, you simply
-call `exit(0);` in your C program.
+call `exit(0);` in your wish source code, which then will exit the shell.
-So far, you have added your own `exit` built-in command. Most Unix shells have
+In this project, you should implement `exit`, `cd`, `pwd`, and `path` as
-many others such as `cd`, `pwd`, etc.  In this project, you should implement
+built-in commands.
 `exit`, `cd`, `pwd`, and `path`.
-The formats for `exit`, `cd`, and `pwd` are:
+* `exit`: When the user types `exit`, your shell should simply call the `exit`
  system call with 0 as a parameter. It is an error to pass any arguments to
  `exit`. 
-```
+* `cd`: `cd` always takes one argument (0 or >1 args should be signaled as an
-[optional-space]exit[optional-space]
+error). To change directories, use the `chdir()` system call with the argument
-[optional-space]pwd[optional-space]
+supplied by the user; if `chdir` fails, that is also an error.
 [optional-space]cd[optional-space]
 [optional-space]cd[oneOrMoreSpace]dir[optional-space]
 ```
-When you run `cd` (without arguments), your shell should change the working
+* `pwd`: When a user types `pwd`, your shell should call `getcwd()` and show the
-directory to the path stored in the $HOME environment variable. Use the call
+result. It is an error to pass any arguments to `pwd`.
 `getenv("HOME")` in your `wish` source code to obtain this value.
-You do not have to support tilde (~). Although in a typical Unix shell you
+* `path`: The `path` command takes 0 or more arguments, with each argument
-could go to a user's directory by typing `cd ~username`, in this project you
+  separated by whitespace from the others. A typical usage would be like this:
-do not have to deal with tilde. You should treat it like a common character,
+  `wish> path /bin /usr/bin`, which would add `/bin` and `/usr/bin` to the
-i.e., you should just pass the whole word (e.g. "~username") to chdir(), and
+  search path of the shell. If the user sets path to be empty, then the shell
-chdir will return an error.
+  should not be able to run any programs (except built-in commands).
-Basically, when a user types `pwd`, you simply call getcwd(), and show the
+### Redirection
 result. When a user changes the current working directory (e.g. "cd
 somepath"), you simply call chdir(). Hence, if you run your shell, and then
 run pwd, it should look like this:
-```
+Many times, a shell user prefers to send the output of a program to a file
-% cd
+rather than to the screen. Usually, a shell provides this nice feature with
-% pwd
+the `>` character. Formally, this is called redirection of standard
 /afs/cs.wisc.edu/u/m/j/username
 % echo $PWD
 /u/m/j/username
 % ./wish
 wish> pwd
 /afs/cs.wisc.edu/u/m/j/username
 ```
 The format of the `path` built-in command is:
 ```
    [optionalSpace]path[oneOrMoreSpace]dir[optionalSpace] (and possibly more directories, space separated)
 ```
 A typical usage would be like this:
 ```
 wish> path /bin /usr/bin
 ```
 By doing this, your shell will know to look in `/bin` and `/usr/bin`
 when a user types a command, to see if it can find the proper binary to
 execute. If the user sets path to be empty, then the shell should not be able
 to run any programs unless XXX (but built-in commands, such as path, should
 still work).
 ## Redirection
 Many times, a shell user prefers to send the output of his/her program to a
 file rather than to the screen. Usually, a shell provides this nice feature
 with the `>` character. Formally this is named as redirection of standard
 output. To make your shell users happy, your shell should also include this
 feature, but with a slight twist (explained below).
 For example, if a user types `ls -la /tmp > output`, nothing should be printed
 on the screen. Instead, the standard output of the `ls` program should be
-rerouted to the `output.out` file. In addition, the standard error output of
+rerouted to the file `output`. In addition, the standard error output of
-the file should be rerouted to the file `output.err` (the twist is that this
+the program should be rerouted to the file `output` (the twist is that this
 is a little different than standard redirection).
-If the `output.out` or `output.err` files already exists before you run your
+If the `output` file exists before you run your program, you should simply
-program, you should simple overwrite them (after truncating).  If the output
+overwrite it (after truncating it).
 file is not specified (e.g., the user types `ls >` without a file), you should
 print an error message and not run the program `ls`.
-Here are some redirections that should **not** work:
+The exact format of redirection is a command (and possibly some arguments)
-```
+followed by the redirection symbol followed by a filename. Multiple
-ls > out1 out2
+redirection operators or multiple files to the right of the redirection sign
-ls > out1 out2 out3
+are errors.
 ls > out1 > out2
 ```
 Note: don't worry about redirection for built-in commands (e.g., we will
 not test what happens when you type `path /bin > file`).
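One way to sketch the redirection rule above (stdout and stderr both going to the same truncated file). This example uses `dup2()` in the child, which is a substitution for illustration rather than something the text prescribes; the name `run_redirected`, the 0644 file mode, and the 127 failure status are likewise assumptions.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] (a full path) with stdout and stderr both sent to outfile,
 * which is created if missing and truncated if it already exists.
 * Returns the child's exit status, or -1 on fork/waitpid failure. */
int run_redirected(char *const argv[], const char *outfile) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        int fd = open(outfile, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) {
            _exit(127);
        }
        /* Point both descriptors at the file, only in the child. */
        if (dup2(fd, STDOUT_FILENO) < 0 || dup2(fd, STDERR_FILENO) < 0) {
            _exit(127);
        }
        if (fd > STDERR_FILENO) {
            close(fd);  /* the duplicates keep the file open */
        }
        execv(argv[0], argv);
        _exit(127);     /* reached only if execv failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the descriptors are changed after `fork()` and before `execv()`, the shell's own stdout and stderr are untouched and the next prompt still appears on the screen.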
-## Parallel Commands
+### Parallel Commands
-Your shell will also allow the user to launch parallel commands. 
+Your shell will also allow the user to launch parallel commands. This is
 accomplished with the ampersand operator as follows:
 ```
 wish> cmd1 & cmd2 args1 args2 & cmd3 args1
 ```
 In this case, instead of running `cmd1` and then waiting for it to finish,
 your shell should run `cmd1`, `cmd2`, and `cmd3` (each with whatever arguments
 the user has passed to it).
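A rough sketch of launching parallel commands: start every child first, then collect them. That the shell waits for all children before returning to the prompt is an assumption of this example, as are the name `run_parallel` and the 127 failure status.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start every command in cmds[0..n-1] (each a NULL-terminated argv with
 * a full path in slot 0) before waiting for any of them, then wait for
 * all of the children. Returns the number of children started. */
int run_parallel(char *const *cmds[], int n) {
    int started = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            execv(cmds[i][0], cmds[i]);
            _exit(127);  /* reached only if execv failed */
        }
        if (pid > 0) {
            started++;   /* pid < 0 means this fork failed; keep going */
        }
    }
    for (int i = 0; i < started; i++) {
        wait(NULL);      /* reap children in whatever order they finish */
    }
    return started;
}
```

The key difference from the sequential case is that no `wait()` call happens inside the first loop, so `cmd1`, `cmd2`, and `cmd3` all run at the same time.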
-## Program Errors
+### Program Errors
 **The one and only error message.** You should print this one and only error
 message whenever you encounter an error of any type:
@@ -172,8 +189,8 @@ message whenever you encounter an error of any type:
    write(STDERR_FILENO, error_message, strlen(error_message)); 
 ```
-The error message should be printed to stderr (standard error). Also, 
+The error message should be printed to stderr (standard error), as shown
-do not add whitespaces or tabs or extra error messages.
+above. 
 There is a difference between errors that your shell catches and those that
 the program catches. Your shell should catch all the syntax errors specified
@@ -183,178 +200,32 @@ invalid arguments to `ls` when you run it, for example), let the program
 print its specific error messages in any manner it desires (e.g., could be
 stdout or stderr).
 ## White Spaces
 The `>` operator will be separated by spaces.  Valid input may include the
 following:
 ```
 wish> ls
 wish> ls > a
 wish>    ls    > a
 ```
 But not this (it is ok if this works, it just doesn't have to):
 ```
 wish> ls>a
 ```
 ## Defensive Programming and Error Messages
 Defensive programming is good for you, so do it! It is also required. Your
 program should check all parameters, error-codes, etc. before it trusts
 them. In general, there should be no circumstances in which your C program
 will core dump, hang indefinitely, or prematurely terminate. Therefore, your
 program must respond to all input in a reasonable manner; by "reasonable",
 we mean print the error message (as specified in the next paragraph) and
 either continue processing or exit, depending upon the situation. 
 Since your code will be graded with automated testing, you should print this
 *one and only error message* whenever you encounter an error of any type:
 ```
    char error_message[30] = "An error has occurred\n";
    write(STDERR_FILENO, error_message, strlen(error_message)); 
 ```
 For this project, the error message should be printed to **stderr**.  Also, do
 not attempt to add whitespaces or tabs or extra error messages.
 You should consider the following situations as errors; in each case, your
 shell should print the error message to stderr and exit gracefully:
 * An incorrect number of command line arguments to your shell program.
 For the following situation, you should print the error message to
 stderr and continue processing:
 *  A command does not exist or cannot be executed.
 *  A very long command line (over 128 bytes).
 Your shell should also be able to handle the following scenarios below, which
 are *not errors.*
 * An empty command line.
 * Multiple white spaces on a command line.
 ## Hints
 Writing your shell in a simple manner is a matter of finding the relevant
 library routines and calling them properly.  To simplify things for you in
 this assignment, we will suggest a few library routines you may want to use to
 make your coding easier. You are free to use these routines if you want or to
 disregard our suggestions. To find information on these library routines, look
 at the manual pages.
 ### Basic Shell
 **Parsing:** For reading lines of input, once again check out `getline()`. To
 open a file and get a handle with type `FILE *`, look into `fopen()`. Be sure
 to check the return code of these routines for errors!  You may find the
 `strtok()` routine useful for parsing the command line (i.e., for extracting
 the arguments within a command separated by whitespaces).  
 **Executing Commands:** Look into `fork`, `exec`, and `wait/waitpid`.  See the
 man pages for these functions, and also read the [book chapter](http://www.ostep.org/cpu-api.pdf).
 You will note that there are a variety of commands in the `exec` family; for
 this project, you must use `execv`. You should **not** use the `system()`
 library function call to run a command.  Remember that if `execv()` is
 successful, it will not return; if it does return, there was an error (e.g.,
 the command does not exist). The most challenging part is getting the
 arguments correctly specified. The first argument specifies the program that
 should be executed, with the full path specified; this is
 straight-forward. The second argument, `char *argv[]` matches those
 that the program sees in its function prototype: 
 ```c
 int main(int argc, char *argv[]);
 ```
 Note that this argument is an array of strings, or an array of
 pointers to characters. For example, if you invoke a program with:
 ```
 foo 205 535 
 ```
 Assuming that you find `foo` in directory `/bin` (or elsewhere in the defined
 path), then argv[0] = "/bin/foo", argv[1] = "205" and argv[2] = "535".
 Important: the list of arguments must be terminated with a NULL pointer; in
 our example, this means argv[3] = NULL. We strongly recommend that you
 carefully check that you are constructing this array correctly! 
 ### Built-in Commands
 For the `exit` built-in command, you should simply call `exit()` from within
 your source code.  The corresponding shell process will exit, and the parent
 (i.e. your shell) will be notified.
 For managing the current working directory, you should use `getenv()`,
 `chdir()`, and `getcwd()`. The `getenv()` call is useful when you want to go
 to your HOME directory. The `getcwd()` call is useful to know the current
 working directory, i.e., if a user types `pwd`, you simply call `getcwd()` and
 use those results. Finally, `chdir` is useful for moving to different
 directories. For more information on these topics, read the man pages or the
 Advanced Unix Programming book (Chapters 4 and 7) or look around online.
 ### Redirection
 Redirection is relatively easy to implement. For example, to redirect standard
 output to a file, just use `close()` on stdout, and then `open()` on a
 file. More on this below.
 With a file descriptor, you can perform read and write to a file. Maybe in
 your life so far, you have only used `fopen()`, `fread()`, and `fwrite()` for
 reading and writing to a file. Unfortunately, these functions work on `FILE
 *`, which is more of a C library support; the file descriptors are hidden. 
 To work on a file descriptor, you should use `open()`, `read()`, and `write()`
 system calls. These functions perform their work by using file descriptors.
 To understand more about file I/O and file descriptors you can read the
 Advanced Unix Programming book (Chapter 3) (specifically, 3.2 to 3.5, 3.7,
 3.8, and 3.12), or just read the man pages. Before reading forward, at this
 point, you should become more familiar with file descriptors.
 The idea of redirection is to make the stdout descriptor point to your output
 file descriptor. First of all, let's understand the STDOUT_FILENO file
 descriptor.  When a command `ls -la /tmp` runs, the `ls` program prints its
 output to the screen. But obviously, the ls program does not know what a
 screen is. All it knows is that the screen is basically pointed to by the
 STDOUT_FILENO file descriptor. In other words, you could rewrite
 `printf("hi");` in this way: `write(STDOUT_FILENO, "hi", 2);`.
 To check if a particular file exists in a directory, use the `stat()` system
 call. For example, when the user types `ls`, and path is set to include both
 `/bin` and `/usr/bin`, try `stat("/bin/ls")`. If that fails, try
 `stat("/usr/bin/ls")`. If that fails too, print the **only error message**.
 ### Miscellaneous Hints
 Remember to get the **basic functionality** of your shell working before
 worrying about all of the error conditions and end cases. For example, first
 get a single command running (probably first a command with no arguments, such
-as `ls`). Then try adding more arguments.
+as `ls`). 
-Next, try working on multiple commands.  Make sure that you are correctly
+Next, add built-in commands. Then, try working on redirection. Finally, think
-handling all of the cases where there is miscellaneous white space around
+about parallel commands. Each of these requires a little more effort on
-commands or missing commands. Next, add built-in commands. Finally, add
+parsing, but each should not be too hard to implement.
 redirection support. 
-We strongly recommend that you check the return codes of all system
+At some point, you should make sure your code is robust to white space of
-calls from the very beginning of your work. This will often catch
+various kinds, including spaces (` `) and tabs (`\t`). In general, the user
-errors in how you are invoking these new system calls. And, it's just good
+should be able to put variable amounts of white space before and after
-programming sense.
+commands, arguments, and various operators; however, the operators
 (redirection and parallel commands) do not require whitespace.
-Beat up your own code! You are the best (and in this case, the
+Check the return codes of all system calls from the very beginning of your
-only) tester of this code. Throw lots of junk at it and make sure the
+work. This will often catch errors in how you are invoking these new system
-shell behaves well. Good code comes through testing -- you must run
+calls. It's also just good programming sense.
-all sorts of different tests to make sure things work as
+
-desired. Don't be gentle -- other users certainly won't be. Break it
+Beat up your own code! You are the best (and in this case, the only) tester of
-now so we don't have to break it later.
+this code. Throw lots of junk at it and make sure the shell behaves well. Good
 code comes through testing -- you must run all sorts of different tests to
 make sure things work as desired. Don't be gentle -- other users certainly
 won't be. Break it now so we don't have to break it later.
 Keep versions of your code. More advanced programmers will use a source
 control system such as git. Minimally, when you get a piece of functionality