From a21e6832dd4bddad0f55bf22f2d75e5eaab43543 Mon Sep 17 00:00:00 2001
From: Remzi Arpaci-Dusseau <remzi.arpacidusseau@gmail.com>
Date: Thu, 1 Feb 2018 14:54:05 -0600
Subject: [PATCH] Initial cut at shell; missing lots of stuff

---
 README.md                 |   6 +-
 processes-shell/README.md | 364 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 368 insertions(+), 2 deletions(-)
 create mode 100644 processes-shell/README.md

diff --git a/README.md b/README.md
index bec9fad..11a6ef3 100644
--- a/README.md
+++ b/README.md
@@ -18,11 +18,13 @@ Realize the best thing you can do to learn to program in any environment is to
 program **a lot**. These small projects are only the beginning of that
 journey; you'll have to do more on your own to truly become proficient.
 
-* [Unix Utilities](https://github.com/remzi-arpacidusseau/ostep-projects/tree/master/initial-utilities)
+* [Unix Utilities](https://github.com/remzi-arpacidusseau/ostep-projects/tree/master/initial-utilities) (cat, grep, zip/unzip)
+* Sort (text-based)
+* Sort (binary)
 
 ### Processes and Scheduling
 
-* Shell 
+* [Shell]
 
 ### Virtual Memory
 
diff --git a/processes-shell/README.md b/processes-shell/README.md
new file mode 100644
index 0000000..733821c
--- /dev/null
+++ b/processes-shell/README.md
@@ -0,0 +1,364 @@
+
+# Unix Shell
+
+In this project, you'll build a simple Unix shell. The shell is the heart of
+the command-line interface, and thus is central to the Unix/C programming
+environment. Mastering use of the shell is necessary to become proficient in
+this world; knowing how the shell itself is built is the focus of this
+project.
+
+There are three specific objectives to this assignment:
+
+* To further familiarize yourself with the Linux programming environment.
+* To learn how processes are created, destroyed, and managed.
+* To gain exposure to the necessary functionality in shells.
+
+## Overview
+
+In this assignment, you will implement a *command line interpreter (CLI)* or,
+as it is more commonly known, a *shell*. The shell should operate in this
+basic way: when you type in a command (in response to its prompt), the shell
+creates a child process that executes the command you entered and then prompts
+for more user input when it has finished.
+
+The shell you implement will be similar to, but simpler than, the one you run
+every day in Unix. You can find out which shell you are running by typing
+`echo $SHELL` at a prompt. You may then wish to look at the man pages for
+the shell you are running (probably bash) to learn more about all of the
+functionality that can be present. For this project, you do not need to
+implement too much functionality.
+
+## Program Specifications
+
+### Basic Shell: WiSH
+
+Your basic shell, called **wish**, is basically an interactive loop: it
+repeatedly prints a prompt `wish> ` (note the space after the
+greater-than sign), parses the input, executes the command specified on that
+line of input, and waits for the command to finish. This is repeated until the
+user types `exit`.  The name of your final executable should be `wish`:
+
+```
+prompt> ./wish
+```
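+
+A minimal sketch of that loop in C, assuming POSIX `getline()`. The names
+`wish_loop`, `trim`, and `line_handler` are hypothetical; `handle` stands in
+for the parsing and execution you will write later, and stopping at end of
+input (not only on `exit`) is an assumption of this sketch rather than part
+of the specification.
+
+```c
+#define _POSIX_C_SOURCE 200809L  /* for getline() (and fmemopen() when testing) */
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+/* Hypothetical callback: receives one trimmed, non-empty command line. */
+typedef void (*line_handler)(char *line);
+
+/* Skips leading blanks and strips trailing blanks/newline in place. */
+static char *trim(char *s) {
+    while (*s == ' ' || *s == '\t')
+        s++;
+    size_t n = strlen(s);
+    while (n > 0 && (s[n - 1] == ' ' || s[n - 1] == '\t' || s[n - 1] == '\n'))
+        s[--n] = '\0';
+    return s;
+}
+
+/* Prints "wish> ", reads a line, and repeats until "exit" or end of input.
+ * Returns how many non-empty, non-exit lines were seen (and handed to
+ * `handle` when it is not NULL). */
+int wish_loop(FILE *in, line_handler handle) {
+    char *buf = NULL;  /* getline() allocates and grows this buffer */
+    size_t cap = 0;
+    int seen = 0;
+
+    for (;;) {
+        printf("wish> ");
+        fflush(stdout);  /* show the prompt before blocking on input */
+        if (getline(&buf, &cap, in) == -1)
+            break;       /* end of input (or a read error) */
+        char *line = trim(buf);
+        if (*line == '\0')
+            continue;    /* empty line: not an error, just prompt again */
+        if (strcmp(line, "exit") == 0)
+            break;
+        if (handle != NULL)
+            handle(line);
+        seen++;
+    }
+    free(buf);
+    return seen;
+}
+```
+
+From `main()`, you might call `wish_loop(stdin, handler)` with your own
+(hypothetical) `handler` and then call `exit(0)`.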
+
+You should structure your shell such that it creates a new process for each
+new command (note that there are a few exceptions to this, which we discuss
+below). There are two advantages of creating a new process. First, it protects
+the main shell process from any errors that occur in the new command. Second,
+it allows for concurrency; that is, multiple commands can be started and
+allowed to execute simultaneously. 
+
+Your basic shell should be able to parse a command, and run the program
+corresponding to the command.  For example, if the user types `ls -la /tmp`,
+your shell should run the program `/bin/ls` with all the given arguments.
+
+You might be wondering how the shell knows to run `/bin/ls` (which means the
+program binary `ls` is found in the directory `/bin`) when you type `ls`. The
+shell knows this thanks to a **path** variable that the user sets. The path
+variable contains the list of all directories to search, in order, when the
+user types a command. We'll learn more about how to deal with the path below.
+
+**Important:** Note that the shell itself does not *implement* `ls` or
+really many other commands at all. All it does is find those executables in
+one of the directories specified by `path` and create a new process to
+run them. More on this below.
+
+## Built-in Commands
+
+Whenever your shell accepts a command, it should check whether the command is
+a **built-in command** or not. If it is, it should not be executed like other
+programs. Instead, your shell will invoke your implementation of the built-in
+command. For example, to implement the `exit` built-in command, you simply
+call `exit(0);` in your C program.
+
+`exit` is only one example; most Unix shells have many other built-in
+commands, such as `cd`, `pwd`, etc. In this project, you should implement
+`exit`, `cd`, `pwd`, and `path`.
+
+The formats for `exit`, `cd`, and `pwd` are:
+
+```
+[optional-space]exit[optional-space]
+[optional-space]pwd[optional-space]
+[optional-space]cd[optional-space]
+[optional-space]cd[oneOrMoreSpace]dir[optional-space]
+```
+
+When you run `cd` (without arguments), your shell should change the working
+directory to the path stored in the $HOME environment variable. Use the call
+`getenv("HOME")` in your `wish` source code to obtain this value.
+
+You do not have to support tilde (~). Although in a typical Unix shell you
+could go to a user's directory by typing `cd ~username`, in this project you
+do not have to deal with tilde. You should treat it like a common character,
+i.e., you should just pass the whole word (e.g. "~username") to chdir(), and
+chdir will return an error.
+
+Basically, when a user types `pwd`, you simply call `getcwd()` and show the
+result. When a user changes the current working directory (e.g.,
+`cd somepath`), you simply call `chdir()`. Hence, if you run your shell and
+then run `pwd`, it should look like this:
+
+```
+% cd
+% pwd
+/afs/cs.wisc.edu/u/m/j/username
+% echo $PWD
+/u/m/j/username
+% ./wish
+wish> pwd
+/afs/cs.wisc.edu/u/m/j/username
+```
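+
+Because the working directory belongs to a process, `cd` has to run inside
+the shell itself: a `chdir()` in a child process would change only the
+child's directory. A sketch of dispatching `exit`, `cd`, and `pwd`, assuming
+the command has already been split into a NULL-terminated `argv` array. The
+names `run_builtin` and `print_error` are hypothetical, and treating extra
+arguments to `cd` or `pwd` as errors is an assumption not stated above.
+
+```c
+#define _POSIX_C_SOURCE 200809L
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+/* The one and only error message, written to stderr. */
+static void print_error(void) {
+    char error_message[30] = "An error has occurred\n";
+    write(STDERR_FILENO, error_message, strlen(error_message));
+}
+
+/* Handles argv[0] if it is exit, cd, or pwd.
+ * Returns 1 if a built-in ran (even if it reported an error),
+ * 0 if argv[0] is not one of these built-ins. */
+int run_builtin(char *argv[]) {
+    int argc = 0;
+    while (argv[argc] != NULL)
+        argc++;
+    if (argc == 0)
+        return 0;  /* empty command: nothing to do here */
+
+    if (strcmp(argv[0], "exit") == 0)
+        exit(0);   /* the shell itself exits; no child process involved */
+
+    if (strcmp(argv[0], "cd") == 0) {
+        /* "cd" alone uses $HOME; "cd dir" uses dir (assumption: more args = error). */
+        const char *target = (argc == 1) ? getenv("HOME") : argv[1];
+        if (argc > 2 || target == NULL || chdir(target) != 0)
+            print_error();
+        return 1;
+    }
+
+    if (strcmp(argv[0], "pwd") == 0) {
+        char cwd[4096];  /* fixed size chosen for this sketch */
+        if (argc != 1 || getcwd(cwd, sizeof cwd) == NULL)
+            print_error();
+        else
+            printf("%s\n", cwd);
+        return 1;
+    }
+    return 0;
+}
+```
+
+If `run_builtin()` returns 0, the shell would go on to search the path and
+run the command in a new process.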
+
+The format of the `path` built-in command is:
+
+```
+[optional-space]path[oneOrMoreSpace]dir[optional-space]
+```
+
+More directories may follow, separated by spaces.
+
+A typical usage would be like this:
+
+```
+wish> path /bin /usr/bin
+```
+
+By doing this, your shell will know to look in `/bin` and `/usr/bin`
+when a user types a command, to see if it can find the proper binary to
+execute. If the user sets path to be empty, then the shell should not be able
+to run any programs unless XXX (but built-in commands, such as path, should
+still work).
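+
+One way to represent the path is a small array of heap-allocated strings.
+This sketch assumes each `path` command *replaces* the previous list, which
+fits the wording above about setting the path to be empty; `set_path`,
+`search_path`, and `MAX_PATHS` are hypothetical names.
+
+```c
+#define _POSIX_C_SOURCE 200809L  /* for strdup() */
+#include <stdlib.h>
+#include <string.h>
+
+#define MAX_PATHS 64  /* arbitrary limit chosen for this sketch */
+
+static char *search_path[MAX_PATHS];
+static int num_paths = 0;
+
+/* Implements "path dir1 dir2 ...": replaces the current search path with
+ * argv[1..]. With no directories the path becomes empty.
+ * Returns 0 on success, -1 on too many directories or allocation failure. */
+int set_path(char *argv[]) {
+    /* Free the old list first. */
+    for (int i = 0; i < num_paths; i++)
+        free(search_path[i]);
+    num_paths = 0;
+
+    for (int i = 1; argv[i] != NULL; i++) {
+        if (num_paths == MAX_PATHS)
+            return -1;
+        /* argv points into the input line buffer, so keep our own copy. */
+        char *copy = strdup(argv[i]);
+        if (copy == NULL)
+            return -1;
+        search_path[num_paths++] = copy;
+    }
+    return 0;
+}
+```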
+
+## Redirection
+
+Many times, a shell user prefers to send the output of his/her program to a
+file rather than to the screen. Usually, a shell provides this nice feature
+with the `>` character. Formally, this is called redirection of standard
+output. To make your shell users happy, your shell should also include this
+feature, but with a slight twist (explained below).
+
+For example, if a user types `ls -la /tmp > output`, nothing should be printed
+on the screen. Instead, the standard output of the `ls` program should be
+rerouted to the file `output.out`. In addition, the standard error output of
+the program should be rerouted to the file `output.err` (this is the twist:
+standard redirection with `>` affects only standard output).
+
+If the `output.out` or `output.err` files already exist before you run your
+program, you should simply overwrite them (after truncating). If the output
+file is not specified (e.g., the user types `ls >` without a file), you should
+print an error message and not run the program `ls`.
+
+Here are some redirections that should **not** work:
+```
+ls > out1 out2
+ls > out1 out2 out3
+ls > out1 > out2
+```
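+
+A sketch of checking these rules on an already tokenized command, assuming
+`>` arrives as its own token (see White Spaces below) and that `> file` with
+no command is also an error. `parse_redirection` is a hypothetical name.
+
+```c
+#include <stddef.h>
+#include <string.h>
+
+/* Checks the redirection syntax of a NULL-terminated token array.
+ * Returns 0 if there is no ">", 1 for a valid "cmd ... > file" (then *file
+ * is set and argv is cut at ">" so it can be passed to execv()), and -1 for
+ * a syntax error such as "ls >", "ls > a b", or "ls > a > b". */
+int parse_redirection(char *argv[], char **file) {
+    int n = 0;
+    while (argv[n] != NULL)
+        n++;
+
+    int pos = -1;
+    for (int i = 0; i < n; i++) {
+        if (strcmp(argv[i], ">") == 0) {
+            if (pos != -1)
+                return -1;      /* more than one ">" */
+            pos = i;
+        }
+    }
+    if (pos == -1)
+        return 0;               /* no redirection requested */
+    if (pos == 0 || pos != n - 2)
+        return -1;              /* no command, or not exactly one file name */
+
+    *file = argv[pos + 1];
+    argv[pos] = NULL;           /* drop "> file" from the argument list */
+    return 1;
+}
+```
+
+On a return value of 1, the caller would build `output.out` and `output.err`
+from `*file`; on -1 it prints the error message and does not run the command.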
+
+Note: don't worry about redirection for built-in commands (e.g., we will
+not test what happens when you type `path /bin > file`).
+
+## Parallel Commands
+
+Your shell will also allow the user to launch parallel commands. 
+
+
+## Program Errors
+
+**The one and only error message.** You should print this one and only error
+message whenever you encounter an error of any type:
+
+```
+    char error_message[30] = "An error has occurred\n";
+    write(STDERR_FILENO, error_message, strlen(error_message)); 
+```
+
+The error message should be printed to stderr (standard error). Also,
+do not add whitespace, tabs, or extra error messages.
+
+There is a difference between errors that your shell catches and those that
+the program catches. Your shell should catch all the syntax errors specified
+in this project page. If the syntax of the command looks perfect, you simply
+run the specified program. If there are any program-related errors (e.g.,
+invalid arguments to `ls` when you run it), let the program print its
+specific error messages in any manner it desires (e.g., to stdout or stderr).
+
+## White Spaces
+
+The `>` operator will be separated by spaces.  Valid input may include the
+following:
+
+```
+wish> ls
+wish> ls > a
+wish>    ls    > a
+```
+
+But not this (it is ok if this works, it just doesn't have to):
+
+```
+wish> ls>a
+```
+
+
+## Defensive Programming and Error Messages
+
+Defensive programming is good for you, so do it! It is also required. Your
+program should check all parameters, error-codes, etc. before it trusts
+them. In general, there should be no circumstances in which your C program
+will core dump, hang indefinitely, or prematurely terminate. Therefore, your
+program must respond to all input in a reasonable manner; by "reasonable",
+we mean print the error message (as specified in the next paragraph) and
+either continue processing or exit, depending upon the situation. 
+
+Since your code will be graded with automated testing, you should print this
+*one and only error message* whenever you encounter an error of any type:
+
+```
+    char error_message[30] = "An error has occurred\n";
+    write(STDERR_FILENO, error_message, strlen(error_message));
+```
+
+For this project, the error message should be printed to **stderr**. Also, do
+not attempt to add whitespace, tabs, or extra error messages.
+
+You should consider the following situations as errors; in each case, your
+shell should print the error message to stderr and exit gracefully:
+
+* An incorrect number of command line arguments to your shell program.
+
+For the following situations, you should print the error message to
+stderr and continue processing:
+
+*  A command does not exist or cannot be executed.
+*  A very long command line (over 128 bytes).
+
+Your shell should also be able to handle the following scenarios below, which
+are *not errors.*
+
+* An empty command line.
+* Multiple white spaces on a command line.
+
+## Hints
+
+Writing your shell in a simple manner is a matter of finding the relevant
+library routines and calling them properly.  To simplify things for you in
+this assignment, we will suggest a few library routines you may want to use to
+make your coding easier. You are free to use these routines if you want or to
+disregard our suggestions. To find information on these library routines, look
+at the manual pages.
+
+### Basic Shell
+
+**Parsing:** For reading lines of input, once again check out `getline()`. To
+open a file and get a handle with type `FILE *`, look into `fopen()`. Be sure
+to check the return code of these routines for errors!  You may find the
+`strtok()` routine useful for parsing the command line (i.e., for extracting
+the arguments within a command separated by whitespaces).  
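+
+A sketch of the `strtok()` approach; it also takes care of the extra white
+space described above. `tokenize` and `MAX_ARGS` are hypothetical names.
+Note that `strtok()` modifies the line in place and keeps hidden state
+between calls; POSIX `strtok_r()` is a reentrant alternative.
+
+```c
+#include <stddef.h>
+#include <string.h>
+
+#define MAX_ARGS 64  /* arbitrary cap chosen for this sketch */
+
+/* Splits line in place on spaces, tabs, and newlines, storing pointers to
+ * the words in args[] followed by a NULL terminator (as execv() expects).
+ * strtok() skips runs of delimiters, so "   ls    > a" yields
+ * {"ls", ">", "a", NULL}. Returns the word count, or -1 if there are more
+ * than MAX_ARGS words. */
+int tokenize(char *line, char *args[MAX_ARGS + 1]) {
+    const char *delims = " \t\n";
+    int n = 0;
+    for (char *tok = strtok(line, delims); tok != NULL;
+         tok = strtok(NULL, delims)) {
+        if (n == MAX_ARGS)
+            return -1;
+        args[n++] = tok;
+    }
+    args[n] = NULL;
+    return n;
+}
+```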
+
+**Executing Commands:** Look into `fork`, `exec`, and `wait/waitpid`. See the
+man pages for these functions, and also read the [book chapter](http://www.ostep.org/cpu-api.pdf).
+
+You will note that there are a variety of commands in the `exec` family; for
+this project, you must use `execv`. You should **not** use the `system()`
+library function call to run a command.  Remember that if `execv()` is
+successful, it will not return; if it does return, there was an error (e.g.,
+the command does not exist). The most challenging part is getting the
+arguments correctly specified. The first argument specifies the program that
+should be executed, with the full path specified; this is
+straightforward. The second argument, `char *argv[]`, matches those
+that the program sees in its function prototype:
+
+```c
+int main(int argc, char *argv[]);
+```
+
+Note that this argument is an array of strings, or an array of
+pointers to characters. For example, if you invoke a program with:
+
+```
+foo 205 535 
+```
+
+Assuming that you find `foo` in directory `/bin` (or elsewhere in the defined
+path), then argv[0] = "/bin/foo", argv[1] = "205" and argv[2] = "535".
+
+Important: the list of arguments must be terminated with a NULL pointer; in
+our example, this means argv[3] = NULL. We strongly recommend that you
+carefully check that you are constructing this array correctly! 
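+
+Putting `fork()`, `execv()`, and `waitpid()` together, here is a minimal
+sketch of running one external command. `run_command` and `print_error` are
+hypothetical names, and returning the child's exit status is only a
+convenience of this sketch. The child uses `_exit()` after a failed
+`execv()` so that it does not flush copies of the shell's stdio buffers or
+fall back into the shell's loop.
+
+```c
+#define _POSIX_C_SOURCE 200809L
+#include <stdlib.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+/* The one and only error message, written to stderr. */
+static void print_error(void) {
+    char error_message[30] = "An error has occurred\n";
+    write(STDERR_FILENO, error_message, strlen(error_message));
+}
+
+/* Runs the program at fullpath with the NULL-terminated argv[] in a child
+ * process and waits for it. Returns the child's exit status, or -1 if
+ * fork() or waitpid() failed or the child did not exit normally. */
+int run_command(const char *fullpath, char *argv[]) {
+    pid_t pid = fork();
+    if (pid < 0) {
+        print_error();
+        return -1;
+    }
+    if (pid == 0) {
+        /* Child: on success execv() never returns. */
+        execv(fullpath, argv);
+        print_error();  /* only reached if execv() failed */
+        _exit(1);
+    }
+    /* Parent: wait for this specific child. */
+    int status;
+    if (waitpid(pid, &status, 0) < 0)
+        return -1;
+    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
+}
+```
+
+For the example above, you would build
+`char *argv[] = {"/bin/foo", "205", "535", NULL};` and call
+`run_command(argv[0], argv)`, where `/bin/foo` is the example's hypothetical
+program.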
+
+### Built-in Commands
+
+For the `exit` built-in command, you should simply call `exit()` from within
+your source code. Your shell process will then exit, and its parent (i.e.,
+the shell from which you launched `wish`) will be notified.
+
+For managing the current working directory, you should use `getenv()`,
+`chdir()`, and `getcwd()`. The `getenv()` call is useful when you want to go
+to your HOME directory. The `getcwd()` call is useful to know the current
+working directory, i.e., if a user types `pwd`, you simply call `getcwd()` and
+use those results. Finally, `chdir` is useful for moving to different
+directories. For more information on these topics, read the man pages or the
+Advanced Unix Programming book (Chapters 4 and 7) or look around online.
+
+### Redirection
+
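+Redirection is relatively easy to implement. For example, to redirect standard
+output to a file, just use `close()` on stdout, and then `open()` on a
+file; because `open()` returns the lowest-numbered unused file descriptor,
+the newly opened file takes over descriptor 1 (stdout). More on this below.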
+
+With a file descriptor, you can read from and write to a file. Maybe in
+your life so far, you have only used `fopen()`, `fread()`, and `fwrite()` for
+reading and writing to a file. Unfortunately, these functions work on
+`FILE *`, a C library abstraction that hides the underlying file descriptor.
+
+To work on a file descriptor, you should use `open()`, `read()`, and `write()`
+system calls. These functions perform their work by using file descriptors.
+To understand more about file I/O and file descriptors you can read the
+Advanced Unix Programming book (Chapter 3) (specifically, 3.2 to 3.5, 3.7,
+3.8, and 3.12), or just read the man pages. Before reading forward, at this
+point, you should become more familiar with file descriptors.
+
+The idea of redirection is to make the stdout descriptor point to your output
+file descriptor. First of all, let's understand the STDOUT_FILENO file
+descriptor. When a command `ls -la /tmp` runs, the `ls` program prints its
+output to the screen. But obviously, the `ls` program does not know what a
+screen is. All it knows is that the screen is pointed to by the
+STDOUT_FILENO file descriptor. In other words, you could rewrite
+`printf("hi");` in this way: `write(STDOUT_FILENO, "hi", 2);`.
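+
+A sketch of the close-then-open technique applied to this project's twist,
+meant to run in the child between `fork()` and `execv()`. `redirect_output`
+is a hypothetical name and the 0644 permissions are an assumption. Checking
+that `open()` really returned 1 or 2 guards against a lower descriptor (such
+as 0) happening to be closed; `dup2()` is a common alternative that does not
+rely on that ordering. Note that once stdout or stderr is closed, a failure
+can no longer be reported through it.
+
+```c
+#define _POSIX_C_SOURCE 200809L
+#include <fcntl.h>
+#include <stdio.h>
+#include <unistd.h>
+
+/* Points stdout at "<base>.out" and stderr at "<base>.err", creating or
+ * truncating both. Returns 0 on success, -1 on failure. */
+int redirect_output(const char *base) {
+    char name[4096];  /* generous fixed buffer chosen for this sketch */
+
+    if (snprintf(name, sizeof name, "%s.out", base) >= (int)sizeof name)
+        return -1;                     /* name would be truncated */
+    close(STDOUT_FILENO);              /* frees descriptor 1 ... */
+    if (open(name, O_WRONLY | O_CREAT | O_TRUNC, 0644) != STDOUT_FILENO)
+        return -1;                     /* ... so open() should reuse it */
+
+    if (snprintf(name, sizeof name, "%s.err", base) >= (int)sizeof name)
+        return -1;
+    close(STDERR_FILENO);              /* same trick for descriptor 2 */
+    if (open(name, O_WRONLY | O_CREAT | O_TRUNC, 0644) != STDERR_FILENO)
+        return -1;
+    return 0;
+}
+```
+
+For `ls -la /tmp > output`, the child would call `redirect_output("output")`
+and then `execv()` as usual.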
+
+To check if a particular file exists in a directory, use the `stat()` system
+call. For example, when the user types `ls`, and path is set to include both
+`/bin` and `/usr/bin`, try `stat("/bin/ls")`. If that fails, try
+`stat("/usr/bin/ls")`. If that fails too, print the **only error message**.
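+
+A sketch of that search using `stat()`. `find_in_path` is a hypothetical
+name, and the directory list could be the one your `path` built-in
+maintains. Note that `stat()` also succeeds for directories and
+non-executable files; `access(candidate, X_OK)` is a stricter check if you
+prefer it.
+
+```c
+#define _POSIX_C_SOURCE 200809L
+#include <stdio.h>
+#include <sys/stat.h>
+
+/* Tries "dir/cmd" for each of the n directories, in order. On success,
+ * leaves the full path in out and returns 0; returns -1 if no directory
+ * contains cmd (including when n == 0, i.e. an empty path), in which case
+ * the caller prints the error message. */
+int find_in_path(const char *cmd, char *const dirs[], int n,
+                 char *out, size_t outlen) {
+    struct stat sb;
+    for (int i = 0; i < n; i++) {
+        int len = snprintf(out, outlen, "%s/%s", dirs[i], cmd);
+        if (len < 0 || (size_t)len >= outlen)
+            continue;  /* candidate would not fit: skip it */
+        if (stat(out, &sb) == 0)
+            return 0;
+    }
+    return -1;
+}
+```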
+
+### Miscellaneous Hints
+
+Remember to get the **basic functionality** of your shell working before
+worrying about all of the error conditions and end cases. For example, first
+get a single command running (probably first a command with no arguments, such
+as `ls`). Then try adding more arguments.
+
+Next, try working on multiple commands.  Make sure that you are correctly
+handling all of the cases where there is miscellaneous white space around
+commands or missing commands. Next, add built-in commands. Finally, add
+redirection support. 
+
+We strongly recommend that you check the return codes of all system
+calls from the very beginning of your work. This will often catch
+errors in how you are invoking these new system calls. And, it's just good
+programming sense.
+
+Beat up your own code! You are the best (and in this case, the
+only) tester of this code. Throw lots of junk at it and make sure the
+shell behaves well. Good code comes through testing -- you must run
+all sorts of different tests to make sure things work as
+desired. Don't be gentle -- other users certainly won't be. Break it
+now so we don't have to break it later.
+
+Keep versions of your code. More advanced programmers will use a source
+control system such as git. Minimally, when you get a piece of functionality
+working, make a copy of your .c file (perhaps in a subdirectory with a version
+number, such as v1, v2, etc.). By keeping older, working versions around, you
+can comfortably work on adding new functionality, safe in the knowledge you
+can always go back to an older, working version if need be.
+