# Unix Shell

In this project, you'll build a simple Unix shell. The shell is the heart of
the command-line interface, and thus is central to the Unix/C programming
environment. Mastering use of the shell is necessary to become proficient in
this world; knowing how the shell itself is built is the focus of this
project.

There are three specific objectives to this assignment:

* To further familiarize yourself with the Linux programming environment.
* To learn how processes are created, destroyed, and managed.
* To gain exposure to the necessary functionality in shells.

## Overview

In this assignment, you will implement a *command line interpreter (CLI)* or,
as it is more commonly known, a *shell*. The shell should operate in this
basic way: when you type in a command (in response to its prompt), the shell
creates a child process that executes the command you entered and then prompts
for more user input when it has finished.

The shells you implement will be similar to, but simpler than, the one you run
every day in Unix. You can find out which shell you are running by typing
**echo $SHELL** at a prompt. You may then wish to look at the man pages for
the shell you are running (probably bash) to learn more about all of the
functionality that can be present. For this project, you do not need to
implement too much functionality.

## Program Specifications

### Basic Shell: WiSH

Your basic shell, called **wish**, is basically an interactive loop: it
repeatedly prints a prompt `wish> ` (note the space after the
greater-than sign), parses the input, executes the command specified on that
line of input, and waits for the command to finish. This is repeated until the
user types `exit`.  The name of your final executable should be `wish`:

```
prompt> ./wish
```
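
The first half of that loop (print the prompt, read a line) can be sketched as follows. This is only an illustration: the helper name `read_command` and its stream parameters are assumptions, not part of the project spec. Passing the streams in makes the helper easy to test without a terminal.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: print the prompt, read one line, strip the newline.
 * Returns 1 if a line was read, 0 on end-of-file or read error.
 * getline() manages *line and *cap; the caller frees *line when done. */
int read_command(FILE *in, FILE *out, char **line, size_t *cap)
{
    fputs("wish> ", out);
    fflush(out);                       /* the prompt has no newline, so flush it */

    ssize_t n = getline(line, cap, in);
    if (n < 0)
        return 0;                      /* EOF or error */

    if (n > 0 && (*line)[n - 1] == '\n')
        (*line)[n - 1] = '\0';
    return 1;
}
```

A shell would typically call this as `read_command(stdin, stdout, &line, &cap)` in a `while` loop, starting with `line = NULL` and `cap = 0`.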

You should structure your shell such that it creates a new process for each
new command (note that there are a few exceptions to this, which we discuss
below). There are two advantages of creating a new process. First, it protects
the main shell process from any errors that occur in the new command. Second,
it allows for concurrency; that is, multiple commands can be started and
allowed to execute simultaneously.

Your basic shell should be able to parse a command, and run the program
corresponding to the command.  For example, if the user types `ls -la /tmp`,
your shell should run the program `/bin/ls` with all the given arguments.

You might be wondering how the shell knows to run `/bin/ls` (which means the
program binary `ls` is found in the directory `/bin`) when you type `ls`. The
shell knows this thanks to a **path** variable that the user sets. The path
variable contains the list of all directories to search, in order, when the
user types a command. We'll learn more about how to deal with the path below.

**Important:** Note that the shell itself does not *implement* `ls` or
really many other commands at all. All it does is find those executables in
one of the directories specified by `path` and create a new process to
run them. More on this below.

## Built-in Commands

Whenever your shell accepts a command, it should check whether the command is
a **built-in command** or not. If it is, it should not be executed like other
programs. Instead, your shell will invoke your implementation of the built-in
command. For example, to implement the `exit` built-in command, you simply
call `exit(0);` in your C program.

So far, you have added your own `exit` built-in command. Most Unix shells have
many others such as `cd`, `pwd`, etc.  In this project, you should implement
`exit`, `cd`, `pwd`, and `path`.
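
One way to make the built-in check is a small lookup table. The sketch below is an assumption about how you might organize it; the name `is_builtin` and the table layout are hypothetical, and only the four command names come from this project.

```c
#include <string.h>

/* The four names come from the project text; the table is an assumption. */
static const char *const BUILTINS[] = { "exit", "cd", "pwd", "path" };

/* Returns 1 if name is one of the shell's built-in commands, 0 otherwise. */
int is_builtin(const char *name)
{
    size_t n = sizeof BUILTINS / sizeof BUILTINS[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(name, BUILTINS[i]) == 0)
            return 1;
    }
    return 0;
}
```

The shell would call this on the first token of the line and only fall through to `fork()` and `execv()` when it returns 0.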

The formats for `exit`, `cd`, and `pwd` are:

```
[optional-space]exit[optional-space]
[optional-space]pwd[optional-space]
[optional-space]cd[optional-space]
[optional-space]cd[oneOrMoreSpace]dir[optional-space]
```

When you run `cd` (without arguments), your shell should change the working
directory to the path stored in the $HOME environment variable. Use the call
`getenv("HOME")` in your `wish` source code to obtain this value.

You do not have to support tilde (~). Although in a typical Unix shell you
could go to a user's directory by typing `cd ~username`, in this project you
do not have to deal with tilde. You should treat it like a common character,
i.e., you should just pass the whole word (e.g. "~username") to chdir(), and
chdir will return an error.

Basically, when a user types `pwd`, you simply call `getcwd()` and show the
result. When a user changes the current working directory (e.g., `cd somepath`),
you simply call `chdir()`. Hence, if you run your shell, and then
run pwd, it should look like this:

```
% cd
% pwd
/afs/cs.wisc.edu/u/m/j/username
% echo $PWD
/u/m/j/username
% ./wish
wish> pwd
/afs/cs.wisc.edu/u/m/j/username
```

The format of the `path` built-in command is:
```
    [optionalSpace]path[oneOrMoreSpace]dir[optionalSpace] (and possibly more directories, space separated)
```

A typical usage would be like this:

```
wish> path /bin /usr/bin
```

By doing this, your shell will know to look in `/bin` and `/usr/bin`
when a user types a command, to see if it can find the proper binary to
execute. If the user sets path to be empty, then the shell should not be able
to run any programs unless XXX (but built-in commands, such as path, should
still work).

## Redirection

Many times, a shell user prefers to send the output of a program to a
file rather than to the screen. Usually, a shell provides this nice feature
with the `>` character. Formally, this is called redirection of standard
output. To make your shell users happy, your shell should also include this
feature, but with a slight twist (explained below).

For example, if a user types `ls -la /tmp > output`, nothing should be printed
on the screen. Instead, the standard output of the `ls` program should be
rerouted to the `output.out` file. In addition, the standard error output of
the program should be rerouted to the file `output.err` (the twist is that this
is a little different from standard redirection).

If the `output.out` or `output.err` files already exist before you run your
program, you should simply overwrite them (after truncating).  If the output
file is not specified (e.g., the user types `ls >` without a file), you should
print an error message and not run the program `ls`.

Here are some redirections that should **not** work:
```
ls > out1 out2
ls > out1 out2 out3
ls > out1 > out2
```

Note: don't worry about redirection for built-in commands (e.g., we will
not test what happens when you type `path /bin > file`).
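
The malformed cases above can be caught before any process is created. Here is a sketch that works on an already-tokenized line; the name `parse_redirect` and its out-parameters are hypothetical, and rejecting a line that starts with `>` is an assumption the text does not spell out.

```c
#include <string.h>

/* Hypothetical check over an already-tokenized command line.
 * On success returns 0, sets *ncmd to the number of tokens before ">"
 * (or ntok if there is no ">") and *outfile to the target (or NULL).
 * Returns -1 for malformed redirections. */
int parse_redirect(char *tok[], int ntok, int *ncmd, char **outfile)
{
    *ncmd = ntok;
    *outfile = NULL;

    for (int i = 0; i < ntok; i++) {
        if (strcmp(tok[i], ">") != 0)
            continue;
        if (i == 0)
            return -1;                 /* assumption: no command before ">" */
        if (i != ntok - 2)
            return -1;                 /* need exactly one token after ">" */
        if (strcmp(tok[i + 1], ">") == 0)
            return -1;                 /* "ls > >" has no file name */
        *ncmd = i;
        *outfile = tok[i + 1];
        return 0;
    }
    return 0;                          /* no redirection on this line */
}
```

Requiring `>` to be the second-to-last token rejects `ls >`, `ls > out1 out2`, and `ls > out1 > out2` with a single test.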

## Parallel Commands

Your shell will also allow the user to launch parallel commands.


## Program Errors

**The one and only error message.** You should print this one and only error
message whenever you encounter an error of any type:

```
    char error_message[30] = "An error has occurred\n";
    write(STDERR_FILENO, error_message, strlen(error_message));
```

The error message should be printed to stderr (standard error). Also,
do not add whitespaces or tabs or extra error messages.

There is a difference between errors that your shell catches and those that
the program catches. Your shell should catch all the syntax errors specified
in this project page. If the syntax of the command looks perfect, you simply
run the specified program. If there are any program-related errors (e.g.,
invalid arguments to `ls` when you run it), let the program
print its specific error messages in any manner it desires (e.g., to
stdout or stderr).

## White Spaces

The `>` operator will be separated by spaces.  Valid input may include the
following:

```
wish> ls
wish> ls > a
wish>    ls    > a
```

But not this (it is ok if this works, it just doesn't have to):

```
wish> ls>a
```


## Defensive Programming and Error Messages

Defensive programming is good for you, so do it! It is also required. Your
program should check all parameters, error-codes, etc. before it trusts
them. In general, there should be no circumstances in which your C program
will core dump, hang indefinitely, or prematurely terminate. Therefore, your
program must respond to all input in a reasonable manner; by "reasonable",
we mean print the error message (as specified in the next paragraph) and
either continue processing or exit, depending upon the situation.

Since your code will be graded with automated testing, you should print this
*one and only error message* whenever you encounter an error of any type:

```
    char error_message[30] = "An error has occurred\n";
    write(STDERR_FILENO, error_message, strlen(error_message));
```

For this project, the error message should be printed to **stderr**.  Also, do
not attempt to add whitespaces or tabs or extra error messages.

You should consider the following situations as errors; in each case, your
shell should print the error message to stderr and exit gracefully:

* An incorrect number of command line arguments to your shell program.

For the following situations, you should print the error message to
stderr and continue processing:

*  A command does not exist or cannot be executed.
*  A very long command line (over 128 bytes).

Your shell should also be able to handle the following scenarios, which
are *not errors.*

* An empty command line.
* Multiple white spaces on a command line.
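
A quick pre-check on each input line can separate these cases before parsing. The sketch below is one possible reading of the rules: the names `classify_line` and `line_kind` are hypothetical, and whether the trailing newline counts toward the 128 bytes is not stated, so here the limit is applied to whatever string is passed in.

```c
#include <string.h>

#define MAX_LINE_BYTES 128             /* limit from the list above */

enum line_kind { LINE_EMPTY, LINE_TOO_LONG, LINE_OK };

/* Hypothetical pre-check for one input line. */
enum line_kind classify_line(const char *line)
{
    if (strlen(line) > MAX_LINE_BYTES)
        return LINE_TOO_LONG;          /* print the error, keep prompting */

    /* A line of only spaces, tabs, or newlines is empty: not an error. */
    if (line[strspn(line, " \t\n")] == '\0')
        return LINE_EMPTY;

    return LINE_OK;
}
```

For `LINE_EMPTY` the shell would simply print the next prompt; for `LINE_TOO_LONG` it would print the error message first.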

## Hints

Writing your shell in a simple manner is a matter of finding the relevant
library routines and calling them properly.  To simplify things for you in
this assignment, we will suggest a few library routines you may want to use to
make your coding easier. You are free to use these routines if you want or to
disregard our suggestions. To find information on these library routines, look
at the manual pages.

### Basic Shell

**Parsing:** For reading lines of input, once again check out `getline()`. To
open a file and get a handle with type `FILE *`, look into `fopen()`. Be sure
to check the return code of these routines for errors!  You may find the
`strtok()` routine useful for parsing the command line (i.e., for extracting
the arguments within a command separated by whitespaces).
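
As a rough sketch of the `strtok()` approach, the hypothetical helper below (the name `tokenize` and its parameters are assumptions) splits a line in place and leaves the result in a form that can be passed directly to `execv()`.

```c
#include <string.h>

/* Split line in place on spaces, tabs, and newlines.
 * Stores up to max-1 token pointers in argv, then a terminating NULL.
 * Returns the token count, or -1 if there are too many tokens. */
int tokenize(char *line, char *argv[], int max)
{
    int n = 0;
    for (char *t = strtok(line, " \t\n"); t != NULL; t = strtok(NULL, " \t\n")) {
        if (n >= max - 1)
            return -1;
        argv[n++] = t;
    }
    argv[n] = NULL;                    /* execv() needs this terminator */
    return n;
}
```

Because `strtok()` skips runs of delimiters, leading, trailing, and repeated whitespace need no special handling, and an all-whitespace line simply yields zero tokens. Note that `strtok()` modifies the buffer, so the tokens are only valid until the line is read again.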

**Executing Commands:** Look into `fork`, `exec`, and `wait/waitpid`.  See the
man pages for these functions, and also read the relevant [book chapter](http://www.ostep.org/cpu-api.pdf).

You will note that there are a variety of commands in the `exec` family; for
this project, you must use `execv`. You should **not** use the `system()`
library function call to run a command.  Remember that if `execv()` is
successful, it will not return; if it does return, there was an error (e.g.,
the command does not exist). The most challenging part is getting the
arguments correctly specified. The first argument specifies the program that
should be executed, with the full path specified; this is
straightforward. The second argument, `char *argv[]`, matches the arguments
that the program sees in its function prototype:

```c
int main(int argc, char *argv[]);
```

Note that this argument is an array of strings, or an array of
pointers to characters. For example, if you invoke a program with:

```
foo 205 535
```

Assuming that you find `foo` in directory `/bin` (or elsewhere in the defined
path), then `argv[0] = "/bin/foo"`, `argv[1] = "205"`, and `argv[2] = "535"`.

Important: the list of arguments must be terminated with a NULL pointer; in
our example, this means argv[3] = NULL. We strongly recommend that you
carefully check that you are constructing this array correctly!
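
Putting `fork()`, `execv()`, and `waitpid()` together might look like the sketch below. The helper name `run_and_wait` and the exit code 127 for a failed `execv()` are assumptions chosen for illustration; the error text is the one required by this project.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static const char error_message[] = "An error has occurred\n";

/* Hypothetical helper: run argv[0] (a full path) with arguments argv and
 * wait for it. Returns the child's exit status (127 here if execv() failed),
 * or -1 if fork()/waitpid() failed or the child did not exit normally.
 * argv must end with a NULL pointer. */
int run_and_wait(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                     /* fork failed */

    if (pid == 0) {
        execv(argv[0], argv);
        /* Only reached if execv() failed. */
        write(STDERR_FILENO, error_message, sizeof error_message - 1);
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child calls `_exit()` rather than `exit()` after a failed `execv()` so that it does not flush a copy of the parent's buffered output.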

### Built-in Commands

For the `exit` built-in command, you should simply call `exit()` from within
your source code.  The corresponding shell process will exit, and the parent
(i.e. your shell) will be notified.

For managing the current working directory, you should use `getenv()`,
`chdir()`, and `getcwd()`. The `getenv()` call is useful when you want to go
to your HOME directory. The `getcwd()` call is useful to know the current
working directory, i.e., if a user types `pwd`, you simply call `getcwd()` and
use those results. Finally, `chdir` is useful for moving to different
directories. For more information on these topics, read the man pages or the
Advanced Unix Programming book (Chapters 4 and 7) or look around online.

### Redirection

Redirection is relatively easy to implement. For example, to redirect standard
output to a file, just use `close()` on stdout, and then `open()` on a
file. More on this below.

With a file descriptor, you can perform read and write to a file. Maybe in
your life so far, you have only used `fopen()`, `fread()`, and `fwrite()` for
reading and writing to a file. Unfortunately, these functions work on `FILE
*`, which is more of a C library support; the file descriptors are hidden.

To work on a file descriptor, you should use `open()`, `read()`, and `write()`
system calls. These functions perform their work by using file descriptors.
To understand more about file I/O and file descriptors you can read the
Advanced Unix Programming book (Chapter 3) (specifically, 3.2 to 3.5, 3.7,
3.8, and 3.12), or just read the man pages. Before reading forward, at this
point, you should become more familiar with file descriptors.

The idea of redirection is to make the stdout descriptor point to your output
file descriptor. First of all, let's understand the STDOUT_FILENO file
descriptor.  When a command `ls -la /tmp` runs, the `ls` program prints its
output to the screen. But obviously, the ls program does not know what a
screen is. All it knows is that the screen is basically pointed to by the
STDOUT_FILENO file descriptor. In other words, you could rewrite
`printf("hi");` in this way: `write(STDOUT_FILENO, "hi", 2);`.

To check if a particular file exists in a directory, use the `stat()` system
call. For example, when the user types `ls`, and path is set to include both
`/bin` and `/usr/bin`, try calling `stat()` on `/bin/ls`. If that fails, try
`/usr/bin/ls`. If that fails too, print the **only error message**.
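
The lookup could be sketched as below. The function name `find_in_path` and its parameters are hypothetical. Note that `stat()` takes a second argument, a `struct stat *` that receives the file's metadata; the extra `S_ISREG` test (so that a directory with the same name is not accepted) is an addition of this sketch.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical lookup: try "<dir>/<cmd>" for each directory in order and
 * leave the first one that exists as a regular file in out.
 * Returns 0 if found, -1 otherwise (the caller prints the error message).
 * On failure the contents of out are unspecified. */
int find_in_path(const char *cmd, char *const dirs[], size_t ndirs,
                 char *out, size_t outsz)
{
    struct stat sb;

    for (size_t i = 0; i < ndirs; i++) {
        int n = snprintf(out, outsz, "%s/%s", dirs[i], cmd);
        if (n < 0 || (size_t)n >= outsz)
            continue;                  /* candidate does not fit, skip it */
        if (stat(out, &sb) == 0 && S_ISREG(sb.st_mode))
            return 0;
    }
    return -1;
}
```

With an empty path (`ndirs == 0`) the loop never runs and every lookup fails, which matches the required behavior for an empty path. `stat()` does not check execute permission; `access()` with `X_OK` is an alternative if you want that check before calling `execv()`.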

### Miscellaneous Hints

Remember to get the **basic functionality** of your shell working before
worrying about all of the error conditions and end cases. For example, first
get a single command running (probably first a command with no arguments, such
as `ls`). Then try adding more arguments.

Next, try working on multiple commands.  Make sure that you are correctly
handling all of the cases where there is miscellaneous white space around
commands or missing commands. Next, add built-in commands. Finally, add
redirection support.

We strongly recommend that you check the return codes of all system
calls from the very beginning of your work. This will often catch
errors in how you are invoking these new system calls. And, it's just good
programming sense.

Beat up your own code! You are the best (and in this case, the
only) tester of this code. Throw lots of junk at it and make sure the
shell behaves well. Good code comes through testing -- you must run
all sorts of different tests to make sure things work as
desired. Don't be gentle -- other users certainly won't be. Break it
now so we don't have to break it later.

Keep versions of your code. More advanced programmers will use a source
control system such as git. Minimally, when you get a piece of functionality
working, make a copy of your .c file (perhaps into a subdirectory with a version
number, such as v1, v2, etc.). By keeping older, working versions around, you
can comfortably work on adding new functionality, safe in the knowledge you
can always go back to an older, working version if need be.