Initial cut at shell; missing lots of stuff
processes-shell/README.md
# Unix Shell

In this project, you'll build a simple Unix shell. The shell is the heart of
the command-line interface, and thus is central to the Unix/C programming
environment. Mastering use of the shell is necessary to become proficient in
this world; knowing how the shell itself is built is the focus of this
project.

There are three specific objectives to this assignment:

* To further familiarize yourself with the Linux programming environment.
* To learn how processes are created, destroyed, and managed.
* To gain exposure to the necessary functionality in shells.

## Overview

In this assignment, you will implement a *command line interpreter (CLI)* or,
as it is more commonly known, a *shell*. The shell should operate in this
basic way: when you type in a command (in response to its prompt), the shell
creates a child process that executes the command you entered and then prompts
for more user input when it has finished.

The shell you implement will be similar to, but simpler than, the one you run
every day in Unix. You can find out which shell you are running by typing
**echo $SHELL** at a prompt. You may then wish to look at the man pages for
the shell you are running (probably bash) to learn more about all of the
functionality that can be present. For this project, you do not need to
implement too much functionality.

## Program Specifications

### Basic Shell: WiSH

Your basic shell, called **wish**, is essentially an interactive loop: it
repeatedly prints a prompt `wish> ` (note the space after the
greater-than sign), parses the input, executes the command specified on that
line of input, and waits for the command to finish. This is repeated until the
user types `exit`. The name of your final executable should be `wish`:

```
prompt> ./wish
```

You should structure your shell such that it creates a new process for each
new command (note that there are a few exceptions to this, which we discuss
below). There are two advantages of creating a new process. First, it protects
the main shell process from any errors that occur in the new command. Second,
it allows for concurrency; that is, multiple commands can be started and
allowed to execute simultaneously.

Your basic shell should be able to parse a command, and run the program
corresponding to the command. For example, if the user types `ls -la /tmp`,
your shell should run the program `/bin/ls` with all the given arguments.

You might be wondering how the shell knows to run `/bin/ls` (which means the
program binary `ls` is found in the directory `/bin`) when you type `ls`. The
shell knows this thanks to a **path** variable that the user sets. The path
variable contains the list of all directories to search, in order, when the
user types a command. We'll learn more about how to deal with the path below.

**Important:** Note that the shell itself does not *implement* `ls` or
really many other commands at all. All it does is find those executables in
one of the directories specified by `path` and create a new process to
run them. More on this below.

## Built-in Commands

Whenever your shell accepts a command, it should check whether the command is
a **built-in command** or not. If it is, it should not be executed like other
programs. Instead, your shell will invoke your implementation of the built-in
command. For example, to implement the `exit` built-in command, you simply
call `exit(0);` in your C program.

So far, you have added your own `exit` built-in command. Most Unix shells have
many others such as `cd`, `pwd`, etc. In this project, you should implement
`exit`, `cd`, `pwd`, and `path`.
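
As a minimal sketch of that check (one possible approach, not a required
design), you could compare the first word of the command against a fixed list.
The helper name `is_builtin` is made up for this illustration:

```c
#include <string.h>

/* Hypothetical helper: returns 1 if `cmd` names one of the four built-ins
 * this project asks for, 0 otherwise. */
int is_builtin(const char *cmd) {
    static const char *const builtins[] = { "exit", "cd", "pwd", "path" };
    for (size_t i = 0; i < sizeof builtins / sizeof builtins[0]; i++) {
        if (strcmp(cmd, builtins[i]) == 0) {
            return 1;
        }
    }
    return 0;
}
```

The shell would call this on the first word of each command before deciding
whether to create a new process.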

The formats for `exit`, `cd`, and `pwd` are:

```
[optional-space]exit[optional-space]
[optional-space]pwd[optional-space]
[optional-space]cd[optional-space]
[optional-space]cd[oneOrMoreSpace]dir[optional-space]
```

When you run `cd` (without arguments), your shell should change the working
directory to the path stored in the `$HOME` environment variable. Use the call
`getenv("HOME")` in your `wish` source code to obtain this value.

You do not have to support tilde (~). Although in a typical Unix shell you
could go to a user's directory by typing `cd ~username`, in this project you
do not have to deal with tilde. You should treat it like a common character,
i.e., you should just pass the whole word (e.g., "~username") to `chdir()`, and
`chdir()` will return an error.
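
The `cd` rules above might be sketched as follows. `builtin_cd` is a
hypothetical helper name, and treating an unset `HOME` as an error is our
assumption; the text above does not say what should happen in that case.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper for the `cd` built-in.
 * `dir` is the single argument, or NULL when `cd` was typed alone.
 * Returns 0 on success and -1 on failure, so the caller can print
 * the error message. */
int builtin_cd(const char *dir) {
    if (dir == NULL) {
        dir = getenv("HOME");   /* plain `cd` goes to $HOME */
        if (dir == NULL) {
            return -1;          /* assumption: unset HOME is an error */
        }
    }
    /* No tilde expansion: "~username" is passed through unchanged, so
     * chdir() fails unless a directory with that literal name exists. */
    return chdir(dir) == 0 ? 0 : -1;
}
```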

Basically, when a user types `pwd`, you simply call `getcwd()` and show the
result. When a user changes the current working directory (e.g.,
`cd somepath`), you simply call `chdir()`. Hence, if you run your shell, and
then run `pwd`, it should look like this:

```
% cd
% pwd
/afs/cs.wisc.edu/u/m/j/username
% echo $PWD
/u/m/j/username
% ./wish
wish> pwd
/afs/cs.wisc.edu/u/m/j/username
```

The format of the `path` built-in command is:

```
[optional-space]path[oneOrMoreSpace]dir[optional-space] (and possibly more directories, space separated)
```

A typical usage would be like this:

```
wish> path /bin /usr/bin
```

By doing this, your shell will know to look in `/bin` and `/usr/bin`
when a user types a command, to see if it can find the proper binary to
execute. If the user sets path to be empty, then the shell should not be able
to run any programs unless XXX (but built-in commands, such as path, should
still work).

## Redirection

Many times, a shell user prefers to send the output of their program to a
file rather than to the screen. Usually, a shell provides this nice feature
with the `>` character. Formally, this is called redirection of standard
output. To make your shell users happy, your shell should also include this
feature, but with a slight twist (explained below).

For example, if a user types `ls -la /tmp > output`, nothing should be printed
on the screen. Instead, the standard output of the `ls` program should be
rerouted to the `output.out` file. In addition, the standard error output of
the program should be rerouted to the file `output.err` (the twist is that this
is a little different than standard redirection).

If the `output.out` or `output.err` files already exist before you run your
program, you should simply overwrite them (after truncating). If the output
file is not specified (e.g., the user types `ls >` without a file), you should
print an error message and not run the program `ls`.

Here are some redirections that should **not** work:

```
ls > out1 out2
ls > out1 out2 out3
ls > out1 > out2
```

Note: don't worry about redirection for built-in commands (e.g., we will
not test what happens when you type `path /bin > file`).
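
The redirection rules in this section could be checked on an already-tokenized
command roughly as shown here. The function name `parse_redirect` and its
return convention are our own, and rejecting a line that starts with `>` (no
command at all) is an assumption that the text does not spell out.

```c
#include <string.h>

/* Hypothetical helper: scans the `n` whitespace-separated tokens of one
 * command for the `>` operator.
 * Returns -1 for a malformed redirection, 0 when there is no redirection,
 * and 1 when there is a valid one; in that case *target is set to the
 * file name and *argc_out to the number of tokens before `>`. */
int parse_redirect(char *tokens[], int n, char **target, int *argc_out) {
    *target = NULL;
    *argc_out = n;
    for (int i = 0; i < n; i++) {
        if (strcmp(tokens[i], ">") != 0) {
            continue;
        }
        /* Valid only as: command [args] > file, so the first `>` must be
         * the second-to-last token and must follow at least one token. */
        if (i == 0 || i != n - 2 || strcmp(tokens[n - 1], ">") == 0) {
            return -1;
        }
        *target = tokens[n - 1];
        *argc_out = i;
        return 1;
    }
    return 0;
}
```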

## Parallel Commands

Your shell will also allow the user to launch parallel commands.

## Program Errors

**The one and only error message.** You should print this one and only error
message whenever you encounter an error of any type:

```
char error_message[30] = "An error has occurred\n";
write(STDERR_FILENO, error_message, strlen(error_message));
```

The error message should be printed to stderr (standard error). Also,
do not add whitespace, tabs, or extra error messages.

There is a difference between errors that your shell catches and those that
the program catches. Your shell should catch all the syntax errors specified
in this project page. If the syntax of the command looks perfect, you simply
run the specified program. If there are any program-related errors (e.g.,
invalid arguments to `ls` when you run it), let the program print its
specific error messages in any manner it desires (e.g., could be
stdout or stderr).

## White Spaces

The `>` operator will be separated by spaces. Valid input may include the
following:

```
wish> ls
wish> ls > a
wish> ls     >     a
```

But not this (it is ok if this works, it just doesn't have to):

```
wish> ls>a
```

## Defensive Programming and Error Messages

Defensive programming is good for you, so do it! It is also required. Your
program should check all parameters, error-codes, etc. before it trusts
them. In general, there should be no circumstances in which your C program
will core dump, hang indefinitely, or prematurely terminate. Therefore, your
program must respond to all input in a reasonable manner; by "reasonable",
we mean print the error message (as specified in the next paragraph) and
either continue processing or exit, depending upon the situation.

Since your code will be graded with automated testing, you should print this
*one and only error message* whenever you encounter an error of any type:

```
char error_message[30] = "An error has occurred\n";
write(STDERR_FILENO, error_message, strlen(error_message));
```

For this project, the error message should be printed to **stderr**. Also, do
not attempt to add whitespace, tabs, or extra error messages.

You should consider the following situations as errors; in each case, your
shell should print the error message to stderr and exit gracefully:

* An incorrect number of command line arguments to your shell program.

For the following situations, you should print the error message to
stderr and continue processing:

* A command does not exist or cannot be executed.
* A very long command line (over 128 bytes).

Your shell should also be able to handle the following scenarios, which
are *not errors*:

* An empty command line.
* Multiple white spaces on a command line.

## Hints

Writing your shell in a simple manner is a matter of finding the relevant
library routines and calling them properly. To simplify things for you in
this assignment, we will suggest a few library routines you may want to use to
make your coding easier. You are free to use these routines if you want or to
disregard our suggestions. To find information on these library routines, look
at the manual pages.

### Basic Shell

**Parsing:** For reading lines of input, once again check out `getline()`. To
open a file and get a handle with type `FILE *`, look into `fopen()`. Be sure
to check the return code of these routines for errors! You may find the
`strtok()` routine useful for parsing the command line (i.e., for extracting
the arguments within a command separated by whitespace).
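
A minimal sketch of the `strtok()` approach is shown here; the function name
`split_line` and the idea of a caller-supplied token array are assumptions
made for illustration. It operates on a writable copy of the line (for
instance, the buffer filled in by `getline()`), because `strtok()` modifies
its input.

```c
#include <string.h>

/* Hypothetical helper: splits `line` in place into at most `max - 1`
 * whitespace-separated tokens and NULL-terminates the array, so that it
 * can later be handed to execv(). `max` must be at least 1.
 * Returns the token count, or -1 if there are too many tokens. */
int split_line(char *line, char *tokens[], int max) {
    int n = 0;
    /* strtok() skips runs of delimiters, so repeated spaces and tabs
     * never produce empty tokens. */
    for (char *tok = strtok(line, " \t\n"); tok != NULL;
         tok = strtok(NULL, " \t\n")) {
        if (n >= max - 1) {
            return -1;
        }
        tokens[n++] = tok;
    }
    tokens[n] = NULL;
    return n;
}
```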

**Executing Commands:** Look into `fork`, `exec`, and `wait`/`waitpid`. See the
man pages for these functions, and also read the [book chapter](http://www.ostep.org/cpu-api.pdf).

You will note that there are a variety of functions in the `exec` family; for
this project, you must use `execv`. You should **not** use the `system()`
library function call to run a command. Remember that if `execv()` is
successful, it will not return; if it does return, there was an error (e.g.,
the command does not exist). The most challenging part is getting the
arguments correctly specified. The first argument specifies the program that
should be executed, with the full path specified; this is
straightforward. The second argument, `char *argv[]`, matches the arguments
that the program sees in its function prototype:

```c
int main(int argc, char *argv[]);
```

Note that this argument is an array of strings, or an array of
pointers to characters. For example, if you invoke a program with:

```
foo 205 535
```

Assuming that you find `foo` in directory `/bin` (or elsewhere in the defined
path), then `argv[0] = "/bin/foo"`, `argv[1] = "205"`, and `argv[2] = "535"`.

Important: the list of arguments must be terminated with a NULL pointer; in
our example, this means `argv[3] = NULL`. We strongly recommend that you
carefully check that you are constructing this array correctly!
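
Putting `fork()`, `execv()`, and `waitpid()` together might look like the
sketch below. The helper name `run_command` is hypothetical, and the exit code
127 for a failed `execv()` is a common shell convention that we assume here,
not something this project requires.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs the program at `path` with the NULL-terminated
 * argument array `argv` in a child process and waits for it to finish.
 * Returns the child's exit status (127 if execv() failed), or -1 if
 * fork() or waitpid() failed or the child was killed by a signal. */
int run_command(const char *path, char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;              /* fork failed */
    }
    if (pid == 0) {
        execv(path, argv);      /* returns only on failure */
        _exit(127);             /* leave the child without running the shell loop */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In a real shell, the child would also print the error message before
`_exit()`; that step is left out to keep the sketch short.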

### Built-in Commands

For the `exit` built-in command, you should simply call `exit()` from within
your source code. The corresponding shell process will exit, and its parent
(i.e., the shell from which you started `wish`) will be notified.

For managing the current working directory, you should use `getenv()`,
`chdir()`, and `getcwd()`. The `getenv()` call is useful when you want to go
to your HOME directory. The `getcwd()` call is useful to know the current
working directory, i.e., if a user types `pwd`, you simply call `getcwd()` and
use those results. Finally, `chdir()` is useful for moving to different
directories. For more information on these topics, read the man pages or the
Advanced Unix Programming book (Chapters 4 and 7) or look around online.

### Redirection

Redirection is relatively easy to implement. For example, to redirect standard
output to a file, just use `close()` on stdout, and then `open()` on a
file. More on this below.

With a file descriptor, you can read from and write to a file. Maybe in
your life so far, you have only used `fopen()`, `fread()`, and `fwrite()` for
reading and writing to a file. Unfortunately, these functions work on `FILE *`,
which is a C library abstraction; the file descriptors are hidden.

To work on a file descriptor, you should use the `open()`, `read()`, and
`write()` system calls. These functions perform their work by using file
descriptors. To understand more about file I/O and file descriptors you can
read the Advanced Unix Programming book (Chapter 3) (specifically, 3.2 to 3.5,
3.7, 3.8, and 3.12), or just read the man pages. Before reading forward, at
this point, you should become more familiar with file descriptors.

The idea of redirection is to make the stdout descriptor point to your output
file descriptor. First of all, let's understand the `STDOUT_FILENO` file
descriptor. When a command `ls -la /tmp` runs, the `ls` program prints its
output to the screen. But obviously, the `ls` program does not know what a
screen is. All it knows is that the screen is basically pointed to by the
`STDOUT_FILENO` file descriptor. In other words, you could rewrite
`printf("hi");` in this way: `write(STDOUT_FILENO, "hi", 2);`.
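
Here is a rough sketch of the redirection step, intended to run in the child
process after `fork()` and before `execv()`. Note that it uses `open()`
followed by `dup2()`, which is a variation on the `close()`-then-`open()`
approach mentioned above, not the same technique. The name `redirect_output`,
the 4096-byte name buffer, and the `0644` file mode are all assumptions.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: sends stdout to "<base>.out" and stderr to
 * "<base>.err". Returns 0 on success, -1 on failure. */
int redirect_output(const char *base) {
    static const char *const suffix[2] = { ".out", ".err" };
    static const int target[2] = { STDOUT_FILENO, STDERR_FILENO };
    char name[4096];

    for (int i = 0; i < 2; i++) {
        int len = snprintf(name, sizeof name, "%s%s", base, suffix[i]);
        if (len < 0 || (size_t)len >= sizeof name) {
            return -1;          /* file name would not fit */
        }
        /* O_TRUNC empties an existing file; O_CREAT makes a new one. */
        int fd = open(name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) {
            return -1;
        }
        if (fd != target[i]) {
            /* Make descriptor 1 (or 2) refer to the opened file. */
            if (dup2(fd, target[i]) < 0) {
                close(fd);
                return -1;
            }
            close(fd);          /* the extra descriptor is no longer needed */
        }
    }
    return 0;
}
```

For `ls -la /tmp > output`, the child would call `redirect_output("output")`
and then `execv()`; the parent's own descriptors are untouched.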

To check if a particular file exists in a directory, use the `stat()` system
call. For example, when the user types `ls`, and path is set to include both
`/bin` and `/usr/bin`, try calling `stat()` on `/bin/ls`. If that fails, try
`/usr/bin/ls`. If that fails too, print the **only error message**.
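
The lookup just described could be sketched like this. `find_in_path` and its
parameters are hypothetical; the directory list is passed in explicitly so the
example stands on its own. Keep in mind that `stat()` only tells you the name
exists, not that it is executable, so `execv()` can still fail afterwards.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: looks for `cmd` in each of the `n` directories in
 * `dirs`, in order. On success, writes the full path into `out` and
 * returns 0; returns -1 if no directory contains it. */
int find_in_path(const char *cmd, char *const dirs[], int n,
                 char *out, size_t out_size) {
    struct stat sb;
    for (int i = 0; i < n; i++) {
        int len = snprintf(out, out_size, "%s/%s", dirs[i], cmd);
        if (len < 0 || (size_t)len >= out_size) {
            continue;           /* candidate path too long; skip it */
        }
        if (stat(out, &sb) == 0) {
            return 0;           /* first match wins */
        }
    }
    return -1;                  /* caller prints the error message */
}
```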

### Miscellaneous Hints

Remember to get the **basic functionality** of your shell working before
worrying about all of the error conditions and end cases. For example, first
get a single command running (probably first a command with no arguments, such
as `ls`). Then try adding more arguments.

Next, try working on multiple commands. Make sure that you are correctly
handling all of the cases where there is miscellaneous white space around
commands or missing commands. Next, add built-in commands. Finally, add
redirection support.

We strongly recommend that you check the return codes of all system
calls from the very beginning of your work. This will often catch
errors in how you are invoking these new system calls. And, it's just good
programming sense.

Beat up your own code! You are the best (and in this case, the
only) tester of this code. Throw lots of junk at it and make sure the
shell behaves well. Good code comes through testing -- you must run
all sorts of different tests to make sure things work as
desired. Don't be gentle -- other users certainly won't be. Break it
now so we don't have to break it later.

Keep versions of your code. More advanced programmers will use a source
control system such as git. Minimally, when you get a piece of functionality
working, make a copy of your .c file (perhaps a subdirectory with a version
number, such as v1, v2, etc.). By keeping older, working versions around, you
can comfortably work on adding new functionality, safe in the knowledge you
can always go back to an older, working version if need be.