Initial pzip
70
concurrency-pzip/README.md
Normal file
@@ -0,0 +1,70 @@
# Parallel Zip

In an earlier project, you implemented a simple compression tool based on
run-length encoding, known simply as `zip`. Here, you'll implement something
similar, except you'll use threads to make a parallel version of `zip`. We'll
call this version ... wait for it ... `pzip`.

There are three specific objectives to this assignment:

* To familiarize yourself with Linux pthreads.
* To learn how to parallelize a program.
* To learn how to program for high performance.

## Overview

First, recall how `zip` works by reading the description
[here](https://github.com/remzi-arpacidusseau/ostep-projects/tree/master/initial-utilities).
You'll use the same basic specification, with run-length encoding as the basic
technique.

Your parallel zip (`pzip`) will look the same externally; the general usage
from the command line will be as follows:

```
prompt> ./pzip file > file.z
```

As before, there may be many input files (not just one, as above). However,
internally, the program will use POSIX threads to parallelize the compression
process.

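To make the shared specification concrete, here is a minimal single-threaded sketch of run-length encoding in C. It assumes the record layout from the linked project is a 4-byte binary count followed by the byte itself; check the linked description before relying on that. `rle_encode` is a hypothetical helper name, not part of any provided code, and write errors are not checked.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: run-length encode buf[0..len) to `out`.
 * Assumed record layout: 4-byte binary run length, then the byte.
 * Returns the number of records written. */
size_t rle_encode(const unsigned char *buf, size_t len, FILE *out)
{
    size_t records = 0;
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        uint32_t run = 0;

        /* Extend the run while the same byte repeats. */
        while (i < len && buf[i] == c) {
            run++;
            i++;
        }
        fwrite(&run, sizeof run, 1, out);  /* 4-byte count */
        fwrite(&c, 1, 1, out);             /* the byte itself */
        records++;
    }
    return records;
}
```

For the six-byte input `aaabbc`, this writes three records (3 `a`, 2 `b`, 1 `c`), or 15 bytes in total.
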
## Considerations

Parallelizing the compression effectively and with high performance will
require you to address (at least) the following issues:

- **How to parallelize the compression.** Of course, the central challenge of
  this project is to parallelize the compression process. Think about what
  can be done in parallel, and what must be done serially by a single
  thread, and design your parallel zip as appropriate.

- **How to determine how many threads to create.** On Linux, this means using
  interfaces like `get_nprocs()` and `get_nprocs_conf()`; read the man pages
  for more details. Then, create threads to match the number of CPU
  resources available.

- **How to efficiently perform each piece of work.** While parallelization
  will yield a speedup, each thread's efficiency in performing the
  compression is also of critical importance. Thus, making the core
  compression loop as CPU-efficient as possible is needed for high
  performance.

- **How to access the input file efficiently.** On Linux, there are many ways
  to read from a file, including C standard library calls like `fread()` and
  raw system calls like `read()`. One particularly efficient way is to use
  memory-mapped files, available via `mmap()`. By mapping the input file
  into the address space, you can then access bytes of the input file via
  pointers and do so quite efficiently.

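The split between parallel and serial work, and the choice of thread count, can be sketched as follows. This is only one possible shape, not the required design: it counts runs rather than emitting compressed output, and `struct job`, `count_runs`, `count_runs_parallel`, and `default_workers` are hypothetical names introduced for illustration. Each thread scans its own contiguous chunk in parallel; a single thread then merges runs that straddle chunk boundaries.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/sysinfo.h>  /* get_nprocs() (glibc, Linux-specific) */

/* Hypothetical per-thread work descriptor. */
struct job {
    const unsigned char *buf;  /* start of this thread's chunk */
    size_t len;                /* chunk length in bytes */
    size_t runs;               /* result: runs found inside the chunk */
    pthread_t tid;
    int threaded;              /* nonzero if a thread was created */
};

/* Parallel part: count runs within one chunk, ignoring its neighbors. */
static void *count_runs(void *arg)
{
    struct job *j = arg;
    j->runs = 0;
    for (size_t i = 0; i < j->len; i++)
        if (i == 0 || j->buf[i] != j->buf[i - 1])
            j->runs++;
    return NULL;
}

/* One worker per processor currently available, and at least one. */
int default_workers(void)
{
    int n = get_nprocs();
    return n > 0 ? n : 1;
}

size_t count_runs_parallel(const unsigned char *buf, size_t len, int nthreads)
{
    if (nthreads < 1)
        nthreads = 1;
    struct job *jobs = calloc((size_t)nthreads, sizeof *jobs);
    if (jobs == NULL) {  /* out of memory: fall back to one serial pass */
        struct job whole = { .buf = buf, .len = len };
        count_runs(&whole);
        return whole.runs;
    }

    /* Split the input into nearly equal contiguous chunks. */
    size_t base = len / (size_t)nthreads, extra = len % (size_t)nthreads;
    size_t off = 0;
    for (int i = 0; i < nthreads; i++) {
        jobs[i].buf = buf + off;
        jobs[i].len = base + ((size_t)i < extra ? 1 : 0);
        off += jobs[i].len;
        jobs[i].threaded =
            pthread_create(&jobs[i].tid, NULL, count_runs, &jobs[i]) == 0;
        if (!jobs[i].threaded)
            count_runs(&jobs[i]);  /* could not start a thread: run inline */
    }

    /* Serial part: join, then merge runs that span a chunk boundary. */
    size_t total = 0;
    const unsigned char *prev_last = NULL;
    for (int i = 0; i < nthreads; i++) {
        if (jobs[i].threaded)
            pthread_join(jobs[i].tid, NULL);
        if (jobs[i].len == 0)
            continue;
        total += jobs[i].runs;
        if (prev_last != NULL && *prev_last == jobs[i].buf[0])
            total--;  /* same byte on both sides: one run, not two */
        prev_last = jobs[i].buf + jobs[i].len - 1;
    }
    free(jobs);
    return total;
}
```

A caller might use `count_runs_parallel(data, size, default_workers())`. A real `pzip` would have each thread produce its chunk's encoded records instead of a count, but the same boundary-merging concern applies when the results are stitched together.
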
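For the file-access consideration, a minimal sketch of mapping an input file read-only with `mmap()` might look like this. `map_file` and `unmap_file` are hypothetical helper names; the sketch assumes a regular file and treats an empty file as "nothing to map", since `mmap()` rejects a zero length.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map an entire file read-only.
 * On success returns a pointer to the bytes and stores the size in *len_out.
 * Returns NULL (with *len_out == 0) on error or if the file is empty. */
const unsigned char *map_file(const char *path, size_t *len_out)
{
    *len_out = 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return p;
}

/* Release a mapping obtained from map_file(). */
void unmap_file(const unsigned char *p, size_t len)
{
    if (p != NULL)
        munmap((void *)p, len);
}
```

Once mapped, the file's bytes can be handed to worker threads as plain pointer ranges, with no per-thread `read()` calls or intermediate buffers.
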
## Grading

Your code will first be measured for correctness, ensuring that it zips input
files correctly.

If you pass the correctness tests, your code will be tested for performance;
higher performance will lead to better scores.