Initial pzip

2018-03-05 11:30:14 -06:00
parent f45d8f5c37
commit 66ca5a9199
1 changed files with 70 additions and 0 deletions
--- a/concurrency-pzip/README.md
+++ b/concurrency-pzip/README.md
@@ -0,0 +1,70 @@
+
+# Parallel Zip
+
+In an earlier project, you implemented a simple compression tool based on
+run-length encoding, known simply as `zip`. Here, you'll implement something
+similar, except you'll use threads to make a parallel version of `zip`. We'll
+call this version ... wait for it ... `pzip`. 
+
+There are three specific objectives to this assignment:
+
+* To familiarize yourself with Linux pthreads.
+* To learn how to parallelize a program.
+* To learn how to program for high performance.
+
+## Overview
+
+First, recall how `zip` works by reading the description
+[here](https://github.com/remzi-arpacidusseau/ostep-projects/tree/master/initial-utilities). 
+You'll use the same basic specification, with run-length encoding as the basic
+technique.
+
+Your parallel zip (`pzip`) will externally look the same; the general usage
+from the command line will be as follows:
+
+```
+prompt> ./pzip file > file.z
+```
+
+As before, there may be many input files (not just one, as above). However,
+internally, the program will use POSIX threads to parallelize the compression
+process.  
+
+## Considerations
+
+Parallelizing compression effectively and with high performance will require
+you to address (at least) the following issues:
+
+- **How to parallelize the compression.** Of course, the central challenge of
+    this project is to parallelize the compression process. Think about what
+    can be done in parallel, and what must be done serially by a single
+    thread, and design your parallel zip as appropriate.
+
+- **How to determine how many threads to create.** On Linux, this means using
+    interfaces like `get_nprocs()` and `get_nprocs_conf()`; read the man pages
+    for more details. Then, create threads to match the number of CPU
+    resources available.
+
+- **How to efficiently perform each piece of work.** While parallelization
+    will yield speedup, each thread's efficiency in performing the
+    compression is also critically important. Thus, making the core
+    compression loop as CPU-efficient as possible is essential for high
+    performance.
+
+- **How to access the input file efficiently.** On Linux, there are many ways
+    to read from a file, including C standard library calls like `fread()` and
+    raw system calls like `read()`. One particularly efficient way is to use
+    memory-mapped files, available via `mmap()`. By mapping the input file
+    into the address space, you can then access bytes of the input file via
+    pointers and do so quite efficiently. 
+
+## Grading
+
+Your code will first be measured for correctness, ensuring that it zips input
+files correctly.
+
+If you pass the correctness tests, your code will be tested for performance;
+higher-performing code will earn better scores.
+
+
+
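The considerations listed in the README above (sizing the thread pool with `get_nprocs()`, mapping input with `mmap()`, and keeping the run-length loop tight) can be sketched in C roughly as follows. This is only an illustrative sketch, not the assignment's reference solution: the names `struct run`, `rle_encode`, `map_file`, and `worker_count` are hypothetical, and `struct run` is an assumed in-memory representation rather than the on-disk `zip` format.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/sysinfo.h>
#include <unistd.h>

/* Hypothetical in-memory record for one run: how many times `ch` repeats. */
struct run {
    uint32_t count;
    char ch;
};

/*
 * Run-length encode buf[0..len) into out[] and return the number of runs.
 * The caller must provide room for up to `len` runs (the worst case, when
 * no two adjacent bytes are equal).
 */
size_t rle_encode(const char *buf, size_t len, struct run *out)
{
    assert(buf != NULL && out != NULL);
    size_t nruns = 0;
    for (size_t i = 0; i < len; ) {
        size_t j = i + 1;
        while (j < len && buf[j] == buf[i])
            j++;                      /* extend the current run */
        out[nruns].count = (uint32_t)(j - i);
        out[nruns].ch = buf[i];
        nruns++;
        i = j;                        /* the next run starts here */
    }
    return nruns;
}

/*
 * Map a whole file read-only. Returns a pointer to its bytes and stores the
 * size in *len, or returns NULL on error or for an empty file (mmap() rejects
 * a zero-length mapping).
 */
const char *map_file(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                        /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;
    *len = (size_t)st.st_size;
    return p;
}

/* One worker per currently available processor, with a floor of one. */
int worker_count(void)
{
    int n = get_nprocs();
    return n > 0 ? n : 1;
}
```

One possible design, under these assumptions, is to split the mapped region into `worker_count()` contiguous chunks, have each thread call `rle_encode()` on its own chunk, and then have a single thread emit the results in order. That final pass would need to merge a run that ends one chunk with a run of the same byte that begins the next, and each mapping should eventually be released with `munmap()`.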