Initial pzip

2018-03-05 11:30:14 -06:00
parent f45d8f5c37
commit 66ca5a9199
1 changed files with 70 additions and 0 deletions
--- a/concurrency-pzip/README.md
+++ b/concurrency-pzip/README.md
@@ -0,0 +1,70 @@
 # Parallel Zip
 In an earlier project, you implemented a simple compression tool based on
 run-length encoding, known simply as `zip`. Here, you'll implement something
 similar, except you'll use threads to make a parallel version of `zip`. We'll
 call this version ... wait for it ... `pzip`. 
 There are three specific objectives to this assignment:
 * To familiarize yourself with Linux pthreads.
 * To learn how to parallelize a program.
 * To learn how to program for high performance.
 ## Overview
 First, recall how `zip` works by reading the description
 [here](https://github.com/remzi-arpacidusseau/ostep-projects/tree/master/initial-utilities). 
 You'll use the same basic specification, with run-length encoding as the basic
 technique.
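
As a refresher, the serial encoding step might look like the sketch below. The `struct run` type and the `rle_encode()` and `rle_write()` helpers are hypothetical names used only for illustration. The output layout (a 4-byte binary count followed by one character) is an assumption carried over from the earlier `zip` project, so check it against that specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One run: `count` copies of `ch` (hypothetical helper type). */
struct run {
    uint32_t count;
    char ch;
};

/*
 * Run-length encode buf[0..len) into runs[] and return the number of runs.
 * The caller must provide room for `len` entries (the worst case, where no
 * two adjacent bytes are equal). Counts above UINT32_MAX are not handled.
 */
size_t rle_encode(const char *buf, size_t len, struct run *runs)
{
    assert(len == 0 || (buf != NULL && runs != NULL));
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (n > 0 && runs[n - 1].ch == buf[i]) {
            runs[n - 1].count++;          /* extend the current run */
        } else {
            runs[n].count = 1;            /* start a new run */
            runs[n].ch = buf[i];
            n++;
        }
    }
    return n;
}

/*
 * Write runs in the assumed output format: a 4-byte count in binary,
 * then the character itself. Returns 0 on success, -1 on a write error.
 */
int rle_write(FILE *out, const struct run *runs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (fwrite(&runs[i].count, sizeof runs[i].count, 1, out) != 1 ||
            fwrite(&runs[i].ch, 1, 1, out) != 1)
            return -1;
    }
    return 0;
}
```

For example, encoding the seven bytes `aaaabbc` yields three runs: 4 `a`, 2 `b`, and 1 `c`.
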
 Your parallel zip (`pzip`) will externally look the same; the general usage
 from the command line will be as follows:
 ```
 prompt> ./pzip file > file.z
 ```
 As before, there may be many input files (not just one, as above). However,
 internally, the program will use POSIX threads to parallelize the compression
 process.  
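
One possible shape for the threaded version, shown only as a minimal sketch and not as the required design: split the input into one slice per thread, let each thread encode its slice into a private array, then have a single thread stitch the results together, merging runs that straddle a slice boundary. The names `struct run`, `struct chunk`, `compress_chunk()`, and `parallel_rle()` are hypothetical, and the input is assumed to already be in memory.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One run: `count` copies of `ch` (hypothetical helper type). */
struct run { uint32_t count; char ch; };

/* Work assigned to one thread (hypothetical layout). */
struct chunk {
    const char *buf;   /* start of this thread's slice of the input */
    size_t len;        /* number of bytes in the slice */
    struct run *runs;  /* runs found in the slice, allocated by the worker */
    size_t nruns;      /* number of entries in runs */
    int started;       /* nonzero if a thread was created for this chunk */
};

/* Thread body: run-length encode one slice into a private array. */
static void *compress_chunk(void *arg)
{
    struct chunk *c = arg;
    c->nruns = 0;
    c->runs = malloc((c->len ? c->len : 1) * sizeof *c->runs);
    if (c->runs == NULL)
        return NULL;
    for (size_t i = 0; i < c->len; i++) {
        if (c->nruns > 0 && c->runs[c->nruns - 1].ch == c->buf[i]) {
            c->runs[c->nruns - 1].count++;
        } else {
            c->runs[c->nruns].count = 1;
            c->runs[c->nruns].ch = c->buf[i];
            c->nruns++;
        }
    }
    return NULL;
}

/*
 * Encode buf[0..len) using nthreads workers. Returns a malloc'd array of
 * runs (the caller frees it) and stores its length in *out_n, or returns
 * NULL if memory could not be allocated.
 */
struct run *parallel_rle(const char *buf, size_t len, int nthreads, size_t *out_n)
{
    assert(nthreads > 0 && out_n != NULL);
    pthread_t *tids = malloc((size_t)nthreads * sizeof *tids);
    struct chunk *chunks = calloc((size_t)nthreads, sizeof *chunks);
    struct run *out = malloc((len ? len : 1) * sizeof *out);
    if (!tids || !chunks || !out) {
        free(tids); free(chunks); free(out);
        return NULL;
    }

    size_t per = len / (size_t)nthreads;
    for (int i = 0; i < nthreads; i++) {
        chunks[i].buf = buf + (size_t)i * per;
        /* The last slice also takes the remainder of the division. */
        chunks[i].len = (i == nthreads - 1) ? len - (size_t)i * per : per;
        chunks[i].started =
            pthread_create(&tids[i], NULL, compress_chunk, &chunks[i]) == 0;
        if (!chunks[i].started)
            compress_chunk(&chunks[i]);   /* fall back to encoding inline */
    }

    size_t n = 0;
    int failed = 0;
    for (int i = 0; i < nthreads; i++) {
        if (chunks[i].started)
            pthread_join(tids[i], NULL);
        if (chunks[i].runs == NULL) { failed = 1; continue; }
        /* Serial step: append in order, merging runs across slice edges. */
        for (size_t j = 0; j < chunks[i].nruns; j++) {
            struct run r = chunks[i].runs[j];
            if (n > 0 && out[n - 1].ch == r.ch)
                out[n - 1].count += r.count;
            else
                out[n++] = r;
        }
        free(chunks[i].runs);
    }
    free(tids);
    free(chunks);
    if (failed) { free(out); return NULL; }
    *out_n = n;
    return out;
}
```

The sketch favors clarity over speed: the merge step is serial and the per-thread buffers are sized for the worst case. A tuned solution would likely revisit both choices, and would also need to decide how to divide work when there are many input files. Older toolchains may need the `-pthread` flag when compiling.
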
 ## Considerations
 Parallelizing compression effectively and with high performance will require
 you to address (at least) the following issues:
 - **How to parallelize the compression.** Of course, the central challenge of
    this project is to parallelize the compression process. Think about what
    can be done in parallel, and what must be done serially by a single
    thread, and design your parallel zip as appropriate.
 - **How to determine how many threads to create.** On Linux, this means using
    interfaces like `get_nprocs()` and `get_nprocs_conf()`; read the man pages
    for more details. Then, create a number of threads that matches the
    number of CPUs available.
 - **How to efficiently perform each piece of work.** While parallelization
    will yield a speedup, each thread's efficiency in performing the
    compression is also critically important. Thus, making the core
    compression loop as CPU-efficient as possible is essential for high
    performance.
 - **How to access the input file efficiently.** On Linux, there are many ways
    to read from a file, including C standard library calls like `fread()` and
    raw system calls like `read()`. One particularly efficient way is to use
    memory-mapped files, available via `mmap()`. By mapping the input file
    into the address space, you can then access bytes of the input file via
    pointers and do so quite efficiently. 
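
Two of the points above, choosing a thread count and reading input through `mmap()`, can be sketched as follows. The helper names `worker_count()`, `map_file()`, and `unmap_file()` are hypothetical. The sketch assumes Linux with glibc (for `get_nprocs()` in `<sys/sysinfo.h>`) and treats an empty file as "nothing to map", since `mmap()` rejects zero-length mappings.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/sysinfo.h>
#include <unistd.h>

/* Number of worker threads to start: one per processor currently online. */
int worker_count(void)
{
    int n = get_nprocs();
    return n > 0 ? n : 1;
}

/*
 * Map `path` read-only into the address space. On success, returns a
 * pointer to the file's bytes and stores the file size in *len. Returns
 * NULL (with *len set to 0) on error or if the file is empty.
 */
const char *map_file(const char *path, size_t *len)
{
    assert(path != NULL && len != NULL);
    *len = 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return p;
}

/* Release a mapping obtained from map_file(). */
void unmap_file(const char *p, size_t len)
{
    if (p != NULL)
        munmap((void *)p, len);
}
```

Once a file is mapped, the returned pointer can be handed to worker threads as an ordinary read-only byte array, with each thread given its own offset and length.
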
 ## Grading
 Your code will first be measured for correctness, ensuring that it zips input
 files correctly.
 If you pass the correctness tests, your code will be tested for performance;
 higher-performing code will earn better scores.