Compare commits
10 Commits
8896e20455
...
76cff3f89f
| Author | SHA1 | Date |
|---|---|---|
| | 76cff3f89f | |
| | 435fd35685 | |
| | 6269639845 | |
| | b95350cb74 | |
| | 8d2cbee595 | |
| | 685ab5f093 | |
| | 2b121db582 | |
| | 2ea50fda97 | |
| | 135c98f85e | |
| | 4b5e4c934d | |
@@ -42,6 +42,11 @@ fixed-size, and are each 100 bytes (which includes the key).
A successful sort will read all the records into memory from the input
file, sort them, and then write them out to the output file.

You also have to force writes to disk by calling `fsync()` on the output file before finishing.

You can assume that this is a one-pass sort, i.e., the data can fit
into memory. You do not have to implement a multi-pass sort.

## Considerations

Doing so effectively and with high performance will require you to address (at

@@ -7,6 +7,8 @@ consistent. When it isn't, the checker takes steps to repair the problems it
sees; however, you won't be doing any repairs to keep this project a little
simpler.

Patrick Sharp created a better version of this project [here](https://github.com/patrick-sharp/ostep-projects/tree/master/filesystems-checker); when we get unlazy we will incorporate what he has done here. Thanks Patrick!

## Background

Some basic background about file system consistency is found here:

181
filesystems-distributed-ufs/README.md
Normal file
@@ -0,0 +1,181 @@
# Distributed File System

In this assignment, you will be developing a working *distributed file
server.* We provide you with only the bare minimum of UDP communication
code; you have to build the rest.

## A Basic File Server

Your file server is built as a stand-alone UDP-based server. It should wait
for a message and then process the message as needed, replying to the given
client.

Your file server will store all of its data in an on-disk, fixed-size
file which will be referred to as the *file system image*. This image
contains the on-disk representation of your data structures; you
should use these system calls to access it: `open()`, `read()`, `write()`,
`lseek()`, `close()`, `fsync()`.

To access the file server, you will be building a client library. The
interface that the library supports is defined in [mfs.h](mfs.h). The
library should be called `libmfs.so`, and any programs that wish to access
your file server will link with it and call its various routines.

## On-Disk File System: A Basic Unix File System

The on-disk file system structures follow those of the
very simple file system discussed
[here](https://pages.cs.wisc.edu/~remzi/OSTEP/file-implementation.pdf). On-disk,
the structures are as follows:
- A single block (4KB) super block
- An inode bitmap (can be one or more 4KB blocks, depending on the number of inodes)
- A data bitmap (can be one or more 4KB blocks, depending on the number of data blocks)
- The inode table (a multiple of 4KB-sized blocks, depending on the number of inodes)
- The data region (some number of 4KB blocks, depending on the number of data blocks)

More details about on-disk structures can be found in the header [ufs.h](https://github.com/remzi-arpacidusseau/ostep-projects/blob/master/filesystems-distributed-ufs/ufs.h), which you should
use. In particular, it fixes the exact format of the super
block, inode, and directory entries. Bitmaps have one bit per unit
(inode or data block), set when that unit is allocated, as described in the book.
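
The bitmap description above can be sketched with a few helpers. This is only an illustration, not part of `ufs.h`: the names `bitmap_get`, `bitmap_set`, `bitmap_clear`, and `bitmap_find_free` are hypothetical, and the bit order (unit 0 is the most significant bit of the first 32-bit word) is an assumption taken from how `mkfs.c` marks the first inode as allocated.

```c
#include <assert.h>
#include <string.h>

#define BLOCK_SIZE 4096 /* same value as UFS_BLOCK_SIZE in ufs.h */

/* Assumption: unsigned int is 32 bits wide, and unit i is stored in word
 * i / 32, counted from the most significant bit (mkfs.c sets the top bit
 * of word 0 to allocate unit 0). */
static int bitmap_get(const unsigned int *bits, int i) {
    return (int)((bits[i / 32] >> (31 - (i % 32))) & 0x1u);
}

static void bitmap_set(unsigned int *bits, int i) {
    bits[i / 32] |= 0x1u << (31 - (i % 32));
}

static void bitmap_clear(unsigned int *bits, int i) {
    bits[i / 32] &= ~(0x1u << (31 - (i % 32)));
}

/* Returns the index of the first free unit below limit, or -1 if none. */
static int bitmap_find_free(const unsigned int *bits, int limit) {
    for (int i = 0; i < limit; i++)
        if (!bitmap_get(bits, i))
            return i;
    return -1;
}
```

A server would typically call something like `bitmap_find_free()` with `num_inodes` or `num_data` from the super block as the limit, then `bitmap_set()` the returned index before writing the bitmap block back to the image.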

As for directories, here is a little more detail. Each directory has
an inode, which points to one or more data blocks that contain directory
entries. Each directory entry should be simple, and consist of 32
bytes: a name and an inode number pair. The name should be a
fixed-length field of size 28 bytes; the inode number is just an
integer (4 bytes). When a directory is created, it should contain two
entries: the name `.` (dot), which refers to this new directory's
inode number, and `..` (dot-dot), which refers to the parent
directory's inode number. For directory entries that are not yet in
use (in an allocated 4-KB directory block), the inode number should be
set to -1. This way, utilities can scan through the entries to check
if they are valid.
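
As a rough sketch of the directory layout just described, the code below fills one 4-KB block for a new directory and scans it for a name. The struct mirrors `dir_ent_t` from `ufs.h`; the helper names `dir_block_init` and `dir_block_lookup` are hypothetical and only illustrate one way to do this.

```c
#include <string.h>

#define BLOCK_SIZE 4096 /* same value as UFS_BLOCK_SIZE in ufs.h */

/* Mirrors dir_ent_t in ufs.h: 28-byte name plus a 4-byte inode number. */
typedef struct {
    char name[28];
    int inum;
} dir_ent_t;

#define ENTRIES_PER_BLOCK (BLOCK_SIZE / sizeof(dir_ent_t))

/* Hypothetical helper: initialize the first data block of a new directory. */
static void dir_block_init(dir_ent_t *entries, int self_inum, int parent_inum) {
    memset(entries, 0, BLOCK_SIZE);
    strcpy(entries[0].name, ".");
    entries[0].inum = self_inum;
    strcpy(entries[1].name, "..");
    entries[1].inum = parent_inum;
    /* Every remaining slot is marked unused with inode number -1. */
    for (size_t i = 2; i < ENTRIES_PER_BLOCK; i++)
        entries[i].inum = -1;
}

/* Hypothetical helper: linear scan of one block; returns the inode number
 * of name, or -1 if no valid entry has that name. */
static int dir_block_lookup(const dir_ent_t *entries, const char *name) {
    for (size_t i = 0; i < ENTRIES_PER_BLOCK; i++) {
        if (entries[i].inum != -1 &&
            strncmp(entries[i].name, name, sizeof(entries[i].name)) == 0)
            return entries[i].inum;
    }
    return -1;
}
```

A lookup over a whole directory would repeat the scan for each block the directory's inode points to.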

When your server is started, it is passed the name of the file system
image file. The image is created by a tool we provide, called `mkfs`.
It is pretty self-explanatory and can be found
[here](https://github.com/remzi-arpacidusseau/ostep-projects/blob/master/filesystems-distributed-ufs/mkfs.c).

When booting from an existing image, your server should read in the
superblock, bitmaps, and inode table, and keep in-memory versions of
these. When writing to the image, you should update these on-disk
structures accordingly.
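
One possible way to load these structures at startup is sketched below. The struct mirrors `super_t` from `ufs.h`; `load_super` and `read_blocks` are hypothetical helper names, and using `pread()` (rather than `lseek()` plus `read()`) is a choice made here for brevity, not something the assignment requires.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 4096 /* same value as UFS_BLOCK_SIZE in ufs.h */

/* Mirrors super_t in ufs.h; all addresses and lengths are in blocks. */
typedef struct {
    int inode_bitmap_addr, inode_bitmap_len;
    int data_bitmap_addr, data_bitmap_len;
    int inode_region_addr, inode_region_len;
    int data_region_addr, data_region_len;
    int num_inodes, num_data;
} super_t;

/* Hypothetical helper: read the super block from block 0 of the image. */
static int load_super(int fd, super_t *s) {
    return pread(fd, s, sizeof(*s), 0) == (ssize_t)sizeof(*s) ? 0 : -1;
}

/* Hypothetical helper: read len whole blocks starting at block addr into a
 * freshly allocated buffer. Returns NULL on allocation or read failure. */
static void *read_blocks(int fd, int addr, int len) {
    size_t nbytes = (size_t)len * BLOCK_SIZE;
    char *buf = malloc(nbytes);
    if (buf == NULL)
        return NULL;
    size_t done = 0;
    while (done < nbytes) {
        off_t pos = (off_t)addr * BLOCK_SIZE + (off_t)done;
        ssize_t rc = pread(fd, buf + done, nbytes - done, pos);
        if (rc <= 0) { /* error or unexpected end of image */
            free(buf);
            return NULL;
        }
        done += (size_t)rc;
    }
    return buf;
}
```

With the super block in hand, the bitmaps and inode table can each be fetched with a call such as `read_blocks(fd, s.inode_bitmap_addr, s.inode_bitmap_len)`.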

Importantly, you cannot change the file-system on-disk format.

## Client library

The client library should export the following interfaces:

- `int MFS_Init(char *hostname, int port)`: `MFS_Init()` takes a host name
  and port number and uses those to find the server exporting the file system.
- `int MFS_Lookup(int pinum, char *name)`: `MFS_Lookup()` takes the parent
  inode number (which should be the inode number of a directory) and looks up
  the entry `name` in it. The inode number of `name` is returned. Success:
  return inode number of name; failure: return -1. Failure modes: invalid pinum,
  name does not exist in pinum.
- `int MFS_Stat(int inum, MFS_Stat_t *m)`: `MFS_Stat()` returns some
  information about the file specified by inum. Upon success, return 0,
  otherwise -1. The exact info returned is defined by `MFS_Stat_t`. Failure modes:
  inum does not exist. File and directory sizes are described below.
- `int MFS_Write(int inum, char *buffer, int offset, int nbytes)`:
  `MFS_Write()` writes a buffer of size `nbytes` (max size: 4096 bytes) at the byte
  offset specified by `offset`. Returns 0 on success, -1 on
  failure. Failure modes: invalid inum, invalid nbytes, invalid offset, not a
  regular file (because you can't write to directories).
- `int MFS_Read(int inum, char *buffer, int offset, int nbytes)`:
  `MFS_Read()` reads `nbytes` of data (max size 4096 bytes), starting at the
  byte offset `offset`, from the file specified by `inum` into the buffer.
  The routine should work for either a file or directory;
  directories should return data in the format specified by
  `MFS_DirEnt_t`. Success: 0, failure: -1. Failure modes: invalid inum,
  invalid offset, invalid nbytes.
- `int MFS_Creat(int pinum, int type, char *name)`: `MFS_Creat()` makes a
  file (`type == MFS_REGULAR_FILE`) or directory (`type == MFS_DIRECTORY`)
  named `name` in the parent directory specified by `pinum`. Returns 0 on
  success, -1 on failure. Failure modes: pinum does not exist, or name is too
  long. If `name` already exists, return success.
- `int MFS_Unlink(int pinum, char *name)`: `MFS_Unlink()` removes the file or
  directory `name` from the directory specified by `pinum`. 0 on success, -1
  on failure. Failure modes: pinum does not exist, directory is NOT empty. Note
  that the name not existing is NOT a failure by our definition (think about why
  this might be).
- `int MFS_Shutdown()`: `MFS_Shutdown()` just tells the server to force all
  of its data structures to disk and shut down by calling `exit(0)`. This interface
  will mostly be used for testing purposes.

Size: The size of a file is one more than the offset of the last valid byte written
to the file. Specifically, if you write 100 bytes to an empty file at
offset 0, the size is 100; if you write 100 bytes to an empty file at
offset 10, the size is 110. For a directory, it is the same (i.e., one more than the
byte offset of the last byte of the last valid entry).
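
The size rule above amounts to taking the larger of the old size and the end of the write. The sketch below shows that arithmetic; both helper names are hypothetical, and the validity limits in `write_args_ok` (non-negative offset, at most 4096 bytes per write, and a file limited to `DIRECT_PTRS` blocks) are assumptions about what "invalid offset" and "invalid nbytes" mean, since the text does not spell them out.

```c
#define BLOCK_SIZE 4096  /* same value as UFS_BLOCK_SIZE in ufs.h */
#define DIRECT_PTRS 30   /* same value as in ufs.h */

/* Hypothetical helper: file size after writing nbytes at offset.
 * A write inside the existing file leaves the size unchanged. */
static int size_after_write(int old_size, int offset, int nbytes) {
    int end = offset + nbytes;
    return end > old_size ? end : old_size;
}

/* Hypothetical helper with assumed limits: returns 1 if the write fits
 * in a file addressed only by direct pointers, 0 otherwise. */
static int write_args_ok(int offset, int nbytes) {
    if (offset < 0 || nbytes < 0 || nbytes > BLOCK_SIZE)
        return 0;
    return offset + nbytes <= DIRECT_PTRS * BLOCK_SIZE;
}
```

For the two cases in the text, `size_after_write(0, 0, 100)` gives 100 and `size_after_write(0, 10, 100)` gives 110.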

## Server Idempotency

The key behavior implemented by the server is *idempotency*.
Specifically, on any change to the file system state (such as a
`MFS_Write`, `MFS_Creat`, or `MFS_Unlink`), all the dirtied buffers in the
server are committed to the disk. The server can achieve this end by
calling `fsync()` on the file system image. Thus, before returning a
success code, the file system should always `fsync()` the image.

Now you might be wondering: why do this? Simple: if the server crashes, the
client can simply time out and retry the operation and know that it is OK to do
so. Read [this chapter](https://pages.cs.wisc.edu/~remzi/OSTEP/dist-nfs.pdf) on NFS
for details.

Now you might be wondering: how do I implement a timeout? Simple, with the
`select()` interface. The `select()` call allows you to wait for a reply
on a certain socket descriptor (or more than one, though that is not needed
here). You can even specify a timeout so that the client does not block
forever waiting for data to be returned from the server. By doing so, you can
wait for a reply for a certain amount of time, and if nothing is returned, try
the operation again until it is successful.
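
A minimal sketch of the timeout step is shown below. The helper name `wait_readable` is hypothetical; it wraps a single `select()` call on one descriptor and reports whether data arrived before the timeout expired.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_sec seconds for fd to become
 * readable. Returns 1 if data is ready, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_sec) {
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);

    struct timeval tv;
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    /* The first argument is the highest descriptor in the set plus one. */
    int rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}
```

In the client library, each request could then follow a loop along these lines: send the message, call `wait_readable()` on the socket, read the reply if it returns 1, and resend the same message if it returns 0.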

## Program Specifications

Your server program must be invoked exactly as follows:

    prompt> server [portnum] [file-system-image]

The command line arguments to your file server are to be interpreted as follows.

- portnum: the port number that the file server should listen on.
- file-system-image: a file that contains the file system image.

If the file system image does not exist, you should print out an error
message (`image does not exist\n`) and exit with exit code 1.

Your client library should be called `libmfs.so`. It should implement
the interface as specified by `mfs.h`, and in particular deal with
the case where the server does not reply in a timely fashion; the way
it deals with that is simply by retrying the operation, after a
timeout of some kind (default: five-second timeout).

## Relevant Chapters

Read these:
- [File System Implementation](https://pages.cs.wisc.edu/~remzi/OSTEP/file-implementation.pdf)
- [Distributed Systems](https://pages.cs.wisc.edu/~remzi/OSTEP/dist-intro.pdf)
- [Distributed File System: NFS](https://pages.cs.wisc.edu/~remzi/OSTEP/dist-nfs.pdf)

## Some Helper Code

To get you going, we have written some simple UDP code that can send a
message from a client to a server and then receive a reply. It can be found
[here](https://github.com/remzi-arpacidusseau/ostep-code/tree/master/dist-intro).

There is also other code as mentioned above:
- [mfs.h](https://github.com/remzi-arpacidusseau/ostep-projects/blob/master/filesystems-distributed-ufs/mfs.h)
- [ufs.h](https://github.com/remzi-arpacidusseau/ostep-projects/blob/master/filesystems-distributed-ufs/ufs.h)
- [mkfs.c](https://github.com/remzi-arpacidusseau/ostep-projects/blob/master/filesystems-distributed-ufs/mkfs.c)

You'll also have to learn how to make a shared library. Read [here](https://tldp.org/HOWTO/Program-Library-HOWTO/shared-libraries.html) for more information.

30
filesystems-distributed-ufs/mfs.h
Normal file
@@ -0,0 +1,30 @@
#ifndef __MFS_h__
#define __MFS_h__

#define MFS_DIRECTORY (0)
#define MFS_REGULAR_FILE (1)

#define MFS_BLOCK_SIZE (4096)

typedef struct __MFS_Stat_t {
    int type; // MFS_DIRECTORY or MFS_REGULAR_FILE
    int size; // bytes
    // note: no permissions, access times, etc.
} MFS_Stat_t;

typedef struct __MFS_DirEnt_t {
    char name[28]; // up to 28 bytes of name in directory (including \0)
    int inum; // inode number of entry (-1 means entry not used)
} MFS_DirEnt_t;


int MFS_Init(char *hostname, int port);
int MFS_Lookup(int pinum, char *name);
int MFS_Stat(int inum, MFS_Stat_t *m);
int MFS_Write(int inum, char *buffer, int offset, int nbytes);
int MFS_Read(int inum, char *buffer, int offset, int nbytes);
int MFS_Creat(int pinum, int type, char *name);
int MFS_Unlink(int pinum, char *name);
int MFS_Shutdown();

#endif // __MFS_h__
203
filesystems-distributed-ufs/mkfs.c
Normal file
@@ -0,0 +1,203 @@
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include "ufs.h"

void usage() {
    fprintf(stderr, "usage: mkfs -f <image_file> [-d <num_data_blocks>] [-i <num_inodes>] [-v]\n");
    exit(1);
}

int main(int argc, char *argv[]) {
    int ch;
    char *image_file = NULL;
    int num_inodes = 32;
    int num_data = 32;
    int visual = 0;

    while ((ch = getopt(argc, argv, "i:d:f:v")) != -1) {
        switch (ch) {
        case 'i':
            num_inodes = atoi(optarg);
            break;
        case 'd':
            num_data = atoi(optarg);
            break;
        case 'f':
            image_file = optarg;
            break;
        case 'v':
            visual = 1;
            break;
        default:
            usage();
        }
    }
    argc -= optind;
    argv += optind;

    if (image_file == NULL)
        usage();

    unsigned char *empty_buffer;
    empty_buffer = calloc(UFS_BLOCK_SIZE, 1);
    if (empty_buffer == NULL) {
        perror("calloc");
        exit(1);
    }

    int fd = open(image_file, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd < 0) {
        perror("open");
        exit(1);
    }

    assert(num_inodes >= 32);
    assert(num_data >= 32);

    // presumed: block 0 is the super block
    super_t s;

    // totals
    s.num_inodes = num_inodes;
    s.num_data = num_data;

    // inode bitmap
    int bits_per_block = (8 * UFS_BLOCK_SIZE); // remember, there are 8 bits per byte

    s.inode_bitmap_addr = 1;
    s.inode_bitmap_len = num_inodes / bits_per_block;
    if (num_inodes % bits_per_block != 0)
        s.inode_bitmap_len++;

    // data bitmap
    s.data_bitmap_addr = s.inode_bitmap_addr + s.inode_bitmap_len;
    s.data_bitmap_len = num_data / bits_per_block;
    if (num_data % bits_per_block != 0)
        s.data_bitmap_len++;

    // inode table
    s.inode_region_addr = s.data_bitmap_addr + s.data_bitmap_len;
    int total_inode_bytes = num_inodes * sizeof(inode_t);
    s.inode_region_len = total_inode_bytes / UFS_BLOCK_SIZE;
    if (total_inode_bytes % UFS_BLOCK_SIZE != 0)
        s.inode_region_len++;

    // data blocks
    s.data_region_addr = s.inode_region_addr + s.inode_region_len;
    s.data_region_len = num_data;

    int total_blocks = 1 + s.inode_bitmap_len + s.data_bitmap_len + s.inode_region_len + s.data_region_len;

    // super block is the first block
    int rc = pwrite(fd, &s, sizeof(super_t), 0);
    if (rc != sizeof(super_t)) {
        perror("write");
        exit(1);
    }

    printf("total blocks %d\n", total_blocks);
    printf(" inodes %d [size of each: %zu]\n", num_inodes, sizeof(inode_t)); // %zu is the portable format for size_t
    printf(" data blocks %d\n", num_data);
    printf("layout details\n");
    printf(" inode bitmap address/len %d [%d]\n", s.inode_bitmap_addr, s.inode_bitmap_len);
    printf(" data bitmap address/len %d [%d]\n", s.data_bitmap_addr, s.data_bitmap_len);

    // first, zero out all the blocks
    int i;
    for (i = 1; i < total_blocks; i++) {
        rc = pwrite(fd, empty_buffer, UFS_BLOCK_SIZE, i * UFS_BLOCK_SIZE);
        if (rc != UFS_BLOCK_SIZE) {
            perror("write");
            exit(1);
        }
    }

    //
    // need to allocate first inode in inode bitmap
    //
    typedef struct {
        unsigned int bits[UFS_BLOCK_SIZE / sizeof(unsigned int)];
    } bitmap_t;
    assert(sizeof(bitmap_t) == UFS_BLOCK_SIZE);

    bitmap_t b;
    for (i = 0; i < 1024; i++)
        b.bits[i] = 0;
    b.bits[0] = 0x1u << 31; // first entry is allocated (unsigned shift avoids signed overflow)

    rc = pwrite(fd, &b, UFS_BLOCK_SIZE, s.inode_bitmap_addr * UFS_BLOCK_SIZE);
    assert(rc == UFS_BLOCK_SIZE);

    //
    // need to allocate first data block in data bitmap
    // (can just reuse this to write out data bitmap too)
    //
    rc = pwrite(fd, &b, UFS_BLOCK_SIZE, s.data_bitmap_addr * UFS_BLOCK_SIZE);
    assert(rc == UFS_BLOCK_SIZE);

    //
    // need to write out inode
    //
    typedef struct {
        inode_t inodes[UFS_BLOCK_SIZE / sizeof(inode_t)];
    } inode_block;

    inode_block itable;
    memset(&itable, 0, sizeof(itable)); // keep uninitialized stack bytes out of the image
    itable.inodes[0].type = UFS_DIRECTORY;
    itable.inodes[0].size = 2 * sizeof(dir_ent_t); // in bytes
    itable.inodes[0].direct[0] = s.data_region_addr;
    for (i = 1; i < DIRECT_PTRS; i++)
        itable.inodes[0].direct[i] = -1;

    rc = pwrite(fd, &itable, UFS_BLOCK_SIZE, s.inode_region_addr * UFS_BLOCK_SIZE);
    assert(rc == UFS_BLOCK_SIZE);

    //
    // need to write out root directory contents to first data block
    // create a root directory, with nothing in it
    //
    typedef struct {
        dir_ent_t entries[128];
    } dir_block_t;
    // xxx assumes 4096 block, 32 byte entries
    assert(sizeof(dir_ent_t) * 128 == UFS_BLOCK_SIZE);

    dir_block_t parent;
    memset(&parent, 0, sizeof(parent)); // zero the name fields of unused entries
    strcpy(parent.entries[0].name, ".");
    parent.entries[0].inum = 0;

    strcpy(parent.entries[1].name, "..");
    parent.entries[1].inum = 0;

    for (i = 2; i < 128; i++)
        parent.entries[i].inum = -1;

    rc = pwrite(fd, &parent, UFS_BLOCK_SIZE, s.data_region_addr * UFS_BLOCK_SIZE);
    assert(rc == UFS_BLOCK_SIZE);

    if (visual) {
        int i;
        printf("\nVisualization of layout\n\n");
        printf("S");
        for (i = 0; i < s.inode_bitmap_len; i++)
            printf("i");
        for (i = 0; i < s.data_bitmap_len; i++)
            printf("d");
        for (i = 0; i < s.inode_region_len; i++)
            printf("I");
        for (i = 0; i < s.data_region_len; i++)
            printf("D");
        printf("\n\n");
    }

    (void) fsync(fd);
    (void) close(fd);

    return 0;
}

37
filesystems-distributed-ufs/ufs.h
Normal file
@@ -0,0 +1,37 @@
#ifndef __ufs_h__
#define __ufs_h__

#define UFS_DIRECTORY (0)
#define UFS_REGULAR_FILE (1)

#define UFS_BLOCK_SIZE (4096)

#define DIRECT_PTRS (30)

typedef struct {
    int type; // UFS_DIRECTORY or UFS_REGULAR_FILE
    int size; // bytes
    unsigned int direct[DIRECT_PTRS];
} inode_t;

typedef struct {
    char name[28]; // up to 28 bytes of name in directory (including \0)
    int inum; // inode number of entry (-1 means entry not used)
} dir_ent_t;

// presumed: block 0 is the super block
typedef struct __super {
    int inode_bitmap_addr; // block address (in blocks)
    int inode_bitmap_len; // in blocks
    int data_bitmap_addr; // block address (in blocks)
    int data_bitmap_len; // in blocks
    int inode_region_addr; // block address (in blocks)
    int inode_region_len; // in blocks
    int data_region_addr; // block address (in blocks)
    int data_region_len; // in blocks
    int num_inodes; // just the number of inodes
    int num_data; // and data blocks...
} super_t;


#endif // __ufs_h__