diff --git a/filesystems-distributed-ufs/README.md b/filesystems-distributed-ufs/README.md new file mode 100644 index 0000000..15e1cc8 --- /dev/null +++ b/filesystems-distributed-ufs/README.md @@ -0,0 +1,153 @@ + +# Distributed File System + +In this assignment, you will be developing a working *distributed file +server.* We provide you with only the bare minimum of UDP communication +code; you have to build the rest. + +## A Basic File Server + +Your file server is built as a stand-alone UDP-based server. It should wait +for a message and then process the message as needed, replying to the given +client. + +Your file server will store all of its data in an on-disk, fixed-size +file which will be referred to as the *file system image*. This image +contains the on-disk representation of your data structures; you +should use these system calls to access it: `open(), read(), write(), +lseek(), close(), fsync().` + +To access the file server, you will be building a client library. The +interface that the library supports is defined in [mfs.h](mfs.h). The +library should be called `libmfs.so`, and any programs that wish to access +your file server will link with it and call its various routines. + +## On-Disk File System: A Basic Unix File System + +Your on-disk file system structures should roughly follow those of the +very simple file system discussed +[here](https://pages.cs.wisc.edu/~remzi/OSTEP/file-implement.pdf). On-disk, + +Another kind of structure you'll have to manage on disk is the +directory. Each directory has an inode, and points to one or more +data blocks that contain directory entries. Each directory entry +should be simple, and consist of 32 bytes: a name and an inode number +pair. The name should be a fixed-length field of size 28 bytes; the +inode number is just an integer (4 bytes). 
When a directory is +created, it should contain two entries: the name `.` (dot), which +refers to this new directory's inode number, and `..` (dot-dot), which +refers to the parent directory's inode number. For directory entries +that are not yet in use (in an allocated 4-KB directory block), the +inode number should be set to -1. This way, utilities can scan through +the entries to check if they are valid. + +When your server is started, it is passed the name of the file system +image file. The image is created by a tool we provide, called `mkfs`. + +When booting off of an existing image, your server should read in the +superblock, bitmaps, and inode table, and keep in-memory versions of these. + + + +## Client library + +The client library should export the following interfaces: + +- `int MFS_Init(char *hostname, int port)`: `MFS_Init()` takes a host name +and port number and uses those to find the server exporting the file system. +- `int MFS_Lookup(int pinum, char *name)`: `MFS_Lookup()` takes the parent +inode number (which should be the inode number of a directory) and looks up +the entry `name` in it. The inode number of `name` is returned. Success: +return inode number of name; failure: return -1. Failure modes: invalid pinum, +name does not exist in pinum. +- `int MFS_Stat(int inum, MFS_Stat_t *m)`: `MFS_Stat()` returns some +information about the file specified by inum. Upon success, return 0, +otherwise -1. The exact info returned is defined by `MFS_Stat_t`. Failure modes: +inum does not exist. +- `int MFS_Write(int inum, char *buffer, int block)`: `MFS_Write()` writes a +block of size 4096 bytes at the block offset specified by `block`. Returns 0 +on success, -1 on failure. Failure modes: invalid inum, invalid block, not a +regular file (because you can't write to directories). +- `int MFS_Read(int inum, char *buffer, int block)`: `MFS_Read()` reads +a block specified by `block` into the buffer from file specified by +`inum`. 
The routine should work for either a file or directory; +directories should return data in the format specified by +`MFS_DirEnt_t`. Success: 0, failure: -1. Failure modes: invalid inum, +invalid block. +- `int MFS_Creat(int pinum, int type, char *name)`: `MFS_Creat()` makes a +file (`type == MFS_REGULAR_FILE`) or directory (`type == MFS_DIRECTORY`) +in the parent directory specified by `pinum` of name `name`. Returns 0 on +success, -1 on failure. Failure modes: pinum does not exist, or name is too +long. If `name` already exists, return success (think about why). +- `int MFS_Unlink(int pinum, char *name)`: `MFS_Unlink()` removes the file or +directory `name` from the directory specified by `pinum`. 0 on success, -1 +on failure. Failure modes: pinum does not exist, directory is NOT empty. Note +that the name not existing is NOT a failure by our definition (think about why +this might be). +- `int MFS_Shutdown()`: `MFS_Shutdown()` just tells the server to force all +of its data structures to disk and shut down by calling `exit(0)`. This interface +will mostly be used for testing purposes. + + +## Server Idempotency + +The key behavior implemented by the server is *idempotency*. +Specifically, on any change to the file system state (such as a +`MFS_Write`, `MFS_Creat`, or `MFS_Unlink`), all the dirtied buffers in the +server are committed to the disk. The server can achieve this by +calling `fsync()` on the file system image. Thus, before returning a +success code, the file system should always `fsync()` the image. + +Now you might be wondering: why do this? Simple: if the server crashes, the +client can simply time out and retry the operation and know that it is OK to do +so. Read [this chapter](https://pages.cs.wisc.edu/~remzi/OSTEP/dist-nfs.pdf) on NFS +for details. + +Now you might be wondering: how do I implement a timeout? Simple, with the +`select()` interface. 
The `select()` call allows you to wait for a reply +on a certain socket descriptor (or more than one, though that is not needed +here). You can even specify a timeout so that the client does not block +forever waiting for data to be returned from the server. By doing so, you can +wait for a reply for a certain amount of time, and if nothing is returned, try +the operation again until it is successful. + +## Program Specifications + +Your server program must be invoked exactly as follows: + +prompt> server [portnum] [file-system-image] + +The command line arguments to your file server are to be interpreted as follows. + +- portnum: the port number that the file server should listen on. +- file-system-image: a file that contains the file system image. + +If the file system image does not exist, you should print out an error message and exit with exit code 1. + +Your client library should be called `libmfs.so`. It should implement +the interface as specified by `mfs.h`, and in particular deal with +the case where the server does not reply in a timely fashion; the way +it deals with that is simply by retrying the operation, after a +timeout of some kind (default: five-second timeout). + +## Relevant Chapters + +Read these: +- [File System Implementation](https://pages.cs.wisc.edu/~remzi/OSTEP/file-implement.pdf) +- [Distributed Systems](https://pages.cs.wisc.edu/~remzi/OSTEP/dist-intro.pdf) +- [Distributed File System: NFS](https://pages.cs.wisc.edu/~remzi/OSTEP/dist-nfs.pdf) + + +## Some Helper Code + +To get you going, we have written some simple UDP code that can send a +message from a client to a server and then receive a reply. It can be found +[here](https://github.com/remzi-arpacidusseau/ostep-code/tree/master/dist-intro). + +You'll also have to learn how to make a shared library. Read [here](https://tldp.org/HOWTO/Program-Library-HOWTO/shared-libraries.html) for more information. 
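The timeout-and-retry behavior described under Server Idempotency can be sketched with `select()` roughly as follows. This is a minimal sketch, not the required implementation: the helper names `wait_for_reply` and `send_with_retry` are hypothetical and not part of `mfs.h`, and the code assumes the caller has already created a UDP socket and filled in the server address.

```c
#include <sys/types.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

// Hypothetical helper: wait up to timeout_sec seconds for fd to become
// readable. Returns 1 if data is ready, 0 on timeout, -1 on error.
int wait_for_reply(int fd, int timeout_sec) {
    fd_set rfds;
    struct timeval tv;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    int rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}

// Hypothetical helper: send a request and keep resending it until a reply
// arrives. Returns the reply length, or -1 on a socket error.
ssize_t send_with_retry(int fd, const struct sockaddr *addr, socklen_t addrlen,
                        const void *req, size_t req_len,
                        void *reply, size_t reply_len, int timeout_sec) {
    for (;;) {
        if (sendto(fd, req, req_len, 0, addr, addrlen) < 0)
            return -1;

        int ready = wait_for_reply(fd, timeout_sec);
        if (ready < 0)
            return -1;
        if (ready == 1)
            return recvfrom(fd, reply, reply_len, 0, NULL, NULL);
        // Timed out: fall through and resend the same request.
    }
}
```

A client routine such as `MFS_Lookup()` could build its request message and pass it to `send_with_retry()` with a timeout of 5 seconds. Resending an identical request is safe only because the server commits each change before replying, so repeating an operation leaves the file system in the same state.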
+ + + + + + diff --git a/filesystems-distributed-ufs/mfs.h b/filesystems-distributed-ufs/mfs.h new file mode 100644 index 0000000..554d4dc --- /dev/null +++ b/filesystems-distributed-ufs/mfs.h @@ -0,0 +1,30 @@ +#ifndef __MFS_h__ +#define __MFS_h__ + +#define MFS_DIRECTORY (0) +#define MFS_REGULAR_FILE (1) + +#define MFS_BLOCK_SIZE (4096) + +typedef struct __MFS_Stat_t { + int type; // MFS_DIRECTORY or MFS_REGULAR_FILE + int size; // bytes + // note: no permissions, access times, etc. +} MFS_Stat_t; + +typedef struct __MFS_DirEnt_t { + char name[28]; // up to 28 bytes of name in directory (including \0) + int inum; // inode number of entry (-1 means entry not used) +} MFS_DirEnt_t; + + +int MFS_Init(char *hostname, int port); +int MFS_Lookup(int pinum, char *name); +int MFS_Stat(int inum, MFS_Stat_t *m); +int MFS_Write(int inum, char *buffer, int offset, int nbytes); +int MFS_Read(int inum, char *buffer, int offset, int nbytes); +int MFS_Creat(int pinum, int type, char *name); +int MFS_Unlink(int pinum, char *name); +int MFS_Shutdown(); + +#endif // __MFS_h__ diff --git a/filesystems-distributed-ufs/mkfs.c b/filesystems-distributed-ufs/mkfs.c new file mode 100644 index 0000000..4e38cfb --- /dev/null +++ b/filesystems-distributed-ufs/mkfs.c @@ -0,0 +1,197 @@ +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> +#include <fcntl.h> +#include <assert.h> + +#include "ufs.h" + +void usage() { + fprintf(stderr, "usage: mkfs -f [-d ]\n"); + exit(1); +} + +int main(int argc, char *argv[]) { + int ch; + char *image_file = NULL; + int num_inodes = 32; + int num_data = 32; + int visual = 0; + + while ((ch = getopt(argc, argv, "i:d:f:v")) != -1) { + switch (ch) { + case 'i': + num_inodes = atoi(optarg); + break; + case 'd': + num_data = atoi(optarg); + break; + case 'f': + image_file = optarg; + break; + case 'v': + visual = 1; + break; + default: + usage(); + } + } + argc -= optind; + argv += optind; + + if (image_file == NULL) + usage(); + + unsigned char *empty_buffer; + empty_buffer = calloc(UFS_BLOCK_SIZE, 1); + 
if (empty_buffer == NULL) { + perror("calloc"); + exit(1); + } + + int fd = open(image_file, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR); + if (fd < 0) { + perror("open"); + exit(1); + } + + assert(num_inodes >= 32); + assert(num_data >= 32); + + // presumed: block 0 is the super block + super_t s; + + // inode bitmap + s.inode_bitmap_addr = 1; + s.inode_bitmap_len = num_inodes / UFS_BLOCK_SIZE; + if (num_inodes % UFS_BLOCK_SIZE != 0) + s.inode_bitmap_len++; + + // data bitmap + s.data_bitmap_addr = s.inode_bitmap_addr + s.inode_bitmap_len; + s.data_bitmap_len = num_data / UFS_BLOCK_SIZE; + if (num_data % UFS_BLOCK_SIZE != 0) + s.data_bitmap_len++; + + // inode table + s.inode_region_addr = s.data_bitmap_addr + s.data_bitmap_len; + int total_inode_bytes = num_inodes * sizeof(inode_t); + s.inode_region_len = total_inode_bytes / UFS_BLOCK_SIZE; + if (total_inode_bytes % UFS_BLOCK_SIZE != 0) + s.inode_region_len++; + + // data blocks + s.data_region_addr = s.inode_region_addr + s.inode_region_len; + s.data_region_len = num_data; + + int total_blocks = 1 + s.inode_bitmap_len + s.data_bitmap_len + s.inode_region_len + s.data_region_len; + + // super block is the first block + int rc = pwrite(fd, &s, sizeof(super_t), 0); + if (rc != sizeof(super_t)) { + perror("write"); + exit(1); + } + + printf("total blocks %d\n", total_blocks); + printf(" inodes %d [size of each: %lu]\n", num_inodes, sizeof(inode_t)); + printf(" data blocks %d\n", num_data); + printf("layout details\n"); + printf(" inode bitmap address/len %d [%d]\n", s.inode_bitmap_addr, s.inode_bitmap_len); + printf(" data bitmap address/len %d [%d]\n", s.data_bitmap_addr, s.data_bitmap_len); + + // first, zero out all the blocks + int i; + for (i = 1; i < total_blocks; i++) { + rc = pwrite(fd, empty_buffer, UFS_BLOCK_SIZE, i * UFS_BLOCK_SIZE); + if (rc != UFS_BLOCK_SIZE) { + perror("write"); + exit(1); + } + } + + // + // need to allocate first inode in inode bitmap + // + typedef struct { + unsigned int 
bits[UFS_BLOCK_SIZE / sizeof(unsigned int)]; + } bitmap_t; + assert(sizeof(bitmap_t) == UFS_BLOCK_SIZE); + + bitmap_t b; + for (i = 0; i < 1024; i++) + b.bits[i] = 0; + b.bits[0] = 0x80000000; // first entry is allocated + + rc = pwrite(fd, &b, UFS_BLOCK_SIZE, s.inode_bitmap_addr * UFS_BLOCK_SIZE); + assert(rc == UFS_BLOCK_SIZE); + + // + // need to allocate first data block in data bitmap + // (can just reuse this to write out data bitmap too) + // + rc = pwrite(fd, &b, UFS_BLOCK_SIZE, s.data_bitmap_addr * UFS_BLOCK_SIZE); + assert(rc == UFS_BLOCK_SIZE); + + // + // need to write out inode + // + typedef struct { + inode_t inodes[UFS_BLOCK_SIZE / sizeof(inode_t)]; + } inode_block; + + inode_block itable; + itable.inodes[0].type = UFS_DIRECTORY; + itable.inodes[0].size = sizeof(dir_ent_t); // in bytes + itable.inodes[0].direct[0] = s.data_region_addr; + for (i = 1; i < DIRECT_PTRS; i++) + itable.inodes[0].direct[i] = -1; + + rc = pwrite(fd, &itable, UFS_BLOCK_SIZE, s.inode_region_addr * UFS_BLOCK_SIZE); + assert(rc == UFS_BLOCK_SIZE); + + // + // need to write out root directory contents to first data block + // create a root directory, with nothing in it + // + typedef struct { + dir_ent_t entries[128]; + } dir_block_t; + // xxx assumes 4096 block, 32 byte entries + assert(sizeof(dir_ent_t) * 128 == UFS_BLOCK_SIZE); + + dir_block_t parent; + strcpy(parent.entries[0].name, "."); + parent.entries[0].inum = 0; + + strcpy(parent.entries[1].name, ".."); + parent.entries[1].inum = 0; + + for (i = 2; i < 128; i++) + parent.entries[i].inum = -1; + + rc = pwrite(fd, &parent, UFS_BLOCK_SIZE, s.data_region_addr * UFS_BLOCK_SIZE); + assert(rc == UFS_BLOCK_SIZE); + + if (visual) { + int i; + printf("\nVisualization of layout\n\n"); + printf("S"); + for (i = 0; i < s.inode_bitmap_len; i++) + printf("i"); + for (i = 0; i < s.data_bitmap_len; i++) + printf("d"); + for (i = 0; i < s.inode_region_len; i++) + printf("I"); + for (i = 0; i < s.data_region_len; i++) + printf("D"); + 
printf("\n\n"); + } + + (void) fsync(fd); + (void) close(fd); + + return 0; +} + diff --git a/filesystems-distributed-ufs/ufs.h b/filesystems-distributed-ufs/ufs.h new file mode 100644 index 0000000..10891d5 --- /dev/null +++ b/filesystems-distributed-ufs/ufs.h @@ -0,0 +1,35 @@ +#ifndef __ufs_h__ +#define __ufs_h__ + +#define UFS_DIRECTORY (0) +#define UFS_REGULAR_FILE (1) + +#define UFS_BLOCK_SIZE (4096) + +#define DIRECT_PTRS (30) + +typedef struct { + int type; // UFS_DIRECTORY or UFS_REGULAR_FILE + int size; // bytes + unsigned int direct[DIRECT_PTRS]; +} inode_t; + +typedef struct { + char name[28]; // up to 28 bytes of name in directory (including \0) + int inum; // inode number of entry (-1 means entry not used) +} dir_ent_t; + +// presumed: block 0 is the super block +typedef struct __super { + int inode_bitmap_addr; // block address + int inode_bitmap_len; // in blocks + int data_bitmap_addr; // block address + int data_bitmap_len; // in blocks + int inode_region_addr; // block address + int inode_region_len; // in blocks + int data_region_addr; // block address + int data_region_len; // in blocks +} super_t;  + + +#endif // __ufs_h__
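
A note for readers of the bitmaps that `mkfs.c` writes: it marks the first inode and the first data block as allocated by storing `0x80000000` in `bits[0]`, which suggests that entry 0 corresponds to the most significant bit of the first 32-bit word. Below is a minimal sketch of bitmap helpers under that assumption. The names `get_bit`, `set_bit`, and `find_free_bit` are illustrative and are not defined in `ufs.h`; the code also assumes a 32-bit `unsigned int`.

```c
// Assumption (inferred from mkfs.c): entry `position` lives in word
// position / 32, counting bits from the most significant end of each word.

// Return 1 if the entry at `position` is allocated, 0 otherwise.
unsigned int get_bit(const unsigned int *bitmap, int position) {
    int index = position / 32;
    int offset = 31 - (position % 32);
    return (bitmap[index] >> offset) & 0x1u;
}

// Mark the entry at `position` as allocated.
void set_bit(unsigned int *bitmap, int position) {
    int index = position / 32;
    int offset = 31 - (position % 32);
    bitmap[index] |= 0x1u << offset;
}

// Return the first unallocated entry among the first `nbits` entries,
// or -1 if every entry is in use.
int find_free_bit(const unsigned int *bitmap, int nbits) {
    for (int i = 0; i < nbits; i++) {
        if (get_bit(bitmap, i) == 0)
            return i;
    }
    return -1;
}
```

With a freshly created image, a server using these helpers would see entry 0 in both bitmaps as allocated (the root directory's inode and its data block), so the first free inode and the first free data block would each be entry 1.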