Compare commits

10 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Remzi Arpaci-Dusseau | 76cff3f89f | Update README.md; point to P Sharp's version. | 2023-11-09 08:20:20 -05:00 |
| Remzi Arpaci-Dusseau | 435fd35685 | slight fix to bitmap size calcs, and more info in superblock on num_inodes and num_data | 2022-12-07 17:32:27 -06:00 |
| Remzi Arpaci-Dusseau | 6269639845 | minor clean up | 2022-12-06 11:18:58 -06:00 |
| Remzi Arpaci-Dusseau | b95350cb74 | mkfs root dir size wrong | 2022-12-06 09:52:47 -06:00 |
| Remzi Arpaci-Dusseau | 8d2cbee595 | clarify disk image stuff | 2022-12-02 16:35:42 -06:00 |
| Remzi Arpaci-Dusseau | 685ab5f093 | nit | 2022-11-30 17:39:05 -06:00 |
| Remzi Arpaci-Dusseau | 2b121db582 | more details in README | 2022-11-29 11:41:29 -06:00 |
| Remzi Arpaci-Dusseau | 2ea50fda97 | init cut | 2022-11-29 08:53:47 -06:00 |
| Remzi Arpaci-Dusseau | 135c98f85e | single pass | 2022-10-27 16:05:52 -05:00 |
| Remzi Arpaci-Dusseau | 4b5e4c934d | fsync | 2022-10-27 16:04:35 -05:00 |
6 changed files with 458 additions and 0 deletions

@@ -42,6 +42,11 @@ fixed-size, and are each 100 bytes (which includes the key).
A successful sort will read all the records into memory from the input
file, sort them, and then write them out to the output file.
You also have to force writes to disk by calling `fsync()` on the output file before finishing.
You can assume that this is a one-pass sort, i.e., the data can fit
into memory. You do not have to implement a multi-pass sort.
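As a rough illustration of the `fsync()` requirement, the output step might look like the sketch below. The function name `write_output` and the file mode are assumptions made for this example; they are not part of the project specification.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: write `len` bytes from `buf` to the file at `path`
// and force them to disk. Returns 0 on success, -1 on any error.
int write_output(const char *path, const char *buf, size_t len) {
    assert(path != NULL && (buf != NULL || len == 0));
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);  // may be a short write
        if (n < 0) {
            if (errno == EINTR)
                continue;          // interrupted before writing: retry
            close(fd);
            return -1;
        }
        done += (size_t) n;
    }
    int rc = fsync(fd);            // flush to stable storage before finishing
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Checking the return value of `fsync()` matters here: if it fails, the sorted data may not have reached the disk.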
## Considerations
Doing so effectively and with high performance will require you to address (at

@@ -7,6 +7,8 @@ consistent. When it isn't, the checker takes steps to repair the problems it
sees; however, you won't be doing any repairs to keep this project a little
simpler.
Patrick Sharp created a better version of this project [here](https://github.com/patrick-sharp/ostep-projects/tree/master/filesystems-checker); when we get unlazy we will incorporate what he has done here. Thanks Patrick!
## Background
Some basic background about file system consistency is found here:

@@ -0,0 +1,181 @@
# Distributed File System
In this assignment, you will be developing a working *distributed file
server.* We provide you with only the bare minimal UDP communication
code; you have to build the rest.
## A Basic File Server
Your file server is built as a stand-alone UDP-based server. It should wait
for a message and then process the message as need be, replying to the given
client.
Your file server will store all of its data in an on-disk, fixed-sized
file which will be referred to as the *file system image*. This image
contains the on-disk representation of your data structures; you
should use these system calls to access it: `open(), read(), write(),
lseek(), close(), fsync().`
To access the file server, you will be building a client library. The
interface that the library supports is defined in [mfs.h](mfs.h). The
library should be called `libmfs.so`, and any programs that wish to access
your file server will link with it and call its various routines.
## On-Disk File System: A Basic Unix File System
The on-disk file system structures follow that of the
very simple file system discussed
[here](https://pages.cs.wisc.edu/~remzi/OSTEP/file-implementation.pdf). On-disk,
the structures are as follows:
- A single block (4KB) super block
- An inode bitmap (can be one or more 4KB blocks, depending on the number of inodes)
- A data bitmap (can be one or more 4KB blocks, depending on the number of data blocks)
- The inode table (a multiple of 4KB-sized blocks, depending on the number of inodes)
- The data region (some number of 4KB blocks, depending on the number of data blocks)
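To make the layout concrete, here is a minimal sketch of how the region addresses could be derived from the inode and data-block counts. It mirrors the arithmetic in `mkfs.c`; the `layout_t` type, the function names, and the fixed 128-byte inode size (two ints plus 30 four-byte pointers) are assumptions made for illustration.

```c
#include <assert.h>

#define BLOCK_SIZE 4096   // same value as UFS_BLOCK_SIZE in ufs.h
#define INODE_SIZE 128    // assumed sizeof(inode_t) on a platform with 4-byte ints

// Hypothetical summary of the layout; all addresses and lengths are in blocks.
typedef struct {
    int inode_bitmap_addr, inode_bitmap_len;
    int data_bitmap_addr, data_bitmap_len;
    int inode_region_addr, inode_region_len;
    int data_region_addr, data_region_len;
    int total_blocks;
} layout_t;

// Integer division rounded up.
static int ceil_div(int n, int d) { return (n + d - 1) / d; }

layout_t compute_layout(int num_inodes, int num_data) {
    assert(num_inodes > 0 && num_data > 0);
    layout_t l;
    int bits_per_block = 8 * BLOCK_SIZE;

    l.inode_bitmap_addr = 1;                       // block 0 is the super block
    l.inode_bitmap_len = ceil_div(num_inodes, bits_per_block);
    l.data_bitmap_addr = l.inode_bitmap_addr + l.inode_bitmap_len;
    l.data_bitmap_len = ceil_div(num_data, bits_per_block);
    l.inode_region_addr = l.data_bitmap_addr + l.data_bitmap_len;
    l.inode_region_len = ceil_div(num_inodes * INODE_SIZE, BLOCK_SIZE);
    l.data_region_addr = l.inode_region_addr + l.inode_region_len;
    l.data_region_len = num_data;
    l.total_blocks = l.data_region_addr + l.data_region_len;
    return l;
}
```

With the `mkfs` defaults of 32 inodes and 32 data blocks, each bitmap and the inode table fit in one block, so the data region starts at block 4 and the image is 36 blocks long.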
More details about on-disk structures can be found in the header [ufs.h](https://github.com/remzi-arpacidusseau/ostep-projects/blob/master/filesystems-distributed-ufs/ufs.h), which you should
use. It defines the exact format of the super block, inodes, and
directory entries. Each bitmap holds one bit per unit (inode or data
block), set when that unit is allocated, as described in the book.
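A possible set of bitmap helpers is sketched below. The bit order is an assumption inferred from `mkfs.c`, which marks unit 0 by setting the most significant bit of the first 32-bit word; the function names are hypothetical, and the code assumes a 32-bit `unsigned int`.

```c
#include <assert.h>

// Assumed bit order: unit n lives in word n / 32, at bit (31 - n % 32),
// so unit 0 is the most significant bit of word 0 (as mkfs.c writes it).

// Return 1 if unit `n` is marked allocated, 0 otherwise.
int bitmap_get(const unsigned int *bits, int n) {
    assert(n >= 0);
    return (int) ((bits[n / 32] >> (31 - n % 32)) & 0x1u);
}

// Mark unit `n` as allocated.
void bitmap_set(unsigned int *bits, int n) {
    assert(n >= 0);
    bits[n / 32] |= 0x1u << (31 - n % 32);
}

// Mark unit `n` as free.
void bitmap_clear(unsigned int *bits, int n) {
    assert(n >= 0);
    bits[n / 32] &= ~(0x1u << (31 - n % 32));
}

// Return the first free unit among the first `count` units, or -1 if none.
int bitmap_find_free(const unsigned int *bits, int count) {
    for (int n = 0; n < count; n++) {
        if (!bitmap_get(bits, n))
            return n;
    }
    return -1;
}
```

Passing `num_inodes` or `num_data` as `count` keeps the search from handing out units that exist in the bitmap block but not in the file system.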
As for directories, here is a little more detail. Each directory has
an inode, and points to one or more data blocks that contain directory
entries. Each directory entry should be simple, and consist of 32
bytes: a name and an inode number pair. The name should be a
fixed-length field of size 28 bytes; the inode number is just an
integer (4 bytes). When a directory is created, it should contain two
entries: the name `.` (dot), which refers to this new directory's
inode number, and `..` (dot-dot), which refers to the parent
directory's inode number. For directory entries that are not yet in
use (in an allocated 4-KB directory block), the inode number should be
set to -1. This way, utilities can scan through the entries to check
if they are valid.
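The directory rules above can be sketched as follows. The entry struct mirrors `dir_ent_t` from `ufs.h`; the helper names and the 128-entries-per-block constant (4096 / 32) are assumptions for this example.

```c
#include <assert.h>
#include <string.h>

#define NAME_LEN 28            // fixed-length name field, including the \0
#define ENTRIES_PER_BLOCK 128  // 4096-byte block / 32-byte entry

// Mirrors dir_ent_t in ufs.h.
typedef struct {
    char name[NAME_LEN];
    int inum;                  // -1 means the entry is not in use
} dir_ent_t;

// Fill a fresh directory block: "." -> self, ".." -> parent, rest unused.
void dir_block_init(dir_ent_t *entries, int self_inum, int parent_inum) {
    assert(sizeof(dir_ent_t) == 32);
    memset(entries, 0, ENTRIES_PER_BLOCK * sizeof(dir_ent_t));
    strcpy(entries[0].name, ".");
    entries[0].inum = self_inum;
    strcpy(entries[1].name, "..");
    entries[1].inum = parent_inum;
    for (int i = 2; i < ENTRIES_PER_BLOCK; i++)
        entries[i].inum = -1;
}

// Scan one block for `name`; return its inode number, or -1 if absent.
int dir_block_lookup(const dir_ent_t *entries, const char *name) {
    for (int i = 0; i < ENTRIES_PER_BLOCK; i++) {
        if (entries[i].inum != -1 &&
            strncmp(entries[i].name, name, NAME_LEN) == 0)
            return entries[i].inum;
    }
    return -1;
}
```

A directory that spans several data blocks would repeat the lookup for each block its inode points to.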
When your server is started, it is passed the name of the file system
image file. The image is created by a tool we provide, called `mkfs`.
It is pretty self-explanatory and can be found
[here](https://github.com/remzi-arpacidusseau/ostep-projects/blob/master/filesystems-distributed-ufs/mkfs.c).
When booting off of an existing image, your server should read in the
superblock, bitmaps, and inode table, and keep in-memory versions of
these. When writing to the image, you should update these on-disk
structures accordingly.
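One way to start the boot sequence is sketched below: open the image, read the super block from block 0, and use it to locate inodes. The `super_t` fields are copied from `ufs.h`; the function names and the 128-byte inode size are assumptions for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

#define BLOCK_SIZE 4096
#define INODE_SIZE 128   // assumed sizeof(inode_t) with 4-byte ints

// Field order copied from super_t in ufs.h.
typedef struct {
    int inode_bitmap_addr, inode_bitmap_len;
    int data_bitmap_addr, data_bitmap_len;
    int inode_region_addr, inode_region_len;
    int data_region_addr, data_region_len;
    int num_inodes, num_data;
} super_t;

// Open an existing image and read its super block into `s`.
// Returns the open file descriptor, or -1 on failure.
int load_super(const char *path, super_t *s) {
    assert(path != NULL && s != NULL);
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    if (pread(fd, s, sizeof(*s), 0) != (ssize_t) sizeof(*s)) {
        close(fd);
        return -1;
    }
    return fd;
}

// Byte offset of inode `inum` within the image file.
long inode_offset(const super_t *s, int inum) {
    assert(inum >= 0 && inum < s->num_inodes);
    return (long) s->inode_region_addr * BLOCK_SIZE + (long) inum * INODE_SIZE;
}
```

The bitmaps and inode table can be read the same way, using the addresses and lengths stored in the super block.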
Importantly, you cannot change the file-system on-disk format.
## Client library
The client library should export the following interfaces:
- `int MFS_Init(char *hostname, int port)`: `MFS_Init()` takes a host name
and port number and uses those to find the server exporting the file system.
- `int MFS_Lookup(int pinum, char *name)`: `MFS_Lookup()` takes the parent
inode number (which should be the inode number of a directory) and looks up
the entry `name` in it. The inode number of `name` is returned. Success:
return inode number of name; failure: return -1. Failure modes: invalid pinum,
name does not exist in pinum.
- `int MFS_Stat(int inum, MFS_Stat_t *m)`: `MFS_Stat()` returns some
information about the file specified by inum. Upon success, return 0,
otherwise -1. The exact info returned is defined by `MFS_Stat_t`. Failure modes:
inum does not exist. File and directory sizes are described below.
- `int MFS_Write(int inum, char *buffer, int offset, int nbytes)`:
`MFS_Write()` writes a buffer of size `nbytes` (max size: 4096 bytes) at the byte
offset specified by `offset`. Returns 0 on success, -1 on
failure. Failure modes: invalid inum, invalid nbytes, invalid offset, not a
regular file (because you can't write to directories).
- `int MFS_Read(int inum, char *buffer, int offset, int nbytes)`:
`MFS_Read()` reads `nbytes` of data (max size 4096 bytes) specified by the
byte offset `offset` into the buffer from file specified by
`inum`. The routine should work for either a file or directory;
directories should return data in the format specified by
`MFS_DirEnt_t`. Success: 0, failure: -1. Failure modes: invalid inum,
invalid offset, invalid nbytes.
- `int MFS_Creat(int pinum, int type, char *name)`: `MFS_Creat()` makes a
file (`type == MFS_REGULAR_FILE`) or directory (`type == MFS_DIRECTORY`)
in the parent directory specified by `pinum` of name `name`. Returns 0 on
success, -1 on failure. Failure modes: pinum does not exist, or name is too
long. If `name` already exists, return success.
- `int MFS_Unlink(int pinum, char *name)`: `MFS_Unlink()` removes the file or
directory `name` from the directory specified by `pinum`. 0 on success, -1
on failure. Failure modes: pinum does not exist, directory is NOT empty. Note
that the name not existing is NOT a failure by our definition (think about why
this might be).
- `int MFS_Shutdown()`: `MFS_Shutdown()` just tells the server to force all
of its data structures to disk and shut down by calling `exit(0)`. This interface
will mostly be used for testing purposes.
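As a usage illustration, a client could build path resolution on top of `MFS_Lookup()` by walking one component at a time from the root. The library itself has no notion of paths, so `resolve_path` is purely hypothetical, and `demo_lookup` is a stand-in with the same signature as `MFS_Lookup()` so the sketch runs without a server. The root inode number 0 follows what `mkfs.c` writes.

```c
#include <assert.h>
#include <string.h>

#define ROOT_INUM 0      // mkfs.c places the root directory at inode 0
#define NAME_MAX_LEN 28  // directory entry name field, including the \0

// Same shape as MFS_Lookup(int pinum, char *name).
typedef int (*lookup_fn)(int pinum, char *name);

// Walk an absolute path such as "/a/b" one component at a time.
// Returns the inode number of the final component, or -1 on failure.
int resolve_path(lookup_fn lookup, const char *path) {
    assert(lookup != NULL && path != NULL);
    if (path[0] != '/')
        return -1;
    int inum = ROOT_INUM;
    const char *p = path;
    while (*p != '\0') {
        while (*p == '/')
            p++;                      // skip separators
        size_t len = strcspn(p, "/");
        if (len == 0)
            break;                    // trailing slash or end of path
        if (len >= NAME_MAX_LEN)
            return -1;                // cannot fit in a directory entry
        char name[NAME_MAX_LEN];
        memcpy(name, p, len);
        name[len] = '\0';
        inum = lookup(inum, name);
        if (inum < 0)
            return -1;
        p += len;
    }
    return inum;
}

// Stand-in for MFS_Lookup(): "/" (0) contains "a" (1), which contains "b" (2).
int demo_lookup(int pinum, char *name) {
    if (pinum == 0 && strcmp(name, "a") == 0) return 1;
    if (pinum == 1 && strcmp(name, "b") == 0) return 2;
    return -1;
}
```

In a real client, `MFS_Lookup` would be passed in place of `demo_lookup` after a successful `MFS_Init()`.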
Size: The size of a file is one more than the offset of the last valid
byte written to the file (that is, the offset just past the furthest
byte written). Specifically, if you write 100 bytes to an empty file at
offset 0, the size is 100; if you write 100 bytes to an empty file at
offset 10, the size is 110. For a directory, it is the same (i.e., the
byte offset just past the last valid entry).
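The size rule can be expressed as a small function. This is a sketch: the name `size_after_write` is hypothetical, and treating a write that would extend past 30 direct blocks (the `DIRECT_PTRS` limit in `ufs.h`) as invalid is an assumption about what "invalid offset" means.

```c
#include <assert.h>

#define BLOCK_SIZE 4096
#define MAX_WRITE 4096                   // per-call limit on nbytes
#define MAX_FILE_SIZE (30 * BLOCK_SIZE)  // 30 direct pointers per inode

// Return the file size after writing `nbytes` at `offset`,
// or -1 if the request is invalid.
int size_after_write(int old_size, int offset, int nbytes) {
    assert(old_size >= 0 && old_size <= MAX_FILE_SIZE);
    if (offset < 0 || nbytes < 0 || nbytes > MAX_WRITE)
        return -1;
    int end = offset + nbytes;           // offset just past the last byte written
    if (end > MAX_FILE_SIZE)
        return -1;
    return end > old_size ? end : old_size;  // writes never shrink a file
}
```

For instance, `size_after_write(0, 10, 100)` yields 110, matching the second example in the text, while overwriting the start of a 500-byte file leaves the size at 500.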
## Server Idempotency
The key behavior implemented by the server is *idempotency*.
Specifically, on any change to the file system state (such as a
`MFS_Write`, `MFS_Creat`, or `MFS_Unlink`), all the dirtied buffers in the
server are committed to the disk. The server can achieve this end by
calling `fsync()` on the file system image. Thus, before returning a
success code, the file system should always `fsync()` the image.
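A minimal sketch of this "write, then `fsync()`, then reply" discipline follows. The helper name `commit_block` is hypothetical; it only shows the ordering, not a full server.

```c
#include <assert.h>
#include <fcntl.h>    // open() flags, for callers that open the image
#include <unistd.h>

#define BLOCK_SIZE 4096

// Write one full block to the image and force it to disk.
// Returns 0 only once fsync() has succeeded, so the caller can
// safely send a success reply to the client afterwards.
int commit_block(int fd, int block_no, const void *block) {
    assert(block_no >= 0 && block != NULL);
    off_t off = (off_t) block_no * BLOCK_SIZE;
    if (pwrite(fd, block, BLOCK_SIZE, off) != BLOCK_SIZE)
        return -1;
    return fsync(fd) == 0 ? 0 : -1;
}
```

An operation that touches several blocks (for example, a data block, its inode, and a bitmap) could issue all of its `pwrite()` calls first and call `fsync()` once before replying.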
Now you might be wondering: why do this? Simple: if the server crashes, the
client can simply timeout and retry the operation and know that it is OK to do
so. Read [this chapter](https://pages.cs.wisc.edu/~remzi/OSTEP/dist-nfs.pdf) on NFS
for details.
Now you might be wondering: how do I implement a timeout? Simple, with the
`select()` interface. The `select()` call allows you to wait for a reply
on a certain socket descriptor (or more than one, though that is not needed
here). You can even specify a timeout so that the client does not block
forever waiting for data to be returned from the server. By doing so, you can
wait for a reply for a certain amount of time, and if nothing is returned, try
the operation again until it is successful.
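The timeout can be sketched with a small wrapper around `select()`. The function name `wait_readable` is an assumption for this example; the `select()` usage itself is standard POSIX.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

// Wait up to `seconds` for `fd` to become readable.
// Returns 1 if data is ready, 0 on timeout, -1 on error.
int wait_readable(int fd, int seconds) {
    assert(fd >= 0 && fd < FD_SETSIZE);
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);

    struct timeval tv;
    tv.tv_sec = seconds;
    tv.tv_usec = 0;

    // First argument is the highest descriptor in any set, plus one.
    int rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}
```

A client routine could send its request, call `wait_readable()` on the socket, and resend the same request whenever the result is 0. Because `select()` may modify the `timeval`, the sketch rebuilds it on every call.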
## Program Specifications
Your server program must be invoked exactly as follows:
prompt> server [portnum] [file-system-image]
The command line arguments to your file server are to be interpreted as follows.
- portnum: the port number that the file server should listen on.
- file-system-image: a file that contains the file system image.
If the file system image does not exist, you should print out an error
message (`image does not exist\n`) and exit with exit code 1.
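The startup checks might be sketched like this. Both helper names are hypothetical, and rejecting port numbers outside 1 to 65535 is an assumption that goes beyond what the specification states.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

// Parse "server [portnum] [file-system-image]".
// Returns 0 and fills in *port and *image, or -1 if the arguments are bad.
int parse_args(int argc, char *argv[], int *port, const char **image) {
    assert(port != NULL && image != NULL);
    if (argc != 3)
        return -1;
    char *end;
    long p = strtol(argv[1], &end, 10);
    if (argv[1][0] == '\0' || *end != '\0' || p <= 0 || p > 65535)
        return -1;
    *port = (int) p;
    *image = argv[2];
    return 0;
}

// Open an existing image read-write. There is deliberately no O_CREAT:
// a missing image must be reported as an error, not silently created.
// Returns the file descriptor, or -1 if the image cannot be opened.
int open_image(const char *path) {
    assert(path != NULL);
    return open(path, O_RDWR);
}
```

In `main()`, a negative result from `open_image()` would lead to printing `image does not exist\n` and calling `exit(1)`.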
Your client library should be called `libmfs.so`. It should implement
the interface as specified by `mfs.h`, and in particular deal with
the case where the server does not reply in a timely fashion; the way
it deals with that is simply by retrying the operation, after a
timeout of some kind (default: five-second timeout).
## Relevant Chapters
Read these:
- [File System Implementation](https://pages.cs.wisc.edu/~remzi/OSTEP/file-implementation.pdf)
- [Distributed Systems](https://pages.cs.wisc.edu/~remzi/OSTEP/dist-intro.pdf)
- [Distributed File System: NFS](https://pages.cs.wisc.edu/~remzi/OSTEP/dist-nfs.pdf)
## Some Helper Code
To get you going, we have written some simple UDP code that can send a
message from a client to a server and then receive a reply. It can be found
[here](https://github.com/remzi-arpacidusseau/ostep-code/tree/master/dist-intro).
There is also other code as mentioned above:
- [mfs.h](https://github.com/remzi-arpacidusseau/ostep-projects/blob/master/filesystems-distributed-ufs/mfs.h)
- [ufs.h](https://github.com/remzi-arpacidusseau/ostep-projects/blob/master/filesystems-distributed-ufs/ufs.h)
- [mkfs.c](https://github.com/remzi-arpacidusseau/ostep-projects/blob/master/filesystems-distributed-ufs/mkfs.c)
You'll also have to learn how to make a shared library. Read [here](https://tldp.org/HOWTO/Program-Library-HOWTO/shared-libraries.html) for more information.

@@ -0,0 +1,30 @@
#ifndef __MFS_h__
#define __MFS_h__
#define MFS_DIRECTORY (0)
#define MFS_REGULAR_FILE (1)
#define MFS_BLOCK_SIZE (4096)
typedef struct __MFS_Stat_t {
int type; // MFS_DIRECTORY or MFS_REGULAR_FILE
int size; // bytes
// note: no permissions, access times, etc.
} MFS_Stat_t;
typedef struct __MFS_DirEnt_t {
char name[28]; // up to 28 bytes of name in directory (including \0)
int inum; // inode number of entry (-1 means entry not used)
} MFS_DirEnt_t;
int MFS_Init(char *hostname, int port);
int MFS_Lookup(int pinum, char *name);
int MFS_Stat(int inum, MFS_Stat_t *m);
int MFS_Write(int inum, char *buffer, int offset, int nbytes);
int MFS_Read(int inum, char *buffer, int offset, int nbytes);
int MFS_Creat(int pinum, int type, char *name);
int MFS_Unlink(int pinum, char *name);
int MFS_Shutdown();
#endif // __MFS_h__

@@ -0,0 +1,203 @@
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include "ufs.h"
void usage() {
fprintf(stderr, "usage: mkfs -f <image_file> [-d <num_data_blocks>] [-i <num_inodes>]\n");
exit(1);
}
int main(int argc, char *argv[]) {
int ch;
char *image_file = NULL;
int num_inodes = 32;
int num_data = 32;
int visual = 0;
while ((ch = getopt(argc, argv, "i:d:f:v")) != -1) {
switch (ch) {
case 'i':
num_inodes = atoi(optarg);
break;
case 'd':
num_data = atoi(optarg);
break;
case 'f':
image_file = optarg;
break;
case 'v':
visual = 1;
break;
default:
usage();
}
}
argc -= optind;
argv += optind;
if (image_file == NULL)
usage();
unsigned char *empty_buffer;
empty_buffer = calloc(UFS_BLOCK_SIZE, 1);
if (empty_buffer == NULL) {
perror("calloc");
exit(1);
}
int fd = open(image_file, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
if (fd < 0) {
perror("open");
exit(1);
}
assert(num_inodes >= 32);
assert(num_data >= 32);
// presumed: block 0 is the super block
super_t s;
// totals
s.num_inodes = num_inodes;
s.num_data = num_data;
// inode bitmap
int bits_per_block = (8 * UFS_BLOCK_SIZE); // remember, there are 8 bits per byte
s.inode_bitmap_addr = 1;
s.inode_bitmap_len = num_inodes / bits_per_block;
if (num_inodes % bits_per_block != 0)
s.inode_bitmap_len++;
// data bitmap
s.data_bitmap_addr = s.inode_bitmap_addr + s.inode_bitmap_len;
s.data_bitmap_len = num_data / bits_per_block;
if (num_data % bits_per_block != 0)
s.data_bitmap_len++;
// inode table
s.inode_region_addr = s.data_bitmap_addr + s.data_bitmap_len;
int total_inode_bytes = num_inodes * sizeof(inode_t);
s.inode_region_len = total_inode_bytes / UFS_BLOCK_SIZE;
if (total_inode_bytes % UFS_BLOCK_SIZE != 0)
s.inode_region_len++;
// data blocks
s.data_region_addr = s.inode_region_addr + s.inode_region_len;
s.data_region_len = num_data;
int total_blocks = 1 + s.inode_bitmap_len + s.data_bitmap_len + s.inode_region_len + s.data_region_len;
// super block is the first block
int rc = pwrite(fd, &s, sizeof(super_t), 0);
if (rc != sizeof(super_t)) {
perror("write");
exit(1);
}
printf("total blocks %d\n", total_blocks);
printf(" inodes %d [size of each: %zu]\n", num_inodes, sizeof(inode_t));
printf(" data blocks %d\n", num_data);
printf("layout details\n");
printf(" inode bitmap address/len %d [%d]\n", s.inode_bitmap_addr, s.inode_bitmap_len);
printf(" data bitmap address/len %d [%d]\n", s.data_bitmap_addr, s.data_bitmap_len);
// first, zero out all the blocks
int i;
for (i = 1; i < total_blocks; i++) {
rc = pwrite(fd, empty_buffer, UFS_BLOCK_SIZE, i * UFS_BLOCK_SIZE);
if (rc != UFS_BLOCK_SIZE) {
perror("write");
exit(1);
}
}
//
// need to allocate first inode in inode bitmap
//
typedef struct {
unsigned int bits[UFS_BLOCK_SIZE / sizeof(unsigned int)];
} bitmap_t;
assert(sizeof(bitmap_t) == UFS_BLOCK_SIZE);
bitmap_t b;
for (i = 0; i < 1024; i++)
b.bits[i] = 0;
b.bits[0] = 0x1u << 31; // first entry is allocated (unsigned literal avoids signed overflow)
rc = pwrite(fd, &b, UFS_BLOCK_SIZE, s.inode_bitmap_addr * UFS_BLOCK_SIZE);
assert(rc == UFS_BLOCK_SIZE);
//
// need to allocate first data block in data bitmap
// (can just reuse this to write out data bitmap too)
//
rc = pwrite(fd, &b, UFS_BLOCK_SIZE, s.data_bitmap_addr * UFS_BLOCK_SIZE);
assert(rc == UFS_BLOCK_SIZE);
//
// need to write out inode
//
typedef struct {
inode_t inodes[UFS_BLOCK_SIZE / sizeof(inode_t)];
} inode_block;
inode_block itable;
memset(&itable, 0, sizeof(itable)); // zero unused inodes so no stack garbage reaches the image
itable.inodes[0].type = UFS_DIRECTORY;
itable.inodes[0].size = 2 * sizeof(dir_ent_t); // in bytes
itable.inodes[0].direct[0] = s.data_region_addr;
for (i = 1; i < DIRECT_PTRS; i++)
itable.inodes[0].direct[i] = -1;
rc = pwrite(fd, &itable, UFS_BLOCK_SIZE, s.inode_region_addr * UFS_BLOCK_SIZE);
assert(rc == UFS_BLOCK_SIZE);
//
// need to write out root directory contents to first data block
// create a root directory, with nothing in it
//
typedef struct {
dir_ent_t entries[128];
} dir_block_t;
// xxx assumes 4096 block, 32 byte entries
assert(sizeof(dir_ent_t) * 128 == UFS_BLOCK_SIZE);
dir_block_t parent;
memset(&parent, 0, sizeof(parent)); // zero the name fields of unused entries
strcpy(parent.entries[0].name, ".");
parent.entries[0].inum = 0;
strcpy(parent.entries[1].name, "..");
parent.entries[1].inum = 0;
for (i = 2; i < 128; i++)
parent.entries[i].inum = -1;
rc = pwrite(fd, &parent, UFS_BLOCK_SIZE, s.data_region_addr * UFS_BLOCK_SIZE);
assert(rc == UFS_BLOCK_SIZE);
if (visual) {
int i;
printf("\nVisualization of layout\n\n");
printf("S");
for (i = 0; i < s.inode_bitmap_len; i++)
printf("i");
for (i = 0; i < s.data_bitmap_len; i++)
printf("d");
for (i = 0; i < s.inode_region_len; i++)
printf("I");
for (i = 0; i < s.data_region_len; i++)
printf("D");
printf("\n\n");
}
(void) fsync(fd);
(void) close(fd);
return 0;
}

@@ -0,0 +1,37 @@
#ifndef __ufs_h__
#define __ufs_h__
#define UFS_DIRECTORY (0)
#define UFS_REGULAR_FILE (1)
#define UFS_BLOCK_SIZE (4096)
#define DIRECT_PTRS (30)
typedef struct {
int type; // UFS_DIRECTORY or UFS_REGULAR_FILE
int size; // bytes
unsigned int direct[DIRECT_PTRS];
} inode_t;
typedef struct {
char name[28]; // up to 28 bytes of name in directory (including \0)
int inum; // inode number of entry (-1 means entry not used)
} dir_ent_t;
// presumed: block 0 is the super block
typedef struct __super {
int inode_bitmap_addr; // block address (in blocks)
int inode_bitmap_len; // in blocks
int data_bitmap_addr; // block address (in blocks)
int data_bitmap_len; // in blocks
int inode_region_addr; // block address (in blocks)
int inode_region_len; // in blocks
int data_region_addr; // block address (in blocks)
int data_region_len; // in blocks
int num_inodes; // just the number of inodes
int num_data; // and data blocks...
} super_t;
#endif // __ufs_h__