[Planetlab-devel] why FUSE should be safe, and why it isn't

Marc E. Fiuczynski mef at CS.Princeton.EDU
Wed Nov 22 14:11:14 EST 2006


Am forwarding a nice summary on FUSE status from Russ Cox (MIT) to the list.
Subject says it all.  Russ, could you give us an update since we last
exchanged email? --Marc

[I'm not on planetlab-users but Jeremy and Frans forwarded your mail.]

High order bit: there's a good argument to be made for why FUSE
isn't inherently unsafe to allow on PlanetLab, but in fact right now
there appear to be some problems left to be ironed out.

* Why FUSE should be safe to run on PlanetLab

This is mostly speculation based on my (rather complete) knowledge of how
the fuse interface and protocol work and my (rather incomplete) reading
of how they are implemented in the Linux source.

The only data structures that fuse maintains are:
  - struct fuse_conn, representing a mounted connection
     (one per open of /dev/fuse)
  - struct fuse_req, a request sent into user space
     (one per active system call involving fuse,
     limited to FUSE_MAX_OUTSTANDING (10)
     per mounted connection, preallocated at mount time)
  - struct fuse_file, representing an open file
     (one per open file descriptor on a fuse file system)
  - struct fuse_inode, representing a directory entry
     (one per active inode on a fuse file system,
     plus one per cached inode having to do with the fuse fs).

Each active fuse connection incurs a memory requirement of one fuse_conn
and 10 fuse_reqs.  You can only have one active connection per open
/dev/fuse, so as long as there is a per-slice fd limit, you can't DOS
the system by opening /dev/fuse lots of times.

Each active fd corresponding to the user-level fs incurs a memory
requirement of one fuse_file.  Again a per-slice fd limit would avoid
DOSing the system.

Each active inode corresponding to the user-level fs incurs a memory
requirement of one fuse_inode.  There is at most one inode per active
fd, and then at most one inode per cwd for processes that cd into the
user-level fs.  Per-slice fd and process limits cap this.
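The bounds in the last three paragraphs can be tallied as a quick
back-of-the-envelope count.  This is only a sketch: the helper name and
the example limits are made up for illustration, and it counts objects
rather than bytes because I haven't measured the struct sizes.  Each
entry is an independent worst case, and kernel inode-cache entries are
not included.

```python
FUSE_MAX_OUTSTANDING = 10  # fuse_reqs preallocated per mounted connection


def max_fuse_objects(fd_limit, proc_limit):
    """Worst-case object counts one slice can pin (hypothetical helper)."""
    return {
        # one fuse_conn per open of /dev/fuse, so at most one per fd
        "fuse_conn": fd_limit,
        # ten preallocated fuse_reqs for every connection
        "fuse_req": fd_limit * FUSE_MAX_OUTSTANDING,
        # one fuse_file per open fd on a fuse file system
        "fuse_file": fd_limit,
        # one inode per active fd plus one per process cwd
        "fuse_inode": fd_limit + proc_limit,
    }


# Example limits are assumptions, not PlanetLab's actual settings.
bounds = max_fuse_objects(fd_limit=1024, proc_limit=256)
```

Whatever the real limits are, every count is linear in them, which is
the point of the argument.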

There is also a fuse_inode for each fuse inode that is cached in the
kernel's inode cache.  Presumably the kernel can handle its inode cache
without being DOSed.  A fuse inode is about the same size as an ext3 inode
(it might actually be smaller) so no new vulnerabilities here.

That's just memory, of course.  One might legitimately worry about
deadlocks of one kind or another, or fuse's reaction to bogus inputs.

Plan 9 goes to great lengths to worry about making sure the clients and
servers agree on the current state of the protocol conversation (things
like which files are open, etc.).  Getting that right without deadlock
is not trivial.  Fuse makes no attempt to get this right, so there is
no worry about deadlock here -- fuse can, when it needs to, issue an
"I'm done with this fd" or "I'm done with this vnode" and that's that.
It does not wait for a response from user space (there is no response
for those messages).  This means that fuse can reclaim all its resources
even if the user space file server stops responding.  Fuse pre-allocates
space for the cancel messages, so it can't deadlock because it is out
of memory either.
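
A toy model of that idea, for illustration only.  It is not the kernel
module's code; the class and field names are invented, and I am assuming
the "done" message is reserved when the file is opened.  What it shows
is that giving up a file needs neither a new allocation nor a reply.

```python
from collections import deque


class ToyConn:
    """Toy model of one-way release messages (not the real implementation)."""

    def __init__(self):
        self.outbox = deque()  # messages waiting for the user-space server
        self.open_files = {}   # fh -> release message reserved at open time

    def open(self, fh):
        # Reserve the "done with this fd" message now, while failing is harmless.
        self.open_files[fh] = {"op": "release", "fh": fh}

    def release(self, fh):
        # No allocation, no waiting: queue the reserved message, forget the file.
        self.outbox.append(self.open_files.pop(fh))


conn = ToyConn()
conn.open(3)
conn.release(3)  # returns at once even if nobody ever drains conn.outbox
```

After release() the connection holds no per-file state, whether or not
the server ever reads the queued message.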

I haven't tried sending fuse malicious inputs, but I have sent it my fair
share of broken inputs and I never got it to crash or behave incorrectly.
It always gave me a nice error message.  Overall I was quite surprised
at how sturdy it was.  Very different from the Linux 9P implementation,
which is very shaky.  I haven't inspected the code to see that it handles
invalid counts correctly in the read and write messages, but presumably
message parsing is a solvable problem.
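
To show the kind of count checking I mean, here is a user-space sketch.
It assumes a reply starts with a 16-byte header of (length, error,
unique id), which is my recollection of fuse_out_header; the function
name and the max_payload parameter are made up, and this is not a
transcription of what the kernel does.

```python
import struct

# Assumed reply header layout: u32 len, s32 error, u64 unique (16 bytes).
OUT_HEADER = struct.Struct("=IiQ")


def check_reply(buf, max_payload):
    """Return (error, unique, payload), or raise ValueError if malformed."""
    if len(buf) < OUT_HEADER.size:
        raise ValueError("reply shorter than its header")
    length, error, unique = OUT_HEADER.unpack_from(buf)
    if length != len(buf):
        raise ValueError("length field does not match bytes written")
    if error > 0:
        raise ValueError("error must be zero or a negative errno")
    payload = bytes(buf[OUT_HEADER.size:])
    if len(payload) > max_payload:
        raise ValueError("payload larger than the request allowed")
    return error, unique, payload


good = OUT_HEADER.pack(OUT_HEADER.size + 5, 0, 7) + b"hello"
result = check_reply(good, max_payload=5)
```

Every field that describes a size is compared against something the
receiver already knows before any payload byte is used.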

A more interesting question is what kind of bugs in the rest of the
system might be tickleable using fuse, since the user can present
arbitrarily pathological file systems to the rest of the operating system.
Obviously fuse doesn't honor setuid bits, for example.
Whether there are bugs in the rest of Linux that you care about
is another issue entirely, separate from whether fuse itself is safe.

* Why FUSE isn't safe to run on PlanetLab

I wrote a test program that talks directly to the fuse kernel module.
It doesn't use the fuse user library, because I found it easier to
write my own interface than to figure out how to use the "advanced" fuse
interface (I did this over the summer on a lark, not for this exercise).

I tested on my Ubuntu Edgy laptop running Linux 2.6.17.

I verified that mounting lots of fuses doesn't cause any problems.
I was able to mount 1000 before getting an error about too many fuse file
systems.  I believe this limit is fuse-imposed, but I would presumably
have run out of file descriptors in the process doing the mounting
not long thereafter.

Then I started poking around at what happens if the user-level fs presents
various pathological file systems that would never occur in practice.
Here I think I found some problems, and I suspect it's Linux's fault.

I verified that creating a big binary tree with a ton of directories
(2^50) doesn't cause any problems: du and friends had no problems.

Then I tried creating a big linear tree with a ton of directories
in a giant line: d1, d1/d2, d1/d2/d3, d1/d2/d3/d4, ..., up to
d1/d2/.../d(2^50).  If I run du it churns for a while (expected).
If I kill it and run du again, this one wedges in the kernel using 99%
of my CPU and is not killable, even after I kill off the user-level
file server.  I have sent a bug report to the fuse-devel mailing list:
http://article.gmane.org/gmane.comp.file-systems.fuse.devel/3826
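
For anyone wondering how a server can present trees this large: nothing
is stored, and each directory listing is computed from the path.  The
sketch below is not my test program; the function names, the "0"/"1"
child names, and the 50-level depth for the binary tree are assumptions
for illustration.  Only the d1, d2, ... naming comes from the
description above.

```python
CHAIN_DEPTH = 2**50


def _depth(path):
    """Number of components in a relative path; "" is the root."""
    return 0 if path == "" else path.count("/") + 1


def chain_readdir(path):
    """Entries of a directory in the linear tree d1/d2/.../d(2^50)."""
    depth = _depth(path)
    return [] if depth >= CHAIN_DEPTH else ["d%d" % (depth + 1)]


def binary_readdir(path, levels=50):
    """Entries of a directory in a binary tree (child names are made up)."""
    return [] if _depth(path) >= levels else ["0", "1"]


first = chain_readdir("")        # listing of the root
third = chain_readdir("d1/d2")   # listing two levels down
```

The two shapes stress the walker differently: the binary tree never gets
deeper than about 50 components, while the chain has one path that keeps
growing for as long as du keeps descending.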

I was going to try having a big flat tree with 2^50 files in one directory
too, but I've stopped pounding until they fix whatever bug du is tickling
(which may well not be in fuse itself but in some central Linux kernel
cache).  Wedging my machine makes it essentially unusable so I have to
reboot between experiments.

* Conclusion

Once the fuse guy fixes this problem, I'll keep pounding on fuse some
more.  It looks like it isn't ready for prime time yet.

Even so, unless you are worried about malicious users of PlanetLab (are
there any?), it's probably safe enough to deploy at least experimentally
to get feedback.  Depends how paranoid you are.

Russ


