Kernel API Wishlist
Michael Stone
michael at laptop.org
Thu Jul 31 11:46:48 EDT 2008
Deepak,
I don't think I'm going to be able to attend LPC but Chris suggested
that I offer you some wishlist items in case you meet someone who would
be interested in them. (I'm chipping away at them in my free time, but
at that rate... :)
Anyway, here's my grab-bag of items:
a) The filesystem is basically a shared memory whose variable-length
addresses form a rooted, directed (usually acyclic) graph. It is
well known that processes communicating through shared memory benefit from
atomic primitives for state update and it is also apparent that present
programs like olpc-update and qmail use _dirty hacks_ based on
rename(), link(), and symlink() to implement safe atomic updates.
Please give us more powerful atomic primitives -- e.g. some of CAS,
TAS, k-CAS, double-CAS, and "load-linked/store-conditional" (LL/SC)
operations. (Ask Scott for detailed citations of papers on the
strengths and uses of these operations. He suggests 'things cited in
[1]'. See [2] for many more papers on the subject.)
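
For readers who have not met the rename() trick mentioned in (a), here
is a minimal Python sketch of the usual write-to-temp-then-rename
pattern. It is an illustration only, not code taken from olpc-update or
qmail, and the helper name atomic_write is hypothetical.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at `path` with `data` (bytes).

    Readers that open `path` see either the complete old contents or the
    complete new contents, never a partially written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem as the target,
    # otherwise rename() cannot be atomic. Note that mkstemp() creates it
    # with mode 0600, so the original file's permissions are not preserved.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # rename(2) on POSIX: the atomic step
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # fsync the directory so the rename itself survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Note what this does not provide: the rename is unconditional, so two
writers can silently overwrite each other. There is no way to say
"replace only if the file is still what I last read", which is what a
CAS-style primitive would offer.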
b) Plan 9 showed that networking can be conveniently expressed through
filesystem primitives. This means that _access control_ of networking
can be expressed with filesystem permissions. This would be _much_
nicer than current firewall languages since it would permit user-level
programs to exercise real control over what networking their
subcomponents perform. Separately, it would be nice if userland could
instruct the kernel to rate-limit writes to mount-points, inodes, fds,
etc.
c) Secure Unix daemons are commonly implemented with privilege
separation along uid boundaries but:
  a) setresuid, setresgid, setgroups, etc. cannot be called together to
     atomically change all credentials of a process.
  b) These operations only permit us to change the credentials of the
     _calling_ process.
  c) The only way we have to refer to processes is by pid. Pids are
     aliasable -- i.e. they can be vacated and reused without notification
     to the referring process. (And I can't use the standard wait
     primitives on processes that aren't my children.)
This is problematic in the case of Rainbow because we really want to
securely manipulate _other_ processes' credentials. The kind of API
that I really want here is
  a) the ability to get a handle pointing to a process
  b) the ability to wait - or not - on events on that handle or to
     signal the process with the handle
  c) the ability to atomically change all credentials on processes for
     which I have a writable handle.
(P.S. - file descriptors are really nice handles!)
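
To make the first complaint under (c) concrete, here is a rough Python
sketch of the conventional privilege-drop sequence using the existing
calls. The function name drop_privileges is hypothetical, and this is
not Rainbow's code; it only shows that the change takes three separate
system calls, each of which can only act on the calling process.

```python
import os


def drop_privileges(uid, gid, groups):
    """Switch the *calling* process to uid/gid/groups, one call at a time."""
    # Order matters: the group calls need privilege, so they must happen
    # before the uid is given up.
    os.setgroups(groups)          # step 1: supplementary groups
    os.setresgid(gid, gid, gid)   # step 2: real, effective, saved gid
    # Between steps the process holds a mixed set of credentials
    # (new groups, old uid). Nothing makes the three steps one atomic change.
    os.setresuid(uid, uid, uid)   # step 3: real, effective, saved uid
    # Verify the result rather than trusting that every call took effect.
    if os.getresuid() != (uid, uid, uid) or os.getresgid() != (gid, gid, gid):
        raise RuntimeError("credentials were not fully changed")
```

Running this for real requires starting with sufficient privilege
(typically root). There is no argument that would let it target another
process, which is the second complaint in the list.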
d) All of the items in the security/reliability section of
http://cr.yp.to/unix.html
Michael
[1]: http://research.sun.com/scalable/pubs/SPAA04.pdf
[2]: http://research.sun.com/scalable/