Kernel API Wishlist

Thu Jul 31 11:46:48 EDT 2008

Deepak,

I don't think I'm going to be able to attend LPC but Chris suggested
that I offer you some wishlist items in case you meet someone who would
be interested in them. (I'm chipping away at them in my free time, but
at that rate... :)

Anyway, here's my grab-bag of items:

  a) The filesystem is basically a shared memory with rooted, directed
  (usually acyclic), graph-structured, variable-length addresses. It is
  well known that processes communicating through shared memory benefit
  from atomic primitives for state update, and it is also apparent that
  present-day programs like olpc-update and qmail use _dirty hacks_ based
  on rename(), link(), and symlink() to implement safe atomic updates.
  Please give us more powerful atomic primitives -- e.g. some of CAS,
  TAS, k-CAS, double-CAS, and "load-linked/store-conditional" (LL/SC)
  operations. (Ask Scott for detailed citations of papers on the
  strengths and uses of these operations. He suggests 'things cited in
  [1]'. See [2] for many more papers on the subject.)

  b) Plan 9 showed that networking can be conveniently expressed through
  filesystem primitives. This means that _access control_ of networking
  can be expressed with filesystem permissions. This would be _much_
  nicer than current firewall languages since it would permit user-level
  programs to exercise real control over what networking their
  subcomponents perform. Separately, it would be nice if userland could
  instruct the kernel to rate-limit writes to mount-points, inodes, fds,
  etc.

  c) Secure Unix daemons are commonly implemented with privilege
  separation along uid boundaries but:

    a) setresuid, setresgid, setgroups, etc. cannot be called together to
    atomically change all credentials of a process.

    b) These operations only permit us to change the credentials of the
    _calling_ process.

    c) The only way we have to refer to processes is by pid. Pids are
    aliasable -- i.e. they can be vacated and reused without notification
    to the referring process. (And I can't use the standard wait
    primitives on processes that aren't my children.)

  This is problematic in the case of Rainbow because we really want to
  securely manipulate _other_ processes' credentials. The kind of API
  that I really want here is:

    a) the ability to get a handle pointing to a process,
    b) the ability to wait - or not - on events on that handle or to
    signal the process with the handle, and
    c) the ability to atomically change all credentials on processes for
    which I have a writable handle.

  (P.S. - file descriptors are really nice handles!)
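
  As a rough sketch of what (a) and (b) could look like with file
  descriptors as handles, the following uses the pidfd interface that
  much later Linux kernels (5.3 and up) expose, through Python 3.9+
  (os.pidfd_open and signal.pidfd_send_signal). The wrapper names
  open_process_handle, wait_for_exit, and signal_process are invented
  here for illustration. Nothing in this sketch addresses (c); I am not
  aware of an interface for atomically changing another process's
  credentials.

```python
import os
import select
import signal
import subprocess


def open_process_handle(pid):
    """Return a file descriptor referring to the process `pid`.

    Unlike the pid number, the descriptor keeps referring to the same
    process even if the pid is later vacated and reused.
    Assumes Linux 5.3+ and Python 3.9+.
    """
    return os.pidfd_open(pid)


def wait_for_exit(handle, timeout=None):
    """Wait until the process behind `handle` exits.

    The descriptor becomes readable once the process has terminated, so
    this also works for processes that are not our children. Returns
    True if the process exited within `timeout` seconds.
    """
    readable, _, _ = select.select([handle], [], [], timeout)
    return bool(readable)


def signal_process(handle, signum=signal.SIGTERM):
    """Send `signum` to the process behind `handle`, not to a pid."""
    signal.pidfd_send_signal(handle, signum)
```

  Usage is roughly: handle = open_process_handle(pid), then
  signal_process(handle) and wait_for_exit(handle, 5), and finally
  os.close(handle) when done with the handle.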

  d) All of the items in the security/reliability section of
  http://cr.yp.to/unix.html

Michael

[1]: http://research.sun.com/scalable/pubs/SPAA04.pdf
[2]: http://research.sun.com/scalable/