Computer vision on the XO: what it is, and what should it be?

Tue Jul 22 22:43:44 EDT 2008

I'm writing computer vision functions for Pygame (available at
http://git.n0r.org/?p=pygame-nrp;a=summary ), and I've gotten to the
point where I very much need community input on where to go next.
Basically, I would like to know what you want to be able to do with
the camera on the XO, whether it's related to gaming, input,
accessibility, education, or anything else.

What it can currently do, in overly simplistic terms:

1. Capture images:  This is the basis for everything else, but it can
be useful on its own too.  For example, you could let someone take a
picture of themselves, select their face in the picture, and then crop
it out and use it as a character in a game.

2. Get the average color:  This is useful for picking values to
threshold against.  If you switch to the YUV or HSV colorspace, you
can also find the average brightness of an area.

3. Threshold images:  Thresholding is pretty flexible.  You can have
it select everything within a threshold of a color, or everything
outside it.  You can also threshold one image against another, keeping
only the pixels that differ.  This lets you get a "green screen"
effect, so a person can be displayed in real time over a virtual
background.

4. Track an object:  After thresholding, you can turn the remaining
object into a bitmask and get various properties of it, such as its
bounding box, centroid, size in pixels, and angle with respect to the
x-axis.  You can also test for collisions between the mask and the
masks of virtual objects.
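
To make the four operations above a bit more concrete, here is a rough
sketch of what each one computes, in plain Python.  It is not the
interface of the pygame branch linked above: the function names, the
image format (a list of rows of (r, g, b) tuples), the mask format (a
set of (x, y) pixels) and the per-channel tolerance test are all
assumptions for illustration.  The angle comes from the usual
second-order central moments and is measured in image coordinates,
where y points down.

```python
import math

# Hypothetical sketch, not the pygame API.
# An "image" is a list of rows, each row a list of (r, g, b) tuples.
# A "mask" is the set of (x, y) pixel coordinates that passed a test.

def average_color(image, rect):
    """Mean (r, g, b) over rect = (x, y, w, h)."""
    x0, y0, w, h = rect
    pixels = [image[y][x] for y in range(y0, y0 + h) for x in range(x0, x0 + w)]
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))

def luma(rgb):
    """Brightness as the Y of YUV, using the BT.601 weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

def threshold_color(image, color, tol, inside=True):
    """Pixels with every channel within tol of color (or, with inside=False, the rest)."""
    mask = set()
    for y, row in enumerate(image):
        for x, p in enumerate(row):
            close = all(abs(p[c] - color[c]) <= tol for c in range(3))
            if close == inside:
                mask.add((x, y))
    return mask

def threshold_images(image, background, tol):
    """Pixels where image differs from background by more than tol in any channel."""
    mask = set()
    for y, (row, bg_row) in enumerate(zip(image, background)):
        for x, (p, q) in enumerate(zip(row, bg_row)):
            if any(abs(p[c] - q[c]) > tol for c in range(3)):
                mask.add((x, y))
    return mask

def mask_properties(mask):
    """Size, bounding box, centroid and orientation of a non-empty mask."""
    n = len(mask)
    xs = [x for x, _ in mask]
    ys = [y for _, y in mask]
    bbox = (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)
    cx, cy = sum(xs) / n, sum(ys) / n
    # Second-order central moments give the axis of least inertia.
    mu20 = sum((x - cx) ** 2 for x in xs)
    mu02 = sum((y - cy) ** 2 for y in ys)
    mu11 = sum((x - cx) * (y - cy) for x, y in mask)
    angle = math.degrees(0.5 * math.atan2(2 * mu11, mu20 - mu02))
    return {"size": n, "bbox": bbox, "centroid": (cx, cy), "angle": angle}

def collides(mask_a, mask_b):
    """True if two masks in the same coordinate frame share any pixel."""
    return not mask_a.isdisjoint(mask_b)
```

A real implementation would need to be far faster than nested Python
loops to keep up with the camera on an XO; the sketch only shows what
each step computes.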

These may seem basic, but you can do a lot by combining them.  For
example:

1.  Drawing with a real-life object:  Have the user pick up an object,
hold it up so that it fills a box displayed on screen, and press a
button.  Save the average color within the box.  Threshold out just
that color, turn it into a bitmask, get the largest connected
component, and find its centroid.  Use that centroid as the
coordinates for the on-screen paintbrush, perhaps also using the
saved average color.  The user now has the illusion of using the
object in hand as a paintbrush.

2.  Play pong with your hand:  Have the user step out of the camera's
field of view, and save the image.  Now threshold the images currently
being captured against the saved image.  This leaves only the
differences between the two (like the user, who has now stepped back
into the field of view).  Turn the result into a bitmask and check it
for collisions with the bitmask of the ball in pong.  Actually, now
that I'm thinking about it, I'll probably write this game when I get
home.
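
Both examples are just the basic steps chained together, roughly as
below.  Again this is plain Python rather than the pygame branch's
actual functions: every name is hypothetical, the tolerance of 40 is an
arbitrary guess, and the largest-component search assumes 8-connected
pixels, found with a breadth-first search.

```python
from collections import deque

# Hypothetical helpers, not the pygame API.  Images are lists of rows of
# (r, g, b) tuples; masks are sets of (x, y) coordinates.

def color_mask(image, color, tol):
    """Pixels with every channel within tol of color."""
    return {(x, y) for y, row in enumerate(image) for x, p in enumerate(row)
            if all(abs(p[c] - color[c]) <= tol for c in range(3))}

def difference_mask(image, background, tol):
    """Pixels that differ from the saved background by more than tol."""
    return {(x, y) for y, (row, bg) in enumerate(zip(image, background))
            for x, (p, q) in enumerate(zip(row, bg))
            if any(abs(p[c] - q[c]) > tol for c in range(3))}

def largest_component(mask):
    """Largest 8-connected group of pixels in mask (empty set if mask is empty)."""
    remaining = set(mask)
    best = set()
    while remaining:
        seed = remaining.pop()
        component = {seed}
        queue = deque([seed])
        while queue:
            x, y = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (x + dx, y + dy)
                    if nb in remaining:
                        remaining.remove(nb)
                        component.add(nb)
                        queue.append(nb)
        if len(component) > len(best):
            best = component
    return best

def centroid(mask):
    n = len(mask)
    return (sum(x for x, _ in mask) / n, sum(y for _, y in mask) / n)

def brush_position(frame, saved_color, tol=40):
    """Example 1: where to draw the brush, or None if the object is lost."""
    blob = largest_component(color_mask(frame, saved_color, tol))
    return centroid(blob) if blob else None

def ball_hits_player(frame, background, ball_mask, ball_pos, tol=40):
    """Example 2: does the ball's mask, offset by ball_pos, touch the player?"""
    player = difference_mask(frame, background, tol)
    bx, by = ball_pos
    return any((x + bx, y + by) in player for x, y in ball_mask)
```

Taking only the largest component keeps stray pixels of a similar
color elsewhere in the room from pulling the brush around.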

Other examples: http://eclecti.cc/olpc

What it doesn't do yet, but could, depending on interest:

1. Generic motion detection:  You can do things like thresholding
between a background and the current image, or tracking whether blobs
of color have moved, but neither is a great way of detecting overall
motion in an image.  There are many optical flow algorithms, but
finding one that will run in real time on an XO will be tricky.  The
other issue is how to present motion to the developer.  Perhaps the
function could take two images and a list of points on the first
image, and return a list of where it guesses those points are in the
second image.

2. Object recognition:  You could guess based on the size or color of
an object, but there isn't really a way of detecting whether what
you're holding up is a lime or a pear.  There are a lot of
computationally heavy ways of doing object recognition, but there are
also some lightweight ones.  The one I was considering uses image
moments (which I already use to find the centroid and angle of an
object) to get basic parameters like the eccentricity and skewness of
an object.  There is also Hu's set of invariant moments, which gives
more information about an object, though not in a very human-friendly
form.  So while I could fairly easily write functions that return a
dozen of these numbers, I'm not sure anyone would be able to make use
of them in an Activity.
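
For a rough idea of what both could look like, here is a plain-Python
sketch.  The motion part follows the interface suggested in item 1
(two images and a list of points in, guessed points out), but it uses
simple exhaustive block matching with a sum-of-squared-differences
score, not a true optical flow method such as Lucas-Kanade; the window
and search sizes are assumptions.  The shape part computes the
eccentricity of a mask's equivalent ellipse and the first two of Hu's
seven invariant moments; skewness (third-order moments) is left out.
All names are hypothetical.

```python
import math

# Hypothetical sketch, not the pygame API.

def track_points(prev, curr, points, patch=2, search=4):
    """Guess where each (x, y) in prev moved to in curr, by block matching.

    prev and curr are grayscale images (lists of rows of ints).  The
    (2*patch+1)-square window around each point is compared against every
    offset within +/- search pixels in curr; the lowest sum of squared
    differences wins, with ties going to "no motion".
    """
    h, w = len(prev), len(prev[0])

    def inside(x, y):
        return patch <= x < w - patch and patch <= y < h - patch

    def ssd(x, y, u, v):
        total = 0
        for dy in range(-patch, patch + 1):
            for dx in range(-patch, patch + 1):
                d = prev[y + dy][x + dx] - curr[v + dy][u + dx]
                total += d * d
        return total

    result = []
    for x, y in points:
        if not inside(x, y):
            result.append((x, y))  # too close to the edge to match; assume no motion
            continue
        best_score, best_pos = ssd(x, y, x, y), (x, y)
        for v in range(y - search, y + search + 1):
            for u in range(x - search, x + search + 1):
                if inside(u, v):
                    score = ssd(x, y, u, v)
                    if score < best_score:
                        best_score, best_pos = score, (u, v)
        result.append(best_pos)
    return result

def shape_descriptors(mask):
    """Eccentricity and the first two Hu invariants of a non-empty (x, y) mask."""
    n = len(mask)
    cx = sum(x for x, _ in mask) / n
    cy = sum(y for _, y in mask) / n

    def mu(p, q):  # central moment
        return sum((x - cx) ** p * (y - cy) ** q for x, y in mask)

    def eta(p, q):  # scale-normalized central moment (mu00 == n for a mask)
        return mu(p, q) / n ** (1 + (p + q) / 2)

    mu20, mu02, mu11 = mu(2, 0), mu(0, 2), mu(1, 1)
    # Eigenvalues of the second-moment matrix: spread along major/minor axes.
    common = math.sqrt(4 * mu11 ** 2 + (mu20 - mu02) ** 2)
    lam1 = (mu20 + mu02 + common) / 2
    lam2 = (mu20 + mu02 - common) / 2
    eccentricity = math.sqrt(1 - lam2 / lam1) if lam1 > 0 else 0.0
    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    hu1 = n20 + n02
    hu2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    return {"eccentricity": eccentricity, "hu1": hu1, "hu2": hu2}
```

A circle or square gives an eccentricity near 0 and a thin line gives
1; the Hu values stay the same when the shape is moved or rotated,
which is what makes them usable for comparing a shape against stored
examples even though they mean little on their own.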

So far, I've just been writing functions based on use cases I could
think of in games.  There are probably many that I haven't thought of
for which the current functions are inadequate, so any ideas would be
appreciated.

Nirav