Computer vision on the XO: what it is, what should it be?
brian at laptop.org
Wed Jul 23 02:46:07 EDT 2008
Hey Nirav, games, devel!
On Tue, Jul 22, 2008 at 10:43 PM, Nirav Patel <olpc at spongezone.net> wrote:
> I'm writing computer vision functions for Pygame (available at
> http://git.n0r.org/?p=pygame-nrp;a=summary ), and I've gotten to the
> point where I very much need community input on where to go next.
> Basically, I would like to know what you want to be able to do with
> the camera on the XO, whether it's related to gaming, input,
> accessibility, education, or anything else.
> What it can currently do, in overly simplistic terms:
> 1. Capture images: This is the basis for everything else, but it can
> be useful on its own too. For example, letting someone take a picture
> of themselves, select their face in the picture, and then
> crop that and use it as a character in a game.
Yes! This would be something *really cool* to be able to build with
during the Physics/Game jam that will be held August 29-31 (full
announcement coming soon!).
I know you likely won't be in Boston for that time, but I hope you and
others will participate in this devfest via IRC/Gobby/Audio/Video!
This would also be great to use with Box2D (used in Physics activity
and available as the pygame/Elements API), which is showing some
promise for simple (GUI-tool-based) physics-based game creation.
I've also updated the OLPC physics page to reflect this email and
portions of the #olpc-physics conversation on this:
> 2. Get the average color: This is useful for picking values to
> threshold, but you can also switch to YUV or HSV colorspace and find
> the average brightness of the area.
> 3. Threshold images: Thresholding is pretty flexible. You can have
> it select everything within a threshold of a color or everything
> outside the threshold of a color. You can also threshold between two
> images. This lets you get a "green screen" effect, so you can have a
> person being displayed realtime over a virtual background.
> 4. Track an object: After thresholding, you can turn the remaining
> object into a bitmask and get various properties about it like its
> bounding box, centroid, size in pixels, and angle with respect to the
> x-axis. You can also test collisions between the mask and masks of
> virtual objects.
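
A minimal sketch of items 2 to 4 above, written in plain Python rather than with the pygame functions Nirav describes. It assumes an image is a list of rows of (r, g, b) tuples; the function names are made up for illustration and are not the pygame-nrp API.

```python
# Plain-Python sketch: an "image" is a list of rows of (r, g, b) tuples.
# Every name here is illustrative, not part of pygame or pygame-nrp.

def average_color(image, box):
    """Mean (r, g, b) of the pixels inside box = (x, y, width, height)."""
    x0, y0, w, h = box
    pixels = [image[y][x] for y in range(y0, y0 + h) for x in range(x0, x0 + w)]
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))

def threshold(image, color, tolerance):
    """Binary mask: 1 where every channel is within tolerance of color."""
    return [[1 if all(abs(p[i] - color[i]) <= tolerance for i in range(3)) else 0
             for p in row]
            for row in image]

def set_pixels(mask):
    """Coordinates (x, y) of every set pixel in the mask."""
    return [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]

def bounding_box(mask):
    """(x, y, width, height) of the set pixels, or None for an empty mask."""
    pts = set_pixels(mask)
    if not pts:
        return None
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)

def centroid(mask):
    """Mean position of the set pixels, or None for an empty mask."""
    pts = set_pixels(mask)
    if not pts:
        return None
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)
```

Passing a box to average_color and feeding the result to threshold is the "pick a color, then select it" step; the size in pixels is just len(set_pixels(mask)).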
> They may seem pretty basic and simple, but you can do a lot of things
> by combining them. For example:
> 1. Drawing with a real life object: Have the user pick up an object
> and hold it up so it fills a box being displayed on screen and hit a
> button. Save the average color within the box. Threshold out just
> that color, turn it into a bitmask, get the largest connected
> component, and find the centroid of it. Use that centroid as the
> coordinates for the on screen paint brush, perhaps also using the
> saved average color. The user now has the illusion of using the
> object in hand as a paint brush.
Playing with your "Paint!" vision processing addition is one of the
most fun things I've done on the XO.
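
For anyone curious how the brush position falls out of the mask, here is a rough plain-Python sketch of the "largest connected component, then centroid" step. It assumes the mask is a list of rows of 0/1 values and uses 4-connectivity; both are my assumptions, and the function names are hypothetical, not what the Paint! code actually calls.

```python
from collections import deque

def largest_component(mask):
    """Set of (x, y) pixels in the largest 4-connected blob of the mask."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = set()
    best = set()
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or (x, y) in seen:
                continue
            # Breadth-first flood fill from this unvisited set pixel.
            blob = set()
            queue = deque([(x, y)])
            seen.add((x, y))
            while queue:
                cx, cy = queue.popleft()
                blob.add((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] and (nx, ny) not in seen):
                        seen.add((nx, ny))
                        queue.append((nx, ny))
            if len(blob) > len(best):
                best = blob
    return best

def brush_position(mask):
    """Centroid of the largest blob, or None if nothing passed the threshold."""
    blob = largest_component(mask)
    if not blob:
        return None
    n = len(blob)
    return (sum(x for x, _ in blob) / n, sum(y for _, y in blob) / n)
```

Keeping only the largest blob means stray pixels of a similar color elsewhere in the frame do not drag the brush around.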
> 2. Play pong with your hand: Have the user step out of the field of
> view of the camera, save the image. Now threshold between the saved
> image and the images currently being captured. This results in just
> showing the differences between the two (like the user, who has now
> stepped back into the field of view). Turn the image into a bitmask
> and check it for collisions with the bitmask of the ball in pong.
> Actually, now that I'm thinking about it, I'll probably write this
> game when I get home.
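
The pong idea above comes down to two steps, sketched here in plain Python under my own assumptions: images are lists of rows of (r, g, b) tuples, masks are lists of rows of 0/1 values, and the function names are invented for illustration rather than taken from pygame.

```python
def difference_mask(background, frame, tolerance):
    """1 where any channel differs from the saved background by more than tolerance."""
    return [[1 if any(abs(f[i] - b[i]) > tolerance for i in range(3)) else 0
             for b, f in zip(bg_row, fr_row)]
            for bg_row, fr_row in zip(background, frame)]

def masks_collide(mask, other, offset):
    """True if `other`, with its top-left corner at offset (ox, oy) in the
    coordinates of `mask`, has a set pixel on top of a set pixel of `mask`."""
    ox, oy = offset
    for y, row in enumerate(other):
        for x, value in enumerate(row):
            if not value:
                continue
            mx, my = x + ox, y + oy
            # Pixels of `other` that fall outside `mask` cannot collide.
            if 0 <= my < len(mask) and 0 <= mx < len(mask[my]) and mask[my][mx]:
                return True
    return False
```

Each frame, the game would rebuild the difference mask from the camera image and test it against the ball's mask at the ball's current position.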
> Other examples: http://eclecti.cc/olpc
> What it doesn't do yet, but could depending on if there is interest:
> 1. Generic motion detection: You can do stuff like thresholding
> between a background and current images, or tracking whether blobs of
> colors have moved, but neither is a great way of detecting total
> motion in an image. There are many optical flow algorithms, but
> finding one that'll run realtime on an XO will be tricky. The other
> issue is how to present motion to the developer. Perhaps have the
> function request two images and a list of points on the first image,
> and return a list of where it guesses the points are now on the second
> image.
> 2. Object recognition: You could guess based on the size or color of
> an object, but there isn't really a way of detecting if what you're
> holding up is a lime or a pear. There are a lot of really
> computationally heavy ways of doing object recognition, but there are
> also some lightweight ways. The one I was considering was using image
> moments (which I am currently doing to find the centroid and angle of
> an object), to get basic parameters like the eccentricity and skewness
> of an object. There is also the Hu set of invariant moments that will
> give more information about an object, though not really in a
> human-friendly form. Thus, while I could fairly easily write
> functions that would drop a dozen of these numbers, I'm not sure
> anyone would be able to make use of them in an Activity.
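
To make the image-moments idea a bit more concrete, here is a small plain-Python sketch that derives an orientation angle and an eccentricity from the second-order central moments of a mask. The mask format (rows of 0/1 values), the function name, and the returned dictionary are assumptions for illustration; this is the textbook formulation, not necessarily how the pygame code computes it.

```python
import math

def shape_descriptors(mask):
    """Centroid, orientation and eccentricity of the set pixels in a mask."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    n = len(pts)
    if n == 0:
        return None
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    # Second-order central moments, normalised by the pixel count.
    mu20 = sum((x - cx) ** 2 for x, _ in pts) / n
    mu02 = sum((y - cy) ** 2 for _, y in pts) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in pts) / n
    # Angle of the major axis relative to the x-axis. Image y grows downward,
    # so a positive angle here tilts "down and to the right".
    angle = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    # Variances along the major and minor axes of the pixel distribution.
    spread = math.sqrt(4 * mu11 ** 2 + (mu20 - mu02) ** 2)
    major = (mu20 + mu02 + spread) / 2
    minor = (mu20 + mu02 - spread) / 2
    eccentricity = math.sqrt(max(0.0, 1 - minor / major)) if major > 0 else 0.0
    return {
        "centroid": (cx, cy),
        "angle_degrees": math.degrees(angle),
        "eccentricity": eccentricity,
    }
```

An eccentricity near 0 means a roughly round or square blob and a value near 1 means a long thin one, which is arguably a friendlier number for an Activity author than the raw moments.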
We had an interesting conversation about using vision processing to
decompose polygons for the physics engine (or, more generally, simple
pygame/SDL vector-based drawing). Let's figure out a good method for
this! The current thing-to-be-done is to write a function that takes a
pygame bitmask and turns it into a list of points for polygon
http://wiki.laptop.org/go/Physics_engines/Speed_tests shows that
pygame's drawing is what is slowing down these things from running
quickly on the XO. Does anyone know why pygame/SDL drawing is as slow
as it is on the XO? Is it possible to make it faster, or to optimize
activity code to work around it?
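
As a starting point for the bitmask-to-points function mentioned above, here is a rough plain-Python sketch that returns the convex hull of a mask's set pixels (Andrew's monotone chain). This is a deliberately simpler stand-in for real outline tracing or polygon decomposition: it throws away any concavities. It does at least yield a convex point list, which is what Box2D polygon shapes require. The mask format (rows of 0/1 values) and the function name are assumptions.

```python
def mask_to_hull(mask):
    """Convex hull of the set pixels as a list of (x, y) points.

    With image coordinates (y pointing down) the points come out in
    clockwise order as seen on screen.
    """
    points = sorted((x, y) for y, row in enumerate(mask)
                    for x, v in enumerate(row) if v)
    if len(points) <= 2:
        return points

    def cross(o, a, b):
        # Z component of (a - o) x (b - o); its sign gives the turn direction.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in points:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(points):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]
```

A concave shape such as an "L" would still need to be split into several convex pieces, which is the harder part of the problem.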
> So far, I've just been writing functions based on usage cases I could
> think of in games. There are probably many that I just haven't
> thought of for which the current functions are inadequate. So, any
> ideas would be appreciated.
This is a ripe topic for new ideas and Activities. Yes! Please share
ideas, and be sure to update a wiki page with them, so hackers can
begin accessing these ideas before and during the August jam!
A wiki page for Vision Processing also seems in order, to open up some
of this hacking and collect vision-specific ideas.
Nirav and I both hang out a bunch on #olpc-physics on
irc.freenode.net... come by and chat/help!