Tuesday, November 18, 2008

Java Image Comparison - Motion Detection

So this is what I've managed to do with some simple image comparison thus far. I've repackaged it all to try to make some sense of it. The sample apps reside in com.b22222.app.webcam; the most informative app is probably WebCamState.java.

I've simply zipped my source folder. I use Eclipse at present and have left my project settings file in the archive. The sample apps require the Java Media Framework (JMF) to be installed on the system.

What are my goals?
I want to design some kind of (physical) contact-free input system. This system should be at least as functional as the standard keyboard and mouse. That's the main goal, but a bunch of other possibilities have come alive since I've been tinkering around. I can now set my camera facing the front door and get email alerts at work (with pictures of the culprit) when motion is detected at home. I've also got plans in the pipe to design a kinetic sculpture that reacts to motion in front of it. The list continues.

Where am I now?
I can continually read from a live stream of webcam video and dismantle the picture into more pliable data. With this data I can pick up motion hotspots, do primitive edge detection (particularly bad on blunt edges), and apply some noise reduction. All of this is demoed in the attached library of code. Please keep in mind that I develop in my spare time, by myself, and have never studied image or video processing in any depth. I mostly design only as much as I need to progress to my next goal.

Most general application settings end up in my settings.ini file. This might be a good place to start tinkering with values if you want to poke and prod my library. I had tried to use a neural network to interpret webcam input but it never really worked. I have left my code there in case I revisit that idea. The neural net library I used is called Joone and is freely downloadable.

Details of how I dismantle an image:
Although I have copied the code in several places throughout, the best piece of code to reference for this process is probably com.b22222.routine.ImageHelper.
A BufferedImage (raw image pixel data) is drawn from the webcam source. This image is converted into what I call a State object. A State is simply a 2D array of integers. At the moment this array represents the brightness of the pixels in the image. (I plan to somehow incorporate hue difference into this as well in the future.) From here you should partly forget that you are working with images and think of arrays of numbers instead. I did this so that from here on the code could be used for any map of numbers, say cloud patterns or temperature maps. (Not that I ever intend to go down this road myself.)
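
A minimal sketch of the image-to-State step, not the actual ImageHelper code. The class name BrightnessState and the choice of brightness as the plain average of the red, green and blue channels are assumptions made for illustration; the library may weight the channels differently.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper: turns an image into a 2D array of brightness values (0-255). */
final class BrightnessState {

    static int[][] fromImage(BufferedImage image) {
        int width = image.getWidth();
        int height = image.getHeight();
        int[][] state = new int[height][width]; // indexed as state[y][x]
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int rgb = image.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Assumption: brightness is the unweighted channel average.
                state[y][x] = (r + g + b) / 3;
            }
        }
        return state;
    }

    public static void main(String[] args) {
        BufferedImage image = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        image.setRGB(0, 0, 0xFFFFFF); // white pixel
        image.setRGB(1, 0, 0x000000); // black pixel
        int[][] state = fromImage(image);
        System.out.println(state[0][0] + " " + state[0][1]); // prints "255 0"
    }
}
```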

A Comparison is an object drawn from the difference of two States. If I remember correctly, I just subtracted one array from the other. We now have a new array of numbers representing the difference between two images. This helps in two ways: a) I'm obviously searching for motion, and b) if 70% of an image rarely changes then we want to consciously ignore it.
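
A rough sketch of that subtraction, under stated assumptions: the class name StateComparison is made up, and taking the absolute value of each difference (so that brightening and darkening both count as change) is a guess rather than something the library is known to do.

```java
import java.util.Arrays;

/** Hypothetical helper: per-cell difference between two States of equal size. */
final class StateComparison {

    static int[][] difference(int[][] base, int[][] current) {
        if (base.length != current.length) {
            throw new IllegalArgumentException("States must have the same height");
        }
        int[][] diff = new int[base.length][];
        for (int y = 0; y < base.length; y++) {
            if (base[y].length != current[y].length) {
                throw new IllegalArgumentException("States must have the same width");
            }
            diff[y] = new int[base[y].length];
            for (int x = 0; x < base[y].length; x++) {
                // Assumption: magnitude of change matters, not its direction.
                diff[y][x] = Math.abs(current[y][x] - base[y][x]);
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        int[][] base = {{10, 10}, {10, 10}};
        int[][] current = {{10, 50}, {0, 10}};
        System.out.println(Arrays.deepToString(difference(base, current)));
        // prints "[[0, 40], [10, 0]]"
    }
}
```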

Next a primitive EdgeDetector object can process the change map to emphasize the edges of islands and lines. This is not an essential stage and could quite likely be taken out if it is silencing too much useful data.
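
The post does not say how the EdgeDetector works, so the following is only one simple way such a stage could look, not the library's method. It scores each cell by how much it differs from its right and lower neighbours, so flat regions go to zero and boundaries stand out. The class name SimpleEdges is hypothetical, and the map is assumed to be rectangular.

```java
import java.util.Arrays;

/** Hypothetical edge emphasis: sum of absolute differences to the right and lower neighbours. */
final class SimpleEdges {

    static int[][] emphasize(int[][] map) {
        int height = map.length;
        int width = height == 0 ? 0 : map[0].length;
        int[][] edges = new int[height][width];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int here = map[y][x];
                // At the right and bottom borders, compare the cell with itself (no edge).
                int right = x + 1 < width ? map[y][x + 1] : here;
                int below = y + 1 < height ? map[y + 1][x] : here;
                edges[y][x] = Math.abs(right - here) + Math.abs(below - here);
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        int[][] map = {{0, 0, 10}, {0, 0, 10}};
        System.out.println(Arrays.deepToString(emphasize(map)));
        // prints "[[0, 10, 0], [0, 10, 0]]"
    }
}
```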

After that I find two statistics of the data in the array: the average and the *standard deviation. Each is multiplied by a factor specified in the settings file, and the two products are added together to give a threshold. Any value in the array that is not greater than this threshold is set to zero. This helps clean up small noise generated through subtle light differences, etc.
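
A sketch of this noise filter under stated assumptions. The class name NoiseFilter and the parameter names meanFactor and stdDevFactor are hypothetical stand-ins for whatever the settings file actually calls them. The standard deviation here is the square root of the variance, computed over all cells (the population form), which may not be exactly what the library does.

```java
import java.util.Arrays;

/** Hypothetical helper: zeroes every cell not above meanFactor*mean + stdDevFactor*stdDev. */
final class NoiseFilter {

    static int[][] suppress(int[][] map, double meanFactor, double stdDevFactor) {
        long count = 0;
        double sum = 0;
        for (int[] row : map) {
            for (int value : row) {
                sum += value;
                count++;
            }
        }
        double mean = count == 0 ? 0 : sum / count;

        double squaredError = 0;
        for (int[] row : map) {
            for (int value : row) {
                squaredError += (value - mean) * (value - mean);
            }
        }
        // Variance is the mean squared error; standard deviation is its square root.
        double stdDev = count == 0 ? 0 : Math.sqrt(squaredError / count);
        double threshold = meanFactor * mean + stdDevFactor * stdDev;

        int[][] cleaned = new int[map.length][];
        for (int y = 0; y < map.length; y++) {
            cleaned[y] = new int[map[y].length];
            for (int x = 0; x < map[y].length; x++) {
                cleaned[y][x] = map[y][x] > threshold ? map[y][x] : 0;
            }
        }
        return cleaned;
    }

    public static void main(String[] args) {
        int[][] map = {{10, 12}, {11, 13}};
        System.out.println(Arrays.deepToString(suppress(map, 1.0, 1.0)));
        // prints "[[0, 0], [0, 13]]"
    }
}
```

Raising either factor makes the filter stricter; with both set to zero, every positive value survives.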

The array that is now left contains data that is somewhat useful to me. And hopefully to you :-)

*Ps: I always mixed up standard deviation and variance. It's one of the two. I think.

This can all be seen in action in the com.b22222.app.webcam.WebCamState class.
Instructions for use: Run the app. Wait for the video feed to register in the left window. (Mine usually takes a few seconds). Then click the left button titled 'Capture Base Image'. This will set the image to compare against for motion detection. Now click the right button titled 'Start Compare...'.

Here is a demo from when I tuned my settings to pick up my black pen against a white wall. (The red circles are rendered onto the image in areas of interest. The double green circle is the center of gravity of the points of interest.)
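
For illustration, a center of gravity like the one marked by the green circle could be computed from the filtered array roughly as follows. This is a sketch, not the library's code: the class name Centroid is made up, and weighting each point by its value (rather than counting every non-zero point equally) is an assumption.

```java
/** Hypothetical helper: value-weighted center of gravity of the non-zero cells. */
final class Centroid {

    /** Returns {x, y}, or null when the map has no positive cells. */
    static double[] of(int[][] map) {
        double total = 0;
        double sumX = 0;
        double sumY = 0;
        for (int y = 0; y < map.length; y++) {
            for (int x = 0; x < map[y].length; x++) {
                int value = map[y][x];
                if (value > 0) {
                    total += value;
                    sumX += (double) value * x;
                    sumY += (double) value * y;
                }
            }
        }
        if (total == 0) {
            return null; // nothing of interest in this frame
        }
        return new double[] {sumX / total, sumY / total};
    }

    public static void main(String[] args) {
        int[][] map = {{0, 0, 0}, {0, 0, 0}, {10, 0, 10}};
        double[] center = of(map);
        System.out.println(center[0] + " " + center[1]); // prints "1.0 2.0"
    }
}
```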

What do I plan on doing next?
Creating a more 'opaque' input interface. This interface would expose some events and hide all of the workings of the libraries described above. It will most likely provide some kind of coordinate information. I would also like to provide, but have no idea how, a polygon best representing the image input.

Long term?
I'm thinking of a menu/list based input system. A user would navigate from menu to menu choosing options which would either automate keyboard input or mouse input. This is in part a resignation because I don't think getting pixel perfect mouse positioning will be viable with webcam input; but storing keyboard/mouse inputs in templates, sequences and menus could reduce work for repetitive actions normally done on these devices.

In case you missed it up top, this is the source code.