Global Game Jam 2012: Friday!

I’ve been spending a lot of time lately rebuilding the old Triangle Game Jam website. Now that it’s done I can get back to new articles. The new site is up at http://www.trianglegamejam.com.  It’s where we’ll be archiving all Raleigh/Durham/Chapel Hill Global and Triangle game jams.  All the past years’ works that I could get my hands on are up there and worth a look.  It’s amazing what people with a goal can create in a weekend.

In other news – the Global Game Jam starts this Friday (January 27th).  You should sign up and find a local jam site you can attend.  It’s totally worth a weekend of your time.

Creating Kinect Controls for Angry Birds

Before I begin let me give you a little background.  About a month ago we decided to try to create NUI (Natural User Interface) or Kinect controls for Angry Birds, partly as a learning experience but also as a proof of concept that games like Angry Birds (i.e. applications that lend themselves to touchscreen devices) can work in a gestural / depth camera environment better than has been demonstrated so far.  In the end we tried many different input methods that are possible with a Kinect, and I wanted to catalog that experience.

I’ll start with the worst input paradigm we tried and go from there.

#5 The Faux Mouse

The Idea
First we just tried mapping the hands to the screen as if they were each a mouse or finger.  Each would control a hand-like cursor and move it around the screen.  Clicks could be performed either by grasping with the hand or by pushing towards the screen.  The game would be played normally – except you’d click on everything using your hand.

The Reality
This input method was by far the worst.  Your hands are simply not mice; they get tired much faster and are not as dexterous.  Even with heavy filtering and snap-able buttons, clicking on a button or object on the screen using your hands as mice is just too nuanced and motor skill intensive.  Playing a level this way took us an order of magnitude longer, or more, than someone playing it on the iPhone.

When pushing towards the screen a lot of care had to be taken to deal with user drift.  When a user pushes towards the screen, they may in fact be pushing towards many possible locations.  They may drift towards the TV, or the camera if that’s their focus.  They may also consciously attempt to keep their hand straight as they push forward.  No matter the case, a user will drift off their intended target.

When we tried grasp detection using computer vision – again we saw drift.  When a user opens and closes their hand the volume of the hand is changed from the point of view of the camera.  The result ends up displacing the center of mass of the hand causing it to shift downward when the hand is closed.  There are solutions to this problem such as eliminating the fingers from the hand volume calculation but this is a difficult problem given the quality of the data.

In the end the drift issues prevented us from playing as well as we could with a real mouse or on the iPhone.  Angry Birds is a game of quick victory and defeat as well as a game of precision, so with large problems in both precision and game play speed we ditched this idea.

#4 Voice

The Idea
This one is exactly what you might expect – firing the bird using your voice.  Either by saying “Fire!” or maybe… “Ca Cah!”.

The Reality
We really were not sure how this method would perform.  The voice portion was layered onto the mouse control system.  Essentially you would click on the slingshot – move your hand to aim and exclaim viciously to let loose the birds of war.  The hope was that it would alleviate the drift problems, which it did.  Without having to push your hand forward to fire the drift was eliminated from firing.

However there were other problems: there was delay in the speech recognition and often outright failure if you didn’t say the words just right.  We tried several tricks, like using multiple words to identify “Fire”.  One good way to generate that list is to take the top 10 words the recognizer mistook “fire” for and add those to the dictionary as triggers for firing the bird.
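Here’s a minimal sketch of that trigger-list trick in Python.  The sample log and the function names are made up for illustration; a real speech recognizer would supply its own results and its own way of registering words.

```python
from collections import Counter


def build_trigger_words(target, misrecognitions, top_n=10):
    """Return the target word plus the top_n words it was most often mistaken for."""
    target = target.lower()
    counts = Counter(w.lower() for w in misrecognitions if w.lower() != target)
    return {target} | {word for word, _ in counts.most_common(top_n)}


def is_fire_command(recognized, triggers):
    """True when the recognized word should fire the bird."""
    return recognized.lower() in triggers


# Hypothetical log of what the recognizer heard while testers said "fire".
heard = ["fire", "fired", "tire", "fire", "higher", "tire", "wire", "fire"]
triggers = build_trigger_words("fire", heard, top_n=2)
```

The obvious trade-off is that every extra trigger word also raises the odds of firing by accident, so the list is worth keeping short.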

Moving Away From Faux Mouse

After the failures with both #4 and #5 we went back to the drawing board.  We needed to get away from mouse or touch device centered thinking.  The worlds are simply too different to treat them the same.  So we prototyped 3 other solutions that all ended up being better than the mouse style interface.

The ideas all stem from an understanding that when you port an application from a touch enabled world you need to think about 2 things primarily:

  • Context
  • Automation

Context – How can you reduce and scope the options so that a broad array of them is available to the user, but only a few are usable at any given time, each with a small vocabulary of motions?

An example from Angry Birds is all the options the user can perform in the game:

  • Fire Bird
  • Activate Bird Special Attack
  • Restart
  • Pan
  • Zoom
  • Return to the Menu

We needed to find a way to contextualize these options when moving away from a mouse driven style interface.

Automation – Find the items in the application that everyone does without thinking about it and automate them.  If they aren’t relevant to game play find a way to make them irrelevant in a NUI application.

An example from Angry Birds is activating the slingshot.  You probably don’t think about it when playing the game but to fire a bird you must first place your finger on the slingshot before drawing back the bird to fire it.  While this is unbelievably trivial to the point of not thinking about it on an iPhone, it’s a huge pain in an environment where you have to get a virtual hand cursor over it, even more so if you then need to push forward to activate it.

So we needed to find a way to automate clicking the slingshot.  Instead of clicking the slingshot explicitly, it would be activated implicitly by performing some gesture that begins the act of firing a bird, independent of where the hand is on screen relative to the slingshot.

#3 Arclight

The Idea
You would draw back the slingshot by bringing your hands together.  Then bring your hands apart and rotate them around your center of mass to change the firing angle on the slingshot.  Then once you’ve settled on the firing angle, bring your hands together fast to trigger the fire.

The Reality
The problem with this kind of activation of the slingshot was the drift when the hands come together.  This can be partially accounted for but it’s heuristically based and can be erroneous.  Additionally activating the bird’s special power was difficult.  You would have to choose a different kind of interaction to activate the special power which would complicate the process.

#2 Stretch and Snap

The Idea
This idea grew out of the Arclight firing system.  To attempt to solve the problem of drift, have the slingshot fire as soon as the arms reach a certain distance apart.

The Reality
With this firing system you still have the problem of determining how to activate the bird’s special power.  You also introduce a new problem – all birds are fired at maximum drawback.  You also need to give the user feedback so they know how close they are to the *snap*, such as some kind of firing progress bar.
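A rough Python sketch of the snap trigger and the progress value that would feed such a bar.  The two distances are assumed values in metres, not the ones we shipped, and hand positions are assumed to be (x, y, z) tuples from whatever skeleton SDK you use.

```python
import math

REST_DISTANCE = 0.2  # assumed: hands this close count as "together"
SNAP_DISTANCE = 0.9  # assumed: hands this far apart trigger the shot


def snap_progress(left_hand, right_hand):
    """Return 0.0 with the hands together and 1.0 at the snap distance."""
    distance = math.dist(left_hand, right_hand)
    t = (distance - REST_DISTANCE) / (SNAP_DISTANCE - REST_DISTANCE)
    return max(0.0, min(1.0, t))


def should_fire(left_hand, right_hand):
    """Fire as soon as the hands are stretched to the snap distance."""
    return snap_progress(left_hand, right_hand) >= 1.0
```

Drawing `snap_progress` as a bar every frame tells the player how much stretch is left before the bird goes flying.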

#1 Axis Separated

The Idea
For this idea I separated the functions of the hands into distinct responsibilities.  Your left hand activates the slingshot by pushing forward (it doesn’t matter where).  Once a threshold is crossed the slingshot is activated; from then on the firing angle is calculated from the left hand’s location relative to the shoulder.  To fire the bird, the right hand is pushed forward and pulled back, which sends the bird flying.  To activate the bird’s special power you push the right hand forward again.
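To make the split concrete, here’s a small Python sketch of the two pieces.  It assumes camera-space joints as (x, y, z) tuples with x to the right, y up and z the distance from the sensor, and the push threshold is a guess rather than the value we tuned.

```python
import math

PUSH_THRESHOLD = 0.35  # assumed: metres the hand must be in front of the shoulder


def is_pushed_forward(shoulder, hand, threshold=PUSH_THRESHOLD):
    """A hand counts as pushed when it is closer to the sensor than the shoulder."""
    return (shoulder[2] - hand[2]) >= threshold


def firing_angle_degrees(shoulder, hand):
    """Angle of the left hand around the left shoulder, in the x/y plane."""
    dx = hand[0] - shoulder[0]
    dy = hand[1] - shoulder[1]
    return math.degrees(math.atan2(dy, dx))


class FireDetector:
    """Reports a shot when the right hand is pushed forward and then pulled back."""

    def __init__(self):
        self.armed = False

    def update(self, shoulder, hand):
        pushed = is_pushed_forward(shoulder, hand)
        if pushed and not self.armed:
            self.armed = True   # push seen, wait for the pull back
            return False
        if self.armed and not pushed:
            self.armed = False  # pulled back: fire
            return True
        return False
```

Because neither test looks at where the hand is on screen, the on-screen drift from the earlier approaches never enters the picture.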

The Reality
This method ended up working perfectly.  It doesn’t result in any drift when firing is activated. It is also easy to perform because all the motions can be performed with your arms down by your sides, reducing exhaustion in long game play sessions.

Lessons Learned

You hear it all the time but it is critically important to prototype ideas when it comes to creating Kinect controls.  They simply don’t work as well in reality as they do in your head. Here’s a demonstration video of the end result,

[Video: demonstration of the end result]

Kinect, Anthropometry and You

Anthropometry (Greek anthropos (άνθρωπος – "man") and metron (μέτρον – "measure") therefore "measurement of man") refers to the measurement of the human individual.

Ask any Kinect developer what the hardest problem is in developing a game or application that uses the Kinect – or any other depth camera.  The answer you’ll get most often will be creating something that works well for 95% of target users.

This is something you have to consider for all your gestures, poses, and UI interaction.

  • Is this gesture too difficult for your average user?
  • Does this pose require too much flexibility?
  • Is this UI interaction comfortable and easy to perform?

One area that can benefit from anthropometry data is UI interaction.  When you think about UI interaction with Kinect you’ve got to picture a real world space (box, sphere, cylinder, other) located somewhere around the body, with the hand position in that space mapped to the 2D or 3D UI coordinate plane.

When determining the size and location of this real world space and how it maps user hand locations onto the UI coordinate system the largest question you need to consider is:

Where will they be most comfortable?

Generally speaking you want a space that minimizes upper arm movement – as that is much more strenuous compared to forearm movements.

However, since the 3D position we’re mapping into our 2D/3D UI coordinate system varies based upon user skeleton size, we can’t choose a single set of real world dimensions that will work for everyone.  We’ll have to make educated guesses about the size and location of our real world UI coordinate frame based on the size and location of the user’s bones.
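As a sketch of the idea in Python: a box anchored at the shoulder whose size scales with the user’s arm length, mapped onto screen pixels.  The box proportions (0.8 and 0.6 of arm length) and the choice of the shoulder as the anchor are assumptions for illustration, not measured values.

```python
def clamp01(t):
    return max(0.0, min(1.0, t))


def hand_to_ui(hand, shoulder, arm_length, screen_w, screen_h,
               width_scale=0.8, height_scale=0.6):
    """Map a hand position inside a shoulder-centred box to pixel coordinates.

    hand and shoulder are (x, y, z) tuples in metres with y pointing up;
    the returned pixel y grows downward, as on most screens.
    """
    box_w = width_scale * arm_length
    box_h = height_scale * arm_length
    u = clamp01((hand[0] - shoulder[0]) / box_w + 0.5)
    v = clamp01((hand[1] - shoulder[1]) / box_h + 0.5)
    return (u * screen_w, (1.0 - v) * screen_h)
```

Since the box is expressed in arm lengths, a child and a tall adult making the same relative motion land on the same pixel.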

So how does anthropometry fit in?

Because the skeleton you get from Kinect and other SDKs can be unreliable in certain poses, you often find yourself heavily filtering any kind of data you’re tracking about the user, especially things like the user’s arm length, which can vary dramatically over a session.

So one thing I prefer to do is use anthropometry tables to get a more consistent size and location that doesn’t fluctuate as much as the user’s skeleton.  Using anthropometry tables we can estimate the user’s arm length or hand size based on other bones in their body, bones that are more stable in your skeleton SDK of choice (Kinect, OpenNI, Iisu, Omek, etc.).

You can also use anthropometry tables to estimate the size of body parts that the skeleton SDK you’re using doesn’t provide – such as the size of the user’s hand.
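In code that can be as simple as the Python sketch below: smooth a bone that tracks reliably, then scale it by a ratio from the tables.  The ratio here is a placeholder I made up, not a figure from any table, and treating the torso as the stable bone is likewise an assumption; look up real values for your target population.

```python
# Placeholder only: NOT a value from an anthropometry table.
ARM_TO_TORSO_RATIO = 1.2


class StableArmLength:
    """Estimate arm length from a smoothed, more reliably tracked bone."""

    def __init__(self, ratio=ARM_TO_TORSO_RATIO, smoothing=0.05):
        self.ratio = ratio
        self.smoothing = smoothing  # 0..1, smaller means heavier filtering
        self._torso = None

    def update(self, torso_length):
        """Feed the latest measured torso length, get the arm length estimate."""
        if self._torso is None:
            self._torso = torso_length
        else:
            # Exponential moving average of the stable bone.
            self._torso += self.smoothing * (torso_length - self._torso)
        return self.ratio * self._torso
```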

But where do you find that kind of anthropometry data?

Luckily such a resource has already been painstakingly cataloged for us by the FAA – The Human Factors Design Guide.  The HFDG was put together so that planes could be constructed so that almost anyone would fit and be able to operate anything from their seat.

The anthropometry data that’s valuable to us starts in chapter 14, page 791.  For example, these lovely tables from page 818 show the functional reach and the extended functional reach of men and women broken down by population percentiles.

hfdg_reach

Making Your Files Merge Friendly

Merging is a fact of life for most of us.  Eventually two users touch the same files and a conflict must be resolved.  For programmers merging is a daily activity, but what about the content creators?

If two artists touch the same level, have your tools programmers made enough of an effort to make merging possible and if possible as painless as can be?

Here’s a handy checklist,

1 – Use XML/JSON (or some other text based file format)

It’s slower, it’s bigger, but it’s going to make merging possible.  If your level files are binary blobs, merging without a custom tool just isn’t going to be possible.  XML and JSON are the simplest text based alternatives because there are already many libraries for reading and writing them.

2 – For XML – Give Each Attribute a Line

If you’re using XML, you should make sure to write out the files such that every element and every attribute receives its own line.  If you don’t do this it will make conflicts a lot more likely if two users touch the same object.

If you’re using JSON, having each attribute on its own line is the norm unless the library is attempting to keep the text compact.

Here’s a quick example, note that after separating the attributes the merge tool handles the conflict correctly while it fails to do so when they are on the same line.

nonewline
Figure 1 – Attributes all on the same line

newline
Figure 2 – Attributes on different lines

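If you’re using .NET this can be achieved very easily by creating the writer with XmlWriterSettings that have the NewLineOnAttributes property set to true.  Note that NewLineOnAttributes has no effect unless Indent is also set to true.

C#
// Requires: using System.Xml;
var settings = new XmlWriterSettings()
{
    Indent = true,              // required for NewLineOnAttributes to take effect
    NewLineOnAttributes = true  // write each attribute on its own line
};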
 
using (XmlWriter writer = XmlWriter.Create(filestream, settings))
{
    // Write out document...
}

3 – Maintain Order

Always write out the data in the same order.  This one is pretty easy; the only real pitfall is making sure your data structure and undo/redo system are working correctly.  For example, if you happen to store your objects in a list and someone removes an object at the front, but when the removal is undone the object is inserted at the end of the list (even if the UI doesn’t reflect this), that could affect your output order.
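If your tools happen to write JSON, a stable order is nearly free.  Here’s a small Python sketch, with made-up level data, that sorts keys and puts each one on its own line so that two saves of the same data always produce identical text.

```python
import json


def serialize_level(level):
    """Serialize with sorted keys and one entry per line for stable, mergeable output."""
    # Note: lists keep their own order, so the data structure still has to
    # preserve it correctly (see the undo/redo pitfall above).
    return json.dumps(level, indent=2, sort_keys=True) + "\n"


# The same object built in two different key orders.
a = {"name": "crate", "position": [1, 2, 3]}
b = {"position": [1, 2, 3], "name": "crate"}
same_text = serialize_level(a) == serialize_level(b)
```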

4 – Don’t Add New Objects At The End

When you go to save the entities don’t write them out in the order they were created.  That practically guarantees a merge conflict, because if two users edit the file and both add an entity, both additions show up at the same location and confuse the merge tool.

One thing you could do is to sort them based on a GUID that is stored as part of the object’s data.  Sorting based on GUID lowers the probability of collisions occurring when two users both add objects.

Alternatively, sorting on a string containing the machine name of the object’s original creator is another idea.  It would ensure every user touching the file inserts into their own section of the file instead of everyone inserting at the tail.
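The GUID variant might look like this in Python.  The entity layout (a dict with `id` and `name`) is invented for the example; the only point is that the save order comes from the stored id rather than from creation order.

```python
import uuid


def new_entity(name):
    """Create an entity with a random GUID stored as part of its data."""
    return {"id": str(uuid.uuid4()), "name": name}


def entities_in_save_order(entities):
    """Order entities by GUID so new ones scatter through the file."""
    return sorted(entities, key=lambda e: e["id"])


# Fixed ids here just to show the ordering; real ones come from new_entity().
scene = [{"id": "c3", "name": "pig"}, {"id": "a1", "name": "crate"}, {"id": "b2", "name": "bird"}]
ordered_names = [e["name"] for e in entities_in_save_order(scene)]
```

Two users adding one object each will then usually touch different parts of the file, which is exactly what a line-based merge tool wants.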

Robust Inside and Outside Solid Voxelization

complete_polygonal_scene_voxelization

While wrapping up my post for generating simplified occluders for Hierarchical Z-Buffer Occlusion Culling, I was pointed to a paper called Complete Polygonal Scene Voxelization.  Afterwards I found time to read it thoroughly and implement it as a replacement for my existing ray casting based solid voxelization method.

The problem with the solid voxelization technique I was using previously was that it relied on ray casting, which only works if the mesh is watertight and free of anomalies like intersecting geometry.

However, that restriction makes it an unrealistic solution in the real world because game art typically has holes in the locations players never see; such as the bottom cap on a building, which is rarely modeled.

The New Solution

The Complete Polygonal Scene Voxelization paper’s solution to voxelizing a scene is pretty clever: it applies a heuristic model to the problem of determining the inside/outside status of each voxel or octree cell.  That allows it to overcome the problem of holes and intersecting geometry, making it suitable as a real world solution.

How It Works

bunny_octree_3

You can download the paper and read it for yourself, but let me go ahead and summarize the algorithm for brevity’s sake so that the rest of the article makes sense.

The algorithm takes place in 3 stages:

  1. Create Octree
  2. Find Seed Cell
  3. Propagate Seed Cell

Create Octree

First you create an octree around the mesh that continues to subdivide each cell until either the cell no longer intersects with any triangle or a maximum depth is reached.  A typical maximum octree depth that will work for most meshes is 5.  If the mesh has some exceptionally thin walls that you want the cells to be small enough to fill you may need to go as high as 7 or 8.
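Here’s a compact Python sketch of that subdivision.  To keep it short it uses a conservative stand-in for the overlap test (the cell against the triangle’s bounding box) rather than an exact AABB/triangle test, so it will subdivide somewhat more than a real implementation; the `Cell` type and function names are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    lo: tuple
    hi: tuple
    depth: int
    intersecting: bool = False
    children: list = field(default_factory=list)


def triangle_bounds(tri):
    lo = tuple(min(v[i] for v in tri) for i in range(3))
    hi = tuple(max(v[i] for v in tri) for i in range(3))
    return lo, hi


def touches(cell, tri):
    """Conservative stand-in for an exact AABB/triangle overlap test."""
    t_lo, t_hi = triangle_bounds(tri)
    return all(cell.lo[i] <= t_hi[i] and t_lo[i] <= cell.hi[i] for i in range(3))


def build_octree(lo, hi, triangles, max_depth=5, depth=0):
    cell = Cell(lo, hi, depth)
    hits = [t for t in triangles if touches(cell, t)]
    cell.intersecting = bool(hits)
    # Stop when the cell is empty or the maximum depth is reached.
    if hits and depth < max_depth:
        mid = tuple((lo[i] + hi[i]) / 2 for i in range(3))
        for octant in range(8):
            c_lo = tuple(mid[i] if (octant >> i) & 1 else lo[i] for i in range(3))
            c_hi = tuple(hi[i] if (octant >> i) & 1 else mid[i] for i in range(3))
            cell.children.append(build_octree(c_lo, c_hi, hits, max_depth, depth + 1))
    return cell


def leaves(cell):
    """Yield every leaf cell of the octree."""
    if not cell.children:
        yield cell
    else:
        for child in cell.children:
            yield from leaves(child)
```

The non-intersecting leaves are the ones that still need an inside/outside status.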

I was having some problems with the GPU AABB/Triangle overlap test I used for voxelization in the Hierarchical Z-Buffer Occlusion Culling article and so I ended up porting the Möller implementation of AABB/Triangle overlap test to C# and just used it instead.

Also if you ever need to lookup how an intersection is performed I highly recommend the gigantic matrix of intersections over at realtimerendering.com.  It was a handy resource since I don’t keep the algorithm for AABB/Triangle overlap stored in my brain.

After you’ve created the octree we need to process each cell that wasn’t intersecting with a triangle to determine if it is inside or outside.

Find Seed Cell

Before we can determine if a cell is inside or outside the mesh we need to find a seed cell.  The seed cell is sort of the ground truth example cell that we use to propagate its status to the other cells that it can see.  The seed cell’s status is determined by rendering a cube map centered inside the cell with the near plane placed at the cell edge.

When rendering each side of the cube map, you render the scene such that all front facing polygons are blue and all back facing polygons are red.  You then read back each cube map surface from the GPU and determine the percentage of red and blue pixels seen at each face.

If at least 4 sides of the cube map contain red pixels, the cell is determined to be inside the mesh.

The paper says that NO red pixels can be seen for a seed cell to be determined to be outside.  However, I found this problematic, since occasionally a red pixel can be seen just through tiny rendering artifacts.

So I feel a better solution is one like the following:

C#
const int MIN_INSIDE_FACES = 4;
const float MIN_INSIDE_PERCENTAGE = 0.03f;
 
int cubemap_sides_seeing_inside = 0;
 
for (int i = 0; i < 6; i++)
{
    RenderCubeMapSide(i);
    float backfacePercentage = CalculateBackfacePercentage(i);
 
    if (backfacePercentage > MIN_INSIDE_PERCENTAGE)
        cubemap_sides_seeing_inside++;
}
 
if (cubemap_sides_seeing_inside >= MIN_INSIDE_FACES || cubemap_sides_seeing_inside == 0)
{
    if (cubemap_sides_seeing_inside >= MIN_INSIDE_FACES)
        cell.Status = CellStatus.Inside;
    else // cubemap_sides_seeing_inside == 0
        cell.Status = CellStatus.Outside;
 
    // Propagate cell status...
}
else
{
    // Unable to solve status exactly.
}

Here a face isn’t counted as inside unless at least 3% of its total red and blue pixels are red.  The percentage is just something I picked out of thin air; it feels like a number small enough to be easily overcome by any truly inside cube face, but high enough to allow me to ignore tiny artifacts.
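For completeness, here’s a Python sketch of the per-face pixel counting that the code above leaves to a helper, plus the same classification rule on top of it.  It assumes each face has been read back as a flat list of (r, g, b) tuples with pure red backfaces, pure blue front faces and a black background; those details are assumptions of the sketch.

```python
def backface_fraction(pixels):
    """Fraction of the red-or-blue pixels on one cube face that are red."""
    red = blue = 0
    for r, g, b in pixels:
        if r > 0 and b == 0:
            red += 1
        elif b > 0 and r == 0:
            blue += 1
    total = red + blue
    return red / total if total else 0.0


def classify_seed(faces, min_inside_faces=4, min_inside_fraction=0.03):
    """Return 'inside', 'outside' or 'unknown' for a candidate seed cell."""
    inside_faces = sum(1 for face in faces
                       if backface_fraction(face) > min_inside_fraction)
    if inside_faces >= min_inside_faces:
        return "inside"
    if inside_faces == 0:
        return "outside"
    return "unknown"
```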

Propagate Seed Cell

The last step is to propagate the seed cell’s status to other cells.  After classifying a seed cell you need to test every unknown cell against the depth map and frustum of each cube map surface.

You’re performing a test to see if any of the 8 corners of the octree cell, when projected into the camera space of each cube face, are closer than the depth value at that pixel.  If one is, then the entire octree cell is likely visible.
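A stripped down Python sketch of that corner test for a single cube face.  It only handles the face looking down +Z from the seed, assumes a 90 degree frustum, and assumes the depth map is a square list of rows holding linear distance along Z; a real implementation would repeat this for all six faces using each face’s own view and projection.

```python
def corner_visible(corner, seed, depth_map):
    """True if the corner lies in the +Z face frustum and in front of the stored depth."""
    size = len(depth_map)
    x, y, z = (corner[i] - seed[i] for i in range(3))
    if z <= 0.0:
        return False  # behind this face
    if abs(x) > z or abs(y) > z:
        return False  # outside the 90 degree frustum
    # Project to [0, 1] texture coordinates, then to a pixel.
    u = x / z * 0.5 + 0.5
    v = y / z * 0.5 + 0.5
    px = min(size - 1, int(u * size))
    py = min(size - 1, int(v * size))
    return z < depth_map[py][px]


def cell_visible(lo, hi, seed, depth_map):
    """A cell is treated as visible if any of its 8 corners passes the test."""
    corners = [(x, y, z)
               for x in (lo[0], hi[0])
               for y in (lo[1], hi[1])
               for z in (lo[2], hi[2])]
    return any(corner_visible(c, seed, depth_map) for c in corners)
```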

If the cell is visible from the seed cell, then in all likelihood the cell has the same status as the seed cell.  However because having holes means 2 seed cells (one inside and one outside) can potentially see the same cell you want several seed cells to confirm the status of a cell before committing to it.

So once you’ve determined a cell is visible from the seed cell you’ll increment a counter on the cell for that status.  Once one of the counters reaches a threshold (16, for example), you change the status of the cell from Unknown to whichever status reached the threshold and stop processing that cell.

It should be noted that only seed cells propagate their status.  Cells that you propagate to do not propagate their own status.
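The voting itself is tiny.  A Python sketch, using the example threshold of 16 from above and status strings of my own choosing:

```python
VOTE_THRESHOLD = 16  # example value from the text


class CellVotes:
    """Tracks how many seed cells have claimed this cell as inside or outside."""

    def __init__(self):
        self.status = "unknown"
        self.votes = {"inside": 0, "outside": 0}

    def vote(self, seed_status, threshold=VOTE_THRESHOLD):
        """Record one vote from a seed cell that can see this cell."""
        if self.status != "unknown":
            return self.status  # already decided, no longer processed
        self.votes[seed_status] += 1
        if self.votes[seed_status] >= threshold:
            self.status = seed_status
        return self.status
```

Only seed cells call `vote`; a cell that gets decided this way never votes on anyone else.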

Repeat

After you’ve found a seed cell and propagated its status you’ll continue to repeat finding a seed cell and propagating its status until all cells have a status of inside, outside or intersecting.  On rare occasions you can end up with some cells whose status simply can’t be determined, so make sure your code can handle that scenario and doesn’t loop forever.

Improvements

While implementing the paper I made some additional improvements to the proposed solution.  I sped up the process by taking advantage of hardware improvements to render the scene using a single pass.  I also improved the conservativeness of the algorithm in situations where you’re using square voxels. When a mesh is wider than it is tall, there will be padding below the mesh; if the bottom of the mesh is uncapped it can lead to inside cells ‘leaking’ their status outside.

Single Pass Rendering

The paper was published back in 2002 and due to limitations at the time the simplest method of rendering front faces one color and back faces another was to render the scene twice and flip the winding order and the color of the triangles being rendered.  However this method is slower than just using a simple pixel shader to change the color of front/back faces.

In OpenGL you can use gl_FrontFacing and in DirectX 11 you can use SV_IsFrontFace.

GLSL
void main()
{
     gl_FragColor = gl_FrontFacing ? vec4(0,0,1,1) : vec4(1,0,0,1);
}

Intersect Mesh Bounds and Clip To Bounds

One problem I found is that when a mesh (like a building) is uncapped at its base but is wider than it is tall, there will be several cells below the base of the mesh.  Those cells will have the status of inside spread to them, even though a human could easily see they are outside.

So one improvement I ended up adding is that when testing for triangle intersection, you should also test for intersection with the mesh bounds.  Additionally you immediately mark a cell as Outside if it is outside the mesh bounds, since it simply is not possible for that cell to be inside the mesh, but don’t treat that cell like it’s a seed cell; just mark it as outside and move on.
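A Python sketch of that bounds check, with cells and mesh bounds as axis-aligned (lo, hi) corner tuples.  The three return labels are my own, and treating a cell that only touches the bounds as outside is a choice made for the sketch.

```python
def classify_against_bounds(cell_lo, cell_hi, mesh_lo, mesh_hi):
    """Return 'outside', 'straddling' or 'within' relative to the mesh bounds."""
    # No overlap on any one axis means the cell cannot be inside the mesh.
    if any(cell_hi[i] <= mesh_lo[i] or cell_lo[i] >= mesh_hi[i] for i in range(3)):
        return "outside"
    if all(mesh_lo[i] <= cell_lo[i] and cell_hi[i] <= mesh_hi[i] for i in range(3)):
        return "within"
    # Crosses the bounds: treat like a triangle intersection and subdivide.
    return "straddling"
```

Cells that come back as outside get their status set directly and are never used as seeds.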

I needed some real game art to properly test the solution so I exported a roof structure from the Necropolis map from UDK’s UTGame sample.  Here you can really see the difference it makes to clip to the bounds of the mesh.  Note how many additional voxel/octree cells (purple lines) are determined to be ‘inside’ because of how many backfaces (red triangles) they can see.

udk_necropolis_roof_not_fixed
Figure 1 – Roof Inner Voxelization Not Bounded (Before)

udk_necropolis_roof_fixed
Figure 2 – Roof Inner Voxelization Bounded (After)

Future Improvements

When the cube map for each seed cell is processed it’s read back from the GPU and each pixel is checked on the CPU.  This is wildly inefficient when all we care about is the percentage of red to blue pixels on each face.

So that processing can be moved to OpenCL to improve the performance significantly.  I would prefer to have all cells be seed cells, since that makes it easier to define the rules of what cells are inside vs. outside.  I suspect that allowing the cells to propagate has a higher potential to cause problems on meshes with very nasty artifacts.  Giving each cell the ability to individually determine its status should be more stable and more predictable.

Currently my cube map rendering is performed in 6 passes.  Shifting over to a single pass method using the geometry shader will likely add additional speed improvements, but I don’t know for sure.

If I move enough of the processing to the GPU it may allow me to make more cells seed cells (perhaps all?) and still maintain an acceptable performance footprint for tool time usage.  For offline usage though this method is already very acceptable (a few seconds for an average mesh and a maximum depth of 5) even with all the CPU read-backs I’m performing.

Sample Code

I’m still working on an improved version of the Generating Occluders for Hierarchical Z-Buffer Occlusion Culling sample.  So the code is a bit tied up at the moment.  However the next post I do on generating the occluders will contain it, which should be soon.

Robust Inside and Outside Solid Voxelization @ altdevblogaday.com