« Posts under Programming

Robust Inside and Outside Solid Voxelization

complete_polygonal_scene_voxelization

While wrapping up my post for generating simplified occluders for Hierarchical Z-Buffer Occlusion Culling, I was pointed to a paper called Complete Polygonal Scene Voxelization.  Afterwards I found time to read it thoroughly and implement it as a replacement for my existing ray casting based solid voxelization method.

The problem with the solid voxelization technique I was using previously was that it used ray casting; making it impossible to perform solid voxelization unless the mesh is watertight in addition to having no anomalies like intersecting geometry.

However, that restriction makes it an unrealistic solution in the real world because game art typically has holes in the locations players never see; such as the bottom cap on a building, which is rarely modeled.

The New Solution

The Complete Polygonal Scene Voxelization paper’s solution to voxelizing a scene is pretty clever; It applies a heuristic model to the problem of determining the inside/outside status of each voxel or octree cell.  Allowing it to overcome the problem of holes and intersecting geometry making it suitable as a real world solution.

How It Works

bunny_octree_3

You can download the paper and read it for yourself, but let me go ahead and summarize the algorithm for brevity’s sake so that the rest of the article makes sense.

The algorithm takes place in 3 stages:

  1. Create Octree
  2. Find Seed Cell
  3. Propagate Seed Cell

Create Octree

First you create an octree around the mesh that continues to subdivide each cell until either the cell no longer intersects with any triangle or a maximum depth is reached.  A typical maximum octree depth that will work for most meshes is 5.  If the mesh has some exceptionally thin walls that you want the cells to be small enough to fill you may need to go as high as 7 or 8.

I was having some problems with the GPU AABB/Triangle overlap test I used for voxelization in the Hierarchical Z-Buffer Occlusion Culling article and so I ended up porting the Möller implementation of AABB/Triangle overlap test to C# and just used it instead.

Also if you ever need to lookup how an intersection is performed I highly recommend the gigantic matrix of intersections over at realtimerendering.com.  It was a handy resource since I don’t keep the algorithm for AABB/Triangle overlap stored in my brain.

After you’ve created the octree we need to process each cell that wasn’t intersecting with a triangle to determine if it is inside or outside.

Find Seed Cell

Before we can determine if a cell is inside or outside the mesh we need to find a seed cell.  The seed cell is sort of the ground truth example cell that we use to propagate its status to the other cells that it can see.  The seed cell’s status is determined by rendering a cube map centered inside the cell with the near plane placed at the cell edge.

When rendering each side of the cube map, you render the scene such that all front facing polygons are blue and all back facing polygons are red.  You then read back each cube map surface from the GPU and determine the percentage of red and blue pixels seen at each face.

If at least 4 sides of the cube map contain red pixels, the cell is determined to be inside the mesh.

The paper says that NO red pixels can be seen for a seed cell to be determined to be outside however I found this problematic since occasionally a red pixel can be seen just through tiny rendering artifacts.

So I feel a better solution is one like the following:

C#
MIN_INSIDE_FACES = 4;
MIN_INSIDE_PERCENTAGE = 0.03f;
 
int cubemap_sides_seeing_inside = 0;
 
for (int i = 0; i < 6; i++)
{
    RenderCubeMapSide(i);
    float backfacePercentage = CalculateBackfacePercentage(i);
 
    if (backfacePercentage > MIN_INSIDE_PERCENTAGE)
        cubemap_sides_seeing_inside++;
}
 
if (cubemap_sides_seeing_inside >= MIN_INSIDE_FACES || cubemap_sides_seeing_inside == 0)
{
    if (cubemap_sides_seeing_inside >= MIN_INSIDE_FACES)
        cell.Status = CellStatus.Inside;
    else // cubemap_sides_seeing_inside == 0
        cell.Status = CellStatus.Outside;
 
    // Propagate cell status...
}
else
{
    // Unable to solve status exactly.
}

Where you don’t count a face as inside unless at least 3% of the total red and blue pixels are red.  The percentage is just something I picked out of thin air, it feels like a number small enough to be easily overcome by any truly inside cube face, but high enough to allow me to ignore tiny artifacts.

Propagate Seed Cell

The last step is to propagate the seed cell’s status to other cells.  After classifying a seed cell you need to test every unknown cell against each the depth map and frustum of each cube map surface.

You’re performing a test to see if any of the 8 corners of the octree cell when projected into the camera space of each cube face are closer than the depth value at that pixel.  If it is, then then the entire octree cell likely visible.

If the cell is visible from the seed cell, then in all likelihood the cell has the same status as the seed cell.  However because having holes means 2 seed cells (one inside and one outside) can potentially see the same cell you want several seed cells to confirm the status of a cell before committing to it.

So once you’ve determined a cell is visible from the seed cell you’ll increment a counter on the cell for that status.  Once one of the statuses reaches a threshold, like for example 16 you’ll change the status of the cell from Unknown to whatever status counter overcame the threshold and no longer process that cell.

It should be noted that only seed cells propagate their status.  Cells that you propagate to do not propagate their own status.

Repeat

After you’ve found a seed cell and propagated its status you’ll continue to repeat finding a seed cell and propagating the seed cell until all cells have a status of inside, outside or intersecting.  You can rarely end up with some cells whose status simply can’t be determined so make sure your code can handle that scenario and not loop forever.

Improvements

While implementing the paper I made some additional improvements to the proposed solution.  I sped up the process by taking advantage of hardware improvements to render the scene using a single pass.  I also improved the conservativeness of the algorithm in situations where you’re using square voxels. When a mesh is wider than it is tall in those sitations there will be padding below the mesh; if the bottom of the mesh is uncapped it can lead to inside cells ‘leaking’ their status outside.

Single Pass Rendering

The paper was published back in 2002 and due to limitations at the time the simplest method of rendering front faces one color and back faces another was to render the scene twice and flip the winding order and the color of the triangles being rendered.  However this method is slower than just using a simple pixel shader to change the color of front/back faces.

In OpenGL you can use gl_FrontFacing and in DirectX 11 you can use SV_IsFrontFace.

GLSL
void main()
{
     gl_FragColor = gl_FrontFacing ? vec4(0,0,1,1) : vec4(1,0,0,1);
}

Intersect Mesh Bounds and Clip To Bounds

One problem I found is that when a mesh (like a building) is uncapped at its base but is wider than it is tall there will be several cells below the base of the mesh.  Cells that will have the status of inside spread to them, even though a human could easily see those cells are outside.

So one improvement I ended up adding is that when testing for triangle intersection, you should also test against intersection against the mesh bounds.  Additionally you immediately mark a cell as Outside if is outside the mesh bounds, since it simply is not possible for that cell to be inside the mesh, but don’t treat that cell like it’s a seed cell; just mark it as outside and move on.

I needed some real game art to properly test the solution so I exported a roof structure from the Necropolis map from UDK’s UTGame sample.  Here you can really see the difference it makes to clip to the bounds of the mesh.  Note how many additional voxel/octree cells (purple lines) are determined to be ‘inside’ because of how many backfaces (red triangles) they can see.

udk_necropolis_roof_not_fixed
Figure 1 – Roof Inner Voxelization Not Bounded (Before)

udk_necropolis_roof_fixed
Figure 2 – Roof Inner Voxelization Bounded (After)

Future Improvements

When the cube map for each seed cell is processed it’s read back from the GPU and each pixel is checked on the CPU.  This is wildly inefficient when all we care about is the percentage of red to blue pixels on each face.

So that processing can be moved to OpenCL to improve the performance significantly.  I would prefer to have all cells be seed cells since that makes it easier to define the rules of what cells are inside vs. outside.  Allowing the cells to propagate has a higher potential to cause problems I suspect on meshes with very nasty artifacts.  Giving each cell the ability to individually determine their status will be more stable and more predictable.

Currently my cube map rendering is performed in 6 passes.  Shifting over to a single pass method using the geometry shader will likely add additional speed improvements, but I don’t know for sure.

If I move enough of the processing to the GPU it may allow me to make more cells seed cells (perhaps all?) and still maintain an acceptable performance footprint for tool time usage.  For offline usage though this method is already very acceptable (a few seconds for an average mesh and a maximum depth of 5) even with all the CPU read-backs I’m performing.

Sample Code

I’m still working on an improved version of the Generating Occluders for Hierarchical Z-Buffer Occlusion Culling sample.  So the code is a bit tied up at the moment.  However the next post I do on generating the occluders will contain it, which should be soon.

Robust Inside and Outside Solid Voxelization @ altdevblogaday.com

Published Achievement Unlocked!

image

The September 2011 issue of Game Developer Magazine contains an article of mine!  They asked me to write an article for the “Inner Product” section summarizing the Hi-Z GPU Culling algorithm as well as the related research I started working on to automatically generate occluders for the runtime.

My copy hasn’t yet arrived in the mail so I don’t know how the final version turned out, but the last draft I saw after the good folks at GD Mag spruced up the language in my article was looking really good.

It was a lot of work putting the article together, more than I actually thought going into it.  Especially since I already had 3 blog entries to pull from.  However, you find yourself second guessing everything you write because it’s not like a blog entry that you can go back and edit if there’s a mistake.

It also becomes a bit more of a challenge to summarize things that you simply can’t add a link to and tell the reader, go read this.  Also, not being able to just insert a huge chunk of code makes explaining something simple often a lot harder to explain in words.  Hopefully you’re good at making diagrams in those situations.

It was an interesting experience.  Really glad I got the opportunity to contribute an article.

SWIG: Casting Revisited

A while back I wrote SWIG and a Miss, which is a post about several of the problems I’ve encountered with SWIG. At that time I didn’t have a solution for dealing with down casting – the process of casting from a base class to a more derived class.

I ended up coming up with a solution that I thought would be good to do a post on, since there doesn’t seem to be much out there on SWIG and these types of problems.

Some Background

For those unfamiliar with why it’s difficult, let me explain. When your API returns a native pointer to be wrapped, of lets say class Foo, and your function returns Foo* but the instance it returns is actually the more derived version Bar. Sadly SWIG has no way of knowing this, so the object it creates in C# would be a C# Foo class. Therefore, if you tried to cast the C# Foo to the more derived C# Bar you’d be unable to because as far as .Net is concerned the instance of Foo is just a Foo and nothing more.

Alright now that you understand the problem, lets talk about how we can solve it.

How We Can Solve It

The solution to the problem can be solved in essentially 3 ways,

1 // Walk Away

Simply don’t do anything that would require it. Don’t expose classes that users will ever need to downcast to for any reason. If this option is available to you, take it. This however means that your base class needs to support all the functionality a user will ever need, something not always possible or desirable. But if you can, do it, because once you get to script land, down casting becomes a expensive process.

2 // Manual Cast Wrapper

Write or generate C++ functions for every downcast that you would like to perform and SWIG those functions. This would allow you to take object Foo as a parameter, returns Bar, does the dynamic cast in C++ and returns the pointer to Bar.

This option is manpower intensive and is truly brute forcing a solution. It also causes potential holes where nasty aliasing occurs and can become a real problem if the source object is ever deleted. Imagine an API with a factory function that returns object Foo, but Foo is actually Bar due to the way the factory works. You need to access a Bar specific only function on the object so you downcast Foo to Bar.

Now in C++ these objects are the same object, but SWIG doesn’t know this, so two completely different C# objects are created one wrapping the native pointer to Foo, and one wrapping the native pointer to Bar. If the SWIG layer was properly authored, the factory function notes that the object returned was created with new memory and thus SWIG is responsible for deleting the object.

So we call our Bar specific function and we decide to hang onto Bar and we let Foo on the stack pass out of scope, because why hang onto it, it’s the same object as Bar…right? But Foo was responsible for managing the memory for the common native object they both point to. So the underlying native object is destroyed when the garbage collector runs next after Foo is no longer referenced and now our Bar object points to totally bogus memory.

3 // Custom Solution

The solution I ended up coming up with requires you to have a custom RTTI system that allows you to check if an object is a specific type by name.

The first step is to extend the C# wrapper for your base class with RTTI information. If you don’t have a base RTTI object this will be a bit harder. The reason we have a new m_castedSource member variable is to solve the aliasing problem. After the cast we can store the source here so that we never have to worry about the garbage collector accidentally cleaning up memory we’re still using.

C#
%typemap(cscode) YourBaseRTTIObject
%{
    internal YourBaseRTTIObject m_castedSource;
 
    public bool IsKindOf<T>()
    {
        // Use your C++ RTTI system to test if this wrapped C# object's
        // native object is a 'typeof(T).Name'
    }
%}

The next step is to create a function that can create the C# method needed to factory the C# object. We could use reflection, but that method is slow and we can do a far better job by emitting IL at runtime to specifically initialize one type and just cache those created methods.

C#
private static Dictionary<Type, Func<IntPtr, bool, object>> constructorCache =
    new Dictionary<Type, Func<IntPtr, bool, object>>();
 
private static Func<IntPtr, bool, object> CreateConstructorDelegate<T>()
{
    // We need to first grab the reflection information for the more derrived type we're casting to.
    // SWIG has protected constructors we'll be able to call that take 2 parameters, the native pointer
    // to the type and whether or not SWIG owns the memory
    ConstructorInfo ctor = typeof(T).GetConstructor(
        BindingFlags.Instance | BindingFlags.NonPublic, null,
        CallingConventions.HasThis, new Type[] { typeof(IntPtr), typeof(bool) }, null);
 
    // Lets emit some IL that will allow us to construct our type much
    // faster than using reflection.
    DynamicMethod dm = new DynamicMethod("Create", typeof(object),
        new Type[] { typeof(IntPtr), typeof(bool) }, typeof(T), true);
    ILGenerator il = dm.GetILGenerator();
    il.Emit(OpCodes.Ldarg_0);
    il.Emit(OpCodes.Ldarg_1);
    il.Emit(OpCodes.Newobj, ctor);
    il.Emit(OpCodes.Ret);
 
    // Create the delegate for the dynamic method, which will further improve the calling speed.
    return (Func<IntPtr, bool, object>)dm.CreateDelegate(typeof(Func<IntPtr, bool, object>));
}

After that we need to add a function that can perform the down cast. This is just a matter of checking if the object is actually the desired casting type. If it is we get or create the factory delegate for the type. We construct the new wrapper object and hand it the pointer for the type. (NOTE: This is only viable if the new type isn’t offset in the vtable, as would be the case with multiple inheritance).

C#
public static T CastTo<t>(YourBaseRTTIObject self) where T : YourBaseRTTIObject
{
    if (self == null)
        return null;
 
    // Check if the object is actually the type we're trying to cast to.
    if (self.IsKindOf<t>())
    {
        Type type = typeof(T);
 
        // Check if we've already cached the factory function.
        Func<IntPtr, bool, object> factory;
        if (!constructorCache.TryGetValue(type, out factory))
        {
            factory = CreateConstructorDelegate<t>();
            constructorCache.Add(type, factory);
        }
 
        // Call the factory function and set the casted source to the
        // current object.
        T castedObject = (T)factory(YourBaseRTTIObject.getCPtr(self).Handle, false);
        castedObject.m_castedSource = self;
 
        return castedObject;
    }
    return null;
}

Terraria: The Auto-Saving Server

After beating Minecraft (not really) I thought my addition to these blasted sandbox mining games was satisfied.  Sadly, I was wrong.  I started playing Terraria a few days ago and its got many more game elements than Minecraft.  It’s also wildly addicting if you’ve got a sever case of ‘Got’a’ Catch’em Alls’.  It’s also possibly the real world equivalent of Heroin Hero.

In any event, I setup a server for my friends and I to play only to discover that the server does not auto-save the world.  So if it crashes or is shutdown improperly, everything in the world resets.

So I wrote a little bit of code to fix it and decided to share.  It’s a simple bootstrapping program, you give it the file path of the Terraria server executable as the first argument and then all the other arguments to pass to it as a single string.  It then sends save commands to the server every 10 minutes.

If you want to just download a compiled version for yourself, you can download it here.

If you were to place it in the same directory as the TerrariaServer, you’d run it like this:

Text
TerrariaAutosave.exe TerrariaServer.exe "-config serverconfig.txt"

P.S. You never catch the dragon.

Hierarchical Z-Buffer Occlusion Culling – Generating Occlusion Volumes

volume_cat

Almost a year ago I wrote a couple of posts on Hierarchical Z-Buffer Occlusion Culling (HZB-OC).

One very glaring issue I left largely untouched was the workflow issue.  Artists currently have to author all or a large percentage of the volumes by hand in the implementations I’m aware exist.  Ideally these simplified occlusion volumes would be generated by a tool in the art pipeline instead.

A few weeks ago I started looking for a new side project and decided to try and flesh-out a very rough idea I had for automatically generating occlusion volumes.  This is still a work in progress but I wanted to share my current progress because I think it has some value even at this early stage.

The Problem

An ideal occlusion volume has some important features,

  1. Conservativeness – Doesn’t extend beyond the surface of the mesh
  2. Simplicity – The occlusion volume is made of very few triangles or is fast to render
  3. Volume Conservation – Closely matches the original mesh’s volume
  4. Dynamic – Some games have large movable occluders or destroyable walls

Normal methods of simplifying a mesh such as typical triangle simplification causes problems in both the area of conservativeness and volume conservation.  In some cases you can use the physics collision mesh if available.  One thing to be aware of is that when the physics volume is larger than the visual mesh it can cause false occlusions leading to things popping into view.  Also not every mesh needs a physics volume nor are physics volumes always meshes.  Will Vale’s talk from GDC 2011 covered using physics collision meshes pretty thoroughly for HZB-OC, you should check it out.

The Technique

Let me start by summarizing the technique I’ve been developing for generating the occlusion volumes before I go in-depth into each step in the process.

  1. Find all the voxels completely inside a mesh
  2. Find the voxel at the densest point in the volume
  3. Expand a box from this point until all sides collide with the mesh surface or another box
  4. Repeat 2-3 until you’ve consumed X% of the total volume
  5. Filter small or useless boxes (Unimplemented)
  6. Use a Constructive Solid Geometry (CSG) algorithm to merge the boxes you create

1. Voxelization

surface_solidFirst you have to determine all the voxels completely inside the mesh.  That way we can have complete confidence that anything we generate contained inside these voxels will be conservative.  It also gives us a very easy way of quantifying the total volume and the volume remaining.

The voxelization algorithms I ended up using comes from this paper Fast Parallel Surface and Solid Voxelization on GPUs [Schwarz 2010].  I used the 3.1 section for surface voxelization and the 4.1 section for solid voxelization.  Both are required because a voxel is considered part of the solid volume if the center of the voxel is inside the mesh.  We want the entire voxel to be inside the mesh though.  So you have to remove the excess voxels from the solid set using the surface set.

My first implementation was in C++ and entirely on the CPU which took about 20 seconds to run on a mesh with ~10,000 triangles and a 503 voxel volume.  In the version I’m presenting here I moved to C#, when I did that time went up to about 60 seconds.  So I ended up porting a similar version of the CPU implementation to the GPU using OpenCL (Cloo), on the same data it ran in about 6 seconds on my Nvidia 540M, with no attempt to optimize.

One unfortunate limitation of Schwarz’s solid voxelization algorithm it that it requires watertight meshes.  The reason for this is you’re shooting a ray down a column of voxels until you intersect with a triangle.  You’re then xor’ing the bit for each voxel signaling that this column of voxel’s is inside or outside.  Voxels that are inside a mesh will be Xor’ed an odd number of times, where as outside voxels will be xored an even number of times.  So if there is a hole in the mesh, you’d leave a incomplete trail of xor’s in that column, leaving every voxel from the last triangle to the bounding box marked as ‘inside’.

Two papers I’ve run across that might contain solutions to this problem are An Automatic Hole-Filling Algorithm for Polygon Meshes and Complete Polygonal Scene Voxelization.

2. Find The Highest Density Voxel

In this step you’ll find the voxel that is furthest away from any external area, and excluding any voxel that has already been enclosed by a box in step 3.  Because the number of empty voxels likely far exceeds the number of voxels touching an empty voxel, you’ll probably want to test against that list instead of determining the distance from the closest empty voxel.

For my project this was implemented on the CPU, but this is probably something you could do faster on the GPU if this ever became too slow for your needs.

3. Box Expansion

Once you’ve found the densest voxel you’re going to create a 13 voxel sized box at that location.  You’ll then proceed to iteratively expand each side of the box in voxel space until you can’t expand any side of the box without entering an empty voxel or another box that has already been placed.

As you verify each expansion of the box you’ll mark the enclosed voxels in such a way so that the next time you choose the densest voxel you can exclude any already enclosed in a box.

Currently the expansion order is uniform, however, I suspect a better approach would be some small amount of prediction allowing me to grow one side more than another to maximize the potential volume of each box. 

The best implementation of this AABB expansion methodology would be to use an optimization solver, but that could potentially take quite awhile to run.

I’m still looking for alternatives to the AABB box expansion method of generating the simplified geometry. The axis aligned boxes aren’t the best for models at non-right angles.  Perhaps a better approach could be divined with OBBs but that’s just conjecture.

I briefly considered the Iso-Surface method Xi Wang mentioned in his Automated Level of Detail Generation for HALO: REACH talk at GDC 2011 but that generates a lot of triangles and a volume larger than the voxel volume you start with. So that just gets us right back to the problem we started with.

4. Repeat 2-3

I ended up having two different stopping conditions.  If an absolute percentage of the remaining volume is met I stop adding new boxes.  Alternatively if the last box created consumed too small a percentage of the total volume that also stops the box expansion process.

Some other things I didn’t implement but that could be tried are, setting a maximum number of boxes, or a box too small (in voxel space, instead of percentage) as a cutoff point.

5. Filter Boxes

This portion of the technique I haven’t tried but makes a lot of sense depending on your stopping condition.  If several small boxes are generated before one of the cutoff conditions is met during step 4, you’ll probably want to ignore them before moving to step 6 to reduce your occluder triangle count.

6. Merge Boxes With Constructive Solid Geometry

Drawing individual boxes even as a single draw call would cause lots of overdraw.  Instead we should take our collection of boxes and combine them into a single mesh before we finish.  To do this we’re going to use Constructive Solid Geometry (CSG).

Soon after I realized I was going to need CSG (or something akin to it) to solve this problem I was informed of a new book, Game Development Tools.  In that book there is an implementation for Real-Time Constructive Solid Geometry written by Sander van Rossen and Matthew Baranowski.  The implementation is pretty slick and I was able to get it up and running pretty quickly.

The Results: Time-Lapse

The box expansion runs in a fraction of a second, but I slowed it down and broke it up over several frames so that you could see how the occlusion shape evolves.

YouTube Preview Image YouTube Preview Image

The Results: Different Models

To get an idea of the results on different models I’ve provided some screenshots.  The settings I used for these was to cutoff after 90% of the volume had been consumed by boxes.

occluder_results_1 occluder_results_3
occluder_results_2 occluder_results_4

Notes

In situations where the technique doesn’t work very well allow an artist to specify a custom mesh.  In some cases an option to use the same mesh for both the visual and the occluder would probably be a good idea (like for planes).

To reduce draw calls you will likely want to generate the occluders during export time.  Then during the cook stage in your level editor, merge some number of the occluders together to reduce draw calls.

Download

works_on_my_machine
Voxelization Occlusion Generator.zip

License: MIT

Thanks

Thanks to Michael Noland and Shaun Kime for letting me bounce ideas off them.  Also thanks to Stephen Hill for telling me about Rossen / Baranowski’s Constructive Solid Geometry implementation and for giving me feedback on this post.