Duke Nukem Forever Lives!
The rumor has been floating around for a while now that Gearbox was going to finish DNF. I’m glad to see my decade-plus of waiting might actually come to a glorious end.
There’s a pretty cool paper on Ambient Occlusion Volumes by Morgan McGuire that I was reading a while back. The technique uses volumes to calculate which pixels fall in a region that needs to be shaded, so no more halos and anti-halos, which is fantastic. The effect looks almost as good as ray tracing and is several orders of magnitude faster.
The technique is a bit expensive for this generation of hardware, though. The Fermi cards from Nvidia appear to do a much better job, in the 35ms range for full quality, and if you don’t sample every pixel you can get that number way down, into the 3-6ms range.
The reason I bring it up is that Morgan has posted a sample implementation over on Nvidia’s research page here that’s worth a look if the topic interests you.
I’ve released an updated version of my WPF ShaderEffect Generator tool on Codeplex. The release is fairly minor but I figured I’d update my blog anyway.
Changes:
Last week was the start of new things for me. I’ve moved on from Emergent and joined my good friends and fellow bloggers Dan and Shaun over at Activate3D. I’m a week in and very excited to be working on a fresh set of challenges.
I think we’re still running in silent mode, so I can’t really post about anything I’m working on. I’ll just repeat what Dan said: we’re working on motion synthesis and animation blending.
This is a follow-up to this post. After exchanging a few emails with Stephen Hill, he clarified how they were able to cull shadows from the scene using the Hierarchical Z-Buffer (HZB). The purpose of mentioning the CC Shadow Volumes paper was just to point out the source that gave them the inspiration. Their solution to culling the shadows was actually really simple, and I was kicking myself for not seeing it sooner.
I may try to throw together a shadow sample that demonstrates this, but it’s so straightforward that I figured I would go ahead and post the explanation in case I never get around to coding up a version with it implemented.
[Figure 1] [Figure 2]
While I was at GDC I had the pleasure of attending the Rendering with Conviction talk by Stephen Hill. One of the topics was so cool that I thought it would be fun to try it out. The hierarchical z-buffer solution presented at GDC borrows heavily from this paper, Siggraph 2008 Advances in Real-Time Rendering (Section 3.3.3). I ran into a fair number of issues trying to get the AMD implementation working, though: a lot of the math is too simplistic and does not take into account perspective distortion or the proper width of the sphere in screen space, so you end up with false negatives.
You should read the papers to get a firm grasp of the algorithm, but here is my take on the process and some implementation notes of my own.

The downsampling is pretty much what you would expect: you take the current pixel and sample one pixel to the right, bottom, and bottom right. You take the furthest depth value of the four and use it as the new depth in the downsampled pixel. Here’s an example of a before and after version; black is a closer depth, and the whiter a pixel is, the further away (higher) its depth value.
[Figure: depth buffer before downsampling]
[Figure: depth buffer after downsampling]
The downsampling HLSL code looks like this:
float4 vTexels;
vTexels.x = LastMip.Load( nCoords );
vTexels.y = LastMip.Load( nCoords, uint2(1,0) );
vTexels.z = LastMip.Load( nCoords, uint2(0,1) );
vTexels.w = LastMip.Load( nCoords, uint2(1,1) );
float fMaxDepth = max( max( vTexels.x, vTexels.y ), max( vTexels.z, vTexels.w ) );
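For checking the shader output on the CPU, the same 2x2 max reduction can be sketched in C#. This is only a reference sketch, not part of the sample: the `HiZReference` class and `DownsampleMax` method are names made up for illustration, depth is assumed to grow with distance, and the mip dimensions are assumed to be even (an odd-sized level would need its last row or column folded in to stay conservative).

```csharp
using System;

public static class HiZReference
{
    // Builds the next mip level from the previous one by keeping the farthest
    // (largest) depth of each 2x2 block, mirroring the shader's max() reduction.
    // Hypothetical helper; assumes lastMip has at least one row and one column.
    public static float[,] DownsampleMax(float[,] lastMip)
    {
        int srcHeight = lastMip.GetLength(0);
        int srcWidth = lastMip.GetLength(1);
        int dstHeight = Math.Max(1, srcHeight / 2);
        int dstWidth = Math.Max(1, srcWidth / 2);
        var nextMip = new float[dstHeight, dstWidth];

        for (int y = 0; y < dstHeight; y++)
        {
            for (int x = 0; x < dstWidth; x++)
            {
                // Clamp so a 1-texel-wide level never reads out of bounds.
                int x0 = Math.Min(2 * x, srcWidth - 1);
                int y0 = Math.Min(2 * y, srcHeight - 1);
                int x1 = Math.Min(x0 + 1, srcWidth - 1);
                int y1 = Math.Min(y0 + 1, srcHeight - 1);

                float top = Math.Max(lastMip[y0, x0], lastMip[y0, x1]);
                float bottom = Math.Max(lastMip[y1, x0], lastMip[y1, x1]);
                nextMip[y, x] = Math.Max(top, bottom);
            }
        }

        return nextMip;
    }
}
```

Calling `DownsampleMax` repeatedly until the result is 1x1 produces the whole chain, and each level can be compared against the corresponding GPU mip.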
Here’s the heart of the algorithm, the culling. One note: [numthreads(1,1,1)] is terrible for performance with compute shaders, so anyone planning to use this should do a better job of thread group and thread management than I did. This is the DX11 compute shader version; I decided to use it here since it makes the intent clearer. You’ll find the DX9 code in the full sample at the bottom of the post.
cbuffer CB
{
    matrix View;
    matrix Projection;
    matrix ViewProjection;
    float4 FrustumPlanes[6]; // view-frustum planes in world space (normals face out)
    float2 ViewportSize;     // Viewport Width and Height in pixels
    float2 PADDING;
};

// Bounding sphere center (XYZ) and radius (W), world space
StructuredBuffer<float4> Buffer0 : register(t0);
// Is Visible 1 (Visible) 0 (Culled)
RWStructuredBuffer<float> BufferOut : register(u0);

Texture2D<float> HizMap : register(t1);
SamplerState HizMapSampler : register(s0);

// Computes signed distance between a point and a plane
// vPlane: Contains plane coefficients (a,b,c,d) where: ax + by + cz + d = 0
// vPoint: Point to be tested against the plane.
float DistanceToPlane( float4 vPlane, float3 vPoint )
{
    return dot(float4(vPoint, 1), vPlane);
}

// Frustum culling on a sphere. Returns > 0 if visible, <= 0 otherwise
float CullSphere( float4 vPlanes[6], float3 vCenter, float fRadius )
{
    float dist01 = min(DistanceToPlane(vPlanes[0], vCenter), DistanceToPlane(vPlanes[1], vCenter));
    float dist23 = min(DistanceToPlane(vPlanes[2], vCenter), DistanceToPlane(vPlanes[3], vCenter));
    float dist45 = min(DistanceToPlane(vPlanes[4], vCenter), DistanceToPlane(vPlanes[5], vCenter));

    return min(min(dist01, dist23), dist45) + fRadius;
}

[numthreads(1, 1, 1)]
void CSMain( uint3 GroupId : SV_GroupID,
             uint3 DispatchThreadId : SV_DispatchThreadID,
             uint GroupIndex : SV_GroupIndex)
{
    // Calculate the actual index this thread in this group will be reading from.
    int index = DispatchThreadId.x;

    // Bounding sphere center (XYZ) and radius (W), world space
    float4 Bounds = Buffer0[index];

    // Perform view-frustum test
    float fVisible = CullSphere(FrustumPlanes, Bounds.xyz, Bounds.w);

    if (fVisible > 0)
    {
        float3 viewEye = -View._m03_m13_m23;
        float CameraSphereDistance = distance( viewEye, Bounds.xyz );

        float3 viewEyeSphereDirection = viewEye - Bounds.xyz;

        float3 viewUp = View._m01_m11_m21;
        float3 viewDirection = View._m02_m12_m22;
        float3 viewRight = normalize(cross(viewEyeSphereDirection, viewUp));

        // Help handle perspective distortion.
        // http://article.gmane.org/gmane.games.devel.algorithms/21697/
        float fRadius = CameraSphereDistance * tan(asin(Bounds.w / CameraSphereDistance));

        // Compute the offsets for the points around the sphere
        float3 vUpRadius = viewUp * fRadius;
        float3 vRightRadius = viewRight * fRadius;

        // Generate the 4 corners of the sphere in world space.
        float4 vCorner0WS = float4( Bounds.xyz + vUpRadius - vRightRadius, 1 ); // Top-Left
        float4 vCorner1WS = float4( Bounds.xyz + vUpRadius + vRightRadius, 1 ); // Top-Right
        float4 vCorner2WS = float4( Bounds.xyz - vUpRadius - vRightRadius, 1 ); // Bottom-Left
        float4 vCorner3WS = float4( Bounds.xyz - vUpRadius + vRightRadius, 1 ); // Bottom-Right

        // Project the 4 corners of the sphere into clip space
        float4 vCorner0CS = mul(ViewProjection, vCorner0WS);
        float4 vCorner1CS = mul(ViewProjection, vCorner1WS);
        float4 vCorner2CS = mul(ViewProjection, vCorner2WS);
        float4 vCorner3CS = mul(ViewProjection, vCorner3WS);

        // Convert the corner points from clip space to normalized device coordinates
        float2 vCorner0NDC = vCorner0CS.xy / vCorner0CS.w;
        float2 vCorner1NDC = vCorner1CS.xy / vCorner1CS.w;
        float2 vCorner2NDC = vCorner2CS.xy / vCorner2CS.w;
        float2 vCorner3NDC = vCorner3CS.xy / vCorner3CS.w;
        vCorner0NDC = float2( 0.5, -0.5 ) * vCorner0NDC + float2( 0.5, 0.5 );
        vCorner1NDC = float2( 0.5, -0.5 ) * vCorner1NDC + float2( 0.5, 0.5 );
        vCorner2NDC = float2( 0.5, -0.5 ) * vCorner2NDC + float2( 0.5, 0.5 );
        vCorner3NDC = float2( 0.5, -0.5 ) * vCorner3NDC + float2( 0.5, 0.5 );

        // In order to have the sphere covering at most 4 texels, we need to use
        // the entire width of the rectangle, instead of only the radius of the rectangle,
        // which was the original implementation in the ATI paper, it had some edge case
        // failures I observed from being overly conservative.
        float fSphereWidthNDC = distance( vCorner0NDC, vCorner1NDC );

        // Compute the center of the bounding sphere in view space
        float3 Cv = mul( View, float4( Bounds.xyz, 1 ) ).xyz;

        // compute nearest point to camera on sphere, and project it
        float3 Pv = Cv - normalize( Cv ) * Bounds.w;
        float4 ClosestSpherePoint = mul( Projection, float4( Pv, 1 ) );

        // Choose a MIP level in the HiZ map.
        // The original assumed viewport width > height, however I've changed it
        // to determine the greater of the two.
        //
        // This will result in a mip level where the object takes up at most
        // 2x2 texels such that the 4 sampled points have depths to compare
        // against.
        float W = fSphereWidthNDC * max(ViewportSize.x, ViewportSize.y);
        float fLOD = ceil(log2( W ));

        // fetch depth samples at the corners of the square to compare against
        float4 vSamples;
        vSamples.x = HizMap.SampleLevel( HizMapSampler, vCorner0NDC, fLOD );
        vSamples.y = HizMap.SampleLevel( HizMapSampler, vCorner1NDC, fLOD );
        vSamples.z = HizMap.SampleLevel( HizMapSampler, vCorner2NDC, fLOD );
        vSamples.w = HizMap.SampleLevel( HizMapSampler, vCorner3NDC, fLOD );

        float fMaxSampledDepth = max( max( vSamples.x, vSamples.y ), max( vSamples.z, vSamples.w ) );
        float fSphereDepth = (ClosestSpherePoint.z / ClosestSpherePoint.w);

        // cull sphere if the depth is greater than the largest of our HiZ map values
        BufferOut[index] = (fSphereDepth > fMaxSampledDepth) ? 0 : 1;
    }
    else
    {
        // The sphere is outside of the view frustum
        BufferOut[index] = 0;
    }
}
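Two pieces of arithmetic in the shader are easy to sanity check on the CPU: the perspective-corrected radius and the mip selection. The C# below is a rough sketch of both. The `HiZMath` class, its method names, the `mipCount` parameter, and the clamping of the result are my own additions for illustration and are not in the shader, and `Math.Log2` assumes .NET Core 3.0 or later.

```csharp
using System;

public static class HiZMath
{
    // Radius of the sphere's silhouette measured at the sphere's center, as seen
    // from a camera at the given distance: d * tan(asin(r / d)).
    // Assumes the camera is outside the sphere (distance > radius).
    public static double ProjectedRadius(double distance, double radius)
    {
        return distance * Math.Tan(Math.Asin(radius / distance));
    }

    // sphereWidthUv is the width of the sphere's screen-space square in [0,1]
    // texture space (fSphereWidthNDC in the shader).
    // mipCount is a hypothetical parameter: the number of levels in the HiZ map.
    public static int ChooseMip(double sphereWidthUv, double viewportWidth,
                                double viewportHeight, int mipCount)
    {
        double widthInPixels = sphereWidthUv * Math.Max(viewportWidth, viewportHeight);

        // A sub-pixel sphere would give a negative log2, so treat it as one pixel wide.
        double lod = Math.Ceiling(Math.Log2(Math.Max(widthInPixels, 1.0)));

        // Never ask for a level beyond the last one in the chain.
        return (int)Math.Min(lod, mipCount - 1);
    }
}
```

For example, a sphere of radius 3 seen from a distance of 5 gets a corrected radius of about 3.75 rather than 3. A square 64 pixels wide on a 1024x768 viewport selects mip 6, where each texel covers 64x64 pixels, so the square can straddle at most two texels per axis.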
Here’s my sample implementation of the Hierarchical Z-Buffer Culling solution in DX11 and DX9. Some notes: during one of my iterations I disabled the code for rendering a visible representation of the occluders, which are just two triangles hardcoded in a vertex buffer and rendered every frame. Also, the DX9 version doesn’t actually render anything based on the results; I was just using PIX to test my output of the cull render target and was more focused on getting it working in DX11. Use the arrow keys to move the camera around. Red boxes are culled; white boxes are visible.
I haven’t quite figured out how to deal with shadows. I’ve sort of figured out how to cull the objects whose shadows you can’t possibly see, but not really. Stephen mentions using a tactic similar to the one presented in this paper, CC Shadow Volumes. I wasn’t able to figure it out in the hour I spent going over the paper, and I haven’t really found the time to revisit it.
We play a lot of Super Smash Brothers Brawl during lunch around here, and occasionally after work there’s additional smashing. I walked in today to find this poster up on the wall and figured I would share since it was so awesome.

One of our customers, Creative Edge Studios, has released the trailer for “Warriors of Elysia”, the new game they’re working on. Dan came by my office this morning holding a thumb drive and telling me, “Stop whatever you’re doing and watch this!” and “Dude, turn up your sound.” The game trailer then proceeded to go viral inside the office. Awesome.
One of the biggest pain points of MVVM is the boilerplate code that is necessary for it to function. The code I’m referring to is the code needed to Hook and Unhook, initialize/shutdown, load/unload the things the ViewModel needs to do before and after being used by the View. Because the ViewModel is usually forced to hook events coming from the Model to know when to emit its own set of events to update the View, it means knowing the lifetime of the View is critical to knowing when to unhook those events from the Model.
An idea I came up with while working on our level editor a few months back was this: what if I could use the attached behavior pattern to get rid of the ViewModel’s boilerplate code for hooking and unhooking those event handlers on the Model?
First you’ll need a base type for your ViewModel. I went with an interface instead of a base class because I felt it was better suited for this role. The ViewModel I decided to go with is extremely basic: there’s just a Load and an Unload method, and it also extends the INotifyPropertyChanged interface. In our own code I also defined a ViewModel class implementing this interface that I tended to use most of the time. It just stubs out these methods with basic implementations so that I didn’t need to implement a RaisePropertyChanged method 40 times.
public interface IViewModel : INotifyPropertyChanged
{
    bool Initialized { get; set; }
    void Load(FrameworkElement element);
    void Unload(FrameworkElement element);
}
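The stub class mentioned above isn’t shown in this post, so here is a guess at what a minimal version might look like. The names `ViewModelBase` and `PersonViewModel` are hypothetical, and the sketch assumes a WPF project (for `FrameworkElement`) plus the `IViewModel` interface above.

```csharp
using System.ComponentModel;
using System.Windows;

// Hypothetical base class: stubs out IViewModel so that derived ViewModels only
// override what they need. Not the actual class from our code.
public abstract class ViewModelBase : IViewModel
{
    public event PropertyChangedEventHandler PropertyChanged;

    public bool Initialized { get; set; }

    // Default no-op implementations; derived classes override these to hook
    // and unhook events on the Model.
    public virtual void Load(FrameworkElement element) { }
    public virtual void Unload(FrameworkElement element) { }

    protected void RaisePropertyChanged(string propertyName)
    {
        // Copy to a local so the handler list can't change between check and call.
        PropertyChangedEventHandler handler = PropertyChanged;
        if (handler != null)
            handler(this, new PropertyChangedEventArgs(propertyName));
    }
}

// Hypothetical example of a derived ViewModel.
public class PersonViewModel : ViewModelBase
{
    private string name;

    public string Name
    {
        get { return name; }
        set
        {
            if (name == value)
                return;
            name = value;
            RaisePropertyChanged("Name");
        }
    }
}
```

With something like this in place, a derived ViewModel only contains its own properties and whatever Load and Unload overrides it actually needs.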
Next we need to create an attached property that has logic associated with it that reacts to changes in the control. The community has come to call these attached behaviors.
The job of this attached behavior is to notify the IViewModel that is databound to the control when the control is loaded and when it is unloaded. That way we can just put all our event hooking and handling code into our ViewModel, and never need to worry about manually unhooking those events elsewhere ever again.
Whenever the DataContext on the View is changed, we want to Unload() the old ViewModel that was attached and Load() the new one. Also, whenever the control is Loaded or Unloaded, do the same for the ViewModel.
There are some additional things that need to be taken into account when dealing with IsAsync=True DataContexts. Additionally, there’s a bit of code to handle the problem of some WPF controls sometimes sending multiple Loaded events.
public static class ViewModelBehavior
{
    public static readonly DependencyProperty LoadUnloadProperty =
        DependencyProperty.RegisterAttached("LoadUnload", typeof(Boolean),
            typeof(ViewModelBehavior),
            new FrameworkPropertyMetadata(false,
                new PropertyChangedCallback(OnLoadUnloadChanged)));

    public static void SetLoadUnload(FrameworkElement element, Boolean value)
    {
        element.SetValue(LoadUnloadProperty, value);
    }

    public static Boolean GetLoadUnload(FrameworkElement element)
    {
        return (Boolean)element.GetValue(LoadUnloadProperty);
    }

    public static void OnLoadUnloadChanged(DependencyObject obj,
        DependencyPropertyChangedEventArgs args)
    {
        FrameworkElement element = obj as FrameworkElement;
        if (element == null)
            throw new InvalidOperationException();

        element.DataContextChanged += (sender, e) =>
        {
            if (!element.IsLoaded)
                return;

            if (e.OldValue is IViewModel)
            {
                IViewModel viewModel = ((IViewModel)e.OldValue);
                if (viewModel.Initialized)
                {
                    viewModel.Unload(element);
                    viewModel.Initialized = false;
                }
            }

            if (e.NewValue is IViewModel)
            {
                IViewModel viewModel = ((IViewModel)e.NewValue);
                if (!viewModel.Initialized)
                {
                    viewModel.Initialized = true;
                    viewModel.Load(element);
                }
            }
        };

        element.Loaded += (sender, e) =>
        {
            IViewModel viewModel =
                element.GetValue(FrameworkElement.DataContextProperty) as IViewModel;
            if (viewModel != null && !viewModel.Initialized)
            {
                viewModel.Initialized = true;
                viewModel.Load(element);
            }
        };

        element.Unloaded += (sender, e) =>
        {
            IViewModel viewModel =
                element.GetValue(FrameworkElement.DataContextProperty) as IViewModel;
            if (viewModel != null && viewModel.Initialized)
            {
                viewModel.Unload(element);
                viewModel.Initialized = false;
            }
        };
    }
}
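To make the payoff concrete, here is a sketch of a ViewModel that relies on the behavior to hook and unhook a Model event. The `Level` model and `LevelViewModel` are hypothetical types invented for this example, and the sketch assumes a WPF project plus the `IViewModel` interface above.

```csharp
using System;
using System.ComponentModel;
using System.Windows;

// Hypothetical Model type, used only for illustration.
public class Level
{
    private string name;

    public event EventHandler NameChanged;

    public string Name
    {
        get { return name; }
        set
        {
            name = value;
            EventHandler handler = NameChanged;
            if (handler != null)
                handler(this, EventArgs.Empty);
        }
    }
}

// Hypothetical ViewModel that forwards Model changes to the View.
public class LevelViewModel : IViewModel
{
    private readonly Level model;

    public LevelViewModel(Level model)
    {
        this.model = model;
    }

    public event PropertyChangedEventHandler PropertyChanged;

    public bool Initialized { get; set; }

    public string Name
    {
        get { return model.Name; }
    }

    // Called by ViewModelBehavior once the bound control is loaded.
    public void Load(FrameworkElement element)
    {
        model.NameChanged += OnModelNameChanged;
    }

    // Called when the control unloads or its DataContext changes. Dropping the
    // handler here keeps the Model from holding the ViewModel alive.
    public void Unload(FrameworkElement element)
    {
        model.NameChanged -= OnModelNameChanged;
    }

    private void OnModelNameChanged(object sender, EventArgs e)
    {
        PropertyChangedEventHandler handler = PropertyChanged;
        if (handler != null)
            handler(this, new PropertyChangedEventArgs("Name"));
    }
}
```

Nothing outside the ViewModel has to remember to unhook anything; the lifetime of the subscription follows the lifetime of the control it is bound to.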
To use this, place the property on the controls that are the primary targets when you data bind your ViewModel to your View. For example, suppose I had a ListBox with a series of ListBoxItems, with one ViewModel set as the DataContext of the ListBox and supplying its ItemsSource, and a ViewModel for each item in the list. You would place the following property on the ListBox declaration in XAML, and also on the first control element of whatever DataTemplate you defined for the ViewModel representing each element in the list.
ViewModelBehavior.LoadUnload="True"
You can download the full source files for both of these classes here. There is more documentation for these classes in those files that I didn’t want to include in the post.
One of the good talks I saw at GDC was this one, Rendering With Conviction. Stephen Hill posted the slides for his presentation, and you should check them out if you’re interested. The talk covers some cool tricks they used for Splinter Cell: Conviction: a hierarchical z-buffer approach to culling geometry when rendering, and ambient occlusion without the nasty halo effects you typically see in standard SSAO approaches.