One of the plugins I use for my blog is wp-syntax, a pretty fantastic plugin for embedding code in posts. It uses GeSHi as its syntax highlighter, but GeSHi lacked a language file for HLSL. So I hacked out my own and had intended to create a post about it to share with others. I finally remembered to do that, so feel free to snag it from here. To use it in a WordPress blog, you just need to extract the PHP file and place it in “wp-content\plugins\wp-syntax\geshi\geshi” with the other GeSHi language PHP files.
Nsightful
It took several versions for enough kinks to get worked out of Nvidia’s Parallel Nsight graphics/CUDA debugging application before I could successfully profile my Hierarchical Z-Buffer DirectX 11 implementation.
However, it looks like the latest version, which is now completely free with pro-level functionality, debugs my Hi-Z implementation like a champ!
The tool is really slick. It’s much closer to the experience I’ve wanted to have when debugging the GPU on Windows. Thus far it feels much better than the Windows version of PIX, with more functionality and information.
I was finally able to get some numbers I trust are more accurate for my Hi-Z DX11 implementation on a GeForce GTX 480.
| GPU time | Stage |
| --- | --- |
| ?µs | Some number of occluders being rendered |
| 85µs | Downsample the Hi-Z mipchain |
| 27µs | Testing 900 bounds in the compute shader |
So if you extrapolate the bounds test linearly (27µs × 10,000 / 900 = 300µs), that’s culling 10,000 bounds for ~0.3 milliseconds of GPU time in the test itself. Add to that the downsample and however long it takes you to render your occluders, which should hopefully be cheap and crammed into a deferred command list.
I’m sure you could improve upon the performance of my compute shader that does the work of figuring out what should be culled by just better managing the thread groups. I just thought it was great I finally was able to profile my code with Nsight because I’ve tried every time a new version came out and was thrilled to see it working.
Redneck Cloud Computing
Every now and then I wonder what decisions might be different if I had access to a cloud of machines dedicated to baking game assets or solving other highly parallelizable tasks. So I started looking into what options were available to someone wanting to distribute a ton of work over many machines. I found lots of options, but all of them suffered from one or more of these problems:
- Specialized language
- Specialized framework (tied to language)
- Specialized operating system
- New versions would require manual deployment
- Non-commercial license
I wanted a solution that didn’t require a specialized framework or language, because chances are I would find something I wanted to distribute that I didn’t want to completely rewrite. I also didn’t want a specialized OS: I want to be able to enlist any unused Windows machine in the office (dedicated farms are really expensive). And if I need to perform new operations or fix a bug, I don’t want to reinstall or deploy new versions to everyone’s machine.
So I decided to roll my own solution and share the design. I’ll share the finished version of the software once I’ve completed it.
Let’s start with a use case. I have an executable and a pile of data on disk that I would like to distribute over a ton of machines. I was able to quickly modify the existing program to have a command line option that allows it to process a range of the data on disk instead of all of it. How do I distribute this work over multiple machines so that I process all the data on disk?
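As a rough sketch of the first step, here is one way the data could be cut into ranges, one range per task. The `TaskRange` type and `SplitIntoTasks` function are made up for illustration, and the chunk size is something you would tune for your own data.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical half-open range [begin, end) of item indices for one task.
struct TaskRange
{
    std::size_t begin;
    std::size_t end;
};

// Split itemCount items into consecutive ranges of at most itemsPerTask items.
std::vector<TaskRange> SplitIntoTasks(std::size_t itemCount, std::size_t itemsPerTask)
{
    std::vector<TaskRange> tasks;
    if (itemsPerTask == 0)
        return tasks; // a chunk size of zero would never make progress

    for (std::size_t begin = 0; begin < itemCount; begin += itemsPerTask)
    {
        // The last range is clamped so it never runs past the end of the data.
        tasks.push_back({begin, std::min(begin + itemsPerTask, itemCount)});
    }
    return tasks;
}
```

Each range would then turn into one run of the program, for example with hypothetical `--begin` and `--end` options standing in for whatever range option your executable actually takes.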
Each time we execute the program with some amount of data to be processed, let’s call that a task. Each task will be processed by some machine in the cloud. To submit the tasks, we will need some common, cross-language mechanism of communication. I chose named pipes just because they are easy to use on Windows.
To submit tasks to the named pipe, we could either write a simple reusable program that reads task descriptions from a file and submits them to the named pipe, or we could wrap the named pipe communication in a C++ library so that both C++ and C# (via P/Invoke) could reuse the logic inside various tools (including the generic reusable program that reads task descriptions from files or stdin).
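To make the "reads task descriptions from a file or stdin" half concrete, here is a minimal sketch of a reader. The `TaskDescription` fields and the one-task-per-line `executable|arguments|file1;file2` format are assumptions made up for this example, not a settled format, and the actual named pipe submission is left out.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical task description; the fields are an assumption for this sketch.
struct TaskDescription
{
    std::string executable;
    std::string arguments;
    std::vector<std::string> dataFiles;
};

// Reads one task per line in the made-up form: executable|arguments|file1;file2;...
std::vector<TaskDescription> ReadTasks(std::istream& input)
{
    std::vector<TaskDescription> tasks;
    std::string line;
    while (std::getline(input, line))
    {
        if (line.empty())
            continue; // skip blank lines between tasks

        std::istringstream fields(line);
        TaskDescription task;
        std::getline(fields, task.executable, '|');
        std::getline(fields, task.arguments, '|');

        // Whatever is left on the line is the ';' separated data file list.
        std::string files;
        std::getline(fields, files);
        std::istringstream fileList(files);
        std::string file;
        while (std::getline(fileList, file, ';'))
        {
            if (!file.empty())
                task.dataFiles.push_back(file);
        }
        tasks.push_back(task);
    }
    return tasks;
}
```

Because it takes any `std::istream`, the same function works for a file stream or `std::cin`, which is the point of having one generic submission program.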
Each machine has a single server running on it. I chose to write the server in C#, but it’s a blackbox as far as your tasks are concerned so you could use something different if you wanted.
Once the server receives the task description via the named pipe, it looks at the list of servers it knows about. The servers are detected using a simple UDP broadcast.
Using .NET Remoting, I then connect to each one of these machines to see what it can offer me.
Now each of these machines could have who knows what installed on it, and you’re about to transfer and run an exe that may have requirements, like .NET 4.0. So each task needs to contain a list of requirements. For now, I’ve got them scoped to 3 things: defined environment variables, registry key/values and .NET version. You could probably drop the .NET version if you always keep your server written against the latest version, so it’s a prerequisite for any machine on your network.
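The requirement check could look something like the sketch below. The type and field names are hypothetical, and the machine snapshot is assumed to have been gathered by the server beforehand, so no real environment or registry calls appear here.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical list of what a task needs from a machine.
struct TaskRequirements
{
    std::set<std::string> definedEnvironmentVariables;  // must merely be defined
    std::map<std::string, std::string> registryValues;  // key path -> expected value
    int minimumDotNetMajorVersion = 0;                  // 0 means "no requirement"
};

// Hypothetical snapshot of one machine, collected by its server.
struct MachineSnapshot
{
    std::set<std::string> environmentVariables;
    std::map<std::string, std::string> registryValues;
    int dotNetMajorVersion = 0;
};

// True only if the machine satisfies every requirement of the task.
bool MeetsRequirements(const MachineSnapshot& machine, const TaskRequirements& task)
{
    for (const std::string& name : task.definedEnvironmentVariables)
    {
        if (machine.environmentVariables.count(name) == 0)
            return false;
    }
    for (const auto& entry : task.registryValues)
    {
        const auto found = machine.registryValues.find(entry.first);
        if (found == machine.registryValues.end() || found->second != entry.second)
            return false;
    }
    return machine.dotNetMajorVersion >= task.minimumDotNetMajorVersion;
}
```

Keeping the check as a pure function over a snapshot means it can run on either side of the connection, whichever turns out to be cheaper.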
Now I can try to reserve a slot on the machine. If I fail to reserve a slot because of a race condition, I move on to the next server.
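Here is a small in-process sketch of that reserve-or-move-on logic. In the real design the reservation happens across machines over .NET Remoting; the `SlotPool` class and `ReserveOnAnyServer` function are stand-ins I made up to show the idea that losing the race is a normal outcome, not an error.

```cpp
#include <atomic>
#include <vector>

// Hypothetical slot counter owned by one server.
class SlotPool
{
public:
    explicit SlotPool(int slotCount) : freeSlots_(slotCount) {}

    // Try to claim one slot; returns false if none was free when we tried.
    bool TryReserve()
    {
        int current = freeSlots_.load();
        while (current > 0)
        {
            // On failure, compare_exchange_weak reloads `current`,
            // so the loop condition re-checks the fresh value.
            if (freeSlots_.compare_exchange_weak(current, current - 1))
                return true;
        }
        return false;
    }

    // Give a slot back once the task has finished.
    void Release() { freeSlots_.fetch_add(1); }

private:
    std::atomic<int> freeSlots_;
};

// Returns the index of the first server that granted a slot, or -1 if none did.
int ReserveOnAnyServer(const std::vector<SlotPool*>& servers)
{
    for (int i = 0; i < static_cast<int>(servers.size()); ++i)
    {
        if (servers[i]->TryReserve())
            return i;
    }
    return -1;
}
```

A return value of -1 just means every slot is busy right now, so the submitter would wait and try again later.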
Having reserved a slot (some machines that are idle or dedicated may have multiple slots), I need to transfer the executable and all the data files listed inside the task description that are claimed to be needed by the task.
I tell the remote server to then begin running the task in a new thread and I continue looking for empty slots in the cloud to submit tasks to.
After I discover that a task has finished, everything that is different in the remote folder where the task was dumped and subsequently executed is then transferred back to the local machine that submitted the task.
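Figuring out "everything that is different" can be sketched as a comparison of two folder snapshots, one taken before the task ran and one after. The `FolderSnapshot` alias and the idea of a single fingerprint number per file (a hash, or size and timestamp folded together) are assumptions for this example, and deleted files are ignored.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical snapshot of a task folder: relative path -> content fingerprint.
using FolderSnapshot = std::map<std::string, std::uint64_t>;

// Files that are new or changed in `after` compared to `before`, sorted by path.
std::vector<std::string> ChangedFiles(const FolderSnapshot& before, const FolderSnapshot& after)
{
    std::vector<std::string> changed;
    for (const auto& file : after)
    {
        const auto old = before.find(file.first);
        // New file, or same path with a different fingerprint.
        if (old == before.end() || old->second != file.second)
            changed.push_back(file.first);
    }
    return changed;
}
```

Only the paths returned here would need to travel back to the submitting machine, so the input data that the task never touched is not copied twice.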
The named pipe that was used to submit one or more tasks remains open the whole time and is notified after each task is finished and the data successfully transferred back to the host machine.
So that’s the design in a nutshell. It’s not a solution designed to solve every distributed computing problem, but I like that it solves a very common pattern of problems I see fairly often in a non-intrusive manner.
I haven’t yet nailed down the best way to prevent the system from being abused by a malicious user. One idea I’ve been toying with is a SQL server holding the list of MD5 hashes of the executables that are approved for deployment.
Here’s a simple diagram to help explain the task submission process. Because who doesn’t love diagrams.
Documentation [Easy Mode]
I talked about AtomineerUtils very briefly on my blog several months back when I first discovered it. I’ve been using it pretty much nonstop since then, and I can’t live without it at this point.
So what does it do?
It takes an educated guess at what your method does based on the method name, return type and parameters and inserts a pretty good stub of documentation to get you started.
Think about it… if you write Doxygen style comments in your code, how much time do you lose to just formatting and tagging each parameter and then repeatedly copying and pasting the same description over and over for parameters that are basically self explanatory?
The current version lets your organization share the same preference files, so you can all use the same rules to help the tool generate the first stab at documenting your code. The files that control the documentation generation are VERY customizable. My favorite feature is word expansion, which lets it look for patterns in variable or method names like “MS” and expand them to “milliseconds” in the documentation.
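To give a feel for what word expansion involves, here is a toy sketch: split an identifier into words at case boundaries, then look each word up in an expansion table. This is my own illustration and not how the tool is implemented; `SplitIdentifier`, `ExpandWords` and the table contents are all made up.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Split a camel-case identifier into words: "GetTimeInMS" -> Get, Time, In, MS.
std::vector<std::string> SplitIdentifier(const std::string& name)
{
    std::vector<std::string> words;
    std::string current;
    for (std::size_t i = 0; i < name.size(); ++i)
    {
        const bool upper = std::isupper(static_cast<unsigned char>(name[i])) != 0;
        const bool prevLower = i > 0 &&
            std::islower(static_cast<unsigned char>(name[i - 1])) != 0;
        const bool nextLower = i + 1 < name.size() &&
            std::islower(static_cast<unsigned char>(name[i + 1])) != 0;

        // A word starts at a capital that follows a lower-case letter,
        // or that ends a run of capitals ("XMLParser" -> XML, Parser).
        if (upper && !current.empty() && (prevLower || nextLower))
        {
            words.push_back(current);
            current.clear();
        }
        current += name[i];
    }
    if (!current.empty())
        words.push_back(current);
    return words;
}

// Lower-case each word and swap in its expansion if the table has one.
std::string ExpandWords(const std::string& name,
                        const std::map<std::string, std::string>& expansions)
{
    std::string result;
    for (std::string word : SplitIdentifier(name))
    {
        for (char& ch : word)
            ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));

        const auto rule = expansions.find(word);
        if (!result.empty())
            result += ' ';
        result += (rule != expansions.end()) ? rule->second : word;
    }
    return result;
}
```

With a table mapping "ms" to "milliseconds", `ExpandWords("GetTimeInMS", table)` yields "get time in milliseconds". AtomineerUtils itself is driven by its rule files, so you never write code like this to use it.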
I whipped together some examples to show you what it produces out of the box…
Before:
int GetTimeInMS();

void SendPlayerDeathMsg(Player player,
                        const char* howPlayerDied,
                        const char* wherePlayerDied);
After:
/**
 * @brief Gets the time in milliseconds.
 *
 * @return The time in milliseconds.
 **/
int GetTimeInMS();

/**
 * @brief Sends a player death message.
 *
 * @param player The player.
 * @param howPlayerDied Describes how player died.
 * @param wherePlayerDied The where player died.
 **/
void SendPlayerDeathMsg(Player player,
                        const char* howPlayerDied,
                        const char* wherePlayerDied);
It has a tool menu button you can use for generating the documentation, but I prefer to bind it to a key, which makes it a lot easier to use. Here are the steps if you don’t know how.
Remember though, this is not a replacement for writing actual documentation. Don’t let yourself get into the habit of hitting Ctrl+D and moving on. AtomineerUtils will get you started, but you still need to verify and improve its results. If you find yourself correcting the same stuff over and over though, try creating a rule for it. Happy documenting.
Heap Inspector
This is a pretty awesome memory tool hobby project Jvander Beek of Vanguard Games is working on. I’m a big fan of the physical memory layout / fragmentation view. Take a look at the videos he has up on YouTube showing it off. He mentioned in the comments potentially releasing it to the public. I hope he does; I’d love to be able to integrate it with a few of my own memory tools and projects (assuming he makes it open source).



