
Monday, July 4, 2011

Tech feature: Script Overview #1

I just recorded a little clip about how scripting works in HPL3. In this video I talk about the very basic elements of scripting, and I will follow up with another video where I talk about some more complex features:



Make sure to watch it in HD: http://youtu.be/5OKFik2IEZU?hd=1

Friday, May 27, 2011

Tech feature: Scripting upgrade

Introduction
For a couple of months now I have, on and off, worked on some basic tech aspects for the engine. Every time I was done with one of these I thought it was among the hardest things I would do for the new engine, yet the next feature has always proved more challenging. Terrain geometry was harder to implement than sun shadows, terrain texturing harder than geometry, and so on.

Implementing the script system is no different. It is easily the hardest thing I have done so far for our new engine - HPL 3. It has had this "perfect" sort of challenge: difficult problems to solve, much basic knowledge to wrap your head around, and awfully boring and monotonous parts. I really hope this marks the end of this trend of increasing difficulty, as another proportionally large step might make my brain melt and my fingers crack. At least it can wait for a lil while...

Enough complaining. Now that the scripting is pretty much implemented (some engine stuff still needs to be added and some problems remain to be solved, but it is all really minor), I am extremely happy with it. I think it will help us build better games, and to finish them faster. Now let's move on to how a boring scripting system accomplishes this.


Background
Before moving on to the actual scripting I need to explain what brought on the creation of the current system. It all started with our first commercial games, the Penumbra series.

When creating the Penumbra games our tools were primitive, to say the least. All maps were made in a 3D modeling program, Maya, and then exported to Collada. The game engine loaded the Collada file and built the map from it. But a 3D modeling program is meant for creating 3D models, not for building levels. With no way of placing entities, we had to use special naming conventions to tell the game where any non-static objects were located. To do this properly we had to make special "instance" versions of each model meant for the game, since without these you could not see how an object was placed before the game started.

Lighting was equally annoying, since Maya has no support for a radius on lights. This meant that you could not visually see how far a light reached, but simply entered a numerical value and hoped for the best. As if this was not enough, you also needed to place portals and group meshes in order for the engine to provide occlusion culling. This could be quite tricky at times, and often you could sit a day or two simply tweaking the portal setup. Added to this was the problem that Maya often failed to show any textures, so most editing was done on grayish levels. For more info, you can just check the wiki.

The problems do not stop here though. Every time you made a change to the game, you had to do a complete reload. So setting up lighting in the game could be quite the effort: change a light position for two seconds, load the map for two minutes, notice it is not good, repeat. As you might figure, we got quite good at batching tasks, and the phrase "it will have to do" was uttered more often than not.

For scripting it was just as grueling. Every change in the script required a full restart of the game, creating the same sort of frustration present when modeling maps. And to make scripting even more frustrating, there was no syntax checking until the entire map was loaded! This meant you could wait through two minutes of loading only to find out you had forgotten a semicolon or something else trivial.

As I write this, I actually have a hard time understanding how we could have gotten anything done at all. And unsurprisingly, even though we released mod tools and documentation, not a single user map for Penumbra was ever released.

For Amnesia we knew we wanted to fix this somehow. The first step we took was simply to make our own editor in which all maps are built. Since it rendered with the same engine as the game, it made it much easier and faster to tweak entity and light placement. We instantly saw productivity rise with this change. For scripting it was pretty much the same, but we added the extremely simple fix of compiling the script before loading the game. This removed some of the time previously spent, in vain, looking at the loading screen.

Although we had new tools, all was not good. You still had to reload all level data every time you made a change to the map or script. We did not think much of this though, as we were so used to doing it this way, and happy that we had all the other improvements. However, a year and a half into development we discussed whether we really needed to reload the level. I cannot recall what sparked this idea, but anyhow we figured that we did not, and I added a menu with a Quick Reload button. This cached all textures, models, etc. and reloaded the map very quickly (usually taking but a few seconds). It increased productivity and creativity tremendously and was one of the better decisions we made during the development of Amnesia. Another sign of how much these changes improved the workflow is the over one hundred user maps created as of today.

What is so strange about the reload feature is that it is something we could have added during the development of the first Penumbra, but for some reason we did not. It is quite frightening how often you convince yourself that there is no better way of doing a task, and never try to improve it. We did not want to make this mistake again and started thinking of what more we could do.


Taking script to the next level
In Amnesia and Penumbra, scripting is only used to control the logic flow in levels: how enemies spawn, how puzzles work and so on. All other gameplay is hard-coded into the exe file and written in C++. Normally when I write this kind of code, rendering for instance, I can do large chunks at a time and then simply see if it works as intended, often in small test projects that are fast to reload. However, when writing gameplay and UI code this is almost never the case. Instead you constantly need to fine-tune algorithms and variables until you get the expected behavior, and you work in large projects. Not only does this mean a level restart (with a full resource reload), but the exe itself also needs to be built from code, a process that can easily take half a minute, even if the changes are minor. This means that coding gameplay can be quite a hassle at times, on par with how map building was in the Penumbra days. With the lessons learned from Amnesia fresh in mind, this felt like the obvious area of improvement.

In order to make this happen we had to move as much gameplay code as possible into the scripting. What this meant was that we needed to make some large upgrades to our current script implementation. For example, up to now we only supported the most basic types (bool, int and float) together with strings in script. This already caused some issues when exposing game/engine functions and when writing scripts: for example, instead of having a single argument for a color, you had to pass four floats (one for each color channel and one for alpha), making code ugly and more cumbersome to write. So this upgrade alone was worth doing.
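To give an idea of what this involves, here is a minimal sketch of registering a POD color type with AngelScript, so a script function can take a single color argument instead of four floats. The type and function names are my own stand-ins, not the actual HPL3 API:

#include <angelscript.h>

struct cColor { float r, g, b, a; };

void SetLightColor(const cColor&) {} // stand-in for an engine function

void RegisterColor(asIScriptEngine* apEngine)
{
    // cColor is registered as a plain POD value type with its channels exposed.
    apEngine->RegisterObjectType("cColor", sizeof(cColor), asOBJ_VALUE | asOBJ_POD);
    apEngine->RegisterObjectProperty("cColor", "float r", asOFFSET(cColor, r));
    apEngine->RegisterObjectProperty("cColor", "float g", asOFFSET(cColor, g));
    apEngine->RegisterObjectProperty("cColor", "float b", asOFFSET(cColor, b));
    apEngine->RegisterObjectProperty("cColor", "float a", asOFFSET(cColor, a));

    // A script can now pass one cColor instead of four separate floats.
    apEngine->RegisterGlobalFunction("void SetLightColor(const cColor &in)",
        asFUNCTION(SetLightColor), asCALL_CDECL);
}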

We also needed to expose all engine classes, so that the script could be used pretty much as if it were normal C++ code, and achieve pretty much the same things. I was not sure exactly how much to expose, but knew that the more the better.

Finally, the most important feature was to be able to reload the script at any point, so it would be easy to just change a line, click a reload button and then, a few seconds (or less) later, see the change in-game. Getting this to work was by far the most important goal of the entire upgrade. It would not be as easy as the level scripts in Amnesia though, since the script system would not only take care of the code, but of part of the data as well. This meant I needed to save the state somehow, a feat I was not yet sure how to accomplish.


Implementing Classes
Before tackling the problem of script reload, I first had to make sure the engine's types were exposed to the script. And even before doing that, I needed to be sure our current scripting middleware was up to the task.

The scripting system that we use, and have been using for a long time, is a library called AngelScript. It is actually the middleware we have used the longest, ever since the end of 2004 and the Energetic project. Even though we have used it for such a long time, we never used it anywhere near its full extent, and now was finally the time to make up for that. I took a day to look through the documentation and found that AngelScript could support everything we needed, and what was even better, it was not that difficult to add.

AngelScript is a bit different from other popular script languages (like Lua) in that it is strongly typed and very closely connected to the underlying C++ code. For one thing this makes the script quite fast (although not faster than Lua, see the end of the post for more info). It also meant that AngelScript could link to the classes almost directly: I only had to declare each class and link to its different parts (whatever member variables and methods I wanted exposed). It also supports pointers quite easily, but makes them safer with a script-specific handle type. This requires you to set up some memory management for the data, but the script writer does not need to worry about it, and I could very easily and quickly add support for a vast number of engine resources (textures, meshes, lights, etc).
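As a rough illustration of how little glue this takes, here is a hedged sketch (the class and method names are assumptions) of exposing an engine class as a reference type that scripts access through handles:

#include <angelscript.h>

class iLight
{
public:
    void AddRef()  { ++mlRefCount; }
    void Release() { if (--mlRefCount == 0) delete this; }
    void SetRadius(float afRadius) { mfRadius = afRadius; }
private:
    int mlRefCount = 1;
    float mfRadius = 0.0f;
};

void RegisterLight(asIScriptEngine* apEngine)
{
    // asOBJ_REF: scripts only ever hold handles (iLight@) to the object, and
    // AngelScript calls AddRef/Release to manage its lifetime.
    apEngine->RegisterObjectType("iLight", 0, asOBJ_REF);
    apEngine->RegisterObjectBehaviour("iLight", asBEHAVE_ADDREF, "void f()",
        asMETHOD(iLight, AddRef), asCALL_THISCALL);
    apEngine->RegisterObjectBehaviour("iLight", asBEHAVE_RELEASE, "void f()",
        asMETHOD(iLight, Release), asCALL_THISCALL);
    apEngine->RegisterObjectMethod("iLight", "void SetRadius(float)",
        asMETHOD(iLight, SetRadius), asCALL_THISCALL);
}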

The only thing that was a little trickier was supporting inheritance (when a class builds upon another class). Basically you have to redeclare the methods every time you add a new class that inherits from something. You also need to specify which other classes the class can be cast to. This might sound like a bit of a hassle, but the result is that the script always has a very close mapping to the code it exposes, something I was extremely thankful for when implementing the state saving (more on that below). Also, through some use of macros, adding the implemented classes became quite easy.


Basic script layout
The next thing to figure out was the basic structure of the script code. In Penumbra and Amnesia we simply added functions directly to a script file and then let that script file represent an object. I first thought that this would be a valid design this time as well, until I started thinking about how to store the data (meaning any data that should be kept between executions of the script).

In Amnesia and Penumbra, the only data that is saved is simple variables, like an integer for the number of times the player has stepped on a button or similar. This is done using special functions for all these variables, for example:

SetLocalVarInt("ButtonPushNum", 1);


This function saves the data to the game, and when the script is reloaded (meaning destroyed and recreated) the script can easily reference the data again by doing:

int lX = GetLocalVarInt("ButtonPushNum");


However, this time the game had to save a lot more complex data: pointers to resources, matrices and whatnot. At first I actually considered using a system with functions like the above, but I figured that it would just mean a lot of extra work and make things more difficult. Again I thought about the lessons from Amnesia's reload feature, and decided it was worth trying to find a better solution. What I ended up doing was to force each object to be contained in a class, and then let all data be members of that class. This allowed AngelScript to make copies (needed for enemies and whatever else there will be more than one instance of) and also made it easy to keep track of the members (you simply create an object in the C++ code and can then iterate over all of its data).
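That iteration goes through AngelScript's C++ interface and looks roughly like this minimal sketch, where only the actual save-buffer writing (engine specific) is left out:

#include <angelscript.h>

void SaveScriptObject(asIScriptObject* apObj)
{
    // Each member of the script class can be reached by index, with its
    // declared name, type id and address.
    for (asUINT i = 0; i < apObj->GetPropertyCount(); ++i)
    {
        const char* pName   = apObj->GetPropertyName(i);
        int         lTypeId = apObj->GetPropertyTypeId(i);
        void*       pData   = apObj->GetAddressOfProperty(i);

        // Writing (pName, lTypeId, pData) to the save buffer is engine
        // specific and omitted here.
        (void)pName; (void)lTypeId; (void)pData;
    }
}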

A problem I now realized was that I needed C++ functions that only worked in certain types of classes and only on the data of a certain copy. For instance, an enemy class might want a function like IsInLineOfSight(...) to see if it has visual contact with something. However, there was no functionality in AngelScript for doing this. I could give the script class a template class that forces it to implement certain functions, but I could not let the C++ code act as a base and expose specific functions from it. To solve this I had to do some hacking. I ended up using global functions, and keeping track of the currently active object. (Again with macros to the rescue.) The resulting solution is not perfect though, as a global function can be used in any class and not just the one it was meant for. I am still looking into a fix for this.
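A minimal sketch of the hack, with all names hypothetical: the engine sets a "current object" pointer before calling into a script object, and the exposed global function operates on whatever that points to:

struct cVector3f { float x, y, z; };

class cEnemy
{
public:
    bool CheckLineOfSight(const cVector3f&) { return false; } // engine-side check
};

// The engine sets this right before invoking a method on a script enemy.
static cEnemy* gpCurrentEnemy = nullptr;

// Exposed to script as a global function. It implicitly operates on the
// "current" enemy, which is why it only makes sense inside the enemy class.
static bool IsInLineOfSight(const cVector3f& avPos)
{
    return gpCurrentEnemy && gpCurrentEnemy->CheckLineOfSight(avPos);
}

// Registered with something like:
// apEngine->RegisterGlobalFunction("bool IsInLineOfSight(const cVector3f &in)",
//     asFUNCTION(IsInLineOfSight), asCALL_CDECL);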


Saving the state
It was now time to save the state of the script. To start off I took the naive approach and simply saved the script data directly by copying it. This works very well for stuff like vectors, matrices, etc. where it is just a matter of making a copy of the data. But it does not work that well with resources like meshes and textures, or even objects like physics bodies, billboards and lights. There is way too much data in those to copy. And if I simply save the pointer, I need to make sure that no data has changed when the state is loaded again, or else the saved state will not work.

At the time I was only building the system to handle script reloads, so these issues did not pose a problem. But a major obstacle popped up when I wanted to save classes defined in script code. These were not possible to save by simply copying, because when you rewrite a script they can change entirely. For instance, if the class consisted of two integers when saved, but has one string and a matrix when reloaded, the data you saved is invalid. It gets even worse if a script class has another script class as a member, and even more so if script classes are saved in arrays.

I figured I needed to make some kind of drastic change if this was to work. I was sketching a few systems that could save variables in a separate structure when it hit me: the system I was working on could not only be used to save the script state; if I implemented it correctly, I could also use it for save games. For Amnesia and Penumbra I use a special serialization system that can save classes to file if they are initialized correctly. It is quite cumbersome, but way better than writing load/save code for every single variable. However, as I was to do most gameplay code through script (the code that pretty much contains everything that needs saving), I could actually get rid of rewriting the save code for every single gameplay update. Another huge benefit of using scripting: automatic saving. This is bound to save tons of work. (It did mean a lot more work up front though...)

Up to that point I had looked at the scripting as a separate part of the engine: a module that all other modules could work without if it was removed, exactly like how most other modules work. Thinking like this had colored a lot of the design decisions I had made. Now, since scripting will basically be what controls the engine, I instead started thinking of it as something that is part of all the other modules. This led me to do a major rewrite of the state saving.

There is quite a lot to say about this system (and serialization in general), but I do not really have the space right now, so I will just do a quick overview. First of all, the system defines a few basic types that all other structures are built up from. These are bool, int, float, vector, matrix, color, etc. and I implement very specific create/save/load methods for each of these. In AngelScript all of these are implemented as primitives (built-in types) or data objects (a type where AngelScript handles all memory management). When saving, the actual binary data is written.

All other classes are implemented as references, meaning AngelScript only manages their pointers. When saving the state of these classes, each class must implement a special method that adds all of its member variables (either basic types or other reference classes) to a list (basically the explicit method described here). This list is then used when transferring data to a special save buffer that contains the data of all the basic types and/or sub-buffers containing data for other classes.
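Here is a hedged sketch of the idea (all names hypothetical): each class lists its members as (name, offset) entries, and the serializer walks that list to move data in and out of the save buffer:

#include <cstddef>

class cSerializeMemberList
{
public:
    // Each call records a (name, offset) entry; implementations omitted.
    void AddFloat(const char*, size_t) {}
    void AddVector3(const char*, size_t) {}
    void AddClass(const char*, size_t) {}
};

class cLightState
{
public:
    void AddMembers(cSerializeMemberList& aList)
    {
        aList.AddFloat("Radius", offsetof(cLightState, mfRadius));
        aList.AddVector3("Position", offsetof(cLightState, mvPosition));
    }

    float mfRadius;
    float mvPosition[3]; // stand-in for the engine's vector type
};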

For certain classes, the pointer is backed up in the C++ code. So when the save data for such a class is loaded, the pointer to the class is first searched for, and if found it is updated with the saved member data. This makes it possible to reload the script code without having to recreate all the data.

Of course, all sorts of complications arose during this, and again it would take too long to go through them all. But as I hinted before, one thing that saved me is that script objects are so closely connected to the engine data. When writing code that deals with serialization, one of the biggest annoyances is that class pointers with multiple inheritance can change their address when cast. Because the script language only returns void pointers (memory addresses with no type specified), you must be sure of the type it returns, something AngelScript allowed me to be.

Another fun thing with serialization is all the strange macros that you have to make. For example, this one was pretty fun:

#define ClassMemberClassOffset(aClass, aMember) \
    ( size_t( static_cast<iSerializable*>( &( ((aClass*)1)->aMember ) ) ) - 1 )

This one is used to make sure that class member offsets point correctly. It is one of the many things that make sure the multiple inheritance problems explained above do not screw things up. I have an entire header file just filled with fun stuff like this!


Closing notes
Once I had the system working, it was quite an effort to add all the classes. There are countless classes that all need to be set up in a specific way, and everything to be saved needs to be initialized. Sitting several days in a row just coding stuff like this can really tear on the psyche. Fortunately, almost everything is added now, so the worst part should be over.

Another nice addition to the new script system is that it can auto-generate the API function file for Notepad++. This allows Notepad++ to autocomplete any text that you write, so you no longer need to keep function names or argument lists in your head (and can skip the manual look-ups). What is annoying though is that this only works with global functions: Notepad++ cannot recognize what class a certain variable is and show all its members (like Visual Assist can for C++). Does anybody know an editor that can manage this?

One thing that I was really eager to get working was threading. In Amnesia we use tons of special timer callbacks to set up events, and it would be so nice to do something like this instead:

...
PlaySound("x");
Sleep(10);
PlaySound("y");
WaitUntilTriggered("a");
...

AngelScript does support this sort of thing, but there is currently no way of saving the state. So if the player were to save during a sleep, the game would behave differently when loaded. This is unfortunate, but I am looking into some other solutions instead. If anyone has ideas on this, please share!

Right now I have only tried this system in some tests, but it already feels really good and works exactly as I want. It is just so great to be able to reload any code changes in a fraction of a second. It allows for rapid iteration, makes one more productive and generally just makes coding so much more enjoyable. It is also gratifying to have an idea and then successfully implement it to the sort of quality you hoped for.


Addendum:

I earlier wrote that AngelScript was faster than Lua, which is not the case when it comes to code solely executed in the script (see here). So obviously my initial statement was not correct. However, I still think AngelScript should be faster at calling exposed C++ functions/types, as there is pretty much no overhead involved. I do not have any data to back this up though, so do not take my word for it :)


Found this irresistibly interesting? We are hiring!

Tuesday, January 25, 2011

Physics and Heightmaps

Just when I thought that all problems with heightmaps were over, I stumbled upon something rather tricky. The only thing I had left for heightmaps was to add physics to them. This seemed easy to do, as it was basically just a matter of sending the raw heightmap data to the physics engine (Newton Game Dynamics). However, just as I had done this, I realized it was not enough: the terrain could have many different physical properties at different places (a spot with dirt, one with rock, etc).

The thing is that in physics simulations you give a material per shape, each material having certain properties (friction, etc) and special effects (sounds, etc). The heightmap counts as a single shape, and thus it only has a single physics material. This was something I had totally forgotten. Luckily, the physics engine supports assigning special properties to each point in the heightmap. Once I found the proper info, it was pretty simple to add (see here).

Now it was just a matter of adding extra material values to the heightmap (basically just an array of integers, the id of the physics material at each point). My initial idea was that this could be "painted" on as an extra step, and to be sure I asked Jens what he thought about it. His reaction was that this would be way too much work, and he wondered if it could not be auto-generated instead. We already had a physics material assigned to each render material, so the basic info was easily accessible.

However, when I started thinking about this, I found the actual auto-generation increasingly tricky. Which of the many blended materials should determine the physics properties (blending them was not possible)? Also, how do I get this information into a CPU buffer, considering that the blend textures can be saved as compressed textures?

The way we chose to solve the material picking was to let the top visible blend material (meaning the topmost one above a certain opacity limit) always set the physics material. This allows map creators to set the priority of materials in terms of physics, simply by placing them at different positions among the blend materials. For example, if a material like gravel should have its effects shown no matter what it is blended with, then it should be placed high up in the list. While currently untested, this seems pretty nice and can also be tweaked a bit (like having something else determine priority).
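In pseudo-C++ the picking could look like this sketch of the description above (the names and the exact threshold handling are assumptions):

// Walk the blend layers from the top down and pick the first one that is
// visible enough; layers are ordered bottom-to-top.
int PickPhysicsMaterial(const float* apBlendWeights, const int* apPhysicsMatIds,
                        int alNumLayers, float afMinOpacity)
{
    for (int i = alNumLayers - 1; i >= 0; --i)
        if (apBlendWeights[i] > afMinOpacity)
            return apPhysicsMatIds[i];
    return apPhysicsMatIds[0]; // nothing above the limit: fall back to base layer
}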

The generation of this data was done by rendering the blend textures to an off-screen target and then grabbing the data into normal memory. This meant that the GPU decompressed any packed textures for us. It also solved some other problems, like the need to resize the texture according to the heightmap resolution. Once the data is grabbed from the GPU, it is just a matter of looping through it, checking values and writing to the final buffer.

The problem was finally solved, and physics properties auto-generated!


With this little post I hope to show that there is often more to a problem than what is visible at first. It also shows another advantage of using normal texture splatting (more info here) instead of megatextures or similar: with the auto-generation of physics, it is much easier to create and update the terrain, something extremely important when you are a small team like ours.

It would be very interesting to hear what other techniques people use (or know of) for setting up physics properties on terrain!

Thursday, January 20, 2011

Tech Feature: Undergrowth

Introduction
After a little break with updates on the rendering system, holidays and super secret stuff, I finally got back to terrain rendering this week. This meant working on the final big part of the terrain system: undergrowth. This is basically grass and any kind of small vegetation close to the ground.

As always, I started out doing a ton of research on the subject, to at least have a chance of making proper decisions from the start. The problem with undergrowth/grass is that while I could find a lot of resources, most were quite specific, describing techniques that only work in special cases. This is quite common when doing technical stuff for games; while there is a lot of nice information, only a very small part is usable in an actual game. This is especially true when dealing with a larger system (like terrain) and not just some localized special effect. In these cases, reports from other developers are by far the best, and writing these blog posts is partly a way to pay back what I have learned from other people's work.

Now on with the tech stuff!


Plant Placement Data
The first problem I was faced with was how to define where the undergrowth should be. In all of the resources I found, some kind of density texture was used (meaning a 2D image where each pixel defines the amount of plants at that point). I did not like this idea very much though, mainly because I would be forced to either have lots of textures, one for each undergrowth type, or to not allow overlapping plants (meaning the same area of the map would not be able to contain two different types of undergrowth). There are ways past this (e.g. the Frostbite engine uses a sort of texture atlas), but then making it work inside the editor would be a pain, most likely demanding pre-processing and a special editor renderer. I had to do something else.

What I settled on was to use area primitives: simple geometrical shapes that define where the undergrowth should be. Each primitive defines a 2D area where plants are placed. It also contains variables such as density, allowing one to place thick grass in one place and a sparser patch elsewhere.

I ended up implementing circle and convex polygon primitives for this, which during tests seemed to work just fine.


Generating the Plants
The next problem was how to generate the actual geometry. My first idea was to simply draw the grass for each area, but there were some problems with this. One major issue was that it would not look good with overlapping areas: if two areas of the same density overlapped, the intersection would have twice the density of either area. This did not seem right to me. Also, it was problematic to get a nice distribution using only areas, and I was unsure how to save the data.

Once again, the report on the Frostbite engine gave me an idea on how to approach this. The way they did it was to fill a grid with probability values. For each grid point, a random number between 0 and 1 is generated and then compared to the one saved in the grid. If the generated number is lower than the saved one, a plant is generated at that point, otherwise not. Each plant is then offset by a random amount, creating a nice uniform but random distribution of plants!

This system fit perfectly with the undergrowth areas and simplified them too. Using this approach, an undergrowth area does not need to worry about generating the actual plants, but only about writing values to the grid.

The final version works like this:

There is an undergrowth material for each type of plant used on a level. This material specifies the max density of the plant and thus determines how the grid should look: a material with a high density will have a grid with many points, and one with a low density will have few grid points. Each point on the grid (not all of course, some culling is used) is checked against an area primitive and a value is calculated. This is then repeated for each area, adding the contributions from all areas that cover the same grid point.

This solves the problem of overlapping areas, as the density can never become larger than the max defined by the grid. It also makes it possible to have negative areas that reduce the amount of undergrowth in a certain place. This way, the two simple area primitives I have implemented can be used for just about any kind of undergrowth layout.
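Here is a minimal sketch of the placement step itself, under the scheme above (all names are my own): each grid point holds the accumulated density in [0,1], and a plant spawns there if a random roll falls below it, jittered inside its cell:

#include <cstdlib>
#include <vector>

struct cPlantPos { float x, z; };

static float RandFloat() { return (float)std::rand() / (float)RAND_MAX; }

void GeneratePlants(const float* apDensityGrid, int alGridW, int alGridH,
                    float afGridSpacing, std::vector<cPlantPos>& avOutPlants)
{
    for (int z = 0; z < alGridH; ++z)
    for (int x = 0; x < alGridW; ++x)
    {
        // A plant is spawned only if the roll is below the stored density.
        if (RandFloat() >= apDensityGrid[z * alGridW + x]) continue;

        // Jitter the plant inside its cell for a uniform-but-random spread.
        cPlantPos pos = { (x + RandFloat()) * afGridSpacing,
                          (z + RandFloat()) * afGridSpacing };
        avOutPlants.push_back(pos);
    }
}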


Cache system
Now it is time to discuss how the actual plants are generated. One way to do this would be to generate the geometry for the entire map, but that would take up way too much memory and be quite slow. Instead, I use a cache system that only generates grass close to the camera (this is also how Frostbite does it).

The engine divides the entire terrain into a grid of quads, and then generates cache data for each quad that is close enough to the camera. For each quad, it checks what areas intersect with it, and layers are made for each undergrowth material it contains. Then, for each layer, plants are generated based on the method described above. The undergrowth material also contains texture and model data, as well as a bunch of other properties. For example, the size can be randomized and different parts of the texture used, all to add some variety to the patch of undergrowth. Finally, each plant is offset in height according to the heightmap.

This cache generation took quite some work to get good enough. I had problems with the game stuttering as you traveled through a level, and had to do various tricks to make it faster. I also made sure that no more than one patch is generated per frame (unless the camera is teleported or similar).


Rendering
Once the cache system was in place, rendering the plants was not that much of a problem. Each generated patch comes with the grass in world coordinates, so it is as simple as it can get. The only fancy stuff happening is that grass in the distance is dissolved. This means that the grass does not end with a sharp border, but smoothly fades out.


In the above image you can see how the grass dissolves in the distance. Here it looks pretty crappy, but with proper art the grass and ground texture are meant to match, making the transition pretty much unnoticeable.

Another thing worth mentioning is that the normal of each grass model is the same as the ground's. This gives a nice look to many plants, but an individual plant gets quite crappy shading. Undergrowth is meant to be small and not seen close up though, so I think this should work out fine. Also, when making grass earlier (during the development of Amnesia), normal normals (ha...) were used, and the result was quite bad (sharp shading, etc).


Animation
Static grass is boring, so of course some kind of animation is needed. What I wanted was two different kinds of animation: a global wind animation (unique for each material) and local animation due to events in a limited area (someone walking through grass, wind from a helicopter, etc).

My first idea was to do all of this on the CPU, meaning that I would need to resend all the geometry to the graphics card each frame. This would allow me to use all kinds of fanciness for the animation (like my dear noise and fractals) and would easily allow for lots of local disturbances.

However, I did some thinking and decided that this would be a bad idea. Not only does sending the data to the graphics card take time, but some pretty heavy calculations (like rotating normals) might be needed for a lot of plants, so the CPU burden would be very heavy. Instead I chose to do everything on the GPU.

Implementing the global wind animation was quite simple; it was just a matter of sending a few new variables to the grass shader. But it was a bit harder to come up with the actual algorithm. Perhaps I did not look hard enough, but I could find very little help in this area, so I had to do a lot of experimenting instead. The idea was to get something that was fast (i.e. no stuff like Perlin noise allowed) and yet had a natural, random feel to it. What I ended up with was this:

vec3 add_x = vec3(7.0, 3.0, 1.0) * VertexPos.z * wind_freq + vec3(13.0, 17.0, 103.0);
offset.x = dot( vec3(sin(fT*1.13 + add_x.x), sin(fT*1.17 + add_x.y), sin(fT + add_x.z)), vec3(0.125, 0.25, 1.0) );

vec3 add_y = vec3(7.0, 3.0, 1.0) * (VertexPos.x + vOffset.x) * wind_freq + vec3(103.0, 13.0, 113.0);
offset.y = dot( vec3(sin(fT*1.13 + add_y.x), sin(fT*1.17 + add_y.y), sin(fT + add_y.z)), vec3(0.125, 0.25, 1.0) );


This is basically a couple of fractally nested sine curves (fBm, basically) that take the current vertex position as input. The important thing to note is the prime numbers, such as in vec3(7.0, 3.0, 1.0); without these, the cycles of the sine curves overlap and the end result is a very cyclic, boring and unnatural look.

The generated offset is then applied differently depending on the height of the plant: there is a lot of swaying at the top and none at the bottom. To do this, the base y-coordinate of the plant is saved in a secondary texture coordinate and then looked up in the shader.

Now finally, the local animation. For this, entities called ForceFields are used. (Thanks Luis for the name suggestion! It made the boring parts so much more fun to make.) These are entities that come with a radius and a force value, and are meant to create effects on the graphics they touch. Right now only grass is affected, but later on effects on ropes, cloth, larger plants, etc. are meant to be added.

The effects of these are applied in the shader, and currently I support a maximum of four ForceFields per cache patch. In the shader I either handle none, a single entity or four at once. This means that if three entities affect a patch, I still compute the outcome of four, but fill the last one's data with dummy (null) values. Using four is actually almost as fast as using a single one: because of how GPUs work, I can do the work for all four entities in about the same number of instructions as for a single one. This greatly cuts down the amount of work needed.

Again, just like with the global wind, it was hard work coming up with a good algorithm. My first idea was to simply push each plant away from the center of the force field, but this looked really crappy. I then tried to add some randomness and animation to make it nicer. As inspiration, I looked at Titan Quest, which has a very nice effect when you walk through grass. After almost a day's work, the final algorithm looks something like this:

fForce = 1 - distance(vtx_pos, force_field_pos)/force_field_radius;

fAngle = T + rand_seed*6.28;
fForce *= sin(force_field_t + fAngle);

vDir = vec2(sin(fAngle), cos(fAngle));
vOffset.xz += vDir * fForce;


rand_seed is a variable that is saved in the secondary texture coordinate and generated for each plant. This helps give the animation a more random and natural feel.

Here is how it all looks in action:


Note: Make sure to watch it in HD!

And in case you are wondering: all of this is ugly, made-in-10-seconds graphics.


End notes
Now that undergrowth is finally done, all the basic terrain features are implemented! In case you have missed the earlier posts, here is a summary:

Terrain Geometry
Fractals and Noise
Terrain Texturing

I hope this has been of use and/or interest to somebody! :)

Next up for me is some final terrain stuff (basically just some clean-up) and then I will move on to more gameplay related stuff. More on that later though...

Thursday, December 2, 2010

Tech feature: Light Masking

So, just wanted to give some quick info on a brand new feature: light box masks.

When placing lights in rooms, it is common that light bleeds through walls and shows up in adjacent rooms. The obvious way to fix this is to add shadows, but shadows can be pretty expensive (especially for point lights), so they are often not a viable solution. In Amnesia we solved this through careful placement, yet bleeding can still be seen in some places.

To fix this I added a new feature that is able to limit a light's range with a box. This way the light can cast light as normal, but is cut off before reaching an adjacent area. This pretty much does the job of shadows, but is much cheaper.

It turned out to be pretty simple to implement as well. In the renderer, different geometrical shapes are used to render the lights (spheres for point lights and pyramids for spotlights), which makes sure a light only affects the needed pixels. To implement the masking, these shapes were simply exchanged for a box, and with some small shader changes it all worked.
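The shader change essentially boils down to a test like this hedged sketch (names assumed): reconstruct the pixel's world position and let the light contribute only inside the box:

struct cVec3 { float x, y, z; };

// Returns true if the pixel's world position lies inside the mask box;
// outside of it the light contribution is simply dropped.
bool InsideLightBox(const cVec3& avWorldPos,
                    const cVec3& avBoxMin, const cVec3& avBoxMax)
{
    return avWorldPos.x >= avBoxMin.x && avWorldPos.x <= avBoxMax.x &&
           avWorldPos.y >= avBoxMin.y && avWorldPos.y <= avBoxMax.y &&
           avWorldPos.z >= avBoxMin.z && avWorldPos.z <= avBoxMax.z;
}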

Without masking:

With mask:

Wednesday, December 1, 2010

Bye, bye Pre-Pass lighting

I have an announcement to make.

I am dumping pre-pass lighting.

A couple of weeks ago I started remaking the renderer from a deferred shader into a pre-pass lighting one. Directly after implementing it, I wrote this post. At first, pre-pass lighting sounded great: faster light rendering and more variation in materials. Having seen that companies such as Crytek and Insomniac Games used it, I thought it would be the next logical step to take.

However, even as I implemented it, the problems began. The first one was that specular lighting has to be done through hacks, or through something that brings it closer to deferred lighting again. The next was that the implementation became messier. I suddenly needed to draw all objects in two separate passes, and this made the material and shader code harder to maintain. Normal deferred shading has this nice design where all material info is rendered in one pass to one buffer. In pre-pass lighting this is spread out, making it more annoying to add new stuff and to update existing code.

Still, I stuck with it, because I was sure that the speed and material variety would make up for it. One of the features I was looking forward to was making more interesting decals, with normals and such. Since only the light data is written to an accumulation buffer, I thought this would allow me to easily add more effects to the decals. However, I quickly realized that I had been quite foolish and had not considered that pretty much every interesting part of a material is added when lighting it. The surface normals, specular, etc. are all baked into the light data. So I ended up doing tricks that would have worked just as well with normal deferred shading.

So what I ended up with was lighting of worse quality compared to deferred shading, and with no more room for special effects. Still, this rendering is much faster, right? Well, I did some checks, which I collected in this post. It turns out that pre-pass is actually slower except in very specific situations. None of the improvements I was hoping for turned out to be true.

Still, I stuck with it. I am not sure why, but I guess I did not want to face the truth after having put so much time and effort into it. Going back to the old renderer was something I did not want to consider.

Then last week, as I was starting on the undergrowth for the terrain, it suddenly happened. I realized that I had to render the vegetation twice, creating more overdraw and making it a lot more cumbersome to implement. At this point I decided that I should seriously consider going back to the old deferred renderer. What I was most worried about was that it would exclude us from consoles, but I found out that games like Burnout Paradise use a deferred shader too, assuring me that consoles would still be possible.

This post by Adrian Stone, with an in-depth discussion of the subject, sealed the deal for me, and I got to work going back to deferred shading. I had actually come across Adrian's post before, when implementing pre-pass lighting, but never read it carefully. I guess it would not have made me stop then, since I wanted to check it out myself, but it is interesting to see how one can convince oneself that something is correct, to the point of avoiding contradictory sources. This is a very important lesson to learn, and one should always be prepared to reconsider and "kill your darlings".

Right now I have fully implemented the deferred shader again and even updated it a bit. For one thing, I made the decals support all the features I had in the pre-pass lighting shader. Since we are aiming a little higher in specs (shader model 3 or 4) for our next game, I took that into account and was able to add some other fun stuff. Examples are colored specular and saving the emission in the G-buffer (allowing a variety of effects to be done cheaply).

I am really happy to be back with the old renderer, and now that I am adding new features things are going a lot smoother. The pre-pass renderer was not all in vain though: I cleaned up the rendering code a lot, and it also made me rethink how some features could be added. Last but not least, it reminded me that I should never get too attached to an idea.

Wednesday, November 24, 2010

Tech Feature: Terrain textures

I have finally finished the part of the terrain rendering that I spent the most time researching and thinking about: texturing. This is a quite big problem with many methods available, each having its own pros and cons.

I was looking for something that gave a lot of freedom to the artists, that was fast, and that allowed the same algorithm to be used in both game and editor. The last point was especially important since we had much success with our WYSIWYG editor for Amnesia, and we did not want terrain to break this by requiring some complicated creation process.

Even once I started working on the textures, I was unsure of the exact approach to take. I had at least decided to use some form of texture splatting as the base. However, there are a lot of ways to go about this, the two major directions being to either do it all in real-time or to render to cache textures in some manner.

Before doing any proper work on the texturing algorithm, I wanted to see how the texturing looked on some test terrain. In the image below I simply project a tiling texture along the y-axis.


Although I had checked other games, I was not sure how good the y-axis projection would look. What I was worried about was that there would be a lot of stretching on slopes. It turned out not to be that bad though, and the worst case looks something like this:

While visible, it was not as bad as I first thought it would be. Seeing this made me more confident that I could project along the y-axis for all textures, something that allowed for the cached texture approach. If I did all blending in real-time I would have been able to use a special uv-mapping for slopes, but now that y-axis projection worked, this was no longer essential. However, before I could start testing texture caching, I needed to implement the blending.

The plain-vanilla way to do this is to have an alpha texture for each texture layer and then draw one texture layer after another. Instead of having many render passes, I wanted to do as much blending as possible in a single draw call. By using an RGBA texture for the alpha, I could do a maximum of 4 layers at the same time. I first considered this, but then I saw a paper by Martin Mittring from Crytek called "Advanced virtual texture topics" where an interesting approach was suggested: by using an RGB texture, up to 8 textures can be blended, by letting each corner of an RGB cube represent a texture. A problem with this approach is that each texture can only be nicely blended with 3 other corners (textures), restricting artists a bit. See below how the texture layers are connected (a quick sketch by me):
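One plausible way to compute the 8 corner weights is plain trilinear weighting of the control color. This is a sketch of my reading of the technique, not necessarily the exact scheme in the paper:

// Corner alCorner of the RGB cube has coordinates in {0,1}^3; trilinear
// weighting of the control color (r,g,b) gives its blend weight. The
// weights of all 8 corners always sum to 1.
float CornerWeight(float r, float g, float b, int alCorner)
{
    float fX = (alCorner & 1) ? r : 1.0f - r;
    float fY = (alCorner & 2) ? g : 1.0f - g;
    float fZ = (alCorner & 4) ? b : 1.0f - b;
    return fX * fY * fZ;
}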

Side note: Yes, it would be possible to use an RGBA texture with this technique and let the corners of a hypercube represent the textures. This would allow each texture to have 4 textures it could blend with, and a maximum of 16 texture layers. However, it would make life quite hard for artists, having to think in 4D...

When implemented it looks like this (note the RGB texture in the upper right corner):


However, I ran into a few problems with this approach that I first thought were graphics card problems, but that later turned out to be my fault. During this I switched to using several layers of RGBA textures instead, blending 4 textures per pass. When I discovered that it was my own error (doh!), I had already decided on using cache textures (more on that in a jiffy), which puts less focus on the render speed of the blending. Also, this approach seemed nicer for artists. So I decided on a pretty much plain-vanilla approach, meaning some work was in vain, but perhaps I can find a use for it later on instead.

Now for the texture caching. This method basically works like the megatexture method used in Quake Wars and others. But instead of loading pieces of a gigantic texture at run-time, pieces of the gigantic texture are generated at run-time. To do this I have several render textures in memory that are updated with content depending on what is in view. Also, depending on the geometry LOD used, I vary the texture resolution rendered to and make it cover a larger area. So terrain close to the viewer uses large textures, and terrain far away much smaller ones.

I first thought I had to do some special fading between the levels and was a bit concerned about how to do this. However, it turned out that this was taken care of quite nicely by the trilinear texture filtering (especially when generating mipmaps for each rendered texture). When implemented, the algorithm proved very fast, as the textures do not have to be updated very often, and I got very high levels of detail in the terrain.

Side note: The algorithm is actually used in Halo Wars and is mentioned in a nice lecture that you can see here. Seeing this also made me confident that it was a viable approach.

The algorithm was not without problems though; for example, the filtering between patches (different texture caches) created seams, as can be seen below:

(click to enlarge, else it will not be seen)

The way I fixed this was simply to let each texture have a border that mimics all of the surrounding textures. While the idea was simple, it was actually non-trivial to implement. For example, I started out with a 1 pixel border, but had to use an 8 pixel border for the largest 1024x1024 textures to be able to shrink it. Anyhow, I did get it working, making it look like this:

(Again, click image to see full size!)

Next up was improving the blending. The normal blending for texture splatting can be quite boring, so instead of just using a linear blend I wanted to spice it up a bit. I found a very nice technique for this on Max McGuire's blog, which you can see here. Basically, each material gets an alpha channel that determines how fast each part of it fades. The algorithm I ended up with was a bit different from the one outlined in Max's blog and looks like this:

final_alpha = clamp( (dissolve_alpha - (1.0 - blend_alpha)) / (dissolve_alpha * (1.0 - fade_start)), 0.0, 1.0);

Here final_alpha is used to blend the color of a texture, and fade_start determines at which alpha value the fade starts (this allows the texture to disappear piece by piece). blend_alpha comes from the blend texture, and dissolve_alpha is stored in the texture itself, telling when each part of the texture fades out.

So instead of blending that looks like this:


It can look like this:


The next step was to allow not just diffuse textures, but also normal maps and specular. This was done by simply rendering to more render targets, so that each type has a separate cache texture. This would not have been possible if I had blended in real-time, as I would have hit the limit of 16 texture units quite fast. But now I render them separately, and when rendering the final real-time texture I only need one texture for each type (taken from the cache textures). Here is how all this looks combined:

You can see a small version of each cache texture at the top.

Now for a final thing. Since the cache textures are not rendered very often, I can do quite a lot of heavy stuff in them. And one thing I was sure we needed was decals. What I did was simply to render a lot of quads to the textures, which are blended with the existing texture. This can be used to add all sorts of extra detail to a map, and it requires almost no extra power. Here is an example:


I am pretty happy with these features for now, although there is some stuff left to add. One thing I need to do is some kind of real-time conversion to DXT textures for the caches. This would save quite a lot of memory (4 - 8 times less would be used by the terrain) and also speed up rendering. Another thing I want to investigate is adding shadows, SSAO and other effects when rendering each cache texture. Added to this is some bad visual popping when levels are changed (this only happens when zooming out at a steep angle though) that I probably need to fix later on.

Now my next task will be to add generated undergrowth! So expect to see some swaying grass in the next tech feature!

Monday, November 8, 2010

Tech Feature: Noise and Fractals

Introduction
Now that I have a working algorithm for terrain rendering, I wanted to try generating some of it procedurally. This would not be used to generate levels, but instead to help artists add some extra detail, and perhaps for some effects. The natural world is a very noisy and fractal place, so in order to get a nice looking environment, these two features are crucial.

Noise
When doing noise for natural phenomena, one normally wants some kind of coherent noise. Normal white noise, where nearby pixels are not correlated in any way, looks like this:

This is no good when one wants to generate terrain and the like. Instead the noise should have a smoother feel to it. To achieve this, one fades between different random values, creating smooth gradients. A way to do this is to generate a pseudo-random number for whole number points (pseudo because a certain coordinate will always return the same random value), and then let the fractional parts between these be interpolations. For example, consider the 1D point 5.5. To get the value for this coordinate, the pseudo-random values for 5 and 6 are fetched. Let's say they are 10 and 15. These are then interpolated, and since 5.5 lies right between them, it is given the value 12.5 ( (10+15)/2 ). This technique is actually very similar to image magnification, where the whole numbers represent the original pixels.
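As a sketch, 1D value noise with linear interpolation could look like this (the hash is a commonly used integer hash, not necessarily the one I use):

#include <cmath>
#include <cstdint>

// Hash a whole-number coordinate to a pseudo-random float in [0,1):
// the same input always yields the same output.
static float PseudoRand(int32_t alX)
{
    uint32_t x = (uint32_t)alX;
    x = (x << 13) ^ x;
    uint32_t n = x * (x * x * 15731u + 789221u) + 1376312589u;
    return (float)(n & 0x7fffffffu) / 2147483648.0f;
}

float ValueNoise1D(float afX)
{
    int32_t lX0 = (int32_t)std::floor(afX);
    float fT = afX - (float)lX0;
    // Linear interpolation between the two surrounding whole-number values.
    return PseudoRand(lX0) * (1.0f - fT) + PseudoRand(lX0 + 1) * fT;
}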

Generating numbers this way, you get something like this:


This looks okay, but the interpolation is not very smooth and looks quite ugly. This can be fixed by using a better kind of interpolation. One way to do this is cosine interpolation (blending with t' = (1 - cos(t*PI)) / 2 instead of t), which smoothens the transition a bit.

This looks a lot better, but the height map image still looks a bit angular, and not that smooth. However, we can smooth it even further by using cubic interpolation. This ties nicely into the image magnification analogy I made earlier, as cubic is a common type of filter for that. It works by not only taking into account the two points to blend between, but the points next to them as well. In our example above, this would be the points 4 and 7 (which are next to 5 and 6). It looks like this:


This gives a much smoother appearance, but it (as well as the other algorithms above) has another problem. Because the height values for each whole pixel are completely random, it gives a very chaotic impression. Many times one wants a more uniform look instead. To fix this, something called Perlin noise is used. What makes this algorithm extra nice is that it is based on gradients instead of absolute values for each pixel. Each whole pixel is assumed to have the value 0, and a gradient then determines how the value changes between it and a neighboring pixel. This gives it a much more uniform look:


Because it is based on gradients, it is also possible to take the derivative of it, which can be used to generate normal maps (something I am not using though). It is also quite fast, pretty much identical to the cosine interpolation. The cubic interpolation, which requires more random samples, is almost twice as slow.


Fractals
Now that a coherent noise function is implemented, it can be used to generate some terrain. The screens above do not look that realistic though, and to improve the look something called fractal Brownian motion can be used. This is a really simple technique and works, like all fractals, by iterating an algorithm over and over. What is iterated is the noise function, starting off with a large distance between the whole-number inputs (low frequency) and then using smaller and smaller distances (higher frequency) for each iteration. The higher the frequency, the smaller the influence, resulting in the low frequency noise creating the large scale features and the high frequency noise creating the details.
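A minimal sketch of the iteration (the noise function is passed in, as any coherent 2D noise works; parameter names are my own):

float Fbm2D(float (*apNoiseFunc)(float, float), float afX, float afY,
            int alOctaves, float afLacunarity, float afGain)
{
    float fSum = 0.0f, fAmp = 1.0f, fFreq = 1.0f;
    for (int i = 0; i < alOctaves; ++i)
    {
        // Low frequencies give the large-scale features, high ones the detail.
        fSum  += fAmp * apNoiseFunc(afX * fFreq, afY * fFreq);
        fFreq *= afLacunarity; // typically 2.0
        fAmp  *= afGain;       // typically 0.5
    }
    return fSum;
}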

The result of doing so can produce something like this:


Suddenly we get something that looks a lot more like real terrain!

There is lots of stuff that can be done with this, and often a very simple alteration can lead to interesting results. Here is some iterated fractal noise that has been combined with a sine function afterwards:


End notes
There is a lot more fun stuff that can be done using noise, and I have just scratched the surface with this. It is a really versatile method with tons of uses for graphics. The problem is that it can be quite slow though, and my implementation will not be used for any real-time effects. However, Perlin noise can be simulated on the GPU, allowing real-time usage, and this is something I might look into later.

Next up is the hardest part of the terrain rendering - texturing! I am actually still not sure how to do it, but I have tons of ideas. One can never get enough info though, so if anybody knows any good papers on terrain texturing, please share!

Thursday, November 4, 2010

Tech Feature: Terrain geometry

Introduction
The past two weeks I have been working on terrain, and for two months or so before that I have (at irregular intervals) been researching and planning this work. Now, finally, the geometry-generation part of the terrain code is as good as complete.

The first thing I had to decide was what kind of technique to use. There are tons of ways to deal with terrain and a lot of papers/literature on it. I have some ideas on what the super secret project will need in terms of terrain, but still wanted to keep it as open as possible, so that the tech I make now does not become unusable later on. Because of this I needed something that felt customizable and scalable, able to fit the needs that might arise in the future.

Generating vertices
What I decided on was an updated version of geomipmapping. My main resources were the original paper from 2000 (found here) and the terrain paper for the Frostbite Engine that powers Battlefield: Bad Company (see presentation here). Basically, the approach works by having a heightmap of the terrain and then generating all geometry on the GPU. This limits the game to Shader Model 3 cards (for NVIDIA at least; ATI only supports it on Shader Model 4 cards in OpenGL), as the heightmap texture needs to be accessed in the vertex shader. This means fewer cards will be able to play the game, but since we will not release for another 2 years or so, that should not be much of a problem. Also, it would be possible to add a version that precomputes the geometry if it was really needed.

The good thing about doing geomipmapping on the GPU is that it is very easy to vary the amount of detail used, and it saves a lot of memory (the heightmap takes about 1/10 of what the vertex data does). Before I go into the geomipmapping algorithm, I will first discuss how to generate the actual data. Basically, what you do is render one or several vertex grids that read from the heightmap and offset the y-coordinate of each vertex. The normal is also generated, by taking four height samples around the current heightmap texel. Here is what it looks like in the G-buffer when normal and depth are generated from a heightmap (which is also included in the image):


Since I spent some time figuring out the normal generation algorithm, here is some explanation of it. The basic algorithm is as follows:

h0 = height(x+1, z);
h1 = height(x-1, z);
h2 = height(x, z+1);
h3 = height(x, z-1);
normal = normalize(h1-h0, 2 * height_texel_ratio, h3-h2);


What happens here is that the slope is calculated along the x-axis and then the z-axis. Slope is defined as:
dx = (h1-h0) / (x1-x0)
or put in words, the difference in height divided by the distance. But since the distance is always 2 units for both the x and z slopes, we can skip this division and simply go with the difference in height. Now for the y-part, which we want to be 1 when both slopes are 0 and then gradually lower as the slopes get steeper. For this algorithm we set it to 2 though, since we want to get rid of the division by 2 (which means multiplying all axes by 2). But a problem remains: the actual height value is not always in the same units as the heightmap texel spacing. To fix this, we need to add a multiplier to the y-axis, which is calculated like this:

height_texel_ratio = max_height / unit_size


I save the heightmap in a normalized form, which means all values are between 0 and 1, and max_height is what each value is multiplied by when calculating the vertex y-value. The unit_size variable is what a texel represents in world space. As a made-up example: with a max_height of 40 world units and texels spaced 2 units apart, the ratio becomes 20.

This algorithm is not that exact, as it does not take the diagonal slopes into account. It works pretty well though, and gives nice results. Here is how it looks when it is shaded:


Note that there are some bumpy surfaces at the base of the hills. This is because of precision issues in the heightmap I was using (only 8 bits in the first tests) and is something I will get back to.


Geomipmapping
The basic algorithm is pretty simple: the farther away a part of the terrain is from the camera, the fewer vertices are used to render it. This works by having a single grid mesh, called a patch, that is drawn many times, each time representing a different part of the terrain. When a terrain patch is near the camera, there is a 1:1 vertex-to-texel coverage ratio, meaning that the grid covers a small part of the terrain at the highest possible resolution. Then, as patches get farther away, the ratio gets smaller and the same grid covers a greater area with the same number of vertices. So for really far away parts of the environment the ratio might be something like 1:128. The idea is that because that part is so far off, the loss of detail is not visible anyway. Each ratio can be called a LOD level.

The way this works internally is that a quadtree represents the different LOD levels. The engine traverses this tree, and if a node is found to be beyond a certain distance from the camera it is picked. The lowest-level nodes, with the 1:1 vertex-to-texel ratio, are always picked if no parent node meets the distance requirement. In this fashion the world is built up each frame.
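Expressed as a rough C++ sketch (the Node layout and names here are made up for illustration, not the engine's actual structures), the traversal could look like this:

#include <cmath>
#include <vector>

struct Node
{
    float center_x, center_z;  // node center in world space
    float lod_distance;        // distance from which this node's detail is enough
    Node* children[4];         // null for leaf nodes (the 1:1 patches)
};

// Collect the patches to render this frame.
void SelectPatches(Node* node, float cam_x, float cam_z, std::vector<Node*>& out)
{
    float dx = node->center_x - cam_x;
    float dz = node->center_z - cam_z;
    float dist = std::sqrt(dx * dx + dz * dz);

    // Far enough away, or a leaf: this node's resolution will do.
    if (dist >= node->lod_distance || node->children[0] == nullptr)
    {
        out.push_back(node);
        return;
    }

    // Otherwise recurse into the children for higher detail.
    for (Node* child : node->children)
        SelectPatches(child, cam_x, cam_z, out);
}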

The problem now is to determine the distance from which a certain LOD level is usable. The original paper has some equations for this, based on how much the height details change, but I skipped such calculations and simply let the distances be user-set instead. This is how it looks in action:

White (grey) areas represent a 1:1 ratio, red 1:2 and green 1:4. Now a problem emerges when using grids of different levels next to one another: you get t-junctions where the grids meet (because where the 1:1 patch has two grid quads, the 1:2 patch has only one), resulting in visible seams. To fix this, there need to be special grid pieces at the intersections that create a better transition. The pieces look like this (for a 4x4 grid patch):

While there are 16 border permutations in total, only 9 are needed because of how the patches are generated from the quadtree. The same vertex buffer is used for all of these types of patches, and only the index buffer is changed, saving some storage and speeding up rendering a bit (no switch of vertex buffer needed).
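In OpenGL terms this setup boils down to something like the sketch below (the buffer and patch variables are made up; only the GL calls themselves are real API):

// Assumes an OpenGL context; one vertex buffer is shared by all patches,
// and only the index buffer changes with the border permutation.
glBindBuffer(GL_ARRAY_BUFFER, patch_vertex_buffer); // bound once
for (const Patch& patch : visible_patches)
{
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, index_buffers[patch.border_type]);
    SetPatchUniforms(patch); // hypothetical: position/scale of this patch
    glDrawElements(GL_TRIANGLES, index_counts[patch.border_type],
                   GL_UNSIGNED_SHORT, 0);
}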

The catch is that there can be a maximum level difference of 1 between neighbouring patches. To guarantee this, the pick distance I talked about earlier needs to take it into account. The distance for a level is calculated by taking the minimum distance of the previous level (0 for the most detailed level) and adding the diagonal of the previous level's AABB (where the box height is the terrain's max height).
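As a small sketch (again with illustrative names, and assuming, hypothetically, that patches double in world size per level):

#include <cmath>
#include <vector>

std::vector<float> ComputeLodDistances(int num_levels, float base_patch_size,
                                       float max_height)
{
    std::vector<float> min_distance(num_levels);
    min_distance[0] = 0.0f; // the 1:1 level is used from distance 0 and up

    for (int level = 1; level < num_levels; ++level)
    {
        // World-space side of a patch at the previous level.
        float side = base_patch_size * float(1 << (level - 1));
        // Diagonal of that level's AABB, with height set to max terrain height.
        float diagonal = std::sqrt(side * side + side * side +
                                   max_height * max_height);
        min_distance[level] = min_distance[level - 1] + diagonal;
    }
    return min_distance;
}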


Improving precision
As mentioned before, I used an 8-bit texture for height in the early tests. This gives pretty lousy precision, so I needed to generate one with a higher bit depth. Also, older cards can only access 32-bit float textures in the vertex shader, so having this was crucial in several ways. To get hold of such a texture I used the demo version of GeoControl and generated a 32-bit heightmap in a raw, uncompressed format. Loading that into the code I already had gave me this pretty picture:

To test how the algorithm worked with larger draw distances, I scaled up the terrain to cover 1x1 km and added some fog:

The sky texture is not very fitting, but I think this shows that the algorithm works quite well. Also note that I did no tweaking of the LOD-level distances or patch size, so it changes LOD level as soon as possible and probably renders more polygons than needed because of the patch size.

Next up I tried to pack the heightmap a bit since I did not want it to take up too much disk space. Instead of writing some kind of custom algorithm, I went the easy route and packed the height data in the same manner as I do with depth in the renderer's G-buffer. The formula for this is:

r = height * 256
g = fraction(r) * 256
b = fraction(g) * 256


This packs the normalized height value into three 8-bit color channels. This 24 bits of data gives pretty much all the accuracy needed, and for further disk compression I also saved it as PNG (which is lossless). It makes the heightmap data 50% smaller on disk, and it looks the same in-game when unpacked:

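In C++ the packing and corresponding unpacking could look like this (a minimal sketch assuming heights in [0, 1); not the engine's actual code):

#include <cmath>
#include <cstdint>

// Pack a normalized height into three 8-bit channels; each channel keeps
// the integer part and passes the fraction on to the next.
void PackHeight(float height, uint8_t& r, uint8_t& g, uint8_t& b)
{
    float rf = height * 256.0f;
    float gf = (rf - std::floor(rf)) * 256.0f; // fraction(r) * 256
    float bf = (gf - std::floor(gf)) * 256.0f; // fraction(g) * 256
    r = (uint8_t)rf;
    g = (uint8_t)gf;
    b = (uint8_t)bf;
}

// Reverse the packing: each channel adds 8 more bits of precision.
float UnpackHeight(uint8_t r, uint8_t g, uint8_t b)
{
    return r / 256.0f + g / 65536.0f + b / 16777216.0f;
}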
I also tried packing it as 16 bits, only using the R and B channels, which also looked fine. However, when I tried saving the 24-bit packed data as a JPEG (which uses lossy compression), the result was less than nice:


Final thoughts
There are a few bits left to fix in the geometry. For example, there is some popping when changing LOD levels, and this might be lessened by using a gradual change instead. I first want to see how it looks in game before getting into that, though. Some pre-processing could also be used to mark patches of terrain that never need the highest-detail LOD, and so on. Using hardware tessellation would also be interesting to try out, as it should help make surfaces much smoother up close.

These are things I will try later on though, as right now the focus is to get all the basics working. Next up will be some procedural content generation using Perlin noise and that kind of stuff!

And finally, I will leave you with a screen containing terrain, water and SSAO:

Friday, October 22, 2010

Pre-pass lighting redux

Introduction
After writing the previous post on pre-pass lighting I started doing some tests to see how it compares to the old deferred renderer. The results I got were pretty interesting, so I thought I might as well share them. Also note that this post might be a bit more technical than the previous one.

The good thing with these renderers is that they both share the same basic material data, so I can use the same data for both HPL2 and HPL3. HPL3 comes with a few more features for decals, but for these tests it was easy to just skip them. When setting up the test I went with a very simple scene: the same box model rendered several times, a floor and some lights. Sometimes it is best to test with proper game scenes, but I wanted something that could be easily tweaked and gave simpler output. This means that the tests are not 100% representative of in-game performance, but even testing a level in game is not that either, as framerate varies a lot depending on where in a level one looks. Usually benchmarking involves some kind of fly-through, but that is beyond the scope of what I intended to do.

Note that the HPL2 test was built in Visual Studio 2003, while HPL3 uses the 2010 version. I do not think this should matter much though, even if the optimization routines differ, simply because pretty much all of the work is done on the GPU. The graphics card I did all my testing on is a Radeon HD 5850 (and others were tried for some tests). And as a final note, all of the data is given as average frame time (in milliseconds!) and not as frames per second. As Emil Persson points out, FPS is not a very good way to compare performance.
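For reference, the conversion between the two is simply:

frame_time_ms = 1000 / fps

so 60 FPS corresponds to roughly 16.7 ms. The nice thing about milliseconds is that equal differences mean equal amounts of work, which is not true for FPS.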

Test #1
Now with the setup details out of the way, let's get down to business. I first started out with a scene like this:
1 x box, xz-plane floor, 1x spot light + shadow
which gave me the following results:
HPL2: 0.78ms
HPL3: 0.84ms
Difference: +7.7%
This means that, given a simple scene like this, the old renderer is actually faster! This is not that strange though, since the scene does not have many lit screen pixels, most of the image being sky. Thus, the extra pass made by the pre-pass renderer matters more than any lighting speed-ups. Also, the decrease in draw buffers (3 to 2) in the G-buffer does not make up for the extra pass.

Test #2
4000 x boxes, 1 x point light, x-z plane floor
HPL2: 14.9
HPL3: 18.5
Difference: +24%
As expected, when there is a lot to render the pre-pass lighting is even slower; that extra pass shows in the performance. Remember though that 4000 objects is quite a lot, and an important thing for good GPU performance is to have as few draw calls as possible.


Test #3
1 x box, 1000 x point lights, x-z plane floor
HPL2: 30.0
HPL3: 29.2
Difference: -2.7%
As can be seen, once the scene is filled with lights pre-pass lighting is faster, but only by a slight amount, especially considering the large number of lights. (I later realised that the actual number of lit screen pixels was quite small, something fixed later on in test #5.)


Test #4
4000 x boxes, 1000 x point lights, x-z plane floor
HPL2: 47.5
HPL3: 52.0
Difference: +10%
Doing a really stressful test (the numbers of lights and objects are really large), it seems like the old deferred renderer wins out. This was actually a bit unexpected and disappointing to me, as I did not think pre-pass lighting would be this far behind. But taking the small difference in test #3 into account, it is not that surprising. Still, these tests clearly show that pre-pass lighting is far from a giant speed-up compared to deferred shading, and it actually seems slower in most cases.

I also tried skipping the early-z pass for pre-pass lighting (I use early-z in both renderers in all other tests). This is basically a pass where the z-buffer is set up first, making sure that later passes only shade visible pixels. From reading Crytek papers, it does not seem like the Crysis 2 engine has this (and the same seems true for other engines), so I did a quick and dirty test without it and got this data: 48.7 (+2.5%)
This means that even without the early-z pass, pre-pass was still slower. However, I did not attempt to reduce overdraw (like sorting front to back), so there might be room for optimization here. On the other hand, when rendering front to back there will be a lot more state switching, as you cannot sort by texture etc. as efficiently, so I wonder if the numbers might not even be worse in a more realistic scenario.
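For those unfamiliar with the technique, an early-z pass in OpenGL terms looks roughly like this (a minimal sketch; RenderScene stands in for the engine's draw code and is not a real function):

// Pass 1: lay down depth only, with color writes off.
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
glDepthMask(GL_TRUE);
glDepthFunc(GL_LESS);
RenderScene(); // hypothetical; use the cheapest possible shaders here

// Pass 2: render normally; only fragments matching the depth buffer
// survive, so no shading is wasted on hidden surfaces.
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glDepthMask(GL_FALSE);  // the depth buffer is already correct
glDepthFunc(GL_EQUAL);  // or GL_LEQUAL
RenderScene();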

I also tried this test out on a few other cards (again with full early-z testing):
Geforce 240gt: 125, 137 (+9.6%)
Geforce 320M: 240, 240 (+/- 0%)
This gave an indication that on some cards pre-pass might actually compare better, and that things might not be as clear-cut as the first tests seemed to show.

As a final variation on this test, I added illumination maps to all textures, a feature that requires an extra pass in the old engine. I also removed the height map rendering. This gave me: 50.6, 50.0 (-1.2%)
This is a very tiny speed-up considering that the methods now have the same number of passes, and that pre-pass lighting has faster light rendering and a smaller G-buffer.

Test #5
488 x boxes, 30 x point lights, x-z plane floor
Radeon HD 5850: 7.4, 7.8 (+5.4%)
Geforce 240gt: 18, 19 (+5.5%)
Geforce 320M: 50.0, 45.5 (-9%)
Geforce 9800gtx: 9.5, 9.5 (0%)

In this test I changed to a more realistic number of lights and draw calls. I also aligned the lights so that the lit pixels covered the entire screen, which I did not do above. As can be seen, on my computer (the HD 5850) deferred shading still wins, but on a less powerful card the pre-pass lighting is much faster. This difference might be a bandwidth issue: some cards might have problems pushing the amounts of data required for deferred shading.

I also did a tweak to this test and turned down the number of draw calls a bit:
316 x boxes, 30 x point lights, x-z plane floor
Giving: 6.4, 6.6 (+3%)
This further reduced the difference, and with the hackish removal of early-z, pre-pass lighting plunged down to: 5.2 (-18%)
Even though this removal of early-z is not very realistic, the results show that I need to investigate it further, something I will do once I get a proper scene up and running.

Finally, I also tried giving all the boxes illumination (with the early-z test turned back on):
6.8, 6.6 (-2.9%)
This clearly shows how you get the illumination almost for free in pre-pass, while it costs a bit more with the deferred shader. This is not surprising, given that the latter requires an extra pass, but it hints that further effects can be implemented more efficiently when using pre-pass lighting.


Conclusions
The tests clearly show that my previous assumption, that light rendering in pre-pass lighting would be much faster, was incorrect. It is a bit faster, but only noticeably so when really stretching the limits, and then only by a small fraction. This makes me conclude that one should not choose pre-pass lighting for faster light rendering. However, as can be seen in the test with the GeForce 320M, the technique matters a lot more on older hardware, and it might actually be of greater use there.

There are no vast differences between the techniques though, so the choice should instead be based on other merits. Given that pre-pass lighting allows for so much more variety in materials, I will keep it for HPL3, but I will no longer be expecting any rises in framerate.

I hope this post will prove useful for those who are thinking of using either rendering method, and for the rest it might be an interesting insight into how testing is done (at least how I do it). Again, sorry for the lack of pretty pictures, which I promise to make up for!