Sunday, November 28, 2010

OpenCL, first iteration

OpenCL is a general purpose computing framework. It is designed for parallel computing so it runs very well in modern GPUs, which are parallel by nature. The whole idea is to have a single "job" executed by a large number of threads at the same time.

Voxels are a natural fit for OpenCL. Many voxel algorithms are local, meaning the algorithm only needs to look at one or a few voxels at a time. There are no dependencies between a voxel and other voxels far away.

I already had a CPU implementation of Dual Contouring, which was a good starting point for a port to OpenCL. A straight port was out of the question; in almost every case OpenCL requires you to rethink your problem from a new angle. Still, having the traditional CPU implementation gave me a good idea of the issues I needed to address.

This first iteration was only to test the waters. I wanted to see if a naive approach would produce acceptable results. The bare minimum I could do was:

  1. For each point in the voxel grid, evaluate the 3D field function
  2. For each edge in the grid, see if the edge crossed the surface and compute the intersection point
  3. For each voxel in the grid, see if its edges had an intersection and compute the inner point for the voxel

I could then create the resulting mesh on the CPU. I would need to iterate over all the edges again; if an edge was crossed, I would output a quadrilateral connecting the inner points of the four neighboring voxels.

From the start I saw this would require running three different "jobs", or "kernels" as they are called in OpenCL-speak: one to evaluate the points in the grid, another to intersect the edges, and a last one to compute the inner points.

A kernel is like a traditional C function that takes multiple parameters. Some parameters are inputs, some are outputs. In most cases, arrays are used to move data in and out.

For instance, the first kernel, the one evaluating the grid function, would take an array of floats as a parameter. It would compute one single value of the function and store it into the array, making sure the value went into the correct index. Why only one value? This is where the parallel way of thinking comes in: the same kernel runs for every point of the array, and hundreds of instances may actually be running at the same time. OpenCL spawns as many instances of your kernel as necessary to cover the entire array.
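
To give a rough idea, such a kernel could look like the sketch below, written in OpenCL C. This is not my actual code: the field here is just a placeholder sphere, and the kernel name, arguments and indexing scheme are only illustrative. The real field function would involve noise, distance fields and much more.

```c
// Placeholder field: signed distance to a sphere of radius 10.
float field(float3 p)
{
    return length(p) - 10.0f;
}

__kernel void evaluate_grid(__global float* density, // one output value per grid point
                            const int3 dim,          // grid dimensions
                            const float3 origin,     // world position of grid point (0,0,0)
                            const float step)        // spacing between grid points
{
    int x = get_global_id(0);
    int y = get_global_id(1);
    int z = get_global_id(2);
    if (x >= dim.x || y >= dim.y || z >= dim.z)
        return;

    float3 p = origin + (float3)((float)x, (float)y, (float)z) * step;
    int index = (z * dim.y + y) * dim.x + x;
    density[index] = field(p); // each instance writes exactly one value
}
```

The host side simply enqueues this kernel over a 3D range matching the grid dimensions, and OpenCL takes care of spawning one instance per point.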

It did not take long to implement this version. The results were good, but it was only a few times faster than the CPU implementation. I knew this was not good enough. So what was the problem with the naive approach? It was not difficult to see it was wasting a lot of processing power.

The CPU implementation, being serial by nature, could skip further processing of edges, voxels and points once it realized they were not crossed by the surface. For instance, it would compute the inner point of a voxel only if any of the voxel's edges was intersected by the surface. If there was no intersection, the next CPU cycles were spent processing the next voxel.

The key word in the sentences above is "if".

With OpenCL, and parallel computing in general, "if" statements won't save you any processing unless you can guarantee the same outcome every time the statement is evaluated. Since every thread is running the same code in parallel, realizing that no further processing is needed and bailing out of the function saves very little: it is already too late, because a thread was allocated to the task anyway.

The naive approach was doing an awful lot of work that was not needed at all. If I wanted really complex field functions, with many octaves of Perlin noise or distance fields to point clouds for rocks and vegetation, I would have to do better.

I was also solving Quadratic Error Functions to compute the location of the point inside each voxel. This required a normal vector for each intersection point. These normals come from the gradient of the 3D field at the point, which meant evaluating the field function another four times. The amount of computation was staggering. I clearly needed to optimize the inputs of my OpenCL kernels, so they would only look at the elements actually contributing to the surface.
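
For reference, this is roughly what those extra evaluations amount to: a sketch of a normal estimate using forward differences, in the same OpenCL C style. The names field and eps are assumptions for the example.

```c
float field(float3 p); // the same 3D field function the kernels evaluate

// Estimate the surface normal as the gradient of the field at point p,
// using forward differences. This costs four field evaluations per normal.
float3 field_normal(float3 p, float eps)
{
    float f0 = field(p);
    float3 n = (float3)(field(p + (float3)(eps, 0.0f, 0.0f)) - f0,
                        field(p + (float3)(0.0f, eps, 0.0f)) - f0,
                        field(p + (float3)(0.0f, 0.0f, eps)) - f0);
    return normalize(n);
}
```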

This realization took me to the second iteration using OpenCL, which will be the subject of my next post.





Monday, November 22, 2010

Space Warps and Octrees

When you think about voxels, the image that comes to mind is that of evenly sized stacked boxes:


This arrangement gives equal importance to every point in space. In practice, however, space is not that interesting everywhere. Most of it is either empty or entirely solid, and thus impossible to see. If all space is not the same, why should voxels be?

If we could have only voxels in regions of interest, then we would save a lot of memory and processing time. There are two different techniques to address this.

The first is Space Warps. In homogeneous space voxels are perfect cubes: each edge measures exactly the same. What would happen if we could place voxel corners anywhere we like?


While the total number of voxels did not change, they now cover the same space in a different way. Voxels in the center are smaller; voxels toward the outside are bigger. Since the topology of the voxel grid did not change, the same algorithms that run on a regular grid are guaranteed to run here as well. Marching Cubes and Dual Contouring, for instance, remain the same.

The advantage is that for the same processing cost and memory footprint, we now have more detail allocated to the voxels in the center. If we could pinch space around the features we consider interesting, and relax it around empty air or entirely solid regions, we would end up with improved triangle resolution at no cost.
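
Just to make the idea concrete, here is a toy warp function, not anything I shipped. It assumes grid coordinates centered at the origin in the [-1, 1] range; raising the radius to a power greater than one pulls corners toward the center, which is where the extra detail ends up.

```c
#include <math.h>

typedef struct { float x, y, z; } vec3;

// Toy space warp: corners near the center get packed closer together
// (smaller voxels, more detail), corners near the border spread apart.
// The grid topology is untouched; only the corner positions change.
vec3 warp(vec3 p, float exponent)
{
    float r = sqrtf(p.x * p.x + p.y * p.y + p.z * p.z);
    if (r == 0.0f)
        return p;
    float s = powf(r, exponent) / r;  /* scale applied to the radius */
    vec3 q = { p.x * s, p.y * s, p.z * s };
    return q;
}
```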

The second technique is Octrees, or to be more specific, Sparse Voxel Octrees.

As we saw before, for most applications we are only interested in the surface of the voxel data: where things stop being solid and empty space begins. Octrees are ideal for representing that.

Above all, an octree is a way to divide space. You start with a box that encompasses everything you want to represent. If you need more detail in the box, say a surface is going through it, you divide it into eight boxes. Then you look at each of those boxes and ask the same question again: is there any detail in here? If there is, you break the box into another eight smaller boxes and ask the question for each one of them.

Eventually you will have a lot of small boxes in the regions where there is detail, and a few large boxes that contain only air or solid. 
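
In code the subdivision step is very small. The sketch below is only an illustration: needs_detail stands in for whatever test decides whether a surface goes through the box (in my case, sampling the 3D field), and the node layout is made up for the example.

```c
#include <stdlib.h>

typedef struct OctreeNode {
    float min[3];                 /* lowest corner of the box   */
    float size;                   /* edge length of the box     */
    struct OctreeNode* child[8];  /* NULL for leaf nodes        */
} OctreeNode;

/* Placeholder: does a surface cross this box? */
int needs_detail(const OctreeNode* node);

void subdivide(OctreeNode* node, float min_size)
{
    if (node->size <= min_size || !needs_detail(node))
        return;                   /* leaf: only air, only solid, or small enough */

    float h = node->size * 0.5f;
    for (int i = 0; i < 8; i++)
    {
        OctreeNode* c = calloc(1, sizeof(OctreeNode));
        c->min[0] = node->min[0] + ((i >> 0) & 1) * h;
        c->min[1] = node->min[1] + ((i >> 1) & 1) * h;
        c->min[2] = node->min[2] + ((i >> 2) & 1) * h;
        c->size = h;
        node->child[i] = c;
        subdivide(c, min_size);   /* ask the same question again */
    }
}
```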

You can use Quadratic Error Functions (QEF) to simplify the boxes even more. If the error of collapsing eight boxes into their parent box is small enough, you can just discard the smaller boxes.

Each terminal node in the octree, a leaf, can be considered a voxel. With small modifications, Dual Contouring can run on those voxels. The main difference is that with a regular grid, Dual Contouring would run on every edge. With the octree, some edges that belong to a larger voxel may also be shared by smaller voxels, so you must make sure only the smallest edges are considered.

The results are quite impressive. Instead of producing a surface that is evenly covered by triangles, you now see that areas of interest have a lot of triangles while flat and less detailed areas only have a few.

So which of these techniques did I choose to implement? It was not a simple call. I was not entirely happy with some distortions I was seeing in my tests with Space Warps. On the other hand, since I wanted to use OpenCL, I knew it would be cheap to compute many points at once, so a brute-force method could end up being faster than using an octree. I was especially concerned about memory coherence if I moved the contouring to an octree.

In my future posts you will see which approach made the cut, and which one did not.


Friday, November 19, 2010

From Voxels to Polygons

Voxels are just a way to store 3D data. Somehow they need to be visualized. At this point you have two avenues: use volumetric rendering, or extract a surface out of the voxel data and display it using a traditional triangle mesh.

Assuming there is enough processing power on the client, the best option is probably ray tracing and volumetric rendering from a Sparse Voxel Octree. In the long run this may become the predominant method. With generic GPU computing frameworks like OpenCL and CUDA, a paradigm shift at the hardware level is no longer necessary. The migration will happen as OpenCL/CUDA-capable cards become the norm.

Content creation tools will catch up quickly. There are already some impressive tools that work directly on voxels. Check out 3D Coat if you haven't. Then it will be bye-bye to polygonal meshes, UV-Mapping, texture unwrapping and other oddities of the raster world.

But it is too soon to ditch polygons. What if you want your engine to run on mobile devices, tablets, WebGL, or even just plain old PCs? If that is your case, it makes more sense right now to keep voxels on the content-generation side and use traditional triangle meshes on the clients. That is what I intended to do.

As I described in an earlier post, I tried extracting the surface out of the voxels by relaxing the voxel grid until it became smooth. It produced fairly good results, but not good enough for every case. I needed to look at other methods for extracting surfaces. At this point I had given up on simplistic binary voxels where each cell would be entirely solid or entirely empty. I knew I had to keep different voxel densities.

Then there was Marching Cubes, the grand-daddy of all surface extraction algorithms. This method is so old that computers were really slow when it was invented. It is very simple to implement: it is all based on table lookups, so it lends itself very well to GPU implementation.

The general idea is that for any voxel there is a fixed number of ways a surface may cross it. Based on the density values at the voxel corners, it is possible to know whether the surface crosses the edge between two corners. There is a finite number of combinations, as the image below shows:


I borrowed this image from a nice article in GPU Gems 3. If you are interested in Marching Cubes and GPU terrain generation, this is one you should read.
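
The heart of the method fits in a few lines: each of the eight corners contributes one bit, and the resulting number indexes the precomputed edge and triangle tables. A sketch, with corner ordering and names chosen just for illustration:

```c
// Classify a voxel for Marching Cubes: each corner below the iso level
// contributes one bit, giving one of 256 cases used to look up the
// edge and triangle tables.
int marching_cubes_case(const float corner_density[8], float iso_level)
{
    int cube_index = 0;
    for (int i = 0; i < 8; i++)
        if (corner_density[i] < iso_level)   /* corner is inside the volume */
            cube_index |= (1 << i);
    return cube_index;                       /* 0..255 */
}
```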

Marching Cubes' biggest problem is that it cannot produce sharp features. Everything coming out of Marching Cubes looks blobby. Since I wanted sharp corners, especially for architecture, Marching Cubes was not a good option.

Why so keen on sharp corners? It is not only that they look good. If you remember my goals, I needed to stream geometry to clients. Having sharp features from the onset would give the best simplification for the meshes sent to the client. Marching Cubes creates too much noise around sharp features; it would never translate into clean meshes.

The lack of sharp features in the resulting surface is caused by all the guessing Marching Cubes does. While it may compute the right intersection points along the edges, it knows nothing about the behavior of the surface inside the cube. The coarser the voxel resolution, the larger this error becomes. You can see this in the following image:
I borrowed this image from another nice article, this one about isocontouring. It has everything you need to know, even source code.

What if we could know more about the surface's behavior inside the voxel, predict where it goes? If, in addition to the intersection point, we take into account the direction of the surface at that point, it is possible to have a much better idea:

The first section (a) shows the output of Marching Cubes. If there had been a sharp corner inside any of the voxels, it would have been smoothed out. The second section (b) shows that if you know the normal vector to the surface at each intersection point, you may still be able to detect a sharp corner inside the voxel.

Marching Cubes is not a good framework for adding this, since it can only output points that lie along the voxel edges. In this case we clearly need to position points anywhere inside the voxel's space. There is another method that does exactly this: Dual Contouring.

Dual Contouring looks at one edge of the voxel grid at a time. If the corners at the ends of the edge have different signs, meaning one is inside the volume and the other is outside, the surface crosses the edge.

Now, each edge has four neighboring voxels. If the surface crosses the edge, it necessarily crosses those four voxels as well, so there is at least one point inside each of them that lies on the surface. If you then output a quadrilateral using the points from the four neighboring voxels, you get a tiny square patch of the surface. Repeating this for every edge in the voxel data produces the entire surface.
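
A sketch of that step, assuming the inner point of every voxel is already known and stored in voxel_point, and that neighbors lists the four voxels sharing the edge. All the names here are illustrative, not my actual code.

```c
typedef struct { float x, y, z; } vec3;

// Emit one surface quad for a single grid edge, if the surface crosses it.
void emit_quad_for_edge(float d0, float d1,            /* densities at the edge ends */
                        const int neighbors[4],        /* the four voxels sharing it */
                        const vec3* voxel_point,       /* inner point of every voxel */
                        void (*output_quad)(vec3, vec3, vec3, vec3))
{
    /* The surface crosses the edge only when the ends have opposite signs. */
    if ((d0 < 0.0f) == (d1 < 0.0f))
        return;

    output_quad(voxel_point[neighbors[0]],
                voxel_point[neighbors[1]],
                voxel_point[neighbors[2]],
                voxel_point[neighbors[3]]);
}
```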

You still need to find the position of this point for each voxel. The best way is to look at all the edges of the voxel and see which ones are intersected by the surface. Then, using the surface normal at each intersection point, you can predict where all these intersections converge inside the voxel. For that it is best to use Quadratic Error Functions (QEF). At this point you may have multiple planes crossing different edges of the voxel, and the QEF finds the point that is closest to the intersection of all these planes.
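
For the curious, this is roughly what that boils down to. Each intersection point with its normal defines a plane, and the inner point is the position minimizing the sum of squared distances to those planes. The sketch below accumulates the normal equations and solves the 3x3 system with Cramer's rule; it is only an illustration, and a real implementation would add regularization and clamp the result to the voxel.

```c
typedef struct { float x, y, z; } vec3;

// Minimize sum_i (n_i . (x - p_i))^2 by solving (A^T A) x = A^T b.
vec3 solve_qef(const vec3* p, const vec3* n, int count)
{
    float A[3][3] = {{0}}, b[3] = {0};
    for (int i = 0; i < count; i++)
    {
        float ni[3] = { n[i].x, n[i].y, n[i].z };
        float d = n[i].x * p[i].x + n[i].y * p[i].y + n[i].z * p[i].z;
        for (int r = 0; r < 3; r++)
        {
            for (int c = 0; c < 3; c++)
                A[r][c] += ni[r] * ni[c];      /* accumulate A^T A */
            b[r] += ni[r] * d;                 /* accumulate A^T b */
        }
    }

    float det = A[0][0]*(A[1][1]*A[2][2]-A[1][2]*A[2][1])
              - A[0][1]*(A[1][0]*A[2][2]-A[1][2]*A[2][0])
              + A[0][2]*(A[1][0]*A[2][1]-A[1][1]*A[2][0]);

    vec3 x = { 0.0f, 0.0f, 0.0f };  /* if singular, fall back to the voxel center */
    if (det != 0.0f)
    {
        x.x = (b[0]*(A[1][1]*A[2][2]-A[1][2]*A[2][1])
             - A[0][1]*(b[1]*A[2][2]-A[1][2]*b[2])
             + A[0][2]*(b[1]*A[2][1]-A[1][1]*b[2])) / det;
        x.y = (A[0][0]*(b[1]*A[2][2]-A[1][2]*b[2])
             - b[0]*(A[1][0]*A[2][2]-A[1][2]*A[2][0])
             + A[0][2]*(A[1][0]*b[2]-b[1]*A[2][0])) / det;
        x.z = (A[0][0]*(A[1][1]*b[2]-b[1]*A[2][1])
             - A[0][1]*(A[1][0]*b[2]-b[1]*A[2][0])
             + b[0]*(A[1][0]*A[2][1]-A[1][1]*A[2][0])) / det;
    }
    return x;
}
```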

If you want more information on this method, you can check this article.

So I implemented my own Dual Contouring with QEF. The results were encouraging. The method was fast and still produced very nice sharp features. All the videos I have posted on YouTube were obtained using this method.

In future posts, I will describe how I represented everything in the world as a 3D field so I could extract surfaces using Dual Contouring. I will also describe how I moved the Dual Contouring implementation to the GPU using OpenCL. Stick around for that one, you may find it interesting.

Wednesday, November 17, 2010

Doughnuts and Coffee Mugs

My first attempts to extract a smooth surface out of the voxels had failed. Relaxation was not good enough to preserve the features of traditional meshes once they were converted into voxels. Maybe it was a sign that I should move away from voxels?

There was another advantage to voxels that was key for the results I had envisioned. I will take a break from the surface extraction topic to discuss it.

It is mainly about doughnuts and coffee mugs. But I will come to them later.

The number of pixels on the screen is constant. Objects, when viewed in perspective, appear at different sizes depending on how close they are. A rock next to the viewer may take 1000 pixels on the screen. The same rock viewed from a distance may take only 10. It would take 100 rocks of the same size to fill the screen space a single rock occupied when viewed up close. It is fairly obvious that in a perspective view objects multiply in the distance.

This is a big problem for most 3D engines. There are a few tricks for it, like a dense fog that eats everything in the distance, or popping objects in when they are close enough. In most cases engines use a Level Of Detail (LOD) system, which switches to low-resolution models for objects far away. Still, this is not enough to preserve detail at large distances.

I wanted to have the horizon line very far away, and I wanted all the complexity of the world visible even at that distance. If there was a village a couple of miles away, I wanted the windows in the houses to appear as tiny dots, and still be able to see all the walls and roofs for the entire village.

Current terrain engines are able to display very large expanses of land using a technique called Clipmaps. The scene around the viewer is broken into multiple tiles, and the tiles get bigger the farther they are from the viewer. A tile next to the viewer may be just a few square meters, but a tile next to the horizon line may be kilometers wide.



The nice thing about clipmaps is that each tile, no matter where it is located, holds roughly the same amount of information. Due to the perspective projection, tiles far away are shrunk; even though they cover a larger expanse of terrain, they end up taking the same space on screen as nearby tiles. As a result, it is possible to cover very large areas at a fairly constant degree of detail.

This technique is normally restricted to the terrain. Any objects on top of it, like vegetation and architecture, are not part of the clipmap. This is where the traditional LOD approach comes in, but it is not good enough to handle all the instances necessary for a distant horizon line. The distance to the horizon ends up severely restricted, and the benefits of the terrain clipmap go to waste.

I wanted to use clipmaps for all the virtual world's geometry. The classical clipmap algorithm is 2D, since terrain is usually represented as a heightmap. In my case I would need to extend clipmaps to 3D. There are a few non-trivial issues there, like how to generate seams between tiles of different resolution, but I will address them in a future post.

I set my tile resolution at 64x64x64. Each unit would be a decimeter for the tiles closest to the viewer, giving tiles of 6.4 meters. The next level of tiles would be twice that length: 12.8 meters. If I wanted the horizon at 5 km, some of the tiles would be a kilometer wide, or even more.
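
Worked out in code, the level sizes grow like this. It is just a quick check, assuming 64-voxel tiles and a level-0 voxel of one decimeter, with each level doubling the voxel size:

```c
#include <stdio.h>

int main(void)
{
    const int tile_voxels = 64;
    float voxel_size = 0.1f;                 /* meters at level 0 */
    for (int level = 0; level < 10; level++)
    {
        printf("level %d: tile is %.1f m wide\n", level, tile_voxels * voxel_size);
        voxel_size *= 2.0f;                  /* each level doubles the voxel size */
    }
    return 0;
}
/* level 0: 6.4 m, level 1: 12.8 m, ... level 7: 819.2 m, level 8: 1638.4 m */
```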

This all sounds very nice in theory. Now remember that each tile, no matter its size, must have roughly the same amount of information. I could easily see my small 6.4 meter tile fitting in a budget of 1000 triangles, but how can I fit an entire cubic kilometer of my world into 1000 triangles and still make it look good?

This is where the doughnuts and mugs come into play. Traditional mesh optimization methods can greatly reduce the number of polygons in a mesh, but they either cannot simplify the topology of the mesh at all, or are not very good at it. An example will help:

Let's assume we have a large wall with a small window. The window in this case is just a square hole in the wall. Traditional mesh simplification methods collapse edges in the mesh until the desired triangle count is reached. Still, no matter how much you crank the optimization up, the algorithm will preserve the hole in the wall. If the algorithm could remove the hole (it is a tiny hole anyway), it could save a lot of triangles.

The problem here is that the topology of a wall is different from the topology of a wall with a hole. It is not like the doughnut and the mug, which are topologically identical:

This is what mathematicians call the genus of the object, which is probably a fancy way of saying how many holes a solid has. Traditional mesh optimization methods cannot cross that barrier: they cannot close holes in the solid to make the object simpler. There are other mesh optimization methods that use vertex clustering. While they do modify the topology of the mesh, it is hard to control the results.

But if we optimize in voxel space, it is a different story. Voxels can be filtered, summed up and transformed in many advantageous ways. In the example of the large wall with the small window, the voxels of the wall would eventually take over the space of the window, naturally filling up the hole.

I knew voxels would provide an edge when it was time to synthesize the larger tiles of my clipmap. I just needed a better way to extract the surfaces out of them.

Sunday, November 14, 2010

Just Relax

It was settled then: I would use voxels to represent everything in my virtual world.

I knew I had to pick a voxel resolution that was small enough to handle the detail I wanted, but still coarse enough so the data volumes were manageable. I chose one tenth of a meter, which is close to four inches. That was good enough for rocks, boulders, even architectural elements like stairs and columns.

It is actually possible to have details smaller than the voxel resolution, like a tree branch with a diameter of one inch. The catch is that the next non-empty element, the next branch for instance, must be separated from it by at least twice the voxel resolution. So I could have two one-inch-diameter sticks, but the closest they could ever be was eight inches. You can look into the Sampling Theorem if you want to understand where those limits come from.

That was good enough for the basic geometry of the world. Smaller details would come from traditional texturing: normal and parallax mapping, and even instancing for tree leaves and grass.

The following image will give you a sense of the amount of detail that can be achieved at this resolution, at least for terrain. The white ball near the top is one meter in diameter.


But voxels are blocky by nature. A naive rendering of my voxels would result in conspicuous four-inch boxes. Somehow the boxes needed to become smooth and shape the world in a realistic way. The boxes, as seen below, needed to turn into the sphere:


This is a very old problem. Medical scans produce 3D data which is very similar to voxels, and there are several algorithms that take this data and render it as a series of triangles. The problem in general is referred to as isosurface extraction: obtaining a surface out of a three-dimensional field.

The best known of these algorithms is Marching Cubes. Other similar algorithms are Marching Tetrahedra and Dual Contouring. The idea behind them is simple, and there are many implementations available. They look at the corners of each voxel and, depending on whether each corner is solid or not, they determine whether the resulting surface crosses the voxel somewhere. The surface fragment is output and the algorithm moves on to the next voxel.

My problem with these algorithms was that in order to guess the right position of the surface, you not only need to know whether a voxel is solid or not, you also need to know its "density". The virtual world was no longer a clean cut between black and white; it was all shades of gray. In that sense, at a given point you may have 30% air and 70% gravel.

This is not trivial. It is like comparing a raster image with a vector image. With the raster image, like a BMP, everything is simple. Each pixel has one distinct color. If you want to modify the image, you just change the color value of the pixel and that's it. With a vector image, you never have individual points, just a series of mathematical functions that evaluate to shapes. If you want to modify the image, you either alter the parameters of the existing shapes or add new ones.

If I wanted to use any of those algorithms, I would have to represent everything in my virtual world as a mathematical function. Or at least, I would need to provide a diffuse boundary to all the voxel construction tools I might devise. I didn't like that. I hoped for an approach in which it was really simple to modify the voxels.

So I found a simple way to come up with the surfaces. I realized that if each voxel corner was moved to the averaged position of its neighboring corners, the voxel grid would soon converge into a smooth approximation of the surface. It would take only seven iterations to produce the sphere in the earlier image.
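
A minimal sketch of one such relaxation pass over a grid of corner positions, assuming only the six axis-aligned neighbors contribute and leaving out the material weighting described below. The names and grid layout are illustrative, not my original code.

```c
typedef struct { float x, y, z; } vec3;

#define IDX(x, y, z, dim) (((z) * (dim) + (y)) * (dim) + (x))

// One relaxation pass: each interior corner moves to the average position of
// its six neighbors. Border corners are assumed to be copied into "out"
// beforehand. Repeating this a handful of times converges to a smooth surface.
void relax_pass(const vec3* pos, vec3* out, int dim)
{
    for (int z = 1; z < dim - 1; z++)
    for (int y = 1; y < dim - 1; y++)
    for (int x = 1; x < dim - 1; x++)
    {
        const int n[6] = { IDX(x-1,y,z,dim), IDX(x+1,y,z,dim),
                           IDX(x,y-1,z,dim), IDX(x,y+1,z,dim),
                           IDX(x,y,z-1,dim), IDX(x,y,z+1,dim) };
        vec3 sum = { 0.0f, 0.0f, 0.0f };
        for (int i = 0; i < 6; i++)
        {
            sum.x += pos[n[i]].x;
            sum.y += pos[n[i]].y;
            sum.z += pos[n[i]].z;
        }
        out[IDX(x,y,z,dim)].x = sum.x / 6.0f;
        out[IDX(x,y,z,dim)].y = sum.y / 6.0f;
        out[IDX(x,y,z,dim)].z = sum.z / 6.0f;
    }
}
```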

This method was even able to produce sharp boundaries. The contribution of each neighbor to the averaged position depended on its material type. As the relaxation progressed with each iteration, tension lines would naturally appear along the boundaries between different materials.

The following image shows the results of this method:


The material boundaries appear as red voxels in the section at the left. As you can see, they translate into sharp features once the surface is generated.

Since it was so easy to work with the voxels, I soon found myself adding noise, then applying convolution filters to the voxels. It was a very simple way to get interesting results. I even implemented an L-system for some complex volumes, like the shell below:

But it did not take long before I discovered the relaxation approach would not take me as far as I wanted to go. I started turning triangle-based geometry into voxels, and the results were never good. I was creating models in traditional 3D software like 3ds max and then importing them as voxels, and I was getting a lot of noise I could not deal with. It seemed that black and white would not cut it; I really needed to consider voxel density.

I dumped all the relaxation code and started looking at the voxels from a fresh perspective. I was back to square one.


Saturday, November 13, 2010

I see a Voxel in your future

So I was looking for a good way to represent a 3D virtual world. Most virtual worlds, like the ones in massively multiplayer games, or even traditional game levels, are built using textured triangles. It is a simple and quick way to get very good results. It has been the method of choice since the early days of real-time rendering, and graphics hardware has evolved so that the entire work can now be done on the graphics card.

Content creation tools have also evolved around the idea of textured polygons. There are very mature tools that let you control the position of each vertex in your model and the color of every single pixel.

For my project I wanted to bypass the artist. None of the existing tools made much sense in my case. The expanse of terrain was so large that it would take many man-years to create by hand. If my project required such a high degree of proceduralism, why not go entirely procedural?

For this reason I chose voxels to represent everything in my virtual world. If this word is new to you, you can figure it out by extending the concept of a pixel. A pixel is one tiny dot on your screen, or in an image. Any image can be seen as a two-dimensional table, where each cell is a pixel describing the color of the image at that location.

A voxel is a similar concept, but in 3D. If you think about it, voxels are a much more natural way to represent things in 3D than textured triangles. You can think of the entire world as a big three-dimensional matrix, where each cell is a voxel that describes what type of element or material is there: air, water, wood, gravel, etc.
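
In its simplest incarnation, that matrix is just a dense 3D array where each cell stores a material id. The sizes and material codes below are made up for the example:

```c
enum material { AIR = 0, WATER, WOOD, GRAVEL };

#define DIM 64
static unsigned char world[DIM][DIM][DIM];   /* one byte per voxel */

/* What is at position (x, y, z)? */
static unsigned char material_at(int x, int y, int z)
{
    return world[z][y][x];
}
```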

The problem with voxels is that you need way too many of them to get results comparable to textured triangles. Also, video hardware has evolved in a way that does not favor voxels. For that reason I chose a hybrid approach: use voxels to create the world, but then translate the voxels into textured triangles so they can be quickly rendered on average systems.

The following picture shows this concept:


The first section shows the voxels. As you can see, they look like a series of stacked boxes; each box is a voxel. The next section shows a triangle mesh, which is obtained from the voxels. It is comparable to what an artist would have created using polygon modeling tools. The last section shows the same mesh, now textured.

At this point my task seemed pretty clear. I just needed to generate the world as voxels and then create the algorithms to translate those voxels into traditional textured triangles.

It may sound simple, but it was not. In the next post I will describe my first failed attempt.

Friday, November 12, 2010

This blog will document my attempts at creating a fully procedural world. It is a hobby, so there is no promise that any of this will ever be released.

If you are into computer graphics, massive online games, digital botany and architecture, voxels and parallel computing, you may want to stick around and check for updates.

This is a screen capture of the progress I have made so far:



It is still far from where I want to be, but I'm happy with the results seen here. You can find more videos on my YouTube Channel page: http://www.youtube.com/user/MCeperoG

A few years ago I started wondering how far you could go in creating a fully procedural world. This is a recurrent idea, and it has been implemented with some success many times before. Still, I was not satisfied with what I could find. The procedural worlds out there seemed too repetitive, and their generation patterns soon became evident.

The real world itself is full of ever-repeating patterns, but in that case we take them as natural. I had the feeling that maybe the techniques were right, and the problem was just the degree to which they were applied. As it appeared, it would take many layers of complex patterns to make a single believable world. The current implementations were not rich enough to trick the eye.

My goal was to create a rather large virtual environment, around 100 square kilometers, and allow people to connect and explore it over the Internet. Maybe play a game on it. It would cover not only different types of terrain, but also vegetation and architecture. And I wanted all that generated with a minimum of artistic input, but still be visually appealing.

So I started by devising a framework that could support the amount of detail I felt was necessary. Geometric detail could go as fine as 10 centimeters (~4 inches), and the texturing maybe a few texels per square centimeter. The texturing of the world would be unique, meaning each texture pixel would map exclusively to one place in the world's geometry.

Considering the world size, this would translate into a few terabytes of geometry and textures. It soon became evident that I would need to stream the world's representation to viewers as they moved. The sheer size of the world data made it prohibitive to pack it as a one-time download.

All these requirements shaped the solution I chose. In my next post I will introduce the basic building block that made it all possible.