fredag 11 september 2015

GPU programming in Haskell using GPipe - Part 2


< Previous episode: Hello triangle

Welcome back! In last part you got your first triangle through GPipe. This time we are going to examine Buffers and PrimitiveArrays in more detail.

Buffers

In the "Hello world" example we made last time, we used a Buffer to store the triangle's positions and colors. From this buffer, we created a PrimitiveArray that we fed to the shader. Later on, we will see that Buffers can be used for other things as well.

A Buffer in GPipe is an array of data that is stored on the GPU. It is mutable just like IOArray or STArray and just like those it also lives in a monad, in this case the ContextT monad. Let's look at the function that creates buffers first:

newBuffer :: (MonadIO m, BufferFormat b) => Int -> ContextT ctx os m (Buffer os b)

A buffer has type Buffer os b, where os is the same as the ContexT's os. As you might remember from last time this os type parameter is used to keep context bound objects from leaving the monad, and Buffer is such an object.

newBuffer just takes one argument: the number of elements in the buffer to create. A buffer has mutable elements, but the number of elements is immutable. The type of the buffer's elements is denoted by b, and as you can see this b is constrained by the type class BufferFormat b. Before I show you that type class, let's look at the function you will use to fill your buffer with data from the CPU side:

writeBuffer :: MonadIO m => Buffer os b -> BufferStartPos -> [HostFormat b] -> ContextT ctx os m ()

This function takes a buffer to write to and a zero indexed position in the buffer to start at, nothing strange with that, but then it takes a list of HostFormat b... What's up with that? A buffer's contents doesn't have the same representation on the host as in the buffer, which lives on the GPU. (From now on I am going to use the term "host" when I mean the normal CPU-living Haskell world, as opposed to the GPU world.) HostFormat b is an associated type in the type class BufferFormat b. Let's take a look at that class:

class BufferFormat f where
  type HostFormat f
  toBuffer :: ToBuffer (HostFormat f) f

The sole purpose of this class is to provide a representation for the buffer elements' type on the host, as well as a conversion from the host to the buffer representation. Here are some examples of instances of this class, and their host representations:

f  HostFormat f
B FloatFloat
B Int32Int32
B Word32Word32
B2 FloatV2 Float
B2 Int32V2 Int32
B2 Word32V2 Word32
B2 Int16V2 Int16
B2 Word16V2 Word16
(a, b)(HostFormat a, HostFormat b)
V2 aV2 (HostFormat a)
There are many more, including B3, B4 and larger tuples. See the full list on hackage.

A Float on the host will become B Float in a Buffer. B a is an opaque type that you can't inspect the value of or do any calculations with, e.g. there is no Num instance for B Float. Buffers doesn't expose a way to apply functions on their elements anyway (e.g. Buffer has no Functor instance), but we will soon create VertexArrays from our Buffers and then this will become a subject.

GPipe also defines the B2 a, B3 a and B4 a types. For a selected set of as, B2 a is the buffer representation of V2 a on the host. V2 a is also an instance of BufferFormat with V2 (HostFormat a) as host representation, which means that both V2 (B Float) and B2 Float has the same host representation: V2 Float. Both these buffer formats have the same size and even internal layout, but the B2 Float version can be used more efficient as we will see later. For that reason, always try to use B-types over V-types in buffers when possible. Then why is there a BufferFormat instance for V2 a at all? The main use case is matrices, e.g. V4 (V4 Float) on the host can be stored in a buffer as V4 (B4 Float).

Another interesting thing you may have noticed from studying the BufferFormat instance list is that there are B2 Int16 and B2 Word16 instances but no B Int16 or B Word16 instances. This is because vertex attributes has to be 4 byte aligned on some hardware, and GPipe enforces this through it's types. Int16 and Word16 are both 2 bytes, so you need to have a vector of at least two of them. There are actually B3 Int16 and B3 Word16 instances, but these will pad their data with 2 byte extra. The motivation for all of this is that you could always go for a B Int32 instead of a B Int16 if it existed, they would work with the same shaders and would just be the same size anyway if we had padded the latter. A B3 Int32 on the other hand take 12 bytes while a padded B3 Int16 only takes 8, so there is a distinct use case for that one. A B4 Int16 also takes 8 bytes, but that wouldn't work with the same shaders as will become evident in the next part of this tutorial.

Now let's look at the toBuffer member of the BufferFormat type class. It has the type ToBuffer (HostFormat f) f. ToBuffer is something that is called an arrow in Haskell. It is like a function (in this case HostFormat f -> f), but more general. Let's look at the BufferFormat (a, b) instance as an example:

{-# LANGUAGE Arrows #-}

instance (BufferFormat a, BufferFormat b) => BufferFormat (a, b) where
  type HostFormat (a,b) = (HostFormat a, HostFormat b)
  toBuffer = proc ~(a, b) -> do
                a' <- toBuffer -< a
                b' <- toBuffer -< b
                returnA -< (a', b')

Arrow notation almost looks like a lambda (using the special keyword proc) returning a monadic action. But this is not a monad. The main difference from a monad is that you cannot select action based on the arrow return values. This is why arrow actions have an arrow tail (-<); anything between the <- and -< of an arrow may not reference anything outside them (a, b, a' or b' in this case). This enforces that every invocation of toBuffer must go through the same series of arrow actions, independent on the values of the actual input data. Another additional requirement that GPipe has is that it needs to be able to produce values lazily, thus the tilde (~) in the proc pattern. The only ToBuffer arrow actions GPipe defines for you that you can use inside your own implementation of toBuffer are the other instance's toBuffer methods. You are going to see this pattern where an arrow is used to define a conversion between two domains appear in more places of GPipe as we continue through the tutorial.

Vertex arrays

Ok, you are now experts on buffers! Let's put them to some use:

newVertexArray :: Buffer os a -> Render os (VertexArray t a)

You run this function in a Render monad to create a VertexArray t a. A vertex array is like view of a buffer, and newVertexArray doesn't copy any data. Since we operate inside the Render monad (that is run by the render function, which doesn't allow return values) and Buffers can only be modified outside this monad (in the ContextT monad), conceptually you may think of a VertexArray as a copy of the Buffer. But it's really not. But treat it like one.

VertexArray t a is an array of vertices where each vertex is an element of type a, that is the same type as the elements of the Buffer you created it from. Don't worry about the type parameter t for now, I'll get to that in a bit. The VertexArray has as many vertices as there are elements in the originating Buffer, but in contrast to Buffers you may trim a VertexArray using the functions dropVertices or takeVertices. These works exactly like drop or take works on normal lists:

takeVertices :: Int -> VertexArray t a -> VertexArray t a Source
dropVertices :: Int -> VertexArray () a -> VertexArray t a Source

VertexArrays also has a Functor instance, which allows you to fmap over its vertices. This is when the opacity of the B-types I talked about earlier comes into play! Now that you get to do stuff with your B-values, you will notice that the options are rather limited. You will merely pick elements from structures like tuples and/or build new such structures with the values you have. There are however a couple of functions that operate on the B-values that you can use here:

toB22 :: forall a. (Storable a, BufferFormat (B2 a)) => B4 a -> (B2 a, B2 a)
toB3 :: forall a. (Storable a, BufferFormat (B3 a)) => B4 a -> B3 a
toB21 :: forall a. (Storable a, BufferFormat (B a)) => B3 a -> (B2 a, B a)
toB12 :: forall a. (Storable a, BufferFormat (B a)) => B3 a -> (B a, B2 a)
toB11 :: forall a. (Storable a, BufferFormat (B a)) => B2 a -> (B a, B a)

These may split B-vectors into smaller parts. Notice that there are no functions that can combine them again though.

You may also zip two VertexArrays together with the zipVertices function, which works exactly like zipWith on normal lists; you provide a function to combine the elements of the two argument VertexArrays and the resulting VertexArray will be the length of the shorter of the two input arrays:

zipVertices :: (a -> b -> c) -> VertexArray t a -> VertexArray t' b -> VertexArray (Combine t t') c

(Again, don't mind the strange first type parameter in the returned VertexArray, I'll explain that later.)

Zipping vertex arrays is what corresponds to using non-interleaved arrays in OpenGl, while a vertex array from a single buffer of a compound element type (such as a tuple of two B-values) corresponds to interleaved arrays. This is just the functional and type safe way to do it!

Primitive arrays

Now that you have trimmed, zipped and mapped your vertex array into perfection, it's time to create a primitive array. The simplest way to create one is with this function:

toPrimitiveArray :: PrimitiveTopology p -> VertexArray () a -> PrimitiveArray p a

You always need a primitive topology, besides your array of vertices, to create a PrimitiveArray. The primitive topology denotes how the vertices should be connected to form primitives, and is one of these data constructors:
data PrimitiveTopology p where
  TriangleList :: PrimitiveTopology Triangles  
  TriangleStrip :: PrimitiveTopology Triangles  
  TriangleFan :: PrimitiveTopology Triangles  
  LineList :: PrimitiveTopology Lines  
  LineStrip :: PrimitiveTopology Lines  
  LineLoop :: PrimitiveTopology Lines  
  PointList :: PrimitiveTopology Points

In most cases you will work with triangles. Lets look at how the three triangle topologies look like:

(a) - TriangleStrip, (b) - TriangleFan, (c) - TriangleList
(Image courtesy of OpenGl specification by Khronos)


In a TriangleStrip, every vertex forms a triangle with the previous two vertices, alternating the winding of the vertices for every other triangle. That means that the first triangle is formed by vertices 1-2-3 in that order, the next by 2-4-3, then 3-4-5, 4-6-5, and so on. For TriangleFan, every triangle is formed by the first vertex in the array together with every two consecutive vertices, in that order. For TriangleList, every three vertices simply forms a triangle; there is no sharing of vertices between triangles.

The vertices always comes in counter clock wise order for a triangle that is front facing (which means that all triangles but the right-most in the image above are back facing, just as an example on how intuitive the OpenGl specification can be). The facing of a triangle will matter later when we rasterize it, when you may choose to only rasterize front facing or back facing triangles.

Primitive arrays may not be trimmed like vertex arrays, but it does have a Functor instance so you may fmap over it just like with vertex arrays. It also has a Monoid instance, which allow you to concatenate two PrimitiveArrays together into one using mappend. This makes it possible to create a PrimitiveArray consisting of several disjoint triangle strips, but more efficient ways of achieving that are presented in the next two sections.

Index arrays

It is common that a vertex is used by not only two consecutive triangles in a strip, but also by triangles in another strip. It would be quite wasteful to duplicate all the shared vertices data for each strip, and for this reason you can use an index array instead:

toPrimitiveArrayIndexed :: PrimitiveTopology p -> IndexArray -> VertexArray () a -> PrimitiveArray p a

Instead of forming primitives from taking consecutive vertices in a VertexArray, this function will take indices from an IndexArray and use those to pick vertices from the VertexArray. Multiple elements in the IndexArray may refer to the same vertex. The primitive topology works the same for this function, but is applied to the IndexArray instead. For example if TriangleStrip is used the first triangle is formed by the vertices referred by the first three indices, the next one is formed by the second, fourth and third index, and so on.

You create an IndexArray with

newIndexArray :: forall os b a. (BufferFormat b, Integral a, IndexFormat b ~ a) => Buffer os b -> Maybe a -> Render os IndexArray

Almost like creating a VertexArray, but the type of the elements of the Buffer you create it from is also constrained by this type family:

type family IndexFormat a where
  IndexFormat (B Word32) = Word32  
  IndexFormat (BPacked Word16) = Word16  
  IndexFormat (BPacked Word8) = Word8 

This means that indices are either Word32, Word16 or Word8. Remember that I previously told you that all buffer element types needed to be 4-byte aligned? Index arrays actually require all elements to be tightly packed, but still supports indices of type Word16 and Word8. This means that buffers of those two element types cannot be used as vertex arrays. That's why the buffer representation of Word16 and Word8 are BPacked Word16 and BPacked Word8. They work exactly like their B-counter parts, with the exception that there are no VertexInput instances for any BPacked a. VertexInput is the type class that is used when creating primitive streams from primitive arrays, which we will do in the next part of this tutorial. As you might have guessed by now, the type family IndexFormat a evaluates to the same types as the associated type HostFormat a.

Besides a buffer of indices, newIndexArray also takes a Maybe a as an argument. This is denoting an optional primitive restart index, i.e. a special index value that if encountered in the index array while assembling primitives signals that the current topology should end and the next index to be the beginning of a new topology. This makes it possible to have multiple triangle strips in a single IndexArray by just separating them with a special index, which is more efficient than concatenating multiple PrimitiveStreams using their Monoid instance.

Index arrays may be trimmed just like vertex arrays, but with the functions takeIndices and dropIndices instead. It does not have a Functor instance (that wouldn't make sense) or a Monoid instance.

Instanced primitive arrays

The last thing I'll show you in this episode is instanced primitive arrays. Imagine that you want to create a triangle mesh of a temple, where you have many identical pillars placed in a row. Instead of duplicating the triangles for each pillar, or making a single pillar primitive array that you concatenate with itself multiple times, you can create an instanced primitive array. The function looks like this:

toPrimitiveArrayInstanced :: PrimitiveTopology p -> (a -> b -> c) -> VertexArray () a -> VertexArray t b -> PrimitiveArray p c

It resembles the zipVertices function in that it takes two VertexArrays and a binary function to combine the vertices of these two arrays, but toPrimitiveArrayInstanced doesn't zip the two arrays together. Instead, it will create one primitive array of the first vertex array for each vertex in the second vertex array, and concatenate the results. In our example with temple pillars, the first array then contains the strip for a single pillar, while the second array contains a position for each pillar to instantiate, resulting in a primitive array where each vertex contains it's individual position within the pillar, as well as the instances position within the temple. You would then need a shader that combined those two positions together into the final position. This is the most efficient way to render multiple instances of the same object.

If you want to use instanced primitive arrays and indexed primitive arrays at the same time, there is a function for that too:

toPrimitiveArrayIndexedInstanced :: PrimitiveTopology p -> IndexArray -> (a -> b -> c) -> VertexArray () a -> VertexArray t b -> PrimitiveArray p c

To make instancing even more powerful, you may replicate the vertices in one array a fixed number of times each and then zip it with another array and use the result as instances in toPrimitiveArrayInstanced. E.g. you could have a vertex array with three different colors and replicate each color 5 times and then zip it with an array of 15 positions and use this as instances to our temple to get 15 pillars colored in three different shades for variation. The function you use to do this is

replicateEach :: Int -> VertexArray t a -> VertexArray Instances a

This will replicate each vertex in the argument array as many times as dictated by the first argument. Notice the Instances type in the first type parameter of the resulting VertexArray. Maybe you have noticed that this parameter has previously been () or just t. If this parameter to VertexArray is Instances then the VertexArray can only be used for instances, i.e. as last argument in a call to toPrimitiveArrayInstanced or toPrimitiveArrayIndexedInstanced. If you go back and look at the types of functions taking or returning VertexArrays above, you will see that
  • replicateEach returns a VertexArray that must be used as instances.
  • dropVertices may not be used on any VertexArray that must be used as instances.
  • zipVertices returns a VertexArray that must be used as instances if either of the input arrays must be used as instances.
I was a bit unfair just now, because that last bullet was not something you could see from looking at the function type alone, you needed this definition as well:

type family Combine t t' where
  Combine () Instances = Instances  
  Combine Instances () = Instances  
  Combine Instances Instances = Instances  
  Combine () () = ()

Once you have your PrimitiveArray, the type information whether instancing, indexing or both were used is gone. This means that you may mappend an instanced primitive array with a non-instanced, and that the shader you send a primitive array to doesn't care if it was instanced or indexed.

No code examples this time, so I'll leave as an exercise to apply what you learned this time on the example from previous part. Next time we will finally enter the Shader!


Inga kommentarer:

Skicka en kommentar