Wednesday 21 October 2015

GPU programming in Haskell using GPipe - Part 4


< Previous episode: Shaders and primitive streams

In this part I will show you how to work with textures and samplers in GPipe. But before that, we should take a closer look at the FragmentStream that was created from the rasterization process shown all the way back in part 1, since most use cases for sampling textures involve fragment streams.

FragmentStreams

To recap from part 1: The rasterize function is what creates a FragmentStream from a PrimitiveStream:
rasterize :: forall p a s os. FragmentInput a =>
  (s -> (Side, ViewPort, DepthRange)) -> PrimitiveStream p (VPos, a) -> Shader os s (FragmentStream (FragmentFormat a))

It takes a primitive stream with vertices consisting of a tuple (position, a), and creates a fragment stream of FragmentFormat a values. FragmentFormat is an associated type of the FragmentInput type class, which is yet another one of those classes that use an arrow to convert values from one domain to another. In this case, it basically only converts S V a to S F a. S F Float values are created by interpolating between the values at the primitive's corner vertices, while S F Int or S F Word values are simply copied from the last vertex of the primitive. There is also a Flat data constructor that can be used to suppress interpolation of S V Float values: they will turn into S F Float values in the fragments by just copying the last vertex's value in each primitive.

Remember that the vertices' positions are expressed as homogeneous 3D coordinates, i.e. V4 (S V Float), due to perspective correction. In perspective-correct interpolation, each S V Float value will first be divided by the position's fourth component (w), and then interpolated. The interpolated value will then be divided by the interpolated reciprocal of w. This means that if w is the same for all three vertices of a triangle, the interpolation will be a simple linear one. To circumvent perspective correction for a specific S V Float value, you can wrap it in a NoPerspective data constructor; it will then behave as if w were constant.
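
As a minimal sketch (assuming a primitiveStream :: PrimitiveStream p (V4 VFloat, V2 VFloat) where the second component is a uv-coordinate), you would wrap each component before rasterizing:
let primitiveStream' =
      fmap (\(pos, uv) -> (pos, fmap NoPerspective uv)) primitiveStream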

Textures

A texture is, just like a buffer, a storage of data on the GPU, but textures are optimized for being indexed in more than one dimension, and can also be indexed dynamically using S-values in a PrimitiveStream or FragmentStream. When sampling a value from a texture, the GPU also provides native hardware support for interpolating between multiple elements.

Just like buffers, textures are objects that we create in the ContextT monad. There are six kinds of textures in GPipe:
  • Texture1D os a
  • Texture2D os a
  • Texture3D os a
  • Texture1DArray os a
  • Texture2DArray os a
  • TextureCube os a
Just as buffers, they are parameterized on the object space, meaning they cannot escape the context. They also have an element type (a), but this is not as flexible as it is for buffers, which we'll see if we study the functions that create new textures:
newTexture2D :: forall ctx os c m. (ContextHandler ctx, TextureFormat c, MonadIO m) =>
  Format c -> Size2 -> MaxLevels -> ContextT ctx os m (Texture2D os (Format c))
(Throughout this tutorial I will mostly use 2D textures as an example, but analogous functions exist for the other texture types as well; e.g. there is a newTexture3D that takes a Size3 instead of a Size2 as argument.)

The element type of the new texture is denoted by a value of the type Format c, which is a GADT in the Graphics.GPipe.Format module. If you for instance use the format RGB8 as parameter to newTexture2D, the texture will have type Texture2D os (Format RGBFloat). From that type we can read that it contains three components (RGB) that will be represented as Float values when we sample it. Internally, each component will be stored as an 8-bit fixed point value in the range [0.0, 1.0]. If we instead had used RGB32F, the resulting texture would have had the same type, but internally would have used a 32-bit floating point value for each component, which would yield better precision but cost three times as much memory (RGB8 is most likely padded with an extra byte internally).
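
As a small sketch (inside some ContextT action; the size is chosen arbitrarily), both of these textures get the same GPipe type, and only the internal storage differs:
-- Both have type Texture2D os (Format RGBFloat):
tex8  <- newTexture2D RGB8   (V2 256 256) 1 -- 8-bit fixed point storage
tex32 <- newTexture2D RGB32F (V2 256 256) 1 -- 32-bit floating point storage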

newTexture2D also takes a size in the form of a V2 Int (aka Size2) and a max level count as an Int (aka MaxLevels). A texture contains extra levels of detail (LODs), which are lower resolution versions of the texture, where each next level of detail has half the size in every dimension of the previous level. If the maximum number of LODs is used, the last level will consist of a single texel (another word for texture element). LODs are used if the texture is sampled at a low frequency. E.g. if the texture is mapped onto an object that is far away, it would cover fewer pixels on screen than the texture contains texels, and without LODs some texels of the texture would be skipped over, resulting in aliasing in the rendered image. Most of the time not all levels are needed, which is why newTexture2D and friends take a MaxLevels argument. The number of levels is still always limited by the size of the texture, and this maximum is what you get if you use maxBound for the MaxLevels argument. If you don't want extra levels of detail, you use 1 as the MaxLevels argument. The documentation for DirectX has some nice pictures of what the LODs look like for all the different texture types (just look at the pictures and don't bother too much with the details of the text, since it is a bit DirectX specific).

You can use texture2DLevels (and similar functions for other texture types) to retrieve the number of levels a texture was created with. There is also a texture2DSizes that returns a list of all the LODs' sizes, starting with the largest size (i.e. the size that was provided to the newTexture2D function).

Each lower LOD is usually a downscaled version of the previous one, but this is not enforced by GPipe. Instead you can write each LOD individually, using functions like writeTexture2D or writeTexture2DFromBuffer:
writeTexture2D :: forall b c h ctx os m. (ContextHandler ctx, MonadIO m, BufferFormat b, ColorSampleable c, BufferColor (Color c (ColorElement c)) h ~ b, h ~ HostFormat b) => Texture2D os (Format c) -> Level -> StartPos2 -> Size2 -> [h] -> ContextT ctx os m ()
writeTexture2DFromBuffer :: forall b c h ctx os m. (ContextHandler ctx, MonadIO m, BufferFormat b, ColorSampleable c, BufferColor (Color c (ColorElement c)) (HostFormat b) ~ b) => Texture2D os (Format c) -> Level -> StartPos2 -> Size2 -> Buffer os b -> BufferStartPos -> ContextT ctx os m ()

The Level argument in both of these selects which level to write to, using a zero based index (i.e. 0 selects the level of highest resolution). The next two arguments are a start position and a size within this level to write to.

For writeTexture2D, you then provide a list of h values, where h is constrained by this little riddle: BufferFormat b, ColorSampleable c, BufferColor (Color c (ColorElement c)) h ~ b, h ~ HostFormat b. In short, h is the second parameter in any of the equations in the BufferColor type family where the first argument matches the texture's element type representation. E.g. for Format RGBFloat we may use any of the BufferColor equations that have V3 Float as their first parameter, so in that case a [V3 Float] or a [V3 Word8] or any of the 7 others would work.

For writeTexture2DFromBuffer, you provide a Buffer instead of a list of values. This buffer needs to have an element type that corresponds to one of the results of the BufferColor equations where the first argument matches the texture's element type representation. In the Format RGBFloat example, a buffer of type Buffer os (B3 Float) or Buffer os (Normalized (B3 Word8)), among others, would work. For writeTexture2DFromBuffer, you also need to provide a zero based start index in the buffer to start reading from (to achieve the same effect for writeTexture2D you just use drop on the list you provide).

Most of the time you only write LOD level 0 using one of the writeTextureXXX functions, and then generate the other levels. For this you simply call the generateTexture2DMipmap action (and friends).
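
Putting those two steps together, a minimal sketch (assuming tex :: Texture2D os (Format RGBFloat) was created with size V2 256 256 and more than one level, and pixels :: [V3 Float] holds 256*256 values):
writeTexture2D tex 0 (V2 0 0) (V2 256 256) pixels -- write full-resolution level 0
generateTexture2DMipmap tex                       -- derive all lower-resolution levels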

A texture has two purposes in GPipe: To be sampled from and/or to be drawn to. The next part of this tutorial will be devoted to the latter, so let's look at sampling now.

Samplers

A texture can be sampled dynamically using S-values from a PrimitiveStream or FragmentStream. But in order to do this, we need to create a sampler from that texture, which in addition to the texture also has a filtering and an edge mode setting. You create a sampler inside the Shader monad with any of the newSamplerXXX functions. Here's what the 2D version looks like:
newSampler2D :: forall os s c. ColorSampleable c =>
  (s -> (Texture2D os (Format c), SamplerFilter c, (EdgeMode2, BorderColor c))) -> Shader os s (Sampler2D (Format c))

It gets all its arguments from the shader environment: a texture, a filter and a tuple of an edge mode and a border color. Let's look at the filter first.

The SamplerFilter c type is a GADT with two constructors:
data SamplerFilter c where
  SamplerFilter :: (ColorElement c ~ Float) => 
                     MagFilter -> MinFilter -> LodFilter -> Anisotropy -> SamplerFilter c
  SamplerNearest :: SamplerFilter c

If you have a texture with a Float component format (e.g. RGB8 or RGB32F) then you may use either of the two constructors, but if your texture has Int or Word components you may only use the SamplerNearest constructor. This will effectively use no filtering at all: at each sample, the LOD that most closely matches the sampling frequency will be used, and the nearest texel of that LOD will be picked. The sampling frequency is calculated from how far apart adjacent fragments sample the texture. If the texture is sampled in a primitive stream instead, there is no notion of sampling frequency, and as we will soon see, the types of GPipe will in that case require us to either pass in explicit sampling gradients or explicitly select a LOD.

When you use SamplerFilter on the other hand you have more options. For each sample you make, there will always be one or two closest LODs based on the sampling frequency, and within each LOD there will be two closest texels per dimension to choose from (e.g. for a 2D texture there will be 4 texels to choose from and for 3D there will be 8). MagFilter, MinFilter and LodFilter are all type synonyms for the enum Filter. The MagFilter is used when the texture is sampled at a too high frequency (i.e. the texture covers more pixels on screen than the lowest LOD has texels). Using Nearest in this case will simply pick the closest texel of LOD 0, while Linear will use linear interpolation of the closest texels. When sampling at a lower frequency, the MinFilter will be used in conjunction with the LodFilter. If MinFilter is Nearest, a single texel will be picked from each of the two closest LOD levels, while if it is Linear, a value will be linearly interpolated within each of the levels. The LodFilter then dictates whether one of the two levels' values will simply be picked, or whether the result should be interpolated from both of them.

The last argument to SamplerFilter is Anisotropy, which is a type synonym for Maybe Float. Anisotropic filtering is actually not in the core OpenGl 3.3 specification but is defined in an extension, in fact the only one GPipe uses besides core functionality. It is useful when the sampling frequency varies between the fragment's x and y axes (e.g. when sampling the texture on a surface that slopes with respect to the view, like the walls when looking down a corridor). In this case a too high LOD (i.e. of too low resolution) will be selected for one of the axes, with unnecessary blurring as a result. When using Just a for Anisotropy, up to a samples will instead be taken along the line of anisotropy and the average will be returned as the final sample value. Using Nothing for Anisotropy is equivalent to Just 1.0. I don't know why a is a Float, but that's how OpenGl likes it.

The last thing we need to create a sampler is a tuple (EdgeMode2, BorderColor c), where EdgeMode2 is simply a type synonym for V2 EdgeMode. EdgeMode is used to instruct the sampler what to do if a coordinate outside the texture region is provided, and we can give different instructions for each dimension of the texture, hence an EdgeMode2. For a 3D texture, you would provide an EdgeMode3 instead. EdgeMode is an enum with four possible values: Repeat, Mirror, ClampToEdge or ClampToBorder. Repeat will make the texture tile in the given dimension, while Mirror makes every other tile mirrored. ClampToEdge will clamp all sample coordinates to the range of the texture, so it simply returns the first or last value of the texture in the given dimension. ClampToBorder will make any sample outside the texture in the given dimension return a predefined constant color instead. This color is defined by the second element of the tuple, and if neither of the dimensions uses the edge mode ClampToBorder, the border color can be given as undefined.
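
As a sketch inside some Shader action (assuming tex :: Texture2D os (Format RGBFloat) is in scope), here is a sampler that tiles in x and clamps to a red border in y; since ClampToBorder is used, an actual border color must be given:
samp <- newSampler2D $ const
  ( tex
  , SamplerFilter Linear Linear Linear Nothing
  , (V2 Repeat ClampToBorder, V3 1 0 0) -- border color: red
  )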

Since colors like the border color are such a central concept of GPipe, they deserve a section of their own:

Colors

A texture, as well as the window surface our context owns, can have a variable number of color channels. A texture can also have different types of channels: Float, Int or Word (for the window's surface the channel type is always Float). In plain old OpenGl, accidentally using a float border color with an int texture, for example, would result in an error. In GPipe on the other hand, these invariants are enforced through types, and your program will always work once it compiles.

The BorderColor c type we just saw is a type synonym that expands to Color c (ColorElement c). Several other similar type synonyms exist in GPipe:
type BorderColor f = Color f (ColorElement f)
type ColorSample x f = Color f (S x (ColorElement f))
type FragColor f = Color f (S F (ColorElement f))
type ColorMask f = Color f Bool

They are all combinations of the associated types Color f a and ColorElement f from the ColorSampleable type class, where f is one of the types used as parameter to the Format f GADT. E.g. the format RGB8 had the type Format RGBFloat, so in this case f would be RGBFloat.

Here are all the instances of the ColorSampleable type class and their associated types:

f              Color f a   ColorElement f
RGBAFloat      V4 a        Float
RGBAInt        V4 a        Int
RGBAWord       V4 a        Word
RGBFloat       V3 a        Float
RGBInt         V3 a        Int
RGBWord        V3 a        Word
RGFloat        V2 a        Float
RGInt          V2 a        Int
RGWord         V2 a        Word
RFloat         a           Float
RInt           a           Int
RWord          a           Word
Depth          a           Float
DepthStencil   a           Float

For an RGB8 texture, where f is RGBFloat, we get BorderColor RGBFloat = V3 Float, ColorSample x RGBFloat = V3 (S x Float), FragColor RGBFloat = V3 (S F Float) and ColorMask RGBFloat = V3 Bool.

Now if you look back at the type of writeTexture2D, and in particular the constraint BufferFormat b, ColorSampleable c, BufferColor (Color c (ColorElement c)) h ~ b, h ~ HostFormat b, you'll understand why we looked for equations in BufferColor where the first parameter was V3 Float for RGB8 textures.

Outside the shader you can use texture2DSizes and friends to get the sizes of all of a texture's LODs. Inside the shader you can instead get the size of one of a sampler's LODs with sampler2DSize (and similar functions for other texture types).

Sampling

Let's put the sampler we just created to use!

{-# LANGUAGE ScopedTypeVariables, PackageImports, TypeFamilies #-}
module Main where

import Graphics.GPipe
import qualified "GPipe-GLFW" Graphics.GPipe.Context.GLFW as GLFW
import "lens" Control.Lens
import Control.Monad (unless)
import Data.Word (Word32)
import Control.Applicative (pure)

main =
  runContextT GLFW.defaultHandleConfig $ do
    win <- newWindow (WindowFormatColor RGB8) (GLFW.defaultWindowConfig "Checkers")
    vertexBuffer :: Buffer os (B2 Float) <- newBuffer 4
    writeBuffer vertexBuffer 0 [V2 0 0, V2 1 0, V2 0 1, V2 1 1]
    tex <- newTexture2D R8 (V2 8 8) 1
    let whiteBlack = cycle [minBound,maxBound] :: [Word32]
        blackWhite = tail whiteBlack
    writeTexture2D tex 0 0 (V2 8 8) (cycle (take 8 whiteBlack ++ take 8 blackWhite))
    shader <- compileShader $ do
      primitiveStream <- toPrimitiveStream id
      let primitiveStream2 = fmap (\pos2d -> (make3d pos2d, pos2d)) primitiveStream
      fragmentStream <- rasterize (const (FrontAndBack, ViewPort (V2 0 0) (V2 500 500), DepthRange 0 1)) primitiveStream2
      let filter = SamplerFilter Nearest Nearest Nearest Nothing
          edge = (pure Repeat, undefined)
      samp <- newSampler2D (const (tex, filter, edge))
      let sampleTexture = pure . sample2D samp SampleAuto Nothing Nothing
          fragmentStream2 = fmap sampleTexture fragmentStream
      drawWindowColor (const (win, ContextColorOption NoBlending (pure True))) fragmentStream2

    renderLoop win $ do
      clearWindowColor win 0.5
      vertexArray <- newVertexArray vertexBuffer
      shader (toPrimitiveArray TriangleStrip vertexArray)

make3d (V2 x y) = projMat !*! viewMat !* V4 x y 0 1
  where
    viewMat = lookAt' (V3 1 2 2) (V3 0.5 0.5 0) (V3 0 1 0)
    projMat = perspective (pi/3) 1 1 100

renderLoop win rendering = do
  render rendering
  swapWindowBuffers win
  closeRequested <- GLFW.windowShouldClose win
  unless (closeRequested == Just True) $
    renderLoop win rendering

-- Copy of lookAt from linear with normalize replaced with signorm
lookAt' eye center up =
  V4 (V4 (xa^._x) (xa^._y) (xa^._z) xd)
     (V4 (ya^._x) (ya^._y) (ya^._z) yd)
     (V4 (-za^._x) (-za^._y) (-za^._z) zd)
     (V4 0     0     0     1)
  where za = signorm $ center - eye
        xa = signorm $ cross za up
        ya = cross xa za
        xd = -dot xa eye
        yd = -dot ya eye
        zd = dot za eye

This simple example will create a small 8x8 texture and give it a checkers pattern. We create a square from 4 vertices, and use a rotation and a perspective projection matrix to transform it into our view, with this result:



I have highlighted the expression that samples the texture in the example above: sample2D samp SampleAuto Nothing Nothing. It evaluates to a function of type V2 (S F Float) -> ColorSample F RFloat, which simplifies to V2 (S F Float) -> S F Float. The input, commonly referred to as a uv-coordinate, is a 2D coordinate (since it's a 2D texture) where (0.0, 0.0) maps to the top-left corner of the top-left-most texel in the texture, and (1.0, 1.0) maps to the bottom-right corner of the bottom-right-most texel.

In addition to the parameters we gave when creating the sampler, you also give a number of parameters at sampling time to the sample2D function (and its five siblings for the other texture types). Specifically, you can specify a SampleLod, a SampleProj and a SampleOffset value.

SampleLod controls which LOD should be used and is defined like this:
data SampleLod vx x where
  SampleAuto :: SampleLod vx F
  SampleBias :: FFloat -> SampleLod vx F
  SampleLod :: S x Float -> SampleLod vx x
  SampleGrad :: vx -> vx -> SampleLod vx x

SampleAuto and SampleBias are only available if you are sampling inside a fragment stream; in a vertex stream you have to resort to one of the last two. SampleAuto will simply select a LOD based on the sampling distance between this and adjacent fragments. This requires you not to do the sampling inside a branch (such as ifB), or the sampling distance to adjacent fragments may become undefined and the wrong LOD may be used. SampleBias lets you add a bias to select a higher LOD (i.e. one with lower resolution) than the automatically selected one. This enables you to create cheap blurring effects (the higher the bias, the more blur). SampleLod simply picks a LOD without any automatic calculations, and with SampleGrad you provide the gradients for the LOD calculation yourself. If you really need to sample inside a branch, you can actually get the gradients for the coordinate using the functions dFdx and dFdy. These two functions have the same limitation in that they need to be evaluated outside any branching, or they too will return undefined values.
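
As a hedged sketch of that last trick (assuming uv :: V2 FFloat and cond :: FBool are in scope, and reusing samp from the checkers example above): compute the gradients before the branch, then sample with SampleGrad inside it.
let du = fmap dFdx uv -- gradients taken outside the branch
    dv = fmap dFdy uv
    color = ifB cond
              (sample2D samp (SampleGrad du dv) Nothing Nothing uv)
              0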

The SampleProj x type is a synonym for Maybe (S x Float). If you give a Just value as this parameter, the input coordinate's components will be divided by this value. This might seem like an odd addition to the API, but the hardware is already doing divisions behind the curtains for perspective correction, so you get this extra division "for free" since it can just be multiplied into the existing denominator. You can use SampleProj to zoom the texture, for instance. Using Nothing for SampleProj is equivalent to Just 1.
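
A tiny sketch (again assuming uv :: V2 FFloat and some w :: FFloat in scope): this samples at uv divided by w, without any explicit division in the shader code.
let c = sample2D samp SampleAuto (Just w) Nothing uv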

The last parameter to sample2D besides the actual sampling coordinate is SampleOffset2 x, which is just a type synonym for Maybe (V2 Int). This lets you add a constant (note the type: Int and not S x Int) offset in texels to any sample made. The offset has to stay within certain hardware dependent limits, in practice within [-8,7].



For a texture that has a Format Depth or Format DepthStencil format, sampling will return the depth value as a float in the range [0,1]. A common use case for depth textures is shadow maps, where you instead want to compare the depth value with a reference. For this, you can create a specialized shadow sampler with newSampler2DShadow and friends. These actions also require a ComparisonFunction that will be used when sampling. Sampling with shadow samplers is not done with the usual sample2D function; instead you use sample2DShadow (and similarly for other texture types). These shadow sample functions also take a reference value as input, and will compare the texels of the texture with this reference value, using the ComparisonFunction the sampler was created with, resulting in a value of 0 or 1 for each texel. These will then be sampled with the sampler's filter to produce a final float value in the range [0,1], with lower values being "more false" and higher ones "more true".



Before we end this chapter, you should play around with the different parameters I've presented. Try replacing (make3d pos2d, pos2d) with (make3d pos2d, fmap NoPerspective pos2d) to see why we need perspective-correct interpolation of the uv-coordinates. What would a Linear magnifying filter look like? In vertexBuffer, you can change each element component with value 1 to value 2 and then try other edge modes than pure Repeat. What if you specify different edge modes for x and y? Remember to replace undefined with an actual color if you try ClampToBorder.



Thursday 1 October 2015

GPU programming in Haskell using GPipe - Part 3


< Previous episode: Buffers and arrays

Welcome to the Shader!

Finally! This is where the real fun happens! In the previous episode we learned many different ways to create a PrimitiveArray. Remember from the first episode that this PrimitiveArray could be turned into a PrimitiveStream inside the Shader monad? Well, that's what we will do next.

Primitive streams

A quick recap from episode 1: This is how you create a PrimitiveStream: 
toPrimitiveStream :: forall os f s a p. VertexInput a => (s -> PrimitiveArray p a) -> Shader os s (PrimitiveStream p (VertexFormat a))

This is a Shader action that uses a function to get a PrimitiveArray from the shader environment, and then returns a PrimitiveStream. There are two things to remember about Shader actions:
  1. Whatever is returned from a Shader action can never leave the Shader monad. The only thing you can do with a Shader monad is give it to compileShader, and that one expects a Shader os s (), i.e. it doesn't accept any return values.
  2. Whenever you see a Shader action wanting a function to retrieve some value x from the shader environment (in the toPrimitiveStream case, x is a primitive array), then x is something that may vary with different invocations of the shader without requiring another call to compileShader.
In the transition from array to stream, our vertices will turn from type a to type VertexFormat a. This is very similar to how buffers worked: there is a VertexInput type class that defines what may be turned into vertices in primitive streams:

class BufferFormat a => VertexInput a where
  type VertexFormat a
  toVertex :: ToVertex a (VertexFormat a)

toVertex is an arrow action, just as toBuffer was. If you want to create your own instance of VertexInput you define your toVertex member in terms of the toVertex members of the GPipe provided instances. Let's look at some examples of these instances:
a                       VertexFormat a
B Float                 S V Float
B Int32                 S V Int
B Word32                S V Word
Normalized (B Int32)    S V Float
Normalized (B Word32)   S V Float
B2 Float                V2 (S V Float)
B2 Int32                V2 (S V Int)
B2 Word32               V2 (S V Word)
B2 Int16                V2 (S V Int)
B2 Word16               V2 (S V Word)
Normalized (B2 Int32)   V2 (S V Float)
Normalized (B2 Word32)  V2 (S V Float)
Normalized (B2 Int16)   V2 (S V Float)
Normalized (B2 Word16)  V2 (S V Float)
(a, b)                  (VertexFormat a, VertexFormat b)
V2 a                    V2 (VertexFormat a)
There are many more, including B3, B4 and larger tuples. See the full list on hackage.

Almost looks like a straight copy of the table in the previous episode, doesn't it? One important difference between the BufferFormat class and the VertexInput class is that the former was defined on the resulting type of the conversion and had an associated type for the originating type (HostFormat), while VertexInput is instead defined on the originating type and uses an associated type for the resulting type (VertexFormat). So in the table above, a B Float will turn into an S V Float and not the other way around.
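
To make the custom-instance remark above concrete, here is a hedged sketch of what such an instance could look like. PosNormal is a hypothetical type, and since VertexInput has BufferFormat as a superclass we define both instances, composing the GPipe-provided arrows with proc notation:
{-# LANGUAGE Arrows, TypeFamilies #-}
import Control.Arrow (returnA)
import Graphics.GPipe

-- A hypothetical vertex type with a position and a normal.
data PosNormal = PosNormal (B3 Float) (B3 Float)

instance BufferFormat PosNormal where
  type HostFormat PosNormal = (V3 Float, V3 Float)
  toBuffer = proc ~(p, n) -> do
    p' <- toBuffer -< p
    n' <- toBuffer -< n
    returnA -< PosNormal p' n'

instance VertexInput PosNormal where
  type VertexFormat PosNormal = (V3 VFloat, V3 VFloat)
  toVertex = proc ~(PosNormal p n) -> do
    p' <- toVertex -< p
    n' <- toVertex -< n
    returnA -< (p', n')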

S x a is what is called a lifted type, and represents an a in an x "setting" (to avoid using the word "context" that already has a distinct meaning in GPipe). The setting in this case is V, which you might have guessed stands for "Vertex". There is one other setting in GPipe and that is F that stands for Fragment.

You use an S x a almost just as you would use a normal a: For example S V Float has Num, Floating and Fractional instances so you can do things like add, divide or take sin of them.

Ints and Words lose their specific size when entering the world of shaders, so e.g. both Int16 and Int32 become just Int in an S-type.

There is a Normalized newtype wrapper that you can put an integral B-value in (when fmaping on the PrimitiveArray) to make it become a lifted floating value. The GPU will do this by mapping a signed (i.e. IntXX) integral's range [minBound, maxBound] to the floating point range [-1.0, 1.0] on the shader side, and an unsigned's (i.e. WordXX) range to [0.0, 1.0].

Buffer vector types (such as B2) will turn back into their normal V-vector form (e.g. V2), but with the elements converted to S-values. Normal V-vectors may also be used as input, and will become the same V-vector of S-values on the shader side. The difference is that converting a V2 Int32, for instance, would take two shader attribute slots while converting a B2 Int32 would only take one. All graphics hardware has a limit on how many attribute slots can be used at once, and GPipe will detect if this limit is exceeded and throw an exception from the compileShader call in that case. To help stay under that limit, prefer using B-vectors instead of V-vectors in buffers and primitive arrays when possible. Actually, the topic of error handling in GPipe is so important that it deserves a section of its own:

Error handling in GPipe

There are three levels of errors you may encounter when working with GPipe:
  1. Type check errors
    This is the whole idea of GPipe: GHC will tell you when you try to do something that is statically known to not work (something that usually would be a runtime error when working with plain OpenGl). Most invariants in GPipe are checked this way.
  2. Runtime GPipeExceptions
    Whenever a hardware dependent limit is reached, like number of attributes used by a PrimitiveArray, a GPipeException is thrown. You may always catch one of those to maybe show a user friendly error message and/or fallback to a simpler solution.
  3. Runtime error calls
    For operations that involve indexing, e.g. writeBuffer, an error may be thrown if you use an index that is out of bounds (just as the !! operator on a list would). Since these kinds of errors are not hardware dependent and would always be thrown, they are considered programmer errors, and as such are not guaranteed to be catchable in a safe way (e.g. the interior OpenGl state might be garbled after this).

Enough of that, back to the shader

So now we have a PrimitiveStream with vertices built up from S-values. A PrimitiveStream is a Functor and a Monoid, just as PrimitiveArray was, so you can fmap functions over its vertices and mappend several streams into one. But since your vertices are made of S-values and not B-values, you can actually do computations on them that will be run as shader code on the GPU! Let's try it out! In the "Hello world" program you did in part 1 of this tutorial, add these two lines to your shader:
      let primitiveStream2 = fmap (\(pos,clr) -> (pos - V4 1 1 0 0, clr / 10)) primitiveStream  
      let primitiveStream3 = primitiveStream `mappend` primitiveStream2  

Also replace the last argument to rasterize from primitiveStream to primitiveStream3. Rebuild and you should see this picture:


It looks almost like the original one, but on top of the old triangle you see a new one, translated exactly 250 pixels down and to the left, and with one tenth of the brightness of the original triangle. Why 250 pixels? Remember from part 1 that the canonical view space is defined as [(-1,-1,-1), (1,1,1)], i.e. a cube of size 2, and since we used a viewport of size 500, a move of 1 unit translates to 250 pixels.

Why was the new triangle drawn on top of the old? Swap the operands to mappend and you'll see: the order of primitives in a PrimitiveStream (and also of the fragments in a FragmentStream) matters. Since we didn't use any depth or stencil tests when drawing (I'll explain what those are later as well), the new triangle ended up on top because it was drawn later.

If you knew OpenGl from before, you might be interested in what the GLSL shader that drew these triangles looks like. Actually, there wasn't one shader but two! This is where GPipe starts showing its modularity and composability compared to plain GLSL code. Let's take it even further and add two more lines to the shader:
      let rotationMatrix a = V4 (V4 (cos a) (-sin a) 0 0)   
                                (V4 (sin a) (cos a) 0 0)  
                                (V4 0    0    1 0)  
                                (V4 0    0    0 1)  
      let primitiveStream4 = fmap (first (rotationMatrix (-0.2) !*)) primitiveStream3   

And don't forget to change the rasterize argument to primitiveStream4 now. When running, both triangles should have been rotated 0.2 radians clockwise around the origin:


(You see the old bright triangle getting clipped on top and right because we use a view port that is smaller than the actual window.)

As you can see, in GPipe there is nothing stopping you from fmaping functions over streams more than once. How many GLSL shaders has GPipe created behind the curtains in this case? Still just two! If you are interested in what the shaders GPipe generates look like and use Windows, I can recommend the tool RenderDoc (a Linux UI is also planned). With the latest version you can inspect any OpenGl 3.2+ application, and watch all API calls, all data and all GLSL shader source code.

A note on performance: combining PrimitiveArrays with mappend usually performs better than combining two PrimitiveStreams, so unless you need to fmap different functions on them, do the mappend before they turn into streams with toPrimitiveStream, as in the sketch below.
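
A minimal sketch (where arrayA and arrayB are hypothetical accessors that retrieve two primitive arrays from the shader environment s):
-- One combined stream; the mappend happens on the arrays, not the streams.
primitiveStream <- toPrimitiveStream (\s -> arrayA s `mappend` arrayB s)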

Linear algebra in GPipe

If you've never done any kind of 3D programming before, that last rotation code might be all new to you. In that case I'm afraid you'll have to pick up some basic linear algebra before venturing on, since it is essential for any graphics programming. E.g. you need linear algebra in your PrimitiveStream to transform your vertices from object space and world space to the canonical view space that the rasterizer expects. Linear algebra and 3D math are however out of scope for this tutorial. This seems to be becoming a pretty good online book on the subject (at least I have enjoyed Akenine-Möller's other books before), but at the time of writing it is not complete yet. If any reader has suggestions on good 3D math tutorials, then please share those in the comments field below!

Linear algebra in GPipe is preferably done with the linear package, since support for its vector types is built into GPipe. If you import the Graphics.GPipe module, you will even get all of linear imported for free!

When doing 3D math, there are a number of things that can be done in two different ways:
  • Coordinate systems can be right handed or left handed
  • Transform matrices can be made to be multiplied with vectors from left or from right
  • Matrices can be defined in row major or column major order
The linear package mostly mimics what was the convention in early versions of OpenGl: it uses a right-handed system before perspective or orthographic projection (you look down negative z) but a left-handed one after (the far plane gets z = +1 and the near plane z = -1), and you multiply matrices returned from linear's functions, such as mkTransformation, from the right. Unlike OpenGl, however, matrices in the linear package are defined in row-major order (which will require an extra transpose when building base matrices from left, up and forward vectors).

Just remember that these conventions are properties of the functions in the linear package, and you are free to ditch them in favor of whatever linear algebra functions you like (the underlying OpenGl really has no preferences).

Uniforms

In the example above, we rotated the two triangles clock wise 0.2 radians. What if we want to increase the angle every frame? We can't make the angle parametric since the shader is precompiled, and there's no way to get an arbitrary value from the shader environment (there is no ask method, remember). Instead, you use uniforms!

A uniform in GPipe is a shader value that you get from a single buffer element. Uniforms in GPipe are built upon something that is called Uniform Buffer Objects (UBOs) in OpenGl, but don't worry if you don't know what that is. You get a uniform from a buffer with this shader action:
getUniform :: forall os s b x. UniformInput b => (s -> (Buffer os (Uniform b), Int)) -> Shader os s (UniformFormat b x)

Uh oh, another type class with another associated type? Yep, I told you that pattern was going to occur several times in GPipe. I think you know how that works by now, so I'll just say it works almost exactly like VertexInput, except there are no instances for Ints and Words smaller than 32 bits. Check out the haddock for it here.

getUniform is one of those Shader functions that take a shader environment function as argument. In this case, that function is used to retrieve a tuple of a Buffer and an index into that buffer. The element type of the Buffer has to be Uniform b (where b is an instance of UniformInput). The reason for this is that uniforms have very restrictive alignment requirements, and by creating a buffer with the element type wrapped in the newtype wrapper Uniform, you instruct GPipe to use that alignment. A buffer of this kind can still be used to create VertexArrays if you really want to (you can just remove the Uniform wrapper with an fmap on the VertexArray), but keep in mind that a buffer created with Uniform will have a quite large element alignment (usually 128 bytes; it's hardware dependent), and hence large padding for each element.

The uniform value you get with getUniform is just like the vertices in your PrimitiveStream built up from S-values, but it isn't bound to a stream of its own and can be used when fmaping with any other stream instead. Let's try it out:

{-# LANGUAGE ScopedTypeVariables, PackageImports, TypeFamilies, FlexibleContexts #-}

module Main where

import Graphics.GPipe
import qualified "GPipe-GLFW" Graphics.GPipe.Context.GLFW as GLFW
import Control.Monad (unless)
import Data.Monoid
import Control.Arrow (first)

main =
  runContextT GLFW.defaultHandleConfig $ do
    win <- newWindow (WindowFormatColor RGB8) (GLFW.defaultWindowConfig "Uniforms")
    vertexBuffer :: Buffer os (B4 Float, B3 Float) <- newBuffer 3
    writeBuffer vertexBuffer 0 [ (V4 (-1) 1 0 1, V3 1 0 0)
                               , (V4 0 (-1) 0 1, V3 0 1 0)
                               , (V4 1 1 0 1, V3 0 0 1)
                               ]

    uniformBuffer :: Buffer os (Uniform (B Float)) <- newBuffer 1

    shader <- compileShader $ do
      primitiveStream <- toPrimitiveStream id
      let primitiveStream2 = fmap (\(pos,clr) -> (pos - V4 1 1 0 0, clr / 10)) primitiveStream
      let primitiveStream3 = primitiveStream `mappend` primitiveStream2
      let rotationMatrix a = V4 (V4 (cos a) (-sin a) 0 0)
                                (V4 (sin a) (cos a) 0 0)
                                (V4 0    0    1 0)
                                (V4 0    0    0 1)
      uniform <- getUniform (const (uniformBuffer,0))
      let primitiveStream4 = fmap (first (rotationMatrix uniform !*)) primitiveStream3
      fragmentStream <- rasterize (const (FrontAndBack, ViewPort (V2 0 0) (V2 500 500), DepthRange 0 1)) primitiveStream4
      drawWindowColor (const (win, ContextColorOption NoBlending (V3 True True True))) fragmentStream

    loop vertexBuffer shader win uniformBuffer 0

loop vertexBuffer shader win uniformBuffer angle = do
  writeBuffer uniformBuffer 0 [angle]
  render $ do
    clearWindowColor win (V3 0 0 0)
    vertexArray <- newVertexArray vertexBuffer
    let primitiveArray = toPrimitiveArray TriangleList vertexArray
    shader primitiveArray
  swapWindowBuffers win

  closeRequested <- GLFW.windowShouldClose win
  unless (closeRequested == Just True) $
    loop vertexBuffer shader win uniformBuffer ((angle+0.1) `mod''` (2*pi))

(The additions for uniforms are the creation of uniformBuffer, the getUniform call in the shader, and the writeBuffer call in loop.)

First we create an additional buffer with just one element that we name uniformBuffer. We then get the value from this buffer as a uniform in the shader (using const since we always want to use this buffer and index 0). Last, we send uniformBuffer to the loop function along with an angle value so that we can write the angle value to that buffer each iteration. When we render our shader with shader primitiveArray, it will use the current value of the buffer that we just wrote. We then increment the angle value for the next recursion of the loop, with a constantly rotating image as result.

Working with lifted S-values

To avoid numerical instability after several laps, I used mod'' on the incremented angle to make it wrap back to 0 each lap. This function is a method of the Real' type class, defined in the Graphics.GPipe.Expr module. This type class, as well as the type classes Convert, Integral' and FloatingOrd, are provided since the prelude versions are too tied to non-lifted real world types (e.g. the prelude Integral type class has a toInteger method that couldn't have been defined for S-values). These GPipe-specific type classes also have instances for normal types such as Float and Double, which is why we could use mod'' in our example above.

Boolean operations in the Prelude are particularly unsupportive of lifted types. For example, ==, < and && all return non-lifted Bools. To alleviate this, GPipe uses the Boolean package, which provides type classes that are parametric in the type of boolean, so that we can use S V Bool in conditional functions. All lifted versions of operators have a * suffix and all lifted versions of functions have a B suffix. Boolean also provides an ifB function that works the same way as a normal if expression but on lifted booleans (just without the then and else syntax). Here's an example of a function that uses conditionals on S-values:
f :: S V Float -> S V Float -> S V Float
f x y = ifB (x <* y &&* x /=* 3) (x * 2) (minB y 3)

Besides ifB provided by Boolean, GPipe also defines three other ways of doing conditional branching:
ifThen :: forall a x. ShaderType a x => S x Bool -> (a -> a) -> a -> a
ifThenElse :: forall a b x. (ShaderType a x, ShaderType b x) => S x Bool -> (a -> b) -> (a -> b) -> a -> b
ifThenElse' :: forall a x. ShaderType a x => S x Bool -> a -> a -> a

They don't seem to bring much more than an extra constraint compared to ifB, and they are indeed almost functionally equivalent, except in rare cases when using implicit derivatives in a FragmentStream; I'll cover that in the next part of the tutorial. Another difference is that these functions in many cases generate more efficient shader code for the GPU than ifB. The rule of thumb is to use the most specialized if-function whenever possible, and basically only use ifB if you want a generic function that works on normal non-lifted values as well.
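
For comparison, the function f above could be written with the more specialized ifThenElse' like this:
f :: S V Float -> S V Float -> S V Float
f x y = ifThenElse' (x <* y &&* x /=* 3) (x * 2) (minB y 3)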

The ShaderType constraint is also used by this function:
while :: forall a x. ShaderType a x => (a -> S x Bool) -> (a -> a) -> a -> a

You can't use a regular recursive function to express loops with S-values, since you cannot select a base case to stop the recursion based on a lifted S x Bool value. Instead, you can use this while function, which takes a conditional function, a transformation function and an initial value. It then applies the transformation repeatedly as long as the conditional function returns true. This loop runs on the GPU when the shader is executed, so if the conditional never returns false your computer might hang (unless you have GPU hang detection like Windows, which in most cases just kills your application). Here's an example of a function that multiplies a float value by 0.9, x number of times:
f :: S V Int -> S V Float -> S V Float
f x y = snd $ while ((<* x) . fst) (\(i,n) -> (i+1, n*0.9)) (0, y)

You may create your own instances of the ShaderType type class if you have a type whose structure can be mapped to any of the existing instances'. Here is an example of such an instance:
{-# LANGUAGE FlexibleInstances, MultiParamTypeClasses, UndecidableInstances #-}
data MyType x = MyType (S x Float) (S x Int)

instance ShaderType (MyType x) x where
  type ShaderBaseType (MyType x) = ShaderBaseType (S x Float, S x Int)
  toBase x ~(MyType f i) = toBase x (f, i)
  fromBase x b = let (f,i) = fromBase x b in (MyType f i)

One thing to remember is to make toBase completely lazy, using tilde (~) in pattern matches and so on. And yeah, UndecidableInstances is currently needed for your own instances of ShaderType, sorry about that!

Combining Shader monads

Before we end this part of the tutorial, I just want to show you some cool stuff about the Shader monad.

First of all, in order to be able to reuse Shader actions in many different settings we need a way to change the shader environment. That is done with this combinator:
mapShader :: (s -> s') -> Shader os s' a -> Shader os s a

With mapShader you provide a function that extracts a sub environment from another environment, and a Shader monad that operates in that sub environment that then will be turned into a Shader that operates in the other environment.

The Shader monad is not only a monad, it is also a MonadPlus (and an Alternative, which is basically the same thing). This enables us to create alternative branches with mplus (or <|>), and to discriminate among the branches with the guard function from Control.Monad. The leftmost branch without a guard False action will be the one that is run. (A Shader where all branches have guard False actions will throw an error.) This is all kind of cool, but what is even cooler is this:
guard' :: (s -> Bool) -> Shader os s ()

It looks almost exactly like guard, except that instead of a Bool it takes a function that retrieves a Bool from the shader environment. This enables us to select among branches at shader run time instead of at shader compile time! You could for instance have a bool in the shader environment to denote whether we want to render shadows or not, and be able to toggle that at runtime, as in the sketch below.
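
A hedged sketch of that idea (Env, envShadows, shadowPass and plainPass are all hypothetical names; the placeholder bodies stand in for real rendering code):
import Control.Applicative ((<|>))
import Graphics.GPipe

-- A hypothetical shader environment with a toggle for shadows.
data Env = Env { envShadows :: Bool }

-- Selects a branch at shader run time; toggling needs no recompilation.
renderShader :: Shader os Env ()
renderShader = (guard' envShadows >> shadowPass) <|> plainPass
  where
    shadowPass = return () -- placeholder for a real shadow-rendering pass
    plainPass  = return () -- placeholder for rendering without shadows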

Derived from mapShader and guard', GPipe defines these two useful shader combinators:
maybeShader :: (s -> Maybe s') -> Shader os s' () -> Shader os s ()
chooseShader :: (s -> Either s' s'') -> Shader os s' a -> Shader os s'' a -> Shader os s a

The first one, maybeShader, works almost like mapShader, except that the provided function only maybe returns a sub environment. If it doesn't (i.e. it returns Nothing), maybeShader does nothing. Since it might not run, no return value besides () can be expected.

chooseShader will run one of two provided Shader actions, each in a (possibly different) sub environment. Since exactly one of the actions always will be run, this may return a value as well.
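
To round off, a hedged sketch of maybeShader in use (ShadowEnv, envShadowEnv and shadowShader are hypothetical names): the shadow sub-shader only runs when the environment actually carries a ShadowEnv.
data ShadowEnv = ShadowEnv -- hypothetical sub environment for shadow rendering
data Env = Env { envShadowEnv :: Maybe ShadowEnv }

shadowShader :: Shader os ShadowEnv ()
shadowShader = return () -- placeholder for the real shadow sub-shader

-- Runs shadowShader only when envShadowEnv returns a Just value.
shadowPart :: Shader os Env ()
shadowPart = maybeShader envShadowEnv shadowShader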