Sunday, 1 November 2015

GPU programming in Haskell using GPipe - Part 5


< Previous episode: Textures and Samplers

We are almost at the end of this tutorial series, but before we depart I'll have to show you the ins and outs of drawing. Drawing is the process of having each fragment in a FragmentStream affect a pixel in an image, and the ultimate goal of any graphics program. We start with drawing to the window, and then we will look at drawing to off-screen images.

Window formats

Most GPipe programs have a shader that ends with drawing to the window. When we created our window back in part 1, we gave WindowFormatColor RGB8 as parameter to the newWindow call. This parameter described the format of the window, and is defined by this type:
data WindowFormat c ds where
  WindowFormatColor :: ContextColorFormat c => Format c -> WindowFormat c ()
  WindowFormatColorDepth :: ContextColorFormat c => Format c -> Format Depth -> WindowFormat c Depth
  WindowFormatColorStencil :: ContextColorFormat c => Format c -> Format Stencil -> WindowFormat c Stencil
  WindowFormatColorDepthStencilSeparate :: ContextColorFormat c => Format c -> Format Depth -> Format Stencil -> WindowFormat c DepthStencil
  WindowFormatColorDepthStencilCombined :: ContextColorFormat c => Format c -> Format DepthStencil -> WindowFormat c DepthStencil
  WindowFormatDepth :: Format Depth -> WindowFormat () Depth
  WindowFormatStencil :: Format Stencil -> WindowFormat () Stencil
  WindowFormatDepthStencilSeparate :: Format Depth -> Format Stencil -> WindowFormat () DepthStencil
  WindowFormatDepthStencilCombined :: Format DepthStencil -> WindowFormat () DepthStencil

The data stored for a window is called a frame buffer, which may contain any combination of three different image buffers: a color buffer, a depth buffer and a stencil buffer. The WindowFormat type is parameterized on the type of color and the combination of depth and stencil buffers it contains. The types of these parameters come from the constructors' Format arguments (the same kind of Format we previously used for defining textures). () is used to denote the lack of a buffer. In all examples so far, we have only used a window format with an RGB8 color buffer and no depth or stencil buffers, hence a window format of type WindowFormat RGBFloat (). Windows' frame buffers only ever use float colors, never integral colors.
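
For example, a window that supports depth testing (which we will need later in this part) could be created like this (a sketch reusing the newWindow call from part 1; Depth16 is one of the depth formats from the Graphics.GPipe.Format module):
win <- newWindow (WindowFormatColorDepth RGB8 Depth16) (GLFW.defaultWindowConfig "Depth tested")
This win then has type Window os RGBFloat Depth.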

Window drawing actions

Drawing is done as a Shader action, in fact the only one with a side effect! There are different drawing actions to choose from depending on which of the color, depth or stencil buffers you want to use (given that they exist):
drawWindowColor :: forall os s c ds. ContextColorFormat c => (s -> (Window os c ds, ContextColorOption c)) -> FragmentStream (FragColor c) -> Shader os s ()
drawWindowDepth :: forall os s c ds. DepthRenderable ds => (s -> (Window os c ds, DepthOption)) -> FragmentStream FragDepth -> Shader os s ()
drawWindowColorDepth :: forall os s c ds. (ContextColorFormat c, DepthRenderable ds) => (s -> (Window os c ds, ContextColorOption c, DepthOption)) -> FragmentStream (FragColor c, FragDepth) -> Shader os s ()
drawWindowStencil :: forall os s c ds. StencilRenderable ds => (s -> (Window os c ds, StencilOptions)) -> FragmentStream () -> Shader os s ()
drawWindowColorStencil :: forall os s c ds. (ContextColorFormat c, StencilRenderable ds) => (s -> (Window os c ds, ContextColorOption c, StencilOptions)) -> FragmentStream (FragColor c) -> Shader os s ()
drawWindowDepthStencil :: forall os s c ds. (DepthRenderable ds, StencilRenderable ds) => (s -> (Window os c ds, DepthStencilOption)) -> FragmentStream FragDepth -> Shader os s ()
drawWindowColorDepthStencil :: forall os s c ds. (ContextColorFormat c, DepthRenderable ds, StencilRenderable ds) => (s -> (Window os c ds, ContextColorOption c, DepthStencilOption)) -> FragmentStream (FragColor c, FragDepth) -> Shader os s ()

If you for example have a window format of type WindowFormat RGBFloat Depth, you may use all of these except the ones with Stencil in their name.

Remember from part 1 that the window's frame buffer is double buffered and these drawing actions will draw to the hidden back buffer. To make the drawn image visible on screen, you need to make a call to swapWindowBuffers after you have done all your drawing.

In all previous examples we have been using clearWindowColor to clear the window. There are clearWindowDepth, clearWindowStencil and clearWindowDepthStencil functions for clearing a window's depth or stencil buffers as well. What value to clear a buffer with depends on the use case, but usually you use 0 (i.e. black) for colors and 1 for depths.

Common to all drawing actions is that they take two arguments: a function that retrieves the window and drawing options from the shader environment, and a FragmentStream to draw. The options retrieved from the shader environment differ depending on which of the color, depth and stencil buffers are used. The type of the fragments in the FragmentStream also depends on whether color or depth is used. Let's look at how each of color, depth and stencil works separately.

Drawing colors

All of the drawing actions listed above that have Color in their name will retrieve at least a ContextColorOption c from the shader environment, and require a FragmentStream with fragments containing FragColor c, even though some of them require additional options or fragment values for depth or stencil tests. The ContextColorOption c data type looks like this:
data ContextColorOption f = ContextColorOption Blending (ColorMask f)
type ColorMask f = Color f Bool
data Blending
  = NoBlending
  | BlendRgbAlpha (BlendEquation, BlendEquation) (BlendingFactors, BlendingFactors) ConstantColor
  | LogicOp LogicOp

It consists of a Blending and a ColorMask. The latter is a Color of Bool components, e.g. a V3 Bool for an RGB color buffer. You can use the color mask to suppress drawing of specific color channels by setting the corresponding components to False.

The Blending setting determines how each fragment gets combined with the image's previous pixel value to make a new pixel value. In most cases NoBlending is used, which makes each fragment simply overwrite the pixel's previous value (for the color channels where the ColorMask is True, that is). If BlendRgbAlpha is used, then the new pixel value will be the result of a pair of BlendEquations, where the first determines how the RGB components are blended and the second how the A component (if it exists) is blended. A BlendEquation may be one of five symbolic values that each represent a different function, so it's not quite as flexible as you might have hoped. The fragment and the previous pixel values will also each be multiplied with a BlendingFactor (an enum of 15 predefined symbolic values) before the BlendEquations are applied.

If this blending is used:
BlendRgbAlpha (eqRGB, eqA) (BlendingFactors srcFactRGB destFactRGB, BlendingFactors srcFactA destFactA) color
then the RGB and Alpha of the new pixel will be calculated like this:
(pseudo code)
newPixelRGB = eqRGB (srcFactRGB color * fragmentRGB) (destFactRGB color * previousPixelRGB)
newPixelA = eqA (srcFactA color * fragmentA) (destFactA color * previousPixelA)
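
As an example, standard "source over" alpha blending could be expressed like this (a sketch; FuncAdd is one of the five BlendEquation values and SrcAlpha, One and OneMinusSrcAlpha are among the 15 BlendingFactor values):
alphaBlending :: Blending
alphaBlending = BlendRgbAlpha
  (FuncAdd, FuncAdd)                           -- eqRGB, eqA
  (BlendingFactors SrcAlpha OneMinusSrcAlpha,  -- srcFactRGB, destFactRGB
   BlendingFactors One OneMinusSrcAlpha)       -- srcFactA, destFactA
  undefined                                    -- ConstantColor, unused by these factors (see below)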

The ConstantColor argument is only used by some values of BlendingFactors, namely ConstantColor, OneMinusConstantColor, ConstantAlpha and OneMinusConstantAlpha. If none of these are used by your Blending value, you may use undefined for ConstantColor. The ConstantColor is always a V4 Float, no matter what the actual window color format is, which means that not all of its components may be used. The reason for this seeming deficiency in the design will become apparent in a bit, when we learn that we can draw to multiple color images at once (of possibly different formats) but can only use a single Blending for them all.

The last variant of Blending is LogicOp. This is yet another predefined symbolic function enum, and it is used for any colors that have an internal integral representation. That means for example RGB8 (which has an internal unsigned byte representation per component even though you use S F Float values to write to it) or RGBUI (which has the same internal representation as RGB8), but not for example RGBA32F (which uses floating point values internally). For colors with internal floating point representations, using LogicOp is equivalent to NoBlending.

Depth test

The drawing actions that have Depth in their name can be used for any window that has a depth buffer. These drawing actions will all perform an additional depth test and write values to the window's depth buffer. A depth buffer consists of a fixed point value in the range [0, 1] for each pixel (with a precision of 16, 24 or 32 bits).

The Depth drawing actions want a DepthOption or a DepthStencilOption from the shader environment, and they all need a FragmentStream with fragments that contains a FragDepth value (which is just a synonym for S F Float).

The DepthOption is defined like this:
data DepthOption = DepthOption DepthFunction DepthMask
type DepthMask = Bool

The DepthFunction is, like the BlendEquation, an enum of predefined symbolic functions. For each fragment, this function will compare the fragment's FragDepth value to the depth buffer's previous value. If the function returns true, the depth value will be updated with the fragment's value (unless the DepthMask is False) and the color (if available) will be written according to the ContextColorOption. If the DepthMask is False, the test is still always performed; it's just the updating of the depth buffer value upon passed tests that is skipped.

The FragDepth value may be calculated in a normal fmap on the FragmentStream, but most commonly the rasterized depth is what you want. To get this and other rasterized values, you can use the function withRasterizedInfo:
withRasterizedInfo :: (a -> RasterizedInfo -> b) -> FragmentStream a -> FragmentStream b
data RasterizedInfo = RasterizedInfo {
  rasterizedFragCoord :: V4 FFloat,
  rasterizedFrontFacing :: FBool,
  rasterizedPointCoord :: V2 FFloat
}

This works just like fmap on a FragmentStream, but augments the function with a RasterizedInfo record that among other things contains the fragment's position in window space: rasterizedFragCoord. The z component of this member is the rasterized depth, and this is what you usually pass on as FragDepth to the drawing actions.
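
For example, pairing each fragment's color with its rasterized depth (the shape that drawWindowColorDepth expects) could look like this sketch (where _z is the lens from the linear package):
fragmentStream2 = withRasterizedInfo
  (\color info -> (color, rasterizedFragCoord info ^. _z))
  fragmentStream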

The common convention for depth tests is to use Less as DepthFunction, and let the depth value increase with distance from viewer. This is what I used in the spinning box example from the announcement of GPipe 2.

When the drawing action uses a DepthStencilOption instead, the DepthOption is provided as an argument to its constructor.

Stencil test

A window may also contain a stencil buffer, which contains an integral value of 1, 4, 8 or 16 bits per pixel. If a stencil buffer is available, then any of the drawing actions with Stencil in their names may be used. These will all perform a stencil test before any depth test or blending. The stencil draw actions may also update the stencil buffer's values differently depending on whether the stencil test passes or fails, or even when the stencil test passes but the depth test fails. This is unlike the depth test, which may only update the depth buffer's values where the test passes.

A stencil drawing action will not require any special values from the fragments to perform its test. It does require a StencilOptions value from the shader environment though, which is defined like this:
type StencilOptions = FrontBack StencilOption
data FrontBack a = FrontBack { front :: a, back :: a }
data StencilOption = StencilOption {
  stencilTest         :: ComparisonFunction,
  stencilReference    :: Int,
  opWhenStencilFail   :: StencilOp,
  opWhenStencilPass   :: StencilOp,
  stencilReadBitMask  :: Word,
  stencilWriteBitMask :: Word
  }

Different stencil tests are specified for front facing and back facing primitives (in the case of triangles, that is; lines and points are always front facing). For each of the two tests, a ComparisonFunction is used just like for depth tests (DepthFunction is just a type synonym for ComparisonFunction). As previously mentioned, fragments don't contain individual values to use for this comparison like they did for the depth test; instead a stencilReference is provided. This value will first be clamped to the same range as the stencil buffer's values. Then it will be masked with stencilReadBitMask, as will the stencil buffer's value, and then the stencilTest will be performed on these masked values.

If the test fails, the stencil value will be updated by the symbolic function specified by opWhenStencilFail. If it passes and no depth tests are to be made, the color (if available) will be blended and the stencil value will be updated with opWhenStencilPass instead. When updating the stencil value, the stencilWriteBitMask will be used to determine what bits will be updated.

If the drawing action is doing both stencil and depth tests, the option value retrieved from the shader environment will be a DepthStencilOption:
data DepthStencilOption = DepthStencilOption {
  dsStencilOptions              :: StencilOptions,
  dsDepthOption                 :: DepthOption,
  opWhenStencilPassButDepthFail :: FrontBack StencilOp
  }

The dsStencilOptions and dsDepthOption work as before, with the addition that the depth test is only performed if the stencil test passes. If the depth test fails, then opWhenStencilPassButDepthFail will be performed on the stencil buffer's value instead of opWhenStencilPass.
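
As a sketch, an option value that only draws where the stencil buffer equals 1, uses a standard less-than depth test, and never modifies the stencil buffer could look like this (assuming Equal and OpKeep are among the symbolic ComparisonFunction and StencilOp values):
onlyWhereStencilIs1 :: DepthStencilOption
onlyWhereStencilIs1 = DepthStencilOption
  (FrontBack stencilEq1 stencilEq1)  -- same test for front and back facing primitives
  (DepthOption Less True)            -- ordinary depth test, writing the depth buffer
  (FrontBack OpKeep OpKeep)          -- keep stencil value even when the depth test fails
  where stencilEq1 = StencilOption Equal 1 OpKeep OpKeep maxBound maxBound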

Custom filtering of fragments

Both stencil test and depth test may discard fragments so their colors aren't drawn. There is yet another way of discarding fragments that doesn't involve additional buffers:
filterFragments :: (a -> FBool) -> FragmentStream a -> FragmentStream a

This function works just like filter does on normal lists: Only fragments where the predicate evaluates to true will be included in the resulting FragmentStream. If you already know OpenGl I can tell you that this corresponds to using discard in a fragment shader.
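
For example, to discard all fragments whose single float value is at most 0.5 (the demo at the end of this part does exactly this with a sampled texture value):
let brightFragments = filterFragments (>* 0.5) fragmentStream
Note that the predicate returns an FBool (>* comes from the Boolean package that GPipe builds on), not an ordinary Bool, since it is evaluated on the GPU.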

Drawing to texture images

There are many rendering techniques that require you to do more than one pass through the pipeline, which requires a way to render fragments to an off-screen image instead of the window. In GPipe you use a portion of a texture as an off-screen image. An image is a single 2D array of pixels, while there are many different dimensionalities of textures (all of which can have multiple LOD levels), so there are multiple images in each texture (each image in a Texture1D or Texture1DArray has height 1). You get one of the images from a texture using any of the getTextureXXXImage functions. Just as with all the other texturing functions, there is one version for each of the six texture types. This is what the 2D variant looks like:
getTexture2DImage :: Texture2D os f -> Level -> Render os (Image f)

This will give you an Image f of a format f that is a reference to a specific level of the given texture. Any drawing to this image will affect the original texture. Since this could make it hard to reason about the state of a texture when also sampling from it, you may not use getTexture2DImage and newSampler2D on the same texture within the same Render monad (and analogously for the other texture types). Trying to do so will generate a runtime error (I really wish Haskell had the region inference capabilities of Rust so this could have been checked at compile time). This means that you need to split up a two-pass rendering into two separate render calls.

Once we have one or more images, drawing to them isn't radically different from drawing to the window. You will use a different set of functions:
draw :: forall a os s. (s -> Blending) -> FragmentStream a -> (a -> DrawColors os s ()) -> Shader os s ()
drawDepth :: forall a os s d. DepthRenderable d => (s -> (Blending, Image (Format d), DepthOption)) -> FragmentStream (a, FragDepth) -> (a -> DrawColors os s ()) -> Shader os s ()
drawStencil :: forall a os s st. StencilRenderable st => (s -> (Blending, Image (Format st), StencilOptions)) -> FragmentStream a -> (a -> DrawColors os s ()) -> Shader os s ()
drawDepthStencil :: forall a os s d st. (DepthRenderable d, StencilRenderable st) => (s -> (Blending, Image (Format d), Image (Format st), DepthStencilOption)) -> FragmentStream (a, FragDepth) -> (a -> DrawColors os s ()) -> Shader os s ()

As you can see, in addition to the DepthOption you provide an Image (Format d) to the versions using depth tests, and in addition to the StencilOptions an Image (Format st) to the stencil versions. All versions take a Blending parameter from the shader environment, but apparently no image for the colors? No, instead these drawing actions expect a fragment stream of some a (plus a FragDepth for those doing depth tests) and also want a function a -> DrawColors os s ().

DrawColors os s a is a monad in which you draw colors. When drawing to off-screen images, you may actually draw each fragment stream to multiple color images (but still only use a single image for the depth test and a single image for the stencil test). If the depth and stencil tests of the drawing action succeed, then the a -> DrawColors os s () function will be run for each fragment's a. To draw a color inside the DrawColors monad, you use
drawColor :: forall c s os. ColorRenderable c => (s -> (Image (Format c), ColorMask c, UseBlending)) -> FragColor c -> DrawColors os s ()

Besides the FragColor c, which you get from the a that was passed to the a -> DrawColors os s () function, you also need to provide a function that retrieves the following from the shader environment: an image to draw the colors to, a ColorMask c and a Bool telling whether blending should be used or not (UseBlending is simply a synonym for Bool). The Blending itself was specified for the entire drawing action above, meaning that if you want to draw multiple colors, you can choose which of them will use blending, but they'll have to use the same kind of blending if they do. For color images where UseBlending is False, it is as if NoBlending was used (i.e. the image's previous pixel value will be overwritten with the fragment's value).
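
Drawing one fragment stream to two color images could thus look something like this sketch (assuming a shader environment with hypothetical blending, depthImage, diffuseImage and normalImage fields, and fragments of type ((diffuse, normal), fragDepth)):
drawDepth (\s -> (blending s, depthImage s, DepthOption Less True)) fragmentStream $
  \(diffuse, normal) -> do
    drawColor (\s -> (diffuseImage s, pure True, True)) diffuse  -- blended
    drawColor (\s -> (normalImage s, pure True, False)) normal   -- simply overwritten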

The number of colors you may draw to in the same drawing action is hardware dependent, and a GPipeException will be thrown from the compileShader call of any shader that exceeds this limit.

When using multiple images of different sizes in a single draw call (including the depth or stencil images) only fragments that lie inside all of the images will be drawn. The rest will simply be discarded and those parts of the larger images remain untouched.

To clear an entire image, you use the render actions clearImageColor, clearImageDepth, clearImageStencil and clearImageDepthStencil. They work exactly like their ...Window... counterparts, except that you need to specify which image to clear.

Demo time!

{-# LANGUAGE ScopedTypeVariables, PackageImports, TypeFamilies #-}
module Main where

import Graphics.GPipe
import qualified "GPipe-GLFW" Graphics.GPipe.Context.GLFW as GLFW
import "lens" Control.Lens
import Control.Monad (unless)
import Data.Word (Word32)
import Control.Applicative (pure)
import Data.Monoid (mappend)

main =
  runContextT GLFW.defaultHandleConfig $ do
    win <- newWindow (WindowFormatColor RGB8) (GLFW.defaultWindowConfig "Checkers")
    vertexBuffer :: Buffer os (B2 Float) <- newBuffer 4
    writeBuffer vertexBuffer 0 [V2 0 0, V2 1 0, V2 0 1, V2 1 1]
    tex <- newTexture2D R8 (V2 8 8) 1
    let whiteBlack = cycle [minBound,maxBound] :: [Word32]
        blackWhite = tail whiteBlack
    writeTexture2D tex 0 0 (V2 8 8) (cycle (take 8 whiteBlack ++ take 8 blackWhite))

    colorTex <- newTexture2D RG8 (V2 256 256) 1
    depthTex <- newTexture2D Depth16 (V2 256 256) 1

    shader1 <- compileShader $ do
      texMappedFragmentStream <- getProjectedFragments 256 (V3 0.5 (-0.8) (-0.8)) (V3 0.5 0.5 0) (V3 0 1 0)  textureMappedPrimitives
      solidFragmentStream <- getProjectedFragments 256 (V3 (-0.6) (-0.6) 0.8) (V3 0.25 0.25 0) (V3 0 1 0) solidPrimitives
      let filter = SamplerFilter Nearest Nearest Nearest Nothing
          edge = (pure ClampToEdge, 0)
      samp <- newSampler2D (const (tex, filter, edge))
      let sampleTexture = sample2D samp SampleAuto Nothing Nothing
          texMappedFragmentStream2 = filterFragments ((>* 0.5) . sampleTexture) texMappedFragmentStream
          texMappedFragmentStream3 = fmap (const (V2 1 0)) texMappedFragmentStream2
          solidFragmentStream2 = fmap (const (V2 0 1)) solidFragmentStream
          fragmentStream = solidFragmentStream2 `mappend` texMappedFragmentStream3
          fragmentStream2 = withRasterizedInfo (\a r -> (a, rasterizedFragCoord r ^. _z)) fragmentStream
      drawDepth (\s -> (NoBlending, depthImage s, DepthOption Less True)) fragmentStream2 $ \ a -> do
        drawColor (\ s -> (colorImage s, pure True, False)) a

    shader2 <- compileShader $ do
      fragmentStream <- getProjectedFragments 800 (V3 1 2 2) (V3 0.5 0.5 0) (V3 0 1 0) id

      let filter = SamplerFilter Linear Linear Nearest Nothing
          edge = (pure ClampToEdge, 0)
      samp <- newSampler2D (const (colorTex, filter, edge))
      let sampleTexture = sample2D samp SampleAuto Nothing Nothing
          fragmentStream2 = fmap ((\(V2 r g) -> V3 r 0 g) . sampleTexture) fragmentStream
      drawWindowColor (const (win, ContextColorOption NoBlending (pure True))) fragmentStream2

    renderLoop win [
      do
        vertexArray <- newVertexArray vertexBuffer
        let singleTriangle = takeVertices 3 vertexArray
        cImage <- getTexture2DImage colorTex 0
        dImage <- getTexture2DImage depthTex 0
        clearImageColor cImage 0
        clearImageDepth dImage 1
        shader1 $ ShaderEnvironment
            (toPrimitiveArray TriangleStrip vertexArray)
            (toPrimitiveArray TriangleList singleTriangle)
            cImage
            dImage
      ,
      do
        clearWindowColor win 0.5
        vertexArray <- newVertexArray vertexBuffer
        shader2 (toPrimitiveArray TriangleStrip vertexArray)
      ]

getProjectedFragments size eye center up sf = do
  primitiveStream <- toPrimitiveStream sf
  let primitiveStream2 = fmap (\pos2d -> (make3d eye center up pos2d, pos2d)) primitiveStream
  rasterize (const (FrontAndBack, ViewPort (V2 0 0) (V2 size size), DepthRange 0 1)) primitiveStream2

make3d eye center up (V2 x y) = projMat !*! viewMat !* V4 x y 0 1
  where
    viewMat = lookAt' eye center up
    projMat = perspective (pi/3) 1 1 100

renderLoop win renderings = do
  mapM_ render renderings
  swapWindowBuffers win
  closeRequested <- GLFW.windowShouldClose win
  unless (closeRequested == Just True) $
    renderLoop win renderings

-- Copy of lookAt from linear with normalize replaced with signorm
lookAt' eye center up =
  V4 (V4 (xa^._x)  (xa^._y)  (xa^._z)  xd)
     (V4 (ya^._x)  (ya^._y)  (ya^._z)  yd)
     (V4 (-za^._x) (-za^._y) (-za^._z) zd)
     (V4 0         0         0          1)
  where za = signorm $ center - eye
        xa = signorm $ cross za up
        ya = cross xa za
        xd = -dot xa eye
        yd = -dot ya eye
        zd = dot za eye

data ShaderEnvironment = ShaderEnvironment
  {
    textureMappedPrimitives, solidPrimitives  :: PrimitiveArray Triangles (B2 Float),
    colorImage :: Image (Format RGFloat),
    depthImage :: Image (Format Depth)
  }

In this example, we draw a checker textured quad and a solid triangle onto an off-screen image (with two color channels). The quad will have its fragments discarded where the texture has values lower than 0.5. The triangle and the quad intersect, so we use depth testing on them with the help of an additional depth image. The off-screen color image is then, in another render pass, mapped onto a quad that is rendered to the screen, producing this final result:


We could also have stored the off-screen image to disk by using readTexture2D, and not drawn anything to the screen at all, but I leave that as an exercise for the reader.




We have now covered the entire graphical pipeline in GPipe, from context creation and vertex buffers, all the way through the Shader via PrimitiveStreams and FragmentStreams and have finally come out on the other end!

If you haven't already, now would be a good time to study the haddocks. Play around with all of GPipe's functions and combinators. Remember: if it compiles, it will most likely run. If it doesn't run, it should at least give you a nice error message. Good luck!

Wednesday, 21 October 2015

GPU programming in Haskell using GPipe - Part 4


< Previous episode: Shaders and primitive streams

In this part I will show you how to work with textures and samplers in GPipe. But before that, we should take a closer look at the FragmentStream that was created from the rasterization process that was shown all the way back in part 1, since most use cases for sampling textures are with fragment streams.

FragmentStreams

To recap from part 1: The rasterize function is what creates a FragmentStream from a PrimitiveStream:
rasterize :: forall p a s os. FragmentInput a =>
  (s -> (Side, ViewPort, DepthRange)) -> PrimitiveStream p (VPos, a) -> Shader os s (FragmentStream (FragmentFormat a))

It takes a primitive stream with vertices consisting of a tuple (position, a), and creates a fragment stream of FragmentFormat a values. FragmentFormat is an associated type of the FragmentInput type class, which is yet another one of those classes that use an arrow to convert values from one domain to another. In this case, it basically only converts S V a to S F a. S F Float values are created from interpolating the primitive's corners' vertices, while S F Int or S F Word values are simply copied from the last vertex of the primitive. There is also a Flat data constructor that can be used to suppress interpolation of S V Float values. They will turn into S F Float values in the fragments by just copying the last vertex's value in each primitive.

Remember that the vertices' positions are expressed as homogeneous 3D coordinates, i.e. V4 (S V Float), due to perspective correction. In perspective correct interpolation, each S V Float value will first be divided by the position's 4th component (w), and then interpolated. The interpolated value will then be divided by the interpolated reciprocal of w. This means that if w is the same for all three vertices of a triangle, the interpolation will be a simple linear one. To circumvent perspective correction for a specific S V Float value, you can wrap it in a NoPerspective data constructor, then it will just behave as if w was constant.
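
Wrapping is done with an ordinary fmap on the primitive stream before rasterizing, e.g. like this sketch (assuming vertices carrying a color and a uv-coordinate next to the position):
let primitiveStream2 =
      fmap (\(pos, (clr, uv)) -> (pos, (fmap Flat clr, fmap NoPerspective uv)))
           primitiveStream
Here each component of clr is copied from the last vertex instead of being interpolated, while uv is interpolated linearly without perspective correction.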

Textures

A texture is, just like a buffer, a storage of data on the GPU, but textures are optimized for being indexed in more than one dimension, and can also be dynamically indexed using S-values in a PrimitiveStream or FragmentStream. When sampling a value from a texture, the GPU also provides native hardware support for interpolating between multiple elements.

Just like buffers, textures are objects that we create in the ContextT monad. There are six kinds of textures in GPipe:
  • Texture1D os a
  • Texture2D os a
  • Texture3D os a
  • Texture1DArray os a
  • Texture2DArray os a
  • TextureCube os a
Just like buffers, they are parameterized on the object space, meaning they cannot escape the context. They also have an element type (a), but this is not as flexible as it is for buffers, which we'll see if we study the functions that create new textures:
newTexture2D :: forall ctx os c m. (ContextHandler ctx, TextureFormat c, MonadIO m) =>
  Format c -> Size2 -> MaxLevels -> ContextT ctx os m (Texture2D os (Format c))
(Throughout this tutorial I will mostly use 2D textures as an example, but analogous functions exist for the other texture types as well. E.g. there is a newTexture3D that takes a Size3 instead of a Size2 as argument.)

The element type of the new texture is denoted by a value of the type Format c, which is a GADT in the Graphics.GPipe.Format module. If you for instance use the format RGB8 as parameter to newTexture2D, the texture will have type Texture2D os (Format RGBFloat). From that type we can read that it contains three components (RGB) that will be represented as Float values when we sample it. Internally, each component will be stored as an 8-bit fixed point value in the range [0.0, 1.0]. If we instead had used RGB32F, the resulting texture would have had the same type, but internally it would have used 32-bit floating point values for each component, which would yield better precision but cost three times as much memory (RGB8 is most likely padded with an extra byte internally).

newTexture2D also takes a size in the form of a V2 Int (aka Size2) and a max level count as an Int (aka MaxLevels). A texture can contain extra levels of detail (LODs), which are lower resolution versions of the texture, where each next level of detail has half the size in every dimension of the previous level. If the maximum number of LODs is used, the last level will consist of a single texel (another word for texture element). LODs are used when the texture is sampled at a low frequency. E.g. if the texture is mapped onto an object that is far away, it would cover fewer pixels on screen than the texture contains, and without LODs some texels of the texture would be skipped over, resulting in aliasing in the rendered image. Most of the time not all levels are needed, which is why newTexture2D and friends take a MaxLevels argument. The number of levels is always limited by the size of the texture, and you get this maximum by passing maxBound as the MaxLevels argument. If you don't want extra levels of detail, you use 1 as the MaxLevels argument. The documentation for DirectX has some nice pictures of what the LODs look like for all the different texture types (just look at the pictures and don't bother too much with the details of the text, since it is a bit DirectX specific).

You can use texture2DLevels (and similar functions for the other texture types) to retrieve the number of levels a texture was created with. There is also a texture2DSizes that returns a list of all the LODs' sizes, starting with the largest (i.e. the size that was provided to the newTexture2D function).
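
For example (a sketch), a texture created with maxBound levels gets a full mipmap chain down to a single texel:
tex <- newTexture2D RGB8 (V2 256 256) maxBound
let numLevels = texture2DLevels tex  -- 9
    lodSizes  = texture2DSizes tex   -- [V2 256 256, V2 128 128, ..., V2 1 1]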

Lower LODs are usually downscaled versions of the previous LOD, but this is not enforced by GPipe. Instead you can write each LOD individually, using functions like writeTexture2D or writeTexture2DFromBuffer:
writeTexture2D :: forall b c h ctx os m. (ContextHandler ctx, MonadIO m, BufferFormat b, ColorSampleable c, BufferColor (Color c (ColorElement c)) h ~ b, h ~ HostFormat b) => Texture2D os (Format c) -> Level -> StartPos2 -> Size2 -> [h] -> ContextT ctx os m ()
writeTexture2DFromBuffer :: forall b c h ctx os m. (ContextHandler ctx, MonadIO m, BufferFormat b, ColorSampleable c, BufferColor (Color c (ColorElement c)) (HostFormat b) ~ b) => Texture2D os (Format c) -> Level -> StartPos2 -> Size2 -> Buffer os b -> BufferStartPos -> ContextT ctx os m ()

The Level argument in both of these selects which level to write to, using a zero based index (i.e. 0 selects the level of highest resolution). The next two arguments are a start position and a size within this level to write to.

For writeTexture2D, you then provide a list of h values, where h is constrained by this little riddle: BufferFormat b, ColorSampleable c, BufferColor (Color c (ColorElement c)) h ~ b, h ~ HostFormat b. In short, h is the second parameter in any of the equations in the BufferColor type family where the first argument matches the texture's element type representation. E.g. for Format RGBFloat we will use any of the BufferColor equations that has (V3 Float) as first parameter, so in that case a [V3 Float] or a [V3 Word8] or any of the 7 others would work.

For writeTexture2DFromBuffer, you provide a Buffer instead of a list of values. This buffer needs to have an element type that corresponds to one of the results of the BufferColor equations where the first argument matches the texture's element type representation. In the Format RGBFloat example, a buffer of type Buffer os (B3 Float) or Buffer os (Normalized (B3 Word8)), among others, would work. For writeTexture2DFromBuffer, you also need to provide a zero based start index in the buffer to start reading from (to achieve the same effect for writeTexture2D you just use drop on the list you provide).

Most of the time you only write LOD level 0 using any of the writeTextureXXX functions, and then generate the other levels. For this you simply call the generateTexture2DMipmap action and friends.
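
A typical pattern is thus (a sketch, where pixels is a hypothetical list of V3 Float values for a 256x256 RGB8 texture):
writeTexture2D tex 0 0 (V2 256 256) (pixels :: [V3 Float])  -- write LOD level 0
generateTexture2DMipmap tex                                 -- derive all lower levels from it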

A texture has two purposes in GPipe: To be sampled from and/or to be drawn to. The next part of this tutorial will be devoted to the latter, so let's look at sampling now.

Samplers

A texture can be sampled dynamically using an S-value from a PrimitiveStream or FragmentStream. But in order to do this, we need to create a sampler from that texture, which in addition to the texture also has a filtering and an edge mode setting. You create a sampler inside the Shader monad with any of the newSamplerXXX functions. Here's what the 2D version looks like:
newSampler2D :: forall os s c. ColorSampleable c =>
  (s -> (Texture2D os (Format c), SamplerFilter c, (EdgeMode2, BorderColor c))) -> Shader os s (Sampler2D (Format c))

It gets all its arguments from the shader environment: a texture, a filter and a tuple of an edge mode and a border color. Let's look at the filter first.

The SamplerFilter c type is a GADT with two constructors:
data SamplerFilter c where
  SamplerFilter :: (ColorElement c ~ Float) => 
                     MagFilter -> MinFilter -> LodFilter -> Anisotropy -> SamplerFilter c
  SamplerNearest :: SamplerFilter c

If you have a texture with a Float component format (e.g. RGB8 or RGB32F) then you may use either of the two constructors, but if your texture has Int or Word components you may only use the SamplerNearest constructor. This will effectively use no filtering at all: at each sample, the LOD that most closely matches the sampling frequency will be used and the nearest texel of that LOD will be picked. Sampling frequency is calculated from how far apart adjacent fragments sample the texture. If the texture is instead sampled in a primitive stream, there is no notion of sampling frequency, and as we will see soon, the types of GPipe will in that case require us to either pass in an explicit sampling gradient or explicitly select a LOD.

When you use SamplerFilter on the other hand, you have more options. For each sample you make, there will always be one or two closest LODs based on the sampling frequency, and for each LOD there will be two closest texels per dimension to choose from (e.g. for a 2D texture there will be 4 texels to choose from and for 3D there will be 8). MagFilter, MinFilter and LodFilter are all type synonyms for the enum Filter. The MagFilter is used when the texture is sampled at too high a frequency (i.e. the texture covers more pixels on screen than the lowest LOD has texels for). Using Nearest in this case will simply pick the closest texel of LOD 0, while Linear will use linear interpolation of the closest texels. When sampling at a lower frequency the MinFilter will be used, in conjunction with the LodFilter. If MinFilter is Nearest, a single texel will be picked from each of the two closest LOD levels, while if it is Linear, a value will be linearly interpolated for each of the levels. The LodFilter then dictates whether one of the two levels' values will simply be picked or whether the result should be interpolated from both of them.

The last argument to SamplerFilter is Anisotropy, which is a type synonym for Maybe Float. Anisotropic filtering is actually not in the core OpenGl 3.3 specification but is defined in an extension, in fact the only one GPipe uses besides core functionality. It is useful when the frequency of the sampling varies for fragments along the x- and y-axes (e.g. when sampling the texture on a surface that slopes with respect to the view, like the walls when looking down a corridor). In this case a too high LOD (i.e. of too low resolution) would be selected for one of the axes, with unnecessary blurring as a result. When using Just a for Anisotropy, up to a samples will instead be taken along the line of anisotropy and the average will be returned as the final sample value. Using Nothing for Anisotropy is equivalent to Just 1.0. I don't know why a is a Float, but that's how OpenGl likes it.

The last thing we needed to create a sampler was a tuple (EdgeMode2, BorderColor c) where EdgeMode2 is simply a type synonym for V2 EdgeMode. EdgeMode is used to instruct the sampler what to do if a coordinate outside the texture region is provided, and we can give different instructions for each dimension of the texture, hence an EdgeMode2. For a 3D texture, you would have provided an EdgeMode3 instead. EdgeMode is an enum with four possible values: Repeat, Mirror, ClampToEdge or ClampToBorder. Repeat will make the texture tile in the given dimension, while Mirror makes every other tile be mirrored. ClampToEdge will clamp all sample coordinates to the range of the texture so it simply returns the first or last value from the texture in the given dimension. ClampToBorder will make any samples outside the texture in the given dimension return a predefined constant color instead. This color is defined by the second element of the tuple and if neither of the dimensions uses the edge mode ClampToBorder, the border color can be given as undefined.
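
As a sketch, a setting for an RGB texture that clamps to a red border in x but repeats in y would use this tuple:
edge = (V2 ClampToBorder Repeat, V3 1 0 0)  -- BorderColor for an RGB float format is a V3 Float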

Since colors like the border color are such a central concept in GPipe, they deserve a section of their own:

Colors

A texture, as well as the window surface our context owns, can have a variable number of color channels. A texture can also have different types of channels: Float, Int or Word (for the window's surface the channel type is always Float). In plain old OpenGl, accidentally using a float border color with an int texture would, for example, result in an error. In GPipe on the other hand, these invariants are enforced through types, and your program will always work once it compiles.

The BorderColor c type we just saw is a type synonym that expands to Color c (ColorElement c). Several other similar type synonyms exists in GPipe:
type BorderColor f = Color f (ColorElement f)
type ColorSample x f = Color f (S x (ColorElement f))
type FragColor f = Color f (S F (ColorElement f))
type ColorMask f = Color f Bool

They all are a combination of the associated types Color f a and ColorElement f from the ColorSampleable type class, where f is one of the types used as parameter to the Format f GADT. E.g. the format RGB8 had the type Format RGBFloat, so in this case f would be RGBFloat.

Here are all the instances of the ColorSampleable type class and their associated types:

f              Color f a   ColorElement f
RGBAFloat      V4 a        Float
RGBAInt        V4 a        Int
RGBAWord       V4 a        Word
RGBFloat       V3 a        Float
RGBInt         V3 a        Int
RGBWord        V3 a        Word
RGFloat        V2 a        Float
RGInt          V2 a        Int
RGWord         V2 a        Word
RFloat         a           Float
RInt           a           Int
RWord          a           Word
Depth          a           Float
DepthStencil   a           Float
For an RGB8 texture, where f is RGBFloat, we get BorderColor RGBFloat = V3 Float, ColorSample x RGBFloat = V3 (S x Float), FragColor RGBFloat = V3 (S F Float) and ColorMask RGBFloat = V3 Bool.

Now if you look back at the type of writeTexture2D, and in particular the constraint BufferFormat b, ColorSampleable c, BufferColor (Color c (ColorElement c)) h ~ b, h ~ HostFormat b, you'll understand why we looked for equations in BufferColor where the first parameter was V3 Float for RGB8 textures.

Outside the shader you can use texture2DSizes and friends to get the sizes of all of a texture's LODs. Inside the shader you can instead get the size of a sampler's LOD with sampler2DSize (and similarly for the other texture types).

Sampling

Let's put the sampler we just created to use!

{-# LANGUAGE ScopedTypeVariables, PackageImports, TypeFamilies #-}
module Main where

import Graphics.GPipe
import qualified "GPipe-GLFW" Graphics.GPipe.Context.GLFW as GLFW
import "lens" Control.Lens
import Control.Monad (unless)
import Data.Word (Word32)
import Control.Applicative (pure)

main =
  runContextT GLFW.defaultHandleConfig $ do
    win <- newWindow (WindowFormatColor RGB8) (GLFW.defaultWindowConfig "Checkers")
    vertexBuffer :: Buffer os (B2 Float) <- newBuffer 4
    writeBuffer vertexBuffer 0 [V2 0 0, V2 1 0, V2 0 1, V2 1 1]
    tex <- newTexture2D R8 (V2 8 8) 1
    let whiteBlack = cycle [minBound,maxBound] :: [Word32]
        blackWhite = tail whiteBlack
    writeTexture2D tex 0 0 (V2 8 8) (cycle (take 8 whiteBlack ++ take 8 blackWhite))
    shader <- compileShader $ do
      primitiveStream <- toPrimitiveStream id
      let primitiveStream2 = fmap (\pos2d -> (make3d pos2d, pos2d)) primitiveStream
      fragmentStream <- rasterize (const (FrontAndBack, ViewPort (V2 0 0) (V2 500 500), DepthRange 0 1)) primitiveStream2
      let filter = SamplerFilter Nearest Nearest Nearest Nothing
          edge = (pure Repeat, undefined)
      samp <- newSampler2D (const (tex, filter, edge))
      let sampleTexture = pure . sample2D samp SampleAuto Nothing Nothing
          fragmentStream2 = fmap sampleTexture fragmentStream
      drawWindowColor (const (win, ContextColorOption NoBlending (pure True))) fragmentStream2

    renderLoop win $ do
      clearWindowColor win 0.5
      vertexArray <- newVertexArray vertexBuffer
      shader (toPrimitiveArray TriangleStrip vertexArray)

make3d (V2 x y) = projMat !*! viewMat !* V4 x y 0 1
  where
    viewMat = lookAt' (V3 1 2 2) (V3 0.5 0.5 0) (V3 0 1 0)
    projMat = perspective (pi/3) 1 1 100

renderLoop win rendering = do
  render rendering
  swapWindowBuffers win
  closeRequested <- GLFW.windowShouldClose win
  unless (closeRequested == Just True) $
    renderLoop win rendering

-- Copy of lookAt from linear with normalize replaced with signorm
lookAt' eye center up =
  V4 (V4 (xa^._x) (xa^._y) (xa^._z) xd)
     (V4 (ya^._x) (ya^._y) (ya^._z) yd)
     (V4 (-za^._x) (-za^._y) (-za^._z) zd)
     (V4 0     0     0     1)
  where za = signorm $ center - eye
        xa = signorm $ cross za up
        ya = cross xa za
        xd = -dot xa eye
        yd = -dot ya eye
        zd = dot za eye

This simple example will create a small 8x8 texture and give it a checkers pattern. We create a square from 4 vertices, and use a view matrix and a perspective projection matrix to transform it into our view, with this result:



I have highlighted the expression that samples the texture in the example above: sample2D samp SampleAuto Nothing Nothing. It evaluates to a function of type V2 (S F Float) -> ColorSample F RFloat, which simplifies to V2 (S F Float) -> S F Float. The input, commonly referred to as a uv-coordinate, is a 2D coordinate (since it's a 2D texture) where (0.0, 0.0) maps to the top left corner of the top-left most texel in the texture, and (1.0, 1.0) maps to the bottom right corner of the bottom-right most texel.

In addition to the parameters we gave when creating the sampler, you also give a bunch of parameters at the time of sampling to the sample2D function (and its 5 siblings for the other texture types). Specifically, you can specify a SampleLod, a SampleProj and a SampleOffset value.

SampleLod is used to control which lod should be used and is defined like this:
data SampleLod vx x where
  SampleAuto :: SampleLod vx F
  SampleBias :: FFloat -> SampleLod vx F
  SampleLod :: S x Float -> SampleLod vx x
  SampleGrad :: vx -> vx -> SampleLod vx x

SampleAuto and SampleBias are only available if you are sampling inside a fragment stream. In a vertex stream you have to resort to one of the last two. SampleAuto will simply select a LOD based on the sampling distance between this and adjacent fragments. This requires you not to do the sampling inside a branch (such as ifB), or the sampling distance to adjacent fragments may become undefined and the wrong LOD may be used. SampleBias lets you add a bias to select a higher LOD (i.e. with lower resolution) than the automatically selected one. This enables you to create cheap blurring effects (the higher the bias, the more blur). SampleLod simply picks a LOD without any automatic calculations, and with SampleGrad you provide the gradients for the LOD calculations yourself. If you really need to sample inside a branch, you can actually get the gradients for the coordinate using the functions dFdx and dFdy. These two functions have the same limitation though: they need to be evaluated outside any branching or they'll also get undefined return values.

The SampleProj x type is a synonym for Maybe (S x Float). If you give a Just value as this parameter, then the input coordinate's components will be divided by this value. This might seem like an odd addition to the API, but the hardware is already doing divisions behind the curtains for perspective correction (which is done on hardware), so you get this extra division "for free" since it can just multiply it with the existing denominator. You can use SampleProj to zoom the texture for instance. Using Nothing for SampleProj is equivalent to Just 1.

The last parameter to sample2D besides the actual sampling coordinate is SampleOffset2 x, which is just a type synonym for Maybe (V2 Int). This lets you add a constant (note the type: Int and not S x Int) offset in texels to any sample made. The offset has to stay within certain hardware dependent limits, in practice within [-8,7].



For a texture that has a Format Depth or Format DepthStencil format, sampling will return the depth value as a float in the range [0,1]. A common use case for depth textures is shadow maps, where you rather want to compare the depth value with a reference. For this, you can create a specialized shadow sampler with newSampler2DShadow and friends. These actions also require a ComparisonFunction that will be used when sampling. Sampling with shadow samplers is not done with the usual sample2D function; instead you use sample2DShadow, and similarly for the other texture types. These shadow sample functions also take a reference value as input, and will compare the texels of the texture with this reference value, using the ComparisonFunction that the sampler was created with, resulting in a value of 0 or 1 for each texel. These will then be sampled with the sampler's filter to produce a final sample float value in the range [0,1], with lower values being "more false" and higher ones "more true".



Before we end this chapter, you should play around with the different parameters I've presented. Try replacing (make3d pos2d, pos2d) with (make3d pos2d, fmap NoPerspective pos2d) to see why we need perspective correct interpolation of the uv-coordinates. What would a Linear magnifying filter look like? In vertexBuffer, you can change each element's components of value 1 to value 2 and then try other edge modes than pure Repeat. What if you specify different edge modes for x and y? Remember to replace undefined with an actual color if you try ClampToBorder.



Thursday, 1 October 2015

GPU programming in Haskell using GPipe - Part 3


< Previous episode: Buffers and arrays

Welcome to the Shader!

Finally! This is where the real fun happens! In the previous episode we learned many different ways to create a PrimitiveArray. Remember from the first episode that this PrimitiveArray could be turned into a PrimitiveStream inside the Shader monad? Well, that's what we will do next.

Primitive streams

A quick recap from episode 1: This is how you create a PrimitiveStream: 
toPrimitiveStream :: forall os f s a p. VertexInput a => (s -> PrimitiveArray p a) -> Shader os s (PrimitiveStream p (VertexFormat a))

This is a Shader action that uses a function to get a PrimitiveArray from the shader environment, and then returns a PrimitiveStream. There are two things to remember about Shader actions:
  1. Whatever is returned from a Shader action can never leave the Shader monad. The only thing you can do with a Shader monad is to give it to compileShader, and that one is expecting a Shader os s (), i.e. it doesn't accept any return values.
  2. Whenever you see a Shader action wanting a function to retrieve some x from the shader environment (in the toPrimitiveStream case, x is a primitive array), then x is something that may vary with different invocations of the shader without requiring another call to compileShader.
In the transition from array to stream, our vertices will turn from type a to type VertexFormat a. This is very similar to how buffers worked: there is a VertexInput type class that defines what may be turned into vertices in primitive streams:

class BufferFormat a => VertexInput a where
  type VertexFormat a
  toVertex :: ToVertex a (VertexFormat a)

toVertex is an arrow action, just as toBuffer was. If you want to create your own instance of VertexInput you define your toVertex member in terms of the toVertex members of the GPipe provided instances. Let's look at some examples of these instances:
a                       VertexFormat a
B Float                 S V Float
B Int32                 S V Int
B Word32                S V Word
Normalized (B Int32)    S V Float
Normalized (B Word32)   S V Float
B2 Float                V2 (S V Float)
B2 Int32                V2 (S V Int)
B2 Word32               V2 (S V Word)
B2 Int16                V2 (S V Int)
B2 Word16               V2 (S V Word)
Normalized (B2 Int32)   V2 (S V Float)
Normalized (B2 Word32)  V2 (S V Float)
Normalized (B2 Int16)   V2 (S V Float)
Normalized (B2 Word16)  V2 (S V Float)
(a, b)                  (VertexFormat a, VertexFormat b)
V2 a                    V2 (VertexFormat a)
There are many more, including B3, B4 and larger tuples. See the full list on hackage.

Almost looks like a straight copy of the table in the previous episode, doesn't it? One important difference between the BufferFormat class and the VertexInput class is that the former was defined on the resulting type of the conversion and had an associated type for the originating type (HostFormat), while VertexInput is instead defined on the originating type and uses an associated type for the resulting type (VertexFormat). So in the table above, a B Float will turn into an S V Float and not the other way around.

S x a is what is called a lifted type, and represents an a in an x "setting" (to avoid using the word "context" that already has a distinct meaning in GPipe). The setting in this case is V, which you might have guessed stands for "Vertex". There is one other setting in GPipe and that is F that stands for Fragment.

You use an S x a almost just as you would use a normal a: For example S V Float has Num, Floating and Fractional instances so you can do things like add, divide or take sin of them.

Ints and Words lose their specific size when entering the world of shaders, so e.g. both Int16 and Int32 become just Int in an S-type.

There is a Normalized newtype wrapper that you can put an integral B-value in (when fmaping on the PrimitiveArray) to make it become a lifted floating value. The GPU will do this by mapping a signed (i.e. IntXX) integral's range [minBound, maxBound] to the floating point range [-1.0, 1.0] on the shader side, and an unsigned's (i.e. WordXX) range to [0.0, 1.0].
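
A sketch of this, assuming a primitive array whose vertices carry a B4 Word8 color next to a position:
let primitiveArray2 = fmap (\(pos, clr) -> (pos, Normalized clr)) primitiveArray
The clr attribute of type B4 Word8 becomes Normalized (B4 Word8), which turns into a V4 (S V Float) in the range [0.0, 1.0] on the shader side.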

Buffer vector types (such as B2) will turn back into their normal V-vector form (e.g. V2), but with the elements converted to S-values. Normal V-vectors may also be used as input and will become the same V-vector of S-values on the shader side. The difference is that converting a V2 Int32, for instance, would take two shader attribute slots while converting a B2 Int32 would only take one. All graphics hardware has a limit on how many attribute slots can be used at once, and GPipe will detect if this limit is reached and throw an exception from the compileShader call in that case. To help stay under that limit, prefer B-vectors to V-vectors in buffers and primitive arrays when possible. Actually, the topic of error handling in GPipe is so important that it deserves a section of its own:

Error handling in GPipe

There are three levels of errors you may encounter when working with GPipe:
  1. Type check errors
    This is the whole idea of GPipe: GHC will tell you when you try to do something that is statically known to not work (something that usually would be a runtime error when working with plain OpenGl). Most invariants in GPipe are checked this way.
  2. Runtime GPipeExceptions
    Whenever a hardware dependent limit is reached, like the number of attributes used by a PrimitiveArray, a GPipeException is thrown. You may always catch one of those to show a user friendly error message and/or fall back to a simpler solution.
  3. Runtime error calls
    For operations that involve indexing, e.g. writeBuffer, an error may be thrown if you use an index that is out of bounds (just as the !! operator on a list would). Since these kinds of errors are not hardware dependent and would always be thrown, they are considered programmer errors, and as such are not guaranteed to be catchable in a safe way (e.g. the interior OpenGl state might be garbled afterwards).

Enough of that, back to the shader

So now we have a PrimitiveStream with vertices built up from S-values. A PrimitiveStream is a Functor and a Monoid, just as PrimitiveArray was, so you can fmap functions on its vertices and mappend several streams into one. But since your vertices are made of S-values and not B-values, you can actually do computations on them that will be run as shader code on the GPU! Let's try it out! In the "Hello world" program you wrote in part 1 of this tutorial, add these two lines to your shader:
      let primitiveStream2 = fmap (\(pos,clr) -> (pos - V4 1 1 0 0, clr / 10)) primitiveStream  
      let primitiveStream3 = primitiveStream `mappend` primitiveStream2  

Also replace the last argument to rasterize from primitiveStream to primitiveStream3. Rebuild and you should see this picture:


It looks almost like the original one, but on top of the old triangle you see a new one, translated exactly 250 pixels down and left, and with one tenth of the brightness of the original triangle. Why 250 pixels? Remember from part 1 that the canonical view space is defined as [(-1,-1,-1), (1,1,1)], i.e. a cube with size 2, and since we used a view port of size 500 a move of 1 unit will translate to 250 pixels.

Why was the new triangle drawn on top of the old? Swap the operands of mappend and you'll see: the order of the primitives in a PrimitiveStream (and also of the fragments in a FragmentStream) matters. Since we didn't use any depth or stencil tests when drawing (I'll explain what those are later as well), the new triangle ended up on top because it was drawn later.

If you knew OpenGl from before, you might be interested in what the GLSL shader that drew these triangles looks like. Actually, there wasn't one shader but two! This is where GPipe starts showing its modularity and composability compared to plain GLSL code. Let's take it even further and add two more lines to the shader:
      let rotationMatrix a = V4 (V4 (cos a) (-sin a) 0 0)   
                                (V4 (sin a) (cos a) 0 0)  
                                (V4 0    0    1 0)  
                                (V4 0    0    0 1)  
      let primitiveStream4 = fmap (first (rotationMatrix (-0.2) !*)) primitiveStream3   

And don't forget to change the rasterize argument to primitiveStream4 now. When running, both triangles should have been rotated 0.2 radians clockwise around the origin:


(You see the old bright triangle getting clipped on top and right because we use a view port that is smaller than the actual window.)

As you can see, in GPipe there is nothing stopping you from fmaping functions on streams more than once. How many GLSL shaders has GPipe created behind the curtains in this case? Still just two! If you are interested in what the shaders GPipe generates look like and you use Windows, I can recommend the tool RenderDoc (a Linux UI is also planned). With the latest version you can inspect any OpenGl 3.2+ application, and watch all API calls, all data and all GLSL shader source code.

A note on performance: Combining PrimitiveArrays with mappend has usually better performance than combining two PrimitiveStreams, so unless you need to fmap different functions on them do the mappend before they turn into streams with toPrimitiveStream.

Linear algebra in GPipe

If you've never done any kind of 3D programming before, that last rotation code might be all new to you. In that case I'm afraid you'll have to pick up some basic linear algebra before venturing on, since that is essential for any graphics programming. E.g. you need linear algebra in your PrimitiveStream to transform your vertices from object space and world space to the canonical view space that the rasterizer expects. Linear algebra and 3D math are however out of scope for this tutorial. This seems to be shaping up to be a pretty good online book on the subject (at least I have enjoyed Akenine-Möller's other books before), but at the time of writing it is not complete yet. If any reader has suggestions for good 3D math tutorials, please share them in the comments field below!

Linear algebra in GPipe is preferably done with the linear package, since support for its vector types is built into GPipe. If you import the Graphics.GPipe module, you will even get all of linear imported for free!

When doing 3D math, there are a number of things that can be done in two different ways:
  • Coordinate systems can be right handed or left handed
  • Transform matrices can be made to be multiplied with vectors from left or from right
  • Matrices can be defined in row major or column major order
The linear package mostly mimics what was the convention in early versions of OpenGl: it uses a right handed system before perspective or orthographic projection (you look down negative z) but a left handed one after (the far plane gets z=+1 and the near plane z=-1), and the matrices returned from linear's functions such as mkTransformation are multiplied with vectors from the right. Unlike OpenGl, however, matrices in the linear package are defined in row major order (which will require an extra transpose when building basis matrices from left, up and forward vectors).

Just remember that these conventions are a property of the functions in the linear package, and you are free to ditch them in favor of whatever linear algebra functions you like (the underlying OpenGl really has no preference).
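To make these conventions concrete, here is a small sketch. The functions mkTransformation, axisAngle and perspective, as well as the operators !*! and !*, all come from the linear package (and so are re-exported by Graphics.GPipe); the angle and plane values are made up for illustration:

transformVertex :: V3 Float -> V4 Float
transformVertex (V3 x y z) = mvp !* V4 x y z 1  -- the vector is multiplied from the right
  where
    -- Rotate 0.2 radians around the z axis, then move 5 units down negative z:
    modelMat = mkTransformation (axisAngle (V3 0 0 1) 0.2) (V3 0 0 (-5))
    -- Project with a 60 degree vertical field of view, aspect 1, near 1, far 100:
    projMat  = perspective (pi/3) 1 1 100
    -- Compose so that modelMat is applied to the vector first:
    mvp      = projMat !*! modelMat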

Uniforms

In the example above, we rotated the two triangles 0.2 radians clockwise. What if we want to increase the angle every frame? We can't make the angle an ordinary parameter since the shader is precompiled, and there's no way to get an arbitrary value from the shader environment (there is no ask method, remember). Instead, you use uniforms!

A uniform in GPipe is a shader value that you get from a single buffer element. Uniforms in GPipe are built upon something that is called Uniform Buffer Objects (UBOs) in OpenGl, but don't worry if you don't know what that is. You get a uniform from a buffer with this shader action:
getUniform :: forall os s b x. UniformInput b => (s -> (Buffer os (Uniform b), Int)) -> Shader os s (UniformFormat b x)

Uh oh, another type class with another associated type? Yep, I told you that pattern was going to occur several times in GPipe. I think you know how it works by now, so I'll just say it works almost exactly like VertexInput, except there are no instances for Ints and Words smaller than 32 bits. Check out the haddock for it here.

getUniform is one of those Shader functions that take a shader environment function as argument. In this case, that function is used to retrieve a tuple of a Buffer and an index into that buffer. The element type of the Buffer has to be Uniform b (where b is an instance of UniformInput). The reason for this is that uniforms have very restrictive alignment requirements, and by creating a buffer with the element type wrapped in the newtype wrapper Uniform, you instruct GPipe to use that alignment. A buffer of this kind can still be used to create VertexArrays if you really want to (you can just remove the Uniform wrapper with a fmap on the VertexArray), but keep in mind that a buffer created with Uniform will have a quite large element alignment (usually 128 bytes, it's hardware dependent), and hence large padding for each element.

The uniform value you get with getUniform is, just like the vertices in your PrimitiveStream, built up from S-values, but it isn't bound to a stream of its own and can instead be used when fmapping over any other stream. Let's try it out:

{-# LANGUAGE ScopedTypeVariables, PackageImports, TypeFamilies, FlexibleContexts #-}

module Main where

import Graphics.GPipe
import qualified "GPipe-GLFW" Graphics.GPipe.Context.GLFW as GLFW
import Control.Monad (unless)
import Data.Monoid
import Control.Arrow (first)

main =
  runContextT GLFW.defaultHandleConfig $ do
    win <- newWindow (WindowFormatColor RGB8) (GLFW.defaultWindowConfig "Uniforms")
    vertexBuffer :: Buffer os (B4 Float, B3 Float) <- newBuffer 3
    writeBuffer vertexBuffer 0 [ (V4 (-1) 1 0 1, V3 1 0 0)
                               , (V4 0 (-1) 0 1, V3 0 1 0)
                               , (V4 1 1 0 1, V3 0 0 1)
                               ]

    uniformBuffer :: Buffer os (Uniform (B Float)) <- newBuffer 1

    shader <- compileShader $ do
      primitiveStream <- toPrimitiveStream id
      let primitiveStream2 = fmap (\(pos,clr) -> (pos - V4 1 1 0 0, clr / 10)) primitiveStream
      let primitiveStream3 = primitiveStream `mappend` primitiveStream2
      let rotationMatrix a = V4 (V4 (cos a) (-sin a) 0 0)
                                (V4 (sin a) (cos a)  0 0)
                                (V4 0       0        1 0)
                                (V4 0       0        0 1)
      uniform <- getUniform (const (uniformBuffer,0))
      let primitiveStream4 = fmap (first (rotationMatrix uniform !*)) primitiveStream3
      fragmentStream <- rasterize (const (FrontAndBack, ViewPort (V2 0 0) (V2 500 500), DepthRange 0 1)) primitiveStream4
      drawWindowColor (const (win, ContextColorOption NoBlending (V3 True True True))) fragmentStream

    loop vertexBuffer shader win uniformBuffer 0

loop vertexBuffer shader win uniformBuffer angle = do
  writeBuffer uniformBuffer 0 [angle]
  render $ do
    clearWindowColor win (V3 0 0 0)
    vertexArray <- newVertexArray vertexBuffer
    let primitiveArray = toPrimitiveArray TriangleList vertexArray
    shader primitiveArray
  swapWindowBuffers win

  closeRequested <- GLFW.windowShouldClose win
  unless (closeRequested == Just True) $
    loop vertexBuffer shader win uniformBuffer ((angle+0.1) `mod''` (2*pi))

(The additions for uniforms are the uniformBuffer, the getUniform call in the shader, and the extra uniformBuffer and angle arguments threaded through loop.)

First we create an additional buffer, named uniformBuffer, with just one element. We then get the value from this buffer as a uniform in the shader (using const since we always want to use this buffer and index 0). Finally, we pass uniformBuffer to the loop function along with an angle value, so that we can write the angle value to that buffer each iteration. When we render our shader with shader primitiveArray, it will use the current value of the buffer that we just wrote. We then increment the angle value for the next iteration of the loop, resulting in a constantly rotating image.

Working with lifted S-values

To avoid numerical instability after many laps, I used mod'' on the incremented angle to make it wrap back to 0 every lap. This function is a method of the Real' type class, defined in the Graphics.GPipe.Expr module. This type class, as well as the type classes Convert, Integral' and FloatingOrd, is provided since the Prelude versions are too tied to non-lifted real world types (e.g. the Prelude's Integral type class has a toInteger method that couldn't have been defined for S-values). These GPipe-specific type classes also have instances for normal types such as Float and Double, which is why we could use mod'' in our example above.
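As a small illustration (wrapAngle is a made-up name), the same wrapping function can serve both plain Floats on the host and lifted S-values in a shader:

-- Works for any Real' instance; the Floating constraint is spelled out here
-- just to be explicit about where pi comes from.
wrapAngle :: (Floating a, Real' a) => a -> a
wrapAngle a = a `mod''` (2*pi)

hostAngle :: Float
hostAngle = wrapAngle 7.0  -- plain Float on the CPU

shaderAngle :: S V Float -> S V Float
shaderAngle = wrapAngle    -- lifted value on the GPU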

The boolean operations in the Prelude are particularly unsupportive of lifted types. For example ==, < and && all return non-lifted Bools. To alleviate this, GPipe uses the Boolean package, which provides type classes that are parametric in the type of boolean, so that we can use S V Bool in conditional functions. All lifted versions of operators have a * suffix and all lifted versions of functions have a B suffix. Boolean also provides an ifB function that works the same way as a normal if expression, but on lifted booleans (just without the then and else syntax). Here's an example of a function that uses conditionals on S-values:
f :: S V Float -> S V Float -> S V Float
f x y = ifB (x <* y &&* x /=* 3) (x * 2) (minB y 3)

Besides ifB provided by Boolean, GPipe also defines three other ways of doing conditional branching:
ifThen :: forall a x. ShaderType a x => S x Bool -> (a -> a) -> a -> a
ifThenElse :: forall a b x. (ShaderType a x, ShaderType b x) => S x Bool -> (a -> b) -> (a -> b) -> a -> b
ifThenElse' :: forall a x. ShaderType a x => S x Bool -> a -> a -> a

They don't seem to bring much more than just another constraint compared to ifB, and they are indeed almost functionally equivalent, except in rare cases when using implicit derivatives in a FragmentStream; I'll cover that in the next part of the tutorial. Another difference is that these functions in many cases generate more efficient shader code for the GPU than ifB. The rule of thumb is to use the most specialized if-function whenever possible, and basically only use ifB if you want a generic function that also works on normal non-lifted values.
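As a sketch, the ifB example above can be rewritten with the more specialized ifThenElse' (f' is a made-up name):

f' :: S V Float -> S V Float -> S V Float
f' x y = ifThenElse' (x <* y &&* x /=* 3) (x * 2) (minB y 3)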

The ShaderType constraint is also used by this function:
while :: forall a x. ShaderType a x => (a -> S x Bool) -> (a -> a) -> a -> a

You can't use a regular recursive function to express loops with S-values, since you cannot select a base case to stop the recursion based on a lifted S x Bool value. Instead, you can use this while function, which takes a conditional function, a transformation function and an initial value. It then applies the transformation repeatedly for as long as the conditional function returns true. This loop is run on the GPU when the shader executes, so if the condition never becomes false your computer might hang (unless you have GPU hang detection like Windows has, which in most cases just kills your application). Here's an example of a function that multiplies a float value by 0.9, x number of times:
f :: S V Int -> S V Float -> S V Float
f x y = snd $ while ((<* x) . fst) (\(i,n) -> (i+1, n*0.9)) (0, y)

You may create your own instances of the ShaderType type class if you have a type whose structure can be mapped to that of an existing instance. Here is an example of such an instance:
{-# LANGUAGE FlexibleInstances, MultiParamTypeClasses, TypeFamilies, UndecidableInstances #-}
data MyType x = MyType (S x Float) (S x Int)

instance ShaderType (MyType x) x where
  type ShaderBaseType (MyType x) = ShaderBaseType (S x Float, S x Int)
  toBase x ~(MyType f i) = toBase x (f, i)
  fromBase x b = let (f,i) = fromBase x b in (MyType f i)

One thing to remember is to make toBase completely lazy, using tilde (~) in all pattern matches. And yeah, UndecidableInstances is currently needed for your own instances of ShaderType, sorry about that!
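With the instance in place, MyType can be used with while and the if-functions above. A made-up example that counts i up to 10 while scaling f:

g :: MyType V -> MyType V
g = while (\(MyType _ i) -> i <* 10)               -- loop while the counter is below 10
          (\(MyType f i) -> MyType (f*0.9) (i+1))  -- scale the float and count up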

Combining Shader monads

Before we end this part of the tutorial, I just want to show you some cool stuff about the Shader monad.

First of all, in order to be able to reuse Shader actions in many different settings we need a way to change the shader environment. That is done with this combinator:
mapShader :: (s -> s') -> Shader os s' a -> Shader os s a

With mapShader you provide a function that extracts a sub environment from another environment, together with a Shader action that operates in that sub environment; the result is a Shader that operates in the outer environment.
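As a sketch of the reuse this enables (MyEnv and its field names are made up), a shader written against a plain PrimitiveArray environment can be lifted into a bigger one:

data MyEnv = MyEnv
  { envPrimitives    :: PrimitiveArray Triangles (B4 Float, B3 Float)
  , envRenderShadows :: Bool  -- extra data the inner shader doesn't care about
  }

-- Lift a shader that only needs the primitive array into the MyEnv environment:
liftShader :: Shader os (PrimitiveArray Triangles (B4 Float, B3 Float)) ()
           -> Shader os MyEnv ()
liftShader = mapShader envPrimitives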

The Shader monad is not only a monad, it is also a MonadPlus (and an Alternative, which is basically the same thing). This enables us to create alternative branches with mplus (or <|>), and to discriminate among the branches with the guard function from Control.Monad. The leftmost branch without a guard False action will be the one that is run. (A Shader where all branches have guard False actions will throw an error.) This is all kind of cool, but what is even cooler is this:
guard' :: (s -> Bool) -> Shader os s ()

It looks almost exactly like guard, except that instead of a Bool it takes a function that retrieves a Bool from the shader environment. This enables us to select among branches at shader run time instead of at shader compile time! You could for instance have a bool in the shader environment denoting whether we want to render shadows or not, and be able to toggle that at run time.
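A minimal sketch of that shadow toggle (ShadowEnv, shadowPass and plainPass are made-up names with placeholder bodies):

data ShadowEnv = ShadowEnv { envRenderShadows :: Bool }

shadowPass, plainPass :: Shader os ShadowEnv ()
shadowPass = return ()  -- would render with shadows here
plainPass  = return ()  -- would render without shadows here

-- guard' picks the branch from the environment each time the shader is run:
toggleShader :: Shader os ShadowEnv ()
toggleShader = (guard' envRenderShadows >> shadowPass) <|> plainPass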

Derived from mapShader and guard', GPipe defines these two useful shader combinators:
maybeShader :: (s -> Maybe s') -> Shader os s' () -> Shader os s ()
chooseShader :: (s -> Either s' s'') -> Shader os s' a -> Shader os s'' a -> Shader os s a

The first one, maybeShader, works almost like mapShader, except that the provided function only maybe returns a sub environment. If it doesn't (i.e. it returns Nothing), maybeShader does nothing. Since it might not run, no return value besides () can be expected.

chooseShader will run one of the two provided Shader actions, each in a (possibly different) sub environment. Since exactly one of the actions will always be run, this one may return a value as well.
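For example, passing id as the selector gives a small combinator that runs one of two shaders depending on an Either already in the environment (renderEither is a made-up name):

renderEither :: Shader os s' a -> Shader os s'' a -> Shader os (Either s' s'') a
renderEither = chooseShader id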