The demo

All effects were written between 2014 and 2016. Initially I had no deadline to worry about and just experimented with different ideas. I watched a lot of demos on similar platforms such as the C64, MSX and Amstrad CPC for inspiration. I was reluctant to ask for graphics and music since I didn't really know when (if ever) I was going to finish the demo.

Around the end of 2015 I figured it was time to finish this project and decided to release the demo at Solskogen in July 2016.

In April 2016 I asked Kenneth (Response) if he wanted to make music for the demo. He also brought in Tom Frode (TFT) for some graphics work.

The demo itself was put together the last week before the party.

Intro


The intro part is just a simple slideshow that loads the next picture while the current one is being shown. During the Tiki Tiki Ta-logo, the next two parts are loaded. I had planned to crossfade between the pictures but ran out of time.

Bouncing Ball

The bouncing texturemapped ball is implemented using the plane deformation effect described by Íñigo Quílez.

The ball is drawn using fake doublebuffering. Half of the framebuffer and half of the palette are reserved for displaying the ball. The other half of the framebuffer is updated while its colors are set to the background color.

The ball is updated every 3 frames, but the scroll register is used to adjust the y position every frame. This makes the movement look a bit smoother than it really is.

A slowmo showing the ball being drawn every 3 frames
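
The split between slow redraws and per-frame scrolling can be sketched in a few lines of Python (used here only for illustration, the demo itself is Z80 assembly). The names `scroll_for_frame` and `path`, and the sign convention of the scroll value, are assumptions and not taken from the demo source.

```python
def scroll_for_frame(frame, path, period=3):
    """Return (drawn_y, scroll) for one display frame.

    path[i] is the wanted y position of the ball at frame i (hypothetical
    input). The ball is only redrawn every `period` frames, so the scroll
    value has to cover the distance moved since the last redraw.
    """
    drawn_frame = frame - frame % period   # last frame the ball was redrawn
    drawn_y = path[drawn_frame]            # y position baked into the framebuffer
    scroll = path[frame] - drawn_y         # correction applied through scrolling
    return drawn_y, scroll


path = [0, 2, 4, 6, 8, 10]
print([scroll_for_frame(f, path) for f in range(6)])
```

On a redraw frame the scroll correction is zero, and it grows over the following two frames until the next redraw catches up.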

Planedeform

The plane deformation is implemented using a lookup table with pointers pointing into a texture. For each pixel, an x and y offset is added to the pointer to get the texel address. The x and y offsets are adjusted each time the ball is drawn to move the texture around the ball. The texture is 32x64 pixels, with 5 bits used for the x offset and 6 bits used for the y offset, like this: .... .yyy yyyx xxxx. Overflowing the x offset is just ignored and results in the y offset being off by one. Overflowing the y offset is handled by duplicating the texture.
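
The address arithmetic can be modelled in Python to show what the two kinds of overflow do. This is only a sketch of the bit layout described above; the helper names (`pack_offset`, `make_texture_memory`, `sample`) are made up for the example, and storing the two texture copies back to back is an assumption about how the duplication is laid out.

```python
TEX_W = 32   # 5 bits of x
TEX_H = 64   # 6 bits of y


def pack_offset(x, y):
    """Pack x and y into the .... .yyy yyyx xxxx layout."""
    return (y << 5) | x


def make_texture_memory(texture):
    """Flatten the texture row by row and store it twice, back to back.

    The second copy is what a y overflow lands in.
    """
    flat = [texel for row in texture for texel in row]
    return flat + flat


def sample(memory, pointer, x_off, y_off):
    """Add the per-frame offset to the table pointer and read the texel."""
    return memory[pointer + pack_offset(x_off, y_off)]


# Each texel is its own (x, y) coordinate so the result is easy to read.
texture = [[(x, y) for x in range(TEX_W)] for y in range(TEX_H)]
memory = make_texture_memory(texture)

print(sample(memory, pack_offset(3, 10), 4, 5))    # no overflow
print(sample(memory, pack_offset(30, 10), 4, 0))   # x overflow carries into y
print(sample(memory, pack_offset(0, 60), 0, 10))   # y overflow reads the copy
```

The second call shows the ignored x overflow: the carry out of the 5 x bits bumps y by one row. The third call reads past the first copy and still returns a valid texel because of the duplicate.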

Since working with 4 bits per pixel with shifting and masking is painfully slow, the horizontal resolution is halved to get one byte per pixel.

The drawing code is inspired by return-oriented programming. Rather than storing the number of pixels to draw in a stride, the table stores an address into an unrolled drawing loop. By pointing the stack pointer to this table, offsets can be read using the pop instruction and drawing is started by using the ret instruction.

Each stride in the lookup table looks like this:
        .dw     124
        .dw     planedeform_draw_step_1_num_8
        .dw     0xb29c, 0xb2dd, 0xb2fe, 0xb2ff
        .dw     0xb2e1, 0xb2e2, 0xb2c3, 0xb284
        .dw     planedeform_draw_step_1_next
At the beginning of the draw function, the stack pointer is stored in a temporary variable and set to the beginning of the lookup table. The code then jumps to planedeform_draw_step_1_next. The de register is used as the screen pointer and the bc register is used as the texture offset.
planedeform_draw_step_1_next:
        pop     hl      ; Load the screen offset
        add     hl, de  ; Add the screen offset to the screen pointer
        ex      de, hl  ; Move the result to de
        ret             ; Jump to the unrolled loop of draw instructions
The return instruction will in this case jump to planedeform_draw_step_1_num_8, which is a pointer in the middle of an unrolled loop with these instructions:
        pop     hl      ; Read the next texture offset from the lookup table
        add     hl, bc  ; Adjust the offset
        ld      a, (hl) ; Read the texel
        ld      (de), a ; Write the pixel
        inc     e       ; Move the screen pointer
After running these instructions 8 times, a return instruction will jump back to planedeform_draw_step_1_next. At the end of the table, a pointer to planedeform_draw_step_1_end is stored instead, where the stack pointer is restored and the function returns.
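
A table like this is presumably produced by a tool at build time. The following Python sketch shows how one stride could be emitted; the function name `emit_stride` and the tool itself are hypothetical, and replacing the final `_next` pointer with `_end` on the last stride is my reading of the description above. Only the label names and the example values come from the listing.

```python
def emit_stride(screen_offset, texel_pointers, last=False):
    """Return the .dw lines for one stride of the lookup table.

    screen_offset   value popped and added to de by ..._next
    texel_pointers  one texture pointer per pixel in the stride
    last            True for the final stride, which ends the drawing
    """
    count = len(texel_pointers)
    lines = [
        "        .dw     %d" % screen_offset,
        # Entry point into the unrolled loop with `count` pixels left.
        "        .dw     planedeform_draw_step_1_num_%d" % count,
    ]
    # Four pointers per line, as in the listing above.
    for i in range(0, count, 4):
        words = ", ".join("0x%04x" % p for p in texel_pointers[i:i + 4])
        lines.append("        .dw     " + words)
    tail = "end" if last else "next"
    lines.append("        .dw     planedeform_draw_step_1_" + tail)
    return lines


pointers = [0xb29c, 0xb2dd, 0xb2fe, 0xb2ff, 0xb2e1, 0xb2e2, 0xb2c3, 0xb284]
print("\n".join(emit_stride(124, pointers)))
```

Run on the values from the example stride, the sketch reproduces the five `.dw` lines shown earlier.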

Writer

There's not much to say about the writer. It's pretty much a straightforward implementation. The next 3 parts are loaded while the screen is static.


Plasma




The plasma effect is one of the first effects I wrote. It is implemented by filling half the framebuffer with a dithered sine pattern and animating it by changing the scroll register every scanline using interrupts on HSYNC. To avoid ugly dither-artifacts, even scanlines always use an even line from the framebuffer and odd scanlines always use an odd line from the framebuffer.

To avoid artifacts at the left side of the screen when adjusting the scroll register at HSYNC, I had to leave black borders at the sides.

The lower half of the framebuffer
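
The even/odd rule is easy to get wrong, so here is a small Python sketch of how a per-scanline table could be built. Everything numeric in it is an assumption made for the example: the 256 scanlines, the 128 pattern lines, and the sine frequencies are placeholders, not values from the demo.

```python
import math


def plasma_source_lines(frame, scanlines=256, pattern_lines=128):
    """Pick a pattern line for every scanline, keeping the parity equal.

    pattern_lines must be even so that flipping the lowest bit of a line
    number never leaves the pattern.
    """
    table = []
    for s in range(scanlines):
        # Two summed sines, range -2..2 (arbitrary example frequencies).
        wave = math.sin(s * 0.05 + frame * 0.1) + math.sin(s * 0.021 - frame * 0.07)
        src = int((wave + 2.0) / 4.0 * (pattern_lines - 1))
        if (src ^ s) & 1:
            src ^= 1   # move to the neighbouring line with the right parity
        table.append(src)
    return table


table = plasma_source_lines(frame=0)
print(table[:8])
```

Since the source line and the scanline always have the same parity, the difference between them is always even. If the scroll register simply adds to the line number, that amounts to only ever writing even values to it.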


Flames


The flame effect is using the same planedeform algorithm as the textureball with a few differences.

To make the effect run fast enough, only half the pixels on the screen are drawn. In addition to that, only a quarter of the pixels are updated each frame. This works just fine with the palette and texture used. No doublebuffering is used, since the draw code runs faster than the beam.

The texture used by the draw code is 64x64 pixels, but this texture is updated every frame from a packed texture of 64x256 pixels. The packed texture is stored using 2-bit delta encoding to save memory. The texture is converted from an image at compile time using lossy conversion.

The texture before delta encoding
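
At 2 bits per texel the 64x256 packed texture takes 4 kB, compared with 16 kB at one byte per texel. A Python sketch of such a coder follows. The mapping from 2-bit codes to deltas (`DELTAS`), the bit order within a byte and the greedy lossy encoder are all assumptions made for the example; the post does not say which ones the demo uses.

```python
DELTAS = (-1, 0, 1, 2)   # assumed meaning of the four 2-bit codes


def encode(values, start=0):
    """Greedy lossy encoder: pick the delta that lands closest to the target."""
    codes = []
    current = start
    for v in values:
        code = min(range(4), key=lambda c: abs(current + DELTAS[c] - v))
        current += DELTAS[code]
        codes.append(code)
    return codes


def pack(codes):
    """Pack four 2-bit codes per byte, first code in the lowest bits."""
    out = bytearray()
    for i in range(0, len(codes), 4):
        byte = 0
        for j, code in enumerate(codes[i:i + 4]):
            byte |= code << (2 * j)
        out.append(byte)
    return bytes(out)


def decode(packed, count, start=0):
    """Rebuild `count` texel values from the packed deltas."""
    values = []
    current = start
    for i in range(count):
        code = (packed[i // 4] >> (2 * (i % 4))) & 3
        current += DELTAS[code]
        values.append(current)
    return values


smooth = [0, 1, 2, 4, 5, 5, 4, 3]
packed = pack(encode(smooth))
print(len(packed), decode(packed, len(smooth)))
```

A smooth gradient like the one above survives unchanged. A jump larger than the biggest delta is smeared over several texels, which is where the lossy part comes in.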

The border transition is implemented by splitting an image into small chunks and drawing one chunk each frame.

Colorbars


This effect is implemented by filling the framebuffer with lines with widths from 64 to 120 pixels. The scroll register and the palette are updated each scanline to select the width and color using interrupts.

The animation and color of each face is precalculated at compile time. Some setup code which sets up registers and jumps to an unrolled draw loop is also generated.
The top of the framebuffer

This effect has more potential than what was used in the demo. It is possible to generate more faces and combine them to create more complex movements. I am also fairly certain that it should be possible to store the generated faces in a more clever way.

Twister

The twister effect was written in the final week before deadline after a few failed experiments. The internal name is twister5. :)

I like twisters and wanted one in the demo. However, I did not want yet another scroll register-based effect, so I aimed for a horizontal one.

To make the effect run fast enough, only half the pixels are updated. Also, only the difference between each frame is actually drawn.

There are 64 different positions for each pixel column (a quarter of a full rotation). For each position, 6 functions are generated to draw the difference with a delta of ±3. The drawing order of the pixels in each function is optimized by the code generator. The total size of the generated functions is about 11 kB.


Animation showing only the updated pixels each frame
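
64 positions times 6 deltas gives 384 generated functions, so about 11 kB works out to roughly 29 bytes per function. The Python sketch below shows the planning step only: which pixels each function would have to write. The input `columns` (one rendered pixel column per position), the wrap-around between position 63 and 0, and keying each function by the current position are assumptions for the example. The real generator also reorders the writes and emits Z80 code, which is left out here.

```python
def column_diff(old, new):
    """List the (y, value) writes needed to turn column `old` into `new`."""
    return [(y, n) for y, (o, n) in enumerate(zip(old, new)) if o != n]


def plan_functions(columns, max_delta=3):
    """Return {(position, delta): writes} for every non-zero delta."""
    positions = len(columns)
    plan = {}
    for pos in range(positions):
        for delta in range(-max_delta, max_delta + 1):
            if delta == 0:
                continue   # nothing to draw when the column does not move
            target = (pos + delta) % positions
            plan[(pos, delta)] = column_diff(columns[pos], columns[target])
    return plan


# Stand-in data: 64 columns of 16 pixels with an edge that moves slowly.
columns = [[1 if y < pos // 4 else 0 for y in range(16)] for pos in range(64)]
plan = plan_functions(columns)
print(len(plan), plan[(3, 1)])
```

With the stand-in data, stepping from position 3 to 4 only touches a single pixel, which is the whole point of drawing differences instead of full columns.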

Texturebars

The core of the texturebars effect is a bunch of generated functions that draw half a textured line. There are 31 possible widths and 62 functions in total. The functions all look like this:

; h = texture address (high byte)
; l = texture address (low byte) 
; stack pointer = video ram pointer
        ...
        ld      l, #0xcb
        ld      b, (hl)
        ld      l, #0xda
        ld      c, (hl)
        push    bc
        dec     l
        ld      b, (hl)
        ld      l, #0xc7
        ld      c, (hl)
        push    bc
        ...
The drawing functions use a push instruction to write two bytes (4 pixels), since this is the fastest way to write to memory. The generated code uses dec l/inc l instead of loading an immediate value whenever possible to save both time and space. The total size of the generated drawing code is about 8 kB.
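
The choice between `ld l, #n` and `inc l`/`dec l` can be sketched as a tiny Python generator. On the Z80, `ld l, #n` takes 2 bytes and 7 T-states while `inc l` and `dec l` take 1 byte and 4 T-states, so every substitution saves both. The function name `emit_pairs` and its input format (a list of texture address low bytes, in drawing order) are assumptions for the example; the real generator is not shown in this post.

```python
def emit_pairs(low_bytes):
    """Emit loads into b and c and a push for each pair of texture bytes.

    low_bytes is expected to have an even length, one entry per byte
    written to video RAM.
    """
    lines = []
    current = None   # value of l after the last emitted instruction
    for i, low in enumerate(low_bytes):
        if current is not None and low == (current + 1) & 0xFF:
            lines.append("inc     l")
        elif current is not None and low == (current - 1) & 0xFF:
            lines.append("dec     l")
        elif low != current:
            lines.append("ld      l, #0x%02x" % low)
        # If low == current, l already holds the right value.
        current = low
        reg = "b" if i % 2 == 0 else "c"
        lines.append("ld      %s, (hl)" % reg)
        if reg == "c":
            lines.append("push    bc")
    return lines


print("\n".join(emit_pairs([0xcb, 0xda, 0xd9, 0xc7])))
```

Fed with the low bytes 0xcb, 0xda, 0xd9 and 0xc7, the sketch produces the same ten instructions as the excerpt above, including the `dec l` for the step from 0xda to 0xd9.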

Since the drawing functions work on bytes but each pixel is 4 bits, the texture has to contain all two-texel combinations used. Each line also has to be stored 8 times, once for each intensity level. The total size of the texture in memory is 8 kB (the original texture is 32x16 pixels).

The animation is precalculated and compressed frame by frame at compile time and is decompressed at each frame. 

To make this effect run reasonably fast it only draws every other line. It also only runs at 25 Hz and uses the same doublebuffer technique as the textureball effect.

Image


The still image of the tiki figure was needed to have some time to load the final part.

The in-transition is basically the same routine as the one that draws the flame border, but the image is split in a different way. The transition is generated by using a grayscale image with the same size as the tiki figure and then splitting the tiki figure into small pieces based on the intensity of the grayscale image.

The image used when generating the transition
The out-transition just uses the scroll register to squeeze the image.
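
Generating the in-transition from a grayscale image could look roughly like this in Python. The function name `build_transition`, the fixed chunk size and the choice to draw the darkest pixels first are assumptions for the example. Only the idea of ordering the pieces by the intensity of a second image comes from the description above.

```python
def build_transition(gray, chunk_size):
    """Split the image coordinates into chunks ordered by gray intensity.

    gray is a list of rows of intensities with the same size as the image.
    One chunk is meant to be drawn per frame.
    """
    coords = [(x, y) for y, row in enumerate(gray) for x in range(len(row))]
    # Stable sort: pixels with equal intensity keep their scan order.
    coords.sort(key=lambda p: gray[p[1]][p[0]])
    return [coords[i:i + chunk_size] for i in range(0, len(coords), chunk_size)]


gray = [
    [0, 3],
    [2, 1],
]
print(build_transition(gray, 2))
```

Painting a different grayscale image changes the whole look of the transition without touching the drawing routine.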

Rollercoaster



The rollercoaster effect is displayed by filling the framebuffer with black and white horizontal lines and changing the scroll register every scanline to select which line to display. The horizon is shown by changing the background color. The track data is precalculated and unpacked at runtime. 

The framebuffer
The track is defined as a sequence of straight lines and bezier curves. Each frame is raytraced (in 2D) to get the width and color for each line. Raytracing was chosen to get a smoother track, which compresses better.

The lines and colors for each frame are compressed separately. The lines are stored as a combination of straight lines, horizontal lines, a copy of the previous frame and quadratic curves. To get a better compression ratio some errors are tolerated. The colors are stored as a combination of a copy of the previous frame, bitfields and RLE. In total there are about 2500 frames stored in 45 kB, or about 18 bytes per frame.
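
To give an idea of the simplest of these building blocks, here is a Python sketch of run-length encoding a column of per-scanline colors, plus the bytes-per-frame arithmetic. The run format (a color and a count of at most 255) is an assumption for the example. The actual packer mixes RLE with the other methods mentioned above and is not shown in this post.

```python
def rle_encode(colors, max_run=255):
    """Collapse repeated colors into (color, count) runs."""
    runs = []
    for color in colors:
        if runs and runs[-1][0] == color and runs[-1][1] < max_run:
            runs[-1][1] += 1
        else:
            runs.append([color, 1])
    return [(color, count) for color, count in runs]


def rle_decode(runs):
    """Expand (color, count) runs back into one color per scanline."""
    colors = []
    for color, count in runs:
        colors.extend([color] * count)
    return colors


colors = [0, 0, 0, 1, 1, 0]
runs = rle_encode(colors)
print(runs, rle_decode(runs) == colors)

# 45 kB spread over about 2500 frames.
print(45 * 1024 / 2500)
```

The last line prints 18.432, which matches the figure of about 18 bytes per frame.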

The middle section of the track is looped 3 times to make the part long enough for the music.

The tool used for visualizing the track before rendering
