Demo engine

The demo engine consists of the bootloader, loader, and a library.

Bootloader

The bootloader occupies the first sector (512 bytes) of the disk, and the ROM loads it at address 0xFB00. The bootloader then copies itself to address 0x8000 so that it is able to write to the screen (although it doesn't do that in the demo).

Next, the bootloader checks whether the VSYNC signal is available. If the signal is not found, it prints an error message and halts.

Then the bootloader reads the loader and the library from the disk and jumps to the loader.

Loader

The loader is responsible for initializing the music, loading data from disk and running all the parts. It is located at address 0x0000-0x03FF. The disk layout is specified in a file that gives the start position (track/side/sector) of each memory blob. Each memory blob contains its destination address followed by either compressed or uncompressed data. A compressed memory blob can be loaded and uncompressed in place, so no extra memory is required.

Library

The library is split into two parts: one in the upper half of the memory, which can be used at any time, and one in the lower half of the memory, which can only be used when the video RAM is not mapped in.

The low part is located at address 0x0400-0x2FFF and contains floppy routines, the music player and the song data. The song data is swapped before the final part to save memory, so during the final part only the memory up to 0x2000 is used.

The high part is located at address 0xF000-0xFEFF. The first 2 kB from 0xF000 to 0xF7FF are used as a buffer by the player (see the player section below). The rest is used for functions such as:
  • Video sync and other graphics functions
  • Accessing the system register
  • Unpacker (with or without music playback)
  • Memcopy and memset (with or without music playback)
  • Setting the AY registers from the ring buffer

Player

For playing the music the PT3 player by Sergey Bulba was used. The player was ported from the ZX Spectrum source.

The PT3 player originally used both the af' shadow register and the stack pointer for fast memory writes. This had to be rewritten since it was not compatible with my use of interrupts.

I also separated the AY sound register write code from the rest of the code to get more accurate playback and to be able to pre-render the register values. With a 2048 byte ring buffer and 16 bytes for each entry (14 register values + 2 bytes for song position), about 2.5 seconds of music is precalculated. This was originally done to make the trackloader work, but it was also really nice when I had to switch songs before the final part. It also made effects easier to write, since I could write the registers at VSYNC and then render the music anywhere within the frame.
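The buffer arithmetic above can be sketched in a few lines of Python. This is only a model of the idea, not the actual Z80 code: the class name, the order of the fields inside an entry and the 50 Hz consumption rate are my assumptions for illustration.

```python
BUFFER_SIZE = 2048                    # bytes, from the text
ENTRY_SIZE = 16                       # 14 AY register values + 2 bytes song position
ENTRIES = BUFFER_SIZE // ENTRY_SIZE   # number of pre-rendered ticks that fit
FRAME_RATE_HZ = 50                    # assumption: one entry is consumed per 50 Hz frame


class AyRingBuffer:
    """Hypothetical model of the pre-render buffer."""

    def __init__(self):
        self.data = bytearray(BUFFER_SIZE)
        self.head = 0    # next entry the player renders into
        self.tail = 0    # next entry to be written to the AY chip at VSYNC
        self.count = 0   # entries rendered but not yet played

    def push(self, registers, song_position):
        """Store one pre-rendered tick; return False when the buffer is full."""
        if len(registers) != 14:
            raise ValueError("expected 14 register values")
        if self.count == ENTRIES:
            return False
        offset = self.head * ENTRY_SIZE
        self.data[offset:offset + 14] = bytes(registers)
        self.data[offset + 14:offset + 16] = song_position.to_bytes(2, "little")
        self.head = (self.head + 1) % ENTRIES
        self.count += 1
        return True

    def pop(self):
        """Fetch the oldest tick, or None when nothing has been rendered yet."""
        if self.count == 0:
            return None
        offset = self.tail * ENTRY_SIZE
        registers = bytes(self.data[offset:offset + 14])
        position = int.from_bytes(self.data[offset + 14:offset + 16], "little")
        self.tail = (self.tail + 1) % ENTRIES
        self.count -= 1
        return registers, position


seconds_buffered = ENTRIES / FRAME_RATE_HZ
```

Under these assumptions the buffer holds 128 entries, which matches the "about 2.5 seconds" figure. The producer (the player) and the consumer (the VSYNC register write) only share the entry count, which is what lets the rendering happen anywhere within the frame.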

The player uses at most about 11000 cycles per tick (about 1/8th of a frame). All effects were written and tested with a test song running to be sure that they didn't use too much CPU. 

Trackloader

Using TIKO for disk loading was out of the question early on. The TIKO code disables interrupts, which meant that even playing music was impossible.

Programming the floppy controller directly was fairly easy. I used both the floppy controller documentation and disassembled the ROM to figure out how it worked. It took a few tries to get the emulator right (enough) though.

Reading the floppy and doing other things at the same time turned out to be much harder than first anticipated. Even though the floppy controller has an interrupt line to generate an interrupt when a byte is available, this line is not connected to anything on the Tiki. Reading the floppy has to be done by polling the status register and reading a byte when the data ready bit is set. A new byte is ready about every 128 cycles.
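The polling loop described above looks roughly like this when modelled in Python. The controller class is a stand-in I made up for illustration, and the bit mask is hypothetical; the real status bit and port addresses are in the floppy controller documentation.

```python
DATA_READY = 0x02   # hypothetical bit mask; the real controller's bit may differ


class FakeController:
    """Stand-in for the floppy controller, for illustration only."""

    def __init__(self, payload, polls_per_byte=3):
        self.payload = list(payload)
        self.polls_per_byte = polls_per_byte
        self.polls = 0

    def read_status(self):
        # Report "data ready" only on every Nth poll, as if a byte just arrived.
        self.polls += 1
        if self.payload and self.polls % self.polls_per_byte == 0:
            return DATA_READY
        return 0

    def read_data(self):
        return self.payload.pop(0)


def read_bytes(controller, count):
    """Busy-wait on the status register and fetch each byte once it is ready."""
    result = bytearray()
    while len(result) < count:
        if controller.read_status() & DATA_READY:
            result.append(controller.read_data())
    return bytes(result)
```

The point of the sketch is the cost: every poll that does not find the ready bit set is CPU time that cannot be spent on anything else, and a byte that is not picked up before the next one arrives is lost.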

We interrupt this broadcast

My first approach was to use the timer to generate an interrupt slightly faster than every 128 cycles. Syncing to the screen and playing music was done by manually reading the parallel port (the VSYNC bit stays on for about 1024 cycles, so this works fine).

Since there was not enough time to check the data ready bit, the interrupt handler just read the data register and stored the data in a buffer. Because the interrupt fired slightly more often than new bytes arrived, some bytes were read twice. To handle this I had to make sure the data had no repeating bytes and then discard duplicate bytes when unpacking.
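A small Python model of the duplicate handling, assuming the stored data never contains two equal bytes in a row. How the data was encoded to guarantee that is not shown here, and the `repeat_every` parameter is just a made-up way of simulating the oversampling.

```python
def has_adjacent_repeats(data):
    """True if any byte is immediately followed by the same value."""
    return any(a == b for a, b in zip(data, data[1:]))


def oversample(data, repeat_every=4):
    """Simulate reading slightly too fast: every Nth byte is seen twice."""
    out = bytearray()
    for i, value in enumerate(data):
        out.append(value)
        if i % repeat_every == repeat_every - 1:
            out.append(value)   # the same byte was still in the data register
    return bytes(out)


def drop_duplicates(stream):
    """Discard any byte that equals the byte kept just before it."""
    out = bytearray()
    for value in stream:
        if not out or out[-1] != value:
            out.append(value)
    return bytes(out)
```

As long as the original data has no adjacent repeats, `drop_duplicates(oversample(data))` gives back the original data. The scheme relies on a too-early read returning the previous byte again.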

This worked fine in the emulator, but on real hardware there was a problem. It turned out that reading the floppy data register when the ready bit is not set returns an undefined value, so the duplicates could not be detected reliably. I had to find another way.

Slice and dice

The next approach was to split 2 kB of data into smaller chunks. The chunks were then spread around a track with each chunk being stored at least twice. The data was read by reading small blocks of data (144 bytes) at a time. Since this resulted in a lot of lost chunks, the read operation had to be restarted several times until all the chunks had been read. This worked, but was too slow (about 1 kB/second).
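The retry loop can be modelled like this. It is only a sketch: the chunk size, the way chunks are tagged with an index and the `readable` callback that decides which slots are caught on a given pass are all hypothetical, not the actual on-disk format.

```python
def build_track(data, chunk_size, copies=2):
    """Split data into chunks and lay out each chunk `copies` times on the track."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    track = []
    for _ in range(copies):
        track.extend(enumerate(chunks))   # slots of (chunk_index, payload)
    return track, len(chunks)


def load(track, chunk_total, readable, max_passes=100):
    """Repeat read passes until every chunk has been seen at least once.

    `readable(pass_no, slot_no)` says whether a slot was caught on that pass.
    """
    found = {}
    passes = 0
    while len(found) < chunk_total:
        if passes == max_passes:
            raise RuntimeError("gave up: some chunks were never read")
        for slot_no, (index, payload) in enumerate(track):
            if readable(passes, slot_no):
                found.setdefault(index, payload)
        passes += 1
    data = b"".join(found[i] for i in range(chunk_total))
    return data, passes
```

Each extra pass costs at least one more read of the track, which is where the low throughput comes from.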

State of the union

The final approach was to get rid of interrupts altogether. I had to rewrite the PT3 player a bit to support pre-rendering (read the player section for more info).

This approach is best explained by looking at the floppy reader and music player as two separate state machines where each state is a tiny piece of code. With small enough states, a new single state machine with all the possible state combinations can be created. In total the combined state machine has 124 states (4 for the reader and 31 for the player).
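To illustrate the state count, here is a Python sketch of combining two small state machines into one. The transition functions are placeholders (each machine simply steps to its next state); in the real code each combined state would be a piece of code doing one slice of reader work and one slice of player work.

```python
from itertools import product

READER_STATES = 4    # from the text
PLAYER_STATES = 31   # from the text


def reader_step(state):
    """Placeholder transition: the real reader's transitions are not shown."""
    return (state + 1) % READER_STATES


def player_step(state):
    """Placeholder transition: the real player's transitions are not shown."""
    return (state + 1) % PLAYER_STATES


# One combined state per (reader, player) pair, each mapped to its successor.
COMBINED = {
    (r, p): (reader_step(r), player_step(p))
    for r, p in product(range(READER_STATES), range(PLAYER_STATES))
}


def run(steps, state=(0, 0)):
    """Advance the combined machine; both sub-machines move in lockstep."""
    for _ in range(steps):
        state = COMBINED[state]
    return state
```

`len(COMBINED)` is 4 × 31 = 124, matching the number above. Because every combined state has exactly one successor, no scheduler or interrupt is needed to switch between the two tasks.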

This was implemented using an ugly mess of assembler macros. 

After some tricky debugging this worked as intended and reading the floppy was about as fast as possible. It also meant that the screen would have to be static during loading, but that was an acceptable tradeoff.
