Maelstrom for Gameboy Advance - New Features

Contents

Introduction
New Features
General Speedups
Functions
#defines
Credits

Introduction

This document covers the changes made between Maelstrom V1 and V1.1. You should also refer to the main documentation for information on how Maelstrom as a whole was implemenented.

New features

General Speedups

Many small changes have been made to the code in general to improve efficiency without too much impact on readability. Firstly, some some very small functions, such as translating, have now been incorporated into other functions. Also, many variables that are accessed frequently have been changed to Int even if they fit into smaller types (u8 or u16, for example) as the hardware can access these more quickly.

Repetition of code has been used to avoid repeated comparisons, particularly in the line drawing code, as this executes faster. Some experimentation was also done to determine whether pre-calculating some values and storing them in variables was faster than re-calculating each time (for example, calculating and storing a bit-shifted value for color in the drawing function). It turned out that performing simple calculations repeatedly was much faster than performing additional memory accesses. The compiler may also run out of hardware registers when there are more variables.

The final step was to place time critical functions into IWRAM to improve performance. The most obvious candidate was the BLine line drawing function, but other functions which we spend a lot of time in (rotatetunnel() and transformenemies()) were also placed here. This has no effect when running on an emulator, but on real hardware reduces each memory access from three to one cycles (The figure of three applies to running in multiboot mode, in which case program code would be located in EWRAM (External Work RAM) by default. If running from cartridge, the figure varies, and we also incur time penalties for jumping, such as in For loops. (See "Programming the Nintendo Gameboy Advance: The Unofficial Guide", chapter 2, page 10 for more information.) Placing code in IWRAM eliminates these concerns.)

I also considered recompiling these time-critical functions in ARM instead of THUMB code, but ultimately decided that there would be no real speedup, partly due to the functions not dealing with a great deal of 32 bit data, which ARM excels at (Thanks to all the people on the HAM forum who contributed to this discussion).

HAM's profiling functions were used to determine the speedup effect of any changes made. Notice that the VisualBoyAdvance emulator runs much faster than an actual GBA for most tasks, particularly when in bitmap modes as here, so actual ticks in the emulator are not an accurate representation of the ticks elapsed on real hardware. However, it has generally been assumed that a speedup of 10% in the emulator will give 10% on the real hardware.

Functions

Only new or changed (since V1.0) functions are listed here. Please refer to the original documentation for information on how Maelstrom as a whole is written.

BLine

The line drawing function has been accelerated by removing the use of multiplications. Previously, the code would work out the address of each pixel from scratch each time, which involved a multiply. Since the pixels in lines are always adjacent to the previous and next pixels, only the first pixel needs to be calculated from scratch. Subsequent ones can be calculated simply using addition and subtraction. So, moving in Y requires an increment or decrement by 120 (because each address is actually two pixels), while moving in X requires us to keep track of whether our pixel is odd or even (and hence whether to write to the high or low byte in the current address), and moving to the next memory address if necessary. This technique speeds up line drawing by about 10% in the VisualBoyAdvance emulator.

Additionally, when dx is dominant, we can draw two pixels at once with a single operation when an odd then an even pixel are on the same row. This is due to each 16 bit memory address representing two pixels. However, actually implementing this technique causes the lines to "step" too much, so we have left it defaulted to a slightly simpler method. Even the faster method is not quick enough to increase the framerate from 20 to 30 FPS, so pursuing the problem would not be particularly fruitfull. The faster method (giving about 5% increase in speed) can be enabled by defining FASTDRAW.

Aside: Experimentation with the double speed, symmetrical line drawing algorithm from Graphics Gems (Glassner) provided surprisingly little acceleration in performance, being roughly equivalent to the algorithm used here, but using a much more complex algorithm. The double speed algorithm may decide on four pixels to draw for every loop, but needs to perform several comparisons and store more variables in order to achieve this. Depending on architecture, referencing a large number of variables will incur performance penalties as they can no longer be stored solely in the processor's internal registers, necessitating memory accesses. Therefore, optimisations to simple routines such as this can yield the same speed with greater ease.

Printscreen(char *string, int x, int y)

This function prints the supplied string to the screen. It passes through the string one character at a time and finds the correct character to print using a case statement. The characters are stored in the Character structure, which contains the sequence of points in the character. The function draws lines between consecutive pairs of points to complete the shape. Notice that some models repeat the first point in the last point. This is for closed shapes such as 0's. Also, the function assumes the string will fit on the screen and performs no culling.

Precalcnormals()

This function creates the normal vector for each tri, then each quad in turn using the Cross product rule. The results are stored in the Model structure. Notice the use of triangles AND quads to define the models. Quads and tris are used rather than tris alone because this removes the need to create flat rectangular surfaces out of multiple triangles. This saves time in two ways. Firstly, there is only one normal to work with, rather than two, and secondly we save ourselves from an unwanted internal line that would cross the quad's diagonal if we used two triangles. This line could be removed by adding flags to each line to indicate internal lines, but given the advantage of removing one normal, the solution used here seems superior.

Expireandcreateenemies()

This function has now been changed to generate a random number to make a choice between two models each time it generates a new enemy; either a cube or a pyramid.

Transformenemies()

The Transformenemies() function has now been changed to apply the same rotations to the tri and quad normals as it does to the vertices of the models. Notice that only the orientation of the normals are important and therefore the translations are not applied. As with the vertices of the model, the resultant transformed normals are stored in each individual ship instance.

Precalclines()

In order to best calculate the surface normals of the polygons of an object, we need to define the vertices of each in a given "winding" direction (in this case, anti-clockwise). Therefore, models are defined with the tri and quad fields referencing point indices winding in this direction when viewed from the intended visible side. The normals can then be calculated using the Cross product.

However, when we wish to draw the polygons, it is inefficient to simply go through each polygon and draw lines between each point in turn, as most lines would be shared between polygons and therefore drawn twice. What the Precalclines() function does is go through each polygon, and finds the lines required for each. It compares each line it finds with those already stored in a list. If it doesn't already exist, it is added to the list. Otherwise, it is not. Either way, the polygon then has its references re-written to point to the correct lines, rather than points. So, once the first line in a tri is calculated, the reference for the first line overwrites the reference for the first point, since this point data is now stored in the Line list and the vertex order is no longer required to calculate the normal.

So, for the first poly, we find line 0 to 1. It isn't already in the list, so we add it. The polygon then has '0' written to mtri[0][0], pointing it to the line just created, instead of a vertex. This is then repeated for the other lines in the poly, then the other polys. The other polygons will probably share lines, and hence won't add new items to the list in these cases, instead simply referencing lines that are already there.

Findvisible()

All lines in each ship instance are initially set to not visible. The function then uses the normal from each polygon, together with the view vector (the position of any vertex on the polygon, since our eye is at the origin), to determine whether the poly is visible or not. If it is, the associated lines are flagged as being visible. Notice that each line may be flagged as visible by multiple polygons, but this has very little overhead compared to rendering each line multiple times.

The models are then drawn by rendering only the enabled lines using the Drawenemies() function.

Destroythings()

This function searches for collisions between us and enemies, and between enemies and our shots.

Firstly, it compares each enemy in turn for collision with our ship. It tests first to see if it is on the same polygon as us, and only if this is true does it move onto a more detailed test. This involves adding together the Collision distances(the depth of the front/back most point on an object) defined as part of each model to form a "collision distance" (as opposed to collision volume). When the distance between the models is less than this distance, we have collided. This causes our ship to be made invalid, which triggers a routine in the Mainloop() to deal with loosing a life.

We then loop through all Valid shots and test them against each enemy to determine if they collided. If so, they are both made invalid. This time, the collision detection also tests for intersection of the bounding distances behind the enemy (this is not necessary with our ship, as it is always closest to us).

Mainloop()

When our ship collides with an enemy, as detected in Destroythings(), its Valid flag is cleared. This triggers a change in Mainloop's behaviour, ceasing the handling of the objects in the scene and instead handling the loss of a life. This involves:

#defines

The following defines are present for debugging purposes:

RELEASE: When defined, disables the debug printing to console, including the profiler output, as this slows down execution and prevents the executable from running on real GBA hardware. This has been enabled in the supplied executable.

INDESTRUCTIBLE: Prevents the test for our own destruction from being carried out, allowing us to move around the tunnel and shoot without having to avoid the enemy. This has been disabled in the supplied executable.

FASTDRAW: Increases line draw speed at the cost of visual quality. This has been disabled in the supplied executable.

Credits

Programming the Nintendo Gameboy Advance: The Unofficial Guide http://www.jharbour.com/gameboy/

3D Computer Graphics by Alan Watt.

Graphics Gems by Andrew S. Glassner

Emanuel Schleussinger for writing HAM

Thanks to all the great people on the HAM forum at: http://www.ngine.de

and also the GbaDev forum at: http://forum.gbadev.org

Dave Theurer for writing Tempest in the first place!