Maelstrom for Gameboy Advance - New Features
Contents
Introduction
New Features
General Speedups
Functions
#defines
Credits
This document covers the changes made between
Maelstrom V1 and V1.1. You should also refer to the main documentation
for information on how Maelstrom as a whole was implemenented.
- Scoring and collision detection with our
ship; we can now be destroyed!
- Simple line draw speedup (removing
multiplications, adding double pixel draw where appropriate)
- Text system. Storage of symbols and drawing
mechanism
- Collision detection of enemies with us using
size of each model involved
- Backface culling. Finding visibility with
dot product.
- Combining some transformation functions to
remove unnecessary loops and function calls. For example, combining
Translatetunnel() with Rotatetunnel() and Translateenemies() with
Transformenemies().
- Placing some time-critical functions into
IWRAM (Internal Work RAM) for increased speed (due to zero wait states).
Many small changes have been
made to the code in general to improve efficiency without too much
impact on readability. Firstly, some some very small functions, such as
translating, have now been incorporated into other functions. Also,
many variables that are accessed frequently have been changed to Int
even if they fit into smaller types (u8 or u16, for example) as the
hardware can access these more quickly.
Repetition of code has been
used to avoid repeated comparisons, particularly in the line drawing
code, as this executes faster. Some experimentation was also done to
determine whether pre-calculating some values and storing them in
variables was faster than re-calculating each time (for example,
calculating and storing a bit-shifted value for color in the drawing
function). It turned out that performing simple calculations repeatedly
was much faster than performing additional memory accesses. The
compiler may also run out of hardware registers when there are more
variables.
The final step was to place
time critical functions into IWRAM to improve performance. The most
obvious candidate was the BLine line drawing function, but other
functions which we spend a lot of time in (rotatetunnel() and
transformenemies()) were also placed here. This has no effect when
running on an emulator, but on real hardware reduces each memory access
from three to one cycles (The figure of three applies to running in
multiboot mode, in which case program code would be located in EWRAM
(External Work RAM) by default. If running from cartridge, the figure
varies, and we also incur time penalties for jumping, such as in For
loops. (See "Programming
the Nintendo Gameboy Advance: The Unofficial Guide", chapter 2,
page 10 for more information.) Placing code in IWRAM eliminates these
concerns.)
I also considered recompiling these
time-critical functions in ARM instead of THUMB code, but ultimately
decided that there would be no real speedup, partly due to the
functions not dealing with a great deal of 32 bit data, which ARM
excels at (Thanks to all the people on the HAM forum who contributed to this
discussion).
HAM's profiling functions were used to
determine the speedup effect of any changes made. Notice that the
VisualBoyAdvance emulator runs much faster than an actual GBA for most
tasks, particularly when in bitmap modes as here, so actual ticks in
the emulator are not an accurate representation of the ticks elapsed on
real hardware. However, it has generally been assumed that a speedup of
10% in the emulator will give 10% on the real hardware.
Only new or changed (since
V1.0) functions are listed here. Please refer to the original documentation for information
on how Maelstrom as a whole is written.
The line drawing function has been
accelerated by removing the use of multiplications. Previously, the
code would work out the address of each pixel from scratch each time,
which involved a multiply. Since the pixels in lines are always
adjacent to the previous and next pixels, only the first pixel needs to
be calculated from scratch. Subsequent ones can be calculated simply
using addition and subtraction. So, moving in Y requires an increment
or decrement by 120 (because each address is actually two pixels),
while moving in X requires us to keep track of whether our pixel is odd
or even (and hence whether to write to the high or low byte in the
current address), and moving to the next memory address if necessary.
This technique speeds up line drawing by about 10% in the
VisualBoyAdvance emulator.
Additionally, when dx is
dominant, we can draw two pixels at once with a single operation when
an odd then an even pixel are on the same row. This is due to each 16
bit memory address representing two pixels. However, actually
implementing this technique causes the lines to "step" too much, so we
have left it defaulted to a slightly simpler method. Even the faster
method is not quick enough to increase the framerate from 20 to 30 FPS,
so pursuing the problem would not be particularly fruitfull. The faster
method (giving about 5% increase in speed) can be enabled by defining
FASTDRAW.
Aside:
Experimentation with
the double speed, symmetrical line drawing algorithm from Graphics Gems
(Glassner) provided surprisingly little acceleration in performance,
being roughly equivalent to the algorithm used here, but using a much
more complex algorithm. The double speed algorithm may decide on four
pixels to draw for every loop, but needs to perform several comparisons
and store more variables in order to achieve this. Depending on
architecture, referencing a large number of variables will incur
performance penalties as they can no longer be stored solely in the
processor's internal registers, necessitating memory accesses.
Therefore, optimisations to simple routines such as this can yield the
same speed with greater ease.
This function prints the
supplied string to the screen. It passes through the string one
character at a time and finds the correct character to print using a
case statement. The characters are stored in the Character structure,
which contains the sequence of points in the character. The function
draws lines between consecutive pairs of points to complete the shape.
Notice that some models repeat the first point in the last point. This
is for closed shapes such as 0's. Also, the function assumes the string
will fit on the screen and performs no culling.
This function
creates the
normal vector for each tri, then each quad in turn using the Cross
product rule. The results are stored in the Model structure. Notice the
use of triangles AND quads to define the models. Quads and tris are
used rather than tris alone because this removes the need to create
flat rectangular surfaces out of multiple triangles. This saves time in
two ways. Firstly, there is only one normal to work with, rather than
two, and secondly we save ourselves from an unwanted internal line that
would cross the quad's diagonal if we used two triangles. This line
could be removed by adding flags to each line to indicate internal
lines, but given the advantage of removing one normal, the solution
used here seems superior.
This function has
now been
changed to generate a random number to make a choice between two models
each time it generates a new enemy; either a cube or a pyramid.
The
Transformenemies()
function has now been changed to apply the same rotations to the tri
and quad normals as it does to the vertices of the models. Notice that
only the orientation of the normals are important and therefore the
translations are not applied. As with the vertices of the model, the
resultant transformed normals are stored in each individual ship
instance.
In order to best
calculate
the surface normals of the polygons of an object, we need to define the
vertices of each in a given "winding" direction (in this case,
anti-clockwise). Therefore, models are defined with the tri and quad
fields referencing point indices winding in this direction when viewed
from the intended visible side. The normals can then be calculated
using the Cross product.
However, when
we wish to
draw the polygons, it is inefficient to simply go through each polygon
and draw lines between each point in turn, as most lines would be
shared between polygons and therefore drawn twice. What the
Precalclines() function does is go through each polygon, and finds the
lines required for each. It compares each line it finds with those
already stored in a list. If it doesn't already exist, it is added to
the list. Otherwise, it is not. Either way, the polygon then has its
references re-written to point to the correct lines, rather than
points. So, once the first line in a tri is calculated, the reference
for the first line overwrites the reference for the first point, since
this point data is now stored in the Line list and the vertex order is
no longer required to calculate the normal.
So, for the
first poly,
we find line 0 to 1. It isn't already in the list, so we add it. The
polygon then has '0' written to mtri[0][0], pointing it to the line
just created, instead of a vertex. This is then repeated for the other
lines in the poly, then the other polys. The other polygons will
probably share lines, and hence won't add new items to the list in
these cases, instead simply referencing lines that are already there.
All lines in
each
ship
instance are initially set to not visible. The function then uses the
normal from each polygon, together with the view vector (the position
of any vertex on the polygon, since our eye is at the origin), to
determine whether the poly is visible or not. If it is, the associated
lines are flagged as being visible. Notice that each line may be
flagged as visible by multiple polygons, but this has very little
overhead compared to rendering each line multiple times.
The models
are
then drawn
by rendering only the enabled lines using the Drawenemies() function.
This function
searches for
collisions between us and enemies, and between enemies and our shots.
Firstly, it
compares
each enemy in turn for collision with our ship. It tests first to see
if it is on the same polygon as us, and only if this is true does it
move onto a more detailed test. This involves adding together the
Collision distances(the depth of the front/back most point on an
object) defined as part of each model to form a "collision distance"
(as opposed to collision volume). When the distance between the models
is less than this distance, we have collided. This causes our ship to
be made invalid, which triggers a routine in the Mainloop() to deal
with loosing a life.
We then
loop
through
all Valid shots and test them against each enemy to determine if they
collided. If so, they are both made invalid. This time, the collision
detection also tests for intersection of the bounding distances behind
the enemy (this is not necessary with our ship, as it is always closest
to us).
When our ship
collides
with an
enemy, as detected in Destroythings(), its Valid flag is cleared. This
triggers a change in Mainloop's behaviour, ceasing the handling of the
objects in the scene and instead handling the loss of a life. This
involves:
- Strobing
the
screen
display. This is achieved by using HAM's ham_FadePalCol() command to
fade out that colour in the palette. We call it a large number of
times, causing the colour to strobe. (The colours produced by changing
the pallette entry in this way are fairly unpredictable, but we have
used this as a quick way of flashing the display without paying too
much attention to how it actually works)
- Decrementing our lives
counter.
- If lives is greater than
0, make our ship Valid again, and invalidate all other objects (remove
enemies and shots). The game will now restart from where it left off,
with the player in the same place, but generating new enemies.
- Else, we have run out of
lives. The colour strobing continues and the screen display remains
frozen. When a key is pressed, the game will start again. Position,
zoom, lives, score and enemies will all be reset.
The following
defines
are
present for debugging purposes:
RELEASE:
When defined,
disables the debug printing to console, including the profiler output,
as this slows down execution and prevents the executable from running
on real GBA hardware. This has been enabled in the supplied executable.
INDESTRUCTIBLE:
Prevents
the test for our own destruction from being carried out, allowing us to
move around the tunnel and shoot without having to avoid the enemy.
This has been disabled in the supplied executable.
FASTDRAW:
Increases line
draw speed at the cost of visual quality. This has been disabled in the
supplied executable.
Programming the
Nintendo Gameboy
Advance: The Unofficial Guide
http://www.jharbour.com/gameboy/
3D Computer Graphics by Alan Watt.
Graphics Gems by Andrew S. Glassner
Emanuel Schleussinger for writing
HAM
Thanks to all the great people on
the
HAM forum
at: http://www.ngine.de
and also the GbaDev forum at: http://forum.gbadev.org
Dave Theurer for writing Tempest in
the first
place!