Friday, April 01, 2011

Pointers

Welcome to Ziggyny's Coding Emporium. Today we're going to be discussing pointers using the random encounter system of the video game Final Fantasy as a starting point and ending up with the rough idea behind linked lists.

First off, how would you imagine the random encounter system works? Me, I'd think that every time you took a step the game would make a check. 10% chance that you get into a fight every step, say. When that check returned true, randomly figure out what the monsters were based on where the fight was taking place. I'd be wrong. It turns out that when the game is turned on and again after every random encounter the game picks  a number and stores that value in a variable, and every time you take a step that value gets decremented. When it hits 0 or lower you get into a fight. So as soon as a fight ends you know exactly how many steps it will take to get into your next fight. (Or at least you would if you had access to that variable.)

Only it turns out that number isn't random at all, either. The first fight on the overworld takes place after 15 steps. The next one is after 24 steps. Then 32. Then 29. It turns out there's a list of 90 step number for the overworld and the game just loops through them all. There are similar lists for ocean and dungeon fights. (Ocean fights are much less frequent, dungeons are slightly less frequent.)

When the fight starts the game picks a fight for you, but there's actually no randomness involved here either. Every zone has 8 possible fights it could be. The game has a list of 256 numbers. The first fight you get when you turn the power on is the fight associated with the first number in that list, which it turns out is 3. Then a 4, then another 3, then a 6, and so on. People have actually used this deterministic nature to plot out exact movement patterns through dungeons to get specific random encounters. (You can't run from some fights in the game, so if you want to beat the game at minimum level you need to never encounter those fights!) The only way to change what fight you're going to do is to change zones. You'll still fight the 3rd fight, it will just be the 3rd one from a different list.

What does all this have to do with pointers? Well, how are those 256 numbers actually stored? You could have a variable for each one and use one gigantic module to track where you are in the list but that's really inefficient. A trick you can use is to use an array which lets you use one variable to store all 256 numbers and then a second variable to store an index into the array. The index is simply a number which tells you where to look in your huge array for the data you need. The programming language takes into account where the data actually is in memory. All you care about is the variable for your array and the variable for your index. But what is the language actually doing? It actually treats all of memory as one big array of its own. The variable you use to refer to your array is just an index itself into the gigantic memory array. Your array variable, in a sense, is just a number.

In C, it's more than just a number in a sense. It's a number period. This number is called a pointer because it 'points' to the specific section of memory the language set aside for you but all it is is an index into an array. The memory system doesn't know what you've stored there. It doesn't care in the slightest. An interesting thing about arrays in C is there's no actual array substance at all. If you allocate the memory for, say, 256 8-bit values you might think it's doing something specific to break up a 2048 bit chunk of memory into a bunch of 8-bit sections for you, but it doesn't. It just takes a contiguous chunk of memory sized 2048, sets it aside, and tells you where it starts. How does the index work then? Well, the system does know the size of the values you're storing so it can work out where it is. A[0] starts right at the start of the array. A[1]? It's exactly 8 bits ahead of A[0] because you're storing 8-bit values. So the system actually turns A[1] into A+8*1. A[200]? It's at A+8*200. (Ever wondered why arrays start at index 0?)

So our array solution was actually a pointer solution without even knowing it!


This is all well and dandy when we know our data isn't going to change, but what happens if we're writing a more dynamic data system? One where we might want to add or delete items from the middle of the list? If you think about it, in our array solution a given entry in the array implicitly  'knew' where the next item in the array was. It was at the location in memory 8 bits ahead of where the current entry is. A[151] obviously follows A[150]. But what if we want to delete A[151]? (Our subscribers whined that the fights referred to by A[151] were too hard and demanded we nerf the game by taking them out.) With an array and the implicit pointer to the next entry we're in trouble. Our best bet is to copy the value from A[152] into A[151]. And then A[153] into A[152]. And so on until we've copied them all. And then we have to change our check for when we hit the end of the array to loop back to the start to check for 255 instead of 256. It's all a big mess.

Even worse, what if the word came down from the president of the company that the users want super-rare fights to spring up. They want a 257th possible outcome, and they want it to come after A[180]. Following the idea above, we could copy A[255] to A[256] and then A[254] to A[255] and so on until we've made room for the new value. Hold on now! We only asked the memory manager for 2048 bits of memory and now we're using 2054. In some languages this would result in an error of some kind. In C the system just doesn't care. It'll let you write into that section of memory regardless of what it's actually supposed to be. (This is how many security flaws came about, by sticking something really big into a section of memory and overwriting actual code!) This is obviously a big, big problem. But we can't just tell the president to screw off or we'll get fired. So we need to allocate 8 new bits of memory somewhere and then remember somehow that we go from A[180] to the new value to A[181]. Probably we do this with a bunch of if statements and eventually end up with terrible, terrible code.

The real problem above is that we were relying on implicit pointers to the next value. What if we could assign an explicit pointer to the next value? That way if we had to add or remove values in our list we could change the explicit pointers and keep the right connections around. Well, we know pointers are simply numbers, so we could change our data from being an 8-bit number to being an 8-bit number AND a pointer. The size of the pointer variable depends on the operating system, but if you're on a 32-bit computer it'll be 32 bits in size. On the NES I'd imagine it would have been 8 bits. This will double the amount of space our data will occupy but it will let us add or remove items easily.

So, instead of allocating one big block of 2048 bits we'd get the system to give us 256 blocks of 16 bits. The first 8 bits will have our number. The second 8 bits will be a pointer to the next allocated block. Maybe it's the next one in memory. Maybe it's somewhere else entirely. We don't care. All we care about is the location of the very first number. After we're done with that number we don't even care about that anymore. So we could update our pointer variable to bits 9-16 of the section of memory it's pointing to! We can even get rid of our 'end of list' check by making the last entry point back to the first one. Now if we need to add an extra number we can allocate space for it and adjust a couple pointers and we're done.


Linked lists are often represented something like above. Pointers are represented with arrows and are made to seem more complicated than they are. The above picture (stolen ruthlessly from wikipedia) has 3 distinct blocks of memory. The circles are just numbers. Numbers that can be thought of as an index into the gigantic array which is memory. You can change the number in the circle to change the index of the array. Or you can follow the arrow and change the value on the other side to change the value stored in the array. Getting the syntax right can be tricky with all the *s and &s but the important thing to keep in mind is pointers are just numbers. There's nothing mythical about them and they're so useful because they can be manipulated as numbers.

No comments: