A couple nights ago, I was trying to explain to my mom what I was programming, the associative array protocol. I explained it badly, but a better way to put it would be this: imagine you're computerizing a library. The main thing you have to do is make it so that people can look up the call numbers of books. Everyone already knows the title and correct spelling of the book they want to get. Your job is to write a program that, given the title, will look up the call number for you. It has to do this quickly, even if there are a lot of books in the library. Of course, to do this, you need an associative array: a structure that stores a value (here, the call number) under a key (here, the title) and hands the value back when you give it the key. This is obvious to anyone who's done much programming. I'm sure most of you already have your favorite structure picked out.
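To make the library example concrete, here is a minimal sketch in Python, using its built-in dictionary as the associative array. The titles and call numbers are invented for illustration, and call_number is just a name I picked; this isn't the protocol I was actually writing.

```python
# Hypothetical catalog: every title and call number here is made up.
catalog = {
    "A Made-Up Book About Birds": "598.1 EXA",
    "An Invented History of Tea": "641.3 EXB",
    "Imaginary Knots for Sailors": "623.8 EXC",
}

def call_number(title):
    """Return the call number stored under an exact title, or None if absent."""
    return catalog.get(title)

print(call_number("An Invented History of Tea"))  # prints 641.3 EXB
print(call_number("A Book We Do Not Own"))        # prints None
```

The lookup needs the exact title, which matches the assumption that everyone knows the correct spelling of the book they want.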
But why do we need to structure the data, anyway? A programmer might say, "to allow data to be accessed with a minimal asymptotic complexity." Basically, that means in as few steps as possible, even if there's a lot of data. Still, this doesn't completely mesh with how computers have been explained to the average person. That explanation probably goes something like this. (This is about as far as the Magic School Bus gets when the class goes into a computer.)
First, input is given to the computer. The input starts out in the RAM. It is in ones and zeroes, because that's easier for the computer to handle. To process the data, some ones and zeroes are sent to the CPU, together with some instructions, and the CPU writes other ones and zeroes back to the RAM. The ones and zeroes represent numbers in base 2. Then, when you want to save the data, those ones and zeroes are taken from the RAM and written to the hard drive.
Then, many people understand the higher level stuff about computers: the Sony VAIO is a good laptop, this is how you use Microsoft Word, Mac OS X is based on Unix, etc.
In the middle, there's a huge gap of information that only specialists know about. Right above the bottom of that gap sit data structures. Data structures, along with file formats, are the way those ones and zeroes are organized. If we only had numbers, we really couldn't do much with computers.
The basis for all data structures is the pointer. Basically, every single location in the RAM has a number describing where it is. A pointer is a number that points to one of those positions in RAM. So the simplest thing you can do is say that a particular pointer doesn't just point to one number; it points to a number, and then the number right after it in RAM. And why limit yourself to just two? Why not 100? This pointer, the pointer to location 42313, represents the numbers at all of the locations 42313-42412. To get item number 53 in the array, you just look up what number is at location 42313+53. Counting starts at 0, not 1, so the first item is at 42313+0 and item number 53 is actually the 54th one. This whole thing is called an array.
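The address arithmetic can be sketched in Python by pretending a plain list is the RAM, so that a list index plays the role of an address. The size of the fake RAM, the values stored in it, and the name array_get are all assumptions made for illustration; a real pointer is handled by the hardware, not by a list.

```python
# Simulated RAM: each list index stands in for one memory location.
ram = [0] * 50000

base = 42313    # the "pointer": where the array's first item lives
length = 100    # the array covers locations 42313 through 42412

# Fill the array with recognizable values: item i holds i * 10.
for i in range(length):
    ram[base + i] = i * 10

def array_get(ram, base, length, index):
    """Read item `index` (counting from 0) of the array that starts at `base`."""
    if not 0 <= index < length:
        raise IndexError("index outside the array")
    return ram[base + index]

print(array_get(ram, base, length, 0))    # location 42313, prints 0
print(array_get(ram, base, length, 53))   # location 42366, prints 530
```

Notice that finding an item takes one addition and one lookup, no matter how long the array is.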
Data structures get much more complicated, but the idea remains the same: you have numbers, and then you have pointers to numbers, and even pointers to pointers to numbers. Why so many pointers and so many ways of organizing them? Because if you have the right pointers, they take you to the right number, the data that you want. A good deal of computer scientists' work has been devoted to getting just the right way to orient pointers, and counting exactly how many steps they take to find the data.
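As one small illustration of pointers that lead to other pointers, here is a sketch of a linked list in a pretend RAM (again just a Python list), together with a count of how many steps a search takes. The layout of two locations per node, the NULL marker of -1, and the names store_node and find are assumptions chosen for this example, not the only way to do it.

```python
NULL = -1        # assumed marker meaning "there is no next node"
ram = [0] * 64   # simulated RAM: a list index stands in for an address

def store_node(address, value, next_address):
    """A node takes two locations: the value, then a pointer to the next node."""
    ram[address] = value
    ram[address + 1] = next_address

# Three nodes scattered around memory, chained together by pointers.
store_node(10, 7, 40)
store_node(40, 22, 3)
store_node(3, 31, NULL)
head = 10        # pointer to the first node

def find(head, target):
    """Follow pointers from `head`; return (address of target or NULL, steps taken)."""
    steps = 0
    address = head
    while address != NULL:
        steps += 1
        if ram[address] == target:
            return address, steps
        address = ram[address + 1]   # jump to wherever the pointer says
    return NULL, steps

print(find(head, 7))    # prints (10, 1)
print(find(head, 31))   # prints (3, 3)
print(find(head, 99))   # prints (-1, 3)
```

Here the number of steps grows with how far down the chain the data sits, which is exactly the kind of thing that gets counted when comparing one arrangement of pointers against another.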