Sunday, March 9, 2008

Explaining garbage collection

Imagine you have a lot of garbage in your house. You have so much garbage that there's no room to put any new stuff. This isn't garbage that you've thrown in the garbage can, and it's a little unclear what's garbage and what's not.

See, you've built a bunch of Rube Goldberg contraptions in your house, and some machines share parts. Your washing machine might use a pencil somewhere to write down how much longer it has until it's done, and your drier might use the same pencil on a different pad of paper. Even if you decide you don't want the washing machine, you can't just throw out the pencil (though you can throw out the washer's pad of paper). So how do you determine what you can safely throw away?

It'd be really tedious to do this yourself, so you bring out your handy dandy garbage collection robot. You give the robot a list of Rube Goldberg machines that you use, and the robot will look at what uses what. It'll throw out everything that is unused once it figures everything out.

One strategy the robot could use is to write down a list of all of the objects in your house. The robot then goes down the list of machines that you use and puts a mark next to each one. Next, it goes to each marked object and marks everything that that object uses, repeating until every marked object has been visited. When it's done, it can go back to all of the unmarked objects and throw them out. This is known as "mark and sweep" garbage collection.
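
Here's what that marking process might look like as a tiny Python sketch (Python rather than Factor, and the Obj class and the mark/sweep names are just made up for illustration):

    class Obj:
        def __init__(self, name, refs=None):
            self.name = name
            self.refs = refs or []   # the other objects this one uses
            self.marked = False

    def mark(roots):
        # Mark everything reachable from the machines you say you use.
        stack = list(roots)
        while stack:
            obj = stack.pop()
            if not obj.marked:
                obj.marked = True
                stack.extend(obj.refs)

    def sweep(heap):
        # Throw out everything unmarked and clear marks for next time.
        live = [obj for obj in heap if obj.marked]
        for obj in live:
            obj.marked = False
        return live

    # The washer and drier share a pencil; an old toaster is garbage.
    pencil = Obj("pencil")
    washer = Obj("washer", [pencil])
    drier = Obj("drier", [pencil])
    heap = [pencil, washer, drier, Obj("toaster")]
    mark([washer, drier])
    heap = sweep(heap)    # the toaster is gone; the shared pencil survives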

Another strategy is, when there's no more room, to build a new house and put in it a copy of everything that you use. Then you look at everything that those copied Rube Goldberg machines use and copy that into the new house too, and so on until everything in use has been moved. This doesn't have to be as inefficient as it sounds, since you can reuse the first house again the next time garbage collection happens. This is known as "copying" garbage collection.
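
A rough Python sketch of the copying idea, in the spirit of Cheney's algorithm (again, the names and the object representation are invented for illustration):

    class Obj:
        def __init__(self, name, refs=None):
            self.name = name
            self.refs = refs or []   # the other objects this one uses

    def copy_collect(roots):
        # Build the "new house": copy what you use directly, then keep
        # copying whatever the already-copied objects use, remembering
        # which objects have already been moved.
        to_space, moved = [], {}

        def copy(obj):
            if id(obj) not in moved:
                clone = Obj(obj.name, obj.refs)
                moved[id(obj)] = clone
                to_space.append(clone)
            return moved[id(obj)]

        new_roots = [copy(r) for r in roots]
        scan = 0
        while scan < len(to_space):   # scan the new house for old references
            obj = to_space[scan]
            obj.refs = [copy(r) for r in obj.refs]
            scan += 1
        return new_roots, to_space    # the old house is reusable next time

    pencil = Obj("pencil")
    washer = Obj("washer", [pencil])
    toaster = Obj("toaster")          # unused, so it never gets copied
    roots, new_heap = copy_collect([washer])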

The last strategy is to tell the robot, as you're building your machines, what uses what, and when things start and stop being used. The robot keeps a count of how many things use each object. When that count drops to zero (say, only one machine was using something, or you were using it directly and nothing else was, and that use stops), the robot throws the object out. When it throws that piece of garbage out, it has to remember that everything that object was using is now used by one fewer thing. This is called "reference counting", because you count how many things refer to a particular object.
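
And a sketch of the counting robot, in the same made-up Python representation:

    class Obj:
        def __init__(self, name, refs=None):
            self.name = name
            self.refs = refs or []    # the things this object uses
            self.count = 0            # how many things use this object
            for r in self.refs:
                r.count += 1

    def release(obj):
        # One user of obj went away. If nothing uses it anymore, throw it
        # out, and note that everything it was using has one fewer user.
        obj.count -= 1
        if obj.count == 0:
            print("throwing out", obj.name)
            for r in obj.refs:
                release(r)

    pencil = Obj("pencil")
    pad = Obj("pad")
    washer = Obj("washer", [pencil, pad])
    drier = Obj("drier", [pencil])
    washer.count += 1                 # you use the washer and drier directly
    drier.count += 1
    release(washer)                   # the washer and its pad go; the pencil
                                      # stays, because the drier still uses it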

Now imagine all of this in a computer. Computer programs often amount to Rube Goldberg machines on a much larger scale. Different pieces of RAM refer to other pieces of RAM, and it's hard for the programmer to tell what she can safely throw out. By using a garbage collector, this can all be handled automatically.

Monday, February 11, 2008

Entscheidungsproblem gelöst! (Decision problem solved!) New answer and FAQ

Alan Turing wasn't quite right: anything that can be executed in the physical universe runs in constant time, and it's simple to design a mechanism, itself running in constant time, which tests whether an algorithm will halt or not (assuming the algorithm takes less than half of the universe to run).

Q: How can you make such a bold, stupid claim?
A: Because Turing assumed an infinite space to solve problems in. In the real world, there is a finite number of bits we can work with. Let's call this n. If a problem can be solved in the physical universe, it must be solvable using n or fewer bits of memory. There are 2^n possible states the bits can be in, so after 2^n memory writes, either an answer must be produced or the bits are in a state they've already been in, which (since the machine is deterministic) means an infinite loop. Since n is a constant, 2^n is a constant, so we know everything runs in worst-case O(1) time. For a more practical simulation, take n to be the number of bits in addressable memory, including virtual memory.

Q: How can you check if an algorithm will halt or not in constant time?
A: Divide the available memory in two. Half will be used for running the algorithm, and half will be a counter. Increment the counter on each memory write in the algorithm's half. If the counter overflows before the algorithm halts, we know there is an infinite loop, because the algorithm's half must have repeated a state. (Note: this can only handle algorithms that need at most half of the available memory, which is a smaller class of programs.)
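
Here's a toy Python version of that trick. The step function stands in for one memory write of the algorithm, and returning None stands in for halting; both conventions are mine, just for illustration:

    def halts(step, state, memory_bits):
        # The algorithm's half of memory has at most 2**memory_bits states.
        # If it hasn't halted after that many writes, some state must have
        # repeated, so (being deterministic) it will loop forever.
        for _ in range(2 ** memory_bits):
            state = step(state)
            if state is None:         # the algorithm halted with an answer
                return True
        return False                  # the "counter overflowed": infinite loop

    # A 4-bit counter that stops after reaching 15: halts.
    print(halts(lambda s: None if s == 15 else s + 1, 0, 4))   # True
    # A 4-bit counter that wraps around forever: never halts.
    print(halts(lambda s: (s + 1) % 16, 0, 4))                 # False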

Remember the big thing that Turing proved: we can't have a program which tests whether other programs halt, because if we did have that program, we could run it on a program which (according to the first program's test) halts if it doesn't halt and doesn't halt if it halts, and then the halting test program couldn't answer without a contradiction. This implicitly rests on the idea that program memory is unbounded. Running this second program using the halting-test method described above would require an unbounded amount of memory, because it would result in an unbounded recursion. We can just say that the program crashes, since it runs out of memory. In this world, not everything that's computable can be computed, but everything halts (or repeats a previous memory state exactly, which can be caught, as explained above).

Q: What do you mean by unbounded? And halt?
A: By "unbounded" I mean countable (meaning isomorphic as a set to the naturals). Basically, there is no limit, but no element itself is infinite. By "halt" I mean that the algorithm will stop processing and come to an answer. In this theoretical world of algorithms, note that there's no addtional external input once the algorithm begins.

Q: But the system you describe isn't Turing-complete!
A: Yes. The finite universe isn't strictly Turing-complete, either, and I think this is interesting to note. For example, there's no way we could model another universe larger than our own, down to quantum states, from this universe.

Q: So my computer science professor was lying to me all along! Computational complexity is useless. Why didn't they tell me this?
A: Not quite. Since this constant is so large (even when we shrink it down to 2^(the size of the available memory)), we can actually get a more useful upper bound than O(1) by writing, say, O(m) for the same algorithm, even though that looks asymptotically worse. This is why you sometimes have to be careful what your constants are in analysis; sometimes something which looks asymptotically faster is actually slower not just on small inputs but on every input you'll ever see. Anyway, your professor probably didn't tell you this because it's both obvious and vacuous.

Q: So, if you were just tricking me, does this idea mean anything?
A: Actually, it comes up fairly often. Say you're doing arithmetic on integers. Each operation takes constant time, right? Well, it does if you're using machine integers. That's because machine integers have a fixed maximum size. The reason we can say the operations take constant time is the same reason the universe can calculate everything in constant time! In fact, even if we use bignums, if we say "this only happens on numbers that can be represented in 256 bits," it still makes some sense to say that things go on in constant time. It's only when things are unbounded that it makes total sense. Sometimes you have to look at very large inputs to see that, say, n^(log_3 4) grows faster than n log_2 n. If the size of the input is bounded at 10 billion, there's no clear winner in practice.
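
As a throwaway illustration of how late one growth rate can overtake another (the two functions here are only examples, not tied to any particular algorithm), here's a quick Python comparison:

    import math

    f = lambda n: n ** math.log(4, 3)   # n^(log_3 4), roughly n^1.26
    g = lambda n: n * math.log2(n)      # n log_2 n

    for n in (10, 1000, 10**5, 10**9):
        print(n, "f wins" if f(n) > g(n) else "g wins")
    # g wins until somewhere past n = 1000, and only then does f pull ahead.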

Think about sorting algorithms. If we have k possible items in an array of length n, we can sort the array in O(n + k) time and O(k) space using counting sort. Say we're sorting an array of machine integers, and we have no idea what the distribution is. Great! Just make an array of counters, one for each possible machine integer. Now iterate through the list and increment the counter at the integer's index each time you encounter that integer. What's the problem? Oh yeah, we just used up more than all of the addressable memory. So just because we can, in theory, construct something that will solve our problem in linear (or, as in the original example, constant) time doesn't mean it'll work.
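
A minimal Python counting sort, assuming the items are non-negative integers below some known bound k (the function name and interface are just for illustration):

    def counting_sort(items, k):
        # One counter per possible value: O(k) extra space, O(n + k) time.
        counts = [0] * k
        for x in items:
            counts[x] += 1
        result = []
        for value, count in enumerate(counts):
            result.extend([value] * count)
        return result

    print(counting_sort([3, 1, 4, 1, 5, 9, 2, 6], 10))
    # [1, 1, 2, 3, 4, 5, 6, 9]
    # Doing this with k = 2**64 (one counter per 64-bit machine integer) is
    # exactly the memory blow-up described above.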

Update: Added another question. Note to readers: The headline and first paragraph aren't to be taken seriously or directly! This is a joke to demonstrate a point, not a claim of mathematical discovery, and the original solution to the halting problem is an eternal mathematical truth.

Tuesday, January 8, 2008

Useless Factor is one year old!

Actually, it's a little more than a year old; I started this blog on December 19, 2006 with a still-relevant post on Factor's forms of if. I meant to put up a "happy birthday me!" post on the right date, but I forgot, so now: Happy Birthday Me!

Many of you brave, intrepid blog readers have endured more than a year of dry, technical posts approaching 2000 words a pop just to learn what I have in my head, giving me a refreshing boost of self-importance. Be proud of yourselves!

(As a little side-comment about Steve Yegge's recent blog post about long blog posts, I write for a long time for a somewhat different reason. I don't try to make readers endure one huge sitting, but rather try to explain everything as fully as is needed to give a good understanding of things. Because of the topics I usually write about, this sometimes takes a long time. I do this completely without noticing; I only think about it when someone points it out to me. I will never limit myself to 1000-word blog posts, because there's just too much important stuff to say.)

Friday, January 4, 2008

Core Factor developers and their projects

Here's a quick listing of the main current Factor developers and their projects. Hopefully, it'll help beginners navigate the community better.

I wanted to keep this list short, but I hope I'm not offending anyone by leaving them off, so please tell me if there's some glaring omission here. There's definitely no official "core" group of Factor developers; these are just the people who have contributed a lot in the past. I put the names in order of seniority (in terms of time using Factor).

Slava Pestov
IRC nick: slava

Slava is Factor's creator. He wrote almost all of the core, the compiler and the Factor virtual machine, the frontend of the Factor UI and the backend on X11 and Mac OS X, a detailed math library and the Factor HTTP server. Slava's definitely the leader of the Factor project, but he doesn't tend to operate in a dictatorial way, instead taking input from many people on language design decisions.

Chris Double
IRC nick: doublec

Chris was Factor's first adopter. He wrote a bunch of interesting libraries, including one for lazy lists, another for parsing expression grammars (PEGs), a compiler from a subset of Factor to Javascript, and an 8086 emulator suitable for playing old console games. He also wrote a bunch of useful bindings to various multimedia libraries and a bunch of other things too numerous to list here. Unfortunately for us, Chris has a full-time job, limiting his time to hack on Factor.

Eduardo Cavazos
IRC nick: dharmatech

Ed is a non-conformist and proud of it. He devised Factor's current module system, where vocabs in a USE: clause that haven't already been loaded are loaded automatically. He also wrote a window manager in Factor, a really cool artificial life demo called "boids", a chat server/client, an art programming language implementation and an alternative object system for Factor.

Daniel Ehrenberg (me)
IRC nick: littledan

I'm working on Factor's XML library, Unicode support, and a concatenative pattern matching library. I also have this blog, where I try to write useful (or useless) things about Factor.

Doug Coleman
IRC nick: erg

Doug made a bunch of important libraries including the calendar library, SQL bindings (he's currently working on an abstraction layer), a Mersenne Twister random number generator, some other math libraries, a cryptography library and integration for several text editors. Doug took over maintaining Windows I/O after another contributor, Mackenzie Straight, stopped maintaining it. Doug's really friendly to the beginners in Factor, even the ones who ask stupid questions.

Mackenzie Straight, Elie Chaftari and Alex Chapman, among others, have also contributed a lot. You can see what everyone's done by looking in the vocabulary browser included in Factor.

At the risk of repeating myself: when you have questions about anything related to Factor, including any of the libraries discussed here, you should pose them in the general Factor forums: the IRC channel #concatenative on Freenode and the mailing list. We'd be happy to answer them.

Update: Fixed attributions for Doug and Elie, and changed last paragraph.

Sunday, December 23, 2007

Books that should exist

As you can probably guess from the existence of this blog, I love to write. Part of what I love about it is the actual act of writing, but it's really more that I want things to be written that haven't been written before. I want to write stuff that I wish I could have read. Right now, there are three books and one journal article that I think should exist. Hopefully, I'll be able to write some of these some time in my life.

Implementing Unicode: A Practical Guide

One thing that struck me when beginning to write the Unicode library is that there aren't many books about Unicode. The two I found in my local Barnes and Noble were the Unicode 5.0 Standard and Unicode Explained. Looking on Amazon.com, I can't find any other books that address Unicode 3.1 (the version that moved Unicode from 1 to 17 planes) or newer in detail, ignoring more specialized books.

Both of these are great books, but they aren't optimal for figuring out how to implement a Unicode library. Unicode Explained focuses on understanding Unicode for software development, but shies away from details of implementation. The Unicode Standard explains everything, but it often gets overly technical and can be difficult to read for people not already familiar with the concepts described. A Unicode library implementor needs something in the middle. Unicode Demystified might be the right one, but it describes Unicode 3.0, so it is in many ways outdated.

I wish a book existed which explained Unicode in suitable detail for most library implementors. If this book continues to not exist for many years, I might just have to write it myself. This, however, would be more difficult and less fun than my other book ideas.

[Update: After getting my hands on Unicode Demystified, I realize that I should not have thrown it aside so quickly. It's a great book, and nearly all of it is still relevant and current. It looks like the ebook version I have is updated for Unicode 3.1.]

Programming with Factor

Factor's included documentation is great, but it's not enough, by itself, to learn Factor. I, and probably most people who know Factor, learned through a combination of experimenting, reading blog posts and the mailing list, and talking on #concatenative. Many people will continue to learn Factor this way, but it still seems somehow insufficient. It should be possible to learn a programming language without participating in its community.

Of course, we can't write a book on Factor until we get to see what Factor will look like in version 1.0. But I'm confident that this book will be written, even if it goes unpublished in print, and I'm fairly sure that I'll have at least some part in it. Maybe it'll be a big group effort by the whole Factor community.

What I'm thinking is that we could have a book which teaches programming through Factor, to people who aren't yet programmers. I've talked a lot about this with Doug Coleman, and we agree that most programming books are bad; we should make a new book that does things very differently. But we can't agree or imagine just how...

The Story of Programming

I, like many of you reading this blog, have an unhealthy interest in programming languages. Mine may be a little more unhealthy than yours. Whenever I hear the name of a programming language that I don't know of, I immediately need to read about it, to get some basic knowledge of its history, syntax, semantics and innovations.

Most study of programming languages works by examining the properties of the languages themselves: how functional programming languages are different from imperative object-oriented languages and logic languages and the like. But what about a historical perspective? The history of programming languages is useful for the same reason other kinds of historical inquiry are useful. When we know about the past, we know more about the way things are in the present, and we can better tell what will happen in the future. The history of programming languages could tell us what makes things popular and what makes things ultimately useful.

Unfortunately, not much has been done on this. Knowledge of programming language history is passed on, unsourced, with as much verifiability as folk legend. The ACM has held three conferences called HOPL on this subject over the past 30 years, so all the source material is there. But apart from a book published in 1969, this is all I can find as far as a survey history of programming languages goes.

There is a limit to how much academic research papers can provide. The proceedings of the HOPL conferences aren't bedtime reading, and they don't provide much by way of a strong narrative. A new book could present the whole history of programming from the first writings about algorithms to modern scripting languages and functional programming languages so it's both accessible to non-programmers and interesting to programmers. As far as I know, no one's really tried. But it would be really fun to try.

Inverses in Concatenative Languages

Most of my ideas are either bad or unoriginal. But there's one idea that I came up with that seems to be both original and not horrible, and that's the idea of a particular kind of concatenative pattern matching (which I blogged about, and Slava also wrote about in relation to units).

The basic idea is that, in a concatenative language, the inverse of foo bar is the inverse of bar followed by the inverse of foo. Since there are some stacks that we know foo and bar will never return (imagine bar is 2array and the top of the stack is 1), this can fail. From this, we get a sort of pattern matching. Put more mathematically, if f and g are functions, then (f o g)^-1 = g^-1 o f^-1.

In my implementation of this, I made it so you can invert and call a quotation with the word undo. We don't actually need a full inverse; we only need a right inverse. That is, it's necessary that [ foo ] undo foo be a no-op, but maybe foo [ foo ] undo returns something different. Also, we're taking all functions to be partial.
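
To show the shape of the idea outside of Factor, here's a toy Python sketch (compose, undo, and the word representation here are all my own invention for this post, not Factor's inverse vocabulary):

    def fail(msg):
        raise ValueError(msg)

    def compose(words):
        # Run each word's forward function left to right, like a
        # concatenative program.
        def run(x):
            for forward, _inverse in words:
                x = forward(x)
            return x
        return run

    def undo(words):
        # Invert the pipeline by running each word's inverse in reverse
        # order: (f o g)^-1 = g^-1 o f^-1. An inverse is allowed to fail,
        # which is what gives you pattern matching.
        def run(x):
            for _forward, inverse in reversed(words):
                x = inverse(x)
            return x
        return run

    # Each "word" is a (forward, inverse) pair of partial functions.
    double = (lambda x: x * 2,
              lambda x: x // 2 if x % 2 == 0 else fail("not a doubled value"))
    add3 = (lambda x: x + 3, lambda x: x - 3)

    program = [double, add3]
    print(compose(program)(5))    # 13
    print(undo(program)(13))      # 5; undoing and then redoing is a no-op
    # undo(program)(14) raises ValueError: nothing maps forward to 14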

Anyway, I think this is a cool idea, but I doubt it could fill a book. I want to write an article about it for an academic journal so that I can explain it to more people and expand the idea. It could be made more rigorous, and it could use a lot more thought. I hope this works.

When will I write these?

So, I hope I'll be able to write these things. Unfortunately, I'm not sure when I could. I need to finish my next 10 years of education, which won't leave tons of free time unless I give up blogging and programming and my job. Also, I'm not sure if I'm capable of writing a book or an article for an academic journal, though maybe I will be in 10 years when I'm done with my education. It wouldn't be so bad if someone stole my thunder and wrote one of these things because what I really want to do is read these books.

Update: Here are a couple more books (or maybe just long surveys) that should exist but don't: something about operational transformation, and something about edit distance calculations and merge operations for things more complicated than strings. Right now, you can only learn about these things from scattered research papers. (I never thought I'd find the references at the end so useful!)

Wednesday, October 31, 2007

Post removed

The post that used to be here was wrong and pointless. I've removed it, but if you're really curious, it's just commented-out as HTML, so you could still read it.

Friday, September 28, 2007

Metablog entry

Here's how to make a good blog: Just write something. There's no real secret to blog writing; you just have to write something that no one has written before. (And make it sort of interesting and intelligible, but those are more optional.)

I bet all of you reading this entry have a new idea in your head. Why not write about it?

Update: Removed irrelevant rambling

Tuesday, September 25, 2007

My little blog is finally growing up

I'm so happy! I had my first angry anonymous blog comment! Here it is; it was to the article Factor with a dash of curry.

WTF are you on? "More fashionable"? How about "correct"?

I had heard people talking about the shit people put on blog comments, but finally I get part of the action too! Thank you, Anonymous! This implies that I must have some kind of readership of some sort... strange.

If you don't understand that comment at first, I didn't either. I searched my article for 'more fashionable' and found that I wrote "Factor's curry is, by the more fashionable nomenclature of those at Lambda the Ultimate, partial application..." This is an example of what I would call interesting diction (or at least an attempt at that); I wasn't trying to say anything bad about LtU or misrepresent what currying 'technically' is.

Anyway, Haskell Curry was just a guy. His last name doesn't have any inherent meaning. In fact, no word has any inherent meaning, but rather only the meaning given to it by the people who speak a language. Pick up some Wittgenstein, Anonymous!

Thursday, July 26, 2007

Who are you people and what are you doing reading my blog?

Ever since I got an invisible hit counter for this blog in mid-March, I've been surprised to see that people actually read this thing. Of course, I'm talking about you. At my last count, there were 10-15 active Factor programmers in the universe, not counting the ones who do it in secret. Yet, for some reason, my blog gets around 25 unique hits a day, with a standard deviation of around 12 and a distribution skewed strongly to the right. (That's ignoring the outliers of the times I get Redditted, getting 400-500 hits in a day; if I include that, the standard deviation is higher than the mean!)

So, I'm just wondering: among those of you who aren't Factor programmers who I sorta know, what motivates you to read this? Where/how did you find it? I know you're out there, so I'd really like it if you commented. (In fact, I always like it when people comment, even stupid comments.) This isn't meant as interrogation. I'm just really curious and confused.

Update: Thanks for all of your comments! But it's never too late to write another.