Wednesday, October 17, 2007

Factor FAQ

Note: The Factor FAQ has moved to factorcode.org. I'll leave this version up, but will no longer maintain it. Go to the new FAQ if you want to read something accurate and better-formatted.


  1. What is Factor?
    Factor is a functional, dynamically-typed, object-oriented, concatenative programming language designed by Slava Pestov. It's sort of like a combination of Forth and Lisp.

  2. What does concatenative mean?
    There are many ways to categorize programming languages, and one way is contrasting concatenative and applicative. In an applicative language, things are evaluated by applying functions to arguments. This includes almost all programming languages in wide use, such as C, Python, ML, Haskell, and Java. In a concatenative programming language, things are evaluated by composing several functions which all operate on a single piece of data, passed from function to function. This piece of data is usually in the form of a stack. Examples of concatenative languages include Forth, Joy, FP, Cat, and Factor.
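    Here is a toy model in Python rather than Factor that makes "composing functions over a stack" concrete. The helper names (push, dup, mul, add, run) are illustrative assumptions that only mimic Factor words. Each word maps a stack to a new stack, so running a program means folding its words over the stack, and concatenating two programs composes them:

```python
# Toy model (Python, not Factor) of concatenative evaluation.
# Each "word" is a function from a stack (a list) to a new stack.
from functools import reduce

def push(value):
    """A literal: pushes its value onto the stack."""
    return lambda stack: stack + [value]

def dup(stack):
    return stack + [stack[-1]]

def mul(stack):
    *rest, a, b = stack
    return rest + [a * b]

def add(stack):
    *rest, a, b = stack
    return rest + [a + b]

def run(program, stack=None):
    """Evaluate a program by applying its words left to right."""
    return reduce(lambda s, word: word(s), program, list(stack or []))

square = [dup, mul]                            # roughly Factor's  dup *
program = [push(3)] + square + [push(1), add]  # roughly  3 dup * 1 +
print(run(program))                            # [10]

# Running two programs one after the other equals running their concatenation.
assert run(square, run([push(4)])) == run([push(4)] + square)
```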

  3. If Factor programs are just compositions of existing words, how is Factor as powerful as other programming languages?
    Factor is a Turing-complete programming language whose programs are capable of doing whatever any other programming language can do. In fact, it isn't all that complicated to translate between a concatenative language and an applicative one, as long as it's known how many arguments the words in the concatenative language take.
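    The translation mentioned above can be sketched in a few lines of Python. This sketch assumes a hypothetical table of word arities and that every word leaves exactly one result. It walks the postfix program with a stack of expression strings and rebuilds the nested applicative form:

```python
# Sketch: translate a postfix (concatenative) program into applicative
# (prefix) notation, assuming each word's input count is known and each
# word leaves exactly one result.
ARITY = {"+": 2, "*": 2, "neg": 1}   # hypothetical word table

def to_applicative(program):
    stack = []
    for token in program.split():
        if token in ARITY:
            n = ARITY[token]
            if len(stack) < n:
                raise ValueError(f"stack underflow at {token!r}")
            args = stack[len(stack) - n:]
            del stack[len(stack) - n:]
            stack.append(f"{token}({', '.join(args)})")
        else:
            stack.append(token)      # anything else is treated as a literal
    return stack

print(to_applicative("2 3 + 4 *"))   # ['*(+(2, 3), 4)']
```

    Without the arity table, the translator could not tell where one call's arguments end. That information is exactly the piece the answer says is needed.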

  4. What is Factor's specialty? Where is it best used?
    Factor can be used for anything. It is not a scripting language, but it is suitable for rapid development. Factor has been used for everything from web applications to game development to XML parser implementation. Factor isn't meant for extremely low-level things, like boot loaders or microcontroller programming, though.

  5. What is Factor's purpose?
    Factor is an experiment to build a modern, useful concatenative language with strong abstraction capabilities. Though there are some pitfalls, many problems can be more cleanly expressed in a concatenative language. The Factor website has more about specific goals.

  6. Is Factor suitable for implementing my next program?
    Probably. There are only a couple of places where it might not work. For example, you'll never be able to program your TI-83 calculator in Factor, because a Factor program's footprint is at least as big as its image, which means at least 500 KB for most programs, even with a minimal image. Also, Factor won't (yet) work for situations where things need to run extremely fast (though it is already faster than most scripting languages for most tasks, and it's possible to write some parts of a Factor program in C). It would be difficult to write, say, a bootloader with it. One disadvantage of Factor is that there is currently no way to use a C++ library. But for everything else, it should be usable.

  7. Is Factor cross-platform?
    Though Factor images can only run on one specific platform, the same Factor source code can easily be run on any platform that Factor is ported to. Portability issues only occur when interfacing with native libraries, or if there is a bug.

  8. How is Factor different from Forth?
    Forth is untyped, not garbage collected, and close to the machine. For flow control, Forth code tends to use immediate words. Variables and arrays tend to be global, and programs aren't usually written in a functional manner. Factor is very different from this. It is dynamically typed, offering a high degree of reflection. Unreferenced objects are garbage collected with a generational garbage collector. Factor code is a little bit more distant from the machine, though the C FFI allows using words like malloc and mmap. For flow control, Factor generally uses quotations, allowing flexible higher order functions; parsing words are used mostly for definitions and data literals. Variables are dynamically or statically scoped (see below), and arrays are just objects which don't need to be treated specially. There is no need to think about pointers, although in Forth, this can usually be factored out to just a few words. Factor is generally a functional programming language.

  9. Why do we need a new concatenative language?
    Because the other ones aren't suitable for high-level development. Forth is great for low-level things, but its lack of a type system and garbage collection makes it difficult to debug programs, and it doesn't mesh well with functional programming. Joy made a very important theoretical contribution, but it is difficult to compile efficiently, its syntax is inextensible and it has an insufficient module system. Additionally, it is almost purely functional, making many things difficult. Factor combines the best aspects of these two systems, together with many other borrowings from various places.

  10. What is a word?
    A word is our (from Forth) name for a named function.

  11. What is a quotation?
    A quotation is our (from Joy) name for an anonymous function. In syntax, it is a piece of code enclosed in square brackets. A quotation is an ordinary piece of data, and it can be treated in a similar manner to an array.

  12. What is a combinator?
    A combinator is our word for a higher-order function, or a word that takes a quotation as an argument. Examples of combinators include if and map.
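    The following Python toy model (not Factor) shows how quotations and combinators fit together. A quotation is just a list of words, so it can be pushed like any other data. A combinator pops quotations and runs them. The helpers if_, map_ and double are hypothetical stand-ins for Factor's if, map and a quotation like [ 2 * ]:

```python
# Toy model (Python): quotations are plain lists of words (ordinary data),
# and combinators pop quotations off the stack and run them.
from functools import reduce

def run(program, stack):
    """Apply each word (a stack -> stack function) in order."""
    return reduce(lambda s, word: word(s), program, stack)

def lit(value):
    """A literal word: pushes its value, which may itself be a quotation."""
    return lambda s: s + [value]

def double(s):
    *rest, x = s
    return rest + [x * 2]

def if_(s):
    # ( ? true-quot false-quot -- ... ), modeled on Factor's if
    *rest, cond, true_quot, false_quot = s
    return run(true_quot if cond else false_quot, rest)

def map_(s):
    # ( seq quot -- newseq ), modeled on Factor's map; assumes the
    # quotation takes one element and leaves exactly one result
    *rest, seq, quot = s
    return rest + [[run(quot, [x])[-1] for x in seq]]

# Roughly: { 1 2 3 } [ 2 * ] map
mapped = run([lit([1, 2, 3]), lit([double]), map_], [])
# Roughly: 5 t [ 2 * ] [ ] if
chosen = run([lit(5), lit(True), lit([double]), lit([]), if_], [])
print(mapped, chosen)   # [[2, 4, 6]] [10]
```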

  13. Why is there a distinction between words and quotations?
    Words and quotations are used in different places. Factor programs are built out of words, which are just compositions of other words. Words are invoked simply by naming them. Words contain metadata about their module, source location, and other things in addition to the code. If a word is on the stack, it is called with execute. Quotations, on the other hand, are the code-data used in most combinators and are not associated with the same metadata that words are. They are invoked with call.
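    The distinction can be modeled in Python as follows. This is an illustrative sketch, not how Factor stores words; the Word class and its fields are assumptions. A word bundles a name and metadata with its body, while a quotation is only the body. A word sitting on the stack is run with execute, and a quotation with call:

```python
# Toy model (Python): a word is a named definition carrying metadata;
# a quotation is bare code-as-data.
from dataclasses import dataclass, field
from functools import reduce

def run(program, stack):
    return reduce(lambda s, word: word(s), program, stack)

def lit(value):
    return lambda s: s + [value]

def dup(s):
    return s + [s[-1]]

def mul(s):
    *rest, a, b = s
    return rest + [a * b]

@dataclass
class Word:
    name: str
    definition: list                  # the body, itself a quotation
    vocabulary: str = "scratchpad"    # hypothetical metadata quotations lack
    props: dict = field(default_factory=dict)

    def __call__(self, stack):        # naming a word in a program runs it
        return run(self.definition, stack)

def execute(s):
    # ( word -- ... ): run a word that is sitting on the stack
    *rest, word = s
    return word(rest)

def call(s):
    # ( quot -- ... ): run a quotation that is sitting on the stack
    *rest, quot = s
    return run(quot, rest)

sq = Word("sq", [dup, mul])
by_name = run([lit(3), sq], [])
by_execute = run([lit(3), lit(sq), execute], [])
by_call = run([lit(3), lit([dup, mul]), call], [])
print(by_name, by_execute, by_call)   # [9] [9] [9]
```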

  14. What is a vocabulary?
    A vocabulary is our name for a module. At one time, there was a distinct concept of modules and vocabularies, but it is now merged.

  15. What is a parsing word?
    A 'parsing word' in Factor is akin to an 'immediate word' in Forth. With a comparison to Lisp, it is somewhat like a reader macro but often used like a regular macro. Defining a parsing word extends the parser, and it is used to introduce new definition syntax or datatype literals.

  16. What is a generic word?
    A generic word, taken from the Lisp terminology of a generic function, is a word that dispatches on the class of its arguments. This means that methods can be defined on it. In Factor, words, rather than objects, handle the dispatch.
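    Python's functools.singledispatch is a close analogy, though it is not Factor syntax. It builds a generic function that picks a method by the class of its argument. Methods are registered on the function, not placed inside the classes. The Circle and Rect classes below are hypothetical examples:

```python
# Python analogy: a generic function dispatching on the argument's class.
import math
from dataclasses import dataclass
from functools import singledispatch

@dataclass
class Circle:
    radius: float

@dataclass
class Rect:
    width: float
    height: float

@singledispatch
def area(shape):
    # fallback when no method matches the argument's class
    raise TypeError(f"no area method for {type(shape).__name__}")

@area.register
def _(shape: Circle):
    return math.pi * shape.radius ** 2

@area.register
def _(shape: Rect):
    return shape.width * shape.height

print(area(Rect(2, 3)))   # 6
```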

  17. What is a word property?
    In Factor, each word (but not quotation) is associated with a hashtable of word properties (abbreviated "word props"). These word props store metadata about the word, like where it was defined and its documentation, but not core properties like its definition or name. The entire hashtable of word props is accessible with the word word-props, and a single word property is accessed with word-prop. Word properties must be used carefully, as they are more 'global' than variables.

  18. Does Factor have variables?
    Yes. Most commonly, Factor uses dynamically scoped variables, but it's possible to use statically scoped ones using the locals vocabulary.

  19. But aren't dynamically scoped variables bad?
    They are, if they're used for everything by default. But usually, data is passed around using the stack. Dynamically scoped variables are used for a number of wider things where it would be awkward to pass data through the stack, such as prettyprinter settings, the state of the XML parser, and a partially assembled array.

  20. Why isn't my code compiling?
    For Factor code to compile, it has to have a consistent stack effect that the compiler can discern, meaning it always takes the same number of things off the top of the stack and puts the same number back on. The most likely reason your code isn't compiling is that it lacks such an effect. Another possibility is that you left out an inline declaration: inline is needed after every combinator definition, as a hint to the compiler. A third possibility is that there is a bug; in that case, please tell us about it!
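    The following Python sketch shows the bookkeeping behind a "consistent stack effect". It is not Factor's actual inference algorithm. The effect table is hypothetical, and the if rule is a simplification: it accepts branches only when they change the stack height by the same amount. Each word has a fixed (inputs, outputs) effect, and composing two effects yields another effect:

```python
# Sketch (Python) of stack-effect bookkeeping with (inputs, outputs) pairs.
EFFECTS = {  # hypothetical table of known word effects
    "dup": (1, 2), "drop": (1, 0), "swap": (2, 2), "*": (2, 1), "+": (2, 1),
}

def compose(e1, e2):
    (i1, o1), (i2, o2) = e1, e2
    extra_in = max(0, i2 - o1)   # inputs e2 needs beyond what e1 left
    leftover = max(0, o1 - i2)   # e1 outputs that e2 doesn't touch
    return (i1 + extra_in, o2 + leftover)

def infer(words):
    effect = (0, 0)
    for w in words:
        effect = compose(effect, EFFECTS[w])
    return effect

def infer_if(true_branch, false_branch):
    # Simplification: both branches must change the stack height by the
    # same amount; the boolean condition adds one input.
    t, f = infer(true_branch), infer(false_branch)
    if t[1] - t[0] != f[1] - f[0]:
        raise ValueError(f"unbalanced branches: {t} vs {f}")
    inputs = max(t[0], f[0])
    return (inputs + 1, inputs + t[1] - t[0])

print(infer(["dup", "*"]))          # (1, 1)
print(infer_if(["dup", "*"], []))   # (2, 1)
```

    By contrast, infer_if(["dup"], []) raises. One branch grows the stack and the other does not, which is the kind of inconsistent effect that stops code from compiling.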

  21. Is Factor purely functional, like Haskell?
    No, it allows arbitrary side effects and the standard library includes several mutable data structures and imperative I/O.

  22. Why not?
    It makes things much simpler. Effects don't need to be sequenced explicitly, as in Haskell. (No one has yet implemented anything like monads or uniqueness types in Factor, but we would welcome that development.) A broader range of algorithms can be used when mutability is included. It is certainly possible to write Factor code in a purely functional way, and there are a lot of interesting possibilities to explore here. It may be possible, but no one has yet designed a metaprogramming system in a purely functional language with the degree of flexibility that Factor allows.

  23. Why is Factor dynamically, rather than statically, typed?
    For basically the same reason: it makes things simpler and more flexible, allowing a high degree of metaprogramming. Additionally, a flexible enough type system for concatenative languages has not yet been designed. However, Factor 2.0 may include optional static typing, if a suitable type system can be found.

  24. What's an image? Why does Factor use one?
    The image is the file that Factor uses to store all code and data when Factor isn't running. The Factor executable and dynamically linked library only have a small amount of knowledge about Factor--just the virtual machine, the primitives and the structure of the image. The entire library is contained in the image, and was loaded there during the bootstrap process. The image is a map of the memory after the code was loaded. Unlike in Smalltalk, Factor code is almost always distributed in files rather than in the image.

  25. What is bootstrapping, and why do I need a boot image for it?
    In general, bootstrapping is the process of compiling a self-hosted compiler, that is, a compiler written in the programming language it compiles. Though Factor isn't entirely self-hosted, we use a bootstrapping process as many important pieces, like the compiler and parser (but not the virtual machine or primitives) are written in Factor. Years ago, there was a Factor interpreter and compiler written in Java, and that was initially used to run the Factor code we use now, creating an image. Now, we use a boot image--a kind of mini-image which has just enough knowledge to start the process to create a full image.

  26. How can I make a boot image?
    Use the word make-image, as in "x86.32" make-image. This creates a file boot.x86.32.image in the current directory which is a full boot image. For a listing of the strings needed to specify architecture, see the help file by running \ make-image help at the listener. You can also get boot images from the Factor website, if you can't make one yourself.

  27. Once I have a boot image, how do I compile and bootstrap Factor?
    Here is the series of commands used to compile Factor on a 32-bit x86 Linux computer:

    make linux-x86-32 #Possible platform strings are given by executing 'make'
    ./factor -i=boot.x86.32.image #replace 'x86.32' with the appropriate architecture's string

    Minor adjustments make this work on other platforms. On Mac OS X, to get the GUI working properly, the second line should be replaced with:

    ./Factor.app/Contents/MacOS/factor -i=boot.x86.32.image

  28. Why isn't Factor fully self-hosted?
    Making Factor self-hosted would essentially mean rewriting the virtual machine in a new, low-level DSL within Factor, avoiding all high-level features. This would not offer any real advantages over using C, as it would not be interactively debuggable or replaceable. C is a suitable language for implementing low-level components.

  29. Why aren't the C components of Factor implemented in my favorite other programming language?
    C has many advantages over other programming languages. For one, GCC is heavily optimizing and readily available on almost all platforms. C is also suitable because it is very low-level and close to the machine. We need a programming language which gives us full control over the heap, since the image is saved by copying the heap directly.

  30. How can I keep track of the stack in my head?
    At first, it may be useful to make diagrams on paper. But eventually stack shufflers should fade away in your mind and become part of the data flow. If your stack is hard to trace, it is likely that you are thinking about too many things on the stack at once. It is highly unusual for a Factor word to accept or return more than three arguments on the stack. If you ever need to keep track of the location of more than three or four items, you should probably reorganize the function by factoring it into smaller pieces.

  31. Why are the stack shufflers given names like dup, swap and drop? Why not just x-xx, xy-yx and x- or something like that?
    It is actually possible to use shufflers of this form using a vocabulary called shufflers. However, it is very rarely used. Mnemonics are clearer in almost all cases, because they represent the abstract data flow going on. For extreme cases of complicated stack shuffling, statically scoped variables in the locals vocabulary are available, but to those familiar with the language, stack shuffling mnemonics like dup, drop and swap are far cleaner in most cases. For complicated shufflings similar to existing mnemonics, the shuffle vocabulary is also occasionally useful, mostly in dealing with foreign functions with many arguments.
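    Here is a Python sketch of how a pattern such as x-xx or xy-yx can fully describe a shuffle. It illustrates the naming idea only and is not the shufflers vocabulary's implementation. The letters before the dash name the top stack items, with the rightmost letter on top. The letters after the dash say what to leave in their place:

```python
# Toy model (Python): build a shuffle word from a pattern like "xy-yx".
def shuffler(pattern):
    before, after = pattern.split("-")
    def shuffle(stack):
        n = len(before)
        if len(stack) < n:
            raise IndexError("stack underflow")
        rest, top = stack[:len(stack) - n], stack[len(stack) - n:]
        env = dict(zip(before, top))          # name each consumed item
        return rest + [env[c] for c in after]
    return shuffle

dup, swap, drop = shuffler("x-xx"), shuffler("xy-yx"), shuffler("x-")
print(swap([1, 2, 3]))   # [1, 3, 2]
```

    The pattern spells out the data flow mechanically. A mnemonic like swap names the intent, which is why the answer above favors mnemonics in most code.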

  32. Should I use the last stable version of Factor, or track the current development with git?
    Most users should be fine with the last stable distribution. The current development branch often contains a few new features not included in the last release, but it is unstable and occasionally broken. The git development branch is more useful for those developing Factor itself, or updating the packaged libraries so that they work with the development version. When reporting a bug, however, it is important to make sure that the bug is still present in the current development version, as it may have been fixed since the last release.

  33. How can I track development with git?
    First, download git. Then, move to the folder where you want Factor to be downloaded and enter the command git clone git://factorcode.org/git/factor.git. This will create a new folder called factor with the current development version of Factor in it. From the Factor website download a current boot image, and go through the bootstrap process described above.

  34. How can I join Factor development?
    The best way is to make a git repository of your own. Chris Double described how to do this in a blog post. Once you have a git repository, make whatever changes you feel like to the code base, and tell someone involved in development about it. If they like your changes, they can be pulled into the main Factor repository.


  35. Does Factor have a concurrency model?
    Yes, Factor supports explicit, cooperative coroutines. A new thread can be spawned with the word in-thread, and control is passed between threads with the yield word. The core thread vocabulary contains the most basic thread operations, and derived coroutine operations are in the coroutines vocabulary.
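    Cooperative threading can be modeled in Python with generators. This is only an analogy for in-thread and yield, and the scheduler and worker names are hypothetical. Each thread runs until it explicitly yields, and a round-robin scheduler then resumes the next one. No switch ever happens anywhere else:

```python
# Toy model (Python) of cooperative threads: each thread is a generator,
# and `yield` hands control back to a round-robin scheduler.
from collections import deque

def run_threads(*threads):
    ready = deque(threads)
    while ready:
        thread = ready.popleft()
        try:
            next(thread)          # run until the thread yields
        except StopIteration:
            continue              # finished; drop it
        ready.append(thread)      # otherwise, back of the queue

log = []

def worker(name, steps):
    for i in range(steps):
        log.append(f"{name}{i}")
        yield                     # cooperative: switch only here

run_threads(worker("a", 3), worker("b", 2))
print(log)   # ['a0', 'b0', 'a1', 'b1', 'a2']
```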

  36. Does Factor support multiple OS threads?
    Currently, no. Factor's threading model works somewhat similarly to Erlang's, with the important difference that there is only one heap, and the runtime (virtual machine) always runs in a single OS thread. The VM isn't currently thread-safe, though it will be made so in the future. Certain language features, such as word properties, currently pose challenges for making Factor thread-safe. Because everything is run in a single OS thread and there is no direct efficiency gain, Factor threads are most useful for things like executing parallel I/O operations that involve waiting.

  37. What's some cool feature of Factor that other languages don't support?
    One small feature that comes in handy is the make word, which assists in building sequences. Another cool feature is Factor's unique object system, which deserves a separate blog post to explain. A third feature is the sequence and assoc protocols, allowing numbered sequences and associative mappings to be treated generically. This isn't something that's uniquely possible in Factor, but Factor's library just happens to be very well-designed here. A very interesting library is the units library, which, due to postfix notation, looks very natural. (The calendar library also works well with postfix.) It works very well in conjunction with a library called inverse, which takes advantage of the properties of concatenativity to invert some computations. Slava described some of these cool properties in a reddit comment.

  38. What kind of foreign function interface does Factor have?
    Factor's FFI library is called alien. It works by linking to a dynamically linked library (.dll, .so or .dylib) at runtime, freeing the user from writing, generating or otherwise messing with C code. Currently, alien only supports interfacing with C. Elie Chaftari wrote a good introduction to Factor's FFI.

  39. Why isn't my code using alien working?
    First, you have to make sure the appropriate dynamically linked library is being loaded using the word add-library. Once that is loaded, run the word recompile-all to compile all words that haven't been compiled. This will link words using the FFI up with the DLL.

  40. What kinds of GUI libraries does Factor support?
    Currently, Factor uses a cross-platform UI library written in Factor itself, using OpenGL and a small amount of native code on each platform. The listener uses this library. There is a Cocoa binding, which is used for the window frame and menu for Mac applications, though it could be used for other things. Similarly, for Unix, there is an X binding, and it has been used in a Factor window manager, Factory. On Windows, there is a binding to some parts of the Windows API through C, but not parts that create widgets. There aren't any bindings to wxWidgets or Gtk yet. Gtk bindings would be doable but somewhat challenging due to their heavy use of macros and complicated structs, and a SWIG binding could be helpful in implementing them. wxWidgets bindings would be impossible right now, as alien does not support C++'s name mangling.

  41. Why isn't Factor in the Computer Language Shootout?
    We want to make Factor faster before compiling a submission. Most things are already far faster than scripting languages, but certain things, such as I/O, still need some work. The shootout benchmarks are heavy in I/O. But don't let any of this hold you back from making your own Factor submission!

  42. How do you put a Factor program into a package so it can be run easily?
    Currently, there's no good way to do this on all platforms. On Mac OS X, there's a tool to make a .app, but there are no similar tools for other Unix systems or Windows. To package up a Factor program, you have to make an image which, when booted, will start your application. To do this, you have to set the boot quotation to the entry point for your application. In the future, before Factor 1.0 comes out, there will be a simple tool to produce an application package.

  43. How can I start learning Factor?
    The best way to go about it is to figure out something you want to program and start trying to do it. Once you have a goal in mind, you can look at Factor's included documentation, and ask questions on the mailing list or #concatenative at irc.freenode.net.

  44. Are there any good books I can read about Factor?
    Factor is a very young language, and so far, there are no books which use it yet. A good introduction to Forth, much of which applies in Factor, is Thinking Forth (PDF) by Leo Brodie. The best place to start to learn about the principles of modern concatenative languages is the Joy papers, by Manfred von Thun. Another good internet resource is Planet Factor, a blog aggregator for all things Factor-related. There won't be a Factor book written until after Factor 1.0 is released.

  45. I bootstrapped successfully on Windows, but when I run Factor it errors immediately with:


    Words calling ``alien-invoke'' must be compiled with the optimizing
    compiler.
    alien-invoke-error-library "freetype"
    alien-invoke-error-symbol "FT_Init_FreeType"
    Words calling ``alien-invoke'' must be compiled with the optimizing
    compiler.
    alien-invoke-error-library "freetype"
    alien-invoke-error-symbol "FT_Init_FreeType"

    The security settings on the required dlls, freetype6.dll and
    zlib1.dll, are wrong. You probably used wget under Cygwin, and this
    gives the wrong Windows security permissions for some reason. The
    easiest solution is to download the dlls manually with a web browser
    instead of using wget.

  46. On bootstrap, I get something like:

    Loading P" resource:core/none/none.factor"
    Vocabulary does not exist
    no-vocab-name "bootstrap.math"

    You have an empty directory shadowing the real bootstrap/math
    directory. In this case, core/bootstrap/math is shadowing
    extra/bootstrap/math. This bug has been reported on Windows and Mac
    and may be caused by git.

  47. When I try to bootstrap I get the following output:

    Loading P" factor.image"
    *** Data heap resized to 196104192 bytes
    *** Data GC (2 minor, 10 cards)
    *** Data heap resized to 630124544 bytes
    *** Data GC (0 minor, 0 cards)
    P" factor.image":1

    ^
    Word not found in current vocabulary search path
    no-word-name "\u000c"

    You are passing the boot image name to the Factor executable
    incorrectly. The correct syntax is to pass the image name as an -i=
    parameter, e.g. ./factor -i=boot.x86.32.image.

  48. How can I improve my Factor coding style?


    • Most word definitions should fit in three or fewer 64-column lines.
    • Any copy/pasted code should be factored out into new words.
    • Use combinators to abstract control flow patterns.
    • Use library words where possible.
    • More general words should go at the top of a file; more specific
      at the bottom.
    • Try to use collections instead of working with individual objects on the stack.
    • Don't use the datastack as a data structure.
    • Use meaningful word names. Avoid too many words named (foo) or foo*.
    • A word named (foo) should only exist to help implement the word foo.
    • Come to the irc channel and we'll review your code. It's fun!

  49. Which libraries do I need to get the UI working with X11 on Linux?
    You need to install recent development libraries for libc, Freetype, X11, OpenGL and GLUT. On a Debian-derived Linux distribution (like Ubuntu), you can use the line
    sudo apt-get install libc6-dev libfreetype6-dev libx11-dev glutg3-dev

  50. When using Factor in the terminal, ./factor -run=listener, is there a way to get a command history?
    rlwrap is a readline wrapper that adds readline support to terminal applications. On a Debian-derived distro, you can install it with
    sudo apt-get install rlwrap

    Otherwise, you can download the sources from its website.

  51. When trying to push to my repository using Cygwin, why do I see the following?

    fatal: exec failed
    fatal: The remote end hung up unexpectedly
    error: failed to push to 'foo@bar.com:factor.git'

    Install OpenSSH with the Cygwin installer.


If you have any other questions about anything Factor-related, just comment and I'll put the answer in this post. I hope this helps!

Update: Added some more questions and answers, suggested by several readers. Also, Doug Coleman contributed some answers. Thanks, everyone!

8 comments:

Sekenre said...

One question, Are there any books that are good for learning Factor or Concatenative style programming?

My answer would be Thinking Forth by Leo Brodie, but maybe there are more. Could you add a link to some recommended reading materials and how to apply them to Factor?

Slava Pestov said...

Very nice work, Dan.

Sekenre: the Factor documentation is a good place to start. It is more of a reference than a tutorial, though.

Berlin Brown said...

Keep up the good work. Factor has made some serious progress. I might have to dust off the old Factor code and get cracking on some stuff I want to use it for.

I guess you guys need people to start writing killer apps in Factor. I guess that takes time.

wayo said...

Why are the stack shufflers given names like dup, swap and drop?

Why not just x-xx, xy-yx and x- or something like that?
It is actually possible to use shufflers of this form using a vocabulary called shufflers. However, it is strongly discouraged.


Discouraged by whom? And even if it is discouraged by someone in the "inner circle", who cares? Basically, I think "strongly discouraged" is out of place here. We should not "strongly discourage" folks. Let them learn on their own what works and what doesn't. If someone wants to make some crazy new way to mess with the stack, I want them to feel welcome to, not discouraged.

Besides this, nice job Dan.

Ed

Slava Pestov said...

Dan, how about a question about Factor's thread support?

Stefan Scholl said...

Re wayo:

Many languages discourage some things that are possible, but could lead to problems. New programmers don't always see anything wrong in using "goto" in C, or massive amounts of reader macros in Common Lisp just to be able to express something like hashes as in Perl.

It's not "wrong" to use "goto". It's not wrong to define your own shuffle words in Factor. And there can be very good reasons to do so.

But if a FAQ and other programmers tell you something is discouraged, then understand that there could be good reasons for this, which you don't know at this early stage. Nobody will prevent you from experimenting with "bad style". Maybe you can convince the maintainers of a language to change their mind. But there has to be some resistance. Hence the "discouraged".

I'm not speaking for Slava or the Factor community. I'm just a new user.

Daniel Ehrenberg said...

No, Eduardo is right. I changed 'strongly discouraged', a value judgment, to 'very rarely used', which is objectively true. There's nothing evil about using those stack shufflers.

Berlin Brown said...

since this is a faq; you might put an example program

"What does a simple factor program look like" and then show it actually running after it has been compiled/interpreted.