Monday, December 25, 2006

On second thought...

Actually, Google sucks. Yahoo is much better. It offers unlimited use of its REST API, which is so simple, I was able to whip up a search thing in 20 minutes:

USING: http.client xml xml.utilities kernel sequences
namespaces http math.parser help ;
IN: yahoo

TUPLE: result title url summary ;

C: <result> result

: parse-yahoo ( xml -- seq )
"Result" tags-named* [
{ "Title" "Url" "Summary" }
[ tag-named children>string ] curry* map
first3 <result>
] map ;

: yahoo-url ( -- str )
"" ;

: query ( search num -- url )
[
yahoo-url %
swap url-encode %
"&results=" % #
] "" make ;

: search-yahoo ( search num -- seq )
query http-get 2nip
[ "Search failed" throw ] unless*
string>xml parse-yahoo ;

I wonder how long this same code is in other languages.

Update: Code updated to work with Factor .91

Thursday, December 21, 2006

Scraping Google

So Google scrapped their SOAP API. That's terrible! What can we do? Some people have said basically this is the end of Google. Others, over at EvilAPI, have cloned the old SOAP API by parsing search results. But we don't even need to do that; let's just parse the web pages. Google's mobile device interface uses well-formed XHTML, so it's relatively easy to parse. Who needs SOAP? It took me a while to get this right, but in the end, it was pretty simple to get a basic system up and working:

TUPLE: link title quote url ;

: parse-google ( xml -- seq-links )
"div" get-name-tags 2 12 rot <slice> [
[ "a" get-name-tag children>string ] keep ! get title
[ ! get quote
2 swap dup length 2 - swap subseq
] keep
"span" get-name-tag children>string ! get URL
1 swap [ length 2 - ] keep subseq
] map ;

This, however, is poorly factored and pretty brittle: any change in Google's format will break it. It should probably be changed. It's just a start.

This article isn't meant to be in praise of Factor for making this so amazing and easy; it is just a demonstration that complex frameworks like SOAP aren't needed to move data between computers around the web, and that just because Google removed their only official API to access their data on the server side doesn't mean we can't make our own.

Tuesday, December 19, 2006

Variants of if in Factor

Factor has many different combinators where other languages use only one. This can be confusing to beginners, who may spend a lot of time looking this sort of thing up using see. At first, all the variants may seem like unnecessary complication, but they actually allow code to be much more concise than it would be if the only combinator were if. The following article assumes basic knowledge of Factor syntax.

The first combinator for conditional execution, which I just mentioned, is if. This is used in the following way:

! code producing a value on the top of the stack
[
! code executed if the value on the top of the stack does not equal f
! (any value that is not f is true as far as Factor is concerned)
! this is equivalent to a "then" block
] [
! code executed if the value on the top of the stack equals f
! this is equivalent to an "else" block
] if

Neither of the two blocks of code has access to the value on the stack that chose the branch. Below is an example of a word that uses if:

: operate ( patient -- )
dup asleep? [
drop ! already asleep, so nothing more to do in this toy example
] [
dup anaesthetize-more operate
] if ;

The simplest variation of if is when. when is used when the "else" block is empty. The following two lines are equivalent:

[ then code ] [ ] if
[ then code ] when

You'd use when in any sort of situation where action only needs to be taken when the predicate is true. Below is an example of usage:

: ?alarm ( patient -- )
dead? [ alarm set-off ] when ;

Another simple variation is unless, used when the "then" block is empty. The following two lines are equivalent:

[ ] [ else code ] if
[ else code ] unless

You'd use unless in any sort of situation where action only needs to be taken when the predicate is false.
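
As a minimal sketch of when and unless side by side (at-least-zero and check-positive are hypothetical words made up for illustration, and the vocabulary names in USING: assume a reasonably recent Factor):

```factor
USING: kernel math ;
IN: conditional-sketch

! when: replace a negative number with 0, leave anything else alone
: at-least-zero ( n -- n' )
    dup 0 < [ drop 0 ] when ;

! unless: throw if the number is not positive, otherwise leave it on the stack
: check-positive ( n -- n )
    dup 0 > [ "not positive" throw ] unless ;
```

Here -5 at-least-zero leaves 0 on the stack and 3 at-least-zero leaves 3. Likewise 3 check-positive leaves 3, while 0 check-positive throws.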

These conditionals all assume that you don't need the value of the predicate in the case where it is not f. When you do need the value, use if*, when* and unless*. Note that in the "else" branch, f is *not* put on the stack, since it conveys no information. The following two lines are equivalent:

dup [ then ] [ drop else ] if
[ then ] [ else ] if*

This may not seem useful at first, but it actually comes up a lot. unless* is particularly useful in cases where you want to replace the contents of the top of the stack in the case where it is f, but when it is not f, you want to leave it how it is. Below is an example of unless*:

: knife ( -- knife )
scalpel [ hacksaw ] unless* ; ! scalpel and hacksaw are both ( -- knife )
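
For something you can paste into a listener, here is a small sketch of the starred variants. The words show-value and or-zero are hypothetical, invented for this example, and the USING: line assumes the vocabulary names of a fairly recent Factor:

```factor
USING: kernel math.parser sequences ;
IN: conditional-sketch

! if*: the "then" quotation receives the non-f value,
! the "else" quotation starts with nothing extra on the stack
: show-value ( n/f -- str )
    [ number>string "got " swap append ] [ "nothing" ] if* ;

! unless*: keep a non-f value as it is, substitute a default for f
: or-zero ( n/f -- n )
    [ 0 ] unless* ;
```

With these definitions, 5 show-value gives "got 5" and f show-value gives "nothing"; 7 or-zero gives 7 and f or-zero gives 0.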

There are some cases where you don't need to branch but just need to choose between two values, using a predicate. For this, you can use ?. The following two lines are equivalent:

[ 1 ] [ 2 ] if
1 2 ?

This is not always used for literal values; it can also be used for values taken from the stack. An example:

: family-message ( patient -- string )
dead? "My sincerest condolences for your son's death"
"Congratulations on the successful operation" ? ;
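
The family-message word still picks between two literal strings. A rough sketch of ? choosing between values that are already on the stack might look like this (larger is a hypothetical word for illustration, and the vocabulary names assume a recent Factor):

```factor
USING: kernel math ;
IN: conditional-sketch

! 2dup > leaves a, b and a flag on the stack; -rot tucks the flag
! under both values, so ? keeps a when a > b and b otherwise
: larger ( a b -- max )
    2dup > -rot ? ;
```

So 3 7 larger leaves 7 and 9 2 larger leaves 9. Both values are computed before ? runs, which is fine here since neither has side effects.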

Factor programmers should be wary of deeply nested conditionals, as they are often evidence of poor factoring. However, sometimes they do become necessary, and Factor provides a Lisp-like cond for this case. The word iterates through a sequence of pairs, each holding a predicate quotation and a body quotation. The first predicate to return something other than f is selected, and its associated quotation is executed. Generally, [ t ] is used as the last predicate to provide an "else" case. An example is below.

: cut-open ( patient -- )
knife {
{ [ dup not ] [ throw ] }
{ [ 2dup appropriate-knife? ]
[ swap body-part cut ] } ! body-part is ( patient -- part )
! cut is ( part -- )
{ [ t ] [ change-knife cut-open ] } ! change-knife is ( knife -- knife )
} cond ;

In this case, it might be preferable to use nested conditionals, but in the case where there are three or four real predicates which can be either true or false, cond makes things much simpler.
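
A smaller, self-contained cond, as a sketch only: size-class is a hypothetical word, and the thresholds are arbitrary. The USING: line assumes cond lives in the combinators vocabulary, as it does in recent Factor releases.

```factor
USING: combinators kernel math ;
IN: conditional-sketch

! Each predicate dups the number so the value survives the test;
! each body drops it and pushes a label instead
: size-class ( n -- str )
    {
        { [ dup 10 < ] [ drop "small" ] }
        { [ dup 100 < ] [ drop "medium" ] }
        { [ t ] [ drop "large" ] }
    } cond ;
```

Here 5 size-class gives "small", 50 size-class gives "medium" and 500 size-class gives "large". Written with nested if, the same word would need two levels of quotations.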

The final type of conditional is ?if. After more than a year of Factor programming, though, I still don't feel perfectly comfortable with it and have never found the need to use it. Of all the branching combinators, it is the least used (see below). However, it still deserves mention because it is still useful in some cases. The following two lines are equivalent:

[ then ] [ else ] ?if
dup [ nip then ] [ drop else ] if

?if is used when you have a value that may be f and a default underneath it. If the value is not f, it is passed to the "then" block and the default is discarded; if it is f, the default is passed to the "else" block.

Using the command [ { if when unless if* when* unless* ?if ? cond } [ dup usage length swap set ] each ] make-hash, we can see how often each type of if is used. This yields the following hash associating words to the number of times they are used in the core:

{ unless 44 }
{ when 100 }
{ cond 36 }
{ if 1455 }
{ ? 32 }
{ unless* 28 }
{ when* 59 }
{ ?if 26 }
{ if* 36 }

From this, it is evident that if is used far more often than the other types of conditionals. However, all types are useful and it is good to know all of them.