Eensy Weensy
====

Eensy Weensy is an example webspider. It's designed to scrape the URLs of blog posts from <http://borderstylo.com> and write them to a text file. The example is silly, but along with my blog post ["Poor Man's Webspider"](http://pseudony.ms/blags/poor-mans-webspider.html), it should be enough to get you started writing your own.

Naming
-----

The Eensy Weensy Spider is the protagonist of a children's song (see Wikipedia's [Eensy Weensy Spider](http://en.wikipedia.org/wiki/Eensy_Weensy_Spider) for more details).

I feel its smallness, fragility, and resilience are admirable qualities that are also found in this webcrawler. While the spider's ascendancy, fall, and triumphant return more closely parallel stories like Henley's "Invictus," Dante's "The Divine Comedy," and Fitzgerald's "The Curious Case of Benjamin Button," they are also a fairly accurate depiction of the life and times of a webcrawler.

If any of this sounds sensible, I have a bridge I'd like to sell you.

"Eensy Weensy" is pretty fun to say, and is the name of a spider. Case closed.

Prerequisites
-----

* [Firefox](http://www.mozilla.com/en-US/firefox/firefox.html) 3.5 or later. The version is important, since we'll be making use of [native JSON encoding/decoding](https://developer.mozilla.org/En/Using_native_JSON).
* [Greasemonkey](https://addons.mozilla.org/en-US/firefox/addon/748)
* PHP installed on a server you have access to (like localhost). When I was setting up my first LAMP server, I went to [Ubuntu's directions](https://help.ubuntu.com/community/ApacheMySQLPHP) and copied and pasted my way to victory; maybe it'll help you. Advanced readers will be able to adapt the PHP bit to whatever language they like.
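
To give a feel for why native JSON matters here, below is a minimal sketch of how a spider might encode a list of scraped post URLs before handing them to a server-side script. It is not taken from eensyweensy.user.js: the function names (`filterPostUrls`, `encodePayload`, `decodePayload`) and the `{ urls: [...] }` payload shape are assumptions made for illustration. Only `JSON.stringify` and `JSON.parse` are the real native calls.

```javascript
// Hypothetical sketch: the function names and payload shape are assumptions,
// not the actual contents of eensyweensy.user.js.

// Keep only links that look like blog posts, e.g. /posts/5-gems-in-scheme,
// and drop duplicates while preserving order.
function filterPostUrls(urls) {
  var postPattern = /^http:\/\/borderstylo\.com\/posts\/\d+-[a-z0-9-]+$/;
  var seen = {};
  return urls.filter(function (url) {
    if (!postPattern.test(url) || seen[url]) {
      return false;
    }
    seen[url] = true;
    return true;
  });
}

// Encode the collected URLs with the native JSON object.
function encodePayload(urls) {
  return JSON.stringify({ urls: filterPostUrls(urls) });
}

// Decode a payload back into an array of URLs.
function decodePayload(text) {
  return JSON.parse(text).urls;
}
```

A server-side script could then decode the same string and append each URL to a file, one per line.
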

Usage
-----

* Save this project wherever your PHP scripts like to be kept (/var/www for example)
* Edit spider.home in eensyweensy.user.js to point to eensyweensy.php
* Open eensyweensy.user.js in Firefox, then click "Install"
* Open [http://borderstylo.com/posts](http://borderstylo.com/posts)

Three or more pages will load in quick succession, ending up on <http://hampsterdance.com> (you're welcome). The folder containing eensyweensy.php will now have an output.txt file, with one url per line:

    ...
    http://borderstylo.com/posts/8-wordless-wednesday-16
    http://borderstylo.com/posts/7-types-and-type-theory
    http://borderstylo.com/posts/6-border-stylo-and-the-con-of-2009
    http://borderstylo.com/posts/5-gems-in-scheme
    http://borderstylo.com/posts/1-border-stylo-site-launches

License
-------

Eensy Weensy is released under the MIT License. See LICENSE for more details.