Search Entire Craigslist

You Want to Parse Craigslist
If you read the Craigslist Terms of Use, you’ll quickly learn that using an automated system to interact with Craigslist in certain ways is prohibited. We do not advocate violating Craigslist’s Terms of Use. However, many people do ask how it can be done.

If you find this useful, please provide attribution on your site with a link back to this post. Need help or custom software? Send us an email.

Overview
This combination of Perl and Bash scripts is meant to run on a Unix box, and has been tested on several shared-hosting providers, including GoDaddy and MediaTemple. The scripts will likely work on your local Mac or Ubuntu machine as well.

Without using a wrapper script, the normal invocation is as follows:

./load.pl
./shake.pl
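
If you would rather use a wrapper, a minimal sketch might look like the following. The function name run_pipeline and its directory argument are assumptions for illustration and are not part of the original download; the only thing taken from the post is the order of the two commands.

```shell
#!/bin/bash
# Hypothetical wrapper: runs load.pl, then shake.pl, inside the given directory.
# Stops early if load.pl fails, so shake.pl never runs on a half-built candidates.txt.
run_pipeline() {
    local dir="${1:-.}"        # directory holding load.pl and shake.pl (defaults to .)
    (
        cd "$dir" || exit 1
        ./load.pl || exit 1    # stage 1: writes candidates.txt
        ./shake.pl             # stage 2: writes emails.txt
    )
}
```

Calling run_pipeline craigslist would then replace typing the two commands by hand.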

Required Files
Each code block below is a separate file. In addition, you’ll need two text files, candidates.txt and emails.txt, both of which start out empty.

Here is an overview of how things work.

Action Files
• page.pl: The first file parses Craigslist listing pages. It accepts a Craigslist (category) URL and a city as arguments, and outputs the URLs of posts whose titles match certain keywords.
• post.pl: The second file parses individual posts. It accepts a (post) URL as an argument, and prints out the URL, the post title, the post ID and the email address (if there is one).
• load.pl: The third file is a wrapper for page.pl. It supplies all of the cities, lets you edit the categories, and sets how many days into the past you wish to search.
• shake.pl: The final action file is a wrapper for post.pl. It iterates over the URLs in the candidates file to retrieve the post data.

Cities
• cities.txt: This file lists all of the cities we want to search, separated by semicolons.
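
To check what the scripts will iterate over, the semicolon-separated list can be printed one entry per line. This is only a sketch: list_cities is a made-up helper name, and the exact form of each entry (a city name, a subdomain, or a full URL) is not specified in this post.

```shell
# Print each semicolon-separated entry of a cities.txt-style file on its own line,
# skipping blank entries such as the one left by a trailing semicolon.
list_cities() {
    tr ';' '\n' < "$1" | sed '/^[[:space:]]*$/d'
}
```

For example, list_cities cities.txt | wc -l gives a rough count of how many cities a run will cover.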

Output Files
• candidates.txt: page.pl prints all of the matching post URLs to this file. (There will likely be duplicates if you search multiple cities or related categories in the same geographic area.)
• emails.txt: post.pl prints the email, post title, post ID and post URL to this file, with the values separated by :!:SEP:!:. If you run post.pl multiple times (by using shake.pl, for example), emails.txt will contain many lines.
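
The :!:SEP:!: separator is awkward to read by eye, so it can help to convert a results file to tab-separated columns before opening it in a spreadsheet. The sketch below assumes one record per line, as described above; the helper name sep_to_tsv is hypothetical.

```shell
# Convert an emails.txt-style file from :!:SEP:!:-separated fields to tab-separated fields.
# The separator contains no regex metacharacters, so awk can use it directly as FS.
sep_to_tsv() {
    awk -F':!:SEP:!:' 'BEGIN { OFS = "\t" } { $1 = $1; print }' "$1"
}
```

Assigning $1 to itself forces awk to rebuild the line with the new output separator. A typical use would be sep_to_tsv emails.txt > emails.tsv.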

Quick Setup (and basic use)
1. Download the zip.
2. FTP it to your server and/or unzip it on your local Mac, Ubuntu or other *nix-type system.
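3. Change into the directory and make the scripts executable:

# cd into the directory and make the files executable
cd craigslist
chmod +x *
# Create the two output files if they are missing; they only need to be writable by you
touch candidates.txt emails.txt
chmod u+rw candidates.txt emails.txt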

4. Open load.pl in your favorite text editor and change the following section to include any categories you want to search. Place a “#” at the start of a line to turn that category off, or add new lines to search additional categories.

print `./page.pl $_ "/cpg/" $datesBack`; #cpg = computer gigs
print `./page.pl $_ "/eng/" $datesBack`; #eng = internet engineering jobs
print `./page.pl $_ "/sof/" $datesBack`; #sof = software/QA/DBA/etc jobs
print `./page.pl $_ "/web/" $datesBack`; #web = web/HTML/info design jobs

5. Open page.pl and edit the words assigned to the $keywords variable to change your search:

#These are the words you want to search for in the post title.
$keywords = '(seo|php|javascript|porsche|whatever)';

6. Run it

./load.pl #wait for it to finish (will write to candidates.txt)
./shake.pl #wait for it to finish

7. Check your results: open emails.txt
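
Because candidates.txt can contain duplicate URLs, it may be worth removing them between the two commands in step 6 so that the second stage does not fetch the same post twice. This is a sketch under the assumption that shake.pl does not already skip duplicates; dedupe_file is a hypothetical helper name.

```shell
# Remove duplicate lines from a file in place, keeping the first occurrence of each line.
dedupe_file() {
    local file="$1" tmp
    tmp=$(mktemp) || return 1
    # awk prints a line only the first time it is seen; writing back with cat
    # keeps the original file's permissions and ownership.
    awk '!seen[$0]++' "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

With this in place the run becomes ./load.pl, then dedupe_file candidates.txt, then ./shake.pl.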

Amazon Kindle: Export Notes and Highlights

Amazon has made the Kindle service accessible by making the software run on most modern devices. It’s still not quite as easy to annotate as a PDF (I’m not sure it will ever be as user-friendly as GoodReader), but it partly makes up for that by offering some cool stats on each book. You can see other people’s notes and highlights, see how the book ranks among Amazon’s most-highlighted, and so on.

Kindle.Amazon.com is a companion site to the software that makes it possible to review and edit your notes for each of your books. Sign in. Select a book. Select your highlights. The only downside is that you can’t export them.

The script below is meant to be converted into a bookmarklet and added to your bookmarks toolbar, and it can then be executed on the page containing your notes. It will launch a new window containing your notes in XML format. Just copy and paste.

Source

(function() {
    /* Escape text so it is safe inside XML content and attribute values. */
    var escapeXml = function(s) {
        return s.replace(/&/g, "&amp;").replace(/</g, "&lt;")
                .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
    };

    var process = function(sb) {
        /* CONTENT: convert bracket markers such as [hl]...[/hl] into tags */
        sb = sb.replace(/\[hl\]/g, '<span class="highlight">').replace(/\[\/hl\]/g, "</span>");
        sb = sb.replace(/\[nt\]/g, '<span class="note">').replace(/\[\/nt\]/g, "</span>");
        sb = sb.replace(/\[em\]/g, '<em>').replace(/\[\/em\]/g, "</em>");
        sb = sb.replace(/\[-h1\]/g, '<h1>').replace(/\[\/-h1\]/g, "</h1>");
        sb = sb.replace(/\[-h2\]/g, '<h2>').replace(/\[\/-h2\]/g, "</h2>");
        sb = sb.replace(/\[-h3\]/g, '<h3>').replace(/\[\/-h3\]/g, "</h3>");
        sb = sb.replace(/\[-h4\]/g, '<h4>').replace(/\[\/-h4\]/g, "</h4>");

        /* MARKUP: escape the finished XML so the textarea shows it as literal text.
           The ampersand has to be escaped first. */
        sb = sb.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

        return sb;
    };

    var title = escapeXml(jQuery('.title').text().trim());
    /* The PHP header and stylesheet lines only matter if the output is saved to a PHP-enabled site. */
    var book = '<?php  Header("Content-type: application/xml"); ?>' + '<?xml version="1.0"?>' + '<?xml-stylesheet href="/xsl.php" type="text/xsl"?>';
    var sb = '';
    var o = "\n" + '<book title="' + title + '">' + title;
    var q1 = "\n" + '<quote type="highlight" >';
    var q2 = '</quote>';
    var c = "\n" + '</book>';

    /* One <quote> element per highlight or note row on the page */
    jQuery('div.highlightRow div.text').each(function() {
        var note = jQuery(this).find('span.noteContent,span.highlight');
        sb = sb + q1 + escapeXml(note.text()).replace(/\n/g, "<br />") + q2;
    });

    sb = process(book + o + sb + c);

    /* Show the result in a small pop-up window */
    var newWindow = window.open("", "", "status,height=700,width=500");
    newWindow.focus();
    newWindow.document.write("<textarea>" + sb + "</textarea>");
    newWindow.document.close();
})();

Bookmarklet

In some browsers, you can simply paste the bookmarklet into the address bar and press Return, and it will work. (It will only do its job if you’re on the page with your notes.) Other browsers require you to create the bookmark manually first: open your bookmarks, add a new one, and paste the bookmarklet in as its URL.

javascript:(function%20()%20%7B%0A%20%20%20%20var%20process%20=%20function(sb)%20%7B%0A%20%20%20%20%0A%20%20%20%20sb%20=%20sb.toString().replace(/%5C%5Bhl%5C%5D/g,'%3Cspan%20class=%22highlight%22%3E').replace(/%5C%5B%5C/hl%5C%5D/g,%22%3C/span%3E%22);%0A%20%20%20%20sb%20=%20sb.toString().replace(/%5C%5Bnt%5C%5D/g,'%3Cspan%20class=%22note%22%3E').replace(/%5C%5B%5C/nt%5C%5D/g,%22%3C/span%3E%22);%0A%20%20%20%20sb%20=%20sb.toString().replace(/%5C%5Bem%5C%5D/g,'%3Cem%3E').replace(/%5C%5B%5C/em%5C%5D/g,%22%3C/em%3E%22);%0A%20%20%20%20%0A%20%20%20%20/*%20MARKUP%20*/%0A%20%20%20%20sb%20=%20sb.replace(/%3C/g,%22%26lt;%22).replace(/%3E/g,%22%26gt;%22).toString();%0A%20%20%20%20%0A%20%20%20%20return%20sb.toString();%0A%7D;%0A%0Avar%20title%20=%20jQuery('.title').text().trim();%0Avar%20book%20=%20'%3C?xml%20version=%221.0%22?%3E';%0Avar%20sb%20=%20'';%0Avar%20o%20=%20%22%5Cn%22+'%3Cbook%20title=%22'+title+'%22%3E'+title;%0Avar%20q1%20=%20%22%5Cn%22+'%3Cquote%20type=%22highlight%22%20%3E';%0Avar%20q2%20=%20'%3C/quote%3E';%0Avar%20c%20=%20%22%5Cn%22+'%3C/book%3E';%0AjQuery('div.highlightRow%20div.text').each(function()%20%7B%0A%20%20%20%20var%20note%20=%20jQuery(this).find('span.noteContent,span.highlight');%0A%20%20%20%20sb%20=%20sb%20+%20q1+jQuery(note).text().toString().replace(/%5Cn/g,%22%3Cbr%20/%3E%22)+q2;%0A%7D);%0A%0Asb%20=%20process(book+o+sb+c);%0A%0AnewWindow%20=%20window.open(%22%22,%22%22,%22status,height=700,width=500%22)%0AnewWindow.focus();%0AnewWindow.document.write(%22%3Cpre%3E%22+sb+%22%3C/pre%3E%22);%0AnewWindow.document.close();%0A%20%20%20%20%0A%7D)();

Automate FTP with Bash (on Mac OS X)

Save the code below in a file with a .sh extension, such as ftp.sh. From the command line, run chmod +x on the file so that it’s executable. Set the file to open with Terminal.app. Finally, configure Terminal.app to close windows automatically after a script has run. Once those steps are complete, you should be able to just double-click your file and have it FTP your stuff “automatically”.

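#!/bin/bash

# Log in and upload somefile.txt from the Desktop to the remote xml directory.
# -i: no per-file prompts, -n: no auto-login, -v: verbose output
ftp -inv ftp.yoursite.com <<ENDFTP
user username password
cd xml
binary
lcd ~/Desktop
put "somefile.txt"
bye
ENDFTP

# Move the local copy to the Trash (the folder is ~/.Trash, with a capital T)
cd ~/Desktop
mv somefile.txt ~/.Trash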
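
The script above moves the file to the Trash whether or not the upload worked. One possible safeguard is to save the verbose ftp output to a log and only trash the file if the log contains a 226 reply, which is the standard FTP code for a completed transfer. This is a sketch: transfer_succeeded and the log file name are assumptions, and some servers may word or number their replies differently.

```shell
# Succeed only if a saved `ftp -v` transcript contains a 226 reply
# (the standard FTP code for a completed transfer).
transfer_succeeded() {
    grep -q '^226[ -]' "$1"
}

# Possible use in ftp.sh (ftp.log is a hypothetical file name):
#   ftp -inv ftp.yoursite.com <<ENDFTP > ftp.log
#   ...
#   ENDFTP
#   transfer_succeeded ftp.log && mv ~/Desktop/somefile.txt ~/.Trash
```

If the check fails, the file stays on the Desktop and ftp.log shows what the server replied.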