I’ve been using Dice quite a bit over the last few weeks. Anyone who has used a job board for more than a day knows that the overwhelming majority of “opportunities” are posted by recruiters, staffing companies and the like.
I tried using the filters built into the Dice site to narrow the list to direct-hire positions by company, but that’s nearly impossible, since the filtering is done (and has to be redone) on every page of the result set. By day 3, I couldn’t stand it anymore.
Perl came to the rescue.
Using the WWW::Mechanize module, I was able to quickly bust out a script that hits the site with my keyword search, parses the results (to remove the companies I had decided to ban), and then writes a tab-separated list (position, company, location, posted date, link) to an .xls file. Excel’s awesome ability to sort and filter puts this solution over the top, and cuts my daily Dice search to 20 minutes or less.
Here’s the code:
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies;

# USER DEFINED
my $username = 'YOURDICEUSERNAME';
my $password = 'YOURDICEPASSWORD';

# OUTPUT FILE (tab-delimited text that Excel can open; appended to below)
my $file = 'c:\search.xls';

# URLS
my $url_login = "http://seeker.dice.com/profman/servlet/ProfMan?op=3000&pg=1000";

# URL PARAMETERS
my %params = (
    'DAYSBACK'        => '0',               # Days to search in arrears
    'EXTRA_STUFF'     => '0',
    'FREE_TEXT'       => 'Perl+Scripting',  # Search terms
    'FRMT'            => '0',
    'Hf'              => '0',
    'LOCATION_OPTION' => '2',
    'N'               => '0',
    'No'              => '0',               # Beginning # of result set
    'Ns'              => 'p_PostedAge|0',
    'Ntk'             => 'JobSearchRanking',
    'Ntx'             => 'mode+matchall',
    'NUM_PER_PAGE'    => '50',              # Number of results per page
    'op'              => '300',
    'RADIUS'          => '64.37376',
    'SORTDIR'         => '7',               # 7 - Sorted by date desc
    'SORTSPEC'        => '0',
);

# BANNED COMPANIES (one company name per line)
my $banned_file = 'c:\banned_list.txt';
open(my $banned_fh, '<', $banned_file) or die "Cannot open $banned_file: $!";

# Build hash of banned companies
my %banned_list;
while (my $banned = <$banned_fh>) {
    chomp $banned;
    $banned_list{$banned} = 1;
}
close($banned_fh);

# MECHANIZE
my $mech = WWW::Mechanize->new();
$mech->agent_alias('Windows IE 6');
$mech->cookie_jar(HTTP::Cookies->new());
$mech->get($url_login);
#$mech->form_name('login_form'); #NO FORM NAME
$mech->field(SJT_USER_NAME => $username);
$mech->field(SJT_PASSWD    => $password);
$mech->click();

# The original fetched $url_index here, but that variable is never defined,
# so the call is left out.

# Write the header row
open(my $header_fh, '>>', $file) or die "Cannot open $file: $!";
print $header_fh "POSITION\tCOMPANY\tLOCATION\tPOSTED\tLINK\n";
close($header_fh);

my $max_rolls = 25;  # NUMBER OF PAGES TO SEARCH
# (This really should be dynamic, but if you set it to 10 to 20, that'll work)
for my $roll (0 .. $max_rolls - 1) {
    $params{'No'} = $roll * $params{'NUM_PER_PAGE'};  # Set the start of the result set
    roll_dice(prep_url());
}

# Build the search URL from %params
sub prep_url {
    my $search_jobs = 'http://seeker.dice.com/jobsearch/servlet/JobSearch?';
    return $search_jobs . join('&', map { "$_=$params{$_}" } keys %params);
}

# Fetch one page of results and append the rows that are not banned
sub roll_dice {
    my ($url) = @_;
    $mech->get($url);

    open(my $out, '>>', $file) or die "Cannot open $file: $!";
    my @rows = split(/\n/, $mech->content);
    for (my $i = 0; $i < @rows; $i++) {
        # Position: a result starts on the line that links to the job
        next unless $rows[$i] =~ m{<td><a.*href="(.*FREE_TEXT.*)".*>(.*)</a></td>};
        my ($link, $position) = ($1, $2);
        my ($company, $location, $posted) = ('', '', '');

        # Company, location and date follow on the next three lines
        $company  = $1 if ($rows[$i + 1] // '') =~ m{<td><a.*>(.*)</a></td>};
        $location = $1 if ($rows[$i + 2] // '') =~ m{<td>(.*)</td>};
        $posted   = $1 if ($rows[$i + 3] // '') =~ m{<td>(.*)</td>};

        unless ($banned_list{$company}) {
            $link =~ s/&amp;/&/g;  # Decode HTML-escaped ampersands in the link
            print "{" . $company . "}\n";
            print $out "$position\t$company\t$location\t$posted\thttp://seeker.dice.com$link\n";
        }
        $i += 4;  # Skip past the lines just consumed
    }
    close($out);
}
dice.pl
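The comment next to $max_rolls admits the page count should really be dynamic. Here is a minimal sketch of one way to do that: keep paging until a page comes back short or empty, with the fixed number kept only as a safety cap. The count_rows_on_page function is a hypothetical stand-in that fakes a 120-result search so the sketch runs offline; in dice.pl it would assume roll_dice is changed to return the number of listings it matched on the page.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a roll_dice() that returns how many listings
# it found on one page. It fakes a 120-result search.
my $total_results = 120;

sub count_rows_on_page {
    my ($offset, $per_page) = @_;
    my $remaining = $total_results - $offset;
    return 0 if $remaining <= 0;
    return $remaining < $per_page ? $remaining : $per_page;
}

# Walk the result pages until one is empty or short, or the cap is hit.
# Returns the number of pages that actually held results.
sub pages_needed {
    my ($per_page, $max_pages) = @_;
    my $pages = 0;
    while ($pages < $max_pages) {
        my $found = count_rows_on_page($pages * $per_page, $per_page);
        last if $found == 0;         # empty page: we ran off the end
        $pages++;
        last if $found < $per_page;  # short page: nothing further to fetch
    }
    return $pages;
}

print pages_needed(50, 25), "\n";
```

With 50 results per page the loop stops after the third page instead of requesting all 25.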
You’ll notice there are two file variables in dice.pl. One, $banned_file, points to your banned list, a plain text file with one company name per line. Here’s mine. The other, $file, points to the export file. You will probably need to edit both paths to suit your machine. And don’t forget to fill in your user name and password for the Dice site.
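To make the banned list concrete, here is a small sketch of how the lookup works, using made-up company names in place of a real file. It goes slightly beyond what dice.pl does: the script matches names exactly, while this version also trims stray whitespace (such as a Windows carriage return) and ignores case. Both of those are my assumptions about what is useful, and the helper names build_banned and is_banned are hypothetical.

```perl
use strict;
use warnings;

# Build a lookup hash from raw lines, as read from the banned-list file.
sub build_banned {
    my (@names) = @_;
    my %banned;
    for my $name (@names) {
        $name =~ s/^\s+|\s+$//g;  # trim whitespace, including a stray \r
        next if $name eq '';      # skip blank lines
        $banned{lc $name} = 1;    # store lower-cased for case-insensitive lookup
    }
    return \%banned;
}

# True (1) if the company is on the list, false (0) otherwise.
sub is_banned {
    my ($banned, $company) = @_;
    return exists $banned->{lc $company} ? 1 : 0;
}

# Made-up entries standing in for the lines of banned_list.txt
my $banned = build_banned("Acme Staffing\n", "Example Recruiters Inc.\r\n", "\n");

for my $company ('ACME Staffing', 'Initech') {
    printf "%-15s %s\n", $company, is_banned($banned, $company) ? 'skipped' : 'kept';
}
```

A hash lookup keeps the per-listing check cheap no matter how long the banned list grows, which is why the script builds the hash once up front.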

