Dice.com Script Update
“Entrepreneurs are risk takers, willing to roll the dice with their money or reputations on the line in support of an idea or enterprise.”
Willing to roll the dice with their money or reputations? I know what you’re thinking: How about just using some dice1? Trust me, it makes complete sense if you just don’t think about it.
Look, I’m lazy. Andrew Carnegie supposedly said, “I can do a years worth of work in 9 months, but not in 12,” before heading on his annual 3-month European vacation, and so it is with me. I need to get 24-hours worth of work done in 19.5, so that I have a few hours to study up on ways to continue bettering the ratio.
The Dice script I wrote as my first entry to this blog started to irritate me. The output was simply a t-separated file misnamed as an excel sheet, which, even though it’s more convenient than using the site directly in huge searches (I go nationwide), still required that I autofit the columns, change the font, create the hyperlinks, and bold the title row every time.
I knew I had a problem when I let my search go for 3 days because I was dreading having to do that five times a day.
This morning I took the minutes I would waste doing those stupid steps and improved the script. There are plenty of great resources on this, so don’t think I’ve invented something new. I haven’t. I just put it to use. But at least, now, the script writes an actual excel sheet with formatting and links.
Here is the code. The banned list doesn’t change, so no need to worry about incompatibility there. I use an old version of excel (<2007), so let me know if you have any troubles. The OLE should still work. Don’t forget you can use the Windows Key+Enter to follow the hyperlinks.
#!/usr/bin/perl #use strict; use WWW::Mechanize; use HTTP::Cookies; use Win32::OLE; use Win32::OLE::Const 'Microsoft Excel'; $Win32::OLE::Warn = 3; #USER DEFINED (DICE U/P) my $output_file = 'c:\search.xls'; #OUTPUT FILE my $sheet_name = 'Dice Search'; my $banned_file = 'c:\banned_list.txt' or die "Cannot open filen"; #___ DEFINE EXCEL / SHEET VARIABLES my $excel = Win32::OLE->GetActiveObject('Excel.Application') || Win32::OLE->new('Excel.Application', 'Quit'); my $current_row = 2; #OUR TITLE ROW TAKES UP ROW 1 my $range; my $adr; my $txt; my $tip1; #___ MAKE EXCEL VISIBLE $excel->{Visible} = 0; #___ ADD NEW EXCEL WORKBOOK my $workbook = $excel->Workbooks->Add; my $sheet = $workbook->Worksheets("Sheet1"); #___ CHANGE WORKSHEET NAME $sheet -> {Name} = $sheet_name; $sheet->Activate; #DICE SERVICE URL my $url_login = "http://seeker.dice.com/profman/servlet/ProfMan?op=3000&pg=1000"; #URL PARAMETERS %params = ( 'DAYSBACK'=>'0', 'EXTRA_STUFF'=>'0', 'FREE_TEXT'=>'JavaScript', 'FRMT'=>'0', 'Hf'=>'0', 'LOCATION_OPTION'=>'2', 'N'=>'0', 'No'=>'0', 'Ns'=>'p_PostedAge|0', 'Ntk'=>'JobSearchRanking', 'Ntx'=>'mode+matchall', 'NUM_PER_PAGE'=>'50', 'op'=>'300', 'RADIUS'=>'64.37376', 'SORTDIR'=>'7', 'SORTSPEC'=>'0' ); #BANNED COMPANIES open (BANNED, $banned_file); @banned_names = <BANNED>; close(BANNED); #ELIMINATE DUPLICATE CONTENT ITEM LINES %banned_list; foreach $banned (@banned_names) { chomp $banned; $banned_list{$banned} = 1; } #MECHANIZE $mech = WWW::Mechanize->new(); $mech->agent_alias( 'Windows IE 6' ); $mech->cookie_jar(HTTP::Cookies->new()); &writeHeaderRow(); #Write values in A1:E1 $max_rolls = 25; #NUMBER OF PAGES TO SEARCH (This really should be dynamic, but if you set it to 10 to 20, that'll work) for ($roll = 0; $roll < $max_rolls; $roll++) { $params{'No'} = ($roll * $params{'NUM_PER_PAGE'}); #SET THE # START SEARCH $url = prep_url(); &roll_dice($url); } &setCellFormatting(); #Format the cells #___ EXCEL SAVE AND QUIT $excel -> {DisplayAlerts} = 0; $workbook -> SaveAs ($output_file); $excel -> Quit; sub prep_url() { my $search_jobs = 'http://seeker.dice.com/jobsearch/servlet/JobSearch?'; $amp = 0; while (($key, $value) = each %params) { if ($amp >= 1) { $search_jobs .= '&'; } $search_jobs .= $key.'='.$value; $amp = 1; } return $search_jobs; } sub roll_dice($url) { $mech->get($url); #GET THE SEARCH RESULTS @rows = (); #Empty Array @rows = split('n', $mech->content); $rows = @rows; for ($i = 0; $i < $rows; $i++) { #Position if ($rows[$i] =~ m/<td><a.*href="(.*FREE_TEXT.*)".*>(.*)</a></td>/) { $link = $1; $position = $2; #Company if ($rows[$i+1] =~ m/<td><a.*>(.*)</a></td>/) { #PRINT JUST THE TEXT $company = $1; } #Location if ($rows[$i+2] =~ m/<td>(.*)</td>/) { $location = $1; } #Date if ($rows[$i+3] =~ m/<td>(.*)</td>/) { $posted = $1; } if ($banned_list{$company} != 1) { $link =~ s/&/&/; $sheet -> Range("A".$current_row) -> {Value} = $company; $sheet -> Range("B".$current_row) -> {Value} = $position; $sheet -> Range("C".$current_row) -> {Value} = $location; $sheet -> Range("D".$current_row) -> {Value} = $posted; #DO THE LINK $range = $sheet->Range("E".$current_row); $adr = "http://seeker.dice.com$link"; $txt = 'View Link'; $tip1 = "Job Posting Link"; &addLink($range, $adr, $txt, $tip1); $current_row ++; } $i += 4; } } } #___ EXCEL TITLE ROW sub writeHeaderRow() { $sheet -> Range("A1") -> {Value} = 'COMPANY'; $sheet -> Range("B1") -> {Value} = 'POSITION'; $sheet -> Range("C1") -> {Value} = 'LOCATION'; $sheet -> Range("D1") -> {Value} = 'POSTED'; $sheet -> Range("E1") -> {Value} = 'LINK'; } #___ EXCEL Hyperlink HELPER Subroutine sub addLink () { $sheet->Hyperlinks->Add({ Anchor => $range, Address => $adr, TextToDisplay => $txt, ScreenTip => $tip1, }); } #___ EXCEL USER/VIEW PREF PROPERTIES (DO THIS AFTER WRITING) sub setCellFormatting() { $sheet->Range("A1:E1")->Font-> {Bold} = "True"; $sheet->Range("A1:E800")->Font->{Size} = 9; $sheet->Range("A1:E800")->Font->{Name} = "Microsoft Jhenghei"; $sheet->Range("A1:E800")->{Columns}->Autofit; }
Learn Something: Excel-related:
Cultured Perl: Reading and writing Excel files with Perl
Win32::OLE.pm and Hyperlinks – PERL Modules
Win32::OLE Excel Cheat Sheet – Perl FAQ – Tek-Tips
Learn Something: Dice-related:
Dice.com Job Listings Fall 45 Percent | Sourcing | ITBusinessEdge.com
Opinion: Watching for Help Wanted Signs | Opinion | Financial Articles & Investing News | TheStreet.com (By Scot Melland, CEO of Dice Holdings.)
1 Dice.com is about the most-awesome site ever. In terms of the service it provides, there isn’t any other I would rather use. I’m just dealing with searches that require me to see 1,000+ positions a day. Having to weed out staffing companies with the wrong opportunities (No, thanks, I’m really not interested in your medical-billing-slash-javascript position) or the wrong area (I’m just not interested in moving to Pakistan. I’m sorry.) every single day gets frustrating, and that’s the only reason I don’t use the site directly.

