Monday, April 29, 2013

Always remember to shred your ship when done with it

We've all seen the hapless user who sells their computer on eBay without wiping the hard drive.

How about selling a Coast Guard patrol boat to the North Koreans without wiping the navigation system?  :-)

What else did they forget to sanitize?

http://www.theregister.co.uk/2013/04/29/japan_coast_guard_forgets_wipe_data_norks/

There's a reason for process and rules, including that annoying check sheet for hardware disposal.

Thursday, April 25, 2013

Ouch!

One of the issues we have to address as security folks is protecting a person's privacy.  If you've ever dealt with Protected Health Information (PHI), you know that there are strict rules about what aspects of a person's identity must be protected when associated with medical data.

In what can only be described as an object lesson of how important this is, the folks at the Data Privacy Lab (at Harvard) conducted an interesting experiment - looking into how many folks in the Personal Genome Project they could identify just by birthdate, sex and zip code.

Amazingly, they identified 200 participants with 84% to 90% accuracy.  Let me repeat that for emphasis ... using just birthdate, zip and sex they were able to link 200 folks to their "anonymous" genome with good accuracy.  They basically matched data from the genome project with public voter registration data and other public data.
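
To make the matching step concrete, here is a minimal sketch of that kind of linkage.  It is not the Data Privacy Lab's actual method or data: every record, name and helper function below is made up for illustration, and the only idea taken from the report is joining two data sets on birthdate, sex and zip.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical "anonymous" study records: no names, just demographics.
my @study = (
    { id => 'P-001', birthdate => '1961-03-14', sex => 'F', zip => '80302' },
    { id => 'P-002', birthdate => '1975-11-02', sex => 'M', zip => '02138' },
    { id => 'P-003', birthdate => '1980-07-30', sex => 'M', zip => '80302' },
);

# Hypothetical public records (think voter roll) that do carry names.
my @public = (
    { name => 'Alice Example', birthdate => '1961-03-14', sex => 'F', zip => '80302' },
    { name => 'Bob Example',   birthdate => '1975-11-02', sex => 'M', zip => '02138' },
    { name => 'Carl Example',  birthdate => '1975-11-02', sex => 'M', zip => '02138' },
);

# Build the join key from the three quasi-identifiers.
sub quasi_key {
    my ($rec) = @_;
    return join '|', @{$rec}{qw(birthdate sex zip)};
}

# Index the public records by that key.
my %names_by_key;
push @{ $names_by_key{ quasi_key($_) } }, $_->{name} for @public;

# A study record counts as re-identified only when exactly one
# public record shares its key.
sub reidentify {
    my %found;
    for my $rec (@study) {
        my $names = $names_by_key{ quasi_key($rec) } || [];
        $found{ $rec->{id} } = $names->[0] if @$names == 1;
    }
    return %found;
}

my %found = reidentify();
print "$_ => $found{$_}\n" for sort keys %found;
```

In this toy data only P-001 is uniquely linked; P-002 has two candidates and P-003 has none.  The smaller the population sharing a zip code, the more often the key is unique.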

Here's a web site where they report their findings: http://dataprivacylab.org/projects/pgp/
The full report is at: http://dataprivacylab.org/projects/pgp/1021-1.pdf

Best of all, they have a web site where you can put in your birthdate, sex and zip, and they'll tell you how many folks match in their public records. (http://aboutmyinfo.org/)

I tried it for my info, and there's only one record which matches my info.  I live in a relatively small town (Boulder, CO) but still I was shocked.  It's a good thing I don't feel a need to hide my identity.

For reference, here's what HIPAA says about data that needs to be protected (thanks Wikipedia, http://en.wikipedia.org/wiki/Protected_health_information):

Under the US Health Insurance Portability and Accountability Act (HIPAA), PHI that is linked based on the following list of identifiers must be treated with special care.
  1. Names
  2. All geographical identifiers smaller than a state, except for the initial three digits of a zip code if, according to the current publicly available data from the Bureau of the Census: the geographic unit formed by combining all zip codes with the same three initial digits contains more than 20,000 people; and [t]he initial three digits of a zip code for all such geographic units containing 20,000 or fewer people is changed to 000
  3. Dates (other than year) directly related to an individual
  4. Phone numbers
  5. Fax numbers
  6. Email addresses
  7. Social Security numbers
  8. Medical record numbers
  9. Health insurance beneficiary numbers
  10. Account numbers
  11. Certificate/license numbers
  12. Vehicle identifiers and serial numbers, including license plate numbers;
  13. Device identifiers and serial numbers;
  14. Web Uniform Resource Locators (URLs)
  15. Internet Protocol (IP) address numbers
  16. Biometric identifiers, including finger, retinal and voice prints
  17. Full face photographic images and any comparable images
  18. Any other unique identifying number, characteristic, or code except the unique code assigned by the investigator to code the data
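
As a rough illustration of items 2 and 3 in the list above, the sketch below generalizes a zip code and a date the way those rules describe.  The population figures are invented (real ones would have to come from current Census Bureau data), the function names are my own, and this covers only two of the eighteen identifiers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical population per three-digit zip prefix.  These numbers are
# made up; a real table would be built from current Census data.
my %prefix_population = (
    '803' => 350_000,
    '036' => 12_000,
);

# Keep the first three digits only if that area holds more than 20,000
# people; otherwise return 000.  Unknown prefixes are treated as small.
sub generalize_zip {
    my ($zip) = @_;
    my $prefix = substr($zip, 0, 3);
    my $population = $prefix_population{$prefix} // 0;
    return $population > 20_000 ? $prefix : '000';
}

# Dates directly related to an individual: keep only the year.
# Expects yyyy-mm-dd input.
sub generalize_date {
    my ($date) = @_;
    return substr($date, 0, 4);
}

print generalize_zip('80302'), "\n";
print generalize_zip('03601'), "\n";
print generalize_date('1961-03-14'), "\n";
```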

Tuesday, April 23, 2013

Pasting for Gold redux


A fair while ago I posted an entry talking about scraping files from Pastebin (http://jrnerqbbzrq.blogspot.com/2013/02/pasting-for-gold.html).  I even posted some code for a program which would grab copies of selected files for later analysis.

Since then, I've continued to play with it.  It's really great fun to see what folks post to Pastebin when chatting amongst themselves.  As I mentioned in my first posting, the number of system compromises, lists of cracked passwords, "dox'ng", IRC chat logs, and general Internet cruft is impressive.  Viewing these files provides a bit of insight into a few of the dark corners of the Internet, and frankly can be a bit addicting ... kinda like plopping down in front of a reality TV show and seeing just how childish grown-ups can really be.

I've upgraded the scraping program, primarily to deal with occasional problems with connectivity to Pastebin.  I suspect the problems are due to Pastebin sometimes blocking my connections ... probably via a load balancer which mistakes my earnest efforts for some sort of abuse.  Their terms of use are vague, but in their FAQ (http://pastebin.com/faq), regarding their AUP they say "Do not aggressively spider the site", and go on to say they'll block you if you do.  I sent them an email asking about this, but they never replied.  You can play with the -w option to increase the delay between connections if you have problems, although the longer you set the interval the greater the chance you'll miss some files.

As with the first version of this program, it is heavily based on the program written by malc0de:
http://malc0de.com/tools/scripts/pastebin.txt

Here it is.  I call it PasteScrape.pl:

#!/usr/bin/perl -w

#
#Simple perl script to parse pastebin to alert on keywords of interest. 
#1)Install the LWP and MIME perl modules
#2)Create two text files, one called keywords.txt and one called tracker.txt
#2a)keywords.txt is where you need to enter keywords you wish to be alerted on, one per line.
#3)Edit the code below and enter your smtp server, from email address and to email address. 
#4)Cron it up and receive alerts in near real time
#

########################################################################
# Downloaded 1-29-13 from http://malc0de.com/tools/scripts/pastebin.txt
# by DA - I'm not the author, but I'm afraid that I've had my way with it.
# Changes:
#     Removed email code
#     Added random sleep to be considerate 
#     Added infinite loop to be inconsiderate
#     Added write the matching paste to a separate file (writeHitToFile)
#     Added writing matching expression to writeHitToFile
#     Moved read of regex to inside main loop - catch changes on the fly
#     Added write log of hits to HitList.txt
#     Added getopt and cleaned up a bit
########################################################################

$debugRequested = 0;
$delayInterval = 5;  # Default max delay between queries to web site
$keyWordsFileName = 'keywords.txt';
$fetchErrorCnt = 0;
$tryOneMoreTime = 0;
$webProxy = 0;

use LWP::Simple;
use LWP::UserAgent;

use Getopt::Long;

GetOptions ("h" => \$Help_Option, "d" => \$debugRequested, "w=s" => \$delayInterval, "k=s" => \$keyWordsFileName, 
     "p=s" => \$webProxy );

if ($Help_Option){ &showHelp;}

my $ua = new LWP::UserAgent;
$ua->agent("Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1)");

if ($webProxy){
    $ua->proxy('http', $webProxy);
}


my $tracking_file = 'tracker.txt';

while (1){

    # Load keywords.  Check the file each loop in case they've changed.
    open (MYFILE, $keyWordsFileName) or die "Couldn't open $keyWordsFileName: $!";
    @keywords = <MYFILE>;
    chomp(@keywords) ;
    $regex = join('|', @keywords);
    close MYFILE;

#Set the date for this run
    my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
    my $datestring = sprintf("%4d-%02d-%02d",($year + 1900),($mon+1),$mday);
    my $dateTimeString = sprintf("%4d-%02d-%02d %02d:%02d",($year + 1900),($mon+1),$mday, $hour, $min);

    $dir = sprintf("%4d-%02d-%02d",($year + 1900),($mon+1), $mday);

    if ($webProxy){
 $ua->proxy('http', $webProxy);
    }
    my $req = new HTTP::Request GET => 'http://pastebin.com/archive';
    my $res = $ua->request($req);
    $pastebin = $res->content; 

    unless (defined $pastebin){
 die "Request from pastebin failed @ $dateTimeString: ($!)\n";
    }

    my @links = getlinks();
    $linkCount = $#links;

    &debugPrint ("\n");  # Just a stupid formatting thing
    print "Starting new batch at $dateTimeString. Save-to dir is $dir. Keywords file is $keyWordsFileName. regex is: $regex\n";
    &debugPrint ("size of \@links: $linkCount\n");
    if (@links) {
 $fetchErrorCnt = 0;
 $tryOneMoreTime = 0;
 foreach $line (@links){
     &RandSleep ($delayInterval);
     if  (checkurl($line) == 0){
  my $request = "http://pastebin.com/$line";  # no trailing newline in the URL
  my $link = $line;
  if ($webProxy){
      $ua->proxy('http', $webProxy);
  }
  my $req = new HTTP::Request GET => "$request"; 
  my $res = $ua->request($req);
  my $content = $res->content;
  my @data = $content;
  if ($debugRequested){
      &debugPrint ("checking ($linkCount) - http://pastebin.com/$line ... ");
      $linkCount--;
  }
  foreach $line (@data){
      if ($content =~ m/\<textarea.*?\)\"\>(.*?)\<\/textarea\>/sgm){ 
   @data = $1; 
   foreach $line (@data){
       if ($line =~ m/($regex)/i){
    $Match = keyWordMatch ($line);
    storeurl($link);
    &debugPrint (" matched $Match ...");
    &writeHitToFile ($link, $line, $Match);
       }
   }
   next;
      }
  }
     }  
 }
    }
    else {  # Sometimes the fetch fails.  Don't really know why, but we try a few more times before giving up
 unless ($tryOneMoreTime){ # unless we're on the very last try
     print "fetch of links failed - can't say why (guess: $!). Sleeping for a minute ... \n";
     sleep 60;
     print "awake. Trying again\n";
 }
 if (++$fetchErrorCnt >= 10){
     if ($tryOneMoreTime){
  print "That's it, waited an hour and still failing ... Giving up\n";
  exit;
     }
     print "10 failures in a row.  Sleeping for an hour and then trying ONE MORE TIME\n";
     $tryOneMoreTime = 1;
     sleep 3600;
 }
    }
}

sub getlinks{
    my @results;
    if (defined $pastebin) {
        @data = $pastebin;
        foreach $line (@data){
            while ($line =~ m/border\=\"0\"\s\/\>\<a\shref\=\"\/(.*?)"\>/g){
                my $url = $1;
         push (@results, $url);        
     }
 }
    }
    
    return @results;
}

sub storeurl {
    my $url = shift;
    open (FILE,">> $tracking_file") or die("cannot open $tracking_file");
    print FILE $url."\n";
    close FILE;
}

sub checkurl {
    my $url = shift;
    if (-e $tracking_file){
 open (FILE,"<$tracking_file") or die("cannot open $tracking_file for read");
    }
    else {
 return 0;  # File doesn't exist yet
    }
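    my $seen = 0;
    foreach my $line ( <FILE> ) {
 chomp ($line);
 if ( $line eq $url ) {  # exact match: paste IDs are case-sensitive
     &debugPrint ("detected repeat check of $url ");
     $seen = 1;
     last;
 }
    }
    close FILE;  # don't leave the tracker file open between checks
    return $seen;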
}

sub RandSleep{
    my $maxSleepTime = pop;
    my $sleepTime = int rand ($maxSleepTime + 1); # Need the +1 since we'll never hit maxSleepTime otherwise

    &debugPrint ("sleeping for $sleepTime ... ");
    sleep $sleepTime;
    &debugPrint ("awake!\n");
}

sub writeHitToFile{

    my $matchingExpression = pop;
    my $Contents = pop;
    my $url = pop;
    chomp ($url);

    unless (-e $dir){
 mkdir $dir or die "could not create directory $dir: $!\n";
    }

    if (-d $dir){
 open (HIT_FILE, ">$dir/$url") or die "could not open $dir/$url for write: $!\n";
 print HIT_FILE "http://pastebin.com/$url matched \"$matchingExpression\"\n" or die "print of url to $dir/$url failed: $!\n";
 print HIT_FILE $Contents or die "print of contents to $dir/$url failed: $!\n";
 close HIT_FILE;

 # Get the current time for the list file entry
 my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
 my $datestring = sprintf("%4d-%02d-%02d %02d:%02d",($year + 1900),($mon+1),$mday, $hour, $min);

 open (HIT_LIST_FILE, ">>HitList.txt") or die "could not open HitList.txt for append: $!\n";
 print HIT_LIST_FILE "$dir/$url - http://pastebin.com/$url matched \"$matchingExpression\" at $datestring\n" or die "print of hit to HitList.txt failed: $!\n";
 close HIT_LIST_FILE;
    }
    else {
 die "$dir exists but is not a directory!\n";
    }
}

sub keyWordMatch{
    my $matchingLine = pop;

    foreach $check (@keywords){
 if ($matchingLine =~ m/$check/i){
     return $check;
 }
    }
    return "No Match";
}

sub showHelp {
    print<<endHelp
$0: [-h] [-d] [-w <Max Wait Interval in seconds>][-p <http proxy>] [-k <Keywords File>]
-h: Show this help message
-d: Print debug output
-w <wait seconds>: Max wait in seconds between fetches.  Each fetch is delayed a random amount between 0 and this value. Default is 5 seconds.
-k <filename>: Name of file with keywords to monitor for.  Each line of the file is text or a perl regular expression. Default is \'keywords.txt\'
-p: Proxy through <http proxy>  (good for use with Zap or Burp)

Track progress via \"tail -f HitList.txt\"
endHelp
 ;
    exit;  # We always exit after showing help
}

sub debugPrint{
    unless ($debugRequested){ return;}

    my $message = pop;
    $saveState=$|; $| = 1;  # Save whether print is buffered, and make unbuffered

    print $message;  # print the message

    $| = $saveState; # return print buffering to previous state
}




To use this program you may need to install the LWP perl module (although it appears to be installed by default).  PasteScrape.pl expects to find a file named 'keywords.txt' which contains search strings (regular expressions, one per line) to match against any files it finds.  If a file found on Pastebin contains a match, it's saved in a folder named after the current date (yyyy-mm-dd).
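
Since PasteScrape.pl simply joins the lines of keywords.txt with "|", one malformed expression breaks the whole pattern, and a blank line turns into an empty alternative which matches every paste.  Here's a minimal sketch of a pre-flight check you could run on the file first.  The helper name build_keyword_regex and the sample keywords are my own invention and are not part of PasteScrape.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Validate each keyword line on its own, then combine the good ones.
# Returns the combined regex (or undef if nothing usable was found)
# followed by the list of rejected entries.
sub build_keyword_regex {
    my @lines = @_;
    my (@good, @bad);
    for my $kw (@lines) {
        chomp $kw;
        next if $kw =~ /^\s*$/;   # a blank line would match everything
        my $ok = eval { my $re = qr/$kw/; 1 };   # compile in isolation
        if ($ok) { push @good, $kw } else { push @bad, $kw }
    }
    return (undef, @bad) unless @good;
    # Wrap each entry so a "|" inside one keyword can't bleed into the next.
    my $combined = join '|', map { "(?:$_)" } @good;
    return (qr/$combined/i, @bad);
}

# Sample input standing in for the lines of keywords.txt.
my ($regex, @rejected) = build_keyword_regex(
    "password\n", "\n", "ssn[:=]\\s*\\d{3}\n", "broken(\n",
);

print "rejected: $_\n" for @rejected;
print "match\n" if 'my PASSWORD is hunter2' =~ $regex;
```

The blank line is skipped and "broken(" is reported instead of killing the scraper in the middle of a run.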

I've only used it under Linux (Ubuntu) and haven't tested it elsewhere.  But if LWP works on your OS, it should be pretty portable.

In the previous posting, I mentioned that the biggest problem is dealing with the flood of files that I was getting.  There are two ways to deal with this type of problem.  The first is to more strictly limit what you save (via the 'keywords.txt' file), but that's no fun!  The second way is to become more efficient at sorting through the hundreds or thousands of files collected each day.  That's what the second program here is for.

I call it PasteView.pl


#!/usr/bin/perl -w

# This program is a custom hack job to view the saved files from my
# custom hack job Pastebin scraper program. :-)
# 
# The format of the files is as follows: The first line of the file
# contains the full URL and what specific string was matched in the
# paste to result in it being saved.  The rest of the file is the
# contents of the saved paste.
# 
# E.G:
# $ head -10 2013-02-09/017Gy3yZ 
# http://pastebin.com/017Gy3yZ matched "password"
# [10:45:02] [INFO] LaunchFrame.main:161: FTBLaunch starting up (version 1.2.2)
# [10:45:02] [INFO] LaunchFrame.main:162: Java version: 1.6.0_38
# [10:45:02] [INFO] LaunchFrame.main:163: Java vendor: Sun Microsystems Inc.
# [10:45:02] [INFO] LaunchFrame.main:164: Java home: C:\Program Files\Java\jre6
# [...]
#
# So, foreach file in the argv, we look at the first line.  The first
# line describes the primary match that was identified by the Pastebin
# scraper.  After we collected all the matches, we list each of the
# matched strings found (from the first line of each file) and a
# count.  The user will then be prompted to select one "match", and
# will then have the option to view any of the files which correspond
# to the match.
#
# The -m option shortcuts the collection process and just presents
# files which match the -m options.
#
# Timestamp
#
# The -n (show new files) and the -w (write timestamp) options control
# the ability to only view "new" files.  New files are files which
# were created after a previously established timestamp.  The
# timestamp is stored in a file, and is established either via the -w
# option, or via the "w" command while viewing files via the -s
# option.
#
# De-Escape character entities
#
# It's very common for pastebin files to utilize the HTML character
# entities (e.g. &lt; for "<").  The -e option filters out a small
# subset of these for readability
#
########################################

use Getopt::Long;

$DEBUG = 0;
$debugFileName = "DEBUG.txt";
$choiceCount = 40;   # Default on how many choices to give
$lineLength = 80;    # Default to chop line length
$viewOnlyNewFiles = 0;  # Flag: only look at files created since timestamp
$showMatchingLine = 0; # Flag, show first matching line instead of first line in file
$timeFileName = "lastCheck";  # Where to write timestamp
$lastTimeChecked = 0;

GetOptions ("h" => \$Help_Option, "m=s" => \$matchStringArg, "d" => \$DEBUG, "e" => \$deEscape,
     "n" => \$viewOnlyNewFiles, "w" => \$setNewFilesDate, "l" => \$showMatchingLine,
     "p=s" => \$choiceCount);

if ($Help_Option){ &showHelp;}

if ($DEBUG){  # Open debug output file
    open DEBUG_FILE, ">$debugFileName" or die "open of $debugFileName failed: $!";
    $debugDate = `date`;
    print DEBUG_FILE "Starting debug output to $debugFileName at $debugDate";
    print DEBUG_FILE "Command Line Options:\n";
    if (defined $matchStringArg){
 print DEBUG_FILE "   matchStringArg = $matchStringArg\n";
    }
    if (defined $DEBUG){
 print DEBUG_FILE "   DEBUG = $DEBUG\n";
    }
    if (defined $deEscape){
 print DEBUG_FILE "   deEscape = $deEscape\n";
    }
    if (defined $viewOnlyNewFiles){
 print DEBUG_FILE "   viewOnlyNewFiles = $viewOnlyNewFiles\n";
    }
    if (defined $setNewFilesDate){
 print DEBUG_FILE "   setNewFilesDate = $setNewFilesDate\n";
    }
    if (defined $showMatchingLine){
 print DEBUG_FILE "   showMatchingLine = $showMatchingLine\n";
    }
    if (defined $choiceCount){
 print DEBUG_FILE "   choiceCount = $choiceCount\n";
    }
}

while (1){ # We keep looping through choices until we exit

    # clear out results from last run (if there was one)
    foreach $key (keys %matchCount){delete $matchCount{$key};}  
    foreach $key (keys %matchList){ delete $matchList{$key};} 
    $totalMatches = 0;
    $matchedFileCount = 0;

    # Handle -n option
    if ($viewOnlyNewFiles){
 $now = time;
 if (-e $timeFileName){   # If there's a timestamp file, use it
     open TIME, "<$timeFileName" or die "open of $timeFileName for read failed: $!";
     $lastTimeChecked = <TIME>;
     close TIME;
 }
 else {  # Otherwise force user to create a timestamp file
     unless ($setNewFilesDate){
  print "-n option invalid since timestamp file \"$timeFileName\" was not found.  Use -w option to establish. Exiting.\n";
  exit;
     }
 }
    }

    # Handle -w option
    if ($setNewFilesDate){   # user has said to create or update timestamp file
 $now = time;
 open TIME, ">$timeFileName" or die "open of $timeFileName for write failed: $!";
 print TIME $now;
 close TIME;
    }

    if ($matchStringArg){
 # User used -m option to select a custom match, skip first
 # loop through files since we know what to match

 if ($DEBUG){print DEBUG_FILE "user selected -m: matchStringArg = $matchStringArg\n";}

 $matchString = $matchStringArg;
    }
    else {
 
    # Cycle through all the files, look at the first line (contains
    # the match string from PasteScrape).  We'll use this to present
    # the user with a list of matches to select from.

 $totalMatches = 0;

 foreach $fileName (@ARGV){  # we go through each of the files specified by user
     unless (-e $fileName){   # Just in case of a user typo or something
  print "Couldn't find $fileName, exiting\n";
  exit;
     }

     if ($DEBUG){ print DEBUG_FILE "in file examination loop: fileName = $fileName\n";}

     if ($viewOnlyNewFiles){  # user only wants to see new files. Compare this file to timestamp
  @fileStats = stat ($fileName);
  if ($DEBUG){print DEBUG_FILE "file modification time = $fileStats[9], lastTimeChecked = $lastTimeChecked\n";}
  if ($fileStats[9] < $lastTimeChecked){
      next;
  }
     }

     # Now, open the file and examine the first line 
     open FILE, "<$fileName" or die "open of $fileName failed: $!";
     $firstLine = <FILE>;
#     if ($firstLine eq ''){die "attempt to read $fileName for -s failed: $!";}
     unless (defined $firstLine){next;}

     # Skip files whose first line isn't in PasteScrape format; otherwise
     # $1 would be stale or undefined when we count it below
     unless ($firstLine =~ /.+matched\s+\"(.+)\"/){next;}
     if ($DEBUG){print DEBUG_FILE "  matched = $1\n";}

     # Keep track of how many files "match" each match string
     $matchCount{$1}++;
     $totalMatches++;

     close FILE;
 } # foreach $fileName ...

 # Now that we've looked at each of the requested files, show
 # them to user and see which "match" is of interest
 unless ($DEBUG){system ("/usr/bin/clear");}
 print "$totalMatches total primary matches (as identified by PasteScrape):\n";
 $matchIndex = 0;
 foreach $match (sort byCount keys %matchCount){
     $matchArray[$matchIndex] = $match;
     print "$matchIndex --> ($matchCount{$match}) $match\n";
     $matchIndex++;
 }
 print "$matchIndex --> Provide a custom search string\n";

 print "Select a matching expression to review (\#, \"w\" or \"q\" to quit): ";
 $inLine = <STDIN>;
 if ($inLine =~ /q/i){  # User requested quit
     exit;
 }

 if ($inLine =~ /w/i){  # User requested we reset the timestamp
     $now = time;
     open TIME, ">$timeFileName" or die "open of $timeFileName for write failed: $!";
     print TIME $now;
     close TIME;

     print "New timestamp written, exiting\n";
     exit;
 }

 # So now, the user should have selected which match string to
 # review the files which match.  User selects the number of
 # the match string
 chomp ($inLine);
 unless ($inLine =~ /^\s*\d+\s*$/){   # test for a simple digit input
     unless ($inLine =~ /^\s*$/) {    # exit on empty line, but no error msg
  print "Didn't recognize \"$inLine\" as a valid choice (must be an integer.) Exiting\n";
     }
     exit;
 }
 $matchSelection = int ($inLine);

 if ($DEBUG){print DEBUG_FILE "matchSelection = $matchSelection\n";}

 unless (($matchSelection >= 0) and ($matchSelection <= ($matchIndex))){ # range check user selection
     print "\"$matchSelection\" isn\'t a valid selection. Exiting\n";
     exit;
 }

 if ($matchSelection == $matchIndex){  # User selected custom search string
     print "Search string: ";
     $matchString = <STDIN>;
     chomp ($matchString);
     if ($DEBUG){ print DEBUG_FILE "matchString = $matchString (user provided)\n";}
 }
 else {
     $matchString = $matchArray[$matchSelection];    # determine the selected match string from list
     if ($DEBUG){ print DEBUG_FILE "matchString = matchArray[$matchSelection] ($matchArray[$matchSelection])\n";}
 }
    }  # else (present potential matches to user)


    # We've shown the user all the match strings and/or the user has
    # told us which one to look at.  Now cycle through all the files
    # again, and if the first line (or any line, with -l) matches the
    # user selected, add it to the list to present the user.  There
    # may be hundreds of matches, so we need to present them in
    # batches.

FILE_LOOP:
    foreach $fileName (@ARGV){

 if ($DEBUG){ print DEBUG_FILE "fileName = $fileName\n";}

 if ($viewOnlyNewFiles){ # as before, user may only want to consider "new" files.
     @fileStats = stat ($fileName);
     if ($DEBUG){print DEBUG_FILE "file modification time = $fileStats[9], lastTimeChecked = $lastTimeChecked\n";}
     if ($fileStats[9] < $lastTimeChecked){
  next;  # skip files which are not "new"
     }
 }

 open FILE, "<$fileName" or die "open of $fileName failed: $!";
 $firstLine = <FILE> ;  # contains the "match" string
 unless (defined $firstLine){next;}  # skip if empty
 if ($firstLine !~ /matched/){next;} # File is not in right format, skip it
 if ($showMatchingLine){  # User wants to see the matching line, not the first line in the file
     while ($inLine = <FILE>){
  if ($inLine =~ /$matchString/i){
      chomp($inLine);
      $matchList{$fileName} = $inLine;
      $matchedFileCount++;
      close FILE;
      next FILE_LOOP;
  }
     }
 }
 else {
     $secondLine = <FILE>;  # this will give user a hint as to contents of the file
     unless (defined $secondLine){next;}  # skip if empty
     chomp($secondLine);
     close FILE;
     # Does this file match the requested "match" string?
     unless ($firstLine =~ /.+matched\s+\"(.+)\"/i){
  if ($DEBUG){print DEBUG_FILE "  --> failed to find match in \"$firstLine\"\n";}
  next;
     }
     if ($DEBUG){print DEBUG_FILE "  matched = $1\n";}
     if ($matchString eq $1){   # we have a match.  Set the second line aside to show user
  $matchList{$fileName} = $secondLine;
  $matchedFileCount++;
     }
 }
    }

    if ($DEBUG){
 foreach $matchFile (keys %matchList){
     print DEBUG_FILE "$matchFile --> $matchList{$matchFile}";
 }
    }

# We've collected the names of all the files which have the "match"
# string in their first line.  We've also collected the second line
# (or first matching line) from each of these files.  The second line
# will often allow the user to determine what type of contents are in
# a file.  Present the list to the user and let her select which ones
# to view using the unix "less" command.  User input is the # of the
# entry to show, user can select multiple entries.

    unless ($DEBUG){system ("/usr/bin/clear");}
    print "Found $matchedFileCount ";

    if ($matchedFileCount == 0){  # ghads, I hate special cases  :-)
 print "matches\n";
    }

    $pickID = 0;
    $totalMatchCount = keys %matchList;
    $matchesShownCount = $choiceCount;
    foreach $matchFile ( keys %matchList){
 if ($pickID == 0){   # print at top of the screen listing next set of matches
     $matchesLeft = $totalMatchCount - $matchesShownCount;
     if ($showMatchingLine){
  if ($matchesLeft <= 0){
      print "files which contain \"$matchString\" ...\n";
  }
  else {
      print "files which contain \"$matchString\". $matchesLeft are left after this group ...\n";
  }
     }
     else {
  if ($matchesLeft <= 0){
      print "files identified by PasteScrape as containing \"$matchString\" ...\n";
  }
  else {
      print "files identified by PasteScrape as containing \"$matchString\". $matchesLeft are left after this group ...\n";
  }
     }
 }
 $pickList[$pickID] = $matchFile;
 $escapedString = &filterEscapeString($matchList{$matchFile});
 print "$pickID ($matchFile) >> $escapedString\n";   # print each file's info for user
 $matchesShownCount++;
 $pickID++;

 # We only show choiceCount files at a time, to avoid scrolling
 # choices off the screen.

 if ($pickID >= $choiceCount){  # We've completed a batch.  Now see which ones the user wants to see
     print "\nSelect matches to review (separate by \",\" or \".\", \<cr\> for next group, \"\*\" for all, \"q\" to quit): ";
     $inLine = <STDIN>;
     if (length ($inLine) > 1){
  if ($inLine =~ /q/i){  # User requested quit
      exit;
  }

  if ($inLine =~ /\*/){ # "wildcard" ... user wants to view them all
      $inLine = "0";
      foreach $i (1 .. $pickID - 1){  # yeah, it's a hack - build up a fake user input
   $inLine .= ", $i";
      }
  }

  @selected = split (/,|\./,$inLine);  # parse user input.
  foreach $selectedID (@selected){
      $selectedID =~ s/\s+//g;  # lose extraneous spaces in input

      if ($DEBUG){ print DEBUG_FILE "select = $selectedID, ";}

      if (($selectedID !~ /^\d+$/) or ($selectedID > $pickID - 1) or ($selectedID < 0)){ 
   next;    # range check user input, skip if out of range
      }

      $selectedFileName = $pickList[$selectedID];
      if ($deEscape){  # If user has requested filtering, do it now
   $selectedFileName = &filterEscapeFile($selectedFileName);
      }
   system ("/usr/bin/less -i -p \'$matchString\' $selectedFileName"); # show file to user
  }
     }

     # Prepare for next "batch" of files to consider
     $pickID = 0;
     @selected = "";
     unless ($DEBUG){system ("/usr/bin/clear");}
 }
    }

    if ($pickID != 0){
 print "\nSelect matches to review (separate by \",\" or \".\", \<cr\> for next set, \"\*\" for all, \"q\" to quit): ";
 $inLine = <STDIN>;
 if (length ($inLine) > 1){
     if ($inLine =~ /q/i){  # User requested quit
  exit;
     }

     if ($inLine =~ /\*/){ # user wants to view them all
  $inLine = "0";
  if ($DEBUG) {print DEBUG_FILE "starting in wildcard(2): inLine = $inLine\n";}
  foreach $i (1 .. $pickID - 1){
      if ($DEBUG) {print DEBUG_FILE "in wildcard loop(2): i = $i, inLine = $inLine\n";}
      $inLine .= ", $i";
  }
     }


     @selected = split (/,|\./,$inLine);
     foreach $selectedID (@selected){
  $selectedID =~ s/\s+//g;
  if (($selectedID !~ /^\d+$/) or ($selectedID > $pickID - 1) or ($selectedID < 0)){ next;}
  $selectedFileName = $pickList[$selectedID];
  if ($deEscape){
      $selectedFileName = &filterEscapeFile($selectedFileName);
  }
  system ("/usr/bin/less -i -p \'$matchString\' $selectedFileName");
     }
 }
    }

    if ($matchStringArg){  # We only loop once if user invoked with -m
 if ($DEBUG) {print DEBUG_FILE "done with -m, exiting\n";}
 exit;
    }
}

if ($DEBUG) {print DEBUG_FILE "Fell into exit outside while(1) loop!!!!\n";}
print "unexpected exit!\n";
exit;


sub filterEscapeFile{
    my $fileToFilter = pop;
    my $tmpCopyFile = "/tmp/LogViewTmp";
    my $inLine = "";
    my $outLine = "";

    open IN_FILE, "<$fileToFilter" or die "open of IN_FILE ($fileToFilter) failed: $!";

    open OUT_FILE, ">$tmpCopyFile" or die "open of OUT_FILE ($tmpCopyFile) failed: $!";

    print OUT_FILE "Original unfiltered file: $fileToFilter --> " or die "write of Original Filename to $tmpCopyFile failed: $!";
    
    while ($inLine = <IN_FILE>){
 $inLine =~ s/\&quot;/\"/g;
 $inLine =~ s/\&lt;/\</g;
 $inLine =~ s/\&gt;/\>/g;
 $inLine =~ s/\&ldquo;/\"/g;
 $inLine =~ s/\&rdquo;/\"/g;
 $inLine =~ s/\&lsquo;/\'/g;
 $inLine =~ s/\&rsquo;/\'/g;
 $inLine =~ s/\&hellip;/…/g;
 $inLine =~ s/\&amp;/\&/g;  # decode &amp; last so "&amp;lt;" isn't decoded twice

 $inLine =~ s/\e/<ESC>/g;

 print OUT_FILE $inLine or die "write to $tmpCopyFile failed: $!";
    }

    close IN_FILE;
    close OUT_FILE;
    return $tmpCopyFile;
}

sub filterEscapeString{
    my $stringToFilter = pop;

    $stringToFilter =~ s/\&quot;/\"/g;
    $stringToFilter =~ s/\&lt;/\</g;
    $stringToFilter =~ s/\&gt;/\>/g;
    $stringToFilter =~ s/\&ldquo;/\"/g;
    $stringToFilter =~ s/\&rdquo;/\"/g;
    $stringToFilter =~ s/\&lsquo;/\'/g;
    $stringToFilter =~ s/\&rsquo;/\'/g;
    $stringToFilter =~ s/\&hellip;/…/g;
    $stringToFilter =~ s/\&amp;/\&/g;  # decode &amp; last so "&amp;lt;" isn't decoded twice

    if (length ($stringToFilter) > $lineLength){
 $stringToFilter = substr ($stringToFilter, 0, $lineLength);
    }

    $stringToFilter =~ s/\e/<ESC>/g;


    return $stringToFilter;
}

sub byCount {
    return $matchCount{$a} <=> $matchCount{$b};
}

sub showHelp {
    print<<endHelp

Use this program to review files saved by the PasteScrape program.
Files are reviewed using the \"less\" program.

$0: [-h] [-d] [-e] [-n] [-w] [-l] [-p <line-count>] [-m <matchstring>]  <files to view>
-h: Show this help message
-d: Save debug output to $debugFileName
-e: Convert common escape characters back to normal (e.g. \"\&lt\;\" to \"\<\")
-n: View only files created since last timestamp was saved to timestamp file
-w: Save current time into timestamp file
-l: When listing matches, show first match in file, not first line in file
-m: Only show files which contain <matchstring>
-p: Print <line-count> matches for second set of pages (default is 40)

Normally, there are two sets of pages shown.  The first page shows the
various matches which were identified by PasteScrape. It also shows
how many files PasteScrape saved for each match.  When you select a
match from this page, the second set of pages will provide a list of
all the files which contain this match along with a line from each
file to help you identify files of interest.  Those files you select
will then be shown to you via the "less" program.

Using the -m option skips the first page and takes you directly to the
second set of pages.  When combined with -l, the entire contents of
each file will be searched for <matchstring>, otherwise all matches
will be based on the primary match identified by PasteScrape.

The following options will be available on the first page:
<\#>: Select the match to view by specifying its number
\"w\": Write the timestamp file and quit the program (see the -n option)
\"q\": To quit the program
You will also have the option to specify a custom search string or
regular expression

After selecting the number of a match to view (or if using -m), you
will be presented with a list of files which match your request.

By default, the first line of each matching file is shown (since this
will often identify the type of file).  With the -l option, the first
matching line in the file will be shown instead.  Please note that
since the -l option searches the entire file for matches, it may
identify more files to review than were identified by PasteScrape
(e.g. a file identified by PasteScrape as containing "Password" may
show up when you request matches for "Username", since it contains
both.)

When presented with a list of matches, the following commands are
available:
<\#>: Select matches to review (select by number and separate by \",\" or \".\")
\<cr\>: To move to the next page of matches (or back to the first page if done) 
\"\*\": To select all the files shown
\"q\": To quit the program

A common usage would be: $0 -n -e -l 2013-04-\*/\* 

endHelp
 ;
    exit;  # We always exit after showing help
}



After collecting files for a day or so, running PasteView can be pretty interesting.  Keep in mind that the richness of the files you collect (and how much disk space you fill) is dependent on the contents of the keywords.txt file used by PasteScrape.pl.  BTW, if keywords.txt contains the line ".*" (no quotes), you'll collect all the files publicly available. :-)
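
If you'd like a quick feel for which keywords are filling your disk before firing up PasteView, the HitList.txt log written by PasteScrape.pl can be tallied directly.  The following is only a sketch: the line format is the one writeHitToFile produces, but the sample lines and the tally_hits helper are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count hits per matching expression.  Each line is expected to look like
# what writeHitToFile appends to HitList.txt:
#   <dir>/<id> - http://pastebin.com/<id> matched "<expr>" at yyyy-mm-dd hh:mm
sub tally_hits {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        # Greedy (.+) so an expression that itself contains a quote still parses.
        next unless $line =~ /^\S+ - \S+ matched "(.+)" at \d{4}-\d{2}-\d{2} \d{2}:\d{2}$/;
        $count{$1}++;
    }
    return %count;
}

# Made-up sample lines; in practice, read them from HitList.txt.
my @sample = (
    qq{2013-04-22/aaaa1111 - http://pastebin.com/aaaa1111 matched "password" at 2013-04-22 10:15\n},
    qq{2013-04-22/bbbb2222 - http://pastebin.com/bbbb2222 matched "ssn" at 2013-04-22 10:17\n},
    qq{2013-04-22/cccc3333 - http://pastebin.com/cccc3333 matched "password" at 2013-04-22 10:21\n},
    qq{this line is not in the expected format\n},
);

my %count = tally_hits(@sample);
for my $expr (sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count) {
    printf "%5d  %s\n", $count{$expr}, $expr;
}
```

A keyword that dominates the tally is a good candidate for tightening in keywords.txt.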

Both the programs take a "-h" command line option to provide a help page.

Thursday, April 4, 2013

We need non-resolvable domain names

Cute.

As ICANN starts to roll out extended domain names, folks are starting to notice potential collisions with domain names that have long been used on internal networks (e.g. ".corp".)  This leads to all sorts of problems when those domains suddenly start resolving to addresses outside the internal network.

Some of those problems include significant security problems, for example with certificates.

This article nicely lays out the problem: http://arstechnica.com/security/2013/04/possible-security-disasters-loom-with-rollout-of-new-top-level-domains/

The obvious solution is for ICANN to designate certain domain names as reserved for internal use, similar to RFC 1918 non-routable IP addresses.  As suggested in the letter referenced in the article linked above, surveys of internal domains already in use provide a list of likely candidates.
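
In the meantime, it's easy enough to audit your own internal names.  The sketch below flags hostnames whose top-level label is not already set aside.  The reserved list holds only the four names RFC 2606 reserves for testing and documentation, used here as a stand-in for whatever ICANN might eventually designate; the hostnames and function names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# TLDs reserved by RFC 2606.  These are a stand-in: none of them is
# formally designated for internal networks.
my %reserved_tld = map { $_ => 1 } qw(test example invalid localhost);

# Return the last label of a hostname, lower-cased, ignoring a trailing dot.
sub tld_of {
    my ($host) = @_;
    $host = lc $host;
    $host =~ s/\.$//;
    my @labels = split /\./, $host;
    return $labels[-1];
}

# An internal name is at risk if its TLD could one day be delegated.
sub collision_risk {
    my ($host) = @_;
    return $reserved_tld{ tld_of($host) } ? 0 : 1;
}

# Hypothetical internal hostnames.
for my $host (qw(mail.corp intranet.home build.test)) {
    printf "%-15s %s\n", $host,
        collision_risk($host) ? 'TLD could be delegated publicly' : 'TLD is reserved';
}
```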