Saturday, February 2, 2013

Pasting for Gold


A little while ago, I happened to go to pastebin.com, and after reading the latest hacker manifesto, I noticed the list of "Public Pastes" from the last 20 seconds in the upper right.  Checking them out, I was surprised to discover that in almost real time I could take a sample of the files that folks were anonymously pasting to Pastebin.  Fascinating!  Lots of code snippets, various configuration files, a couple more hacker manifestos, chat logs, pointers to torrents, amateur fiction, school assignments and lots of other stuff.  It was really quite addictive.

BTW, keep in mind that some of these things are Not Safe For Work - however, it's all text, so you won't have some sort of picture just pop up unless you follow a URL in a pasted file.  But just to be clear, some of the text is quite offensive.

Digging a bit deeper, I found some perl code to monitor Pastebin for public posts, retrieving pastes that match keywords.  Tweaking it a bit, I've got something that now runs in the background and does the monitoring for me.  The program I started with is at this link:

http://malc0de.com/tools/scripts/pastebin.txt

My modified version:

#!/usr/bin/perl -w

#
#Simple perl script to parse pastebin to alert on keywords of interest. 
#1)Install the LWP and MIME perl modules
#2)Create two text files, keywords.txt and tracker.txt
#2a)keywords.txt is where you need to enter keywords you wish to be alerted on, one per line.
#3)Edit the code below and enter your smtp server, from email address and to email address. 
#4)Cron it up and receive alerts in near real time
#

########################################################################
# Downloaded 1-29-13 from http://malc0de.com/tools/scripts/pastebin.txt
# by DA - I'm not the author, but I'm afraid that I've had my way with it.
# Changes:
#     Removed email code
#     Added random sleep to be considerate 
#     Added infinite loop to be inconsiderate
#     Added write the matching paste to a separate file (writeHitToFile)
#     Added writing the matching expression to writeHitToFile
#     Moved read of regex to inside main loop - catch changes on the fly
#     Added write log of hits to HitList.txt
#     Added getopt and cleaned up a bit
########################################################################

$DEL_DEBUG = 0;
$delayInterval = 5;  # Default max delay between queries to web site
$keyWordsFileName = 'keywords.txt';

use LWP::Simple;
use LWP::UserAgent;

use Getopt::Long;

GetOptions ("h" => \$Help_Option, "d" => \$DEL_DEBUG, "w=i" => \$delayInterval, "k=s" => \$keyWordsFileName );

if ($Help_Option){ &showHelp;}

my $ua = new LWP::UserAgent;
$ua->agent("Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1)");

my $tracking_file = 'tracker.txt';

while (1){

    # Load keywords.  Check the file each loop in case they've changed.
    open (MYFILE, $keyWordsFileName) or die "cannot open $keyWordsFileName: $!\n";
    @keywords = <MYFILE>;
    chomp(@keywords) ;
    $regex = join('|', @keywords);
    close MYFILE;

#Set the date for this run - it's used as the save-to directory name
    my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
    $dir = sprintf("%4d-%02d-%02d",($year + 1900),($mon+1),$mday);

    my $req = new HTTP::Request GET => 'http://pastebin.com/archive';
    my $res = $ua->request($req);
    $pastebin = $res->content; 

    my @links = getlinks();
    $linkCount = scalar @links;  # number of links pulled from the archive page

    if ($DEL_DEBUG){print "\n";}  # Just a stupid formatting thing
    print "Starting new batch. Save-to dir is $dir. Keywords file is $keyWordsFileName. regex is: $regex\n";
    if ($DEL_DEBUG){ print "size of \@links: $linkCount\n";}
    if (@links) {
     foreach $line (@links){
         &RandSleep ($delayInterval);
         if  (checkurl($line) == 0){
          my $request = "http://pastebin.com/$line\n";
          my $link = $line;
          my $req = new HTTP::Request GET => "$request";     
          my $res = $ua->request($req);
          my $content = $res->content;
          my @data = $content;
          if ($DEL_DEBUG){#print "-------------------------------------------------\n";
              print "checking ($linkCount) - http://pastebin.com/$line ... ";
              $linkCount--;
          }
          foreach $line (@data){
              if ($content =~ m/\<textarea.*?\)\"\>(.*?)\<\/textarea\>/sgm){     
               @data = $1; 
               foreach $line (@data){
                   if ($line =~ m/($regex)/i){
                    $Match = keyWordMatch ($line);
                    storeurl($link);
                    if ($DEL_DEBUG){ print " matched $Match ...";}
                    &writeHitToFile ($link, $line, $Match);
                   }
               }
              }
          }
         }          
     }
    }
    else {
     die "fetch of links failed - can't say why\n";
    }
}

sub getlinks{
    my @results;
    if (defined $pastebin) {
        @data = $pastebin;
        foreach $line (@data){
            while ($line =~ m/border\=\"0\"\s\/\>\<a\shref\=\"\/(.*?)"\>/g){
                my $url = $1;
             push (@results, $url);        
         }
     }
    }
    
    return @results;
}

sub storeurl {
    my $url = shift;
    open (FILE,">> $tracking_file") or die("cannot open $tracking_file");
    print FILE $url."\n";
    close FILE;
}

sub checkurl {
    my $url = shift;
    open (FILE,"< $tracking_file") or die("cannot open $tracking_file");
    foreach my $line ( <FILE> ) {
     if ( $line =~ m/$url/i ) {
         if ($DEL_DEBUG){print "detected repeat check of $url ";}
         return 1;
     }
    }
    return 0;
}

sub RandSleep{
    my $maxSleepTime = pop;
    my $sleepTime = int rand ($maxSleepTime + 1); # Need the +1 since we'll never hit maxSleepTime otherwise

    if ($DEL_DEBUG){print "sleeping for $sleepTime\n";}
    sleep $sleepTime;
}

sub writeHitToFile{

    my $matchingExpression = pop;
    my $Contents = pop;
    my $url = pop;
    chomp ($url);

    unless (-e $dir){
     mkdir $dir or die "could not create directory $dir: $!\n";
    }

    if (-d $dir){
     open (HIT_FILE, ">$dir/$url") or die "could not open $dir/$url for write: $!\n";
     print HIT_FILE "http://pastebin.com/$url matched \"$matchingExpression\"\n" or die "print of url to $dir/$url failed: $!\n";
     print HIT_FILE $Contents or die "print of contents to $dir/$url failed: $!\n";
     close HIT_FILE;

     # Get the current time for the list file entry
     my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
     my $datestring = sprintf("%4d-%02d-%02d %02d:%02d",($year + 1900),($mon+1),$mday, $hour, $min);

     open (HIT_LIST_FILE, ">>HitList.txt") or die "could not open HitList.txt for append: $!\n";
     print HIT_LIST_FILE "$dir/$url - http://pastebin.com/$url matched \"$matchingExpression\" at $datestring\n" or die "print of hit to HitList.txt failed: $!\n";
     close HIT_LIST_FILE;
    }
    else {
     die "$dir exists but is not a directory!\n";
    }
}

sub keyWordMatch{
    my $matchingLine = pop;

    foreach $check (@keywords){
     if ($matchingLine =~ m/$check/i){
         return $check;
     }
    }
    return "No Match";
}

sub showHelp {
    print<<endHelp
$0: [-h] [-d] [-w <Max Wait Interval in seconds>] [-k <Keywords File>]
Monitor Pastebin's public pastes for keywords of interest
-h: Show this help message
-d: Print debug output
-w <wait seconds>: Max wait in seconds between fetches.  Each fetch is delayed a random amount between 0 and this value. Default is 5 seconds.
-k <filename>: Name of file with keywords to monitor for.  Each line of the file is text or a perl regular expression. Default is \'keywords.txt\'

Track progress via \"tail -f HitList.txt\"
endHelp
     ;
    exit;  # We always exit after showing help
}
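Before the first loop runs, the script needs its two support files in place (checkurl() dies if tracker.txt is missing).  Here's a minimal setup sketch - the pastebin.pl filename is just a placeholder for wherever you save the script:

```shell
# Create the keywords file: one word or perl regex per line
printf 'password\npasswd\nanonymous\n' > keywords.txt

# checkurl() opens tracker.txt for read and dies if it's missing,
# so create an empty one up front
touch tracker.txt

# Then kick it off in the background and watch the hits roll in:
#   perl pastebin.pl -w 10 -k keywords.txt &
#   tail -f HitList.txt

# Sanity check: three keywords loaded
wc -l < keywords.txt
```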
 



In my version, I make a point of throttling my accesses since I don't want to abuse their site, so I'm likely only sampling a portion of what's posted to Pastebin.

Even with sampling I've been pulling up a lot of data - e.g. in one 24 hour period I ended up grabbing 1313 pastes containing the word "password" ... several of which document compromised accounts.  In that same period there were 28 pastes with the word "passwd", and 200 with the word "anonymous".  When I have time, this will all go into my password cracking lists.
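Counts like these fall straight out of the HitList.txt log the script appends to.  A quick grep does it - the sample lines below are made up, but they follow the exact format writeHitToFile produces:

```shell
# Fabricated log lines, in the format writeHitToFile appends to HitList.txt:
#   <dir>/<paste-id> - http://pastebin.com/<paste-id> matched "<keyword>" at <timestamp>
cat > HitList.txt <<'EOF'
2013-02-02/Ab1Cd2Ef - http://pastebin.com/Ab1Cd2Ef matched "password" at 2013-02-02 09:15
2013-02-02/Gh3Ij4Kl - http://pastebin.com/Gh3Ij4Kl matched "passwd" at 2013-02-02 10:02
2013-02-02/Mn5Op6Qr - http://pastebin.com/Mn5Op6Qr matched "password" at 2013-02-02 11:47
EOF

# How many pastes matched "password"?
grep -c 'matched "password"' HitList.txt    # prints 2
```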

Just as an example of the sort of thing which turns up, without trying very hard, I came across a portion of the 55,000 Twitter accounts which were compromised last year.  (Like the echo of a scream - they're still bouncing around on the Internet.)   Other things I've stumbled across include the latest call to action by Anonymous (operation #fema) and several password lists posted from recently hacked sites.  As yet another example, I think I came across source code for a couple of programs which appear to be part of Windows NT.

Probably the biggest problem with all this is that there's too much data to easily sort through.  The next round of modifications to my program will try to find ways to make sorting through the results more efficient.
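One easy first step - not built into the script itself - is to tally the log by keyword so the noisiest patterns surface immediately.  A sketch against the HitList.txt format writeHitToFile produces (sample data fabricated):

```shell
# Fabricated sample of HitList.txt, same format writeHitToFile appends
cat > HitList.txt <<'EOF'
2013-02-02/aaaa - http://pastebin.com/aaaa matched "password" at 2013-02-02 09:15
2013-02-02/bbbb - http://pastebin.com/bbbb matched "anonymous" at 2013-02-02 09:30
2013-02-02/cccc - http://pastebin.com/cccc matched "password" at 2013-02-02 10:02
2013-02-02/dddd - http://pastebin.com/dddd matched "password" at 2013-02-02 11:47
EOF

# Tally hits per keyword, busiest keywords first
grep -o 'matched "[^"]*"' HitList.txt | sort | uniq -c | sort -rn
```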

The bottom line of all this is that monitoring Pastebin can give you a very interesting view into some portions of the Internet.  Of course, there are probably many other, similar, places you can productively monitor.

To use my program you may need to install the LWP perl module (although it appears to be installed by default).  And then let 'er rip!

Here's some more discussion about monitoring Pastebin:

http://isc.sans.org/diary/SCADA+hacks+published+on+Pastebin/12088
https://isc.sans.edu/diary/Quick+Tip%3A+Pastebin+Monitoring+%26+Recon/12091



