By using Mandiant’s Redline tool, I’ve identified three of the seven new samples that VirusShare has just added:

  • GLOOXMAIL - 3de1bd0f2107198931177b2b23877df4
  • BISCUIT - 12f25ce81596aeb19e75cc7ef08f3a38
  • TARSIP-MOON - bd02b41817d227058522cca40acd390

This is the first week that I have integrated APT1 samples into the graded practical exercises in the Reverse Engineering class I teach at Mississippi State University. The use of real-world malware attributed to state-sponsored actors in my classroom has been the focus of some recent positive media attention. If you’re interested in following along, this is the assignment my students are working on this week:

The students have been excited about applying what they’ve learned to malicious software that’s been making headlines recently. Most of the APT1 samples are easy enough to analyze that they make good exercise material for the students at this point in their reverse-engineering education, and it’s interesting to look at the software that’s been responsible for the theft of so much information. I’m very impressed with my students’ progress so far, and I hope they’re enjoying getting their hands dirty this week.

 

Introduction

I was contacted a few days ago by a person who had knowledge of a small Electronik Tribulation Army botnet. You might remember these guys as being GhostExodus’ old group. The contact sent me the source code of a PHP bot that connects to an IRC command & control. The source was obfuscated using the Free Online PHP Obfuscator. To find the C&C server, I went through a process of stripping away the obfuscator’s layers of encoding, which I’m documenting here. This information might be useful if you’re doing similar reverse-engineering work on this PHP obfuscator (or others).

Note: At each stage, I have stripped the “<?php” tags to prevent the code from running accidentally.  If you are following along, you’ll need to re-insert them (and preferably do so within a sandbox environment).

Stage 1

Here’s the original chunk of code:

On the first line, a variable is being set to a string that’s represented by a mix of hexadecimal (‘\x’) and octal (‘\’) escape sequences. This obfuscator makes extensive use of this technique. Python uses the same escapes as PHP for hex and octal, so it’s easy to use my always-open Python shell to see a “normalized” ASCII representation of these strings:

>>> "\x62\141\x73\145\x36\64\x5f\144\x65\143\x6f\144\x65"
'base64_decode'
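
Pasting strings into a shell one at a time gets tedious when a file is full of them. As a rough sketch of how the same normalization could be scripted, the function below rewrites every hex and octal escape it finds in a chunk of text. The name normalize_escapes is hypothetical, and the approach assumes all escapes sit inside double-quoted PHP strings and that none of them are preceded by an escaped backslash.

```python
import re

# Matches \xHH (one or two hex digits) or \OOO (one to three octal digits),
# the two escape forms the obfuscated strings mix together.
ESCAPE = re.compile(r'\\x([0-9A-Fa-f]{1,2})|\\([0-7]{1,3})')


def normalize_escapes(text):
    """Replace hex and octal escapes with the characters they encode.

    Assumption: every escape lives in a double-quoted PHP string and is not
    itself preceded by an escaped backslash; this sketch does not check.
    """
    def decode(match):
        if match.group(1) is not None:
            return chr(int(match.group(1), 16))
        # Mask to one byte so oversized octal values wrap instead of failing.
        return chr(int(match.group(2), 8) & 0xFF)

    return ESCAPE.sub(decode, text)


if __name__ == "__main__":
    sample = r'"\x62\141\x73\145\x36\64\x5f\144\x65\143\x6f\144\x65"'
    print(normalize_escapes(sample))
```

Run on the string from the shell session above, this prints the same base64_decode name, still wrapped in its double quotes.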

PHP allows strings to be used as function names with a very easy syntax, so the variable $v539ded4bc2c gets set to “base64_decode”, which is then called with a large string of base64-encoded code.  The decoded string of code then gets passed to eval() to execute.  We’d rather just see what the decoded string is, so the easiest thing to do is replace the eval() with a print().  Then we can dump out the next stage:

hacbooknano:php_reverse wesley$ php original_print.txt > stage2_1.txt
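
If you would rather not execute any of the PHP at all, the same layer can be peeled off statically. This is only a sketch under a stated assumption: that the payload is the longest quoted run of base64 characters in the file. The extract_payload helper and the sample input are hypothetical stand-ins, not the bot's actual code.

```python
import base64
import re

# Assumption: the encoded payload is a quoted string made up only of
# base64 characters, and it is the longest such string in the file.
B64_LITERAL = re.compile(r'''["']([A-Za-z0-9+/=]{16,})["']''')


def extract_payload(php_source):
    """Return the decoded bytes of the longest base64-looking literal."""
    candidates = B64_LITERAL.findall(php_source)
    if not candidates:
        raise ValueError("no base64-looking string literal found")
    return base64.b64decode(max(candidates, key=len))


if __name__ == "__main__":
    # Synthetic stand-in for a first stage, built here for illustration.
    inner = base64.b64encode(b'echo "next stage";').decode("ascii")
    stage1 = '$f = "base64_decode"; eval($f("%s"));' % inner
    print(extract_payload(stage1).decode("utf-8", "replace"))
```

The trade-off is that this only works while the layer is plain base64; swapping eval() for print() lets PHP do the decoding no matter what the wrapper function turns out to be.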

Stage 2

Here’s what we have now:

The lack of line breaks is annoying, so here’s a little quick-and-dirty Python to split it up:

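#!/usr/bin/env python3
"""Print a file with a line break inserted after every semicolon."""
import sys

# Read the whole stage into memory; it is small and has no line breaks.
with open(sys.argv[1]) as fp:
    data = fp.read()

# Naive split: semicolons inside string literals get a line break too,
# which is fine for reading but means the output is not meant to be run.
sys.stdout.write(data.replace(';', ';\n'))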

Running this:

hacbooknano:php_reverse wesley$ ./breaklines.py stage2_1.txt > stage2_2_linebreaks.txt

We now have this:

The first 133 lines set up obfuscated names for the rest of the code in this stage. The code builds the names one character at a time, interleaving the assignments for the different names.

We can decode these names by copying those assignments out to another file, and printing the obfuscated names out at the end:

hacbooknano:php_reverse wesley$ php stage2_3_displaynames.txt
x24b0884a06dee76da986eb65ba2940d = base64_decode
t104a34fab793aa8acc27101aa69e16d = ereg_replace
f28748ed1b08d4ce5faba4c5bbe478a2 = file_get_contents
sba02b7a6e9217c818bda90209467b6b = gzinflate
k9c9e40dc7cf4574c577417cdc8ae8a4 = md5
fafd3e80e124e1f5d45522b2e31e3eab = ob_end_clean
n8ad08ea0791139ed748c49d82092979 = ob_end_flush
v077b05ec0999fba76a979f188a32e32 = ob_get_contents
gb6e4eb13daf014a331ffe0376f2357b = ob_start
ff29e8f9567141dfd9b4c31c83a38d63 = str_replace
gb4ceeb3708efd3539d845de0b7fd52e = str_rot13
g52eba32e62d0a481f8e5efd196b27b8 = strpos
n8af683210c35ad36253a33d28a3fbde = strtok
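
A listing like this can also be applied to the code mechanically. The sketch below is one hedged way to do it, assuming the obfuscated names are used as PHP variables holding function names (so a call looks like $name(...)). The helpers parse_name_map and rename_calls are hypothetical names chosen for illustration.

```python
import re


def parse_name_map(listing):
    """Parse lines of the form 'obfuscated = real' into a dict."""
    mapping = {}
    for line in listing.splitlines():
        if "=" in line:
            obfuscated, real = (part.strip() for part in line.split("=", 1))
            mapping[obfuscated] = real
    return mapping


def rename_calls(php_source, mapping):
    """Replace each '$obfuscated' with the real function name it holds."""
    if not mapping:
        return php_source
    names = "|".join(re.escape(name) for name in mapping)
    pattern = re.compile(r"\$(" + names + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], php_source)


if __name__ == "__main__":
    listing = """\
x24b0884a06dee76da986eb65ba2940d = base64_decode
sba02b7a6e9217c818bda90209467b6b = gzinflate
gb4ceeb3708efd3539d845de0b7fd52e = str_rot13
"""
    # Synthetic call chain for illustration, not a line from the bot.
    code = ("$sba02b7a6e9217c818bda90209467b6b("
            "$x24b0884a06dee76da986eb65ba2940d("
            "$gb4ceeb3708efd3539d845de0b7fd52e($blob)));")
    print(rename_calls(code, parse_name_map(listing)))
```

The lines that build the names would need to be deleted first, since the substitution would otherwise turn those assignments into nonsense. A blind replace also hides what changed and where, which is a good reason to do it by hand on a file this small.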

Now, you can take this and go back to stage2_2_linebreaks to rename all the functions to their more readable names.  I did this manually with search-and-replace in TextMate, since I wanted to see what was being replaced and when.  I also normalized the strings as I did in stage 1.  You wind up with the following code:

There’s what appears to be a tamper check, though I didn’t really play with it much since there’s no reason to. All we’re interested in at this point is the body of that “if” clause. A chunk of encoded text is ROT13’d, base64-decoded, inflated with gzinflate(), and finally eval()’d. If we chop out the tamper check, and replace the eval() with a print() again, we get to move on.
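
The same chain of decoders can be reproduced outside PHP. This is a minimal sketch, assuming the layer is exactly str_rot13, then base64_decode, then gzinflate, in that order. PHP's gzinflate() reads a raw DEFLATE stream, which Python's zlib handles when given a negative window size. The unwrap_layer and wrap_layer names are hypothetical, and wrap_layer exists only to build a test input for the demo.

```python
import base64
import codecs
import zlib


def unwrap_layer(blob):
    """Undo ROT13, then base64, then raw DEFLATE, and return the bytes."""
    rotated = codecs.decode(blob, "rot_13")      # str_rot13
    compressed = base64.b64decode(rotated)       # base64_decode
    return zlib.decompress(compressed, -15)      # gzinflate (raw DEFLATE)


def wrap_layer(code):
    """Inverse of unwrap_layer, used here only to make a demo input."""
    compressor = zlib.compressobj(9, zlib.DEFLATED, -15)
    raw = compressor.compress(code) + compressor.flush()
    return codecs.encode(base64.b64encode(raw).decode("ascii"), "rot_13")


if __name__ == "__main__":
    # Synthetic payload for illustration, not text from the bot.
    blob = wrap_layer(b'print "stage 3";')
    print(unwrap_layer(blob).decode("utf-8", "replace"))
```

ROT13 only touches letters, so the rotated text is still made of base64 characters. That is why the layers can be stacked in this order without corrupting the data.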

Stage 3

Here’s what we have now:

This is close to the original code.  The obfuscator has encoded the strings, done away with whitespace, and randomized variable names.  We can normalize the strings, as above, and reformat the code.  For variable names, that’s where we have to do some more human-eyes analysis.  By looking at what the variables are set to, what functions they are being passed into, and other contextual information, we can give most variables much more reader-friendly names.

I only partially went through this process with this file, as I found what I needed, and had a good idea of the rest of the file.  The partial cleanup is here:

Here’s where it assigns the botnet C&C server settings:

error_reporting(0);
set_time_limit(0);
$filename = "./a73v9.php";
$current_dir = "./";
$channel = "#nobotshere";
$host = "complexity.razorhack.org";
$port = 65000;

The system, at the time, had been compromised by the ETA member, MR^E, giving shoutouts to the other ETA members:

(Real smart, defacing your own botnet C&C)

Conclusions

I’d like to thank my Twitter followers for being very quick to get back-channels in gear and get the C&C hosting and domain taken out. While they’re back to much more typical skiddie activities (as opposed to backdooring hospital HVAC systems), it’s obvious that these guys haven’t learned much of a lesson. One can only hope that one day they’ll realize that they can build on the skills they’re using to run nets like this to get a start in legitimate security work, before it’s too late and they manage to burn their bridges and/or get busted.

Hopefully this will help some folks get a start in de-obfuscating PHP (and other interpreted languages) as well. It’s pretty easy, and I think that files like this would serve as a good introduction for students to the concepts involved in reverse engineering in general. After a few baby steps like this we can move them up to compiled code :) .

Update: Looks like the original author of the bot code found out about this post, and decided to post the original source, along with a rant about how I “pick on retards”:

© 2012 McGrew Security