Analyzing nested, obfuscated PHP files…

Many PHP webshells are encrypted, encoded, or obfuscated in many different ways, but most take a rudimentary approach: they apply the same sequence of code ‘hiding’ routines over and over, with calls to eval wrapped around various combinations of gzinflate-d, base64-decoded, and sometimes rot13-decoded data blobs. In some cases this goes on for 10 or more iterations…
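To make the layering concrete, here is a minimal Python sketch of one such routine applied repeatedly and then peeled back. It assumes a single fixed layer recipe (raw deflate, then base64, then rot13), which mirrors the PHP pattern eval(gzinflate(base64_decode(str_rot13(...)))); real samples mix and reorder these steps, and the function names are made up for illustration.

```python
import base64
import codecs
import zlib


def wrap_layer(payload: bytes) -> bytes:
    """Encode one layer: raw deflate, then base64, then rot13."""
    # wbits=-15 produces a raw deflate stream, the format PHP's gzinflate() reads.
    deflater = zlib.compressobj(9, zlib.DEFLATED, -15)
    raw = deflater.compress(payload) + deflater.flush()
    b64 = base64.b64encode(raw).decode("ascii")
    return codecs.encode(b64, "rot13").encode("ascii")


def peel_layer(blob: bytes) -> bytes:
    """Undo one layer: rot13, then base64 decode, then raw inflate."""
    b64 = codecs.decode(blob.decode("ascii"), "rot13")
    return zlib.decompress(base64.b64decode(b64), -15)


def wrap(payload: bytes, rounds: int) -> bytes:
    """Stack `rounds` layers on top of the payload."""
    for _ in range(rounds):
        payload = wrap_layer(payload)
    return payload


def peel(blob: bytes, rounds: int) -> bytes:
    """Remove `rounds` layers again."""
    for _ in range(rounds):
        blob = peel_layer(blob)
    return blob
```

Peeling only works here because the recipe and the number of rounds are known up front; with a real webshell you usually know neither, which is why letting PHP itself do the decoding is the easier route.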

Deobfuscating such scripts is a tedious and, quite frankly, boring job: you have to go through all these layers manually, one by one. BOOOOOORING!

Luckily, we can use some automation to help the process…

The following Perl script does the magic on a Windows box, provided you have installed PHP in c:\php.

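use strict;
use warnings;

# Path to the PHP binary; adjust if PHP lives somewhere other than c:\php.
# Single quotes keep the backslashes literal.
my $php = 'c:\php\php.exe';

my $f=shift || die ("Gimme a file name!\n");
my $cnt=0;
my $nf=$f;

while (1)
 {
    print "$cnt\n";

    # Read the whole current layer.
    open(my $in, '<', $nf) or die ("Can't read $nf: $!\n");
    binmode $in;
    my $data = do { local $/; <$in> };
    close $in;
    $data = '' unless defined $data;

    # Done once no eval is left to unwrap.
    last if $data !~ /[\s\@]eval/s;

    # Turn eval into echo so PHP prints the next layer instead of running it.
    $data=~s/([\s\@])eval/$1echo/sg;
    # Drop the header residue php.exe can leave in its output,
    # from "X-Powered-By: " up to and including the next "><?".
    $data=~s/X-Powered-By: .*?><\?//sg;

    $nf = sprintf("%s.%04d.php",$f,$cnt);

    # Make sure the layer starts with a PHP open tag.
    if ($data !~/<\?/)
     {
        $data = "<?php\n$data\n";
     }

    open(my $out, '>', $nf) or die ("Can't write $nf: $!\n");
    binmode $out;
    print $out $data;
    close $out;

    # Run the rewritten layer; its output becomes the next input.
    system (qq{$php "$nf" > "$nf.txt"});
    $nf=$nf.".txt";

    $cnt++;
 }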

What happens is this: the script takes an input PHP file, reads it, finds the eval expressions in it, replaces them with echo, writes the modified script to a new file, and then executes this new file under the php.exe binary. The resulting decoded/decrypted script is saved to a new text file. This new text file then becomes the input to the same procedure, and the process is repeated until there are no more eval references in the final file…

If you are lucky, the last file will hold the decoded script.

There are caveats, of course.

More advanced cases rely on sneaking in some non-evaled variables that are introduced in consecutive layers, to be used/referenced later, sometimes even by the final layer. The script above doesn’t take care of cases like this, but you can still solve them by browsing the resulting .txt files: you will quickly discover where the information was lost, adjust for it by manually editing one of the intermediate text files, and then repeat the whole automation process from that file.

If that doesn’t make any sense: just review the text files one by one and look for loose code or initialized variables that you may want to manually copy to the next layer, and restart from there.
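Spotting those loose variables can be partly automated. The sketch below is a rough helper, not part of the script above: it lists simple `$name = value;` assignments in a layer so you can decide which ones to paste into the next one. It assumes values contain no semicolons (true for typical base64 blobs, not for arbitrary PHP), and the helper name is made up for illustration.

```python
import re
from typing import List

# Simple assignments such as  $key = "abc";
# The (?!=) guard skips comparisons like  $a == $b.
# Assumption: the assigned value contains no semicolon.
ASSIGNMENT = re.compile(r"\$\w+\s*=(?!=)[^;]*;")


def loose_assignments(layer: str) -> List[str]:
    """Return the assignment statements found in one layer's text."""
    return [m.group(0) for m in ASSIGNMENT.finditer(layer)]
```

For a layer such as `<?php $k = "s3cr3t"; eval(gzinflate(base64_decode("AAAA"))); ?>` this returns the `$k` assignment, which would otherwise be dropped when only the echoed output is carried forward.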

The other caveat is scripts that rely on the preg_replace("/.*/e" trick. It’s not taken care of, but why should it be? The /e modifier has been deprecated since PHP 5.5 and was removed in PHP 7.0. If you see a script obfuscated this way, it’s most likely very old code. You can still deobfuscate it manually or semi-automatically (with parts deobfuscated by the above script), but let’s be honest, it’s very unlikely to be your smoking gun… If anyone is still running such an old PHP version, there is probably a bigger fish to fry…
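If you want to know up front which files fall into this bucket, a quick triage check is enough. This is a rough Python sketch, assuming the pattern is passed as a plain quoted string literal with a single-character delimiter (for example "/.../e" or '#...#ie'); patterns built from variables or using bracket-style delimiters will be missed. The function name is made up for illustration.

```python
import re

# preg_replace( "<delim>...<delim><modifiers>"
# Assumption: the pattern is a literal string with a one-character delimiter.
PREG_CALL = re.compile(
    r"""preg_replace\s*\(\s*(?P<q>["'])(?P<delim>[^\w\s\\"'])(?:(?!(?P=q)).)*?(?P=delim)(?P<mods>[A-Za-z]*)(?P=q)""",
    re.I | re.S,
)


def uses_e_modifier(php_source: str) -> bool:
    """True if any literal preg_replace pattern carries the /e modifier."""
    return any("e" in m.group("mods") for m in PREG_CALL.finditer(php_source))
```

Files flagged this way are the ones to set aside for manual work rather than feeding them to the eval-to-echo loop.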

In my experience, the code works on the majority of poorly encoded PHP webshells, and when it doesn’t, it usually just needs a tweak or two to account for some random/unexpected cosmetic issues.

And sometimes you can just run the script in bulk over every script in a directory, including ones that are already partially deobfuscated:

for %k in (*.php*) do (perl __decode.pl "%k")

Example results for a test.php script: (screenshot not included)

Da Li’L World of DLL Exports and Entry Points, Part 6

I love looking at clusters of files, because it’s the easiest way to find patterns. In the last part of this series I focused on Nullsoft installers (DLLs!) only, and today I will use the very same idea to describe clusters of DLL families I have generated from a very large corpus of clean samples (collected over the last decade or so).

What makes a summary like this interesting?

Some malware families like to ‘emulate’ real software. They imitate clean .exe and .dll files by copy-pasting their lists of imports, exports, and internal strings, and then adding an extra import or export here and there; some go as far as integrating their malicious code with the existing source code. So the compiled, embedded malicious code occupies something like 5-10% of the actual binary, and the rest is all nice and dandy code ‘borrowed’ from some open source project. Detecting malicious code inside such binaries is not trivial, but one thing that sometimes gives the badness away is that extra export. So, this post is about these extra exports…

The most popular exports combo in my sampleset is this:

148199
DllCanUnloadNow
DllGetClassObject
DllRegisterServer
DllUnregisterServer

No surprises here, it’s your traditional COM library at work.
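Counts like the 148199 above come from grouping files by their exact export list. Here is a minimal Python sketch of that grouping, assuming the export names have already been pulled out of each DLL by whatever PE parser you prefer; the `exports_by_file` mapping is a made-up input shape, not something from my tooling.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def combo_counts(exports_by_file: Dict[str, Iterable[str]]) -> Counter:
    """Count how many files share each exact set of export names.

    Each combo is stored as a sorted tuple, so the order in which a DLL
    lists its exports does not matter.
    """
    combos = Counter()
    for names in exports_by_file.values():
        key: Tuple[str, ...] = tuple(sorted(set(names)))
        combos[key] += 1
    return combos
```

Calling `combo_counts(...).most_common(20)` on a large clean corpus then gives the most popular combos first.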

The next two are variants of the above, but including an extra export:

27741
DllCanUnloadNow
DllGetClassObject
DllInstall
DllRegisterServer
DllUnregisterServer

24647
DllCanUnloadNow
DllGetClassObject
DllMain
DllRegisterServer
DllUnregisterServer

Now you know where it’s heading…

When you analyze a DLL that includes all the exported functions from one of the sets above BUT also exports some additional functions, those additional functions are definitely of interest. This doesn’t mean that every DLL exporting one of these ‘default’ sets plus something extra is malicious. It’s just an easy win to focus on these extra exported functions first, even if only to discover that a legitimate programmer of a legitimate DLL was overzealous in exporting functions…

Here’s an example of a legitimate set with these ‘extras’:

2019
DeferredDeleteW
DllCanUnloadNow
DllGetClassObject
DllInstall
DllRegisterServer
DllUnregisterServer
InstallPackagesManagedW
InstallPackagesW
ReinstallPackageW
ResumeAsyncW
ResumeW
UninstallPackageW

or

1464
DllCanUnloadNow
DllGetClassForm
DllGetClassInfo
DllGetClassObject
DllGetInterface

Secondly, many of the traditional DLL exports are _not_ meant to be executed from the likes of rundll32.exe.

What does it mean?

These popular DLL export combos give you a list of functions that, if seen being invoked via the command line, are most likely an indicator of something ‘funny’ going on. This is because these functions are (normally) not designed to be rundll32-friendly and are meant to be called programmatically only. There are exceptions, of course: for example, a tailored DllInstall is sometimes invoked by legitimate software via rundll32.exe. But the main message here is that if you see rundll32.exe executing one of the non-rundll32-friendly functions, you had better start investigating…
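As a rough illustration of that check, here is a Python sketch that pulls the entry point out of a rundll32 command line and classifies it. It assumes the common `rundll32 <dll>,<function>` form; the watch list, the labels, and the function name are all made up for illustration, and DllInstall is deliberately left off the list because of the exception mentioned above.

```python
import re
from typing import Optional

# Exports that are normally called programmatically, not via rundll32.
NOT_RUNDLL32_FRIENDLY = frozenset({
    "DllCanUnloadNow",
    "DllGetClassObject",
    "DllRegisterServer",
    "DllUnregisterServer",
    "DllMain",
})

# rundll32[.exe] <dll>,<function or #ordinal>
# Assumption: DLL path and entry point are separated by a comma.
RUNDLL32 = re.compile(
    r'rundll32(?:\.exe)?"?\s+(?P<dll>"[^"]+"|[^\s,]+)\s*,\s*(?P<func>#?\w+)',
    re.I,
)


def classify_rundll32(cmdline: str) -> Optional[str]:
    """Return 'suspicious', 'ordinal', 'other', or None if not rundll32."""
    match = RUNDLL32.search(cmdline)
    if match is None:
        return None
    func = match.group("func")
    if func.startswith("#"):
        return "ordinal"  # name not visible; resolve it from the DLL
    return "suspicious" if func in NOT_RUNDLL32_FRIENDLY else "other"
```

Calls by ordinal (the `#N` form) are reported separately, because the function name is not visible on the command line and has to be looked up in the DLL itself.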

Last, but not least: remember that DLLs exporting by ordinal are a thing too, so keep this in mind during your analysis…

So, what other ‘healthy’ combos can we see out there?

  • Qt plug-ins export these two functions:
1847
qt_plugin_instance
qt_plugin_query_metadata
  • GEGL modules:
1612
gegl_module_query
gegl_module_register
  • NVIDIA Stereo API DLLs:
1582
GetStereoApi

There are many other combos like this, but in today’s era of know-it-all AI, ask your nearest ChatGPT for the full list; mine is most likely already quite obsolete 🙂