Enter Sandbox 31: Web Shells

Webshells are malicious scripts/programs that are uploaded to compromised web servers. Most webshells are written in JSP, ASP, or PHP, and they are interpreted by a dedicated script processor/interpreter invoked by the web server (f.ex. Apache, IIS, Tomcat). The results of that processing are rendered into content that is sent back to the client – typically a web browser, though more advanced offensive tools may choose a different communication protocol.

Once a webshell is installed, it allows the attacker to use the compromised web server for various purposes. The attacker can use it to steal web server data and code, attack other systems, act as a proxy, host phishing content/landing zones, give hacktivists an ability to deface the web site, impersonate the web server owner, and do many other things…

The reason I talk about them in this series is that sandboxes don’t handle webshells well. And yes, analysing webshells is actually not an easy task, in general…

First the good news. There are still many cases today where very well-known webshells are being used by attackers who are not making any effort to change the code of these classic web shells.

Then the worse news: it is becoming more and more common nowadays that web shells are obfuscated, code-protected (various wrappers and extensions), and password-protected; many rely on POST instead of GET requests (which means the parameters included in the requests are usually not logged by the web server), and some more advanced webshell payloads can be very well hidden inside legitimate server-side code (making them harder to spot). As you can imagine, manual webshell analysis of these more complicated cases is very time-consuming and non-trivial. This is because in most of these cases there is now an authentication bit that we need to take care of… In other words, when you access a more modern webshell today, you often need to provide a password, token, or secret to access that particular webshell’s GUI. So, if it is not included in an outer code layer of the webshell, we need to either find it via opportunistic hash cracking, via a leak, or, if lucky, borrow it from other researchers who were luckier and somehow got a hold of it… Anyone with experience manually analysing webshells like this knows that it is a very time-consuming game, and one that doesn’t necessarily guarantee a positive outcome…
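For these password-protected cases, the ‘opportunistic hash cracking’ bit often boils down to hashing a wordlist against the hash pulled out of the shell’s authentication check. A minimal sketch, assuming a PHP shell that compares md5($_POST['pass']) against a hardcoded value (the hash below is a made-up example, simply the MD5 of ‘admin’):

import hashlib

# hypothetical hash lifted from the shell's auth check, f.ex.
#   if (md5($_POST['pass']) == '21232f297a57a5a743894a0e4a801fc3') { ... }
TARGET = "21232f297a57a5a743894a0e4a801fc3"   # md5("admin")

for word in open("wordlist.txt", encoding="utf-8", errors="replace"):
    word = word.strip()
    if hashlib.md5(word.encode()).hexdigest() == TARGET:
        print("password:", word)
        break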

Luckily, not all cases are like this.

Many of the old-school webshells will still load and execute their code with no problem; they don’t include any guardrails, and often instantly render a user-friendly interface, presenting their features to a curious analyst in their full glory.

Additionally, a lot of webshell functionality existing today is often… accidental. Many web developers don’t have any cybersecurity experience; they don’t validate the input, they don’t sanitize the output, and they happily include code in their creations that takes whatever the user-controlled input parameters provide and pushes it directly to some very dangerous shell functions that can be easily fooled into running some unexpected code…
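To make it concrete, here is a minimal sketch of such ‘accidental’ webshell functionality. It is a hypothetical example written in Python/Flask for brevity; the equivalent PHP pattern would be something like system('ping ' . $_GET['host']):

import os
from flask import Flask, request   # requires the Flask package

app = Flask(__name__)

@app.route("/ping")
def ping():
    # user-controlled input goes straight into a shell command; a value like
    # "127.0.0.1; cat /etc/passwd" turns this endpoint into an accidental webshell
    host = request.args.get("host", "")
    return os.popen("ping " + host).read()

if __name__ == "__main__":
    app.run()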

And this is why sandboxes should be used to analyse web-server code, the same way they analyse traditional executables.

We can argue that at the very basic level, sandbox webshell (or, more precisely: web server code) analysis should:

  • provide at least a screenshot of how the suspicious script is being rendered by a browser,
  • allow the analyst to inspect the code, ideally, unwrapped,
  • highlight references to GET and POST variables, and
  • perhaps support interactive analysis by letting the analyst play around with the sample during the session, and allow them to send hand-crafted GET or POST requests (if lucky, manual code analysis can help to discover the credentials that will grant access to the web shell’s GUI); a rough sketch of such a request follows below.
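The hand-crafted requests themselves do not need anything fancy. A minimal sketch, assuming the sample is already being served locally and using the requests package; the URL and the ‘pass’/‘cmd’ parameter names are made-up placeholders that would normally come from manual code analysis:

import requests

URL = "http://localhost/test/sample.php"   # hypothetical location of the locally hosted sample

# plain GET, as a browser would issue it
r = requests.get(URL, timeout=10)
print(r.status_code, len(r.text))

# hand-crafted POST with parameter names recovered from the code
r = requests.post(URL, data={"pass": "recovered-password", "cmd": "id"}, timeout=10)
print(r.status_code)
print(r.text[:2000])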

So, with that… let’s look at some very practical aspects of web server code analysis from a sandbox designer’s perspective. Depending on the file type, we need to create an environment within the test system that is able to execute/interpret scripts on the server side (a web server running locally), present the results of script execution on the client side (a browser), and then combine them into the final results shown to the sandbox user.

Analysis of JSP scripts

  • Download and install the latest Java Development Kit.
  • Download and install the latest Apache Tomcat installer.
  • Now you can access the Tomcat Web Server on http://localhost:8080/.
  • You can drop a JSP sample into c:\Program Files\Apache Software Foundation\Tomcat <version>\webapps\test\<file>.jsp and access it via http://localhost:8080/test/<file>.jsp.

Analysis of JSP and PHP scripts

  • Download and install the latest Java Development Kit.
  • Download and install the latest XAMPP.
  • Now you can access the Apache server on http://localhost/ and https://localhost/, and the Tomcat Web Server on http://localhost:8080/.
  • You can drop a PHP sample into c:\xampp\htdocs\test\<file>.php and access it via http://localhost/test/<file>.php and https://localhost/test/<file>.php.
  • You can drop a JSP sample into c:\xampp\tomcat\webapps\test\<file>.jsp and access it via http://localhost:8080/test/<file>.jsp.

Analysis of ASP scripts

  • Install Windows Server of your choice, f.ex. Windows Server 2025.
  • Install IIS and ASP support following this guide or others.
  • Now you can access the IIS Web Server on http://localhost/.
  • You can drop an ASP sample into c:\inetpub\wwwroot\test\<file>.asp and access it via http://localhost/test/<file>.asp.
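Putting these setups together, the ‘drop the sample, request the URL, capture the output’ step is easy to automate. A rough sketch, assuming the drop folders and ports from the steps above (the PHP/JSP entries assume the XAMPP setup, the ASP entry the IIS one; adjust to whatever your test system actually uses):

import shutil, urllib.request
from pathlib import Path

# drop-folder / URL pairs matching the setups described above
TARGETS = {
    ".php": (r"c:\xampp\htdocs\test", "http://localhost/test/"),
    ".jsp": (r"c:\xampp\tomcat\webapps\test", "http://localhost:8080/test/"),
    ".asp": (r"c:\inetpub\wwwroot\test", "http://localhost/test/"),
}

def render_sample(sample):
    # copy the sample into the matching web root and return the rendered output
    folder, base_url = TARGETS[Path(sample).suffix.lower()]
    Path(folder).mkdir(parents=True, exist_ok=True)
    shutil.copy(sample, folder)
    with urllib.request.urlopen(base_url + Path(sample).name, timeout=15) as resp:
        return resp.read().decode(errors="replace")

if __name__ == "__main__":
    print(render_sample(r"c:\samples\suspect.php"))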

Running possible web server scripts under Apache, Tomcat, and IIS servers in an automatic fashion is cool, but we know it is just the first piece of the puzzle.

If we are lucky and the web server script code is not obfuscated, we can make an attempt to analyze its code, even if only in a rudimentary way. The goal is to discover the following:

  • is the code recognized by any yara rule?
  • does the script expect GET, POST, or both requests?
  • how are these retrieved? (there are multiple ways to do it, f.ex. $_GET or $_SERVER['QUERY_STRING'] in PHP)
  • what are the names of parameters passed to the script?
  • is the code obfuscated/wrapped/hidden/protected?
  • can we analyze the script code to understand if there are any:
    • hardcoded values in it, possibly usernames/passwords
    • hashes of hardcoded values in it, possibly of usernames/passwords
    • URLs/IPs it is connecting to
    • files it writes to
    • references to known webshell functions that help to compress/decompress, encode/decode data & code, or execute/interpret code, f.ex. base64_decode/base64_encode, gzdeflate/gzinflate, str_rot13, eval, system, etc.,
    • references to string/character/hexadecimal/decimal/octal manipulation functions that are often used by webshells to construct dynamic code,
    • references to functions that may shed light on the code functionality f.ex. file operations, directory operations, file downloading/uploading, program execution, privilege escalation, etc.,
    • references to known webshell authors, ASCII ART of hacking groups, etc.
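Much of that checklist can be pre-computed. A rough triage sketch for PHP samples (the function list is by no means exhaustive, and simple regex matching will obviously miss anything that is dynamically constructed or obfuscated):

import re, sys

# function names commonly abused by PHP webshells (not exhaustive)
SUSPICIOUS = [
    "eval", "assert", "system", "exec", "shell_exec", "passthru",
    "base64_decode", "gzinflate", "gzuncompress", "str_rot13",
    "create_function", "preg_replace",
]

def triage(path):
    code = open(path, encoding="utf-8", errors="replace").read()
    # parameter names pulled out of $_GET / $_POST / $_REQUEST / $_COOKIE references
    params = re.findall(r"\$_(GET|POST|REQUEST|COOKIE)\[\s*['\"]([^'\"]+)['\"]\s*\]", code)
    for method, name in sorted(set(params)):
        print("[param]", method, name)
    for func in SUSPICIOUS:
        hits = len(re.findall(r"\b" + re.escape(func) + r"\s*\(", code))
        if hits:
            print("[func ]", func, hits, "call site(s)")
    # candidate hardcoded MD5/SHA1 hashes (possible password checks)
    for h in sorted(set(re.findall(r"\b[0-9a-fA-F]{32}\b|\b[0-9a-fA-F]{40}\b", code))):
        print("[hash ]", h)

if __name__ == "__main__":
    triage(sys.argv[1])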

As you can see, there are many ways to improve our webshell sandbox analysis experience…

Enter Sandbox 30: Static Analysis gone wrong

This series is quite old, and I kinda abandoned it at some stage, but today I am reviving it to talk about … static analysis…

Let’s be honest – the last two decades have changed the way we do malware analysis, and for many reasons:

  • groundbreaking developments in decompilation,
  • groundbreaking developments in deobfuscation,
  • groundbreaking developments in devirtualisation,
  • groundbreaking developments in emulation,
  • groundbreaking developments in sandboxing,
  • groundbreaking developments in Satisfiability Modulo Theory (SMT) solvers,
  • groundbreaking developments in GenAI,
  • demonopolisation and democratisation of reverse engineering tools, aka a lot more tools available in general; even if some are still commercial, they are often cheaper, and many of the free ones are literally game changers. Generally speaking, the tooling today is far more accessible than it was 20 years ago,
  • emergence of many advanced (and often free) mature malware-oriented sandboxing, hooking and emulation toolkits,
  • development of many free tools/techniques that enable us to decompile and debundle many installers or compiled scripts,
  • software (including malware) developers walking away from protectors, packers and wrappers of yesterday – today it’s often no longer worth it,
  • emergence of tools like Detect It Easy, Yara/Yara-X, Capa, Floss, Bulk Extractor, and many forensic tools that allow us to perform a lot of file format-parsing tasks associated with preliminary static sample analysis focused on ‘low hanging fruits’ like:
    • reputational checks, signed binary checks,
    • determining the file format very precisely,
    • automated feature/functionality discovery/extraction/classification,
    • automatic payload decryption/extraction,
    • automatic config decryption/extraction,
    • full metadata parsing/extraction,
    • extraction of strings of interest hidden inside the code that in the past we could only find via dynamic analysis (f.ex. on stack), and of course,
    • large and rich libraries of yara rules help to immediately identify a malware sample’s family if it has been already classified before,
  • older programming languages like Visual Basic, Delphi, C, C++ are now replaced by Go, Rust, Python, .NET, Windows Apps, Electron Apps,
  • emergence of SaaS and software delivered via browser only,
  • disabling OS / Software features by default helped to kill many attack vectors (macros, autorun.inf, etc.),
  • decreasing importance of email – it got replaced by IM software with rich features,
  • lots of new operating systems, new CPUs, and new architectures expanded the scope, and made Windows less important,
  • jailbreaking scene,
  • 0day/vulnerability discovery scene,
  • lolbins, RMMs and a wave of TTPs that focus on blending in with the environment,
  • advances in EDR-based detections,
  • advances in decoy-based detections,
  • lots of new protections built-in into browsers and file readers/viewers prevent old drive-by attacks,
  • smartphones and tablets taking over from desktop computers and laptops for many daily tasks,
  • 0days moving from endpoints to IoT, appliances, mobile devices,
  • security focus moving from an endpoint attack surface to identity solutions,
  • platformisation and a global move from ‘build’ to ‘buy’ lowered the bar for cybersecurity skills required to do the job,
  • etc.

In 2010, malware analysts’ skills were measured by their knowledge of debuggers, disassemblers, file formats, packers, etc. Now… we are in 2025 and, let’s be honest… the malware analysis process of today usually starts with the submission of a sample to a sandbox / sample analysis portal. And, sadly, it very often ends there!

This is where this post begins.

I am quite surprised that many automated malware analysis solutions do not process samples statically very well. They do not do in-depth file format analysis, they do not recognize corrupted files well, and they often offer a false sense of security/value by giving a CLEAN verdict to files that simply need more… reversing love.

See the below example.

I took Notepad.exe from Win10, truncated it with a hex editor, and then submitted it to a few online file analysis services. I am happy that some of them immediately marked the file as corrupted, but it didn’t stop them from running a full-blown dynamic analysis session on the file I submitted. And in terms of static analysis, some solutions went as far as to report lots of findings related to anti-reversing techniques and cryptography, and lots of far-fetched conclusions that are nonsensical in the context of a) a corrupted file, b) the Notepad program (clearly non-malicious), and are simply not a true reflection of reality.
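If you want to reproduce a similar test, the truncation itself takes two lines (the cut-off point below is arbitrary; I used a hex editor for the original experiment):

# read a known-good PE and write out only the first three quarters of it
data = open(r"c:\windows\system32\notepad.exe", "rb").read()
open(r"c:\samples\notepad_truncated.exe", "wb").write(data[: len(data) * 3 // 4])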

I kid you not: a truncated notepad sample that will never execute was marked as

  • a program that can enumerate processes (because it references the NtQuerySystemInformation function, which is actually used by the warbird protection that invokes this API with a SystemThrottleNotificationInformation/SystemPolicyInformation parameter),
  • a program that accepts drag & drop operations (true),
  • a program that has an ability to take screenshots (just because it references the CreateDC API function), which is not true,
  • and so on and so forth.

Let’s be clear – mapping the presence of APIs in a sample’s import table, or strings referencing API names found in a sample’s body, to actual ‘threats’ or TTPs is an absurdity that is omnipresent in sandbox reports today and should be corrected asap. This could have worked in 2010, but today these sorts of ‘determinations’ must be seen as poor indicators.

And as an analyst, I’d actually like to see why the sample was marked as corrupted. I’d also like to see the context of the far-fetched API-matching claims as well. You can’t list many Windows APIs in a negative context (like f.ex. CreateDC, which notepad uses for… printing) unless you can really prove that it is indeed present in the code to deliver some malicious functionality… It strikes me as an over-simplistic approach that is focused more on the quantity of the findings than on the overall quality of the report.

This is where old-school reversing comes in.

A long time ago I wrote my own PE file parser that I always run first on all PE samples that I analyze. Because I wrote it, I fully control what it tells me, and since I have used this tool to analyze many files over the years, corrected it on many occasions, and learned a lot about PE file format intricacies along the way, I have incorporated a lot of PE file format checks into it.

Running it on my truncated Notepad sample I immediately get many red flags:

(Raw Offset + Raw size of '.data '=0002EC00>filesize=0002DE00
(Offset to Raw size of '.pdata '=0002EC00>filesize=0002DE00
(Offset to Raw size of '.didat '=0002FE00>filesize=0002DE00
(Offset to Raw size of '.rsrc '=00030000>filesize=0002DE00
(Offset to Raw size of '.reloc '=00030C00>filesize=0002DE00
(wrong appdata ofs/size=0002EC00,00000000)
(.rsrc File Offset 00030000 <> DataDirectoryResourceOffset = 00000000

Seeing this kind of result immediately alters the way I do my sample analysis:

  • I, for sure, can’t run/test/debug/analyze it.
  • I, for sure, can’t trust any sandbox report generated for this sample.
  • I may need to ask about the source of the file.

My point is… if we want to sandbox/automate sample analysis, let’s do it in a smarter way. File format parsing is an extremely complex topic. If you look at the Detect It Easy program’s database, you will find a huuuuge number of file-typing routines that try to analyze various file types and return the best verdict possible.

So what can we do today?

Ask sandbox vendors to do a more thorough static analysis that checks a file’s basic properties and, at the most basic level, verifies if we have enough data in a submitted file to cover all the sections listed in the PE header…
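A minimal sketch of that last check, assuming a raw PE file on disk and nothing beyond Python’s standard library (a real parser would of course validate far more than this):

import struct, sys

def check_sections_fit(path):
    data = open(path, "rb").read()
    filesize = len(data)
    # e_lfanew at offset 0x3C in the DOS header points to the PE signature
    e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        print("not a PE file")
        return
    num_sections = struct.unpack_from("<H", data, e_lfanew + 6)[0]
    size_opt_hdr = struct.unpack_from("<H", data, e_lfanew + 20)[0]
    # the section table starts right after the optional header
    sect_table = e_lfanew + 24 + size_opt_hdr
    for i in range(num_sections):
        off = sect_table + i * 40   # each IMAGE_SECTION_HEADER is 40 bytes
        name = data[off:off + 8].rstrip(b"\x00").decode(errors="replace")
        raw_size = struct.unpack_from("<I", data, off + 16)[0]   # SizeOfRawData
        raw_ptr = struct.unpack_from("<I", data, off + 20)[0]    # PointerToRawData
        if raw_ptr + raw_size > filesize:
            print("%-8s raw offset + raw size = %08X > filesize = %08X" % (name, raw_ptr + raw_size, filesize))

if __name__ == "__main__":
    check_sections_fit(sys.argv[1])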