{"id":10140,"date":"2025-09-19T22:18:42","date_gmt":"2025-09-19T22:18:42","guid":{"rendered":"https:\/\/www.hexacorn.com\/blog\/?p=10140"},"modified":"2025-09-19T23:09:37","modified_gmt":"2025-09-19T23:09:37","slug":"enter-sandbox-30-static-analysis-gone-wrong","status":"publish","type":"post","link":"https:\/\/www.hexacorn.com\/blog\/2025\/09\/19\/enter-sandbox-30-static-analysis-gone-wrong\/","title":{"rendered":"Enter Sandbox 30: Static Analysis gone wrong"},"content":{"rendered":"\n<p>This series is quite old, and I kinda abandoned it at some stage, but today I am reviving it to talk about &#8230; static analysis&#8230;<\/p>\n\n\n\n<p>Let&#8217;s be honest &#8211; last 2 decades changed the way we do malware analysis, and for many reasons: <\/p>\n\n\n\n<ul>\n<li>groundbreaking developments in decompilation,<\/li>\n\n\n\n<li>groundbreaking developments in deobfuscation,<\/li>\n\n\n\n<li>groundbreaking developments in devirtualisation,<\/li>\n\n\n\n<li>groundbreaking developments in emulation,<\/li>\n\n\n\n<li>groundbreaking developments in sandboxing,<\/li>\n\n\n\n<li>groundbreaking developments in Satisfiability Modulo Theory (SMT) solvers,<\/li>\n\n\n\n<li>groundbreaking developments in GenAI,<\/li>\n\n\n\n<li>demonopolisation and democratisation of reverse engineering tools aka a lot more tools available in general, and even if some are still commercial, they are often cheaper, and many that are free &#8212; are literally game changers, and generally speaking&#8230; the tooling today is far more accessible than it was 20 years ago,<\/li>\n\n\n\n<li>emergence of many advanced (and often free) mature malware-oriented sandboxing, hooking and emulation toolkits,<\/li>\n\n\n\n<li>development of many free tools\/techniques enables us to decompile, debundle many installers or compiled scripts,<\/li>\n\n\n\n<li>software (including malware) developers walking away from protectors, packers and wrappers of yesterday &#8211; today it&#8217;s often no longer worth 
it,<\/li>\n\n\n\n<li>emergence of tools like Detect It Easy, Yara\/Yara-X, Capa, Floss, Bulk Extractor, and many forensic tools that allow us to perform a lot of the file-format-parsing tasks associated with preliminary static sample analysis focused on &#8216;low hanging fruits&#8217; like:\n<ul>\n<li>reputational checks, signed binary checks,<\/li>\n\n\n\n<li>determining the file format very precisely,<\/li>\n\n\n\n<li>automated feature\/functionality discovery\/extraction\/classification,<\/li>\n\n\n\n<li>automatic payload decryption\/extraction,<\/li>\n\n\n\n<li>automatic config decryption\/extraction,<\/li>\n\n\n\n<li>full metadata parsing\/extraction,<\/li>\n\n\n\n<li>extraction of strings of interest hidden inside the code that in the past we could only find via dynamic analysis (f.ex. on the stack), and of course,<\/li>\n\n\n\n<li>large and rich libraries of yara rules that help to immediately identify a malware sample&#8217;s family if it has already been classified before,<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>older programming languages like Visual Basic, Delphi, C, and C++ being replaced by Go, Rust, Python, .NET, Windows Apps, Electron Apps,<\/li>\n\n\n\n<li>emergence of SaaS and software delivered via browser only,<\/li>\n\n\n\n<li>disabling OS\/software features by default helped to kill many attack vectors (macros, autorun.inf, etc.),<\/li>\n\n\n\n<li>decreasing importance of email &#8211; it has largely been replaced by feature-rich IM software,<\/li>\n\n\n\n<li>lots of new operating systems, new CPUs, and new architectures expanded the scope, and made Windows less important,<\/li>\n\n\n\n<li>the jailbreaking scene,<\/li>\n\n\n\n<li>the 0day\/vulnerability discovery scene,<\/li>\n\n\n\n<li>lolbins, RMMs and a wave of TTPs that focus on blending in with the environment,<\/li>\n\n\n\n<li>advances in EDR-based detections,<\/li>\n\n\n\n<li>advances in decoy-based detections,<\/li>\n\n\n\n<li>lots of new protections built into browsers and file readers\/viewers prevent old drive-by 
attacks,<\/li>\n\n\n\n<li>smartphones and tablets taking over from desktop computers and laptops for many daily tasks,<\/li>\n\n\n\n<li>0days moving from endpoints to IoT, appliances, mobile devices,<\/li>\n\n\n\n<li>security focus moving from an endpoint attack surface to identity solutions,<\/li>\n\n\n\n<li>platformisation and a global move from &#8216;build&#8217; to &#8216;buy&#8217; lowered the bar for cybersecurity skills required to do the job,<\/li>\n\n\n\n<li>etc.<\/li>\n<\/ul>\n\n\n\n<p>In 2010, malware analysts&#8217; skills were measured by their knowledge of debuggers, disassemblers, file formats, packers, etc. Now&#8230; we are in 2025 and let&#8217;s be honest&#8230; the malware analysis process of today usually starts with the submission of a sample to a sandbox \/ sample analysis portal. And, sadly, it very often ends there!<\/p>\n\n\n\n<p>This is where this post begins.<\/p>\n\n\n\n<p>I am quite surprised that many automated malware analysis solutions do not process samples statically very well. They do not do in-depth file format analysis, they do not recognize corrupted files well, and often create a false sense of security\/value by offering a CLEAN verdict for files that simply need more&#8230; reversing love.<\/p>\n\n\n\n<p>See the example below.<\/p>\n\n\n\n<p>I took Notepad.exe from Win10, truncated it with a hex editor, and then submitted it to a few online file analysis services. I am happy that some of them immediately marked the file as <em>corrupt<\/em>ed, but that didn&#8217;t stop them from running a full-blown dynamic analysis session on the file I submitted. And in terms of static analysis, some solutions went as far as to report lots of findings related to anti-reversing techniques and cryptography, plus lots of far-fetched conclusions that are nonsensical in the context of a) a corrupted file, and b) the Notepad program (clearly non-malicious), and are simply not a true reflection of reality. 
<\/p>\n\n\n\n<p>I kid you not &#8211; a truncated Notepad sample that will never execute was marked as<\/p>\n\n\n\n<ul>\n<li>a program that can enumerate processes (because it references the <em>NtQuerySystemInformation<\/em> function, which is actually used by the <a href=\"https:\/\/stackoverflow.com\/questions\/40643965\/what-is-microsoft-warbird-in-compiler-of-vs2015\">warbird<\/a> protection that invokes this API with a <em>SystemThrottleNotificationInformation<\/em>\/<em>SystemPolicyInformation<\/em> parameter), <\/li>\n\n\n\n<li>a program that accepts drag &amp; drop operations (true), <\/li>\n\n\n\n<li>a program that has the ability to take screenshots (just because it references the <em>CreateDC <\/em>API function), which is not true,<\/li>\n\n\n\n<li>and so on and so forth.<\/li>\n<\/ul>\n\n\n\n<p>Let&#8217;s be clear &#8211; mapping the presence of APIs in a sample&#8217;s import table, or of strings referencing API names found in a sample&#8217;s body, to actual &#8216;threats&#8217; or TTPs is an absurdity that is omnipresent in sandbox reports today and should be corrected ASAP. This could have worked in 2010, but today these sorts of &#8216;determinations&#8217; must be seen as poor indicators.<\/p>\n\n\n\n<p>And as an analyst, I&#8217;d actually like to see why the sample was marked as <em>corrupt<\/em>ed. I&#8217;d also like to see the context of the far-fetched API-matching claims as well. You can&#8217;t list many Windows APIs in a negative context (like f.ex. <em>CreateDC<\/em>, which Notepad uses for&#8230; printing) unless you can really prove that it is indeed present in the code to deliver some malicious functionality&#8230; It strikes me as an over-simplistic approach that is focused more on the quantity of the findings than the overall quality of the report.<\/p>\n\n\n\n<p>This is where old-school reversing comes in.<\/p>\n\n\n\n<p>A long time ago I wrote my own PE file parser that I always run first on all the PE samples I analyze. 
Because I wrote it, I fully control what it tells me, and since I have used this tool to analyze many files over the years, I have corrected it on many occasions, learned a lot about PE file format intricacies along the way, and incorporated a lot of PE file format checks into it.<\/p>\n\n\n\n<p>Running it on my truncated Notepad sample, I immediately get many red flags:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">(Raw Offset + Raw size of '.data '=0002EC00&gt;filesize=0002DE00\n(Offset to Raw size of '.pdata '=0002EC00&gt;filesize=0002DE00\n(Offset to Raw size of '.didat '=0002FE00&gt;filesize=0002DE00\n(Offset to Raw size of '.rsrc '=00030000&gt;filesize=0002DE00\n(Offset to Raw size of '.reloc '=00030C00&gt;filesize=0002DE00\n(wrong appdata ofs\/size=0002EC00,00000000)\n(.rsrc File Offset 00030000 &lt;&gt; DataDirectoryResourceOffset = 00000000<\/pre>\n\n\n\n<p>Seeing this kind of result immediately alters the way I do my sample analysis:<\/p>\n\n\n\n<ul>\n<li>I, for sure, can&#8217;t run\/test\/debug\/analyze it.<\/li>\n\n\n\n<li>I, for sure, can&#8217;t trust any sandbox report generated for this sample.<\/li>\n\n\n\n<li>I may need to ask about the source of the file.<\/li>\n<\/ul>\n\n\n\n<p>My point is&#8230; if we want to sandbox\/automate sample analysis, let&#8217;s do it in a smarter way. File format parsing is an extremely complex topic. 
If you look at the Detect It Easy program&#8217;s <a href=\"https:\/\/github.com\/horsicq\/Detect-It-Easy\/tree\/master\/db\">database<\/a>, you will find a huuuuge number of file-typing routines that try to analyze various file types and return the best verdict possible.<\/p>\n\n\n\n<p>So what can we do today?<\/p>\n\n\n\n<p>Ask sandbox vendors to do a more thorough static analysis that checks a file&#8217;s basic properties and, at the most basic level, verifies that we have enough data in a submitted file to cover all the sections listed in the PE header&#8230;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This series is quite old, and I kinda abandoned it at some stage, but today I am reviving it to talk about &#8230; static analysis&#8230; Let&#8217;s be honest &#8211; last 2 decades changed the way we do malware analysis, and &hellip; <a href=\"https:\/\/www.hexacorn.com\/blog\/2025\/09\/19\/enter-sandbox-30-static-analysis-gone-wrong\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[41],"tags":[],"_links":{"self":[{"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/posts\/10140"}],"collection":[{"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/comments?post=10140"}],"version-history":[{"count":14,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/posts\/10140\/revisions"}],"predecessor-version":[{"id":10159,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/posts\/10140\/revisions\/10159"}],"wp:attachment":[{"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\
/v2\/media?parent=10140"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/categories?post=10140"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/tags?post=10140"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
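The post closes by asking sandbox vendors to verify, at the most basic level, that a submitted file contains enough data to cover all the sections listed in its PE header. As a sketch of that check, here is a minimal, hypothetical Python routine (stdlib only; the function name and warning strings are mine, not taken from the author's parser) that reproduces the kind of "raw offset + raw size > filesize" red flags shown in the parser output above:

```python
import struct

def check_section_coverage(data: bytes):
    """Flag PE sections whose raw data extends past the end of the
    file -- the classic truncation red flag (a sketch, not a full
    PE validator)."""
    warnings = []
    if len(data) < 0x40 or data[:2] != b"MZ":
        return ["not a PE file (missing MZ header)"]
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if e_lfanew + 24 > len(data) or data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        return ["truncated or invalid PE header"]
    (num_sections,) = struct.unpack_from("<H", data, e_lfanew + 6)
    (opt_size,) = struct.unpack_from("<H", data, e_lfanew + 20)
    sec_table = e_lfanew + 24 + opt_size      # first IMAGE_SECTION_HEADER
    for i in range(num_sections):
        off = sec_table + i * 40              # each section header is 40 bytes
        if off + 40 > len(data):
            warnings.append(f"section header #{i} lies past end of file")
            break
        name = data[off:off + 8].rstrip(b"\0").decode("latin-1")
        # SizeOfRawData and PointerToRawData sit at +16 and +20
        raw_size, raw_ptr = struct.unpack_from("<II", data, off + 16)
        if raw_ptr + raw_size > len(data):
            warnings.append(f"{name}: raw offset+size {raw_ptr + raw_size:08X}"
                            f" > filesize {len(data):08X}")
    return warnings
```

Running such a check before any dynamic analysis would immediately surface the truncated Notepad.exe red flags quoted above (e.g. '.data' raw offset + raw size 0002EC00 > filesize 0002DE00), and a sandbox could use the result to gate or at least annotate the rest of its report.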