Analysing NSRL data set for fun and because… curious, Part 3

Nearly two years ago I published a quick summary of my analysis of NSRL data. I believe I was the first one to publicly evaluate this data set, and I still stand by the harsh conclusions I reached back then. And what makes me really happy about that two-year-old analysis is a small ripple effect that my posts caused…

I really loved this DFIR science follow-up post – not only did Joshua follow in my steps and deliver some nice data crunching on the NSRL core dataset to confirm/disprove my findings and hypotheses – he also did some actual benchmarking! I think the results of his experiment prove beyond any doubt that when you blindly put garbage in, you will for sure get garbage out. Also known as: you can use NSRL data better. And then Joshua published his Efficient-NSRL tool as well. So, if you use the NSRL set in your investigations, you will benefit from taking a look at my older posts, Joshua’s post, and his Efficient-NSRL tool…

Two years later…

The NSRL data set has changed a lot since 2021, so it’s only natural to come back to its recent incarnation to see what has changed…

The first notable change is that the NSRL data is now distributed as a SQLite3 database only. The schema of the database is available and you can find it inside files named like this:

  • RDS_2023.03.1_modern.schema.sql
  • RDS_2023.06.1_modern_minimal.schema.sql

To create a textual equivalent of the old NSRLFile.txt file one has to follow the recipe provided inside this PDF. Which, of course, doesn’t work, because the already-present FILE view (inside the RDS_2023.03.1_modern.db) does not include the crc32 column/field… but we can fix that easily. We just create a new VIEW called FILE2 that includes the missing CRC32 column/field:

CREATE VIEW FILE2 AS
    SELECT
        UPPER(md.sha256) AS sha256,
        UPPER(md.sha1) AS sha1,
        UPPER(md.md5) AS md5,
        UPPER(md.crc32) AS crc32,
        CASE md.extension
            WHEN '' THEN md.file_name
            ELSE md.file_name||'.'||md.extension
        END AS file_name,
        md.bytes AS file_size,
        po.package_id
    FROM
        METADATA AS md,
        PACKAGE_OBJECT AS po
    WHERE
        md.object_id = po.object_id;

and then we run the export query using a FILE2 view:

DROP TABLE IF EXISTS EXPORT;
CREATE TABLE EXPORT AS SELECT sha1, md5, crc32, file_name, file_size, package_id FROM FILE2;
UPDATE EXPORT SET file_name = REPLACE(file_name, '"', '');
.mode csv
.headers off
.output output.txt
SELECT '"' || sha1 || '"', '"' || md5 || '"', '"' || crc32 || '"', '"' || file_name || '"', file_size,
package_id, '"' || 0 || '"', '"' || '"' FROM EXPORT ORDER BY sha1;

or, if we just want file names:

.output filenames.txt
SELECT file_name FROM EXPORT;

These file names can then be sorted, counted, etc.
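If you would rather stay inside the database than sort a text file, the counting step can be sketched in Python with the standard sqlite3 module. This is only an illustration under a few assumptions: the helper names (top_file_names, unique_file_names) and the toy in-memory rows are made up for the example, and only the EXPORT table and its file_name column come from the queries above.

```python
import sqlite3


def top_file_names(conn, limit=10):
    """Return (file_name, count) pairs, most frequent first."""
    query = (
        "SELECT file_name, COUNT(*) AS n FROM EXPORT "
        "GROUP BY file_name ORDER BY n DESC, file_name LIMIT ?"
    )
    return conn.execute(query, (limit,)).fetchall()


def unique_file_names(conn):
    """Return the number of distinct file names in EXPORT."""
    return conn.execute("SELECT COUNT(DISTINCT file_name) FROM EXPORT").fetchone()[0]


# Toy stand-in for the real database; these rows are invented for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EXPORT (file_name TEXT)")
conn.executemany(
    "INSERT INTO EXPORT VALUES (?)",
    [(".text",), (".text",), (".reloc",), ("version.txt",), (".text",), (".reloc",)],
)

top = top_file_names(conn, limit=2)
unique = unique_file_names(conn)
print(top, unique)
```

Against the real thing you would connect to the .db file instead of ":memory:" (after the EXPORT table has been created) and skip the toy inserts.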

There are a lot more file names in the new set, that’s for sure. The count went from 16512841 unique file names I observed in a 2021 set to 23676133 in Jan 2023. Still, lots of them are not that useful, because the actual benign (‘good’) source files are being pushed around, their logical chunks carved out, their sections and class files extracted, etc. – same as before, the most frequent ‘file names’ are PE file section names, MSI table names, Java files, etc… And if you missed the memo, hashes of these logical ‘chunks’ are not very useful, as you will never find their binary equivalents present on any file system, except for the ‘worker’ NSRL system(s). Unless your forensic suite can apply hashes to PE file sections, MSI tables, .jar class files – all these ‘partial’ hashes are useless when it comes to ‘mark the file as a good, NSRL-known file’.

The stats for the top file names are now as follows (for RDS_2023.03.1_modern.db):

  • 9081226 1
  • 7850139 .text
  • 5933107 .reloc
  • 5086051 .data
  • 3634652 version.txt
  • 3101066 .rdata
  • 2923472 CERTIFICATE
  • 2784502 __LINKEDIT
  • 2784113 __TEXT__text
  • 2758779 __TEXT__cstring
  • 2735742 __DATA__data
  • 2718505 __DATA__bss
  • 2667173 __TEXT__const
  • 2629651 __DATA__const
  • 2588460 __DATA__common
  • 2437056 __DATA__mod_init_func
  • 2187040 __DATA__mod_term_func
  • 2164991 __DWARF__debug_abbrev
  • 2164534 __DWARF__debug_line
  • 2164534 __DWARF__debug_info
  • 2164532 __DWARF__debug_aranges
  • 2163269 __DWARF__debug_pubnames
  • 2163268 __DWARF__debug_pubtypes
  • 2162599 __DWARF__debug_str
  • 2162336 __DWARF__debug_frame
  • 2161990 __DWARF__debug_loc
  • 2161722 __DWARF__debug_ranges
  • 2159803 __DWARF__apple_objc
  • 2159803 __DWARF__apple_namespac
  • 2159800 __DWARF__apple_types
  • 2159800 __DWARF__apple_names
  • 2158643 __DWARF__debug_inlined
  • 2157348 __HIB__common
  • 2157348 __HIB__bss
  • 2157347 __KLD__bss
  • 2157346 __HIB__const
  • 2157345 __KLD__cstring

We must admit that it’s hardly useful.
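To put a rough number on that, here is a small Python sketch that flags names looking like carved-out chunks and sums their counts. Treat it as a toy heuristic, not a vetted filter: the PE section set only contains the four names visible in the list above, the Mach-O style pattern is my guess at the "__SEGMENT__section" naming seen there, and the helper names (looks_like_chunk, chunk_share) are hypothetical. The counts are copied from the first eight rows of the list.

```python
import re

# PE section names taken from the top of the list above (not exhaustive).
PE_SECTION_NAMES = {".text", ".reloc", ".data", ".rdata"}
# Assumed pattern for Mach-O style names such as __LINKEDIT or __TEXT__text.
MACHO_CHUNK = re.compile(r"__[A-Z]+(__[A-Za-z_]+)?")


def looks_like_chunk(name):
    """Heuristic: True if the name resembles a PE section or Mach-O chunk."""
    return name in PE_SECTION_NAMES or MACHO_CHUNK.fullmatch(name) is not None


def chunk_share(counts):
    """Return (occurrences flagged as chunks, total occurrences)."""
    total = sum(counts.values())
    flagged = sum(n for name, n in counts.items() if looks_like_chunk(name))
    return flagged, total


# First eight rows of the stats above.
top_counts = {
    "1": 9081226,
    ".text": 7850139,
    ".reloc": 5933107,
    ".data": 5086051,
    "version.txt": 3634652,
    ".rdata": 3101066,
    "CERTIFICATE": 2923472,
    "__LINKEDIT": 2784502,
}

chunk_total, total = chunk_share(top_counts)
print(f"{chunk_total} of {total} occurrences ({chunk_total / total:.1%}) look like chunks")
```

The heuristic deliberately leaves names like ‘1’ and ‘CERTIFICATE’ alone, so the real share of ‘partial’ entries is likely higher than what this reports.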

Having said that, you may be surprised that I still like this dataset a lot, and would still recommend using the NSRL set in your investigations, even if you use it blindly. Yes, it’s not ideal, it may cost your forensic boxes some extra CPU cycles, but it’s at least something. And it’s out there, for free. I also respect the effort a lot, because a few years ago I made a conscious decision to create a set competing with NSRL, and now I do know how hard it is…

The bottom line is: know and use all available data sets and tools. Just apply them wisely.

DeXRAY, DFIR, and the art of ambulance chasing…

Pretty much all of my DeXRAY posts ever published have focused on new versions of the tool being released. Today I will talk about the ‘making of the sausages’ part of this process, aka how DeXRAY came to be.

If you have been working in the DFIR space for more than a decade you probably already know that any type of high-fidelity evidence found on an endpoint is gold, and Quarantine folders/files are among the best in this category… These are locations where security software stores intercepted/blocked/quarantined files. Before the strong adoption of NextGen and EDR solutions, AV products used to catch many malware files ‘just in time’, then encrypt their content, often move them to a ‘special safe location’, and delete them from their original location. And yes, these encrypted files (most of the time) can be decrypted by DeXRAY…

Despite this informative intro you may still ask… why do we even need to talk about Quarantine files and folders today?

First of all, I believe not every DFIR analyst is aware of these file system locations. And as time goes by, probably fewer and fewer of them will be. It’s knowledge of the past, after all. Moreover, in a world where the ever-changing landscape affects not only the actual threats but also the security solutions, it’s not uncommon for the following to occur:

  • multiple security solutions installed on the same host, plus
    • installation of one doesn’t imply the older one was (fully) uninstalled! That is, there may be remnants of the old security solution still present on the system: not only the program binaries and configuration, but also quarantined files!
  • different policies used by these security solutions may cause interesting interference (e.g. exclusion policies for directories/files in one may suppress some detections, but still trigger other detections in another solution)
  • some DFIR analysts can actually miss an opportunity to discover these existing quarantined files, because they simply don’t know about them!

So, if you want to improve your chances of detecting something interesting on the endpoint you investigate, this post is for you.

And yes, we are ambulance chasing, but for a good reason! Discovering that someone else (meaning: some other software) had discovered something before us is actually NOT A BAD THING. I would go as far as to say that while discovering and analysing quarantined files is a bit like cheating, it may actually cut down a lot of analysis time in some cases. And in the DFIR world, time is really of the essence.

The ambulance chasing rule #1 is that when you process your evidence, make sure you pay attention to these low-hanging fruits and nuggets…

Before I go into the gory details, let me digress to deliver a personal rant: analysing paths where security software stores its quarantined files is not easy in the 2010s/2020s. It requires a lot of patience, plus some more. The security solutions of ‘today’ migrated away from the golden era of the 90s/2000s. Big time. Yup, while in the past you would download the software and just install it, today you can’t install anything w/o at least creating an online account, and/or (pre-)paying for a subscription, even if just for a test period (credit card authorizations). So, if you want to try it yourself – you have been warned: I went through the hell of doing it for many security solutions and do not recommend it. For realz, you are going to be exposed to a lot of b/s and ‘I really don’t wanna do it’ equilibristics. Plus, some solutions use consoles that are no longer present on the client side (endpoint) and have been moved to the server side, so you will actually need these b/s online accounts; yes, the temp emails and phone numbers won’t cut it. Let me be blunt and say it’s actually quite an experience to install many of the security software packages of today w/o getting seriously pissed off… Now, imagine you are that damsel in distress, you know nothing about security, but you suspect you got hit by some malware/hacking attack and want to purchase a security solution to help you with your problem. I am feeling very, very sorry for you in 2023… Anyway… this is the end of the rant 🙂

The good news is that from a forensic investigator’s perspective, these solutions have already been (pre)installed on the systems you analyze. As such, we just need to find these quarantined folders/files!

Here are the rules:

  • If part of the directory / folder refers to ‘/.*?Quarantine/’ — check it!
  • If part of the directory / folder refers to ‘/chest/’ — check it!
  • If part of the directory / folder refers to ‘/QB/’ — check it!
  • If part of the directory / folder refers to ‘/Infected/’ — check it!
  • If part of the directory / folder refers to ‘/Backup/’ — check it!
  • If part of the directory / folder refers to ‘/$360Section/’ — check it!
  • If part of the directory / folder refers to ‘/fq/’ — check it!
  • If part of the directory / folder refers to ‘/qv/’ — check it!
  • If part of the directory / folder refers to ‘/Jail/’ — check it!
  • If part of the directory / folder refers to ‘/Safestore/’ — check it!
  • If the file extension is one of these
    • ‘.v3b’, ‘.eqf’, ‘.qua’, ‘.qv’, ‘.bdq’, ‘.q’, ‘.cmc’, ‘.vir’, ‘.ifc’, ‘.nqf’, ‘.tmp’ (with a header ‘KSS’), ‘.klq’, ‘.qnt’, ‘.bin’ (with a file name being a hash), ‘.lqf’, ‘.quar’, ‘.data’, ‘.bup’, ‘.mal’, ‘.exv’, ‘.dlv’, ‘.virus’, ‘.infected’, ‘.malware’, ‘.suspicious’, ‘.sdb’, ‘.qbd’, ‘.qbi’, ‘.idx’, ‘.qtn’, ‘.vbn’, ‘quarantine.db’ — check it !!!
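
The rules above are easy to turn into a quick triage helper. Below is a minimal Python sketch, with several assumptions of mine rather than anything DeXRAY itself does: matching is case-insensitive, the ‘Quarantine’ rule is read as ‘any path component ending with Quarantine’, a ‘hash’ file name means 32, 40 or 64 hex characters, and the ‘KSS’ header is assumed to sit at the very start of the file. The function name is_quarantine_candidate and its header parameter are hypothetical.

```python
import re
from pathlib import PureWindowsPath

# Directory names from the rules above (assumed case-insensitive).
QUARANTINE_DIR = re.compile(
    r".*quarantine|chest|qb|infected|backup|\$360section|fq|qv|jail|safestore",
    re.IGNORECASE,
)
# Extensions from the rules above; '.tmp' and '.bin' are handled separately.
EXTENSIONS = {
    ".v3b", ".eqf", ".qua", ".qv", ".bdq", ".q", ".cmc", ".vir", ".ifc",
    ".nqf", ".klq", ".qnt", ".lqf", ".quar", ".data", ".bup", ".mal",
    ".exv", ".dlv", ".virus", ".infected", ".malware", ".suspicious",
    ".sdb", ".qbd", ".qbi", ".idx", ".qtn", ".vbn",
}
# Assumption: a 'hash' name is an MD5, SHA-1 or SHA-256 hex string.
HASH_NAME = re.compile(r"[0-9a-f]{32}|[0-9a-f]{40}|[0-9a-f]{64}", re.IGNORECASE)


def is_quarantine_candidate(path, header=b""):
    """Return True if the path (plus optional first bytes) deserves a look."""
    p = PureWindowsPath(path)
    if any(QUARANTINE_DIR.fullmatch(part) for part in p.parts):
        return True
    if p.name.lower() == "quarantine.db":
        return True
    suffix = p.suffix.lower()
    if suffix == ".bin":
        return HASH_NAME.fullmatch(p.stem) is not None
    if suffix == ".tmp":
        return header.startswith(b"KSS")
    return suffix in EXTENSIONS


for sample in (r"C:\ProgramData\Vendor\Quarantine\abc.dat", r"C:\Windows\notepad.exe"):
    print(sample, is_quarantine_candidate(sample))
```

Expect false positives (every ‘Backup’ folder or ‘.data’ file will light up); the point is to get a short list to eyeball, not a verdict.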

I’d love to say – you see? it’s that simple. Yet, I know it is not. Still… happy ambulance chasing!

There you have it. It was that easy.