{"id":9279,"date":"2024-07-14T00:08:16","date_gmt":"2024-07-14T00:08:16","guid":{"rendered":"https:\/\/www.hexacorn.com\/blog\/?p=9279"},"modified":"2024-07-14T00:15:08","modified_gmt":"2024-07-14T00:15:08","slug":"high-fidelity-detections-are-low-fidelity-detections-until-proven-otherwise","status":"publish","type":"post","link":"https:\/\/www.hexacorn.com\/blog\/2024\/07\/14\/high-fidelity-detections-are-low-fidelity-detections-until-proven-otherwise\/","title":{"rendered":"High Fidelity detections are Low Fidelity detections, until proven otherwise"},"content":{"rendered":"\n<p>A few days ago <a href=\"https:\/\/x.com\/nas_bench\">Nas<\/a> kicked off an interesting <a href=\"https:\/\/x.com\/nas_bench\/status\/1808814344286740894\">discussion<\/a> on Xitter about the quality of detections. I liked it, so I offered my personal <a href=\"https:\/\/x.com\/Hexacorn\/status\/1808822427222184408\">insight<\/a>. I then added a stupid <a href=\"https:\/\/x.com\/Hexacorn\/status\/1808910092584050690\">example<\/a> to illustrate my point, to which <a href=\"https:\/\/x.com\/DylanInfosec\">DylanInfosec<\/a> <a href=\"https:\/\/x.com\/DylanInfosec\/status\/1808934236285489313\">replied<\/a>:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">Would love to set some time aside and gather some OS log dumps, throw em in a SIEM and test that way or something. I guess crowd validation with a trusted diverse group could work too. 
Not-for-profit or anything but just to share with the community<\/pre>\n\n\n\n<p>This made me think&#8230;<\/p>\n\n\n\n<p>I am an old-school data hoarder; for as long as I can remember, I have been actively looking for data of interest in a lot of places&#8230; And I must confess that the only reason I could immediately provide that stupid mimi-based regex filename search example was that I had access to my private &#8216;clean&#8217; file name dataset&#8230;<\/p>\n\n\n\n<p>You see&#8230; over a decade ago I kicked off a personal project that focused on collecting software data from CLEAN sources. While many people in the cybersecurity industry at that time focused primarily on malware collections, I decided to go a step further and collect data that was most likely clean. So, I wrote a number of web scrapers and downloaders, used VPNs and Tor where necessary, and eventually built a large data set of (most likely) clean files downloaded from publicly available sources. I didn&#8217;t stop there. I took every single sample that I downloaded and decompiled it whenever possible&#8230; then processed all the decompiled files to build a modern, full-blown, Windows-centric clean software data set that, at that time, I believed to be far better than <a href=\"https:\/\/www.hexacorn.com\/blog\/2023\/09\/16\/analysing-nsrl-data-set-for-fun-and-because-curious-part-3\/\" data-type=\"post\" data-id=\"8715\">NIST&#8217;s<\/a>. <\/p>\n\n\n\n<p>Now, a few years have passed and this set is getting older every single day, so perhaps it&#8217;s time for it to earn some brownie points in the community&#8230;<\/p>\n\n\n\n<p>Many of our threat hunting rules depend on file names. 
The file I am attaching to this post contains a list of many of the PE file names in my collection that are known to be &#8216;clean&#8217; (to be precise, these are all the file names ending with one of the following extensions: &#8216;exe&#8217;, &#8216;dll&#8217;, &#8216;drv&#8217;, &#8216;ocx&#8217;, &#8216;sys&#8217;). It goes without saying that you must treat this list with a lot of suspicion, but I hope it will help you write better detections&#8230;<\/p>\n\n\n\n<p><a href=\"https:\/\/hexacorn.com\/d\/_files_of_interest.zip\">_files_of_interest.su.zip<\/a><\/p>\n\n\n\n<p>And to illustrate the point, let&#8217;s run a query similar to the one I ran for my tweet:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">rg -i \"mimi.*?\\.(dll|exe|sys)\" _files_of_interest.su<\/pre>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2024\/07\/mimi1.png\"><img decoding=\"async\" loading=\"lazy\" width=\"344\" height=\"182\" src=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2024\/07\/mimi1.png\" alt=\"\" class=\"wp-image-9309\" srcset=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2024\/07\/mimi1.png 344w, https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2024\/07\/mimi1-300x159.png 300w\" sizes=\"(max-width: 344px) 100vw, 344px\" \/><\/a><\/figure>\n\n\n\n<p>Note: you can&#8217;t use the _files_of_interest.zip\/_files_of_interest.su files for commercial purposes.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A few days ago Nas kicked off an interesting discussion on Xitter about detections&#8217; quality. I liked it, so I offered my personal insight. 
I then added a stupid example to illustrate my point to which DylanInfosec replied: Would love &hellip; <a href=\"https:\/\/www.hexacorn.com\/blog\/2024\/07\/14\/high-fidelity-detections-are-low-fidelity-detections-until-proven-otherwise\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[53,39,79],"tags":[],"_links":{"self":[{"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/posts\/9279"}],"collection":[{"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/comments?post=9279"}],"version-history":[{"count":8,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/posts\/9279\/revisions"}],"predecessor-version":[{"id":9313,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/posts\/9279\/revisions\/9313"}],"wp:attachment":[{"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/media?parent=9279"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/categories?post=9279"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/tags?post=9279"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}