{"id":7914,"date":"2022-01-16T15:50:25","date_gmt":"2022-01-16T15:50:25","guid":{"rendered":"https:\/\/www.hexacorn.com\/blog\/?p=7914"},"modified":"2022-01-16T15:50:25","modified_gmt":"2022-01-16T15:50:25","slug":"yara-carpet-bomber","status":"publish","type":"post","link":"https:\/\/www.hexacorn.com\/blog\/2022\/01\/16\/yara-carpet-bomber\/","title":{"rendered":"Yara Carpet Bomber"},"content":{"rendered":"\n<p>A lot of people are sharing their Yara creations (look for the <a href=\"https:\/\/twitter.com\/search?q=%23100DaysofYARA\">#100DaysofYARA<\/a> tag on Twitter), so I thought I would share one too. <\/p>\n\n\n\n<p>This is a very unusual way of using Yara and I hope you will find it interesting.<\/p>\n\n\n\n<p>When we think of Yara rules, we usually have a very specific cluster of strings in mind, be it an API name, a debug string, a snippet of code, etc. What if we instead used Yara to scan files for much larger sets of strings? While it may sound counterintuitive, Yara is actually very well suited to &#8220;carpet bombing&#8221; target files with string scans. It&#8217;s also fast and efficient, because Yara uses an Aho-Corasick automaton to look for all of its strings in a single pass over the file. <\/p>\n\n\n\n<p>Let&#8217;s have a look at an example.<\/p>\n\n\n\n<p>Imagine that you want to find all English words inside a file. I chose English because it&#8217;s easy to demo, but you could really use any other language. The traditional approach would rely on running the &#8220;strings&#8221; tool over the target file and then manually combing through the results, cherry-picking words that &#8220;look&#8221; English. For other languages you may need a localized version of the &#8220;strings&#8221; tool (e.g. my old tool <a href=\"https:\/\/www.hexacorn.com\/blog\/2012\/11\/18\/hstrings-release-when-all-strings-are-attached\/\" data-type=\"post\" data-id=\"1498\">hstrings<\/a> could help), but the principle is the same. 
In some cases you could also use your knowledge of the file structure to extract some of the strings &#8216;natively&#8217; (e.g. from the resources of a PE file).<\/p>\n\n\n\n<p>We can also approach it from a different angle: build a list of all English words and then search for all of them in the file at once. There are obvious caveats. We can never be sure that the list contains all English words (e.g. gobbledygook or ragamuffin may be missing), and short words will certainly cause a lot of false positives (which is why we skip them), but it&#8217;s just a proof of concept.<\/p>\n\n\n\n<p>So, we grab a random English word <a href=\"https:\/\/raw.githubusercontent.com\/dwyl\/english-words\/master\/words.txt\">list<\/a>. We then write a small script that keeps only the words that are at least 6 characters long and do not start with a digit, and converts them into a set of Yara rules. Yara accepts up to 10K strings per rule, so we have to split the dictionary across multiple rules.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">use strict;\nuse warnings;\n\n# Convert a word list (one word per line) into Yara rules eng_0000, eng_0001, ...\n# each holding up to 10,000 strings.\nmy $cnt = 0;   # index of the current rule\nmy $n   = 0;   # number of strings emitted into the current rule\n\nwhile (&lt;>)\n{\n    s\/[\\r\\n]+\/\/g;             # strip line endings\n    next if length($_) &lt; 6;   # skip short words (too many false positives)\n    next if \/^[0-9]\/;         # skip entries starting with a digit\n    s\/\\\\\/\\\\\\\\\/g;              # escape backslashes for Yara string syntax\n    s\/\"\/\\\\\"\/g;                # escape double quotes\n    printf \"\\nrule eng_%04d\\n{\\n  strings:\\n\", $cnt if $n == 0;\n    print \"    \\$ = \\\"$_\\\" ascii wide nocase\\n\";\n    if (++$n > 9999)\n    {\n        # the rule is full: close it and start a new one\n        print \"\\n  condition:\\n    any of them\\n}\\n\";\n        $cnt++;\n        $n = 0;\n    }\n}\n# close the last rule, unless it was already closed inside the loop\nprint \"\\n  condition:\\n    any of them\\n}\\n\" if $n > 0;<\/pre>\n\n\n\n<p>The script reads the word list from the file(s) named on its command line (or from standard input) and prints the rules to standard output, so they can be redirected into an <em>eng.yar<\/em> file and then compiled with yarac into <em>eng.yac<\/em>:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">yarac eng.yar eng.yac<\/pre>\n\n\n\n<p>We will get a lot of warnings about rules slowing down scanning, but who cares \ud83d\ude42<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"> warning: rule \"eng_0000\" in eng.yar(10008): rule is slowing down scanning\n warning: rule 
\"eng_0001\" in eng.yar(20016): rule is slowing down scanning\n warning: rule \"eng_0002\" in eng.yar(30024): rule is slowing down scanning\n warning: rule \"eng_0003\" in eng.yar(40032): rule is slowing down scanning\n warning: rule \"eng_0004\" in eng.yar(50040): rule is slowing down scanning\n ...<\/pre>\n\n\n\n<p>Note that the resulting file is gigantic: ~600MB in size. You can reduce it by tinkering with the &#8220;ascii wide nocase&#8221; modifiers (if you drop them altogether, the file shrinks to only ~70MB). <\/p>\n\n\n\n<p>We can now run the rules against, e.g., Notepad:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">yara -s -C eng.yac c:\\windows\\notepad.exe<\/pre>\n\n\n\n<p>-s: print the matching strings<br>-C: tell yara that the rules file is compiled<\/p>\n\n\n\n<p>The results will look like this:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">eng_0000 c:\\windows\\notepad.exe\n 0x280b5:$: Accelerator\n 0x2822a:$: Accelerator\n 0x2822a:$: Accelerators\n 0x26f00:$: Accept\n 0x2b9bd:$: Access\n 0x2862e:$: Acquire\n 0x286f6:$: Acquire\n 0x2862e:$: AcquireS\n 0x286f6:$: AcquireS\n 0x28df9:$: Activation\n 0x27faf:$: Active\n 0x28e04:$: actory\n 0x286d9:$: Address\n 0x289c5:$: alLock\n 0x28b68:$: alLock\n eng_0001 c:\\windows\\notepad.exe\n 0x25050:$: A\\x00p\\x00p\\x00l\\x00i\\x00c\\x00a\\x00t\\x00i\\x00o\\x00n\\x00\n 0x25260:$: A\\x00p\\x00p\\x00l\\x00i\\x00c\\x00a\\x00t\\x00i\\x00o\\x00n\\x00\n 0x2ba0f:$: application\n 0x2baf4:$: application\n eng_0002 c:\\windows\\notepad.exe\n 0x2b75e:$: Archit\n 0x2b88f:$: Archit\n 0x2b75e:$: Architect\n 0x2b88f:$: Architect\n 0x2b75e:$: Architecture\n 0x2b88f:$: Architecture\n 0x227ba:$: A\\x00r\\x00o\\x00u\\x00n\\x00d\\x00\n 0x2b6c8:$: assembl\n 0x2b713:$: assembl\n 0x2b7e5:$: Assembl\n 0x2b7f9:$: assembl\n [...]<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>A lot of people are sharing their Yara creations (look for #100DaysofYARA tag on Twitter), so I thought I would share one too. 
This is a very unusual way of using Yara and I hope you will find it &hellip; <a href=\"https:\/\/www.hexacorn.com\/blog\/2022\/01\/16\/yara-carpet-bomber\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[83],"tags":[],"_links":{"self":[{"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/posts\/7914"}],"collection":[{"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/comments?post=7914"}],"version-history":[{"count":1,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/posts\/7914\/revisions"}],"predecessor-version":[{"id":7915,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/posts\/7914\/revisions\/7915"}],"wp:attachment":[{"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/media?parent=7914"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/categories?post=7914"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/tags?post=7914"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}