{"id":6013,"date":"2019-03-03T00:14:53","date_gmt":"2019-03-03T00:14:53","guid":{"rendered":"http:\/\/www.hexacorn.com\/blog\/?p=6013"},"modified":"2019-03-03T22:57:15","modified_gmt":"2019-03-03T22:57:15","slug":"extracting-and-parsing-pe-signatures-en-masse","status":"publish","type":"post","link":"https:\/\/www.hexacorn.com\/blog\/2019\/03\/03\/extracting-and-parsing-pe-signatures-en-masse\/","title":{"rendered":"Extracting and Parsing PE signatures en masse"},"content":{"rendered":"\n<p>A few years back I was dealing with a large corpus of PE files, and many of them were PUA\/Adware installers. Most of these were signed, so I thought it would be cool to automate writing Yara sigs based on these PE signatures. So I did, and it helped me a lot with dividing the whole sample set into clusters. I could then simply exclude (a.k.a. delete) the uninteresting clusters of installers and remove them from the scope of further analysis.<\/p>\n\n\n\n<p>Today someone reminded me of this project, so I thought I would jot down some notes and share the Yara sig I generated at the time. I am a big believer in automation, and I hope this will be useful to someone facing similar problems.<\/p>\n\n\n\n<p>To extract the signature from a PE file, one can use <a href=\"https:\/\/blog.didierstevens.com\/programs\/disitool\/\">disitool.py<\/a> from Didier Stevens. Once extracted, the signature can be analyzed. The problem is that:<\/p>\n\n\n\n<ul><li>the extracted signature is in binary form (a DER-encoded PKCS#7 SignedData structure)<\/li><li>parsing it is non-trivial, so it is best left to existing tools<\/li><\/ul>\n\n\n\n<p>After googling around, I eventually learned how to do it and wrote a simple batch file to which I delegated this unpleasant task. 
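<\/p>\n\n\n\n<p>For the curious, here is roughly what the extraction step involves. The Authenticode signature lives in the Certificate Table, which is entry 4 of the PE optional header&#8217;s data directories. Unlike the other entries, its address is a raw file offset rather than an RVA. It points to a WIN_CERTIFICATE structure (<code>dwLength<\/code>, <code>wRevision<\/code>, <code>wCertificateType<\/code>) followed by the DER-encoded PKCS#7 blob. The Python snippet below is a minimal sketch of that idea, not the code of disitool.py. It makes these assumptions and simplifications:<\/p>\n\n\n\n<ul><li>the input is a well-formed PE file with at least 5 data directories<\/li><li>only the first WIN_CERTIFICATE entry is read<\/li><li>the helper names (<code>read_u16<\/code>, <code>read_u32<\/code>, <code>extract_authenticode<\/code>) are hypothetical<\/li><\/ul>\n\n\n\n<p>For each path given on the command line, it writes the blob to a <code>.cert<\/code> file, mirroring the naming used by the batch file.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">import sys

# Illustrative sketch only (not disitool.py); all function names are hypothetical.
# Assumes a well-formed PE file with at least 5 data directories.

def read_u16(buf, off):
    # Little-endian unsigned 16-bit integer at offset off
    return int.from_bytes(buf[off:off + 2], 'little')

def read_u32(buf, off):
    # Little-endian unsigned 32-bit integer at offset off
    return int.from_bytes(buf[off:off + 4], 'little')

def extract_authenticode(data):
    # e_lfanew (offset 0x3C of the DOS header) points to the PE signature
    pe_off = read_u32(data, 0x3C)
    if data[pe_off:pe_off + 4] != b'PE' + bytes(2):
        raise ValueError('not a PE file')
    # The optional header follows the 4-byte signature and the 20-byte COFF header
    opt_off = pe_off + 24
    magic = read_u16(data, opt_off)
    # Data directories start at offset 96 (PE32, magic 0x10B) or 112 (PE32+, magic 0x20B)
    dirs_off = opt_off + (96 if magic == 0x10B else 112)
    # Entry 4 is the Certificate Table; each entry is 8 bytes (address, size).
    # Its address is a raw file offset, not an RVA.
    cert_off = read_u32(data, dirs_off + 4 * 8)
    cert_size = read_u32(data, dirs_off + 4 * 8 + 4)
    if cert_off == 0 or cert_size == 0:
        return None  # the file is not signed
    blob = data[cert_off:cert_off + cert_size]
    if len(blob) != cert_size:
        raise ValueError('certificate table runs past the end of the file')
    # First WIN_CERTIFICATE: dwLength (u32), wRevision (u16), wCertificateType (u16)
    length = read_u32(blob, 0)
    if read_u16(blob, 6) != 2:  # 2 = WIN_CERT_TYPE_PKCS_SIGNED_DATA
        raise ValueError('unexpected certificate type')
    # dwLength includes the 8-byte header; the rest is the DER-encoded PKCS#7 blob
    return blob[8:length]

if __name__ == '__main__':
    # Process every PE path given on the command line
    for pe_path in sys.argv[1:]:
        with open(pe_path, 'rb') as f:
            der = extract_authenticode(f.read())
        if der is None:
            print(pe_path, 'is not signed')
            continue
        with open(pe_path + '.cert', 'wb') as out:
            out.write(der)
<\/pre>\n\n\n\n<p>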
The batch file takes the name of a PE file from the command line, extracts the binary signature using disitool.py, and then parses it&#8230; in 3 different ways.<\/p>\n\n\n\n<p>This is the batch file:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">rem Extract the raw signature (DER-encoded PKCS#7) from the PE file given as the first argument<br>disitool.py extract \"%~1\" \"%~1.cert\"<br>if exist \"%~1.cert\" (<br> rem 1: indented ASN.1 structure dump<br> openssl asn1parse -inform DER -i -in \"%~1.cert\" &gt; \"%~1.cert.asn\"<br> rem 2: the certificates embedded in the PKCS#7 blob, as text<br> openssl pkcs7 -inform DER -in \"%~1.cert\" -text -print_certs &gt; \"%~1.cert.asn2\"<br> rem 3: the same ASN.1 data as decoded by certutil<br> certutil -asn \"%~1.cert\" &gt; \"%~1.cert.asn3\"<br>)<\/pre>\n\n\n\n<p>You may notice that I am using both openssl and certutil. Why double, or even triple, the effort? Because I discovered that relying on the data extracted by only one tool was not enough. To be frank, I don&#8217;t know the intricate details of what exactly is stored inside an Authenticode signature, and how. <a href=\"https:\/\/en.wikipedia.org\/wiki\/Abstract_Syntax_Notation_One\">The ASN.1 format<\/a> is not a pillow read either, so I went with an ROI-driven approach and simply extracted the data in every way and format I could.<\/p>\n\n\n\n<p>With that, I ran the batch file over a corpus of samples. I then used a quick &amp; dirty parser I wrote for the output of these two tools, and generated a Yara sig that covered most of the installers in the corpus.<\/p>\n\n\n\n<p>You can download the Yara sig file <a href=\"https:\/\/hexacorn.com\/d\/pua_installers_sigs.yar\">here<\/a>. Note that I saved it as Unicode, so you can see the localization issues one needs to take into account when parsing signatures.<\/p>\n\n\n\n<p>Feel free to use it, but at your own risk. I don&#8217;t guarantee that it&#8217;s error-free. Also, if you are listed in the sig file, it&#8217;s only for the purpose of sample clustering.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A few years back I was dealing with a large corpus of PE files, and many of them were PUA\/Adware installers. 
Most of these were signed, so I thought it would be cool to automate writing yara sigs based on &hellip; <a href=\"https:\/\/www.hexacorn.com\/blog\/2019\/03\/03\/extracting-and-parsing-pe-signatures-en-masse\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[39,9,83],"tags":[],"_links":{"self":[{"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/posts\/6013"}],"collection":[{"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/comments?post=6013"}],"version-history":[{"count":11,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/posts\/6013\/revisions"}],"predecessor-version":[{"id":6026,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/posts\/6013\/revisions\/6026"}],"wp:attachment":[{"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/media?parent=6013"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/categories?post=6013"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/tags?post=6013"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}