{"id":3441,"date":"2015-12-21T15:27:32","date_gmt":"2015-12-21T15:27:32","guid":{"rendered":"http:\/\/www.hexacorn.com\/blog\/?p=3441"},"modified":"2019-07-04T23:08:27","modified_gmt":"2019-07-04T23:08:27","slug":"idapython-making-strings-decompiler-friendly","status":"publish","type":"post","link":"https:\/\/www.hexacorn.com\/blog\/2015\/12\/21\/idapython-making-strings-decompiler-friendly\/","title":{"rendered":"IDAPython &#8211; making strings decompiler-friendly"},"content":{"rendered":"<p><strong>Update<\/strong><\/p>\n<p>As pointed out by <a href=\"https:\/\/twitter.com\/0stracon\">0stracon<\/a>, there is an option in Hex-Rays that makes it print all strings as literals. Go to Hex-Rays Decompiler Analysis Options and untick &#8216;Print only constant string literals&#8217;.<\/p>\n<p>To make the change permanent, clear the HO_CONST_STRINGS bit (0x0040) in the HEXOPTIONS value in hexrays.cfg:<\/p>\n<pre>#define HO_CONST_STRINGS\u00a0\u00a0 0x0040\u00a0\u00a0 \/\/ Only print string literals if they reside\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \/\/ in read-only memory (e.g. 
.rodata segment).\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \/\/ When off, all strings are printed as literals.\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \/\/ You can override decompiler's decision by\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \/\/ adding 'const' or 'volatile' to the\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \/\/ string variable's type declaration\r\nHEXOPTIONS               = 0x....   \/\/ Combination of HO_... bits<\/pre>\n<p>I was not aware of this option and reinvented the wheel \ud83d\ude42<\/p>\n<p><strong>Old post<\/strong><\/p>\n<p>One of the features of IDA is its ability to recognize strings. 
This is a great feature, especially when combined with the power of the Hex-Rays decompiler: together they can produce very readable pseudocode.<\/p>\n<p>There is one annoying catch, though: if strings are recognized and defined inside a writable segment, the decompiler does not present them as string literals, but as variable names that refer to the strings.<\/p>\n<p>Let&#8217;s look at an example.<\/p>\n<p>In the example below (a Dexter sample), IDA recognizes the string &#8220;UpdateMutex:&#8221;:<\/p>\n<p><a href=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_1.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-medium wp-image-3442\" src=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_1-300x186.png\" alt=\"strings_1\" width=\"300\" height=\"186\" srcset=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_1-300x186.png 300w, https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_1-80x50.png 80w, https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_1-598x372.png 598w, https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_1.png 879w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a>When we switch to the decompiler view, we see that the decompiler shows it as s__Updatemutex instead:<\/p>\n<p><a href=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_1a.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-medium wp-image-3446\" src=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_1a-300x135.png\" alt=\"strings_1a\" width=\"300\" height=\"135\" srcset=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_1a-300x135.png 300w, https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_1a.png 301w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>(the &#8216;s__&#8217; prefix comes from 
the string prefix I typically use, i.e. &#8216;s-&gt;&#8217;, which the decompiler &#8216;flattens&#8217; to &#8216;s__&#8217;). The name s__Updatemutex refers to the string &#8220;UpdateMutex:&#8221;, as shown below:<\/p>\n<p><a href=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_2.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-medium wp-image-3443\" src=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_2-300x46.png\" alt=\"strings_2\" width=\"300\" height=\"46\" srcset=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_2-300x46.png 300w, https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_2.png 819w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a>Obviously, decompiled code that refers to the actual strings is much more readable. Here is the same piece of code as above, with the data referred to by the actual strings:<\/p>\n<p><a href=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_2a.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-medium wp-image-3445\" src=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_2a-300x120.png\" alt=\"strings_2a\" width=\"300\" height=\"120\" srcset=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_2a-300x120.png 300w, https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_2a.png 343w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a>To make the decompiler use the actual strings (rather than references to them), we have two options:<\/p>\n<ul>\n<li>Make the segment where the string is defined read-only (by unticking &#8216;Write&#8217; in the segment properties):<\/li>\n<\/ul>\n<p style=\"padding-left: 30px;\"><a href=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_3.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-medium wp-image-3447\" 
src=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_3-300x217.png\" alt=\"strings_3\" width=\"300\" height=\"217\" srcset=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_3-300x217.png 300w, https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_3-222x160.png 222w, https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_3.png 525w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a>Unfortunately, this confuses the decompiler and makes its output untrustworthy (it is often truncated). You will also receive a friendly reminder that you are doing something stupid \ud83d\ude09, a.k.a. a red card from the decompiler&#8217;s authors:<\/p>\n<p><a href=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_3a.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-medium wp-image-3448\" src=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_3a-300x218.png\" alt=\"strings_3a\" width=\"300\" height=\"218\" srcset=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_3a-300x218.png 300w, https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_3a-222x160.png 222w, https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_3a.png 555w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<ul>\n<li>The second option is the &#8216;proper&#8217; way of fixing the issue: telling IDA that the string is read-only, a.k.a. constant, i.e. 
changing the string&#8217;s type from the existing one to the same type prefixed with the &#8216;const&#8217; keyword:<\/li>\n<\/ul>\n<p><a href=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_4.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-medium wp-image-3444\" src=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_4-300x44.png\" alt=\"strings_4\" width=\"300\" height=\"44\" srcset=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_4-300x44.png 300w, https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_4.png 708w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a>Since strings are static most of the time, it is handy to convert all the strings in IDA to read-only, i.e. retype all of them using the &#8216;const&#8217; trick.<\/p>\n<p>This is exactly what the <a href=\"https:\/\/www.hexacorn.com\/tools\/strings_to_const.py\">strings_to_const.py<\/a> script is intended to do.<\/p>\n<p>It enumerates all segments, finds all strings recognized by IDA (note the comment in the script about the string prefix I use; you may need to adapt it to your needs), and then retypes them as const.<\/p>\n<p>The result?<\/p>\n<p>See below for before and after:<\/p>\n<p><a href=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_before_after.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-medium wp-image-3449\" src=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_before_after-300x255.png\" alt=\"strings_before_after\" width=\"300\" height=\"255\" srcset=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_before_after-300x255.png 300w, https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2015\/12\/strings_before_after.png 1012w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Update As pointed out by 0stracon there is an option in 
Hexrays that actually enables it to print all strings. Go to Hex-Rays Decompiler Analysis Options and untick &#8216;Print only constant string literals&#8217;. To make it permanent, enable it in &hellip; <a href=\"https:\/\/www.hexacorn.com\/blog\/2015\/12\/21\/idapython-making-strings-decompiler-friendly\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[85,9,44,5],"tags":[],"_links":{"self":[{"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/posts\/3441"}],"collection":[{"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/comments?post=3441"}],"version-history":[{"count":7,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/posts\/3441\/revisions"}],"predecessor-version":[{"id":3456,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/posts\/3441\/revisions\/3456"}],"wp:attachment":[{"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/media?parent=3441"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/categories?post=3441"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/tags?post=3441"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}