{"id":3674,"date":"2016-06-07T19:39:13","date_gmt":"2016-06-07T19:39:13","guid":{"rendered":"http:\/\/www.hexacorn.com\/blog\/?p=3674"},"modified":"2019-01-29T17:33:59","modified_gmt":"2019-01-29T17:33:59","slug":"win10-registry-and-fun-with-ucsutf16","status":"publish","type":"post","link":"https:\/\/www.hexacorn.com\/blog\/2016\/06\/07\/win10-registry-and-fun-with-ucsutf16\/","title":{"rendered":"Win10, Registry, and fun with UCS\/UTF16"},"content":{"rendered":"<p>We have gotten so used to &#8216;seeing&#8217; Unicode strings as being made up of characters that occupy 2 bytes that we often forget this is not actually true: using 2 bytes is just a convenient way to represent the most common characters, but the standard also allows characters outside that 16-bit range. To represent them, UTF-16 uses so-called high and low surrogates:<\/p>\n<p>As per <a href=\"https:\/\/en.wikipedia.org\/wiki\/Universal_Character_Set_characters\">https:\/\/en.wikipedia.org\/wiki\/Universal_Character_Set_characters<\/a>:<\/p>\n<p style=\"padding-left: 30px;\"><b>Surrogates<\/b>. The UCS includes 2,048 code points in the Basic Multilingual Plane (BMP) for surrogate code point pairs. Together these surrogates allow any code point in the sixteen other planes to be addressed by using two surrogate code points. This provides a simple built-in method for encoding the 20.1 bit UCS within a 16 bit encoding such as UTF-16. In this way UTF-16 can represent any character within the BMP with a single 16-bit byte. 
Characters outside the BMP are then encoded using two 16-bit bytes (4 octets total) using the surrogate pairs.<\/p>\n<p>Why am I writing about this?<\/p>\n<p>I just stumbled upon a Registry key in Windows 10 whose name uses 4-byte (surrogate pair) Unicode characters ;):<\/p>\n<p>HKEY_USERS\\.DEFAULT\\Control Panel\\International\\&#x1f30e;&#x1f30f;&#x1f30d;<\/p>\n<p><a href=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2016\/06\/ucskey.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-3675 size-full\" src=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2016\/06\/ucskey.png\" alt=\"ucskey\" width=\"289\" height=\"326\" srcset=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2016\/06\/ucskey.png 289w, https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2016\/06\/ucskey-266x300.png 266w\" sizes=\"(max-width: 289px) 100vw, 289px\" \/><\/a><\/p>\n<p>It looks like a gimmick, and someone probably had a bit of fun implementing it, but this is actually a legitimate entry that is queried when Windows starts!<\/p>\n<p><a href=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2016\/06\/wininit.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-3676\" src=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2016\/06\/wininit-300x30.png\" alt=\"wininit\" width=\"500\" height=\"50\" srcset=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2016\/06\/wininit-300x30.png 300w, https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2016\/06\/wininit.png 644w\" sizes=\"(max-width: 500px) 100vw, 500px\" \/><\/a><\/p>\n<p>The characters are (by their binary representation, as little-endian UTF-16 bytes, followed by the corresponding surrogate pair):<\/p>\n<ul>\n<li>3C D8 0E DF\n<ul>\n<li>0xD83C 0xDF0E <a href=\"http:\/\/www.fileformat.info\/info\/unicode\/char\/1F30E\/index.htm\">http:\/\/www.fileformat.info\/info\/unicode\/char\/1F30E\/index.htm<\/a><\/li>\n<\/ul>\n<\/li>\n<li>3C D8 0F DF\n<ul>\n<li>0xD83C 0xDF0F <a 
href=\"http:\/\/www.fileformat.info\/info\/unicode\/char\/1f30f\/index.htm\">http:\/\/www.fileformat.info\/info\/unicode\/char\/1f30f\/index.htm<\/a><\/li>\n<\/ul>\n<\/li>\n<li>3C D8 0D DF\n<ul>\n<li>0xD83C 0xDF0D <a href=\"http:\/\/www.fileformat.info\/info\/unicode\/char\/1F30D\/index.htm\">http:\/\/www.fileformat.info\/info\/unicode\/char\/1F30D\/index.htm<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>I wonder what could be impacted by the &#8220;Unicode string = 16-bit characters&#8221; assumption:<\/p>\n<ul>\n<li>Not all tools may support UCS properly\n<ul>\n<li>especially if they assume Unicode is 16-bit or use their own parsers that don&#8217;t take surrogates into account (guilty as charged: I often simplify my scripts this way)<\/li>\n<li>obviously, most &#8216;strings&#8217; tools fail on this too (though most of them fail on non-English Unicode strings anyway)<\/li>\n<li>many fonts lack glyphs for these surrogate-encoded characters and can&#8217;t display them (Win10 Consolas can, Win7 Arial Unicode can&#8217;t)<\/li>\n<li>I noticed that cmd.exe on Win10 can&#8217;t display these properly, and there is no direct way to change its font to Consolas; below is a folder named the same way as the key, as seen in Explorer and in a cmd terminal:<a href=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2016\/06\/cmd-1.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-medium wp-image-5886\" src=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2016\/06\/cmd-1-300x122.png\" alt=\"\" width=\"300\" height=\"122\" srcset=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2016\/06\/cmd-1-300x122.png 300w, https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2016\/06\/cmd-1.png 380w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><\/li>\n<\/ul>\n<\/li>\n<li>who knows, maybe malware will start using such names too<br \/>\n<a href=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2016\/06\/autoruns.png\"><img decoding=\"async\" 
loading=\"lazy\" class=\"aligncenter wp-image-3683\" src=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2016\/06\/autoruns-300x18.png\" alt=\"autoruns\" width=\"506\" height=\"30\" srcset=\"https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2016\/06\/autoruns-300x18.png 300w, https:\/\/www.hexacorn.com\/blog\/wp-content\/uploads\/2016\/06\/autoruns.png 590w\" sizes=\"(max-width: 506px) 100vw, 506px\" \/><\/a><\/li>\n<\/ul>\n<p>Anyway, it&#8217;s more trivia than anything else&#8230;<\/p>\n<p><strong>Note:<\/strong><\/p>\n<p>If you want to test your tool, run it on a non-Windows 10 OS version; this way you will see whether the app supports these characters both from the analysis perspective (proper parsing of UCS strings) and visually (font).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We got so used to &#8216;see&#8217; Unicode strings as being made up of characters that occupy 2-bytes that we often forget that it&#8217;s actually not true &#8211; using 2 bytes is just a convenient way to represent most of the &hellip; <a href=\"https:\/\/www.hexacorn.com\/blog\/2016\/06\/07\/win10-registry-and-fun-with-ucsutf16\/\">Continue reading <span 
class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[19],"tags":[],"_links":{"self":[{"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/posts\/3674"}],"collection":[{"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/comments?post=3674"}],"version-history":[{"count":6,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/posts\/3674\/revisions"}],"predecessor-version":[{"id":5887,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/posts\/3674\/revisions\/5887"}],"wp:attachment":[{"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/media?parent=3674"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/categories?post=3674"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.hexacorn.com\/blog\/wp-json\/wp\/v2\/tags?post=3674"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}