FaaS for noobs

This is the first version of this article. Because of nuances, and things I forgot while writing it, I will come back to it to fix stuff I got wrong or missed. If you have any comments, please reach out. Thank you.

FaaS

FaaS means Factorization as a Service and it is the name of a cool factorization framework that relies on an AWS EC2 cluster.

The framework code was released in 2015 and since then:

  • some patches were introduced to it, BUT
  • the EC2 environment itself changed a lot – it makes it kinda hard for newcomers to start with FaaS, as what awaits is a serious troubleshooting session…

The below info tries to describe items of interest by focusing on a sequence of changes one has to introduce to the original FaaS config, and the hosting environment to make it work in 2020/2021.

Host OS
I created a new ‘main’ VM for this setup from scratch, using Ubuntu 20.04 (ubuntu-20.04-desktop-amd64.iso). I then installed and updated python2 & python3 & libs as I went along (pip install …). I can’t describe all the changes here, but they are easy enough to spot. If your python code doesn’t work –> update python and the libraries. The pip command works very well and takes care of almost everything. Also, the good thing about this version of Ubuntu is that you really are in luck – Ubuntu 20.04 has almost everything you need, and the changes/adjustments required for FaaS are cosmetic in nature…

Amazon AWS

If you never used AWS, I want you to think of it as a place where you go to buy a server like you buy a beer. The original FaaS focuses on buying that beer at the end of the party (Spot Instances) as opposed to buying anytime you want (On Demand). The difference in price is substantial – usually 10-11 times. Yes, seriously. After a while you will notice that Spot Instances are sometimes hard to get (covid times!) and you may need to opt for On Demand instances – these will cost you a proper dollar. If you want to drink anytime you want, you need to pay a premium, ‘just in time’.
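To get a feel for what that gap means for a whole cluster, here is a tiny back-of-the-envelope sketch. The hourly prices, the instance count and the duration are all made-up numbers chosen only to illustrate a 10x difference; check the real prices for your region and instance type before you launch anything.

```python
def cluster_cost(instances: int, hours: float, hourly_price: float) -> float:
    """Total cost in dollars of running `instances` machines for `hours` hours."""
    return instances * hours * hourly_price


# Hypothetical hourly prices (NOT real AWS prices), picked to show a 10x gap.
ON_DEMAND_PRICE = 1.50
SPOT_PRICE = 0.15

# Hypothetical job: 50 instances running for 8 hours.
on_demand_total = cluster_cost(instances=50, hours=8, hourly_price=ON_DEMAND_PRICE)
spot_total = cluster_cost(instances=50, hours=8, hourly_price=SPOT_PRICE)

print(f"On Demand: ${on_demand_total:.2f}")
print(f"Spot:      ${spot_total:.2f}")
print(f"Ratio:     {on_demand_total / spot_total:.1f}x")
```

The point is simply that the per-hour difference gets multiplied by both the number of instances and the number of hours, so it adds up fast.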

Another thing to remember is ‘where’ you buy that beer aka procure these instances. Bar hopping is fun, but… you must be VERY CAREFUL about procuring instances across regions. It’s extremely easy to start toying around with AWS across regions and get ‘unexpected’ bills at the end of the month. What is an AWS region? It’s a place where you buy beer. It could be US, EMEA, APAC. And within these regions there are sub-regions that you need to explore.

If it doesn’t make sense… let’s start again. You want to lease a bunch of servers within a data center that is physically located in one of a few available places on Earth where Amazon hosts them. The bill for using each of these data centers comes to you separately. The moment you run/test/acquire some servers you owe money. Still not sure what I mean? Abandon this article. Or this will cost you money. Yes, go away and read more and come back when you are more comfy buying and paying for what you are using… There is no free AWS lunch.

You may think that it’s a very inflexible and ‘should be centralized’ pricing model, but… it’s your own responsibility. There is no easy way to manage it other than keeping some sort of log of the regions you started playing with. And yes, take it very seriously. These dollars add up very quickly and you don’t want to pay a huge bill for forgetting that you have spawned a few resources in other regions and never terminated them. Note: I really really don’t blame Amazon/AWS. It’s you who procures and utilizes resources. Always clean up after yourself. You have been warned. It’s almost a given that when you are new to AWS you will end up paying bills in more than one region. Yes, we all have to start somewhere.
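One way to keep yourself honest is a small script that walks every region and tells you what is still there. Below is a minimal sketch, assuming boto3 is installed and your AWS credentials are configured; the function names are mine, not part of FaaS. It only looks at EC2 instances, so other billable leftovers (volumes, Elastic IPs, and so on) still need a manual check.

```python
from collections import Counter


def count_instances_by_state(reservations):
    """Tally instance states from the 'Reservations' list of a
    describe_instances response."""
    counts = Counter()
    for reservation in reservations:
        for instance in reservation.get("Instances", []):
            counts[instance["State"]["Name"]] += 1
    return counts


def audit_all_regions():
    """Print EC2 instance counts per region (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the helper above works without boto3

    # Any region works for listing regions; us-east-1 is just an assumption.
    regions = boto3.client("ec2", region_name="us-east-1").describe_regions()["Regions"]
    for region in regions:
        name = region["RegionName"]
        ec2 = boto3.client("ec2", region_name=name)
        reservations = []
        for page in ec2.get_paginator("describe_instances").paginate():
            reservations.extend(page["Reservations"])
        counts = count_instances_by_state(reservations)
        if counts:
            print(name, dict(counts))
```

Calling audit_all_regions() after every session prints only the regions that still hold instances, which is usually enough to catch the forgotten ones.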

At this stage YOU HAVE BEEN WARNED like 3 times? Continue reading at your own risk.

So… coming back to instance acquisition. A Spot instance is a shared resource you have to bid for, and an OnDemand instance is something you acquire when you want and are eager to pay. How do you choose between Spot and OnDemand instances at the FaaS level? Read on.

First things first.

In both cases you DO need to raise Support tickets with AWS to request a larger number of instances to be available to you. When you sign up to AWS for the very first time you are just a nobody and you are not trusted by default. And yes, they won’t raise these quotas w/o a proper justification, so be prepared to answer the questions they ask in the most honest/precise fashion. My experience is that AWS is pretty quick in replying and you get answers within 24-48h. They rarely give you what you ask for, which is why you should ask for more than you need, by default.

After raising the tickets this is what I got:

Requested higher number of Spot Instances  --> 450+
Requested higher number of Instances --> 450+

Mind you – this is just for ONE region. If you plan to use instances in other regions you need to raise separate tickets!

Still with me?

Yes, it costs money.

Yes, it is pretty complicated.

FaaS build

I don’t have a very intricate knowledge of how FaaS works. It sounds absurd, but it’s true. I have read many files of this project and kinda ‘get it’, but I don’t know everything, and lots of terms I was introduced to while reading these files were new to me. I am not kidding. I was learning as I was going along.

So… my naive perception of how things work is as follows: FaaS builds a ‘master’ image where all the calculations are scheduled from, and where all the results are collected. It also builds ‘slave’ images that do the actual work. The latter are built via Amazon Machine Images (AMI). (I know this section needs to be extended to include more info on AMI and MPI.)

One of the things you do when you use FaaS is building that AMI image. During the build you will see a failure caused by an old python version (it will say that ‘remote version of python is too old’), so you have to ssh to that instance, update python on it & restart the process.

YML scripts

All of the FaaS scripts are using old notation!!!

email: {{}}
N: {{}}

so you need to change it to the new notation, with quotes and ticks

email: "{{}}"
N: '<number>'
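The reason for the quotes: YAML reads a value that starts with a curly brace as an inline mapping, so newer Ansible wants any value starting with {{ to be quoted. If you don't feel like editing every file by hand, the change can be scripted. This is a rough sketch only: the regex and the function name are mine, it handles just the simple one-line `key: {{ … }}` case, and it skips any value that already contains a double quote. Review the diff before trusting it.

```python
import re

# Matches "key: {{ ... }}" lines where the value starts with an unquoted
# Jinja2 expression. Group 1 is the key part, group 2 is the value.
UNQUOTED_TEMPLATE = re.compile(r'^(\s*(?:-\s+)?[\w.-]+:\s+)(\{\{.*\}\}.*?)\s*$')


def quote_templates(yaml_text: str) -> str:
    """Wrap unquoted values that start with '{{' in double quotes."""
    fixed = []
    for line in yaml_text.splitlines():
        match = UNQUOTED_TEMPLATE.match(line)
        # Skip values containing double quotes to avoid producing broken YAML.
        if match and '"' not in match.group(2):
            line = f'{match.group(1)}"{match.group(2)}"'
        fixed.append(line)
    return "\n".join(fixed)


old = "email: {{ email }}\nname: plain value"
print(quote_templates(old))
```

Lines that are already quoted, or that don't start with a template expression, pass through untouched.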

also, they use old notation for elevated shell and they refer to ‘sudo’; in newer Ansible you use ‘become’ i.e.:

sudo: yes|no

should become:

become: yes|no
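The same rename can be scripted across all the YML files. A minimal sketch, assuming the keyword sits at the start of a line (optionally after a list dash); the helper name is hypothetical. It also maps sudo_user to become_user, which is the matching rename in newer Ansible, and it leaves the word sudo alone when it appears inside a shell command.

```python
import re

# Old privilege-escalation keywords and their newer replacements.
RENAMES = {"sudo": "become", "sudo_user": "become_user"}

# Keyword at the start of a line, optionally preceded by a list dash.
KEYWORD = re.compile(r'^(\s*(?:-\s+)?)(sudo_user|sudo)(\s*:)')


def modernize_privilege_keywords(yaml_text: str) -> str:
    """Rename 'sudo:' / 'sudo_user:' keys to 'become:' / 'become_user:'."""
    return "\n".join(
        KEYWORD.sub(lambda m: m.group(1) + RENAMES[m.group(2)] + m.group(3), line)
        for line in yaml_text.splitlines()
    )


task = "- apt: name=python3\n  sudo: yes"
print(modernize_privilege_keywords(task))
```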

EC2 folder:

added a Debug section to main.yml -- not affecting anything, just listing detailed info which helps with troubleshooting

    - name: Debug
      debug:
        msg: "{{ ec2 }}"

in some instances I had to enforce python3

ec2/build-finish.yml &
ec2/roles/build/tasks/install-msieve.yml

vars:
  ansible_python_interpreter: /usr/bin/python3

I also added an install of full python3

ec2/roles/build/tasks/install-common.yml

changes:

  - apt: name=python3
    become: yes

ec2/roles/factor/templates/post_linalg.j2 & ec2/roles/factor/templates/post_sieve.j2

changes:

disabled termination of instances, JIC

ec2/roles/launch/tasks/main.yml

added:

wait: true
instance_initiated_shutdown_behavior: terminate

ec2/vars/launch.yml

adjusted number of instances & changed them to

type: c4.8xlarge
cores: 36

And this is pretty much it.

Sounds complicated?

Yes, it is. It should be. It took me 2-3 days of troubleshooting to make it finally work and I must honestly admit that I still don’t know 100% how all the parts work together, but the exercise was worth the effort. Not only was I introduced to AWS and EC2 clusters, I actually ran a distributed calculation – something that a few years ago would not even have been possible. The possibilities of ‘rent what you need’ cannot be overstated – it’s a completely different world than 20 years ago. Having the ability to launch a parallel computing task w/o being a privileged scientist, a large corporation, or a government still blows my mind. I mean… if it is all about CPU cycles, then you can just acquire them and go with it.

csrss.exe and its manifests

This is yet another odd behavior I spotted using Procmon. I was curious what .manifest files may be missing on my test Windows 10 system. The idea was that if I could find ‘phantom manifests’ I could use them as a persistence trick, or to escalate privileges.

To my surprise, one of the first findings was csrss.exe constantly trying to access Microsoft.Windows.Common-Controls.MANIFEST. So intensive are these efforts that the process is looking for this file in a couple of locations:

  • C:\WINDOWS\SysWOW64\en-US\Microsoft.Windows.Common-Controls.mui\Microsoft.Windows.Common-Controls.mui.MANIFEST
  • C:\WINDOWS\SysWOW64\en-US\Microsoft.Windows.Common-Controls\Microsoft.Windows.Common-Controls.MANIFEST
  • C:\WINDOWS\SysWOW64\en\Microsoft.Windows.Common-Controls.mui\Microsoft.Windows.Common-Controls.mui.MANIFEST
  • C:\WINDOWS\SysWOW64\en\Microsoft.Windows.Common-Controls\Microsoft.Windows.Common-Controls.MANIFEST
  • C:\WINDOWS\en-US\Microsoft.Windows.Common-Controls.mui\Microsoft.Windows.Common-Controls.mui.MANIFEST
  • C:\WINDOWS\en-US\Microsoft.Windows.Common-Controls\Microsoft.Windows.Common-Controls.MANIFEST
  • C:\WINDOWS\system32\en-US\Microsoft.Windows.Common-Controls.mui\Microsoft.Windows.Common-Controls.mui.MANIFEST
  • C:\WINDOWS\system32\en-US\Microsoft.Windows.Common-Controls\Microsoft.Windows.Common-Controls.MANIFEST
  • C:\WINDOWS\system32\en\Microsoft.Windows.Common-Controls.mui\Microsoft.Windows.Common-Controls.mui.MANIFEST
  • C:\WINDOWS\system32\en\Microsoft.Windows.Common-Controls\Microsoft.Windows.Common-Controls.MANIFEST
  • C:\Windows\SysWOW64\en-US\Microsoft.Windows.Common-Controls.MANIFEST
  • C:\Windows\SysWOW64\en-US\Microsoft.Windows.Common-Controls.mui.MANIFEST
  • C:\Windows\SysWOW64\en\Microsoft.Windows.Common-Controls.MANIFEST
  • C:\Windows\SysWOW64\en\Microsoft.Windows.Common-Controls.mui.MANIFEST
  • C:\Windows\System32\en-US\Microsoft.Windows.Common-Controls.MANIFEST
  • C:\Windows\System32\en-US\Microsoft.Windows.Common-Controls.mui.MANIFEST
  • C:\Windows\System32\en\Microsoft.Windows.Common-Controls.MANIFEST
  • C:\Windows\System32\en\Microsoft.Windows.Common-Controls.mui.MANIFEST
  • C:\Windows\en-US\Microsoft.Windows.Common-Controls.MANIFEST
  • C:\Windows\en-US\Microsoft.Windows.Common-Controls.mui.MANIFEST

Note the unusual .mui.MANIFEST file extension as well as the directories: Microsoft.Windows.Common-Controls.mui and Microsoft.Windows.Common-Controls that are being accessed as well.
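The list above follows a regular pattern: for each directory and language folder there is a plain and a .mui variant, each tried both as a file directly in the language folder and as a file inside a subdirectory named after the assembly. The sketch below only reproduces that observed pattern so the candidates can be regenerated (for example to check which ones exist on another build). It is not how sxs.dll actually computes them, and the function name and the (directory, language) pairs are my own reading of the Procmon output.

```python
from pathlib import PureWindowsPath


def probing_candidates(name, locations):
    """Build the manifest paths observed in Procmon for an assembly name.

    `locations` is a list of (base directory, language folder) pairs.
    """
    candidates = []
    for base, lang in locations:
        root = PureWindowsPath(base) / lang
        for stem in (name, name + ".mui"):
            # <base>\<lang>\<stem>.MANIFEST
            candidates.append(str(root / f"{stem}.MANIFEST"))
            # <base>\<lang>\<stem>\<stem>.MANIFEST
            candidates.append(str(root / stem / f"{stem}.MANIFEST"))
    return candidates


# The (directory, language) pairs that show up in the list above.
OBSERVED_LOCATIONS = [
    (r"C:\Windows\SysWOW64", "en-US"),
    (r"C:\Windows\SysWOW64", "en"),
    (r"C:\Windows\System32", "en-US"),
    (r"C:\Windows\System32", "en"),
    (r"C:\Windows", "en-US"),
]

paths = probing_candidates("Microsoft.Windows.Common-Controls", OBSERVED_LOCATIONS)
for path in paths:
    print(path)
```

PureWindowsPath is used so the script also runs on a non-Windows box; on the test system itself, each path could then be checked with os.path.exists.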

After poking around I discovered that the actual code that does all these searches resides inside sxs.dll – it all happens when the SxsGenerateActivationContext API is called. One of the functions this API calls is SxspExpandProbingCandidate and this one probes various system locations for a manifest file. Interestingly, some of the SXS code seems to be probing .dll and .mui files found during these searches and checks their resources as well (to see if any matching manifest resource can be found). I guess some more findings are to be expected from this portion of the code in the future.

Of course, once I discovered that a specific manifest file csrss.exe is looking for is not present on the system, I immediately created a dummy one. I then restarted the system and it simply hung. That was a good sign :-).

I then tried to test the whole thing one more time, but this time w/o an immediate restart and with Procmon running. The manifest file I introduced was using the file tag with a name attribute pointing to my test DLL, which was placed in the same directory as the manifest file and inside c:\windows\system32\:

<file name="test.dll"></file>

Once I created C:\Windows\en-US\Microsoft.Windows.Common-Controls.MANIFEST, the csrss.exe process could access it and… it did read it. On the surface nothing changed, however, the next time I tried running a GUI application, e.g. calc.exe, I got this message:

Hmm. This is a nice proof that my manifest file is being taken into account, and it apparently broke something. As expected, removing the .manifest file I introduced removes the issue, plus confirms that this manifest file could be modified during run-time as csrss.exe does not seem to be caching its content.

As a side note, csrss.exe seems to be accessing C:\Windows\WindowsShell.Manifest as well, so since this one exists on the system by default it could be modified.

Now, the question is what is the manifest content that could make csrss.exe ‘like’ it.

Ideas?

After poking around a bit more I discovered that csrss.exe ‘likes’ manifest files a lot. I let the VM run with the Procmon on. After a while I got a few good hits. Example paths include:

  • C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\IDE\coloader80.dll.manifest
  • C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\IDE\DebuggerProxy.dll.manifest
  • C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\IDE\Microsoft.VisualStudio.CompilerHostObjectsProxy.dll.manifest

A-ha.

They actually exist on my test system so I can have a peep.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!-- Copyright (C) 1981-2007 Microsoft Corporation -->
<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
<noInheritable/>
<assemblyIdentity type="win32" processorArchitecture="x86" name="debuggerproxy.dll" version="1.0.0.0" />
<file name="debuggerproxy.dll">
<comClass clsid="{C5621364-87CC-4731-8947-929CAE75323E}" threadingModel="Both"/>
</file>
<comInterfaceExternalProxyStub name="CausalityInternal_IAD7ALCausalityEventBridge" iid="{F6A124D7-5BB7-47B2-A9AF-AAB0EEAB60E3}" numMethods="5" proxyStubClsid32="{C5621364-87CC-4731-8947-929CAE75323E}"/>

OR

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0" copyright="Copyright (c) Microsoft Corporation. All Rights Reserved." xmlns:cmiv2="urn:schemas-microsoft-com:asm.v3" cmiv2:copyright="Copyright (c) Microsoft Corporation. All Rights Reserved.">
<noInheritable />
<assemblyIdentity name="Microsoft.Windows.Common-Controls" version="6.0.18362.1016" processorArchitecture="x86" publicKeyToken="6595b64144ccf1df" type="win32" />
<file name="comctl32.dll" cmiv2:importPath="$(build.nttree)\asms\60\msft\windows\common\controls" cmiv2:sourceName="">
<windowClass>ToolbarWindow32</windowClass>
<windowClass>ComboBoxEx32</windowClass>

So, hmm both file and COM stuff seem to be supported well.
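To compare what different manifests declare, it helps to dump the file, COM class and window class entries in one go. Here is a minimal sketch using Python's standard XML parser. The sample manifest in it is a made-up, cut-down combination of the two excerpts above (closed properly so that it parses), and the function name is mine.

```python
import xml.etree.ElementTree as ET

# Namespace used by the <assembly> element in the manifests above.
ASM_V1 = {"asm": "urn:schemas-microsoft-com:asm.v1"}


def summarize_manifest(xml_text):
    """List each <file> entry with its comClass CLSIDs and windowClass names."""
    root = ET.fromstring(xml_text.strip())
    summary = []
    for file_el in root.findall("asm:file", ASM_V1):
        summary.append({
            "file": file_el.get("name"),
            "com_classes": [c.get("clsid") for c in file_el.findall("asm:comClass", ASM_V1)],
            "window_classes": [w.text for w in file_el.findall("asm:windowClass", ASM_V1)],
        })
    return summary


# Hypothetical sample stitched together from the two excerpts, for illustration only.
SAMPLE = """
<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
  <file name="debuggerproxy.dll">
    <comClass clsid="{C5621364-87CC-4731-8947-929CAE75323E}" threadingModel="Both"/>
  </file>
  <file name="comctl32.dll">
    <windowClass>ToolbarWindow32</windowClass>
    <windowClass>ComboBoxEx32</windowClass>
  </file>
</assembly>
"""

for entry in summarize_manifest(SAMPLE):
    print(entry)
```

Running it over the manifests csrss.exe actually touches would show at a glance which of them rely on file entries, COM registrations, or window classes.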

I guess the file must be signed or something?

Ideas?

I followed with the simplest example ever – I put comctl32.dll as the value of the name attribute inside the manifest file, then placed a copy of comctl32.dll inside the same directory. Then I restarted the computer.

Hello nothingness.

After restart no Explorer in sight. Task Manager shows as below:

A-ha. Let’s try to run explorer.

Okay, so everything is broken as before. A good sign, I guess.

Ideas?