Flare-On 9 – Task 8

For those of you who don’t know, Flare-On is an annual “reverse engineering marathon” organized by Mandiant (formerly by FireEye). It runs for 6 weeks and usually contains 10-12 tasks of increasing difficulty. This year I finished in 103rd place (solves board here). In this short series you will find my solutions to the tasks I enjoyed the most.

Unquestionably, the most interesting and complex challenge of this year was the 8th one.

You can find the package here: 08_backdoor.7z , password: flare


This challenge is a PE written in .NET. Even at first sight we can see that it is somewhat atypical: it contains 74 sections. In addition to the standard sections like .text, .rsrc and .reloc, there are sections that clearly contain some encrypted/obfuscated content. Their names look like byte strings (they could be checksums or fragments of hashes).

As usual when encountering a .NET file, I opened it in dnSpy to have a look at the decompiled code.

The program contains multiple classes with names starting with “FLARE”:

Deobfuscating the stage 1

The Entry Point is in the class named Program. Looking inside, we realize that the bytecode of most of the methods is obfuscated and can’t be decompiled with dnSpy:

It looks very messy and intimidating, but we still have some methods that haven’t been obfuscated, so let’s start with those.

The function that is executed first, FLARE15.flare_74, initializes some tables that are going to be used later:

The next function to be executed, Program.flared_38, can’t be decompiled. So I previewed the CIL code, to check if it makes any sense:

It doesn’t – we can see some instructions that are marked as UNKNOWN. So we can assume that this function is here only to throw an exception, and the meaningful code is going to be in the exception handler. So, let’s take a look there.

The function flare_70, which is executed in the exception handler, follows the same logic. It calls a function flared_70 which contains invalid, nonsensical code, just to trigger an exception.

And then, in the exception handler, flare_71 is executed. It receives as parameters two of the global variables that were initialized in Main by the function FLARE15.flare_74.

The first of those variables is a dictionary, and the other is an array of bytes.

Fortunately, this rabbit-hole doesn’t go deeper for now, and the function flare_71 contains meaningful code:

// Token: 0x060000BC RID: 188 RVA: 0x00013EB8 File Offset: 0x0001AEB8
public static object flare_71(InvalidProgramException e, object[] args, Dictionary<uint, int> m, byte[] b)
{
	StackTrace stackTrace = new StackTrace(e);
	int metadataToken = stackTrace.GetFrame(0).GetMethod().MetadataToken;
	Module module = typeof(Program).Module;
	MethodInfo methodInfo = (MethodInfo)module.ResolveMethod(metadataToken);
	MethodBase methodBase = module.ResolveMethod(metadataToken);
	ParameterInfo[] parameters = methodInfo.GetParameters();
	Type[] array = new Type[parameters.Length];
	SignatureHelper localVarSigHelper = SignatureHelper.GetLocalVarSigHelper();
	for (int i = 0; i < array.Length; i++)
	{
		array[i] = parameters[i].ParameterType;
	}
	Type declaringType = methodBase.DeclaringType;
	DynamicMethod dynamicMethod = new DynamicMethod("", methodInfo.ReturnType, array, declaringType, true);
	DynamicILInfo dynamicILInfo = dynamicMethod.GetDynamicILInfo();
	MethodBody methodBody = methodInfo.GetMethodBody();
	foreach (LocalVariableInfo localVariableInfo in methodBody.LocalVariables)
	{
		localVarSigHelper.AddArgument(localVariableInfo.LocalType);
	}
	byte[] signature = localVarSigHelper.GetSignature();
	dynamicILInfo.SetLocalSignature(signature);
	foreach (KeyValuePair<uint, int> keyValuePair in m)
	{
		int value = keyValuePair.Value;
		uint key = keyValuePair.Key;
		bool flag = value >= 1879048192 && value < 1879113727;
		int tokenFor;
		if (flag)
		{
			tokenFor = dynamicILInfo.GetTokenFor(module.ResolveString(value));
		}
		else
		{
			MemberInfo memberInfo = declaringType.Module.ResolveMember(value, null, null);
			bool flag2 = memberInfo.GetType().Name == "RtFieldInfo";
			if (flag2)
			{
				tokenFor = dynamicILInfo.GetTokenFor(((FieldInfo)memberInfo).FieldHandle, ((TypeInfo)((FieldInfo)memberInfo).DeclaringType).TypeHandle);
			}
			else
			{
				bool flag3 = memberInfo.GetType().Name == "RuntimeType";
				if (flag3)
				{
					tokenFor = dynamicILInfo.GetTokenFor(((TypeInfo)memberInfo).TypeHandle);
				}
				else
				{
					bool flag4 = memberInfo.Name == ".ctor" || memberInfo.Name == ".cctor";
					if (flag4)
					{
						tokenFor = dynamicILInfo.GetTokenFor(((ConstructorInfo)memberInfo).MethodHandle, ((TypeInfo)((ConstructorInfo)memberInfo).DeclaringType).TypeHandle);
					}
					else
					{
						tokenFor = dynamicILInfo.GetTokenFor(((MethodInfo)memberInfo).MethodHandle, ((TypeInfo)((MethodInfo)memberInfo).DeclaringType).TypeHandle);
					}
				}
			}
		}
		b[(int)key] = (byte)tokenFor;
		b[(int)(key + 1U)] = (byte)(tokenFor >> 8);
		b[(int)(key + 2U)] = (byte)(tokenFor >> 16);
		b[(int)(key + 3U)] = (byte)(tokenFor >> 24);
	}
	dynamicILInfo.SetCode(b, methodBody.MaxStackSize);
	return dynamicMethod.Invoke(null, args);
}

By analyzing the code we finally come to know what is happening here. The function that has thrown the exception, along with its prototype, is retrieved, as well as the parameters that were passed to it.

Then, a dynamic method is created as a replacement, using the values passed as flare_71 arguments (FLARE15.wl_m and FLARE15.wl_b in the analyzed case). The last function parameter, containing the byte array, is in fact the bytecode of the new method.

Finally, the newly created dynamic function is called, with the same prototype and arguments as the function that threw the exception that led here:

Creation of the dynamic function:

So, if we manage to get the code that was about to be executed, and put it in place of the nonsensical code, we could get the function decompiled, and the flow deobfuscated.

I found 7 functions total that were obfuscated in the same way:

  1. flared_35
  2. flared_47
  3. flared_66
  4. flared_67
  5. flared_68
  6. flared_69
  7. flared_70

My first thought was to just dump the code before the execution, and fill it in at the offset where the original function was located. I tried to do it, and although the code that I got looked like valid IL code, something was still clearly wrong. Some of the functions (e.g. flared_70) decompiled correctly, but had fragments that did not make sense:

Other functions would not decompile at all. When I looked at the bytecode preview, I noticed that some references inside are clearly invalid:

Invalid function – .NET bytecode viewed in IDA

But why is it so, if I dumped exactly the same code that worked fine while dynamically executed? Well – there is a catch (thanks to Alex Skalozub for a hint on this!). Before the function can be executed, all the referenced tokens need to be rebased. This is the responsible fragment:

When the function was prepared to be executed dynamically, the tokens were rebased to the tokens of the dynamic method. To be able to fill the code back in, in place of the static function, we need to rebase them to the original tokens used by the static function. This modified version of the function does the job:

public static byte[] flare_71(Dictionary<uint, int> m, byte[] b)
{
	foreach (KeyValuePair<uint, int> keyValuePair in m)
	{
		int value = keyValuePair.Value;
		uint key = keyValuePair.Key;
		int tokenFor = value; // keep the original (static) token
		b[(int)key] = (byte)tokenFor;
		b[(int)(key + 1U)] = (byte)(tokenFor >> 8);
		b[(int)(key + 2U)] = (byte)(tokenFor >> 16);
		b[(int)(key + 3U)] = (byte)(tokenFor >> 24);
	}
	return b;
}
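
To make the byte-level operation concrete, here is a minimal Python sketch of the same patching step. It is an illustration only, not the decoder used in this writeup: the function name patch_tokens and the sample bytes are made up, and the only thing taken from the code above is the idea of writing each original token back as 4 little-endian bytes.

```python
import struct


def patch_tokens(bytecode: bytes, token_map: dict[int, int]) -> bytes:
    """Write every token from token_map (offset -> token) into a copy of bytecode."""
    patched = bytearray(bytecode)
    for offset, token in token_map.items():
        # Same effect as the four byte-wise shifts in the C# version:
        # the token is stored as a 4-byte little-endian value.
        struct.pack_into("<I", patched, offset, token & 0xFFFFFFFF)
    return bytes(patched)


# Hypothetical sample: a 'call' opcode (0x28) with a zeroed operand, then 'ret' (0x2A).
sample_body = bytes([0x28, 0x00, 0x00, 0x00, 0x00, 0x2A])
print(patch_tokens(sample_body, {1: 0x060000BC}).hex())
```

The size of the buffer never changes, which is what later allows pasting the result over the filler code.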

I implemented a simple decoder, based on the original, decompiled code, plus the modified version of flare_71. The decoder initialized all the global variables, and then called the function flare_71 with parameters appropriate for a particular function. After that, the result was saved into a file.


Example – decoded bytecode for the function flared_70:

There were only 7 functions to be filled at this stage, so I decided to copy-paste the resulting bytecode manually. The file offset where the function starts can be found in dnSpy:

However, we need to take into consideration that the function starts with a header, and then the bytecode follows. We can see this layout in the dnSpy hex editor:

So, in the above function, the bytecode starts at the offset 0x1AE10, and this is where we can copy the decoded content. As we can see, the size of the decoded bytecode is exactly the same as the size of the nonsensical code that was used as the filler – that makes this whole operation possible.
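
For reference, the position of the bytecode can also be computed from the header instead of being read off the hex view. The sketch below assumes the standard ECMA-335 method header layout: either a 1-byte tiny header, or a fat header whose size (in 4-byte units) is stored in the top 4 bits of its first 16-bit word. The function name il_body_range and the sample buffers are hypothetical, and this is not the approach used for patching in this writeup.

```python
import struct


def il_body_range(buf: bytes, method_offset: int) -> tuple[int, int]:
    """Return (offset of the first IL byte, IL size) for the method header at method_offset."""
    first = buf[method_offset]
    if first & 0x3 == 0x2:
        # Tiny header: one byte, the upper 6 bits hold the code size.
        return method_offset + 1, first >> 2
    if first & 0x3 == 0x3:
        # Fat header: 16-bit flags/size word, 16-bit MaxStack, 32-bit CodeSize
        # (followed by the 32-bit local variable signature token).
        flags, _max_stack, code_size = struct.unpack_from("<HHI", buf, method_offset)
        header_size = (flags >> 12) * 4
        return method_offset + header_size, code_size
    raise ValueError("not a valid method header")


# Hypothetical buffers, built only to exercise both header kinds.
tiny_method = bytes([(3 << 2) | 0x2, 0x00, 0x00, 0x2A])
fat_method = struct.pack("<HHII", 0x3013, 8, 5, 0x11000001) + bytes(5)
print(il_body_range(tiny_method, 0), il_body_range(fat_method, 0))
```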

The same method filled with the decoded body:

After pasting all the fragments we can see a big progress – all the 7 functions decompiled fine!

Yet – this is just the beginning, because there is another stage to be deobfuscated…

Deobfuscating the stage 2

Now, after deobfuscating the function `flared_70` we can see what is happening there.

The function flare_66, which is called first, is responsible for calculating a SHA256 hash of the body of the obfuscated function that has thrown the exception:

Then, the function flared_69 takes this hash, and enumerates all the PE sections, searching for a section whose name matches the beginning of that hash. The body of this section is then read:

The function flared_47 (called by flare_46 ) decodes the read section’s content:

And finally, the function flared_67 uses the decoded content and creates a dynamic function to be called, out of the supplied bytecode.

Full function snippet here.
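
The lookup described above boils down to a prefix match between a hash and the section names. Below is a rough Python sketch of that idea, under a few assumptions: the section names are lowercase hex, the sections have already been read into a dictionary, and the hash covers just the bytes passed in (the real flare_66 may feed more data into it). The names find_section_for_body and demo_sections are hypothetical.

```python
import hashlib


def find_section_for_body(body: bytes, sections: dict[str, bytes]) -> bytes:
    """Return the content of the section whose name is a prefix of SHA256(body)."""
    digest = hashlib.sha256(body).hexdigest()
    for name, content in sections.items():
        if name and digest.startswith(name):
            return content
    raise KeyError("no section matches hash " + digest)


# Made-up data: one ordinary section and one named after a hash prefix.
demo_body = b"obfuscated method body"
demo_name = hashlib.sha256(demo_body).hexdigest()[:8]  # PE section names hold up to 8 characters
demo_sections = {".text": b"code", demo_name: b"encoded replacement body"}
print(find_section_for_body(demo_body, demo_sections))
```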

It turns out that we need to decode it analogously to the previous layer.

This time the original token is first decoded:

So, this is the value that we need to use as a token for the static version of the function:

uint num = (uint)FLARE15.flared_68(b, j);
num ^= 2727913149U;
uint tokenFor = num; // use decoded num as a token
b[j] = (byte)tokenFor;
b[j + 1] = (byte)(tokenFor >> 8);
b[j + 2] = (byte)(tokenFor >> 16);
b[j + 3] = (byte)(tokenFor >> 24);
j += 4;
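
The same transformation can be sketched in Python. This is only an illustration under two assumptions that are not confirmed by the snippet above: that flared_68 reads a 32-bit little-endian value at index j, and that the positions of the token operands are already known (here they are passed in as a list). The names restore_tokens and operand_offsets are hypothetical.

```python
import struct

XOR_KEY = 2727913149  # the constant from the snippet above (0xA298A6BD)


def restore_tokens(bytecode: bytes, operand_offsets: list[int]) -> bytes:
    """XOR-decode the 4-byte token stored at each given offset."""
    patched = bytearray(bytecode)
    for j in operand_offsets:
        (encoded,) = struct.unpack_from("<I", patched, j)
        struct.pack_into("<I", patched, j, encoded ^ XOR_KEY)
    return bytes(patched)


# Made-up example: a 'call' opcode followed by an encoded token, then 'ret'.
original_token = 0x0A000012
encoded_body = b"\x28" + struct.pack("<I", original_token ^ XOR_KEY) + b"\x2a"
print(restore_tokens(encoded_body, [1]).hex())
```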

This time, the number of functions to be filled is much bigger than in the previous layer, which makes filling them by hand inefficient and unreasonable.

There are various ways to automate it.

For automating the decoding of the body of each function, I used .NET reflection. I loaded the challenge executable (with stage 1 patched) from disk, and retrieved the list of all included types. Then I walked through the methods of each type, skipping the non-static ones and those with names not starting with flared_ (which was the prefix of every obfuscated function):

Assembly a = Assembly.LoadFrom(fileToPatch);
Module[] m = a.Modules.ToArray();
if (m.Length == 0) return false;
Module module = m[0];
Type[] tArray = module.FindTypes(Module.FilterTypeName, "*");
int notFound = 0;
foreach (Type t in tArray)
{
	foreach (MethodInfo mi in t.GetMethods())
	{
		var metadataToken = mi.MetadataToken;
		string name = mi.Name;
		if (!mi.IsStatic) { continue; }
		if (!name.StartsWith("flared_")) { continue; }
		// Do the stuff
	}
}

This is how I got the list of methods to be deobfuscated. I could retrieve their deobfuscated bodies pretty easily, by applying the (slightly modified) original functions that were discussed above (calculating the hash of the content, finding the proper section, decoding it).

The remaining problem to be solved was to automatically patch the executable with the decoded contents. Probably the most elegant solution here would be to use dnlib. What I did was more “hacky” but nevertheless it worked fine. I decided to make a lookup table of the file offsets where the functions were located. As we saw earlier, those offsets are given in the comments generated by dnSpy. So, I saved the full decompiled project from dnSpy, and then used grep to filter the lines with the file offsets. I post-processed the output a bit in a simple text editor, and as a result I got the following table: file_offsets.txt. Now this table needs to be read by the decoder, and parsed into a dictionary:

static Dictionary<int, int> createMapOfTokens(string tokensFile)
{
	string tokenStr = "Token: ";
	string offsetStr = "File Offset: ";
	string sepStr = " RID:";
	var tokenToOffset = new Dictionary<int, int>();
	foreach (string line in System.IO.File.ReadLines(tokensFile))
	{
		int tokenStart = line.IndexOf(tokenStr);
		int sep = line.IndexOf(sepStr);
		int offsetStart = line.IndexOf(offsetStr);
		// length of the hex token between "Token: " and " RID:"
		int len = sep - (tokenStart + tokenStr.Length);
		string tokenPart = line.Substring(tokenStart + tokenStr.Length, len);
		string offsetPart = line.Substring(offsetStart + offsetStr.Length);
		int tokenVal = Convert.ToInt32(tokenPart, 16);
		int offsetVal = Convert.ToInt32(offsetPart, 16);
		Console.WriteLine(System.String.Format(@"Adding: '{0}' '{1:X}'", tokenPart, offsetVal));
		tokenToOffset[tokenVal] = offsetVal;
	}
	return tokenToOffset;
}

That’s how we have the offset where each function starts. Yet, as we mentioned before, this offset is not exactly the offset where the patch is to be applied – there is still a header. And to make things more complicated, multiple different versions of header are possible, with different lengths.

Still, I could retrieve the original (obfuscated) function’s body with .NET reflection. So, as a workaround of the mentioned problem, I decided to just search where the obfuscated function’s body is located in the file, starting from the function’s offset.

byte[] currentBody = methodBody.GetILAsByteArray();
if (currentBody.Length != decChunk.Length)
{
	Console.WriteLine("Length mismatch: {0:X} {1}", metadataToken, mi.Name);
	continue; // skip this method, the sizes must match to patch in place
}
// offset where the method body starts (headers may have various sizes)
int bodyOffset = 0;
for (var i = offset; i < (offset + hdrSize + decChunk.Length); i++)
{
	bool isOk = true;
	for (var k = 0; k < decChunk.Length; k++)
	{
		if (fileBuf[i + k] != currentBody[k]) { isOk = false; break; }
	}
	if (isOk) { bodyOffset = i; break; }
}
if (bodyOffset == 0)
{
	Console.WriteLine("Function body not found: {0:X} {1}", metadataToken, mi.Name);
	continue; // nothing to patch
}
// apply the patch on the file buffer:
Buffer.BlockCopy(decChunk, 0, fileBuf, bodyOffset, decChunk.Length);

I dumped the patched file to disk, and finally, the whole code decompiles!

Analysis of the decompiled application

I saved the decompiled dnSpy project, and it turned out that, after some trivial cleaning, it was even possible to compile it back to a binary. The source code of my decompiled and cleaned version is available here:

Working on the code gives much more flexibility: it allows adding logs, quickly renaming the functions and variables, etc. So overall, understanding the whole logic is a lot easier.

One thing that was very helpful in the analysis, was noticing that the challenge is actually based on Saitama malware.

I got the Saitama Agent from VirusTotal (79c7219ba38c5a1971a32b50e14d4a13).

Decompiling both applications, and comparing them side by side, allowed me very quickly to notice what parts are added by the challenge authors, and where the flag can be located. Additionally, in contrast to the FlareOn task, Saitama’s code is not obfuscated, and functions have meaningful names. So, following them, and renaming all the functions in the challenge to the same names as in Saitama, was an easy way to understand the whole functionality.

The main function of the Saitama Agent gives right away the hint that we are dealing with a state machine, and what functionality it is going to provide:

The same state machine, and analogous functions, we can find in the deobfuscated challenge executable:

There are already some writeups available detailing how Saitama’s state machine works, e.g. https://x-junior.github.io/malware%20analysis/2022/06/24/Apt34.html

Following the Saitama code, and renaming the matching functions, I produced the cleaned version of the challenge. It will be also helpful for further experiments and better understanding of inner workings of the app. The final version of the processed code (including modifications that are described further in this writeup), is given here:

How it works

Saitama is a RAT that executes various commands requested by the Command-and-Control (C2) server. The C2 communication is encoded as DNS requests/responses. Details about how they are encoded are described here and here.

The agent installed on the victim machine sends to the C2 some domain to be “resolved”. In reality it is a keep-alive token, showing that the agent is active and waiting for commands. Just like a normal DNS, the C2 responds with an IP address – however, those IPs are in reality commands, just wrapped in a custom format.

Our challenge works exactly the same way – it sends to the C2 requests to resolve generated domains, ending with flare-on.com, and then parses the response.

The function responsible for executing the requested tasks: https://github.com/hasherezade/flareon2022/blob/main/task8/FlareOn.Backdoor_dobfuscated_cleaned/FlareOn.Backdoor/TaskClass.cs#L199 .

As we can see, tasks are identified by their IDs, given as ASCII strings.

The task ID is retrieved from the DNS response. First, the length of the next response (that will carry the command) is retrieved, in the form of an IP. The IP addresses that carry the size must start with a chunk with a value >= 128. (See the code here).

Then, in the next IP, the command itself is passed. The first chunk of the IP address defines the command type, as given in the enum. We will be using command type 43 (Static), which means plaintext. Then, in the next chunks of the IP, follows the command ID in ASCII.

The output of the successfully executed command will be saved in a file named: flare.agent.recon.[unique_id]. Example:

Finding where the flag is decoded

By processing the code, it was also easy to notice where the authors added their custom code. In the function analogous to Saitama’s DoTask we can see some chunks being appended to an internal buffer on each command execution. Example:

bool flag27 = text == "17";
if (flag27)
{
	TaskClass.AppendFlagKeyChunk(int.Parse(text), "2e4");
	//$(ping -n 1 10.65.45.18 | findstr /i ttl) -eq $null;$(ping -n 1 10.65.28.41 | findstr /i ttl) -eq $null;$(ping -n 1 10.65.36.13 | findstr /i ttl) -eq $null;$(ping -n 1 10.65.51.10 | findstr /i ttl) -eq $null
	TaskClass.CommandsAndMethods.AppendData(Encoding.ASCII.GetBytes(TaskClass.GetMethodNamesFromStack() + text));
}


We can see that on each chunk being appended to the buffer, some value from a hardcoded buffer Util.c is being removed:

// Token: 0x06000097 RID: 151 RVA: 0x00004C6C File Offset: 0x0000BC6C
public static void _AppendFlagKeyChunk(int i, string s)
{
	bool flag = Util.c.Count != 0 && Util.c[0] == (i ^ 248);
	if (flag)
	{
		TaskClass.FlagSectionNameHash += s;
		Util.c.Remove(i ^ 248);
	}
	else { TaskClass._someFlag = false; }
}

// Token: 0x06000098 RID: 152 RVA: 0x00004CD0 File Offset: 0x0000BCD0
public static void AppendFlagKeyChunk(int i, string s)
{
	try
	{
		TaskClass._AppendFlagKeyChunk(i, s);
	}
	catch (InvalidProgramException e)
	{
		Util.flare_70(e, new object[] { i, s });
	}
}

Util.c is an observable collection, initialized with the following values:

Util.c = new ObservableCollection<int>

When the collection gets emptied, the following function is executed:

// Token: 0x06000095 RID: 149 RVA: 0x00004B94 File Offset: 0x0000BB94
public static void _DecodeAndSaveFlag()
	byte[] sectionContent = Util.FindSectionStartingWithHash(TaskClass.ReverseString(TaskClass.FlagSectionNameHash));
	byte[] hash = TaskClass.CommandsAndMethods.GetHashAndReset();
	byte[] flagContent = FLARE12.RC4(hash, sectionContent);
	string text = Path.GetTempFileName() + Encoding.UTF8.GetString(FLARE12.RC4(hash, new byte[]
	using (FileStream fileStream = new FileStream(text, FileMode.Create, FileAccess.Write, FileShare.Read))
		fileStream.Write(flagContent, 0, flagContent.Length);

This function drops and executes some file, and we can guess at this point that this is where the flag is located.
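
For experiments outside of the .NET code, RC4 is easy to reimplement. The sketch below is the textbook algorithm and assumes that FLARE12.RC4 (as the function was renamed in my cleaned version) is standard RC4 with (key, data) arguments; the key and data in the demo are the well-known public test vector, not values from the challenge.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4: key scheduling followed by keystream XOR."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


# Public RC4 test vector, used here only as a sanity check.
print(rc4(b"Key", b"Plaintext").hex())  # bbf316e8d940af0ad3
```

Since RC4 is symmetric, calling the same function on the encrypted section content with the right key would yield the decrypted data.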

So, by analyzing the above function, we know that:

  • the flag is RC4 encrypted, and stored in one of the PE sections
  • this section’s name matches the beginning of the reversed string, that was made of the collected chunks
  • the chunks are collected when the command is executed, so, in order to get the proper string, we need to execute them in a proper order
  • we need to preserve the original callstack, because it will be used to generate the hash, that is used as the RC4 password – so, we should use the original, unpatched binary.
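
The ordering rule from the points above can be modelled in a few lines of Python. This is a simplified simulation, not code from the challenge: the class FlagKeyCollector and its members are hypothetical stand-ins for Util.c and TaskClass.FlagSectionNameHash, and the expected values in the demo are placeholders (only the chunk "2e4" for command 17 comes from the snippet shown earlier).

```python
class FlagKeyCollector:
    """Simplified model of the chunk collection guarded by the observable collection."""

    def __init__(self, expected: list[int]):
        self.pending = list(expected)  # stand-in for Util.c
        self.chunks = ""               # stand-in for FlagSectionNameHash

    def append_chunk(self, command_id: int, chunk: str) -> bool:
        # A chunk is accepted only if this command is the next expected one.
        if self.pending and self.pending[0] == (command_id ^ 248):
            self.chunks += chunk
            self.pending.pop(0)
            return True
        return False

    def section_name_prefix(self):
        # The section name is matched against the reversed string, once all chunks are in.
        return self.chunks[::-1] if not self.pending else None


# Placeholder sequence: command 2 must run before command 17.
collector = FlagKeyCollector([2 ^ 248, 17 ^ 248])
print(collector.append_chunk(17, "2e4"))  # out of order, rejected
print(collector.append_chunk(2, "ab"))    # "ab" is a made-up chunk
print(collector.append_chunk(17, "2e4"))
print(collector.section_name_prefix())
```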

Finding and encoding the valid command sequence

Although in order to obtain the valid flag we need the original binary, still, the recompiled one will be very helpful for some experiments, testing assumptions, and figuring out the valid commands sequence.

My first assumption was that the elements in the observable collection Util.c have to be removed in the same order as they are defined, so they answer the question of the order in which the commands should be run. So, by looping over the full list, and XOR-ing each value with 248 (as in the function referenced as _AppendFlagKeyChunk) we obtain each command ID. Now we just have to encode those commands as IP addresses, as the Saitama communication protocol defines. This is the decoder that generates the proper sequence of IPs:

static void decodeIndexes()
byte[] indexes = {
List<string> resolved = new List<string>();
for (var i = 0; i < indexes.Length; i++)
var val = indexes[i] ^ 248;
//make IP
string str = val.ToString();
byte[] a = Encoding.ASCII.GetBytes(str);
string lenIP = String.Format("199.0.0.{0}", str.Length + 1);
string valIP = "";
if (str.Length > 1)
valIP = String.Format("43.{0}.{1}.0", a[0], a[1]);
valIP = String.Format("43.{0}.0.0", a[0]);
for (var i = 0; i < resolved.Count; i++)
//Console.WriteLine("DomainsList.Add(\"{0}\");", resolved[i]);
Console.WriteLine("{0}", resolved[i]);
static void Main(string[] args)
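
The same encoding can be sketched in Python. It mirrors the format strings of the decoder above (a length IP starting with 199, then a command IP starting with 43 and carrying the ASCII digits), and like that decoder it only handles command IDs of one or two digits. The function names are hypothetical, and the values in the demo are placeholders, not the real content of Util.c.

```python
def encode_command(command_id: int) -> list[str]:
    """Encode one command ID as the pair of IPs: length first, then the command itself."""
    text = str(command_id)
    digits = list(text.encode("ascii"))
    if len(digits) > 2:
        raise ValueError("only one- or two-digit command IDs are handled here")
    length_ip = "199.0.0.{}".format(len(text) + 1)
    padded = digits + [0] * (3 - len(digits))
    command_ip = "43.{}.{}.{}".format(*padded)
    return [length_ip, command_ip]


def decode_indexes(values: list[int]) -> list[str]:
    """XOR each stored value with 248 and encode the resulting command IDs."""
    ips = []
    for value in values:
        ips.extend(encode_command(value ^ 248))
    return ips


# Placeholder values: 250 ^ 248 = 2 and 233 ^ 248 = 17.
for ip in decode_indexes([250, 233]):
    print(ip)
```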

I obtained a list of domains, and modified the code of the recompiled crackme, in order to emulate the appropriate responses to the DNS requests.

The list:

    public static void initDomainsList()
        DomainsList = new List<string>();
        DomainsList.Add(""); // Init id -> 1


The modifications in the domain retrieving function, in order to fetch the domain from the list instead of making a DNS query:

    // Token: 0x06000045 RID: 69 RVA: 0x00003820 File Offset: 0x0000A820
    public static bool DnsQuery(out byte[] r)
    {
        bool result = true;
        r = null;
        try
        {
            //IPHostEntry iphostEntry = Dns.Resolve(FLARE05.A);
            //r = iphostEntry.AddressList[0].GetAddressBytes();
            string domainStr = DomainsList[DomainIndex % DomainsList.Count];
            IPAddress ip = IPAddress.Parse(domainStr);
            r = ip.GetAddressBytes();

            Console.WriteLine("IP: {0}.{1}.{2}.{3}", r[0], r[1], r[2], r[3]);
            DnsClass._Try = 0;
        }
        catch
        {
            result = false;
        }
        return result;
    }

I also patched out some sleeps to speed up the execution, and added more logging. Then I ran my recompiled application, to verify if this is really the correct sequence to reach the flag decoding function.

WARNING: mind the fact that before running the application, it is required to remove all the previous files generated by the challenge, such as flare.agent.id etc, otherwise they will distort the sequence.

And it works! So it is confirmed that the list of the IPs is valid. Also, the composed string leads to a section in the original PE, so the previous assumptions were correct:

Found section where the RC4 encrypted flag is located

Now all we have to do is to feed the sequence of the DNS responses to the original app.

Obtaining the flag

In order to obtain the flag, we will use the original application and feed into it the list of the resolved IPs.

At first I thought about using some fake DNS, but finally I decided to just make a hooking DLL (based on MS Detours) and inject it into the original app. This is my implementation:


My app assumes that there is a simple fake DNS running, giving a dummy response for any queried IP. So, I am just replacing the content of this response with the IP from the list. The cleaner solution would be to construct the full fake response from scratch, and make it independent from a dummy response, but I had Apate DNS already running on my machine, and it was faster.

I injected the DLL into the executable using dll_injector:

And now we can watch the IPs queried, and just wait for the flag to be dropped…

At the same time we can see the domains being listed by ApateDNS, where they first reach:

After a while, this beautiful animated GIF is dropped to the TEMP, and popped out:

So, the task is solved!


Flare-On 9 – Task 9


Time for some crypto challenge:

You can find the package here: 09_encryptor.7z , password: flare

After unpacking the archive we see:

It is a 64-bit PE – “a ransomware”, plus a file encrypted by it, that needs to be recovered. So, we have an emulation of the ransomware decryption scenario.

I used to crack ransomware in the past, and I still find this kind of cryptanalysis task very enjoyable. As usual in such cases, two algorithms are used:

  1. symmetric, to encrypt a file (with a random key)
  2. asymmetric, to protect the generated random key

A flaw can be in one of the following:

  • how the random key is generated (was the strong random generator used?)
  • how the symmetric encryption is implemented (any implementation flaws making it weaker?)
  • how the asymmetric encryption is implemented
  • finally: are the algorithms applied correctly?

The task is written in C, and the code is pretty small, and focused on the main goal, so the analysis is easy.

The file is encrypted with ChaCha:

This version of ChaCha uses 32 byte key, and 12 byte nonce. The implementation of ChaCha seems correct. Also, for the generation of the key and nonce, a strong random generator is used (SystemFunction036). So at this point my guess is that the bug must be somewhere around the asymmetric algorithm.

After the file is encrypted, the buffer containing the key and nonce is encrypted with a private key from a newly generated keypair.

So, the 4 hex strings that we see at the end of the file are supposed to contain the following elements:

{RSA master public key - the hardcoded master public key}
{RSA generated public key - the public key from the generated keypair}
{RSA generated private key, protected by the RSA master public key}
{ChaCha key and nonce, protected by the RSA generated private key}

If everything is correct there, we need the RSA master private key, in order to decrypt the RSA generated private key, in order to decrypt the ChaCha key and nonce… Let’s take a closer look if it really is this way.

A good cheatsheet describing all the RSA building blocks is available here.

Snippet describing the parts related to RSA implementation:

int __fastcall encrypt_file_content_and_save_keys(FILE *out_file, FILE *in_file)
{
  __int64 v4; // rcx
  _DWORD *v5; // rdi
  __int128 *_key; // rdi
  __int64 i; // rcx
  _QWORD key_out_buf[17]; // [rsp+20h] [rbp+0h] BYREF
  __int128 key[2]; // [rsp+A8h] [rbp+88h] BYREF
  __int128 nonce[9]; // [rsp+C8h] [rbp+A8h] BYREF

  v4 = 34i64;
  v5 = key_out_buf;
  while ( v4 )
  {
    *v5++ = 0;
    --v4;
  }
  _key = key;
  for ( i = 34i64; i; --i )
  {
    *(_DWORD *)_key = 0;
    _key = (__int128 *)((char *)_key + 4);
  }
  SystemFunction036(key, 32u);
  SystemFunction036((char *)nonce + 4, 12u);
  chacha_encrypt(out_file, in_file, key, nonce);
  protect_by_assymetric_crypt(key_out_buf, key, RSA_d, RSA_n);
  print_in_hex_to_file(out_file, RSA_master_public_key);
  putc(10, out_file);
  print_in_hex_to_file(out_file, RSA_n);
  putc(10, out_file);
  print_in_hex_to_file(out_file, RSA_protected_gen_priv_key);
  putc(10, out_file);
  print_in_hex_to_file(out_file, key_out_buf); // protected ChaCha key
  return putc(10, out_file);
}
__int64 init_stuff()
__int64 rsa_p[17]; // [rsp+30h] [rbp-348h] BYREF
__int64 rsa_q[17]; // [rsp+B8h] [rbp-2C0h] BYREF
__int64 buf1_sub1[17]; // [rsp+140h] [rbp-238h] BYREF
__int64 buf2_sub1[17]; // [rsp+1C8h] [rbp-1B0h] BYREF
__int64 rsa_euler[17]; // [rsp+250h] [rbp-128h] BYREF
char RSA_generated_private[160]; // [rsp+2D8h] [rbp-A0h] BYREF
while ( !(unsigned int)is_prime((unsigned __int64 *)rsa_p) );
while ( !(unsigned int)is_prime((unsigned __int64 *)rsa_q) );
bignum_mul(RSA_n, (unsigned __int64 *)rsa_p, (unsigned __int64 *)rsa_q);
calc_sub1((unsigned __int64 *)buf1_sub1, (unsigned __int64 *)rsa_p);
calc_sub1((unsigned __int64 *)buf2_sub1, (unsigned __int64 *)rsa_q);
bignum_mul(rsa_euler, (unsigned __int64 *)buf1_sub1, (unsigned __int64 *)buf2_sub1);
calculate_d(RSA_d, RSA_d, rsa_euler);
return protect_by_assymetric_crypt(

I made a small loader for the original app, and hooked the functions with Detours (loader.cpp), in order to quickly log all their input and output parameters. At some point, I noticed something very suspicious: instead of the generated private key being provided to encrypt the generated ChaCha key, what was passed was the standard public exponent! So, in reality it is RSA signing.

To recover the “encrypted” content, all we have to do is to use the exponent 0x10001 (65537) as a private key.

For solving the final equation, I used the following online tool: https://www.boxentriq.com/code-breaking/rsa

By looking at the output we can see that it is in the correct format of key and nonce. However, we still need to reverse the bytes before using.
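
The same calculation can be done offline with Python’s built-in modular exponentiation. The sketch below is an illustration under stated assumptions, not the tool I used: it takes the protected value and the modulus as hex strings (as printed at the end of the encrypted file), raises the value to 0x10001 modulo n, and returns the result with the bytes reversed. The function name is hypothetical and the demo uses toy numbers, far too small for real RSA.

```python
def recover_with_public_exponent(cipher_hex: str, modulus_hex: str,
                                 exponent: int = 0x10001) -> bytes:
    """Compute c^e mod n and return the result in reversed (little-endian) byte order."""
    c = int(cipher_hex, 16)
    n = int(modulus_hex, 16)
    m = pow(c, exponent, n)
    size = (n.bit_length() + 7) // 8
    # "little" gives the byte-reversed form of the big-endian hex dump.
    return m.to_bytes(size, "little")


# Toy keypair (p = 61, q = 53): n = 3233, e = 17, d = 2753.
toy_n, toy_e, toy_d = 3233, 17, 2753
secret = 65
signed = pow(secret, toy_d, toy_n)  # "encrypted" with the private exponent
recovered = recover_with_public_exponent(format(signed, "x"), format(toy_n, "x"), toy_e)
print(recovered)
```

With the real data, the recovered buffer would then have to be split into the ChaCha key and nonce according to the layout used by the program.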

Now in order to decode the file content, we can just rename the file to “.EncryptMe” and we can set a breakpoint after the key and nonce are generated, to replace them in memory.

And we get the original content decrypted:


The flag is:


Flare-On 9 – Task 10


Flare-On Task 10 was related to emulation of an old Macintosh machine, based on Motorola 68000 processor.

You can find the package here: 10_Nur_getraumt.7z , password: flare

We are provided with the disk image, containing the application that we need to reverse.

The first step was to prepare the emulator. As the author of the task suggested, I used Mini vMac. However, this solution doesn’t just work out of the box. We need to provide it with a ROM image (more info here), which is not included on the website. Fortunately, after some googling around I found a GitHub repository whose owner was kind enough to make their ROM available, along with other utilities to be used with vMac:


After running the emulator with the ROM, it was necessary to install the OS. Fortunately, System 6.0.8 was provided on the vMac site (“SSW_6.0.8-1.4MB_Disk1of2.sea.bin” and “SSW_6.0.8-1.4MB_Disk2of2.sea.bin”). We just need to unpack it with the provided tool (ua608d) and then we can drag-and-drop it onto the running emulator window. That’s how we get the working system.

Once the system is up and running, we also need to mount the disk with our challenge. We can do it by drag-and-drop as well, but first we need to rename the file to an ANSI name (I used “chall.img”). And it works! We can see the original compiled application, that is the challenge, and also ResEdit, with which we can view particular elements of the challenge.

We can see that one of the resources contains our encrypted flag:

As the description suggests, the flag should be viewed in hex. Fortunately, ResEdit provides this option in the menu.

This is how the flag looks when viewed in hex:

The task description hints that the challenge is somehow related to the music of those times. And indeed, the name of the resource points to the song “99 Luftballons” by the German singer Nena.

Walking through the various elements displayed in ResEdit, we can also see the code of the application. It is displayed in a built-in disassembler.

The function is pretty short, and it can be reimplemented knowing some 68000 assembler basics (e.g. by following this manual). We can see an EOR (Exclusive OR logical) instruction, which is an equivalent of XOR. At this point we can guess that the flag may be obfuscated with a XOR-based algorithm.

But before jumping to the implementation, I wanted a less error-prone way to understand this unfamiliar code. This was easier to achieve than I expected: it turns out that Ghidra provides a built-in disassembler for this architecture.

But first I needed to carve out the application from the whole image. I installed a hexeditor on the emulator:

… and checked how the program starts.

Then I opened the whole image in a hexeditor on my host machine, and searched for those patterns. Carved out the whole app, and opened it in Ghidra.

We need to go to the beginning of our decoding function, and make Ghidra disassemble it:

And great, we see the same code as we could preview in ResEdit, so it means everything is ok. Plus, there is another view, with this code decompiled.

This is how the decompiled function looks – much clearer, isn’t it?

We can see that the first byte of the string is its size. The last WORD, after the character buffer, is a CRC16 of it.
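The layout described above (a size byte, the characters, then a trailing WORD) can be sketched in a few lines of Python. This is only an illustration: the function name and the sample bytes are made up, the trailing WORD is read as big-endian because the 68000 is a big-endian CPU, and the CRC16 itself is not recomputed here since its exact variant is not given in this post.

```python
import struct


def split_length_prefixed(blob: bytes):
    """Split a buffer into (characters, trailing 16-bit word).

    Assumed layout: [size byte][size characters][2-byte word].
    """
    size = blob[0]
    body = blob[1:1 + size]
    # ">H" = big-endian unsigned 16-bit value (assumption: 68000 byte order)
    (trailer,) = struct.unpack_from(">H", blob, 1 + size)
    return body, trailer


# Made-up sample data, not the real resource from the challenge.
sample = bytes([5]) + b"hello" + bytes([0x12, 0x34])
body, trailer = split_length_prefixed(sample)
print(body, hex(trailer))
```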

I reimplemented the whole algorithm in C, to test my assumptions (snippet here). All good… so it turns out to be a simple XOR with the supplied key!

As we know, the flag will end with @flare-on.com – so this is what we need to use to XOR the ending of the provided encrypted string.
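The known-plaintext step can be sketched as follows. This is a minimal illustration, assuming a repeating-key XOR; the key and the plaintext below are invented for the demo and are not the challenge data. XORing the end of the ciphertext with the known suffix reveals the key bytes that were used at those positions.

```python
KNOWN_SUFFIX = b"@flare-on.com"


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings of equal length."""
    return bytes(x ^ y for x, y in zip(a, b))


def recover_key_tail(ciphertext: bytes, known_suffix: bytes = KNOWN_SUFFIX) -> bytes:
    """Return the key bytes that encrypted the last len(known_suffix) bytes."""
    return xor_bytes(ciphertext[-len(known_suffix):], known_suffix)


# Toy data (hypothetical key and flag), encrypted with a repeating-key XOR.
toy_key = b"an example key"
toy_plain = b"toy_flag_value@flare-on.com"
toy_cipher = bytes(p ^ toy_key[i % len(toy_key)] for i, p in enumerate(toy_plain))

key_tail = recover_key_tail(toy_cipher)
print(key_tail)
```

The recovered fragment is a window into the repeating key, so if it looks like readable text, the rest of the key can often be guessed from context.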

This is a part of the first line of the lyrics of Nena’s song “99 Luftballons”!

Hast du etwas Zeit für mich?
Dann singe ich ein Lied für dich

Nena – “99 Luftballons”

If we write the full line, we get:

Which is the second line of the same song. We can fill the missing characters in, and submit the flag.

And that’s all for the task 10!

BTW, the title of the challenge (“Nur geträumt” – “Just a dream”) is also a reference to a Nena song.


IDA tips: how to use a custom structure

Applying custom structures makes the result of decompilation much more readable.

This is how the same fragment of the code looks before and after the proper structures are applied:



In this short post, I will demonstrate how to add custom structure definitions into IDA, on the example of a PE structure.

Creating the structure

My definition of PE file structure is available here.

Note that some of the data types that we would normally use when writing C/C++ code on Windows are not available in IDA, and other types may be defined a bit differently. For example, types such as WORD and DWORD from windows.h are defined in IDA, but with a “_” prefix:

 _WORD e_res2[10];
 _DWORD e_lfanew;

Adding the structure into IDA

With the help of the following steps, we can add the custom structure into IDA.

1 – First we need to open the subview “local types” where all such definitions are stored:

2 – We click on “Insert…”

3 – The window for the new definition opens. We can paste there our custom structure.

4 – After we pasted and clicked OK, the new types should appear on the list.

Using the custom structures

Now our custom structures are ready to be used!

Whenever we find a variable that has that type, we can convert it to our custom structure. For example:

1 – Select the variable that you want to convert:

2 – Select the structure from the list:

Sometimes you may need to manually refresh the decompiler view, by pressing F5.

And it’s ready!

Note, that although PE header was used here as an example, some of the common structures (including this one) are already predefined in IDA, and can be referenced by their names.


Python scripting for WinDbg: a quick introduction to PyKd

PyKd is a plugin for WinDbg that allows running Python scripts. It can be very helpful, e.g. for tracing and deobfuscation of obfuscated code. In this small tutorial I will demonstrate how to install it and make everything work.


Download and install the PyKd.dll

I assume that we already have a WinDbg installed. First we need to download PyKd DLL. Ready made builds are available in the project’s repository:


The package contains two versions of the DLL: 32-bit and 64-bit. We need to use the version appropriate to the bitness of our WinDbg installation (I assume 64-bit).

First we create a directory where we will store plugins for WinDbg. For example: “C:\windbg_ext”. We drop there the pykd.dll.

Then we need to set the path to this directory in an environment variable (_NT_DEBUGGER_EXTENSION_PATH), so that WinDbg can find it.

Install Python and pykd Python library

We need to have Python installed, as well as Pip. I have chosen the latest Python installer from the official page.

Now let’s install Pip. A detailed guide on how to do it is presented here. I have chosen to download the script get-pip.py, and run it with the previously installed Python. The installed pip (example):

The next step is to install the pykd Python library via Pip (from command prompt):

pip install pykd

Testing PyKd

If all the above steps succeeded, our PyKd is ready to be deployed. In order to test it, we will run WinDbg, and attach to some process (e.g. notepad).

First, let’s load the PyKd extension:

.load pykd

If it is loaded, we can see its commands by using help:


If we have multiple versions of Python installed, the latest one will be set as default, but yet it is possible to switch between them.

Once the PyKd extension for WinDbg (PyKd.dll) is loaded, we can run the python command prompt and check if the PyKd library for Python is available. We run the prompt by:


Now we can issue:

import pykd

And test by issuing some WinDbg command via PyKd:

print(pykd.dbgCommand("<any WinDbg command>"))


The results of the command are printed with the help of Python print. After the test we can exit the console by issuing:


Running scripts

If we get the results as above, everything is installed and ready. Now, instead of running the python commands from the WinDbg command prompt, we can save them as a script: test.py, and run by giving the path to the script. Example:

!py C:\pykd_scripts\test.py

We can also pass arguments to our script. Demo given below.

Content of the “test.py”:

import pykd
import sys

for i in range(1, len(sys.argv)):
    print('arg[', i, '] = ', sys.argv[i])




Flare-On 8 – Task 6

Flare-On is an annual “reverse engineering marathon” organized by Mandiant (formerly by FireEye). You can see more information here. It is a Capture-The-Flag type of contest, where you are given a set of crackmes of growing difficulty. This year we were provided with 10 tasks. I finished in 125th place. In this series of writeups I will present my solutions to the selected challenges, and guide you through the task, all the way to the final flag.

The description of the challenge 6:

Download: 06_PetTheKitty.7z (password: flare)

In this task we are given a PCAP file.

I opened it in Wireshark and followed the TCP streams.

There are two streams, first of them consists of a request, followed by a longer response, containing a PNG:

Another contains many shorter packets, requests and responses:

We can see the keyword “ME0W”, but also “PA30” repeating…

PA30 is a patch format, introduced in Windows Vista, and called Intra-Package Delta (IPD). More information about it can be found in the following blog, along with a Python script, delta_patch.py, that can be used for applying the patches.

First I extracted the components from the first stream. As we saw at the first sight, the response contains a PNG. At the end of the PNG we can see an ASCII art:

A PA30 patch follows after.

In order to separate them correctly, we need to understand the headers of the “ME0W” packet:

4d 45 30 57  d0 24 0a 00  d0 24 0a 00 | ME0W .$.. .$..

The header contains the magic number “ME0W” followed by two DWORDs, denoting the size of the data repeated twice, and then the data buffer.
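The header layout described above can be parsed with a short Python sketch. The byte order (little-endian) follows from the sample header, where d0 24 0a 00 reads as 0x000a24d0; the function names are hypothetical, and using the first of the two size fields as the data length is an assumption that only holds as long as both fields are equal, as in the header shown.

```python
import struct

MAGIC = b"ME0W"
HEADER = struct.Struct("<4sII")  # magic, size, size (little-endian DWORDs)


def parse_header(header: bytes):
    """Return the two size fields of a 12-byte ME0W header."""
    magic, size_a, size_b = HEADER.unpack(header)
    if magic != MAGIC:
        raise ValueError("not a ME0W header")
    return size_a, size_b


def split_chunks(stream: bytes):
    """Split a reassembled TCP stream into the data buffers it carries."""
    chunks, pos = [], 0
    while pos + HEADER.size <= len(stream):
        size_a, _size_b = parse_header(stream[pos:pos + HEADER.size])
        pos += HEADER.size
        chunks.append(stream[pos:pos + size_a])
        pos += size_a
    return chunks


# The header bytes quoted in the text:
sizes = parse_header(bytes.fromhex("4d453057d0240a00d0240a00"))
print([hex(s) for s in sizes])
```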

After extracting the data buffers, we get two elements listed below (along with their MD5 hashes):

2c691262493ceaaa5de974adab36ed69  cat.png
440c49962f81e3d828ddcc3354c879c9  patch.p30

The PNG:

The image looks valid and looks very innocent, but after applying the patch it will change completely…

I guessed that the patch from this stream must be used along with the given PNG. I applied it with the help of the following command:

delta_patch.py -i cat.png -o out.bin patch.p30

The output turned out to be a DLL:

By looking closer at the code we realize that this is the “malware” responsible for generating the further communication. It connects to the URL that was referenced in the PCAP:

In order to understand how to decode the rest of the PCAP, we need to check how the received data is processed. The relevant fragment of the code:

It turns out to be fairly simple. First the data is decoded by being applied as a patch on an empty buffer. Then, the output is XORed with a hardcoded key “meoow”.

Applying of the patch is done by the same function as was used before (to decode the DLL from the picture) – ApplyDeltaB:

Now we can decrypt the rest of the traffic following this pattern. First we need to apply the patch on a buffer filled with 0s, and then XOR the output with the key.
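The second half of this pattern, the XOR with the hardcoded key, is easy to sketch in Python. The delta-patch step is deliberately left out here, because it relies on the Windows delta API (ApplyDeltaB) or on the delta_patch.py script mentioned earlier; the function name below is made up, and the input stands for the buffer obtained after the patch was applied.

```python
from itertools import cycle

XOR_KEY = b"meoow"  # the hardcoded key mentioned in the text


def xor_with_key(data: bytes, key: bytes = XOR_KEY) -> bytes:
    """XOR a buffer with a repeating key (the same call encodes and decodes)."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


# Round trip on made-up data, standing in for a patched buffer.
encoded = xor_with_key(b"whoami")
print(xor_with_key(encoded))
```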

We can see the decrypted traffic contains some exfiltrated data from a victim machine. Among this data there is a listing containing the flag:



Flare-On 8 – Task 7

Flare-On is an annual “reverse engineering marathon” organized by Mandiant (formerly by FireEye). You can see more information here. It is a Capture-The-Flag type of contest, where you are given a set of crackmes of growing difficulty. This year we were provided with 10 tasks. I finished in 125th place. In this series of writeups I will present my solutions to the selected challenges, and guide you through the task, all the way to the final flag.

The task 7 comes with the following intro:

Download: 07_spel.7z (password: flare)

The attached file is a Windows executable, 64-bit.

When we run the application, the following window pops up:

At the beginning I wasn’t sure if the task runs correctly on my system. But I decided to trace it with Tiny Tracer to see what happens.

Watching the tracelog in real time with the help of Baretail, I noticed that when I closed the window, something got unpacked in memory and executed. Relevant fragment of the log:

	Arg[0] = ptr 0x00007ffa8a8f0000
	Arg[1] = ptr 0x00007ff7094dc0b0 -> "VirtualAllocExNuma"

17972f;called: ?? [1747b490000+0]
> 1747b490000+1cd;ntdll.LdrLoadDll
> 1747b490000+1f2;ntdll.LdrGetProcedureAddress
> 1747b490000+218;ntdll.LdrGetProcedureAddress
> 1747b490000+23d;ntdll.LdrGetProcedureAddress
> 1747b490000+263;ntdll.LdrGetProcedureAddress
> 1747b490000+289;ntdll.LdrGetProcedureAddress
> 1747b490000+2ae;ntdll.LdrGetProcedureAddress
> 1747b490000+2d4;ntdll.LdrGetProcedureAddress
> 1747b490000+377;kernel32.GetNativeSystemInfo
> 1747b490000+3c0;kernel32.VirtualAlloc
> 1747b490000+648;kernel32.LoadLibraryA
	Arg[0] = ptr 0x00000001800152b6 -> "KERNEL32.dll"

> 1747b490000+6ad;ntdll.LdrGetProcedureAddress
> 1747b490000+6ad;ntdll.LdrGetProcedureAddress
> 1747b490000+6ad;ntdll.LdrGetProcedureAddress
> 1747b490000+6ad;ntdll.LdrGetProcedureAddress
> 1747b490000+6ad;ntdll.LdrGetProcedureAddress
> 1747b490000+6ad;ntdll.LdrGetProcedureAddress

We can see that it uses a function VirtualAllocExNuma to allocate memory:


Then, something is loaded into this memory and executed (the entry point at offset 0 suggests that it is a shellcode, not a PE):

17972f;called: ?? [1747b490000+0]

Next, we can see the functions executed from inside of the shellcode (prepended with “>“):

> 1747b490000+1cd;ntdll.LdrLoadDll
> 1747b490000+1f2;ntdll.LdrGetProcedureAddress
> 1747b490000+218;ntdll.LdrGetProcedureAddress

We can see that it loads multiple imports (using LdrGetProcedureAddress). This suggests that this shellcode is yet another loader (possibly for a PE payload).


The previous experiment showed that the executable is packed. So, I decided to unpack it with the help of mal_unpack (one of the tools from the PE-sieve family). Since manually closing the window is required in order to trigger payload unpacking, I ran mal_unpack with the following commandline (infinite timeout):

mal_unpack.exe /timeout 0 /exe spel.exe

And then I closed the window.

Some DLLs got dumped.

Shellcode, as well as one of the DLLs seems to be nothing but the next stage loaders.

However, I noticed among them an interesting DLL with one function exported:

Unfortunately, the relocation table of this DLL was removed:

Data Directory view shows that the Relocation Table is cut out

Due to this fact, it could not be used as a standalone DLL.

Manual reconstruction of a relocation table is difficult, and sometimes even impossible. But I got an idea that maybe I can still find a raw copy of this DLL, with the relocation table intact. So I scanned it again, this time with an option /data 3 to dump also PEs from non-executable memory.

mal_unpack.exe /timeout 0 /exe spel.exe /data 3

This time more DLLs were dumped.

One of them was indeed a raw copy of the DLL I was looking for – this time with a valid relocation table.

Now, all I needed to do was to remove padding of the dumped file. I did it with PE-bear:

PE-bear: removing the padding at the end of the dumped DLL

And the DLL is ready to be run… I just renamed it to its original name ldr.dll.

Tracing the DLL and writing a loader

I decided to trace the found DLL with a TinyTracer. The DLL exports a function Start so I suspected this will be the function that should be called.

I set it in Tiny Tracer:

set DLL_EXPORTS="Start"

Then I executed tracing the DLL by Tiny Tracer.

Reading the trace log, I noticed the DLL tries to load some resource. The resource is supposed to be fetched from the main application. I added to the TinyTracer tracking of related parameters, and I saw what exactly is being loaded. It is a PNG (full trace log available here).

	Arg[0] = ptr 0x00007ff72e9e0000
	Arg[1] = 0x0000000000000080 = 128
	Arg[2] = ptr 0x0000005628ebf5e4 -> "PNG"

The relevant PNG is in the resources of the main application:

Interestingly, PE-bear fails to display it. It turns out other tools have the same problem. The content of the PNG is just invalid. I suspected that it will contain some encrypted data, possibly the flag.

The content of the PNG: possibly an encrypted buffer

Now we know that this PNG needs to be passed to the DLL. In order to do so, I saved the resources with PE-bear. The aforementioned PNG is in the file named: _1_429cc0.png.

Then, I created my own loader, that includes this PNG as a resource with identical name as the DLL requires. The code of the loader is available here. Now we can trace the execution of the ldr.dll via the prepared wrapper. We just need to change the traced module in the TinyTracer’s run_me.bat (as described here).

set TRACED_MODULE="ldr.dll"

The other thing that we can notice in the trace log is a SleepEx function. I also watched its parameter in Tiny Tracer:

	Arg[0] = 0x0000000000057e40 = 360000
	Arg[1] = 0

The sleep time turns out pretty long: 6 minutes. Fortunately we can overwrite it in TinyTracer (more info here).

Static analysis

I opened the DLL in IDA in order to analyze it statically. Overview of the Start function:

The decompiled code – final result of my analysis – is available here.

Used obfuscation

Most of the API functions are resolved by hashes, so the TAG file generated by TinyTracer came handy. I just applied tags on the IDA view (using IFL plugin), and the code became much more understandable. Example:

NOTE: this way of resolving API calls has some limitations: since the tags are generated during tracing, only the calls that were actually executed will be resolved. So we are still left with some hashes that are not mapped. Fortunately, a quick Google lookup shows that the hashing algorithm is well known, and there are already lists of common API functions with their corresponding hashes. This helped to find some more functions.

Not only the API calls are obfuscated, but also strings. Each used string is deobfuscated just before use, with the help of an inline XOR loop. Example:

Since the application doesn’t use many strings, I decided not to write any automatic solutions, but to resolve them manually under the debugger as I progress with the analysis.

Examining the checked conditions

There are some conditions that the DLL checks. For example, the executable must be named Spell.EXE, so I renamed my loader to this name.

After renaming my loader (and enabling sleep hooking in Tiny Tracer, as it was shown before), I traced it again. The produced log is available here. This time we can see something interesting: the application is trying to connect to the socket:

	Arg[0] = 0x0000000000057e40 = 360000
	Arg[1] = 0

	NtDelayExecution hooked. Overwriting DelayInterval: ffffffff296c5c00 -> fffffffffffe7960

	Arg[0] = ptr 0x000000c094952400 -> "ws2_32.dll"

	Arg[0] = ptr 0x000000c0949522e0 -> "user32.dll"
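The DelayInterval values in the trace above can be checked by hand. NtDelayExecution takes a 64-bit interval in 100-nanosecond units, where a negative value means a relative delay. The Python sketch below (the helper names are made up) converts in both directions: the original value corresponds to the 360000 ms passed to SleepEx, and the overwritten one to 10 ms.

```python
MASK64 = (1 << 64) - 1
UNITS_PER_MS = 10_000  # 1 ms = 10,000 units of 100 ns


def ms_to_delay_interval(ms: int) -> int:
    """Encode a relative delay as an unsigned 64-bit value (two's complement)."""
    return (-ms * UNITS_PER_MS) & MASK64


def delay_interval_to_ms(raw: int) -> float:
    """Decode a relative (negative) interval back to milliseconds."""
    signed = raw - (1 << 64) if raw & (1 << 63) else raw
    return -signed / UNITS_PER_MS


print(hex(ms_to_delay_interval(360000)))         # 0xffffffff296c5c00
print(delay_interval_to_ms(0xFFFFFFFFFFFE7960))  # 10.0
```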


I added tracking of the gethostbyname parameters, and I saw the address it is trying to connect to:

	Arg[0] = ptr 0x0000007a6ef5f9b0 -> "inactive.flare-on.com"

After checking more details under the debugger, I found out that it queries one of the two addresses: invalid.flare-on.com and invalid2.flare-on.com , trying to connect to the port 888. None of those addresses is active, so we have to somehow emulate this communication.

Once it connects to the C2, it sends a beacon “@” and is waiting for a command.

There are 3 commands available: “exe”, “run”, “flare.com”.

First two commands are used for running some received shellcode, or a PE file. Third of them leads to a function that seems to decrypt something…

Emulating the C2

One of the possible ways of emulating the communication, is to start a server locally, for example using netcat.

netcat -l -p 888

Then we can redirect the domain to it by editing the following file:


We need to create an entry that will cause the domain to be resolved as our localhost: inactive.flare-on.com
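As an alternative to netcat, the fake C2 could also be a few lines of Python. This is only a sketch under assumptions: the function names are hypothetical, the command is passed in as raw bytes (how the crackme expects it to be terminated or encoded is not covered here), and only a single connection is served.

```python
import socket


def make_listener(host: str = "127.0.0.1", port: int = 888) -> socket.socket:
    """Create a TCP listener; port 888 matches the port seen in the analysis."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    return srv


def serve_one(srv: socket.socket, command: bytes) -> bytes:
    """Accept one client, read its beacon, answer with a command."""
    conn, _addr = srv.accept()
    with conn:
        beacon = conn.recv(64)
        conn.sendall(command)
    return beacon


# Usage (blocks until the client connects):
#   srv = make_listener()
#   print("beacon:", serve_one(srv, b"<command>"))
#   srv.close()
```

A scripted server makes it easier to repeat the same session many times while tracing, compared with typing the command into netcat by hand.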

Running the binary again, we can see that indeed the crackme connects to our emulated C2, and sends the expected prompt:

Running the commands

As mentioned earlier, the third command (“flare.com”) looks interesting, because it leads to some decryption. We can run the prepared loader again, via TinyTracer, and watch the APIs called during the communication with the fake C2. I let it connect, then sent the command “flare-on.com”, while observing the trace log in real time to check what happens.

First, the BCrypt library is loaded, and it is used to decrypt some buffer. Relevant fragment:

	Arg[0] = ptr 0x000000726ce82180 -> "bcrypt.dll"


After that, some registry keys are set, and finally the function exits (execution goes back to the loader):


The full trace-log from this session is available here.

After adding the BCrypt functions to watched, and tracing again, we get some additional information:

	Arg[0] = ptr 0x0000001ec8de0b90 -> {...}
	Arg[1] = ptr 0x0000001ec8cbfc24 -> L"ChainingMode"
	Arg[2] = ptr 0x0000001ec8cbfc44 -> L"ChainingModeCBC"
	Arg[3] = 0x0000000000000020 = 32
	Arg[4] = 0x0000001e00000000 = 128849018880

	Arg[0] = ptr 0x0000001ec8de0b90 -> {...}
	Arg[1] = ptr 0x0000001ec8cbfc78 -> {...}
	Arg[2] = ptr 0x0000001ec8de5f00 -> {...}
	Arg[3] = 0x000000000000028e = 654
	Arg[4] = ptr 0x0000001edb9c0000 -> "d41d8cd98f00b204e9800998ecf8427e"
	Arg[5] = 0x0000000000000020 = 32
	Arg[6] = 0

	Arg[0] = ptr 0x0000001ec8de5f00 -> L" "
	Arg[1] = ptr 0x00007ff6ebae610f -> {\xd7\xfb~b\x8d\xab\x87e\xcdq\x85\xceS\x0fZ\x8c-\x8aE7\x12Ky\x1d@\xdav\x86&\xd3\xd3r}
	Arg[2] = 0x0000000000000020 = 32
	Arg[3] = 0
	Arg[4] = ptr 0x0000001ec8cbfca8 -> {...}
	Arg[5] = 0x0000000000000010 = 16
	Arg[6] = ptr 0x0000001ec8cbfc88 -> {\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00}
	Arg[7] = 0x0000000000000020 = 32
	Arg[8] = ptr 0x0000001ec8cbfc80 -> {...}
	Arg[9] = 0x0000001e00000000 = 128849018880

We can spot that the content of the PNG file gets decrypted. Buffer:


Is the same as the content of the previously reviewed PNG:

The used algorithm is AES in CBC mode, with the key generated from the string: “d41d8cd98f00b204e9800998ecf8427e”.
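A side observation about this key string: it is the well-known MD5 digest of empty input, written as hex. The short Python check below confirms this, and also shows that the 32 ASCII characters give exactly the 32 bytes reported as the secret length in the trace. How exactly the crackme derives the string is not claimed here; the sketch only verifies these two properties.

```python
import hashlib

KEY_STRING = "d41d8cd98f00b204e9800998ecf8427e"

# MD5 of zero bytes of input, as a hex string
empty_md5 = hashlib.md5(b"").hexdigest()

# The ASCII form of the string is 32 bytes long (0x20 in the trace)
key_bytes = KEY_STRING.encode("ascii")

print(empty_md5 == KEY_STRING, len(key_bytes))  # True 32
```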

If we follow those functions under the debugger, we can see the aforementioned decryption:

Before decryption

…and the string that we got as the result of it:

After decryption

Later, this buffer is rewritten, with the suffix “flare-on.com” (typical for the flag) appended:

The string didn’t make much sense, but at least it is ASCII, so I thought it may be a flag. I tried to submit it, however, it turned out invalid. So I had to dig deeper.

I noticed this string is being XORed, scrambled, and the result is written into Windows Registry:

The function responsible for scrambling:

I decided to clear the buffers that are used for the XOR operations. The buffers:

C3 C1 A8 06 C2 96 33 00 00 00 00 00 00 00 00 00 8A 1D 89 15 14 9F C1 1D 99 7E 8A 1B 00 00 00 00

E2 A4 B7 A7 D7 AC 87 8D 9B 9C 85 0D D8 8E E5 FA

…were set to all 0s under the debugger:

As the result the valid flag was saved in the registry:


Best spell checker ever… This time the flag makes sense, moreover, it passes the verification!


Flare-On 8 – Task 9

Flare-On is an annual “reverse engineering marathon” organized by Mandiant (formerly by FireEye). You can see more information here. It is a Capture-The-Flag type of contest, where you are given a set of crackmes of growing difficulty. This year we were provided with 10 tasks. I finished in 125th place. In this series of writeups I will present my solutions to the selected challenges, and guide you through the task, all the way to the final flag.

The 9th task is named “evil”, and the description says:

Download: 09_evil.7z (password: flare)

As mentioned, it comes with several false flags, so we need to watch out!

It is a Windows executable, 32-bit.

Overview and understanding the goal

Running the task doesn’t give us much information, because no output is displayed.

Opening it in IDA shows that the code is obfuscated: we can see some invalid chunks in between of code:

Due to this, IDA can neither decompile it, nor create graphs.

If we load it under x64dbg, we can see that the application keeps throwing exceptions:

We can step through them, and finally it reaches a far return:

Far returns are often used in Heaven’s Gate technique. However, here it is not the case, and the presence of it doesn’t make much sense. So it indicates that probably the debugger was detected and we went into a wrong execution path.

We can try once again, by setting x64dbg to ignore the exceptions:

Now, the debugger won’t stop at the exceptions, but it doesn’t help much: the application will soon terminate.

The next thing I did was tracing it with TinyTracer. Some trace is being produced, but again it breaks at the invalid far return:

It happens at the same RVA as the debugger showed before: 0x2F14. Once again in x64dbg, we can see the path that led to that invalid instruction:

Patching (#1)

A simple patch can help avoid going this way: NOPing out the conditional jump:


RVA: 2fb5 -> NOP

Tracing the patched application

The above patch finally caused the trace to go much further.

Yet, it is worth noting that not all my attempts at tracing gave the same results: in some it was clear that the application terminates prematurely. This made me guess that the defensive checks are somehow randomized. This was later confirmed with static analysis, and will be described further in this blog.

Not seeing the application read any input, I tried to trace it with a commandline argument (I used “Test123”). This turned out to be a good idea, as we could observe in the trace that the execution goes further. I obtained the following log: log1.tag.

The application terminates soon, yet, towards the end of the log, we can see some interesting calls, related to socket creation:


Seeing it, I suspected that opening of the socket has failed. I traced it again, but this time with tracking parameters of those functions.

Relevant fragments of the trace show that the commandline argument was used as a socket address:

	Arg[0] = ptr 0x00755000 -> "Test123"

Then, by checking the arguments passed to the function socket, we can see that the created socket is of the type raw, and dedicated to UDP communication:

	Arg[0] = 0x00000002 = 2 // AF_INET
	Arg[1] = 0x00000003 = 3 // SOCK_RAW
	Arg[2] = 0x00000011 = 17 // IPPROTO_UDP
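The numeric arguments can be translated back into names with the constants from Python’s socket module, as a quick sanity check of the annotations above. The helper name is made up, and the values of such constants are platform-specific in general; the three used here (2, 3, 17) are the same on Windows and on the common Unix-like systems.

```python
import socket


def describe_socket_args(family: int, sock_type: int, protocol: int):
    """Map the numeric arguments of socket() to their symbolic names."""
    # Build a reverse lookup for protocol numbers from the module constants.
    protocols = {
        getattr(socket, name): name
        for name in dir(socket)
        if name.startswith("IPPROTO_")
    }
    return (
        socket.AddressFamily(family).name,
        socket.SocketKind(sock_type).name,
        protocols.get(protocol, str(protocol)),
    )


names = describe_socket_args(2, 3, 17)
print(names)
```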

Since the application will be opening a raw socket, it needs to be run as an Administrator.

I changed the commandline argument to “”, and traced it again, this time as an Administrator. The following alert shows up:

This time the application runs further. In the log we can see the calls to other functions related to the socket:


Fragments of the trace with added parameters tracking:

	Arg[0] = ptr 0x00b233a8 -> ""

	Arg[0] = 0x00000002 = 2
	Arg[1] = 0x00000003 = 3
	Arg[2] = 0x00000011 = 17

	Arg[0] = 0x0000028c = 652
	Arg[1] = ptr 0x008bf9b4
	Arg[2] = 0x00000010 = 16

	Arg[0] = 0x0000028c = 652
	Arg[1] = 0x98000001 = 2550136833
	Arg[2] = ptr 0x008bf9dc
	Arg[3] = 0x00000004 = 4

	Arg[0] = 0x0000028c = 652
	Arg[1] = 0x0000ffff = 65535
	Arg[2] = 0x00001006 = 4102
	Arg[3] = ptr 0x008bf9c8
	Arg[4] = 0x00000004 = 4

	Arg[0] = 0x00000002 = 2
	Arg[1] = 0x00000003 = 3
	Arg[2] = 0x00000011 = 17

	Arg[0] = 0x00000290 = 656
	Arg[1] = 0
	Arg[2] = 0x00000002 = 2
	Arg[3] = ptr 0x008bf9dc
	Arg[4] = 0x00000004 = 4

	Arg[0] = 0x0000028c = 652
	Arg[1] = ptr 0x00b753f0
	Arg[2] = 0x000005dc = 1500
	Arg[3] = 0

The other important thing is that the socket expects a buffer with a maximum length of 1500 bytes:

	Arg[0] = 0x0000028c = 652
	Arg[1] = ptr 0x00b753f0 // buffer pointer
	Arg[2] = 0x000005dc = 1500 // buffer length
	Arg[3] = 0

At this point we can suspect that this buffer is the input of our crackme that will take part in obtaining the flag. For communicating with the socket, we can use nping. Example:

nping --udp -p 1234 --dest-ip -c 1 --data [test_data:in hex]
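For quick experiments, a similar datagram could be sent from Python. This is a sketch under assumptions: the function name is hypothetical, the destination address and port are placeholders to be filled in by the reader, and an ordinary UDP socket is used in place of nping, so it was not verified here that the crackme’s raw socket handles such a packet in the same way.

```python
import socket


def send_udp_hex(dest_ip: str, dest_port: int, payload_hex: str) -> int:
    """Send one UDP datagram whose payload is given as a hex string.

    Returns the number of payload bytes sent.
    """
    payload = bytes.fromhex(payload_hex)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (dest_ip, dest_port))


# Usage (placeholder address and data):
#   send_udp_hex("<dest-ip>", 1234, "41424344")
```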

But understanding what exactly should be filled into the sent buffer requires some code deobfuscation…

Self-modifying code

I decided to run the crackme again (as an Administrator, with the argument “”), and scan it with PE-sieve/HollowsHunter.


hollows_hunter.exe /pname evil.exe /hooks /imp A

Dumped material:

It turns out that the dumped executable contains a lot of in-memory patches. Basically, the application patches itself as it goes.

Dumping it with the option /imp A gave a sample with a recreated Import Table. This can make a static analysis a bit easier, as (at least some) of the dynamic calls are now replaced with static imports. The other calls, that could not be deobfuscated this way, can be added to IDA by loading the trace log (.tag) via IFL plugin.

The Import Table recreated by PE-sieve

Hooked functions

In advapi32.dll

The dumped material also shows us that advapi32.dll has been hooked. The hook is at the beginning of the function CryptImportKey and it redirects to the crackme. The relevant TAG file (from the dump):


Looking at the hook target in IDA we can see the following trampoline function:

Its role is very simple: if CryptImportKey was called with the parameter CALG_SEAL, it will be changed to CALG_RC4. This suggests that the crackme is going to use RC4 to decrypt something (possibly the flag).
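For reference, the effect of the hook and the cipher it selects can be sketched in Python. The two ALG_ID values are the ones defined in the Windows headers (wincrypt.h); the function names are made up, and the RC4 below is a plain textbook implementation that can be used to reproduce a decryption offline, not code taken from the crackme.

```python
CALG_RC4 = 0x6801
CALG_SEAL = 0x6802


def redirect_alg_id(alg_id: int) -> int:
    """Mimic the hook: replace CALG_SEAL with CALG_RC4, keep other values."""
    return CALG_RC4 if alg_id == CALG_SEAL else alg_id


def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4: key scheduling followed by keystream XOR."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


# Widely published test vector: key "Key", plaintext "Plaintext"
print(rc4(b"Key", b"Plaintext").hex())  # bbf316e8d940af0ad3
```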

In ntdll.dll

There are also patches in ntdll.dll. The relevant TAG file:


The first patch disables the function DbgBreakPoint (a function that breaks into the kernel debugger):

The other patch is set at the beginning of the function DbgUiRemoteBreakin – a function used by a debugger to break into a process. Due to the patch, calling this function causes immediate process termination (function TerminateProcess).

Both of those patches are part of the defensive techniques of the crackme.

Flow modified by exceptions

If we apply the tracelog on the crackme, we can clearly see the points in the code where each exception has been thrown. Such points are represented as calls to the Exception Dispatcher (ntdll.KiUserExceptionDispatcher).

Exception: attempt to read a NULL pointer – view from original binary

The log also shows that soon after an exception, some API call has occurred, although in the original executable this part of the code is invalid. From this observation we can assume that the exception handler somehow overwrote the invalid bytes, and caused the API call instead.

When we apply the same tracelog to the dumped version of the binary, we can see exactly what the written patch looks like. Now only one invalid byte is left, and the rest of them have been replaced with CALL EAX:

View from the dumped binary

The full code of the application is sprinkled with various instructions like this, which intentionally cause exceptions.

If we look again into the trace log, we can see that at the beginning of the execution the VEH is being registered. So, when the aforementioned exception is thrown, it is handled by VEH (Vectored Exception Handler). Let’s have a look in IDA:

The function added as a handler:

The exception handler responsible for patching the code

The exception handler fetches values of the registers (ECX, EDX) from the exception context. It passes them to the function that is responsible for resolving address of the API to be called (fetch_by_hash). The obtained address is then stored into EAX of the exception context. After that, we can see the code patching. First, the memory protection at the point where exception was thrown, is set to writable. Then, at EIP + 3 (3 bytes after the point of the exception) the patch is being made: CALL EAX is written. As we know, the EAX contains now the address of the API, so this is what will be called here. The EIP of the exception is set to point to this line, so this will be the next instruction after the exception handler finishes.

Aligning the instructions

The instructions generating the exception (i.e. div eax) are 2 bytes long, while the patch is created with 3 bytes offset. Due to this fact, between the instruction causing the exception, and the newly written CALL EAX there is a trash byte.

Trash byte between the line causing the exception, and the written call

This trash byte destroys the alignment of the instructions, and causes problems to IDA in interpreting the code that follows after (by default it is interpreted as data, and we need to change it manually each time).

In order to fix the alignment, I decided to patch the handler, and make it write aligned instructions. However, the space in the code was too small for making appropriate assembly modifications. So I decided to rewrite the full exception handler, and then hook the function AddVectoredExceptionHandler so that it will set my own version instead of the original one. For hooking I used MS Detours (with my template), but any sort of hooking engine will do the job.

The snippet below shows the modified handler:

LONG __cdecl my_patch_some_code(struct _EXCEPTION_POINTERS *ExceptionInfo)
{
    struct _EXCEPTION_POINTERS *except_ptr; // esi
    PCONTEXT v2; // eax
    int edx_val; // edi
    int ecx_val; // ebx
    DWORD new_eax; // edi

    except_ptr = ExceptionInfo;
    v2 = ExceptionInfo->ContextRecord;
    edx_val = v2->Edx;
    ecx_val = v2->Ecx;

    // resolve the address of the API to be called
    new_eax = resolve_func(edx_val, ecx_val);
    if (!new_eax) {
        return 0; // EXCEPTION_CONTINUE_SEARCH
    }

    // make the patched area writable (the old protection is stored in ExceptionInfo)
    VirtualProtect((LPVOID)(except_ptr->ContextRecord->Eip-2), 0x1000u, 0x40u, (PDWORD)&ExceptionInfo);
    except_ptr->ContextRecord->Eax = (DWORD)new_eax;

    *(WORD *)(except_ptr->ContextRecord->Eip + 2) = 0x9090;// NOPs (the second one is overwritten below)
    *(WORD *)(except_ptr->ContextRecord->Eip + 3) = 0xD0FF;// CALL EAX

    except_ptr->ContextRecord->Eip += 3; // resume at the written CALL EAX
    VirtualProtect((LPVOID)(except_ptr->ContextRecord->Eip-2), 0x1000u, (DWORD)ExceptionInfo, (PDWORD)&ExceptionInfo);
    return -1; // EXCEPTION_CONTINUE_EXECUTION
}

As we can see in the above code, I replicated the original handler with just one difference: I added a NOP instruction before CALL EAX. This is enough to achieve the main goal: aligning the code. But I decided to still improve it a bit…

The instructions that cause exceptions to be thrown are diversified. Sometimes we can see it is an attempt to read from a NULL address, sometimes a division by 0, and so on. It will be a bit cleaner if we can replace them with only one type: for example by the “read from the NULL address”. So I modified my hook so that it will also replace this part:

// change all exceptions to follow the same pattern:
if (*(WORD *)(except_ptr->ContextRecord->Eip) != 0x008B) {
  *(WORD *)(except_ptr->ContextRecord->Eip - 2) = 0xC033;// xor  eax, eax
  *(WORD *)(except_ptr->ContextRecord->Eip) = 0x008B;// mov  eax, [eax]
}

The code of the full DLL patching the crackme is available here.

It can be injected into the crackme with the help of dll_injector:

The above example shows the most classic way of hooking. Yet, at the time when I was solving this task, I wanted to do multiple experiments and many quick changes in the hooks. So, instead of running the evil.exe in a separate process, and hooking it by injecting a DLL, I wanted something faster: all-in-one loader. The code is available here. This loader requires that first we convert the evil.exe into a DLL, by EXE_to_DLL. Then, we just load this DLL within the current process, which hooks itself.

Now, the new handler will produce properly aligned instructions: the trash byte has been replaced with a NOP.

However, we need to keep in mind that it modifies the code only as it goes: it will patch only the branches that have been executed. So, the others are still not cleaned. Yet, it is enough to get a decent overview of the code, and the few branches that haven’t been taken can be cleaned later by manual patching (or by an IDA script). Also, by sending various data to the socket, we can cause more branches to be taken, so that more code will be cleaned.

After running the crackme for a while, with the hooked handler, we can dump it again from the memory by PE-sieve, to get the modified version.

Now IDA has no problem with interpreting the modified part of the code:

The dumped version of the app, with the TAGs from the Pin tracing session applied

Understanding the decompiled code

If we managed to get rid of all trash instructions in a certain function, it becomes possible to decompile the code. This makes analysis a lot easier.

We know that the application uses a raw socket, so the buffer that is received by recvfrom contains IPv4 headers, as well as UDP headers (not stripped). Filling those structures in IDA can make interpretation a lot easier.

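struct ip_v4
{
  _BYTE ver_and_IHL;
  _BYTE tos; // DSCP/ECN, part of the standard IPv4 header
  _WORD total_len;
  _WORD identification; // part of the standard IPv4 header
  _WORD fo_and_flags; // flags : 3 , fragment offset: 13
  _BYTE ttl;
  _BYTE protocol;
  _WORD checksum;
  _DWORD source_addr;
  _DWORD dst_addr;
};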

struct udp_hdr
{
  _WORD source_port;
  _WORD dst_port;
  _WORD len;
  _WORD checksum;
};

We can see that the port in the UDP header must be set to a certain value: 0x1104 (4356).

The WORD in IPv4 header that contains bitfields: flags and fragment offset is checked by AND with 0x80. It means the “reserved” flag must be set:

NOTE: The “reserved” flag is also called “an evil bit” (read more here) – so this is probably the origin of this task’s name.

Only if those conditions are fulfilled, the received data will be processed further.
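The two conditions can be reproduced outside the crackme with a small sketch. It assumes the checked port is the destination port (compared here after converting from network byte order) and that the flags word is read as a little-endian WORD, which is why the mask `0x80` hits the top bit of the first byte on the wire, i.e. the "reserved" flag. The helper names and the addresses used are hypothetical.

```python
import struct

EXPECTED_PORT = 0x1104   # 4356
EVIL_MASK_LE = 0x0080    # mask applied to the flags word read as little-endian

def build_packet(payload: bytes, dst_port: int, evil: bool) -> bytes:
    """Build a minimal IPv4 + UDP packet for testing (checksums left at zero)."""
    flags_frag = 0x8000 if evil else 0x0000   # reserved flag = top bit on the wire
    total_len = 20 + 8 + len(payload)
    ip = struct.pack("!BBHHHBBH4s4s",
                     0x45, 0, total_len, 0, flags_frag,
                     64, 17, 0,                        # TTL, protocol (UDP), checksum
                     bytes([127, 0, 0, 1]), bytes([127, 0, 0, 1]))
    udp = struct.pack("!HHHH", 12345, dst_port, 8 + len(payload), 0)
    return ip + udp + payload

def accepts(packet: bytes) -> bool:
    """Replicate the two checks: reserved flag set and expected UDP port."""
    ihl = (packet[0] & 0x0F) * 4                       # IPv4 header length in bytes
    (flags_le,) = struct.unpack_from("<H", packet, 6)  # flags + fragment offset
    (dst_port,) = struct.unpack_from("!H", packet, ihl + 2)
    return bool(flags_le & EVIL_MASK_LE) and dst_port == EXPECTED_PORT

print(accepts(build_packet(b"\x01\x00\x00\x00", 4356, evil=True)))
print(accepts(build_packet(b"\x01\x00\x00\x00", 4356, evil=False)))
```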

Then, the received data from the packet is rewritten to another, custom structure.

The received data is being copied

My reconstruction of this structure is given below:

struct stored_packet_data
{
  _DWORD source_addr;
  _DWORD dst_addr;
  _WORD source_port;
  _BYTE *data_buf_ptr;
  _WORD data_len;
};

Decompiled and cleaned code of the receiving function is available here.

The receiving function does nothing but the initial checks of the data, and the filling of this structure. But there is another function, running in a separate thread, that reads this filled buffer and verifies it further (I denoted it as to_some_rc4):

Those two threads are run with the same buffer as an input argument

By analyzing the second function, we can see that the first value of the data buffer is used as a command to be executed. The values 1, 2, and 3 are handled separately, and everything else (>3) falls into a common branch:

We can further see some CRC32 calculating function, and some decrypting. So, this must be the exact function to analyze in order to obtain the flag.

The decompiled code of the thread processing the buffer is available here.

Patching out the defensive checks

At this point I decided that it would be most convenient to follow the flow by dynamic analysis. But as we saw, the crackme is loaded with various defensive checks that don’t let it run under the debugger. So, in order to continue, they must be patched out.

Earlier I already patched out one of the defensive checks (the one causing the far jump). It required nothing but NOPing a single conditional jump. But to remove the rest of them will be much more difficult.

First, the checks are initialized.

The same function is responsible for patching NTDLL:

Functions responsible for various defensive checks are added into the map:

Only one of those checks will be deployed: it is selected randomly, based on the current time. This explains the non-deterministic behavior during the tracing.

Unfortunately, we cannot simply NOP the call to this function, because that would cause crashes later. The map of the checks is used in multiple places, and it cannot be empty.

So, instead of trying to remove it, I decided to neutralize it in a less invasive way. As we saw, there are various functions with checks added to the map, with various IDs. Those functions vary in complexity. The simplest of them seemed to be the one that just calls CheckRemoteDebuggerPresent, and causes the application to exit if a debugger is detected.

Inside the check_remote_debug – original version

I made a patch inside this function, just to blind the check (changed the conditional jump into unconditional):

Then I modified the mapping, so that the above function will be the only one added to the map, at every possible index:

This way we still have the checks running, but they no longer get in the way. The crackme can be run under the debugger with no problems.

Patching the IPv4 flag

As we saw during static analysis, the crackme proceeds with the received buffer only if the IPv4 “reserved” flag is set. The problem is, it is not a standard situation. When we send the packet by nping, the “reserved” flag will be clear.

Rather than trying to somehow enforce passing this flag, I decided to simply do the patch in the code, to avoid it being checked.

NOPed the conditional jump

Analysis of the verification function

Finally we are ready for the dynamic analysis of the verification function.

I decided to make some experiments by sending the buffer with one of the expected commands with the help of nping, and then watch under the debugger how it is processed.

Command #1


nping --udp -p 4356 --dest-ip -c 1 --data 01000000

The command 1 causes a fake flag to be decrypted:

Yet another artifact that gets decrypted on this command is a BMP, that is a frame from the famous “Rick roll” video clip. Interestingly, this frame is being displayed on the console.

We can easily conclude, that this command serves no other purpose than being a red herring.

Command #2

At first, sending the buffer with this command was causing the application to crash. After taking a closer look, I realized that the DWORD defining the command must be followed by another DWORD, this time defining the size of the buffer that comes after it. When we send a buffer in a valid format, it is copied, and then compared with four keywords that are dynamically decrypted:

"L0ve", "s3cret", "5Ex", "g0d"

If the comparison passes, the crc32 of the buffer is being calculated, and stored in another buffer. Initially I dismissed those strings, thinking they are yet another red herring, but they turned out to be very important…

Command #3

This command expects a buffer made of three DWORDs. The first one (the command itself) must be 3, the second: 2, and the third: ‘MZ’.

nping --udp -p 4356 --dest-ip -c 1 --data 03000000020000004d5a0000

After we send the buffer in the expected format, something new will be decrypted with the help of RC4 algorithm (using WinAPI, and the patched version of the function CryptImportKey). I expected it to be the flag…

Obtaining the flag

Initially, when I tried to send the command 3, it was reaching the RC4 decryption part, but the buffer used as the RC4 key was empty. At first I thought that maybe I destroyed something because of my patching, so I asked for a hint if this is really the way this part of the crackme should look like. Fortunately, it turned out that everything is fine, I just should take a closer look at what other command can fill this key.

After some more experiments it became clear that the CRC32 checksums from the command #2 are going to be filled into the RC4 key buffer.
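As a rough illustration of the checksum step, the sketch below computes a CRC32 for each keyword. It assumes the crackme uses the standard CRC-32 (the variant implemented by `zlib`), and it hashes the keyword without the terminating NULL byte; neither detail is confirmed here, and the way the four values are laid out in the key buffer is not reproduced.

```python
import zlib

# The four keywords decrypted by the crackme on command #2
KEYWORDS = [b"L0ve", b"s3cret", b"5Ex", b"g0d"]

def crc32_of(data: bytes) -> int:
    """Standard CRC-32 as an unsigned 32-bit value."""
    return zlib.crc32(data) & 0xFFFFFFFF

for word in KEYWORDS:
    print(word.decode("ascii"), f"{crc32_of(word):08x}")
```

Comparing these values with the content of the key buffer under the debugger is a quick way to verify the assumptions.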

So, all that was needed at this point was to send those buffers one by one, in properly formatted packets:

02000000 05000000 4C 30 76 65 00 -> L0ve
02000000 07000000 73 33 63 72 65 74 00 -> s3cret
02000000 04000000 35 45 78 00 -> 5Ex
02000000 04000000 67 30 64 00 -> g0d


nping --udp -p 4356 --dest-ip -c 1 --data 02000000050000004C30766500
nping --udp -p 4356 --dest-ip -c 1 --data 020000000700000073336372657400
nping --udp -p 4356 --dest-ip -c 1 --data 020000000400000035457800
nping --udp -p 4356 --dest-ip -c 1 --data 020000000400000067306400

This causes filling of the full RC4 key.
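The payloads passed to `--data` above follow a simple layout: a little-endian DWORD with the command, a DWORD with the size, then the NULL-terminated keyword. A minimal sketch that regenerates them (the function names are mine):

```python
import struct

def build_command(command: int, data: bytes) -> bytes:
    """Pack <command DWORD><size DWORD><data>, all little-endian."""
    return struct.pack("<II", command, len(data)) + data

def keyword_payload(word: str) -> bytes:
    """Payload for command #2: the keyword with a terminating NULL byte."""
    return build_command(2, word.encode("ascii") + b"\x00")

for word in ("L0ve", "s3cret", "5Ex", "g0d"):
    # hex string suitable for nping's --data option
    print(keyword_payload(word).hex())
```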

Then we need to send the command 3:

nping --udp -p 4356 --dest-ip -c 1 --data 03000000020000004d5a0000

This will trigger the decryption of the flag.

CryptImportKey is called

Finally, the flag got decrypted!


No more exceptions please! This is how we reached the end of this challenge…


Flare-On 7 – Task 10

This year’s FlareOn was very interesting. I managed to finish it with 87th place. In this small series I will describe my favorite tasks, and how I solved them. I hope to provide some educational value for others, so this post is intended to be beginner-friendly.

My writeup to the previous task can be found here.


In this task we are provided with the following package (password: flare). It contains a 32 bit ELF (break), and a description that says:

As a reward for making it this far in Flare-On, we've decided to give you a break. Welcome to the land of sunshine and rainbows!

No hints this time, only trolling! And this is what we must get used to while doing this task that turns out far from the promised easy. Yet, it is full of red herrings and false hints…

This challenge is the most interesting crackme I have ever encountered. Yet, it is very exhausting. In reality, it is more like 3 tasks in one. Instead of searching for one flag, we need to collect 3 different fragments of it. Each of them is protected by a different cipher that we need to break. But this is not the only challenge! Even making sense of the code is going to be difficult – the flow is protected using some sort of nanomites – at least in the first two layers. Functionality-wise, each layer is a bit different. Even finding the code that we need to analyze may be a challenge in itself (stage 3 is a shellcode that is loaded into the main application by an overflow, which is exploited by the crackme itself).

Walk-through my solutions for particular parts:

Thanks to everyone who gave me hints during this long journey!


Flare-On 7 – Task 9

This year’s FlareOn was very interesting. I managed to finish it with 87th place. In this small series I will describe my favorite tasks, and how I solved them. I hope to provide some educational value for others, so this post is intended to be beginner-friendly.


In this task we are provided with the following package (password: flare). It contains a 64 bit PE (crackinstaller.exe), and a description that says:

What kind of crackme doesn't even ask for the password? We need to work on our COMmunication skills.

By the name and the description we can guess that it is going to be an installer for some other components, and also that some knowledge about COM (Component Object Model) is going to be required.


Before we go into the details of the solution, let’s see the roadmap of the elements that we are going to discover.

The following diagram presents the loading order of particular components involved in this task:

The elements with solid borders are loaded from files. The elements with dashed borders are loaded in-memory only. Yellow – executes only in usermode, blue – only in kernelmode, gray – partly in usermode and partly in kernelmode.


The crackme runs silently, without displaying any UI. In order to see what is happening during execution, we can use some method of tracing the activities (e.g. ProcMon). I wanted to see exactly which APIs are called from the main application, so I started by running it via Tiny Tracer. In order to get the complete trace, it must be run as an Administrator.

This is the trace log that I obtained:


It gives a pretty good overview of what is going on at what points of the code. Let’s go through the log first, and see how much we can discover by reading the order of the APIs called.

The first fragment that triggered my interest is the following:


By reading it we can find that the crackinstaller:

  1. drops some file (CreateFileW, CreateFileMappingW, MapViewOfFile, CloseHandle)
  2. installs it as a service (OpenSCManager, OpenServiceW, StartService)
  3. sends an IOCTL (DeviceIoControl) – most likely the receiver is this newly installed service, that is a driver
  4. uninstalls the created service (OpenServiceW, DeleteService)

Another interesting fragment of the log follows the previous one:


In this fragment we can see that some file is being dropped (CreateFileW, WriteFile). Then it is registered as a COM server.

So, at this point we can expect two elements are going to be installed: a driver (which is uninstalled right after use) and the COM component. In order to find them we must see what are the files that are being dropped. We can load the generated .tag into x64dbg, and set breakpoints on the interesting functions.

The dropped components

First I set breakpoints at CreateFileW to see what are the paths to the dropped components. We can collect them from those paths once they are saved.

As we observed before, there are two elements dropped:

  1. The driver: da6ca1fb539f825ca0f012ed6976baf57ef9c70143b7a1e88b4650bf7a925e24
    • dropped in: C:\Windows\System32\cfs.dll
  2. The COM server: 4d5bf57a7874dcd97b19570b8bad0fa748698671d67593744df08d104e6bd763
    • dropped in: C:\Users\[username]\AppData\Local\Microsoft\Credentials\credHelper.dll

The first element executed is the driver, so this is where I started the analysis.

The dropped driver (cfs.dll)

As we could find out by reading the comments on Virus Total, this is a legitimate, but vulnerable Capcom driver, that was a part of the Street Fighter V game (more about it you can read here and here). Due to the vulnerable design, this signed driver allows for execution of an arbitrary code in kernel mode. By sending a particular IOCTL we can pass it a buffer that will be executed (it is possible since the driver disabled SMEP as well). This vulnerability makes it a perfect vector to install untrusted kernelmode code on the machine – that feature is used by the current crackme.

First, the driver is dropped from the crackinstaller into:


And installed as a service. Its path is:


Then, the aforementioned IOCTL is being called. Below you can see an example of the parameters that were passed to the IOCTL (DeviceIoControl function), along with their explanation:

1: rcx 00000000000001E4 ; driver
2: rdx 00000000AA013044 ; IOCTL
3: r8 0000007B3EAFF6C8 ; input buffer
4: r9 0000000000000008 ; input buffer size
5: [rsp+28] 0000007B3EAFF6C0 ; output buffer
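The IOCTL value can be split into its standard fields, following the layout of the Windows `CTL_CODE` macro (device type in bits 16-31, access in bits 14-15, function in bits 2-13, method in bits 0-1). A small sketch with a helper name of my own:

```python
def decode_ioctl(code: int) -> dict:
    """Split an IOCTL code into the fields packed by the CTL_CODE macro."""
    return {
        "device_type": (code >> 16) & 0xFFFF,
        "access": (code >> 14) & 0x3,      # 0 = FILE_ANY_ACCESS
        "function": (code >> 2) & 0xFFF,
        "method": code & 0x3,              # 0 = METHOD_BUFFERED
    }

fields = decode_ioctl(0xAA013044)
print({name: hex(value) for name, value in fields.items()})
```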

The input buffer turns out to be the following small stub, written in additionally allocated executable memory page:

025E86BD0008 | sti
025E86BD0009 | mov rdx,25E86AF2080 ; address of: driver.sys
025E86BD0013 | mov r8d,5800 ; size of the driver
025E86BD0019 | mov r9d,3170 ; address of DriverBootstrap function
025E86BD001F | jmp qword ptr ds:[25E86BD0025] ; function inside crackinstaller.exe

The stub sets parameters, that are going to be used by the next function. Then it leads the execution back to the crackinstaller.exe – to another function (at RVA 0x2A10). Although the dropper is a userland application, this part of the code will be called in a kernel mode – because the execution to this function is redirected via the kernelmode component.

This function is responsible for loading yet another driver (driver.sys) that is also passed as one of the parameters.

By looking at the loading function, we can see that this driver is going to be mapped manually into the kernel-mode memory. The “DriverBootstrap” function exported by driver.sys is a kernel-mode Reflective Loader variant (similar to this one).

After this installation, the first driver (cfs.dll) gets unloaded and uninstalled – however, the second one: driver.sys – persists in the memory (in contrast to usermode applications, the memory allocated by a driver is not freed automatically when the driver is unloaded).

What I initially did was to dump this driver.sys in user mode (before the IOCTL was executed) and analyze it statically. Then, I tried to load it as a standalone driver. However, it was a mistake. This driver has a buffer that is supposed to be overwritten on load, in kernel mode. At this stage, it is not filled with the proper content yet. This buffer is crucial for decoding a password. Since I overlooked the part that was overwriting it, although I understood the full logic of the driver, the output that I was getting was garbage. After consulting with other researchers, I confirmed that the output was supposed to be valid ASCII – so I realized that I had missed something on the way, and that I shouldn’t have been taking shortcuts and dumping the driver in userland. I then decided to walk through the full path of loading the driver in kernel mode, and dumped it again in kernel mode, just before its execution.

The driver.sys

Before we move further to the dynamic analysis, let’s have a look at the driver.sys in IDA. As I mentioned earlier, dumping this driver in userland is not a perfect option (some important buffer is filled on load in kernel mode). However, for now, this version is good enough for the static analysis of the driver’s logic.

As always the execution starts in DriverEntry.

In our case, this function redirects execution to another one, which I labeled as “driver_main”.


Some interesting strings inside the driver are obfuscated – they are dynamically decoded just before use. There are various ways to retrieve them – I have chosen to write a simple wrapper in libPeConv that allowed me to call the decoding function without analyzing it, and apply it on the chosen buffers.

This module (driver.sys) is a filter driver with an altitude of 360000, which means “FSFilter Activity Monitor”.

The main function is pretty simple: its role is to initialize the device, and to set the callback that will be used for event filtering. The function CmRegisterCallback sets the callback that will be triggered each time an operation on Windows Registry is executed.

The routine that is registered to handle the callback (DispatchCallback) must follow the prototype of EX_CALLBACK_FUNCTION.

The second argument (denoted as Arg1) is of type REG_NOTIFY_CLASS – it informs about what type of operation triggered the callback. In our case the event is processed further only if the value of the REG_NOTIFY_CLASS is 26 (RegNtPreCreateKeyEx in the WDK headers). The next argument (Arg2) holds a pointer to a structure whose type depends on the value of the previous one (Arg1). In our case, Arg2 leads to the UNICODE_STRING with the name of the Registry Key being operated on.
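To check which operation a numeric REG_NOTIFY_CLASS value stands for, the enum can be transcribed into a lookup table. The list below follows the ordering of the enum in the WDK header (wdm.h) as I know it, using the "Pre" aliases for the first entries; it covers only values 0 to 27 and should be verified against the header of the WDK version in use.

```python
# Partial transcription of REG_NOTIFY_CLASS (values 0..27), index == enum value
REG_NOTIFY_CLASS = [
    "RegNtPreDeleteKey", "RegNtPreSetValueKey", "RegNtPreDeleteValueKey",
    "RegNtPreSetInformationKey", "RegNtPreRenameKey", "RegNtPreEnumerateKey",
    "RegNtPreEnumerateValueKey", "RegNtPreQueryKey", "RegNtPreQueryValueKey",
    "RegNtPreQueryMultipleValueKey", "RegNtPreCreateKey", "RegNtPostCreateKey",
    "RegNtPreOpenKey", "RegNtPostOpenKey", "RegNtPreKeyHandleClose",
    "RegNtPostDeleteKey", "RegNtPostSetValueKey", "RegNtPostDeleteValueKey",
    "RegNtPostSetInformationKey", "RegNtPostRenameKey", "RegNtPostEnumerateKey",
    "RegNtPostEnumerateValueKey", "RegNtPostQueryKey", "RegNtPostQueryValueKey",
    "RegNtPostQueryMultipleValueKey", "RegNtPostKeyHandleClose",
    "RegNtPreCreateKeyEx", "RegNtPostCreateKeyEx",
]

def notify_class_name(value: int) -> str:
    """Return the symbolic name for a value, or a marker if it is not in the table."""
    if 0 <= value < len(REG_NOTIFY_CLASS):
        return REG_NOTIFY_CLASS[value]
    return f"<not in table: {value}>"

print(notify_class_name(26))
```

A pre-create notification fits the observed behavior: the interesting code path in the driver runs when the COM server creates its registry key.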

The name of the key is copied into additionally allocated memory with a tag “FLAR”. It is compared further with a dynamically decoded string:

Only if the name of the key matches the hardcoded one, the next, more interesting part of the code is executed. If we check the changes in the registry made during the execution of crackinstaller, we will notice that this registry key is created during the installation of the COM server. So, this is how those components are tangled together.

The next part of the driver’s code decrypts some mysterious buffer. We can recognize the involved algorithms by their typical constants. First, a SHA256 hash is calculated from a buffer hardcoded in the driver (denoted as “start_val”). Then, the hash is used as a key for the next algorithm, which is probably Salsa20 (alternatively, it may be a similar cipher, ChaCha).


At this point we can guess that our next goal is to get this decoded buffer.

In order to get the valid solution, we need to first get the overwritten version of the above driver, so, the one that is loaded in the kernel mode.

Notes on kernel mode debugging

Before we can start kernel mode debugging, we need to have an environment set up. The setup that I used is almost identical to this one. Yet, there are a few differences that I am going to mention in this part.

First of all, we need a 64 bit version of Windows – I used Windows 10 64 bit VM on VirtualBox (linked clones for Debugee and Debugger).

As always, the usermode analysis tools (e.g. x64dbg) as well as the crackme itself, are going to be run on the Debugee VM. The kernel mode debugger (WinDbg) will be run on the Debugger VM, connected to the Debugee.

Configuring the Debugee VM

There are few more steps (in addition to the ones described here) that we have to take in order to configure the Debugee VM. In case of Windows 10, explicitly setting the debug interface is necessary (by default, even if we enable debugging on the machine, it is going to be set in a local mode, and we will not be able to connect the Debugger VM). Since we are going to establish a debug session over a serial port, the following settings apply:

bcdedit /dbgsettings serial debugport:1 baudrate:115200

We can test if the proper options are applied by deploying the command dbgsettings without parameters:

bcdedit /dbgsettings

Expected result:

DbgSettings after

We need to remember that on 64 bit Windows a driver must be signed in order to be loaded. This is not gonna be an issue if we want to load the first driver: cfs.dll – because this is a legitimate, signed driver. However the second one: driver.sys – which is more important to the task – is not signed. It loads just fine as long as the first, signed driver is used as a loader. But for the sake of the convenience, at some point we are going to load the driver.sys as a standalone module. To be able to do so, we must change an option in bcdedit, in order to allow unsigned drivers to be loaded. It can be done running this command on the Debugee machine:

bcdedit /set TESTSIGNING ON

After changing the settings, the system must be rebooted.

We also have to disable Windows Defender, otherwise the crackme will be mistaken as a malware and removed.

Dumping driver.sys in kernel mode

In order to understand what exactly is going on, and not to miss anything, I decided to walk through the full flow from the moment the IOCTL is executed inside cfs.dll, until the driver.sys is loaded in memory.

To start following it in kernel mode, we need to locate the address of the function inside cfs.dll that is going to be triggered when the IOCTL is sent. Let’s open cfs.dll in IDA, and see the function registered to handle IOCTLs:

Inside we can see the IOCTLs numbers being checked, and then the function to execute the passed buffer is being called:

In the next function (that I labeled “to_call_shellcode”) we can see the operations of disabling SMEP, calling the passed buffer, and then enabling the SMEP again:

The function disabling SMEP :

So, we need to set the breakpoint at the address just after the function disabling SMEP returns, because in this line there is a call passing execution to the shellcode. This happens at VA = 0x10573 (RVA = 0x573):

If we step into that call in WinDbg, we will be able to follow the passed shellcode executed in kernel mode.

Before we set the breakpoint in kernel mode, we need to load the crackinstaller into a userland debugger (such as x64dbg) and set a breakpoint before the DeviceIoControl function is called.

Then, on the Debugger machine (connected to the Debugee where the crackme runs) we deploy WinDbg and connect to the Debugee.

We can set a breakpoint on load of the cfs.dll in WinDbg by:

sxe ld cfs

After that, we run the crackme. The breakpoint should hit and the Debugee freezes. With the help of the following command:


We can see the list of all the loaded modules, and find the module of our interest on the list:

If we want to view this list from the Debugee perspective, we can also use Driver List by Daniel Pistelli.

Now, let’s set a breakpoint on the offset inside the driver, that executes the shellcode:

bp cfs + 0x573

And we resume the Debugee. Let’s step over the breakpoint at DeviceIoControl in x64dbg. Now, in the Debugger VM, we can see again that the breakpoint has been hit.

Opening the Disassembly window allows us to see this line in the original context:


As we can see, it is the same code fragment that we observed in IDA before, analyzing the relevant fragment of cfs.dll.

Using the command:


We can step into the call. And what do we see? The very same shellcode that we observed being passed to the DeviceIoControl!

The address moved to RDX is the address of the buffer holding driver.sys.

Now, as we know from the previous analysis, the execution should be redirected back to crackinstaller.exe, but it will take place in kernel mode. We can set the breakpoint at the first jump, which will do the redirection:

bp [address]

After setting the breakpoint, we can resume the execution (“g”) and once the breakpoint is hit, step in again (“t”):

This is where we end up:

…and it is exactly the function at 0x2A10 in crackinstaller.exe, that we found before. As we know, this function will do the modifications in the driver, and then redirect execution to there, inside the DriverBootstrap function (RVA = 0x3D70 , raw = 0x3170).

By analyzing the flow of the corresponding function in crackinstaller, we can guess that the redirection happens at RVA = 0x2c26

inside crackinstaller.exe

Let’s set a breakpoint there, and resume the execution.

At this point we can see the function PsCreateSystemThread being called. The start routine is going to be the DriverBootstrap function.

The address of the bootstrap function is stored in RAX register:

At this point the driver is in the raw format, so we know that the raw address of the bootstrap function was used: 0x3170. By subtracting it from the whole address, we can get the driver’s base. By looking up this address in the Memory window we can see that indeed this is where the driver has been loaded:

Now it’s time to dump the driver. We can do it with the help of command .writemem. We need to supply it the path where we want to save the dump, and the range to be dumped. The size of the driver was supplied to the shellcode, and it is 0x5800. So, we can dump the range in the following way:

The new version dumped as “mydriver.sys”
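The address arithmetic behind the dump is simple enough to sketch. The RAX value and the output path below are hypothetical placeholders; the raw offset of DriverBootstrap (0x3170) and the driver size (0x5800) are the values observed earlier. The printed line uses the `.writemem FileName Range` form with an `L` length.

```python
RAW_BOOTSTRAP_OFFSET = 0x3170   # raw offset of DriverBootstrap in driver.sys
DRIVER_SIZE = 0x5800            # size passed to the shellcode

def dump_range(bootstrap_addr: int) -> tuple:
    """Return (base, last_byte) of the raw driver image in memory."""
    base = bootstrap_addr - RAW_BOOTSTRAP_OFFSET
    return base, base + DRIVER_SIZE - 1

rax = 0xFFFFA00012343170         # hypothetical value read from RAX
out_path = r"C:\dumps\mydriver.sys"  # hypothetical output path

base, last = dump_range(rax)
print(f"base = {base:#x}, last byte = {last:#x}")
print(f".writemem {out_path} {base:x} L{DRIVER_SIZE:x}")
```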

After having the driver dumped, we can see what was patched. The comparison done via PE-bear:

Comparison – the original vs the modified

The patched content is the buffer that was used to derive the Salsa20 key (the “start_val” is filled with a string “BBACABA”).
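The first step of the key derivation can be sketched with the standard library. This assumes that the hashed input is exactly the 7 ASCII bytes of the string; the real length and any padding of “start_val” in the driver are not confirmed here, so the resulting key should be compared with the one seen in the debugger. The Salsa20 step itself is not reproduced.

```python
import hashlib

def derive_key(start_val: bytes) -> bytes:
    """SHA-256 of the start value; 32 bytes, the size of a 256-bit Salsa20 key."""
    return hashlib.sha256(start_val).digest()

# Assumption: start_val is exactly these bytes, with no padding or terminator
key = derive_key(b"BBACABA")
print(len(key), key.hex())
```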

Extracting the password in kernel mode

After the driver.sys is loaded in the memory, the crackinstaller.exe installs the COM server. On installation, the COM server creates the Registry key with the server GUID: “{CEEACC6E-CCB2-4C4F-BCF6-D2176037A9A7}\Config”. Creation of this key triggers the filter function inside the driver.sys to decrypt the hardcoded password. Our next goal is to fetch this password from the memory while it is being decoded.

Finding of this password can be achieved easily – all we need to do is to set a breakpoint in WinDbg, that will be triggered after the password is decoded, and then dump the output from the memory.

Yet, setting the breakpoint on a function of the reflectively loaded driver would be very inconvenient. A reflectively loaded driver will not be listed among the loaded modules, so we cannot reference it by its name. We also don’t know the base at which it was loaded. So, this is the point where it comes in very handy to load the driver.sys independently.

For this part, we are going to use the patched version of the driver.sys – the one that was dumped as mydriver.sys in the previous part.

Loading the driver.sys as a standalone driver

Once we dumped the modified version of the driver, we can load it as an independent module. However, now the loader is not signed, so it won’t load in Windows unless we disable signature checking in the bcdedit (as mentioned before, reboot is required each time we change the settings):

bcdedit /set TESTSIGNING ON

We install it on the Debugee VM:

sc create [service name] type=kernel binpath=[driver path] 
sc start [service name] 

Let’s break the execution via Debugger VM (WinDbg : Debug -> Break) and see if the driver.sys is present on the list of the modules, using the command:


We should see it on the list, just like on the example above.

Dumping the password from the memory

Now we can set the breakpoint inside the filter function. As mentioned before, it is gonna be called each time when some registry key is read/written. Then the name of the key is going to be compared with the hard-coded one (which is dynamically decrypted). If the name matches, another buffer is decrypted with the help of Salsa20. So, the password decryption is executed immediately when the COM server creates this key.

We can set the breakpoint after the key name verification is passed (RVA = 0x48C9):

bp driver + 0x48C9

In order to trigger the event, we now need to use the credhelper.dll and run the DllRegisterServer function. It can be done just by running (on the Debugee):

rundll32.exe credhelper.dll,DllRegisterServer

This will trigger the breakpoint that we can follow in WinDbg…

Let’s set a breakpoint at the address where the Salsa20 algorithm was executed (it happens at RVA = 0x49AC):

driver.sys – IDA view
bp driver + 0x49AC

After that we can resume the execution


…and the breakpoint will be hit:

At this point, the address of the output buffer is in the R8 register. So we need to copy this address to the memory view. Now we can step over the function.

And the decrypted content got filled in the buffer that we previewed:

So this is the password: “H@n $h0t FiRst!”.

Now we need to learn how to use this password to decode the flag…

The COM component

The driver.sys is quite small, and there is nothing more in it to decode, so I guessed the next pieces of this puzzle are hidden somewhere in the COM component. Let’s take a look…

We already saw in the Pin tracer log that one function from this DLL is being called:


If we open the credhelper.dll in IDA, we can see that this function is probably the one responsible for decoding the flag:

We can see the registry keys “Password” and “Flag” being referenced.

However, if we take a closer look, we will see that the function responsible for setting the Flag is not inside the DllRegisterServer.

There are two unreferenced functions that manipulate the same registry keys:

The first one reads the value of the Password from the registry, and initializes some structure with its help (snippet here).

The other is responsible for decoding the Flag (snippet here).

I guessed that the “Password” must be the string decoded from the driver.sys. So, we need to fill it in the registry, and then call those functions in proper order – probably using the COM interface.

This is probably the “right” way to solve this task. However, when I took a closer look at those functions, they started to remind me of something familiar: the functions used by the RC4 encryption algorithm, which is commonly used in malware.

So, my guess was:

  1. The function that I denoted as “get_password_value” was an RC4 password expansion (key scheduling) function: it was initializing the context with the password (“H@n $h0t FiRst!”).
  2. The function that I denoted as “set_flag_value” was using this context to decode a hardcoded buffer with the RC4 decryption algorithm.

I dumped the hardcoded buffer, and decided to check those assumptions using CyberChef. It turned out correct: S0_m@ny_cl@sse$_in_th3_Reg1stry@flare-on.com
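The same check can be done without CyberChef. Below is a minimal Python sketch of the two RC4 phases that the functions seemed to correspond to: rc4_init plays the role of “get_password_value” (key scheduling), and rc4_crypt plays the role of “set_flag_value” (keystream generation and XOR). The function names are mine, not taken from the DLL. The dumped buffer is not reproduced in this post, so the sketch builds a stand-in ciphertext by encrypting the known flag; with the real dump, only that one variable would change.

```python
def rc4_init(key: bytes) -> bytearray:
    """Key scheduling: permute the 256-byte state according to the key."""
    state = bytearray(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) & 0xFF
        state[i], state[j] = state[j], state[i]
    return state


def rc4_crypt(state: bytearray, data: bytes) -> bytes:
    """Generate the keystream from the state and XOR it with the data.

    RC4 is symmetric, so the same routine both encrypts and decrypts.
    A copy of the state is used, so the caller's context stays reusable.
    """
    s = bytearray(state)
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


password = b"H@n $h0t FiRst!"
flag = b"S0_m@ny_cl@sse$_in_th3_Reg1stry@flare-on.com"

# Stand-in for the buffer dumped from credhelper.dll (assumption: not the real dump).
stand_in_buffer = rc4_crypt(rc4_init(password), flag)

decoded = rc4_crypt(rc4_init(password), stand_in_buffer)
print(decoded.decode())
```

Splitting the cipher into an init step and a crypt step mirrors the layout seen in the DLL, where one function prepares the context from the Password and the other consumes it.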

So, the final flag was RC4 encrypted, with the password extracted from the driver.
