IDA tips: how to use a custom structure

Applying custom structures makes the result of decompilation much more readable.

This is how the same fragment of code looks before and after the proper structures are applied:



In this short post, I will demonstrate how to add such structures to IDA, using a PE structure as an example.

Creating the structure

My definition of PE file structure is available here.

Note that some of the data types that we would normally use when writing C/C++ code on Windows are not available in IDA, and other types may be defined a bit differently. For example, types such as WORD and DWORD from windows.h are defined in IDA, but with a “_” prefix:

 _WORD e_res2[10];
 _DWORD e_lfanew;

Adding the structure into IDA

With the help of the following steps, we can add the custom structure into IDA.

1 – First we need to open the subview “local types” where all such definitions are stored:

2 – We click on “Insert…”

3 – The window for the new definition opens. We can paste our custom structure there.

4 – After we paste it and click OK, the new types should appear on the list.

Using the custom structures

Now our custom structures are ready to be used!

Whenever we find a variable of that type, we can convert it to our custom structure. For example:

1 – Select the variable that you want to convert:

2 – Select the structure from the list:

Sometimes you may need to manually refresh the decompiler view, by pressing F5.

And it’s ready!

Python scripting for WinDbg: a quick introduction to PyKd

PyKd is a plugin for WinDbg that allows running Python scripts. It can be very helpful, e.g. for tracing and deobfuscating obfuscated code. In this small tutorial, I will demonstrate how to install it and make everything work.


Download and install the PyKd.dll

I assume that we already have WinDbg installed. First, we need to download the PyKd DLL. Ready-made builds are available in the project’s repository:

The package contains two versions of the DLL: 32-bit and 64-bit. We need to use the version that matches the bitness of our WinDbg installation (I assume 64-bit).

First, we create a directory where we will store plugins for WinDbg, for example “C:\windbg_ext”, and drop pykd.dll there.

Then we need to add the path to this directory to an environment variable (_NT_DEBUGGER_EXTENSION_PATH), so that WinDbg can find it.

Install Python and pykd Python library

We need to have Python installed, as well as Pip. I chose the latest Python installer from the official page.

Now let’s install Pip. A detailed guide on how to do it is presented here. I chose to download the script and run it with the previously installed Python. The installed pip (example):

The next step is to install the pykd Python library via Pip (from command prompt):

pip install pykd

Testing PyKd

If all the above steps succeeded, PyKd is ready to be used. In order to test it, we will run WinDbg and attach to some process (e.g. notepad).

First, let’s load the PyKd extension:

.load pykd

If it is loaded, we can see its commands by using help:


If we have multiple versions of Python installed, the latest one will be set as the default, but it is still possible to switch between them.

Once the PyKd extension for WinDbg (PyKd.dll) is loaded, we can run the Python command prompt and check if the PyKd library for Python is available. We run the prompt by:


Now we can issue:

import pykd

And test by issuing some WinDbg command via PyKd:

print(pykd.dbgCommand("<any WinDbg command>"))


The results of the command are printed with the help of Python print. After the test, we can exit the console by issuing:


Running scripts

If we get the results as above, everything is installed and ready. Now, instead of running the Python commands from the WinDbg command prompt, we can save them as a script and run it by giving the path to the script. Example:

!py C:\pykd_scripts\
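To make the idea of a saved script more concrete, here is a minimal sketch of what such a file could contain. The only PyKd call it relies on is pykd.dbgCommand, which was used in the test above; the helper name run_commands and the chosen WinDbg commands (r, lm) are my own illustrative picks, not part of PyKd.

```python
def run_commands(commands, execute):
    """Run each WinDbg command through `execute` and collect the text output."""
    results = {}
    for command in commands:
        results[command] = execute(command)
    return results


def main():
    try:
        # pykd is only expected to work when the script is started from WinDbg via !py
        import pykd
    except ImportError:
        print("pykd is not available; run this script from WinDbg with !py")
        return
    # 'r' dumps the registers, 'lm' lists the loaded modules
    for command, output in run_commands(["r", "lm"], pykd.dbgCommand).items():
        print("=== %s ===" % command)
        print(output)


if __name__ == "__main__":
    main()
```

Keeping the command runner separate from the pykd import makes it possible to try the script logic outside of the debugger with a stand-in function.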

Flare-On 8 – Task 6

Flare-On is an annual “reverse engineering marathon” organized by Mandiant (formerly by FireEye). You can see more information here. It is a Capture-The-Flag type of contest, where you are given a set of crackmes of growing difficulty. This year we were provided with 10 tasks. I finished 125th. In this series of writeups I will present my solutions to selected challenges, and guide you through each task, all the way to the final flag.

The description of the challenge 6:

Download: 06_PetTheKitty.7z (password: flare)

In this task we are given a PCAP file.

I opened it in Wireshark and followed the TCP streams.

There are two streams. The first consists of a request followed by a longer response containing a PNG:

The other contains many shorter packets, requests and responses:

We can see the keyword “ME0W”, but also “PA30” repeating…

PA30 is a patch format introduced in Windows Vista, called Intra-Package Delta (IPD). More information about it can be found in the following blog, along with a Python script that can be used for applying the patches.

First, I extracted the components from the first stream. As we saw at first sight, the response contains a PNG. At the end of the PNG we can see ASCII art:

A PA30 patch follows after.

In order to separate them correctly, we need to understand the headers of the “ME0W” packet:

4d 45 30 57  d0 24 0a 00  d0 24 0a 00 | ME0W .$.. .$..

The header contains the magic number “ME0W” followed by two DWORDs, denoting the size of the data repeated twice, and then the data buffer.
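The header layout described above can be turned into a small extraction helper. This is only a sketch: it assumes that the packets of one direction of the stream were saved back to back into a single byte string, that the DWORDs are little-endian (as in the hex dump above), and that the first DWORD gives the length of the data. The function name split_me0w is made up for this example.

```python
import struct

MAGIC = b"ME0W"
# magic, then two little-endian DWORDs holding the size of the data
HEADER = struct.Struct("<4sII")


def split_me0w(stream):
    """Return the data buffers of consecutive ME0W packets found in `stream`."""
    buffers = []
    offset = 0
    while offset + HEADER.size <= len(stream):
        magic, size, _size_again = HEADER.unpack_from(stream, offset)
        if magic != MAGIC:
            raise ValueError("unexpected magic at offset %d" % offset)
        start = offset + HEADER.size
        buffers.append(stream[start:start + size])
        offset = start + size
    return buffers


# the header from the dump above: both DWORDs decode to 0x000a24d0
example = bytes.fromhex("4d453057d0240a00d0240a00")
print(HEADER.unpack(example))
```

Applied to the first response, a helper like this should yield the PNG and the PA30 patch as two separate buffers.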

After extracting the data buffers, we get two elements listed below (along with their MD5 hashes):

2c691262493ceaaa5de974adab36ed69  cat.png
440c49962f81e3d828ddcc3354c879c9  patch.p30

The PNG:

The image looks valid and looks very innocent, but after applying the patch it will change completely…

I guessed that the patch from this stream must be used along with the given PNG. I applied it with the help of the following command: -i cat.png -o out.bin patch.p30

The output turned out to be a DLL:

By looking closer at the code we realize that this is the “malware” responsible for generating the further communication. It connects to the URL that was referenced in the PCAP:

In order to understand how to decode the rest of the PCAP, we need to check how the received data is processed. The relevant fragment of the code:

It turns out to be fairly simple. First the data is decoded by being applied as a patch on an empty buffer. Then, the output is XORed with a hardcoded key “meoow”.

The patch is applied by the same function that was used before (to decode the DLL from the picture) – ApplyDeltaB:

Now we can decrypt the rest of the traffic following this pattern. First we need to apply the patch on a buffer filled with 0s, and then XOR the output with the key.
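The second step of this pattern, the XOR, is easy to reproduce in Python. The sketch below assumes that the key is applied cyclically from the start of the buffer, which is the usual construction but is my assumption here. The first step (applying the PA30 patch to a zero-filled buffer) is not covered; it needs the delta-applying script or ApplyDeltaB itself.

```python
from itertools import cycle

KEY = b"meoow"  # the hardcoded key found in the DLL


def xor_with_key(data, key=KEY):
    """XOR `data` with `key`, repeating the key as many times as needed."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


# XOR is its own inverse, so the same function both encodes and decodes
sample = xor_with_key(b"whoami")
print(xor_with_key(sample))
```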

We can see the decrypted traffic contains some exfiltrated data from a victim machine. Among this data there is a listing containing the flag:

Flare-On 8 – Task 7

The task 7 comes with the following intro:

Download: 07_spel.7z (password: flare)

The attached file is a Windows executable, 64-bit.

When we run the application, the following window pops up:

At the beginning I wasn’t sure if the task runs correctly on my system. But I decided to trace it with Tiny Tracer to see what happens.

Watching the tracelog in real time with the help of Baretail, I noticed that when I closed the window, something got unpacked in memory and executed. The relevant fragment of the log:

	Arg[0] = ptr 0x00007ffa8a8f0000
	Arg[1] = ptr 0x00007ff7094dc0b0 -> "VirtualAllocExNuma"

17972f;called: ?? [1747b490000+0]
> 1747b490000+1cd;ntdll.LdrLoadDll
> 1747b490000+1f2;ntdll.LdrGetProcedureAddress
> 1747b490000+218;ntdll.LdrGetProcedureAddress
> 1747b490000+23d;ntdll.LdrGetProcedureAddress
> 1747b490000+263;ntdll.LdrGetProcedureAddress
> 1747b490000+289;ntdll.LdrGetProcedureAddress
> 1747b490000+2ae;ntdll.LdrGetProcedureAddress
> 1747b490000+2d4;ntdll.LdrGetProcedureAddress
> 1747b490000+377;kernel32.GetNativeSystemInfo
> 1747b490000+3c0;kernel32.VirtualAlloc
> 1747b490000+648;kernel32.LoadLibraryA
	Arg[0] = ptr 0x00000001800152b6 -> "KERNEL32.dll"

> 1747b490000+6ad;ntdll.LdrGetProcedureAddress
> 1747b490000+6ad;ntdll.LdrGetProcedureAddress
> 1747b490000+6ad;ntdll.LdrGetProcedureAddress
> 1747b490000+6ad;ntdll.LdrGetProcedureAddress
> 1747b490000+6ad;ntdll.LdrGetProcedureAddress
> 1747b490000+6ad;ntdll.LdrGetProcedureAddress

We can see that it uses a function VirtualAllocExNuma to allocate memory:


Then, something is loaded into this memory and executed (the entry point at offset 0 suggests that it is a shellcode, not a PE):

17972f;called: ?? [1747b490000+0]

Next, we can see the functions executed from inside of the shellcode (prepended with “>“):

> 1747b490000+1cd;ntdll.LdrLoadDll
> 1747b490000+1f2;ntdll.LdrGetProcedureAddress
> 1747b490000+218;ntdll.LdrGetProcedureAddress

We can see that it loads multiple imports (using LdrGetProcedureAddress). This suggests that this shellcode is yet another loader (possibly for a PE payload).
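When the log gets long, counting the calls made from the shellcode can be automated. The sketch below parses only the lines prepended with “>”, in the form shown above (base+offset;module.function). The regular expression is my own reading of that format and may need adjusting for other Tiny Tracer settings.

```python
import re
from collections import Counter

# matches lines such as: > 1747b490000+1cd;ntdll.LdrLoadDll
SHELLCODE_CALL = re.compile(r"^>\s*([0-9a-fA-F]+)\+([0-9a-fA-F]+);([^.;\s]+)\.(\S+)$")


def parse_shellcode_calls(lines):
    """Return (base, offset, module, function) for every call made from the shellcode."""
    calls = []
    for line in lines:
        match = SHELLCODE_CALL.match(line.strip())
        if match:
            base, offset, module, function = match.groups()
            calls.append((int(base, 16), int(offset, 16), module, function))
    return calls


def count_calls(lines):
    """Count how many times each module.function was called."""
    return Counter("%s.%s" % (m, f) for _, _, m, f in parse_shellcode_calls(lines))


log = [
    "17972f;called: ?? [1747b490000+0]",
    "> 1747b490000+1cd;ntdll.LdrLoadDll",
    "> 1747b490000+1f2;ntdll.LdrGetProcedureAddress",
    "> 1747b490000+218;ntdll.LdrGetProcedureAddress",
]
print(count_calls(log))
```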


The previous experiment showed that the executable is packed. So, I decided to unpack it with the help of mal_unpack (one of the tools from the PE-sieve family). Since the window has to be closed manually in order to trigger the payload unpacking, I ran mal_unpack with the following commandline (infinite timeout):

mal_unpack.exe /timeout 0 /exe spel.exe

And then I closed the window.

Some DLLs got dumped.

The shellcode, as well as one of the DLLs, seem to be nothing but next-stage loaders.

However, I noticed among them an interesting DLL with one function exported:

Unfortunately, the relocation table of this DLL was removed:

Data Directory view shows that the Relocation Table is cut out

Due to this fact, it could not be used as a standalone DLL.

Manual reconstruction of a relocation table is difficult, and sometimes even impossible. But I got the idea that maybe I could still find a raw copy of this DLL, with the relocation table intact. So I scanned it again, this time with the option /data 3, to also dump PEs from non-executable memory.

mal_unpack.exe /timeout 0 /exe spel.exe /data 3

This time more DLLs were dumped.

One of them was indeed a raw copy of the DLL I was looking for – this time with a valid relocation table.

Now, all I needed to do was to remove padding of the dumped file. I did it with PE-bear:

PE-bear: removing the padding at the end of the dumped DLL

And the DLL is ready to be run… I just renamed it to its original name ldr.dll.

Tracing the DLL and writing a loader

I decided to trace the found DLL with Tiny Tracer. The DLL exports a function Start, so I suspected that this is the function that should be called.

I set it in Tiny Tracer:

set DLL_EXPORTS="Start"

Then I traced the DLL with Tiny Tracer.

Reading the trace log, I noticed that the DLL tries to load some resource. The resource is supposed to be fetched from the main application. I added tracking of the related parameters to Tiny Tracer, and I saw what exactly is being loaded. It is a PNG (full trace log available here).

	Arg[0] = ptr 0x00007ff72e9e0000
	Arg[1] = 0x0000000000000080 = 128
	Arg[2] = ptr 0x0000005628ebf5e4 -> "PNG"

The relevant PNG is in the resources of the main application:

Interestingly, PE-bear fails to display it. It turns out other tools have the same problem. The content of the PNG is just invalid. I suspected that it will contain some encrypted data, possibly the flag.

The content of the PNG: possibly an encrypted buffer

Now we know that this PNG needs to be passed to the DLL. In order to do so, I saved the resources with PE-bear. The aforementioned PNG is in the file named: _1_429cc0.png.

Then, I created my own loader, that includes this PNG as a resource with identical name as the DLL requires. The code of the loader is available here. Now we can trace the execution of the ldr.dll via the prepared wrapper. We just need to change the traced module in the TinyTracer’s run_me.bat (as described here).

set TRACED_MODULE="ldr.dll"

The other thing that we can notice in the trace log is a SleepEx function. I also watched its parameter in Tiny Tracer:

	Arg[0] = 0x0000000000057e40 = 360000
	Arg[1] = 0

The sleep time turns out to be pretty long: 360000 ms, which is 6 minutes. Fortunately, we can overwrite it in Tiny Tracer (more info here).

Static analysis

I opened the DLL in IDA in order to analyze it statically. Overview of the Start function:

The decompiled code – final result of my analysis – is available here.

Used obfuscation

Most of the API functions are resolved by hashes, so the TAG file generated by Tiny Tracer came in handy. I just applied the tags on the IDA view (using the IFL plugin), and the code became much more understandable. Example:

NOTE: this way of resolving API calls has some limitations: since the tags are generated during tracing, only the calls that were actually executed will be resolved. So, we are still left with some hashes that are not mapped. Fortunately, a quick Google lookup shows that the hashing algorithm is well known, and there are already lists of common API functions with their corresponding hashes. This helped to find some more functions.

Not only the API calls are obfuscated, but also strings. Each used string is deobfuscated just before use, with the help of an inline XOR loop. Example:

Since the application doesn’t use many strings, I decided not to write any automatic solutions, but to resolve them manually under the debugger as I progress with the analysis.

Examining the checked conditions

There are some conditions that the DLL checks. For example, the executable must be named Spell.EXE, so I renamed my loader accordingly.

After renaming my loader (and enabling sleep hooking in Tiny Tracer, as it was shown before), I traced it again. The produced log is available here. This time we can see something interesting: the application is trying to connect to the socket:

	Arg[0] = 0x0000000000057e40 = 360000
	Arg[1] = 0

	NtDelayExecution hooked. Overwriting DelayInterval: ffffffff296c5c00 -> fffffffffffe7960

	Arg[0] = ptr 0x000000c094952400 -> "ws2_32.dll"

	Arg[0] = ptr 0x000000c0949522e0 -> "user32.dll"
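The two DelayInterval values in the hooked NtDelayExecution line can be decoded by hand. The interval is a signed 64-bit number expressed in 100-nanosecond units, and a negative value means a delay relative to the current time. A short sketch of the arithmetic (the function name is mine):

```python
def delay_interval_to_ms(raw):
    """Convert a raw 64-bit relative DelayInterval into milliseconds."""
    # reinterpret the unsigned 64-bit value as a signed one
    signed = raw - (1 << 64) if raw & (1 << 63) else raw
    if signed > 0:
        raise ValueError("positive values denote an absolute time, not a relative delay")
    # one millisecond is 10,000 units of 100 ns
    return -signed / 10000


print(delay_interval_to_ms(0xffffffff296c5c00))  # 360000.0 (the original 6 minutes)
print(delay_interval_to_ms(0xfffffffffffe7960))  # 10.0 (the value written by the hook)
```

The first value matches the 360000 ms passed to SleepEx, and the overwritten one shortens the wait to 10 ms.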


I added tracking of the gethostbyname parameters, and I saw the address it is trying to connect to:

	Arg[0] = ptr 0x0000007a6ef5f9b0 -> ""

After checking more details under the debugger, I found out that it queries one of the two addresses: and , trying to connect to the port 888. None of those addresses is active, so we have to somehow emulate this communication.

Once it connects to the C2, it sends a beacon “@” and waits for a command.

There are 3 commands available: “exe”, “run”, “”.

The first two commands are used for running received shellcode or a PE file. The third one leads to a function that seems to decrypt something…

Emulating the C2

One of the possible ways of emulating the communication is to start a server locally, for example using netcat.

netcat -l -p 888

Then we can redirect the domain to it by editing the following file:


We need to create an entry that will cause the domain to be resolved as our localhost:

Running the binary again, we can see that indeed the crackme connects to our emulated C2, and sends the expected prompt:
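As an alternative to netcat, the fake C2 can be sketched in a few lines of Python, which makes it easier to repeat the same exchange many times. This is a minimal sketch under assumptions: it reads whatever the client sends first (expected to be the “@” beacon) and answers with a command passed in by the caller. The exact framing that the crackme expects for a command (terminator, padding) is not reproduced here, and the function names are made up for this example.

```python
import socket


def handle_client(conn, command):
    """Read the beacon from the connected client, then send it one command."""
    beacon = conn.recv(64)
    conn.sendall(command)
    return beacon


def serve_once(command, host="127.0.0.1", port=888):
    """Accept a single connection on the given port and serve one command."""
    with socket.create_server((host, port)) as server:
        conn, _addr = server.accept()
        with conn:
            return handle_client(conn, command)


# usage (blocks until the crackme connects):
#   print(serve_once(b"run"))
```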

Running the commands

As mentioned earlier, the third command (“”) looks interesting, because it leads to some decryption. We can run the prepared loader again via Tiny Tracer and watch the APIs called during the communication with the fake C2. I let it connect, then sent the command “”, while observing the trace log in real time and checking what happens.

First, the BCrypt library is loaded, and it is used to decrypt some buffer. Relevant fragment:

	Arg[0] = ptr 0x000000726ce82180 -> "bcrypt.dll"


After that, some registry keys are set, and finally the function exits (execution goes back to the loader):


The full trace-log from this session is available here.

After adding the BCrypt functions to the watched list and tracing again, we get some additional information:

	Arg[0] = ptr 0x0000001ec8de0b90 -> {...}
	Arg[1] = ptr 0x0000001ec8cbfc24 -> L"ChainingMode"
	Arg[2] = ptr 0x0000001ec8cbfc44 -> L"ChainingModeCBC"
	Arg[3] = 0x0000000000000020 = 32
	Arg[4] = 0x0000001e00000000 = 128849018880

	Arg[0] = ptr 0x0000001ec8de0b90 -> {...}
	Arg[1] = ptr 0x0000001ec8cbfc78 -> {...}
	Arg[2] = ptr 0x0000001ec8de5f00 -> {...}
	Arg[3] = 0x000000000000028e = 654
	Arg[4] = ptr 0x0000001edb9c0000 -> "d41d8cd98f00b204e9800998ecf8427e"
	Arg[5] = 0x0000000000000020 = 32
	Arg[6] = 0

	Arg[0] = ptr 0x0000001ec8de5f00 -> L" "
	Arg[1] = ptr 0x00007ff6ebae610f -> {\xd7\xfb~b\x8d\xab\x87e\xcdq\x85\xceS\x0fZ\x8c-\x8aE7\x12Ky\x1d@\xdav\x86&\xd3\xd3r}
	Arg[2] = 0x0000000000000020 = 32
	Arg[3] = 0
	Arg[4] = ptr 0x0000001ec8cbfca8 -> {...}
	Arg[5] = 0x0000000000000010 = 16
	Arg[6] = ptr 0x0000001ec8cbfc88 -> {\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00}
	Arg[7] = 0x0000000000000020 = 32
	Arg[8] = ptr 0x0000001ec8cbfc80 -> {...}
	Arg[9] = 0x0000001e00000000 = 128849018880

We can spot that the content of the PNG file gets decrypted. Buffer:


Is the same as the content of the previously reviewed PNG:

The used algorithm is AES in CBC mode, with the key generated from the string: “d41d8cd98f00b204e9800998ecf8427e”.
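Two properties of this string are easy to check. In the trace it is passed as a pointer together with the length 32, so if the bytes are used directly, it would be a 32-byte (AES-256 sized) key. The string itself is also a well-known constant: the MD5 digest of empty input. The snippet below only verifies these two facts; it makes no claim about how the crackme produced the string.

```python
import hashlib

KEY_STRING = "d41d8cd98f00b204e9800998ecf8427e"

# the key material as passed in the traced call: the ASCII characters themselves
key_bytes = KEY_STRING.encode("ascii")
print(len(key_bytes))  # 32

# the same string is the MD5 hex digest of an empty buffer
print(hashlib.md5(b"").hexdigest() == KEY_STRING)  # True
```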

If we follow those functions under the debugger, we can see the aforementioned decryption:

Before decryption

…and the string that we got as the result of it:

After decryption

Later, this buffer is rewritten, with the suffix “” (typical for the flag) appended:

The string didn’t make much sense, but at least it is ASCII, so I thought it may be a flag. I tried to submit it, however, it turned out invalid. So I had to dig deeper.

I noticed this string is being XORed, scrambled, and the result is written into Windows Registry:

The function responsible for scrambling:

I decided to clear the buffers that are used for the XOR operations. The buffers:

C3 C1 A8 06 C2 96 33 00 00 00 00 00 00 00 00 00 8A 1D 89 15 14 9F C1 1D 99 7E 8A 1B 00 00 00 00

E2 A4 B7 A7 D7 AC 87 8D 9B 9C 85 0D D8 8E E5 FA

…were set to all 0s under the debugger:

As a result, the valid flag was saved in the registry:


Best spell checker ever… This time the flag makes sense, moreover, it passes the verification!

Flare-On 8 – Task 9

The 9th task is named “evil”, and the description says:

Download: 09_evil.7z (password: flare)

As mentioned, it comes with several false flags, so we need to watch out!

It is a Windows executable, 32-bit.

Overview and understanding the goal

Running the task doesn’t give us much information, because no output is displayed.

Opening it in IDA shows that the code is obfuscated: we can see some invalid chunks in between the code:

Due to this, IDA can neither decompile it, nor create graphs.

If we load it under x64dbg, we can see that the application keeps throwing exceptions:

We can step through them, and finally it reaches a far return:

Far returns are often used in the Heaven’s Gate technique. However, that is not the case here, and its presence doesn’t make much sense. This indicates that the debugger was probably detected and we went down a wrong execution path.

We can try once again, by setting x64dbg to ignore the exceptions:

Now, the debugger won’t stop at the exceptions, but it doesn’t help much: the application will soon terminate.

The next thing I did was tracing it with TinyTracer. Some trace is being produced, but again it breaks at the invalid far return:

It happens at the same RVA that the debugger showed before: 0x2F14. Back in x64dbg, we can see the path that led to that invalid instruction:

Patching (#1)

A simple patch can help avoid going this way: NOPing out the conditional jump:


RVA: 2fb5 -> NOP

Tracing the patched application

The above patch finally caused the trace to go much further.

Yet, it is worth noting that not all of my tracing attempts gave the same results: in some of them it was clear that the application terminated prematurely. This made me guess that the defensive checks are somehow randomized. This was later confirmed by static analysis, and will be described further in this post.

Not seeing the application read any input, I tried to trace it with a commandline argument (I used “Test123”). This turned out to be a good idea: the trace showed that the execution goes further. I obtained the following log: log1.tag.

The application terminates soon, yet, towards the end of the log, we can see some interesting calls, related to socket creation:


Seeing it, I suspected that opening of the socket has failed. I traced it again, but this time with tracking parameters of those functions.

Relevant fragments of the trace show that the commandline argument was used as a socket address:

	Arg[0] = ptr 0x00755000 -> "Test123"

Then, by checking the arguments passed to the function socket, we can see that the created socket is of the type raw, and dedicated to UDP communication:

	Arg[0] = 0x00000002 = 2 // AF_INET
	Arg[1] = 0x00000003 = 3 // SOCK_RAW
	Arg[2] = 0x00000011 = 17 // IPPROTO_UDP
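The numeric arguments can be mapped back to their symbolic names with the constants from Python's socket module, which use the same values for these particular definitions. A small sketch (the lookup tables and the function name are made up for this example and cover only a few common values):

```python
import socket

FAMILIES = {int(socket.AF_INET): "AF_INET"}
TYPES = {
    int(socket.SOCK_STREAM): "SOCK_STREAM",
    int(socket.SOCK_DGRAM): "SOCK_DGRAM",
    int(socket.SOCK_RAW): "SOCK_RAW",
}
PROTOCOLS = {
    int(socket.IPPROTO_ICMP): "IPPROTO_ICMP",
    int(socket.IPPROTO_TCP): "IPPROTO_TCP",
    int(socket.IPPROTO_UDP): "IPPROTO_UDP",
}


def describe_socket_args(family, sock_type, protocol):
    """Translate the three numeric arguments of socket() into constant names."""
    return (
        FAMILIES.get(family, hex(family)),
        TYPES.get(sock_type, hex(sock_type)),
        PROTOCOLS.get(protocol, hex(protocol)),
    )


print(describe_socket_args(2, 3, 17))
```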

Since the application opens a raw socket, it needs to be run as an Administrator.

I changed the commandline argument to “”, and traced it again, this time as an Administrator. The following alert shows up:

This time the application runs further. In the log we can see the calls to other functions related to the socket:


Fragments of the trace with added parameters tracking:

	Arg[0] = ptr 0x00b233a8 -> ""

	Arg[0] = 0x00000002 = 2
	Arg[1] = 0x00000003 = 3
	Arg[2] = 0x00000011 = 17

	Arg[0] = 0x0000028c = 652
	Arg[1] = ptr 0x008bf9b4
	Arg[2] = 0x00000010 = 16

	Arg[0] = 0x0000028c = 652
	Arg[1] = 0x98000001 = 2550136833
	Arg[2] = ptr 0x008bf9dc
	Arg[3] = 0x00000004 = 4

	Arg[0] = 0x0000028c = 652
	Arg[1] = 0x0000ffff = 65535
	Arg[2] = 0x00001006 = 4102
	Arg[3] = ptr 0x008bf9c8
	Arg[4] = 0x00000004 = 4

	Arg[0] = 0x00000002 = 2
	Arg[1] = 0x00000003 = 3
	Arg[2] = 0x00000011 = 17

	Arg[0] = 0x00000290 = 656
	Arg[1] = 0
	Arg[2] = 0x00000002 = 2
	Arg[3] = ptr 0x008bf9dc
	Arg[4] = 0x00000004 = 4

	Arg[0] = 0x0000028c = 652
	Arg[1] = ptr 0x00b753f0
	Arg[2] = 0x000005dc = 1500
	Arg[3] = 0

The other important thing is that the socket expects a buffer with a maximum length of 1500 bytes:

	Arg[0] = 0x0000028c = 652
	Arg[1] = ptr 0x00b753f0 // buffer pointer
	Arg[2] = 0x000005dc = 1500 // buffer length
	Arg[3] = 0

At this point we can suspect that this buffer is the input of our crackme that will take part in obtaining the flag. For communicating with the socket, we can use nping. Example:

nping --udp -p 1234 --dest-ip -c 1 --data [test_data:in hex]

But understanding what exactly should be filled into the sent buffer requires some code deobfuscation…

Self-modifying code

I decided to run the crackme again (as an Administrator, with the argument “”), and scan it with PE-sieve/HollowsHunter.


hollows_hunter.exe /pname evil.exe /hooks /imp A

Dumped material:

It turns out that the dumped executable contains a lot of in-memory patches. Basically, the application patches itself as it goes.

Dumping it with the option /imp A gave a sample with a recreated Import Table. This can make a static analysis a bit easier, as (at least some) of the dynamic calls are now replaced with static imports. The other calls, that could not be deobfuscated this way, can be added to IDA by loading the trace log (.tag) via IFL plugin.

The Import Table recreated by PE-sieve

Hooked functions

In advapi32.dll

The dumped material also shows us that advapi32.dll has been hooked. The hook is at the beginning of the function CryptImportKey and it redirects to the crackme. The relevant TAG file (from the dump):


Looking at the hook target in IDA we can see the following trampoline function:

Its role is very simple: if CryptImportKey was called with the parameter CALG_SEAL, it will be changed to CALG_RC4. This suggests that the crackme is going to use RC4 to decrypt something (possibly the flag).

In ntdll.dll

There are also patches in ntdll.dll. The relevant TAG file:


The first patch disables the function DbgBreakPoint (a function that breaks into the kernel debugger):

The other patch is set at the beginning of the function DbgUiRemoteBreakin – a function used by a debugger to break into a process. Due to the patch, calling this function causes immediate process termination (function TerminateProcess).

Both of those patches are part of the defensive techniques of the crackme.

Flow modified by exceptions

If we apply the tracelog on the crackme, we can clearly see the points in the code where each exception has been thrown. Such points are represented as calls to the Exception Dispatcher (ntdll.KiUserExceptionDispatcher).

Exception: attempt to read a NULL pointer – view from original binary

The log also shows that soon after an exception, some API call occurred, but in the original executable this part of the code is invalid. From this observation we can assume that the exception handler somehow overwrote the invalid bytes, producing the API call instead.

When we apply the same tracelog to the dumped version of the binary, we can see exactly what the written patch looks like. Now only one invalid byte is left, and the rest of them have been replaced with CALL EAX:

View from the dumped binary

The full code of the application is sprinkled with various instructions like this, which intentionally cause exceptions.

If we look again into the trace log, we can see that at the beginning of the execution the VEH is being registered. So, when the aforementioned exception is thrown, it is handled by VEH (Vectored Exception Handler). Let’s have a look in IDA:

The function added as a handler:

The exception handler responsible for patching the code

The exception handler fetches the values of the registers (ECX, EDX) from the exception context. It passes them to the function that is responsible for resolving the address of the API to be called (fetch_by_hash). The obtained address is then stored into EAX of the exception context. After that, we can see the code patching. First, the memory protection at the point where the exception was thrown is set to writable. Then, at EIP + 3 (3 bytes after the point of the exception) the patch is made: CALL EAX is written. As we know, EAX now contains the address of the API, so this is what will be called here. The EIP of the exception is set to point to this line, so this will be the next instruction executed after the exception handler finishes.

Aligning the instructions

The instructions generating the exception (e.g. div eax) are 2 bytes long, while the patch is written at a 3-byte offset. Due to this fact, there is a trash byte between the instruction causing the exception and the newly written CALL EAX.

Trash byte between the line causing the exception, and the written call

This trash byte destroys the alignment of the instructions, and causes problems to IDA in interpreting the code that follows after (by default it is interpreted as data, and we need to change it manually each time).

In order to fix the alignment, I decided to patch the handler, and make it write aligned instructions. However, the space in the code was too small for making appropriate assembly modifications. So I decided to rewrite the full exception handler, and then hook the function AddVectoredExceptionHandler so that it will set my own version instead of the original one. For hooking I used MS Detours (with my template), but any sort of hooking engine will do the job.

The snippet below shows the modified handler:

LONG __cdecl my_patch_some_code(struct _EXCEPTION_POINTERS *ExceptionInfo)
{
    struct _EXCEPTION_POINTERS *except_ptr; // esi
    PCONTEXT v2; // eax
    int edx_val; // edi
    int ecx_val; // ebx
    DWORD new_eax; // edi

    except_ptr = ExceptionInfo;
    v2 = ExceptionInfo->ContextRecord;
    edx_val = v2->Edx;
    ecx_val = v2->Ecx;

    // resolve the address of the API from the values passed in EDX and ECX
    new_eax = resolve_func(edx_val, ecx_val);
    if (!new_eax) {
        return 0; // EXCEPTION_CONTINUE_SEARCH
    }

    // 0x40 = PAGE_EXECUTE_READWRITE; the old protection is stored in the reused ExceptionInfo variable
    VirtualProtect((LPVOID)(except_ptr->ContextRecord->Eip-2), 0x1000u, 0x40u, (PDWORD)&ExceptionInfo);
    except_ptr->ContextRecord->Eax = (DWORD)new_eax;

    *(WORD *)(except_ptr->ContextRecord->Eip + 2) = 0x9090;// NOPs (the second one is overwritten by the next line)
    *(WORD *)(except_ptr->ContextRecord->Eip + 3) = 0xD0FF;// CALL EAX

    except_ptr->ContextRecord->Eip += 3;
    VirtualProtect((LPVOID)(except_ptr->ContextRecord->Eip-2), 0x1000u, (DWORD)ExceptionInfo, (PDWORD)&ExceptionInfo);
    return -1; // EXCEPTION_CONTINUE_EXECUTION
}

As we can see in the above code, I replicated the original handler with just one difference: I added a NOP instruction before CALL EAX. This is enough to achieve the main goal: aligning the code. But I decided to improve it a bit more…

The instructions that cause exceptions to be thrown are diversified. Sometimes we can see it is an attempt to read from a NULL address, sometimes a division by 0, and so on. It will be a bit cleaner if we can replace them with only one type: for example by the “read from the NULL address”. So I modified my hook so that it will also replace this part:

// change all exceptions to follow the same pattern:
if (*(WORD *)(except_ptr->ContextRecord->Eip) != 0x008B) {
  *(WORD *)(except_ptr->ContextRecord->Eip - 2) = 0xC033;// xor  eax, eax
  *(WORD *)(except_ptr->ContextRecord->Eip) = 0x008B;// mov  eax, [eax]
}

The code of the full DLL patching the crackme is available here.

It can be injected into the crackme with the help of dll_injector:

The above example shows the most classic way of hooking. Yet, at the time when I was solving this task, I wanted to do multiple experiments and many quick changes in the hooks. So, instead of running the evil.exe in a separate process, and hooking it by injecting a DLL, I wanted something faster: all-in-one loader. The code is available here. This loader requires that first we convert the evil.exe into a DLL, by EXE_to_DLL. Then, we just load this DLL within the current process, which hooks itself.

Now, the new handler will produce properly aligned instructions: the trash byte has been replaced with a NOP.
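The difference between the two handlers can be illustrated on a plain byte buffer, without running the crackme. The sketch below simulates only the byte writes: the original handler puts CALL EAX (FF D0) at EIP + 3, leaving the byte at EIP + 2 untouched, while the modified one also writes a NOP (90) there. The sample buffer, with div eax (F7 F0) followed by AA filler bytes, is invented for this illustration.

```python
CALL_EAX = bytes.fromhex("ffd0")
NOP = 0x90


def original_patch(code, eip):
    """Write CALL EAX at eip+3, as the crackme's own handler does."""
    patched = bytearray(code)
    patched[eip + 3:eip + 5] = CALL_EAX
    return bytes(patched)


def aligned_patch(code, eip):
    """Same patch, but with the trash byte at eip+2 replaced by a NOP."""
    patched = bytearray(code)
    patched[eip + 2] = NOP
    patched[eip + 3:eip + 5] = CALL_EAX
    return bytes(patched)


# div eax (2 bytes) followed by invalid filler bytes
code = bytes.fromhex("f7f0aaaaaaaaaa")
print(original_patch(code, 0).hex())  # f7f0aaffd0aaaa
print(aligned_patch(code, 0).hex())   # f7f090ffd0aaaa
```

In the first output the AA byte at offset 2 is still in the way of a linear disassembly; in the second one every byte up to the call belongs to a valid instruction.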

However, we need to keep in mind that it modifies the code only as it goes: it will patch only the branches that have been executed. So, the others are still not cleaned. Yet, it is enough to get a decent overview of the code, and the few branches that haven’t been taken can be cleaned later by manual patching (or by an IDA script). Also, by sending various data to the socket, we can cause more branches to be taken, so that more code will be cleaned.

After running the crackme for a while, with the hooked handler, we can dump it again from the memory by PE-sieve, to get the modified version.

Now IDA has no problem with interpreting the modified part of the code:

The dumped version of the app, with the TAGs from the Pin tracing session applied

Understanding the decompiled code

If we managed to get rid of all trash instructions in a certain function, it becomes possible to decompile the code. This makes analysis a lot easier.

We know that the application uses a raw socket, so the buffer that is received by recvfrom contains IPv4 headers, as well as UDP headers (not stripped). Filling those structures in IDA can make interpretation a lot easier.

struct ip_v4 {
  _BYTE ver_and_IHL;
  _WORD total_len;
  _WORD fo_and_flags; // flags : 3 , fragment offset: 13
  _BYTE ttl;
  _BYTE protocol;
  _WORD checksum;
  _DWORD source_addr;
  _DWORD dst_addr;
};

struct udp_hdr {
  _WORD source_port;
  _WORD dst_port;
  _WORD len;
  _WORD checksum;
};

We can see that the port in the UDP header must be set to a certain value: 0x1104 (4356).

The WORD in the IPv4 header that holds two bitfields (flags and fragment offset) is checked by AND with 0x80. This means that the “reserved” flag must be set:

NOTE: The “reserved” flag is also called “an evil bit” (read more here) – so this is probably the origin of this task’s name.

Only if those conditions are fulfilled, the received data will be processed further.
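
The two checks can be reproduced outside of the crackme. Below is a minimal Python sketch, assuming the standard IPv4 layout (flags and fragment offset at byte offset 6) and assuming that the port is compared after conversion from network byte order; the function names and the addresses used are my own, for illustration only:

```python
import struct

EXPECTED_PORT = 0x1104  # 4356

def build_packet(payload, dst_port=EXPECTED_PORT, evil=True):
    """Build a raw IPv4 + UDP packet (checksums left at 0 for brevity)."""
    flags_frag = 0x8000 if evil else 0x0000  # top bit = the "reserved" flag
    udp_len = 8 + len(payload)
    ip = struct.pack("!BBHHHBBHII",
                     0x45, 0, 20 + udp_len,  # version/IHL, TOS, total length
                     0, flags_frag,          # identification, flags + offset
                     64, 17, 0,              # TTL, protocol (UDP), checksum
                     0x7F000001, 0x7F000001) # source, destination (127.0.0.1)
    udp = struct.pack("!HHHH", 12345, dst_port, udp_len, 0)
    return ip + udp + payload

def passes_checks(packet):
    """Mimic the checks: reserved flag set and the expected UDP port."""
    # The field is read as a little-endian WORD, so its first byte on the
    # wire becomes the low byte: the reserved flag is then the 0x80 bit.
    (fo_and_flags,) = struct.unpack_from("<H", packet, 6)
    header_len = (packet[0] & 0x0F) * 4
    (dst_port,) = struct.unpack_from("!H", packet, header_len + 2)
    return bool(fo_and_flags & 0x80) and dst_port == EXPECTED_PORT

print(passes_checks(build_packet(b"\x01\x00\x00\x00")))              # True
print(passes_checks(build_packet(b"\x01\x00\x00\x00", evil=False)))  # False
```

This also shows why the mask is 0x80 and not 0x8000: the field is not byte-swapped before the test.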

Then, the received data from the packet is rewritten to another, custom structure.

The received data is being copied

My reconstruction of this structure is given below:

struct stored_packet_data {
  _DWORD source_addr;
  _DWORD dst_addr;
  _WORD source_port;
  _BYTE *data_buf_ptr;
  _WORD data_len;
};

Decompiled and cleaned code of the receiving function is available here.

The receiving function does nothing but the initial checks of the data, and the filling of this structure. But there is another function, running in a separate thread, that reads this filled buffer and verifies it further (I denoted it as to_some_rc4):

Those two threads are run with the same buffer as an input argument

By analyzing the second function, we can see that the first value of the data buffer is treated as a command to be executed: 1, 2, 3, or other (>3):

We can further see some CRC32 calculating function, and some decrypting. So, this must be the exact function to analyze in order to obtain the flag.

The decompiled code of the thread processing the buffer is available here.

Patching out the defensive checks

At this point I decided that it would be most convenient to follow the flow by dynamic analysis. But as we saw, the crackme is loaded with various defensive checks that don’t let it run under the debugger. So, in order to continue, they must be patched out.

Earlier I already patched out one of the defensive checks (the one causing the far jump). It required nothing but NOPing a single conditional jump. But removing the rest of them is much more difficult.

First, the checks are initialized.

The same function is responsible for patching NTDLL:

Functions responsible for various defensive checks are added into the map:

Only one of those checks will be deployed: it is selected randomly, based on the current time. This explains the non-deterministic behavior during the tracing.

Unfortunately, we cannot simply NOP the call to this function, because that would cause crashes later. The map of the checks is used in multiple places, and it cannot be empty.

So, instead of trying to remove it, I decided to neutralize it in a less invasive way. As we saw, there are various functions with checks added to the map, with various IDs. Those functions vary in complexity. The simplest of them seemed to be the one that just calls CheckRemoteDebuggerPresent, and causes the application to exit if the debugger was detected.

Inside the check_remote_debug – original version

I made a patch inside this function, just to blind the check (changed the conditional jump into unconditional):

Then I modified the mapping, so that the above function will be the only one added to the map, at every possible index:

This way we still have the checks running, but in a way that is not disturbing. The crackme can be run under the debugger with no problems.

Patching the IPv4 flag

As we saw during static analysis, the crackme proceeds with the received buffer only if the IPv4 “reserved” flag is set. The problem is, it is not a standard situation. When we send the packet by nping, the “reserved” flag will be clear.

Rather than trying to somehow enforce passing this flag, I decided to simply do the patch in the code, to avoid it being checked.

NOPed the conditional jump

Analysis of the verification function

Finally we are ready for the dynamic analysis of the verification function.

I decided to make some experiments by sending the buffer with one of the expected commands with the help of nping, and then watch under the debugger how it is processed.

Command #1


nping --udp -p 4356 --dest-ip -c 1 --data 01000000

The command 1 causes a fake flag to be decrypted:

Yet another artifact that gets decrypted on this command is a BMP, that is a frame from the famous “Rick roll” video clip. Interestingly, this frame is being displayed on the console.

We can easily conclude, that this command serves no other purpose than being a red herring.

Command #2

At first, sending the buffer with this command was causing the application to crash. After taking a closer look, I realized that the DWORD defining the command must be followed by another DWORD, this time defining the size of the buffer that comes after it. When we send a buffer in a valid format, it is copied, and then compared with four keywords that are dynamically decrypted:

"L0ve", "s3cret", "5Ex", "g0d"

If the comparison passes, the crc32 of the buffer is being calculated, and stored in another buffer. Initially I dismissed those strings, thinking they are yet another red herring, but they turned out to be very important…

Command #3

This command expects a buffer made of three DWORDs. The first one (the command itself) must be 3, the second: 2, and the third: ‘MZ’.

nping --udp -p 4356 --dest-ip -c 1 --data 03000000020000004d5a0000

After we send the buffer in the expected format, something new will be decrypted with the help of RC4 algorithm (using WinAPI, and the patched version of the function CryptImportKey). I expected it to be the flag…

Obtaining the flag

Initially, when I tried to send the command 3, it was reaching the RC4 decryption part, but the buffer used as the RC4 key was empty. At first I thought that maybe I had broken something with my patching, so I asked for a hint on whether this part of the crackme is really supposed to look like this. Fortunately, it turned out that everything was fine; I just had to take a closer look at which other command can fill this key.

After some more experiments it became clear that the CRC32 checksums from the command #2 are going to be filled into the RC4 key buffer.

So, all that was needed at this point was to send those buffers one by one, in properly formatted packets:

02000000 05000000 4C 30 76 65 00 -> L0ve
02000000 07000000 73 33 63 72 65 74 00 -> s3cret
02000000 04000000 35 45 78 00 -> 5Ex
02000000 04000000 67 30 64 00 -> g0d


nping --udp -p 4356 --dest-ip -c 1 --data 02000000050000004C30766500
nping --udp -p 4356 --dest-ip -c 1 --data 020000000700000073336372657400
nping --udp -p 4356 --dest-ip -c 1 --data 020000000400000035457800
nping --udp -p 4356 --dest-ip -c 1 --data 020000000400000067306400

This causes filling of the full RC4 key.

Then we need to send the command 3:

nping --udp -p 4356 --dest-ip -c 1 --data 03000000020000004d5a0000

This will trigger the decryption of the flag.
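
The hex strings passed to nping above do not have to be assembled by hand. A minimal Python sketch that generates them (the function names are my own; the layout follows the description above: a little-endian command DWORD, then a size DWORD, then the NUL-terminated keyword):

```python
import struct

KEYWORDS = (b"L0ve", b"s3cret", b"5Ex", b"g0d")

def build_cmd2(keyword):
    """Command 2: DWORD command, DWORD size, NUL-terminated keyword."""
    data = keyword + b"\x00"
    return struct.pack("<II", 2, len(data)) + data

def build_cmd3():
    """Command 3: the DWORDs 3 and 2, followed by 'MZ' padded to a DWORD."""
    return struct.pack("<II", 3, 2) + b"MZ\x00\x00"

# Print the payloads in the order in which they must be sent.
for word in KEYWORDS:
    print(build_cmd2(word).hex())
print(build_cmd3().hex())
```

Each printed line can be pasted directly as the value of the --data parameter.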

CryptImportKey is called

Finally, the flag got decrypted!


No more exceptions please! This is how we reached the end of this challenge…


Flare-On 7 – Task 10

This year’s FlareOn was very interesting. I managed to finish it in 87th place. In this small series I will describe my favorite tasks, and how I solved them. I hope to provide some educational value for others, so this post is intended to be beginner-friendly.

My writeup to the previous task can be found here.


In this task we are provided with the following package (password: flare). It contains a 32 bit ELF (break), and a description that says:

As a reward for making it this far in Flare-On, we've decided to give you a break. Welcome to the land of sunshine and rainbows!

No hints this time, only trolling! And this is what we must get used to while doing this task, which turns out to be far from the promised easy break. It is full of red herrings and false hints…

This challenge is the most interesting crackme I have ever encountered. Yet, it is very exhausting. In reality, it is more like 3 tasks in one. Instead of searching for one flag, we need to collect 3 different fragments of it. Each of them is protected by a different cipher that we need to break. But this is not the only challenge! Even making sense of the code is going to be difficult – the flow is protected using some sort of nanomites – at least in the first two layers. Functionality-wise, each layer is a bit different. Even finding the code that we need to analyze may be a challenge in itself (stage 3 is a shellcode that is loaded into the main application by an overflow, which is exploited by the crackme itself).

Walk-through my solutions for particular parts:

Thanks to everyone who gave me hints during this long journey!


Flare-On 7 – Task 9

This year’s FlareOn was very interesting. I managed to finish it in 87th place. In this small series I will describe my favorite tasks, and how I solved them. I hope to provide some educational value for others, so this post is intended to be beginner-friendly.


In this task we are provided with the following package (password: flare). It contains a 64 bit PE (crackinstaller.exe), and a description that says:

What kind of crackme doesn't even ask for the password? We need to work on our COMmunication skills.

By the name and the description we can guess that it is going to be an installer for some other components, and also that some knowledge about COM (Component Object Model) is going to be required.


Before we go into the details of the solution, let’s see the roadmap of the elements that we are going to discover.

The following diagram presents the loading order of particular components involved in this task:

The elements with solid borders are loaded from files. The elements with dashed borders are loaded in-memory only. Yellow – executes only in usermode, blue – only in kernelmode, gray – partly in usermode and partly in kernelmode.


The crackme runs silently, without displaying any UI. In order to see what is happening during execution, we can use some method of tracing the activities (e.g. ProcMon). I wanted to see exactly which APIs are called from the main application, so I started by running it via Tiny Tracer. In order to get the complete trace, it must be run as an Administrator.

This is the trace log that I obtained:

It gives a pretty good overview of what is going on at which points of the code. Let’s go through the log first, and see how much we can discover by reading the order of APIs called.

The first fragment that triggered my interest is the following:


By reading it we can find that the crackinstaller:

  1. drops some file (CreateFileW, CreateFileMappingW, MapViewOfFile, CloseHandle)
  2. installs it as a service (OpenSCManager, OpenServiceW, StartService)
  3. sends an IOCTL (DeviceIoControl) – most likely the receiver is this newly installed service, that is a driver
  4. uninstalls the created service (OpenServiceW, DeleteService)

Another interesting fragment of the log follows the previous one:


In this fragment we can see that some file is being dropped (CreateFileW, WriteFile). Then it is registered as a COM server.

So, at this point we can expect two elements are going to be installed: a driver (which is uninstalled right after use) and the COM component. In order to find them we must see what are the files that are being dropped. We can load the generated .tag into x64dbg, and set breakpoints on the interesting functions.

The dropped components

First I set breakpoints at CreateFileW to see what are the paths to the dropped components. We can collect them from those paths once they are saved.

As we observed before, there are two elements dropped:

  1. The driver: da6ca1fb539f825ca0f012ed6976baf57ef9c70143b7a1e88b4650bf7a925e24
    • dropped in: C:\Windows\System32\cfs.dll
  2. The COM server: 4d5bf57a7874dcd97b19570b8bad0fa748698671d67593744df08d104e6bd763
    • dropped in: C:\Users\[username]\AppData\Local\Microsoft\Credentials\credHelper.dll

The first element executed is the driver, so this is where I started the analysis.

The dropped driver (cfs.dll)

As we can find out by reading the comments on Virus Total, this is a legitimate, but vulnerable Capcom driver, that was a part of the Street Fighter V game (you can read more about it here and here). Due to the vulnerable design, this signed driver allows for execution of arbitrary code in kernel mode. By sending a particular IOCTL we can pass it a buffer that will be executed (it is possible since the driver disables SMEP as well). This vulnerability makes it a perfect vector to install untrusted kernelmode code on the machine – that feature is used by the current crackme.

First, the driver is dropped from the crackinstaller into:


And installed as a service. Its path is:


Then, the aforementioned IOCTL is being called. Below you can see an example of the parameters that were passed to the IOCTL (DeviceIoControl function), along with their explanation:

1: rcx 00000000000001E4 ; driver
2: rdx 00000000AA013044 ; IOCTL
3: r8 0000007B3EAFF6C8 ; input buffer
4: r9 0000000000000008 ; input buffer size
5: [rsp+28] 0000007B3EAFF6C0 ; output buffer
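
The IOCTL value itself can be split into the standard fields of a Windows I/O control code (the CTL_CODE layout: device type, required access, function number, transfer method). A minimal Python sketch of this decoding, with a helper name of my own:

```python
def decode_ioctl(code):
    """Split a 32-bit I/O control code into its CTL_CODE fields."""
    return {
        "device_type": (code >> 16) & 0xFFFF,  # bits 16-31
        "access":      (code >> 14) & 0x3,     # bits 14-15
        "function":    (code >> 2) & 0xFFF,    # bits 2-13
        "method":      code & 0x3,             # bits 0-1
    }

fields = decode_ioctl(0xAA013044)
for name, value in fields.items():
    print(f"{name:12} = {value:#x}")
# device_type is 0xaa01, function is 0xc11,
# access is 0 (FILE_ANY_ACCESS) and method is 0 (METHOD_BUFFERED)
```

The non-standard device type (0xAA01) is typical for a third-party driver that defines its own device.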

The input buffer turns out to be the following small stub, written in additionally allocated executable memory page:

025E86BD0008 | sti
025E86BD0009 | mov rdx,25E86AF2080 ; address of: driver.sys
025E86BD0013 | mov r8d,5800 ; size of the driver
025E86BD0019 | mov r9d,3170 ; address of DriverBootstrap function
025E86BD001F | jmp qword ptr ds:[25E86BD0025] ; function inside crackinstaller.exe

The stub sets parameters, that are going to be used by the next function. Then it leads the execution back to the crackinstaller.exe – to another function (at RVA 0x2A10). Although the dropper is a userland application, this part of the code will be called in a kernel mode – because the execution to this function is redirected via the kernelmode component.

This function is responsible for loading yet another driver (driver.sys) that is also passed as one of the parameters.

By looking at the loading function, we can see that this driver is going to be mapped manually into the kernel-mode memory. The “DriverBootstrap” function exported by driver.sys is a kernel-mode Reflective Loader variant (similar to this one).

After this installation, the first driver (cfs.dll) gets unloaded and uninstalled – however, the second one: driver.sys – persists in the memory (in contrast to usermode applications, the memory allocated by a driver is not freed automatically when the driver is unloaded).

What I initially did was to dump this driver.sys in user mode (before the IOCTL was executed), and analyze it statically. Then, I tried to load it as a standalone driver. However, it was a mistake. This driver has a buffer that is supposed to be overwritten on load, in kernel mode. At this stage, it is not filled with the proper content yet. This buffer is crucial for decoding a password. Since I overlooked the part that was overwriting it, the output that I was getting was garbage, although I understood the full logic of the driver. After consulting other researchers, I confirmed that the output was supposed to be valid ASCII – so I realized that I had missed something on the way, and that I shouldn’t have taken a shortcut by dumping the driver in userland. I then decided to walk through the full way of loading the driver in kernel mode, and dumped it again in kernel mode, just before its execution.

The driver.sys

Before we move further to the dynamic analysis, let’s have a look at the driver.sys in IDA. As I mentioned earlier, dumping this driver in userland is not a perfect option (some important buffer is filled on load in kernel mode). However, for now, this version is good enough for the static analysis of the driver’s logic.

As always the execution starts in DriverEntry.

In our case, this function redirects execution to another one, which I labeled as “driver_main”.


Some interesting strings inside the driver are obfuscated – they are dynamically decoded just before use. There are various ways to retrieve them – I have chosen to write a simple wrapper in libPeConv that allowed me to call the decoding function without analyzing it, and apply it on the chosen buffers.

This module (driver.sys) is a filter driver with an altitude of 360000, which means “FSFilter Activity Monitor”.

The main function is pretty simple: its role is to initialize the device, and to set the callback that will be used for event filtering. The function CmRegisterCallback sets the callback that will be triggered each time an operation on Windows Registry is executed.

The routine that is registered to handle the callback (DispatchCallback) must follow the prototype of EX_CALLBACK_FUNCTION.

The second argument (denoted as Arg1) is of type REG_NOTIFY_CLASS – it informs about what type of operation triggered the callback. In our case the event is processed further only if the value of the REG_NOTIFY_CLASS is 26 (in the WDK enumeration this value is RegNtPreCreateKeyEx, which fits the key creation that triggers it). The next argument (Arg2) holds a pointer to a structure whose type depends on the value of the previous one (Arg1). In our case, Arg2 holds the pointer to the UNICODE_STRING with the name of the operated Registry Key.

The name of the key is copied into additionally allocated memory with a tag “FLAR”. It is compared further with a dynamically decoded string:

Only if the name of the key matches the hardcoded one, the next, more interesting part of the code is executed. If we check the changes in the registry made during the execution of crackinstaller, we will notice that this registry key is created on the installation of the COM server. So, this is how those components are tangled together.

The next part of the driver’s code decrypts some mysterious buffer. We can recognize the involved algorithms by their typical constants. First, a SHA256 hash is calculated from a buffer hardcoded in the driver (denoted as “start_val”). Then, the hash is used as a key for the next algorithm, which is probably Salsa20 (alternatively, it may be a similar cipher, ChaCha).
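
Recognizing algorithms by their constants can also be automated. The following is a minimal Python sketch of such a scan over a dumped binary; the signature list is deliberately tiny and the function name is my own. Note that Salsa20 and ChaCha share the same “expand 32-byte k” constant, so this check alone cannot tell them apart, and a compiler may also split that string into separate DWORD immediates, in which case a contiguous search will miss it:

```python
import struct

# A few well-known constants, stored the way they usually appear in x86 code.
SIGNATURES = {
    "SHA-256 initial hash value H0": struct.pack("<I", 0x6A09E667),
    "SHA-256 round constant K[0]": struct.pack("<I", 0x428A2F98),
    "Salsa20/ChaCha sigma": b"expand 32-byte k",
}

def find_crypto_constants(blob):
    """Return {signature name: first offset} for every constant found."""
    hits = {}
    for name, pattern in SIGNATURES.items():
        offset = blob.find(pattern)
        if offset != -1:
            hits[name] = offset
    return hits

# Synthetic buffer standing in for a dumped driver image.
sample = (b"\x00" * 8 + struct.pack("<I", 0x6A09E667)
          + b"junk" + b"expand 32-byte k")
print(find_crypto_constants(sample))
```

On a real dump we would read the file with open(path, "rb").read() and pass the result to find_crypto_constants.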


At this point we can guess that our next goal is to get this decoded buffer.

In order to get the valid solution, we need to first get the overwritten version of the above driver, so, the one that is loaded in the kernel mode.

Notes on kernel mode debugging

Before we can start kernel mode debugging, we need to have an environment set up. The setup that I used is almost identical to this one. Yet, there are a few differences that I am going to mention in this part.

First of all, we need a 64 bit version of Windows – I used Windows 10 64 bit VM on VirtualBox (linked clones for Debugee and Debugger).

As always, the usermode analysis tools (e.g. x64dbg) as well as the crackme itself, are going to be run on the Debugee VM. The kernel mode debugger (WinDbg) will be run on the Debugger VM, connected to the Debugee.

Configuring the Debugee VM

There are a few more steps (in addition to the ones described here) that we have to take in order to configure the Debugee VM. In case of Windows 10, explicitly setting the debug interface is necessary (by default, even if we enable debugging on the machine, it is going to be set in a local mode, and we will not be able to connect the Debugger VM). Since we are going to establish a debug session over a serial port, the following settings apply:

bcdedit /dbgsettings serial debugport:1 baudrate:115200

We can test if the proper options are applied by deploying the command dbgsettings without parameters:

bcdedit /dbgsettings

Expected result:

DbgSettings after

We need to remember that on 64 bit Windows a driver must be signed in order to be loaded. This is not gonna be an issue if we want to load the first driver: cfs.dll – because this is a legitimate, signed driver. However the second one: driver.sys – which is more important to the task – is not signed. It loads just fine as long as the first, signed driver is used as a loader. But for the sake of the convenience, at some point we are going to load the driver.sys as a standalone module. To be able to do so, we must change an option in bcdedit, in order to allow unsigned drivers to be loaded. It can be done running this command on the Debugee machine:

bcdedit /set TESTSIGNING ON

After changing the settings, the system must be rebooted.

We also have to disable Windows Defender, otherwise the crackme will be mistaken as a malware and removed.

Dumping driver.sys in kernel mode

In order to understand what exactly is going on, and not to miss anything, I decided to walk through the full flow, from the moment the IOCTL is executed inside cfs.dll until the driver.sys is loaded in memory.

To start following it in kernel mode, we need to locate the address of the function inside cfs.dll that is going to be triggered when the IOCTL is sent. Let’s open cfs.dll in IDA, and see the function registered to handle IOCTLs:

Inside we can see the IOCTL numbers being checked, and then the function to execute the passed buffer is being called:

In the next function (that I labeled “to_call_shellcode”) we can see the operations of disabling SMEP, calling the passed buffer, and then enabling the SMEP again:

The function disabling SMEP :

So, we need to set the breakpoint at the address just after the function disabling SMEP returns, because in this line there is a call passing execution to the shellcode. This happens at VA = 0x10573 (RVA = 0x573):

If we step into that call in WinDbg, we will be able to follow the passed shellcode executed in kernel mode.

Before we set the breakpoint in kernel mode, we need to load the crackinstaller into a userland debugger (such as x64dbg) and set a breakpoint before the DeviceIoControl function is called.

Then, on the Debugger machine (connected to the Debugee where the crackme runs) we deploy WinDbg and connect to the Debugee.

We can set a breakpoint on load of the cfs.dll in WinDbg by:

sxe ld cfs

After that, we run the crackme. The breakpoint should hit and the Debugee freezes. With the help of the following command:


We can see the list of all the loaded modules, and find the module of our interest on the list:

If we want to view this list from the Debugee perspective, we can also use Driver List by Daniel Pistelli.

Now, let’s set a breakpoint on the offset inside the driver, that executes the shellcode:

bp cfs + 0x573

And we resume the Debugee. Let’s step over the breakpoint at DeviceIoControl in x64dbg. Now, in the Debugger VM, we can see again that the breakpoint has been hit.

Opening the Disassembly window allows us to see this line in the original context:


As we can see, it is the same code fragment that we observed in IDA before, analyzing the relevant fragment of cfs.dll.

Using the command:


We can step into the call. And what do we see? The very same shellcode that we observed being passed to the DeviceIoControl!

The address moved to RDX is the address of the buffer holding driver.sys.

Now, as we know from the previous analysis, the execution should be redirected back to crackinstaller.exe, but the execution will take place in kernel mode. We can set the breakpoint at the first jump, which will do the redirection:

bp [address]

After setting the breakpoint, we can resume the execution (“g”) and once the breakpoint is hit, step in again (“t”):

This is where we end up:

…and it is exactly the function at 0x2A10 in crackinstaller.exe, that we found before. As we know, this function will do the modifications in the driver, and then redirect execution to there, inside the DriverBootstrap function (RVA = 0x3D70 , raw = 0x3170).
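
The relation between the two offsets of DriverBootstrap (raw vs RVA) comes from the section table of the PE. A minimal Python sketch of the conversion is given below; the section layout in it is hypothetical, chosen only so that it reproduces the numbers above, and the real values should be read from the headers of driver.sys (for example in PE-bear):

```python
def raw_to_rva(raw_offset, sections):
    """Convert a file (raw) offset into an RVA using the section table.

    sections: list of (virtual_address, pointer_to_raw_data, size_of_raw_data)
    """
    for virtual_address, raw_pointer, raw_size in sections:
        if raw_pointer <= raw_offset < raw_pointer + raw_size:
            return raw_offset - raw_pointer + virtual_address
    # Offsets that fall into the headers are mapped 1:1.
    return raw_offset

# HYPOTHETICAL single section: mapped at RVA 0x1000, stored at raw 0x400.
SECTIONS = [(0x1000, 0x400, 0x3E00)]

print(hex(raw_to_rva(0x3170, SECTIONS)))  # prints: 0x3d70
```

This is why the raw value (0x3170) is the one that matters while the driver is still an unmapped buffer, and the RVA only becomes valid after the reflective loader maps the sections.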

By analyzing the flow of the corresponding function in crackinstaller, we can guess that the redirection happens at RVA = 0x2c26.

inside crackinstaller.exe

Let’s set a breakpoint there, and resume the execution.

At this point we can see the function PsCreateSystemThread being called. The start routine is going to be the DriverBootstrap function.

The address of the bootstrap function is stored in RAX register:

At this point the driver is in the raw format, so we know that the raw address of the bootstrap function was used: 0x3170. By subtracting it from the whole address, we can get the driver’s base. By looking up this address in the Memory window we can see that indeed this is where the driver has been loaded:

Now it’s time to dump the driver. We can do it with the help of command .writemem. We need to supply it the path where we want to save the dump, and the range to be dumped. The size of the driver was supplied to the shellcode, and it is 0x5800. So, we can dump the range in the following way:

The new version dumped as “mydriver.sys”
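
The arithmetic behind this dump is simple enough to script. A minimal Python sketch follows; the RAX value and the output path in it are hypothetical (the real address has to be read from the register at the breakpoint), and the printed line is only my assumption of how the .writemem command could be formed from the computed range:

```python
BOOTSTRAP_RAW = 0x3170  # raw offset of DriverBootstrap (from the stub)
DRIVER_SIZE = 0x5800    # size of the driver (from the stub)

def dump_range(bootstrap_address):
    """Return (base, last byte) of the raw driver buffer in memory."""
    base = bootstrap_address - BOOTSTRAP_RAW
    return base, base + DRIVER_SIZE - 1

rax = 0xFFFFB00012343170  # HYPOTHETICAL value of RAX at the breakpoint
base, last = dump_range(rax)
print(f"driver buffer: {base:#x} - {last:#x}")
# Assumed form of the WinDbg command: file name, start address, length.
print(f".writemem C:\\dump\\mydriver.sys {base:#x} L{DRIVER_SIZE:x}")
```

Since the buffer is still in the raw format at this point, the dumped file should be directly comparable with the version dumped in usermode.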

After having the driver dumped, we can see what was patched. The comparison done via PE-bear:

Comparison – the original vs the modified

The patched content is the buffer that was used to derive the Salsa20 key (the “start_val” is filled with a string “BBACABA”).

Extracting the password in kernel mode

After the driver.sys is loaded in the memory, the crackinstaller.exe installs the COM server. On installation, the COM server creates the Registry key with the server GUID: “{CEEACC6E-CCB2-4C4F-BCF6-D2176037A9A7}\Config”. Creation of this key triggers the filter function inside the driver.sys to decrypt the hardcoded password. Our next goal is to fetch this password from the memory while it is being decoded.

Finding this password can be achieved easily – all we need to do is to set a breakpoint in WinDbg, that will be triggered after the password is decoded, and then dump the output from the memory.

Yet, setting the breakpoint on a function of the reflectively loaded driver would be very inconvenient. A reflectively loaded driver will not be listed among the loaded modules, so we cannot reference it by its name. We also don’t know the base at which it was loaded. So, this is the point where it comes in very handy to load the driver.sys independently.

For this part, we are going to use the patched version of the driver.sys – the one that was dumped as mydriver.sys in the previous part.

Loading the driver.sys as a standalone driver

Once we have dumped the modified version of the driver, we can load it as an independent module. However, this time it is not loaded through the signed cfs.dll, and the driver itself is not signed, so it won’t load in Windows unless we disable signature checking in bcdedit (as mentioned before, a reboot is required each time we change the settings):

bcdedit /set TESTSIGNING ON

We install it on the Debugee VM:

sc create [service name] type=kernel binpath=[driver path] 
sc start [service name] 

Let’s break the execution via Debugger VM (WinDbg : Debug -> Break) and see if the driver.sys is present on the list of the modules, using the command:


We should see it on the list, just like on the example above.

Dumping the password from the memory

Now we can set the breakpoint inside the filter function. As mentioned before, it is gonna be called each time when some registry key is read/written. Then the name of the key is going to be compared with the hard-coded one (which is dynamically decrypted). If the name matches, another buffer is decrypted with the help of Salsa20. So, the password decryption is executed immediately when the COM server creates this key.

We can set the breakpoint after the key name verification is passed (RVA = 0x48C9):

bp driver + 0x48C9

In order to trigger the event, we now need to use the credhelper.dll, and run the DllRegisterServer function. It can be done just by running (on Debugee):

rundll32.exe credhelper.dll,DllRegisterServer

This will trigger the breakpoint that we can follow in WinDbg…

Let’s set a breakpoint at the address where the Salsa20 algorithm was executed (it happens at RVA = 0x49AC):

driver.sys – IDA view

bp driver + 0x49AC

After that we can resume the execution


…and the breakpoint will be hit:

At this point, the address of the output buffer is in the R8 register. So we need to copy this address to the memory view. Now we can step over the function.

And the decrypted content got filled in the buffer that we previewed:

So this is the password: “H@n $h0t FiRst!”.

Now we need to learn how to use this password to decode the flag…

The COM component

The driver.sys is quite small, and there is nothing more in it to decode, so I guessed the next pieces of this puzzle are hidden somewhere in the COM component. Let’s take a look…

We already saw in the Pin tracer log that one function from this DLL is being called:


If we open the credhelper.dll in IDA, we can see that this function is probably the one responsible for decoding the flag:

We can see the registry keys “Password” and “Flag” being referenced.

However, if we take a closer look, we will see that the function responsible for setting the Flag is not inside the DllRegisterServer.

There are two unreferenced functions that manipulate the same registry keys:

The first one, reads the value of the Password from the registry, and initializes some structure with its help (snippet here).

The other is responsible for decoding the Flag (snippet here).

I guessed that the “Password” must be the string decoded from the driver.sys. So, we need to fill it in the registry, and then call those functions in proper order – probably using the COM interface.

This should probably be the “right” way to solve this task. However, when I was taking a closer look at those functions, they started to remind me of something familiar: the functions used by the RC4 encryption algorithm, which is commonly used in malware.

So, my guess was:

  1. The function that I denoted as “get_password_value” was an RC4 key scheduling (password expansion) function – it was initializing the context with the password (“H@n $h0t FiRst!”).
  2. The function that I denoted as “set_flag_value” was using this context, and decoding a hardcoded buffer by the RC4 decryption algorithm.

I dumped the hardcoded buffer, and decided to check those assumptions using CyberChef. It turned out correct: S0_m@ny_cl@sse$
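
For reference, the two RC4 stages that I matched against those functions look as follows. This is a minimal Python sketch of the textbook algorithm, not the code recovered from the DLL; the function names are my own, and the hardcoded encrypted buffer is not reproduced here, so the example only encrypts and decrypts a sample string with the recovered password:

```python
def rc4_init(key):
    """Key scheduling: expand the password into the 256-byte state."""
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) & 0xFF
        state[i], state[j] = state[j], state[i]
    return state

def rc4_crypt(state, data):
    """Keystream generation + XOR (encryption and decryption are the same)."""
    s = list(state)  # work on a copy, so the context can be reused
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)

context = rc4_init(b"H@n $h0t FiRst!")
encrypted = rc4_crypt(context, b"sample plaintext")
print(rc4_crypt(context, encrypted))  # prints: b'sample plaintext'
```

The two characteristic loops (256 iterations with swaps in the first function, a byte-by-byte XOR with swaps in the second) are what makes RC4 easy to spot in a decompiled code.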

So, the final flag was RC4 encrypted, with the password extracted from the driver.


Flare-On 6 (tasks 10-12)

Flare-On 6

Flare-On Challenge is an annual competition organized by FireEye (the FLARE team). It is like a marathon of reverse engineering. Each year we get 12 crackmes of increasing difficulty to solve. You can download the tasks here.

This year I finished in 106th place.

In this post I will describe the last 3 tasks of the competition:

WARNING: Work in progress. I will be adding more details to this post.

Task 10 – “Mugatu”

[Mugatu.7z; password: flare]

In this task we get an EXE (Mugatu.exe) and two encrypted GIFs: best.gif.Mugatu, the_key_to_success_0000.gif.Mugatu.


The EXE is a ransomware, and the two GIFs are encrypted by it. We are supposed to decrypt one of those GIFs (best.gif.Mugatu) in order to get the flag.

The EXE is slightly obfuscated. For example, the Imports are replaced at runtime by some other imports. So, analyzing it statically we may get confused. In order to analyze it statically with a valid result, we should recover its real imports first. In order to do this, we can just run it, and then dump it from memory with any dumper that can reconstruct the imports. In my opinion, the best for this task is Scylla. Once we have the main exe dumped with proper imports reconstructed, it becomes much more readable.

Inside the main executable there is a payload, that is the core of the ransomware. It is manually loaded by the main EXE. We can unpack this DLL statically in the following way:

  • In the resources of the main EXE there are 2 bitmaps of the same size.
  • We need to XOR one with another. (I did it using: )
  • As a result, we will get an executable (with some padding at the beginning).
  • We need to remove the padding, and that’s how we get the resulting DLL, named Derelicte.dll.
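The steps above can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper names are mine, the two inputs are whatever bytes you extracted from the bitmap resources, and the padding is stripped naively by searching for the first “MZ” signature (a real tool should also validate the PE header that follows). The demo uses synthetic buffers, not the real resources.

```python
def xor_buffers(a: bytes, b: bytes) -> bytes:
    """XOR two buffers of equal size, byte by byte."""
    if len(a) != len(b):
        raise ValueError("buffers must have the same size")
    return bytes(x ^ y for x, y in zip(a, b))


def strip_padding(blob: bytes) -> bytes:
    """Drop everything before the first 'MZ' signature (naive heuristic)."""
    start = blob.find(b"MZ")
    if start < 0:
        raise ValueError("no MZ signature found")
    return blob[start:]


# Synthetic demo: a fake "payload" with 8 bytes of zero padding in front.
payload = b"\x00" * 8 + b"MZ\x90\x00fake-pe-body"
bitmap_a = bytes((i * 37 + 11) & 0xFF for i in range(len(payload)))  # stand-in for bitmap 1
bitmap_b = xor_buffers(payload, bitmap_a)                            # stand-in for bitmap 2

recovered = strip_padding(xor_buffers(bitmap_a, bitmap_b))
print(recovered[:2])
```

For the real sample, bitmap_a and bitmap_b would be read from the two dumped resources, and recovered would be written out as the DLL.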

However, it is not that simple. If we extract the DLL statically, we will find that its imports don’t make much sense. This is because the main EXE replaces them on load. So, we need to find the valid imports for this DLL.

We can see the fragment of EXE’s code where the DLL is manually loaded.

The imports are loaded in an obfuscated way, which makes them quite difficult to reconstruct. They are not written directly into the thunks, but into a proxy list that looks as follows:

Because of this import obfuscation, the previous simple trick of running the sample and dumping it won’t work this time. We could try to deobfuscate the imports from memory, but the better approach is to patch the loader and simply prevent the obfuscation from being applied.

Let’s take a look at the function that does the import loading. Just after the import address is fetched, it is obfuscated:

It is then filled into a chunk of emitted code:

In order to prevent the obfuscation, I applied some patches in the loader:

1) do not obfuscate the import address


2) write the import address directly to the thunk, not to the proxy


Then I dumped it with PE-sieve with the option /imp 3 (complete Import Table reconstruction). As a result I got a valid DLL that I could easily analyze statically. The import table reconstructed by PE-sieve:

After the DLL is manually loaded within the main EXE, its Entry Point (the DllMain function) is called:

Then, an exported function is called, with a parameter “CrazyPills!!!”:


Once we follow this function in the DLL, we will see the logic responsible for encrypting files.

The function that does the encryption is not called directly, but via an obfuscated callback:


This callback is deobfuscated by XOR with the argument supplied to the function:


By following it in the debugger to the place where the deobfuscation is done, we can see the address of the callback function:

The callback is the function at RVA = 0x16b9.

We can follow it in IDA:


If we analyze it closer, we will find that it is the XTEA algorithm, but with a few modifications. First we need to write a decrypting function for it.

I found this implementation very helpful to base my decryptor upon. The few things that are changed compared to this implementation are: the delta, and the key buffer type (in the original implementation the key is an array of DWORDs). The second modification makes the crypto significantly weaker: the key has only 4 BYTEs, not 4 DWORDs as in the standard implementation, so it is easy to brute-force.
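To make the two modifications concrete, here is a minimal Python sketch of XTEA with the delta as a parameter and the key as 4 small values. It is not my final solution: the modified delta is not given in this post, so the demo uses the standard one as a stand-in, and the little-endian block packing, the 32 rounds, and the “GIF8” check are assumptions for illustration.

```python
import struct

MASK = 0xFFFFFFFF
STANDARD_DELTA = 0x9E3779B9  # the sample uses a different delta (not shown here)


def _mix(v: int, total: int, k: int) -> int:
    """The XTEA round function, truncated to 32 bits."""
    return ((((v << 4) ^ (v >> 5)) + v) ^ (total + k)) & MASK


def xtea_encrypt_block(v0, v1, key, delta=STANDARD_DELTA, rounds=32):
    total = 0
    for _ in range(rounds):
        v0 = (v0 + _mix(v1, total, key[total & 3])) & MASK
        total = (total + delta) & MASK
        v1 = (v1 + _mix(v0, total, key[(total >> 11) & 3])) & MASK
    return v0, v1


def xtea_decrypt_block(v0, v1, key, delta=STANDARD_DELTA, rounds=32):
    total = (delta * rounds) & MASK
    for _ in range(rounds):
        v1 = (v1 - _mix(v0, total, key[(total >> 11) & 3])) & MASK
        total = (total - delta) & MASK
        v0 = (v0 - _mix(v1, total, key[total & 3])) & MASK
    return v0, v1


def decrypt_first_block(block8: bytes, key, delta=STANDARD_DELTA) -> bytes:
    """Decrypt one 8-byte block (assumed to be two little-endian DWORDs)."""
    v0, v1 = struct.unpack("<II", block8)
    return struct.pack("<II", *xtea_decrypt_block(v0, v1, key, delta))


# Demo with a hypothetical 4-BYTE key: each key element is a byte, not a DWORD.
key = [0x31, 0x73, 0x35, 0xB1]
plain = b"GIF89a\x10\x00"                      # a GIF file starts with "GIF8"
cipher = struct.pack("<II", *xtea_encrypt_block(*struct.unpack("<II", plain), key))

candidate = decrypt_first_block(cipher, key)
print(candidate.startswith(b"GIF8"))
```

Since each key element fits in a byte, there are only 2^32 candidate keys, and a candidate can be rejected after decrypting a single block and checking for the expected GIF signature.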

The solution to this task:


Task 11 – “vv_max”

[vv_max.7z; password: flare]

In this task we are facing a Virtual Machine that uses AVX2 instructions. That’s why it will not work on some older processors which have no AVX2 support. If we try to run it on such a processor we get the following message: “Your processor/OS is ‘too old'”.


If the machine supports AVX2, it passes the verification, and prints “Nope!” in case of a wrong input.

Overview of the main function responsible for verifying the arguments:

The function that I renamed to “vm_process_bytecode” is responsible for calculating some “hash” from the input. Then in the function “vm_check_flag” this “hash” is being compared to a hardcoded one.

Inside this function “vm_check_flag”:

At this moment we know that the crackme expects 2 commandline arguments. The first one must be “FLARE2019”, the second: a string 32 characters long. The second argument is processed by a function implemented by the VM, and the result is compared with a hardcoded “hash” that is 24 bytes long.

The fragment of code responsible for making the comparison:

The valid “hash”:

70 70 B2 AC 01 D2 5E 61 0A A7 2A A8 08 1C 86 1A E8 45 C8 29 B2 F3 A1 1E

Rather than analyzing the functionality in the details, I decided to treat the VM as a black-box, and make some tests, checking how the output changes depending on the given input.

I noticed that the input is processed in chunks. A single chunk of 4 bytes gives 3 bytes of the output. Also, I understood that it is not a hash, but rather some encoding, because a change in a single byte of the chunk content did not change the whole output.

At this moment I decided to try to brute-force the solution, by finding the appropriate chunk of the input for each chunk of the output. Yet, I wanted to avoid re-implementing the full VM, so I decided to go for some sort of instrumentation of the original code.

After trying various options, I decided to use a patched version of the original sample. I patched it in such a way that the returned value (DWORD) contains the selected bytes of the output.

The modified version of the “vm_check_flag” function:

I removed the part responsible for comparing the “hash” calculated from the input with the hardcoded one. Instead, we just copy a chunk of the calculated “hash” into the EAX register. Then, we need to NOP out the code that sets the EAX register to 0.

Let’s test the prepared sample using one of the saved input-output sets. We will be checking the output chunk with the help of a command:

echo %errorlevel%


Input: "01111111111111111111111111111119"
Output: D3 5D 75 D7 5D 75 D7 5D 75 D7 5D 75 D7 5D 75 D7 5D 75 D7 5D 75 D7 5D 7D

DWORD=-680174125 -> D7755DD3 (little endian) -> D3 5D 75 D7

As we can see, the returned value is valid.
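The conversion from the signed exit code printed by echo %errorlevel% to the output bytes can be checked with a couple of lines of Python. The function name is mine; the value is the one from the test above.

```python
import struct


def errorlevel_to_bytes(value: int) -> bytes:
    """Reinterpret the signed 32-bit exit code as 4 little-endian bytes."""
    return struct.pack("<i", value)


chunk = errorlevel_to_bytes(-680174125)
print(chunk.hex())       # d35d75d7
print(chunk[:3].hex())   # only the first 3 bytes belong to one output chunk
```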

Now we just need to write a brute-forcing application that will integrate the patched module and crack the full value, chunk by chunk. After each iteration we need to patch the app again and change the ECX value, in order to advance to the next chunk. So, in the first round ECX = 0.

Since each output chunk is 3 bytes long, we will use only 3 bytes from the returned DWORD. Also, the value of the ECX will be advancing in the increments of 3.

During the tests with the brute-forcer I noticed that, rather than trying to crack the 4-byte input chunk as a whole, I should crack it by finding:

  1. the 1st byte of the input that gives the 1st byte of the output
  2. the 4th (last) byte of the input that gives the 3rd (last) byte of the output
  3. the two middle values of the input (2nd and 3rd) that give the middle (2nd) value of the output

This observation sped up the cracking a lot. I also automated the process of patching the ECX in the modified vv_max executable.

This is the complete solution:

During the process of cracking I started to notice that the output looks like something familiar… Yes, it is Base64! I noticed it too late, but on the other hand it was so much fun to write this crazy brute-forcer for it!
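The Base64 observation is easy to verify against the saved input-output pair from my tests. The sketch below assumes that the VM performs plain Base64 decoding with the standard alphabet, which that pair is consistent with; under this assumption, the expected input is simply the Base64 encoding of the hardcoded “hash”.

```python
import base64

# The test pair from above: "0", thirty "1" characters, then "9" (32 characters).
test_input = "0" + "1" * 30 + "9"
test_output = bytes.fromhex(
    "D3 5D 75 D7 5D 75 D7 5D 75 D7 5D 75 D7 5D 75 D7 5D 75 D7 5D 75 D7 5D 7D"
)

# 4 input characters give 3 output bytes, exactly like Base64 decoding.
assert base64.b64decode(test_input) == test_output

# The hardcoded "hash" that the VM output is compared against.
expected = bytes.fromhex(
    "70 70 B2 AC 01 D2 5E 61 0A A7 2A A8 08 1C 86 1A E8 45 C8 29 B2 F3 A1 1E"
)

# Candidate for the second commandline argument: 24 bytes encode to 32 characters.
candidate = base64.b64encode(expected).decode("ascii")
print(len(candidate), candidate)
```

The candidate still has to be confirmed by running the original vv_max with it as the second argument.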

Task 12 – “help”

In this task we receive a memory dump: help.dmp, along with a pcap: help.pcap. Both have been captured on the infected system. Our task is to analyze the infection.

I started by using volatility. First, I needed to find the profile appropriate for the given OS. For some reason volatility detected it as Windows 10 64 bit. However, loading the dump with this profile resulted in errors: most of the volatility functions were not working. It turned out that the OS was simply detected wrongly. If we open the same dump in WinDbg, we see that in reality it is Windows 7 SP1 64 bit. We needed to find the appropriate volatility profile manually. The one that turned out to be valid is Win7SP1x64_23418.


Finally, after this change, we could see a significant improvement, and volatility started to work as it was supposed to.

I poked around, listing processes, network connections, drivers… One thing that drew my attention was a driver man.sys, with a path containing the “Flare On 2019” keywords.

volatility -f help.dmp --profile=Win7SP1x64_23418 modules

Volatility Foundation Volatility Framework 2.6
Offset(V)          Name                 Base                             Size File
------------------ -------------------- ------------------ ------------------ ----
0xfffffa800183e890 ntoskrnl.exe         0xfffff80002a49000           0x5e7000 \SystemRoot\system32\ntoskrnl.exe
0xfffffa800183e7a0 hal.dll              0xfffff80002a00000            0x49000 \SystemRoot\system32\hal.dll
0xfffffa800183e6c0 kdcom.dll            0xfffff80000bac000            0x2a000 \SystemRoot\system32\kdcom.dll
0xfffffa80039c4630 bthpan.sys           0xfffff880032c8000            0x20000 \SystemRoot\system32\DRIVERS\bthpan.sys
0xfffffa800428ff30 man.sys              0xfffff880033bc000             0xf000 \??\C:\Users\FLARE ON 2019\Desktop\man.sys

Unfortunately we cannot dump it with volatility, because its header is erased. So, I loaded the same dump into WinDbg and dumped the driver using .writemem:

.writemem C:\dumps\man1.bin fffff880`033bc000 fffff880`033cb000

Since the driver has no header, we need to reconstruct it manually. We don’t need to get all the sections right; we just need the basic things that make it suitable for static analysis. The most important is to get the imports right.

First, I copied the PE header from another driver and used it as a base on which I started to rebuild. Then, I reviewed the file in a hexeditor, in search of familiar patterns. I could distinguish two sections, so I added their headers:

I noticed where the list of the imported DLLs is located, and tried to find the beginning of the structure, in order to fill it in the Data Directory.

My final version of the Data Directory has the following form:

Now we can open the file in IDA, just like any other PE. We still need to find the Entry Point (DriverEntry), and fill it in the header. I found an unreferenced function at RVA 0x5110, which is very likely to be the Entry Point:


The full reconstructed Optional Header:

Let’s open the driver in IDA again, and analyze what is going on in DriverEntry. We can see that the driver injected something into the process 876:

Let’s dump this full process using volatility, so that we can see what was injected there:

volatility -f help.dmp --profile=Win7SP1x64_23418 memdump -p 876 -D mem_dumps/

Indeed – this element contains other pieces of the “malware”. I carved them out using a hexeditor.


Most of the strings used in the “malware” are encrypted with RC4 – each using a different, hardcoded key. The same obfuscation method is used in each module. So, it is useful to make a decoder that would be able to statically deobfuscate it.

We are also given a PCAP file. So, we need to somehow make sense of the network traffic, and of its relationship with the “malware” we found. Volatility will also be helpful in seeing which process was responsible for which part of the traffic. We can see it using the command:

volatility -f help.dmp --profile=Win7SP1x64_23418 netscan

We can correlate the traffic generated by the svchost (PID 876) with the traffic recorded in the PCAP. Let’s dump the packets and try to decode them. There is a huge amount of traffic on the port 7777. When we dump those packets, we can see some repeating patterns inside. I visualized one of the dumps to get an idea of what could be hidden inside. This is the result:

Looking at the visualization we can guess that it is not an encoded PE file, but rather a bitmap. (If it was a PE, the patterns inside would look very different: in that image there is a lot of content that looks to be filled by the same characters, and in a PE we would not have so much padding between the sections.)

After finding the proper XOR key (thanks to Mark Lechtik), I got the decoded content, that was indeed a series of bitmaps. As it turned out: screenshots from the infected system.

The screenshots give some very important hints on how the flag is stored. As we can see, it is in a KeePass database. The master key is covered, but we know that it was typed on the screen, so we can suspect that the keylogger component should have caught it. It will probably be sent in some other part of the traffic.

I decided to find the KeePass database first. I needed to check what exactly was the version of KeePass. In order to do this, I dumped the KeePass process. It turned out to be KeePass 1.37. I installed the same version and checked what the header for this format is. Then, I carved out the valid file with this header.


The next step is to find the password! I confirmed that crypto.dll is the layer that decrypts that part of the traffic (thanks to Alex Polyakov and Alex Skalozub for answering my questions and confirming that this is the right direction to follow). I analyzed the crypto.dll and made a decryptor for the packets.

The traffic on the port 8888 contained the keylogged content (captured by keylog.dll), and the traffic on the port 6666 contained the stolen files (fetched by filedll.dll). It turned out that the uploaded file was keys.kdb, the same file that I had carved out earlier, so this was another way of retrieving it. The hashes of both matched, which confirmed that I had the valid kdb. I also found something that looked like the key to the database: “th1sisth33nd111”.


Yet, this key didn’t work!

At this point I wasn’t sure if I was going in the right direction, so I asked Alex Polyakov for a hint. He confirmed that this is indeed the key captured by the keylogger, but that it is in a slightly different form than the key that was typed… I tried to find something similar, yet different, recorded in the memory, by grepping through the strings of the main dump (help.dmp). After many failed attempts, I got the idea that the number ‘1’ at the end could in reality be ‘!’. So I tried the following command:

cat help_strings.txt | grep -i 3nd!

I found the following string:


Still, it didn’t work… But I noticed that the first characters are missing, so I tried:


And finally it worked: I got the kdb unlocked and the flag showed up!

That’s all! I hope you enjoyed my writeup. The source code of all my Flare-On solutions is available here:

Thanks to the Flare team for the great contest!


FlareOn6 Write-Up of Write-Ups – aggregator and summary of various solutions

Posted in CrackMe

Application shimming vs Import Table recovery

In this post I am sharing a case that I investigated recently, during the tests of my application, PE-sieve. It demonstrates how the shims applied by the operating system can disrupt Imports recovery.

Tested features

Recently I made a new release of PE-sieve. One of the added features was a complete Import Table recovery. From now on, if you run PE-sieve with the option /imp 3, it will collect all the addresses that the scanned module imported from the DLLs in the memory, and construct a new Import Table out of them. It is a very useful feature that many PE dumpers have. It helps, for example, to deal with packed applications. Let’s take UPX as an example: it may compress the Import Table of the payload, and load it dynamically during unpacking.

PE-sieve also offers another, “milder” mode of recovery (/imp 2). In this case PE-sieve relies on the existing Import Table, and only reconstructs the erased elements. It can be used, for example, in the following case, when the Thunks were overwritten by the function addresses during the imports loading:


PE-sieve is able to recognize the exports that are at those addresses, and fills their names back into the table:


Test cases

I decided to test my application on some ready-made and well-known examples. I selected Anthracene’s unpacking series, available here.

The first sample (1dbfd12ad3ee39930578b949c6899d0a) looks pretty straight-forward. It is a simple application showing a MessageBox, packed with the help of UPX.


I ran this example on one of my Virtual Machines (Windows 8 64 bit) and the imports got recovered flawlessly (video).

Later, I tried to do the same on a different machine, with another set of patches installed. For some reason, I could not reproduce the valid results. Import recovery “magically” stopped working – the received results were incomplete. When I tried to dump the payload with PE-sieve, using the option /imp 2 , all the functions imported from Kernel32.dll got recovered, but the function from User32: MessageBoxA did not.

First I assumed that it must be a bug in PE-sieve, but it turned out that other applications, e.g. Scylla, have the same problem: they cannot map this address to any of the exported functions.

I investigated the mentioned address under the debugger, and this is what I saw. The call to MessageBoxA was proxied via apphelp:

The function used from the apphelp was not present in the export table, so it is logical that the applications recovering imports could not recognize it. So, it is not really a bug in the application – but a problem caused by some peculiar way in which this import was loaded.
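The reason the recovery fails can be shown with a simplified model of what an import recovery tool does. This is only a sketch, not PE-sieve’s actual code: every address and slot below is made up, and the real tools build the address-to-export map by parsing the export tables of the loaded DLLs.

```python
def resolve_thunks(thunks: dict, exports_by_address: dict):
    """Map each thunk's target address back to an exported function name."""
    resolved, unresolved = {}, []
    for slot, target in thunks.items():
        name = exports_by_address.get(target)
        if name is None:
            unresolved.append((slot, target))  # no export lives at this address
        else:
            resolved[slot] = name
    return resolved, unresolved


# Hypothetical map built from the export tables of the loaded DLLs.
exports_by_address = {
    0x76A01000: "kernel32.dll!ExitProcess",
    0x76A02000: "kernel32.dll!GetModuleHandleA",
    0x77B03000: "user32.dll!MessageBoxA",
}

# Hypothetical thunks of the dumped module. The last one points into apphelp.dll,
# at a shim stub that is not present in any export table.
thunks = {
    0x402000: 0x76A01000,
    0x402004: 0x76A02000,
    0x40200C: 0x6F501234,
}

resolved, unresolved = resolve_thunks(thunks, exports_by_address)
print(resolved)
print([(hex(slot), hex(target)) for slot, target in unresolved])
```

The two Kernel32 imports resolve, while the shimmed slot stays unresolved even though user32.dll!MessageBoxA is in the map: the thunk simply does not point at it.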


Of course I wanted to understand the reason for such behavior. I suspected that there must be some shimming going on, but why did it happen? I checked other applications importing MessageBoxA, but each of them used this function directly from User32.dll, and apphelp.dll was not used as a proxy.

I started thinking that it may be related to the fact that my test case is a very old application.

The OS Version in the PE header is set to 4 (Win 95):

I made an experiment and “upgraded it”, just by changing the version number:

And it worked! After this simple change, the shim was no longer applied. The application used the import directly, and, as a result PE-sieve (and other applications) were able to properly recognize the function.

So, it turned out that the operating system applies this shim automatically for the backward compatibility with old applications.

Now when I think of it, it looks pretty obvious, but it was not so intuitive when I saw it for the first time, which is why I decided to document this case. So, just a small heads-up: when the import recovery is not going right, first check whether shims are the reason! I hope you enjoyed my small writeup.

Posted in Programming, Uncategorized

PE-bear – version 0.3.9 available

[UPDATE] This release introduced some stability issues, fixed in

Hello! Several months have passed since I released PE-bear 0.3.8. Since it was my old, abandoned project, I did not plan to start developing it again. Initially, I was going to add only bugfixes, treating it as a legacy app. However, it started doing pretty well for a “dead” project. It got 15K+ new downloads, has been mentioned in some cool presentations, featured on OALabs, and added to FlareVM. It all made me reconsider my decision. Also, I started getting messages from users requesting new features. Finally, I decided to go back on what I said before, and prepare another release.

The current one (0.3.9) comes with some new features. You can download it from the main site of the project:

1. Added Rich Header (viewing and editing), with calculated checksum. Preview:


The new PE-bear displays all the fields of the Rich Header, and allows editing them. It automatically calculates and verifies the Checksum, so it can help to spot the cases when the Rich Header was forged.

2. Added support for the new fields in Load Config Directory. Preview:


Since PE-bear is a pretty old project, it was not able to parse the full Load Config Directory, but only its basic form, ending at SEHHandlerCount. Now it supports the extensions introduced in Windows 8.1 and Windows 10.

3. In Debug Directory: parse and display RSDSI Table (including PDB path etc):


In the old version, Debug Directory was displayed, but without parsing the structure nested inside. Now, one of the most popular types, including PDB path, is also parsed: you can view the project path, and also edit it.
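For those curious what is inside this nested structure, here is a small Python sketch of parsing it. This is not PE-bear’s code; it assumes the commonly documented CodeView PDB 7.0 layout (the “RSDS” signature, a 16-byte GUID, a 4-byte age, and a NUL-terminated PDB path), and the function name and the sample record are made up.

```python
import struct
import uuid


def parse_rsds(blob: bytes):
    """Parse a CodeView 'RSDS' record: signature, GUID, age, PDB path."""
    if blob[:4] != b"RSDS":
        raise ValueError("not an RSDS record")
    guid = uuid.UUID(bytes_le=blob[4:20])            # GUID is stored in little-endian form
    age = struct.unpack_from("<I", blob, 20)[0]
    path = blob[24:].split(b"\x00", 1)[0].decode("utf-8", errors="replace")
    return guid, age, path


# Made-up record, built the same way it would be laid out in the file.
demo_guid = uuid.UUID("01234567-89ab-cdef-0123-456789abcdef")
record = b"RSDS" + demo_guid.bytes_le + struct.pack("<I", 2) + b"C:\\proj\\demo.pdb\x00"

guid, age, pdb_path = parse_rsds(record)
print(guid, age, pdb_path)
```

In a real PE, the record is found by following the raw data pointer of a Debug Directory entry of the CodeView type.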

In addition, the project underwent some internal refactoring, and I added some other tiny improvements.

I must say I started enjoying working on PE-bear again, and already got several new ideas that I am planning to implement. So, this release is not gonna be the last.

Big thanks to all of you who motivated me to “resurrect” this project. I hope you will enjoy the new version, and PE-bear’s comeback. As always, I am open to any comments and suggestions.

Posted in PE-bear, Tools