Windows Cross-Architecture Code Injection

Intro

This post covers two methods for injecting code into another process. Both avoid writing anything to disk and work from WoW64 to 64 bit processes and vice versa. This is useful if you exploit a host, land in a WoW64 process, and don't want to create another process to get 64 bit execution. Although it is possible to execute 64 bit shellcode in a WoW64 process (demonstrated below), you don't get the convenience of kernel32, since the 64 bit kernel32.dll is not loaded there. We'll start with a brief intro to WoW64.

WoW64 Transition

WoW64 is a layer that allows 32 bit executables to run unmodified on 64 bit systems inside a 64 bit process. It works by intercepting system calls made by the 32 bit code and forwarding them to the native 64 bit libraries. Below is a walkthrough of calling kernel32!CreateFileW from WoW64.

Hitting our initial break point

NtCreateFile is the underlying native API

Wow64SystemServiceCall will call the 64 bit NtCreateFile (syscall 0x55)

Wow64Transition will put the CPU in 64 bit mode

CS (code segment) register value 0x23 indicates 32 bit mode. This code sets up a FAR jmp which will set CS to 0x33 (64 bit mode)

After jmp, CPU is now executing in 64 bit mode

wow64cpu!CpupReturnFromSimulatedCode handles the transition out of WoW64 by saving the 32 bit processor state into a structure pointed to by thread local storage (TLS). Wow64SystemServiceEx then makes the 64 bit syscall to NtCreateFile.

After the syscall completes, 32 bit processor state is restored. Finally, another FAR jmp is made to set CS to 0x23 and resume execution in WoW64.

This example shows what needs to be done to execute code across architectures:

Set the CS register to 0x23 (32 bit) or 0x33 (64 bit)
Preserve processor state (registers, stack, etc)

This transition will be used to execute 64 bit injection shellcode within a WoW64 process.
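To make the two CS values less magic, here is a minimal sketch that splits a segment selector into its standard x86 fields (descriptor index, table indicator, requested privilege level) and maps the two values from the walkthrough to a mode width. The names `Selector`, `decode_selector`, `cs_mode_bits`, and `check_post_selectors` are hypothetical helpers for illustration only; the sketch decodes numbers and does not perform a mode switch.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an x86 segment selector (hypothetical helper type). */
typedef struct {
    unsigned index; /* descriptor table index, bits 3..15 */
    unsigned ti;    /* table indicator, bit 2: 0 = GDT, 1 = LDT */
    unsigned rpl;   /* requested privilege level, bits 0..1 */
} Selector;

/* Split a 16 bit selector into its three fields. */
Selector decode_selector(uint16_t sel) {
    Selector s;
    s.index = (unsigned)sel >> 3;
    s.ti = ((unsigned)sel >> 2) & 1u;
    s.rpl = (unsigned)sel & 3u;
    return s;
}

/* Map the two CS values named in this post to a mode width.
   Returns 0 for any other value (assumption: only these two matter here). */
int cs_mode_bits(uint16_t cs) {
    switch (cs) {
    case 0x23: return 32; /* WoW64 (32 bit) code segment */
    case 0x33: return 64; /* native 64 bit code segment */
    default:   return 0;
    }
}

/* Usage: both selectors are ring 3 GDT entries that differ only in index. */
void check_post_selectors(void) {
    Selector s32 = decode_selector(0x23);
    Selector s64 = decode_selector(0x33);
    assert(s32.index == 4 && s32.ti == 0 && s32.rpl == 3);
    assert(s64.index == 6 && s64.ti == 0 && s64.rpl == 3);
}
```

Decoding shows that the FAR jmp changes only which descriptor CS refers to; the privilege level stays at ring 3 on both sides of the transition.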

Technique #1: CreateRemoteThread

The classic example of code injection uses CreateRemoteThread to execute kernel32!LoadLibraryA on a DLL sitting on disk. We can use CreateRemoteThread to target shellcode instead without having to write any files.

A companion project to this blog post is available on GitHub to demonstrate the technique.

Native 64 -> WoW64

First, we need to write our shellcode into the WoW64 process using VirtualAllocEx and WriteProcessMemory. Then we can target the remote shellcode with CreateRemoteThread.

WoW64 -> Native 64

This is a little trickier, since CreateRemoteThread fails when called from WoW64 against a 64 bit process. We can still write our shellcode into the target process using the method above. To create the thread, we need to transition from WoW64 to 64 bit mode and call ntdll!RtlCreateUserThread. The function below, ExecuteNative64, executes 64 bit shellcode from WoW64. The shellcode will call RtlCreateUserThread like this shellcode from the Meterpreter project or this position independent C code. Note that we must call ntdll!RtlCreateUserThread instead of kernel32!CreateRemoteThread because the 64 bit kernel32.dll is not present in WoW64 processes.

64 bit injection shellcode run from WoW64 will need to do the following:

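/* Runs in 64 bit mode inside the WoW64 host process and starts a
   thread in the target at the shellcode already written there. */
void Inject64(Arg *arg) {
    RtlCreateUserThread(arg->hProc, ..., arg->remoteShellcode, ...);
}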

Simplified code for injecting from WoW64:
Note that there are 2 stages of shellcode here:

64 bit shellcode (inject64) which is run within the WoW64 host process to create a remote thread to run stage 2
64 bit shellcode (shellcode) that is run within the remote process in a new thread
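The two directions in this section differ only in whether a mode switch is needed and which call starts the thread. The sketch below encodes that decision as a small table-like function. All names (`Arch`, `ThreadApi`, `ThreadPlan`, `plan_remote_thread`) are hypothetical, and the same-architecture cases are not covered by this post; they are included on the assumption that the direct call works there. The code only describes the plan and does not touch any process.

```c
#include <assert.h>

typedef enum { ARCH_WOW64, ARCH_NATIVE64 } Arch;
typedef enum { API_CREATE_REMOTE_THREAD, API_RTL_CREATE_USER_THREAD } ThreadApi;

/* Hypothetical summary of how a remote thread would be started. */
typedef struct {
    int needs_mode_switch; /* 1 if the injector must first enter 64 bit mode */
    ThreadApi api;         /* call used to start the remote thread */
    int payload_bits;      /* bitness of the payload that runs in the target */
} ThreadPlan;

ThreadPlan plan_remote_thread(Arch injector, Arch target) {
    ThreadPlan p;
    assert(injector == ARCH_WOW64 || injector == ARCH_NATIVE64);
    assert(target == ARCH_WOW64 || target == ARCH_NATIVE64);

    /* The payload always matches the target process. */
    p.payload_bits = (target == ARCH_NATIVE64) ? 64 : 32;

    if (injector == ARCH_WOW64 && target == ARCH_NATIVE64) {
        /* CreateRemoteThread fails in this direction, so a local 64 bit
           stage calls ntdll!RtlCreateUserThread after the mode switch. */
        p.needs_mode_switch = 1;
        p.api = API_RTL_CREATE_USER_THREAD;
    } else {
        /* Native 64 -> WoW64 (and, by assumption, same-architecture). */
        p.needs_mode_switch = 0;
        p.api = API_CREATE_REMOTE_THREAD;
    }
    return p;
}
```

For example, `plan_remote_thread(ARCH_WOW64, ARCH_NATIVE64)` reports that a mode switch is needed, that RtlCreateUserThread is the call to use, and that the remote payload is 64 bit.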

Technique #2: QueueUserAPC

Microsoft provides another API for executing code in a remote process. QueueUserAPC (native API NtQueueApcThread) takes a thread handle and an address to begin executing. However, MSDN gives the following warning:

"To ensure successful execution of functions used by the APC, APCs should be queued only to threads in the caller's process."

This will not be an issue if your shellcode is position independent and resolves dependencies dynamically. Additionally, QueueUserAPC doesn't care if you pass it a handle to a thread outside the current process.

Unlike CreateRemoteThread, this routine does not create a new thread. The APC is executed when the targeted thread enters an alertable state (e.g. after calling WaitForSingleObjectEx with bAlertable set to TRUE, or SleepEx). I am not aware of any way to check from user land whether a thread is alertable, so it may be necessary to queue an APC on every thread (or on one thread at a time until execution is confirmed). Since this method hijacks a thread's execution, it is important that your shellcode creates a new thread and returns execution to the original context.
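The "queued now, run later" behaviour is easy to get wrong, so here is a minimal portable model of it. This is a simulation only: `SimThread`, `sim_queue_apc`, `sim_alertable_wait`, and `bump_counter` are hypothetical names, no Windows API is called, and the real scheduler is far more involved. It shows just the one property that matters here: a queued routine does nothing until the thread reaches an alertable wait.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_APCS 8

typedef void (*ApcRoutine)(void *arg);

/* Hypothetical model of one thread's pending user APC queue. */
typedef struct {
    ApcRoutine routine[MAX_APCS];
    void *arg[MAX_APCS];
    size_t count;
} SimThread;

/* Queue a routine. Returns 1 on success, 0 if the queue is full.
   Nothing is executed at this point. */
int sim_queue_apc(SimThread *t, ApcRoutine fn, void *arg) {
    assert(t != NULL && fn != NULL);
    if (t->count == MAX_APCS) {
        return 0;
    }
    t->routine[t->count] = fn;
    t->arg[t->count] = arg;
    t->count++;
    return 1;
}

/* Model of the thread entering an alertable wait: run every pending
   routine in queue order, then clear the queue. Returns how many ran. */
size_t sim_alertable_wait(SimThread *t) {
    size_t ran = 0;
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++) {
        t->routine[i](t->arg[i]);
        ran++;
    }
    t->count = 0;
    return ran;
}

/* Example routine: increments the int that arg points to. */
void bump_counter(void *arg) {
    ++*(int *)arg;
}
```

Queuing `bump_counter` leaves the counter untouched; it only changes once `sim_alertable_wait` is called. A thread that never makes an alertable wait never drains its queue, which is why targeting several threads can be necessary.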

Native 64 -> WoW64

Under the hood, QueueUserAPC calls NtQueueApcThread. We will call this native API directly to avoid dealing with activation contexts. QueueUserAPC actually passes NtQueueApcThread another function, RtlDispatchAPC, which winds up calling our provided address.

QueueUserAPC calls NtQueueApcThread with RtlDispatchAPC (rdx)

When NtQueueApcThread is called from a 64 bit process targeting a thread in a WoW64 process, remote execution starts in 64 bit mode, so the code that first runs in the target must be 64 bit shellcode that hands off to 32 bit code. A manual mode transition is not necessary, however: a call to ntdll!RtlCreateUserThread from 64 bit mode creates a new thread that starts in 32 bit mode. We will need 2 stages of shellcode:

64 bit stage 1 will call RtlCreateUserThread on stage 2
the 32 bit stage 2 payload

This code from Metasploit can be modified for use as stage 1. The previous WriteProcessMemory code can be used to write both stages to the remote process. NtQueueApcThread will target stage 1, and the stage 2 address can be passed as a parameter like so:

NtQueueApcThread(hThread, (PAPCFUNC)stage1, stage2, 0, 0);

If spamming the APC to multiple threads, make sure to synchronize access in your shellcode.
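One simple way to synchronize is a run-once guard: the first APC to fire claims a flag atomically and every later one returns immediately. The sketch below shows the idea in portable C11 with `atomic_flag`. The names `g_started`, `claim_first_run`, and `stage_entry` are hypothetical, and the static flag is an assumption for illustration: position independent shellcode has no ordinary static storage, so a real stage would keep the flag in the memory region it was written to and use an equivalent atomic instruction.

```c
#include <stdatomic.h>

/* Hypothetical shared flag; clear until the first caller claims it. */
static atomic_flag g_started = ATOMIC_FLAG_INIT;

/* Returns 1 for exactly one caller and 0 for every later caller,
   even when several threads call it at the same time. */
int claim_first_run(void) {
    /* test_and_set returns the previous value: false only the first time. */
    return !atomic_flag_test_and_set(&g_started);
}

/* Stand-in for a stage entry point reached from several queued APCs.
   launch_count models "start the payload thread". */
int stage_entry(int *launch_count) {
    if (!claim_first_run()) {
        return 0; /* another APC already started the payload */
    }
    ++*launch_count;
    return 1;
}
```

However many threads end up running the APC, `launch_count` is incremented once, and the remaining threads resume their original work right away.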

WoW64 -> Native 64

Just like with CreateRemoteThread, we first need to transition our WoW64 thread into native 64 bit mode. Then we can call ntdll!NtQueueApcThread, effectively turning this into 64 bit -> 64 bit injection. Since the 64 bit kernel32.dll is not present in WoW64 processes, we must call ntdll!NtQueueApcThread instead of kernel32!QueueUserAPC.

Our existing WriteProcessMemory code can be reused here to write the payload. The ExecuteNative64 routine can be used to execute 64 bit shellcode that calls NtQueueApcThread on the remote thread and payload.

There will be 2 stages of shellcode here as well:

local 64 bit stage 1 to call NtQueueApcThread on remote thread with stage 2 address
remote 64 bit stage 2

Your remote stage 2 shellcode will need to create a new thread. This code from Metasploit can be used as a starting point.

Injection code will look something like this:

Final Thoughts

This post covered just two common options for user mode code injection on Windows that don't require accessing disk. These approaches use legitimate, documented APIs but in a way that was likely not intended.

The Metasploit block_api provides a nice template for writing shellcode that interacts with Win32. You can also follow this post to write shellcode in C and check out this DLL loader that uses those techniques.

You can also find relevant code in this DLL injection project (requires Visual Studio 2017).

UserExistsError's Infosec Blog