To make it nearly undetectable and give it full control of the host system, I propose using the virtualization extensions of modern processors. Once in that position, the application can hook into the host's kernel to start threads that interface with the rest of the host system, sending and receiving data, while the main application code remains out of reach.
With so little surface exposed to the host, it is extremely difficult for any host application to detect it reliably. Even if it is detected, it cannot be removed (at least not without rebooting the machine), as long as the way it was installed is sealed off against future use.
On disk, I propose at first simply replacing the bootloader with a small program capable of loading it. Later, this could be improved by installing it directly in the firmware/BIOS flash, thus taking control even earlier.
It would, of course, be fully extensible, as it would be able to load any code.
Thanks in advance.