Without getting into the details of why, I'm looking for a way, as clean as possible, to replace kernel functions and system calls from a loadable module. My initial idea was to write code that overrides certain functions: each override would take the place of the original function (and, if possible, call it) and then add some code of my own. The key is that my function has to carry the name of the original, so that other code trying to call the original reaches mine instead.

I can easily (comparatively) do this directly in the kernel by just throwing my code into the appropriate functions, but I was wondering if anyone knew a little C magic that isn't necessarily horrible kernel (or C) coding practice that could achieve the same result.

Thoughts of #defines and typedefs come to mind, but I can't quite hack it out in my head.

In short: does anyone know a way to effectively override functions in the Linux kernel (from a module)?

EDIT: Since it's been asked, I essentially want to log certain functions (creating/deleting directories, etc.) from within the kernel, but for sanity's sake, a loadable module seems to make sense, rather than having to write a big patch to the kernel code and recompile on every change. A minimal amount of added code to the kernel is okay, but I want to offload most of the work to a module.

+3  A: 

I'm not entirely sure I understand what you want to do, but I think that ksplice may be a good solution. It's still under development, so I don't know if it's in any sort of usable condition right now.

Adam Rosenfield
What I'm looking at doesn't really have to do with reloading the kernel as much as modularizing my code within the kernel, and avoiding making changes to the existing code as much as possible.
Dan Fego
A: 

Most filesystem work is done in modules already, presuming that the filesystem code was built as a module, rather than built into the kernel (which means the 'real' answer depends on kernel build options).

Assuming that the bits you want to log are all filesystem-related, and that those filesystem routines are built as modules, you should just be able to alter the filesystem module(s) you're interested in, and reload them.

If those assumptions aren't true, or can't be made true, then things clearly get trickier, and I really couldn't point you much further.

Harper Shelby
Interesting take, though ideally I'd be logging at a higher level than the particular filesystem (like sys_write or vfs_write). I figure doing it at that level would make my code filesystem-agnostic.
Dan Fego
+3  A: 

You probably want to hook the system calls (PDF link), which would effectively let you log user-processes calling kernel functions. If you really want to log the kernel use of kernel functions, you want to look into kernel function trace.

geocar
What PDF link? :-P
Dan Fego
The one butchered by the markdown parser. Thanks for noticing; fixed it.
geocar
A: 

Since you want to only log the calls (i.e. you will not actually override them), and a small amount of changes to the kernel code is acceptable, the cleanest way would be to add a hook to each function you are interested in (using a notifier chain or even a plain function pointer). Your module then simply registers itself to all the hooks you added (and unregisters from them when unloaded).
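To make the plain-function-pointer variant concrete, here is a minimal userspace sketch of the pattern. It is not kernel code: `fake_vfs_mkdir`, `register_mkdir_hook`, `unregister_mkdir_hook` and `log_mkdir` are hypothetical names invented for illustration, and a real kernel patch would also need locking (or a notifier chain) around the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook type: receives the path of the directory being created. */
typedef void (*mkdir_hook_fn)(const char *path);

/* The single pointer owned by the "kernel" side; NULL means nobody listens. */
static mkdir_hook_fn mkdir_hook = NULL;

/* Called by the "module" on load. Returns 0 on success, -1 if already taken. */
int register_mkdir_hook(mkdir_hook_fn fn)
{
    assert(fn != NULL);
    if (mkdir_hook != NULL)
        return -1;
    mkdir_hook = fn;
    return 0;
}

/* Called by the "module" on unload. Only the current owner may unregister. */
int unregister_mkdir_hook(mkdir_hook_fn fn)
{
    if (mkdir_hook != fn)
        return -1;
    mkdir_hook = NULL;
    return 0;
}

/* Stand-in for the patched kernel function: the hook call is the only
 * addition to the "existing" code. */
int fake_vfs_mkdir(const char *path)
{
    if (mkdir_hook != NULL)
        mkdir_hook(path);
    /* ... the original work would happen here ... */
    return 0;
}

/* Example listener a module might supply: counts calls, remembers last path. */
int logged_calls = 0;
const char *last_logged_path = NULL;

void log_mkdir(const char *path)
{
    logged_calls++;
    last_logged_path = path;
}
```

The module side would call `register_mkdir_hook(log_mkdir)` from its init function and `unregister_mkdir_hook(log_mkdir)` from its cleanup function, so the kernel-side change stays at one pointer and one call per instrumented function.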

It is also quite possible that someone else has already done the work of adding the hooks for you.

CesarB
A: 

You don't want to modify existing system calls; you want to instrument them. This is what SystemTap is for. If you really want to do it the hard way and intercept system calls by coding your own module, I suggest you read some rootkit literature, but I don't have any links handy (although Phrack comes to mind).

Krunch
+1  A: 

There has been a lot of work done in the kernel to make sure this does not happen, especially work to not expose the syscall table to modules. The only supported mechanism to log file access is LSM, but it is oriented towards security and has an uncertain future. Here is a PDF that documents the API, but it may not be up to date.

inotify is a much better way to monitor the creation, deletion and modification of files than trying to subvert the kernel syscall functions, but it works from userspace.

Quoted from Wikipedia (http://en.wikipedia.org/wiki/Inotify): Some of the events that can be monitored for are:

* IN_ACCESS - read of the file
* IN_MODIFY - file was modified
* IN_ATTRIB - attributes of file change
* IN_OPEN and IN_CLOSE - open or close of file
* IN_MOVED_FROM and IN_MOVED_TO - when the file is moved or renamed
* IN_DELETE - a file/directory deleted
* IN_CREATE - a file/directory created
* IN_DELETE_SELF - file monitored is deleted

inotify has been in the kernel since 2.6.13; its predecessor is dnotify (http://en.wikipedia.org/wiki/Dnotify).
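As a rough illustration of the userspace side, the sketch below watches a single directory for IN_CREATE and IN_DELETE using the standard inotify calls (`inotify_init`, `inotify_add_watch`, `read`). The helper names `watch_open` and `wait_for_event` are made up for this example, and the watch is not recursive: subdirectories would each need their own watch.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Start watching one directory for entries being created or deleted.
 * Returns an inotify file descriptor, or -1 on error. */
int watch_open(const char *dir)
{
    int fd = inotify_init();
    if (fd < 0)
        return -1;
    if (inotify_add_watch(fd, dir, IN_CREATE | IN_DELETE) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Block until an event matching `mask` arrives for the entry called `name`,
 * printing every named event seen on the way.
 * Returns 1 on a match, -1 on a read error. */
int wait_for_event(int fd, const char *name, uint32_t mask)
{
    /* One read() may return several variable-length events back to back. */
    _Alignas(struct inotify_event) char buf[4096];

    for (;;) {
        ssize_t len = read(fd, buf, sizeof buf);
        int found = 0;
        char *p = buf;

        if (len <= 0)
            return -1;
        while (p < buf + len) {
            const struct inotify_event *ev = (const struct inotify_event *)p;
            if (ev->len > 0) {
                const char *what = (ev->mask & IN_CREATE) ? "created" : "deleted";
                const char *kind = (ev->mask & IN_ISDIR) ? "directory" : "file";
                printf("%s %s: %s\n", what, kind, ev->name);
                if ((ev->mask & mask) != 0 && strcmp(ev->name, name) == 0)
                    found = 1;
            }
            p += sizeof(struct inotify_event) + ev->len;
        }
        if (found)
            return 1;
    }
}
```

For example, `int fd = watch_open("/tmp");` followed by `wait_for_event(fd, "newdir", IN_CREATE);` would return once something named `newdir` is created in `/tmp`, logging any other creations and deletions it sees in the meantime.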

Phillip Whelan
A: 

I think you can use the kernel's audit subsystem for that.

kmilo
A: 

Have a look at http://www.tldp.org/LDP/lkmpg/2.6/html/x978.html

Thanks, but I believe that method is no longer possible as of 2.6. sys_call_table[] is no longer exported.
Dan Fego
+1  A: 

G'day,

Have you looked at deploying your function using LD_PRELOAD?

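Your function would be deployed via a shared library whose path is listed in the LD_PRELOAD environment variable, so the dynamic linker loads it ahead of the other libraries. Note that this only affects userspace programs that are dynamically linked.

The convention is that you intercept the library call (typically the libc wrapper around a system call) and then, after performing your magic, pass the call on to the actual system shared library. But you don't have to do that.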

Maybe take a look at the article "Building library interposers for fun and profit". While it is Solaris specific, it is also applicable to Linux.

BTW This is how most memory analysis tools, e.g. Purify, work.

HTH

cheers,

Rob Wells
A: 

According to KernelTrap.org you can do a simple patch and recompile of your kernel to export the sys_call_table variable:

// add the following in the file arch/i386/kernel/i386_ksyms.c
extern void* sys_call_table[];
EXPORT_SYMBOL(sys_call_table);

Then just follow this procedure for replacing system calls from the Linux Kernel Module Programming Guide:

The source code here is an example of such a kernel module. We want to 'spy' on a certain user, and to printk() a message whenever that user opens a file. Towards this end, we replace the system call to open a file with our own function, called our_sys_open. This function checks the uid (user's id) of the current process, and if it's equal to the uid we spy on, it calls printk() to display the name of the file to be opened. Then, either way, it calls the original open() function with the same parameters, to actually open the file.

The init_module function replaces the appropriate location in sys_call_table and keeps the original pointer in a variable. The cleanup_module function uses that variable to restore everything back to normal. This approach is dangerous, because of the possibility of two kernel modules changing the same system call. Imagine we have two kernel modules, A and B. A's open system call will be A_open and B's will be B_open. Now, when A is inserted into the kernel, the system call is replaced with A_open, which will call the original sys_open when it's done. Next, B is inserted into the kernel, which replaces the system call with B_open, which will call what it thinks is the original system call, A_open, when it's done.
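The danger the guide is leading up to shows when the modules are removed in the same order they were inserted. The following is a userspace simulation only, assuming a one-slot table standing in for sys_call_table; `call_table`, `real_open`, `struct fake_module`, `fake_module_init` and `fake_module_cleanup` are invented names, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*open_fn)(const char *path);

/* Stand-in for the original sys_open. */
int real_open(const char *path) { (void)path; return 3; }

/* One-slot stand-in for sys_call_table. */
open_fn call_table[1] = { real_open };

/* What each simulated module remembers about the slot it replaced. */
struct fake_module {
    open_fn saved;   /* pointer found in the table at insert time */
    int loaded;      /* 1 while the module's code is "in memory" */
};

struct fake_module mod_a, mod_b;

/* Each replacement does its own work, then calls what it saved. The assert
 * models the crash from jumping into a module that has been unloaded. */
int a_open(const char *path) { assert(mod_a.loaded); return mod_a.saved(path); }
int b_open(const char *path) { assert(mod_b.loaded); return mod_b.saved(path); }

/* Like init_module: save the current entry, then install the replacement. */
void fake_module_init(struct fake_module *m, open_fn hook)
{
    m->saved = call_table[0];
    call_table[0] = hook;
    m->loaded = 1;
}

/* Like cleanup_module: blindly restore whatever was saved at insert time. */
void fake_module_cleanup(struct fake_module *m)
{
    call_table[0] = m->saved;
    m->loaded = 0;
}
```

Inserting A and then B leaves the slot pointing at `b_open`. Removing A first restores `real_open`, which silently cuts B out while B is still loaded. Removing B afterwards then restores `a_open`, a pointer into a module that is no longer there. Removing B before A is the only order that unwinds correctly.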

Robert S. Barnes
A: 

This might prove a useful read to you.

Basically, since the system call table is not directly exported in newer kernels, you have to do some searching to determine its location yourself. Then you can intercept your system calls of choice and manipulate them. Replacing other kernel functions, though, will be much more difficult, unless some of them are organized the same way system calls are (they appear on some dispatch table etc.) - which is not at all common.
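One low-tech way to do that searching on a running system is to read the symbol list from /proc/kallsyms, assuming the kernel was built with kallsyms support and you are reading it as root (unprivileged reads usually show zeroed addresses). The sketch below only parses the "address type name" lines; `find_symbol` is a hypothetical helper, and it says nothing about how to make the table writable afterwards.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Scan `f` for a line of the form "<hex address> <type> <name>".
 * Returns 1 and stores the address in *addr on success, 0 if not found. */
int find_symbol(FILE *f, const char *name, unsigned long long *addr)
{
    char line[512];

    assert(f != NULL && name != NULL && addr != NULL);
    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long long a;
        char type;
        char sym[256];

        /* Lines that do not match the expected shape are skipped. */
        if (sscanf(line, "%llx %c %255s", &a, &type, sym) != 3)
            continue;
        if (strcmp(sym, name) == 0) {
            *addr = a;
            return 1;
        }
    }
    return 0;
}
```

Usage would be along the lines of `FILE *f = fopen("/proc/kallsyms", "r");` followed by `find_symbol(f, "sys_call_table", &addr)`. The address is only meaningful for the currently booted kernel, so a module would have to receive it at load time (for instance as a module parameter) rather than hard-code it.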

Michael Foukarakis