Monday, December 14, 2009
Tree operations in Linux Kernel 2.6
Memory management uses Red-Black tree, a kind of binary tree. Please, pay more attention to rotations of Red-Black tree because it determines the efficiency of memory management. For more details, SEE "Introduction to Algorithms by T. Cormen at al.", a bible in computer science.
container_of() Macro in LInux Kernel 2.6
Assuem struct A { int a, int b}; struct A obj={3,4}. Then, given obj.b(member) and the type of obj (struct A), we find the address of obj. It's awesome, isn't it? For more details, SEE the Linux idioms in "Tips for hacking Linux Kernel 2.6".
List operations in Linux Kernel 2.6
Linux Kernel 2.6 uses a doubly-linked, circular list which is defined in include/linux/list.h. To add an element to an existing list, there are two options: list_add() and list_add_tail(). list_add() will behave as stack and list_add_tail() as queue. Let's see some examples. Assume that head is represented as 1 and nodes 2, 3 & 4 are added to 1.
For list_add():
prev: [1] -> [2] -> [3] ->[4] -
^ |
+______________+
next: [1] -> [4] -> [3] ->[2] -
^ |
+______________+
For list_add_tail():
prev: [1] -> [4] -> [3] ->[2] -
^ |
+______________+
next: [1] -> [2] -> [3] ->[4] -
^ |
+______________+
Now you can see the difference, right? Unfortunately, however, the distinction is vague to some extent. You can have a stack with list_add_tail() and prev, and vice versa. Besides, list_del(E) deletes E from the list which E belongs to, i.e. E should be an element from the list. For example, list_del(1) will remove the head from the above. Last but not least, to iterate all elements of a list, use list_for_each_entry().
Sunday, December 13, 2009
Saturday, December 12, 2009
Locking Mechanism in Linux Kernel 2.6
Linux has memory barrier and spinning lock. Let's take a deep look at how they work. First, let's start with barrier. GCC compiler reorders instructions, but this is a bad news for synchronization because some instructions should be kept in order for synchronization, for example instruction A should be executed before the synchronization but it may be executed after it. Thus, Linux barrier behaves as an optimization barrier. barrier() is defined in include/linux/compiler-gcc.ha as follows:
#define barrier() __asm__ __volatile__("": : :"memory")
This Macro ensures no instruction reorder by the compiler.
Next, let's take a deep look at spin_lock() and spin_unlock(). For spin_lock(), it first disables kernel preemption and performs busy-wait till it gets a lock. For spin_unlock(), unlocks and enables preemption again.
More details will be revisited later.
Thursday, December 3, 2009
Tips for hacking Linux Kernel 2.6
This blog will now turn into recording my experience in Linux Kernel hacking with Kernel 2.6. Most files are stored in my local SVN repository, but some comments will be made here for a faster access and probably I'll post commented source files on this blog. I'm trying to comply with the order in "Understanding the Linux Kernel by Bovet et al." so that each post will start with a number indicating the order. Besides, I'm focusing on 32-bit kernel instead of 64-bit kernel hacking because I'll run 32-bit Guest Linux on my Mac OS X. I'll definitely be heading into 64-bit kernel hacking someday, though. Hence, most files with ****_32.* will be hacked unless it is explicitly specified. Wish me a happy hacking!!!
By the way, if you want to hack the kernel, you need the following:
1] Linux kernel 2.6. It'd be much better if you could grab from your Linux vendor because some vendors already made patches. In my case, I'm using Ubuntu 9.04 with Kernel 2.6.28_55. Here is a good link for Ubuntu Kernel Fetch & Compile. You can also view the code here online, but I'd rather have a local copy. For the local browsing, see here and here.
2] The book of "Understanding the Linux Kernel by D. Bovet & M. Cesati". I bought this book, probably a used book, from here at $14. However, this book might be overwhelming to beginners because it has too much details. If you want just a big picture, I'd rather recommend the book of "Linux Kernel Development by R. Love". In my case, I just read one chapter from the latter, read the corresponding code, read one chapter from the former and re-read the code.
3] The book of "The C Programming Language by B. Kernighan and D. Ritchie". Besides, you need to frequently reference to GCC (Caveat: find the matching GCC with which you'll compile your kernel). You'll face a lot of complicated macros, typedefs, GCC-specific features, etc. For example, SEE GCC 5.27, 5.34 and 5.35 for __attribute__:
...
regparm (N)
On the Intel 386, the regparm attribute causes the compiler to pass arguments number one to N if they are of integral type in registers EAX, EDX, and ECX instead of on the stack. Functions that take a variable number of arguments will continue to be passed all of their arguments on the stack.
...
4] Understanding of IA32 ISA(instruction set architecture). For beginners, here is a good link which uses Intel Syntax, i.e. OP DST SRC. Beware that the Linux kernel code uses AT&T syntax, i.e. OP SRC DST. Here is another link(it's not well organized to find IA32 lecture notes, though) that uses AT&T syntax. Of course, you need to download manuals/specifications from Intel homepage, which may be much more daunting to beginners, though.
5] Understanding of Inline assembly: here, here and here. You can also SEE 5.37 of GCC (recommended).
6] Understanding of Linux Kernel idioms and miscellaneous things: here, here, here, here and here. You'd better visit here before you start anything.
7] Cleaning arch/ directory up, i.e. I removed all directories in arch/ other than x86 and ia64. Then, run as follows:
> find . -name *.[hcsS] > cscope.files # assuming you're in the root of the kernel source directory
> cscope -b -q -k
If you want to use cscope on Mac OS X, use "ENTER" and "TAB", where TAB key is used to switch search mode and result mode, and ENTER key is used to navigate search fields. You can navigate the search results with [0-9a-z]. To exit, press "Ctrl+d".
8] With helps from all the links in 1], I'd rather NOT set cscope_maps.vim and use vim as follows, assuming that you look for "task_struct", a C symbol:
> vim -t task_struct
Awesome, isn't it? However, this one only works for C symbols, which means that you cannot find "current", a global variable, though. This is a time to use "> cscope -d", but I'd rather go for with ">grep". Below is my policy of when to use what:
(1) use "> vim -t" for C symbols, for example task_struct in struct task_struct { .... }.
(2) use "> cscope -d" for global variables, for example __USER_CS in #define __USER_CS ...
(3) use "> grep" otherwise
Finally, you're ready to hack the Kernel. From now on, read one chapter and hack the kernel whenever you feel like it's a right time to see some details or you don't understand something.
By the way, if you want to hack the kernel, you need the following:
1] Linux kernel 2.6. It'd be much better if you could grab from your Linux vendor because some vendors already made patches. In my case, I'm using Ubuntu 9.04 with Kernel 2.6.28_55. Here is a good link for Ubuntu Kernel Fetch & Compile. You can also view the code here online, but I'd rather have a local copy. For the local browsing, see here and here.
2] The book of "Understanding the Linux Kernel by D. Bovet & M. Cesati". I bought this book, probably a used book, from here at $14. However, this book might be overwhelming to beginners because it has too much details. If you want just a big picture, I'd rather recommend the book of "Linux Kernel Development by R. Love". In my case, I just read one chapter from the latter, read the corresponding code, read one chapter from the former and re-read the code.
3] The book of "The C Programming Language by B. Kernighan and D. Ritchie". Besides, you need to frequently reference to GCC (Caveat: find the matching GCC with which you'll compile your kernel). You'll face a lot of complicated macros, typedefs, GCC-specific features, etc. For example, SEE GCC 5.27, 5.34 and 5.35 for __attribute__:
...
regparm (N)
On the Intel 386, the regparm attribute causes the compiler to pass arguments number one to N if they are of integral type in registers EAX, EDX, and ECX instead of on the stack. Functions that take a variable number of arguments will continue to be passed all of their arguments on the stack.
...
4] Understanding of IA32 ISA(instruction set architecture). For beginners, here is a good link which uses Intel Syntax, i.e. OP DST SRC. Beware that the Linux kernel code uses AT&T syntax, i.e. OP SRC DST. Here is another link(it's not well organized to find IA32 lecture notes, though) that uses AT&T syntax. Of course, you need to download manuals/specifications from Intel homepage, which may be much more daunting to beginners, though.
5] Understanding of Inline assembly: here, here and here. You can also SEE 5.37 of GCC (recommended).
6] Understanding of Linux Kernel idioms and miscellaneous things: here, here, here, here and here. You'd better visit here before you start anything.
7] Cleaning arch/ directory up, i.e. I removed all directories in arch/ other than x86 and ia64. Then, run as follows:
> find . -name *.[hcsS] > cscope.files # assuming you're in the root of the kernel source directory
> cscope -b -q -k
If you want to use cscope on Mac OS X, use "ENTER" and "TAB", where TAB key is used to switch search mode and result mode, and ENTER key is used to navigate search fields. You can navigate the search results with [0-9a-z]. To exit, press "Ctrl+d".
8] With helps from all the links in 1], I'd rather NOT set cscope_maps.vim and use vim as follows, assuming that you look for "task_struct", a C symbol:
> vim -t task_struct
Awesome, isn't it? However, this one only works for C symbols, which means that you cannot find "current", a global variable, though. This is a time to use "> cscope -d", but I'd rather go for with ">grep". Below is my policy of when to use what:
(1) use "> vim -t" for C symbols, for example task_struct in struct task_struct { .... }.
(2) use "> cscope -d" for global variables, for example __USER_CS in #define __USER_CS ...
(3) use "> grep" otherwise
Finally, you're ready to hack the Kernel. From now on, read one chapter and hack the kernel whenever you feel like it's a right time to see some details or you don't understand something.
access()/open() in Linux Kernel 2.6
It's in fs/open.c:
SYSCALL_DEFINE2(access, const char __user *, filename, int, mode) // This is a macro. This will generate "asmlinkage long sys_access(const char __user * filename, int mode) [SEE "System calls in Linux Kernel 2.6".]
{
return sys_faccessat(AT_FDCWD, filename, mode);
*** AT_FDCWD defined in include/linux/fcntl.h:
#define AT_FDCWD -100 /* Special value used to indicate
openat should use the current
working directory. */
}
SYSCALL_DEFINE2(access, const char __user *, filename, int, mode) // This is a macro. This will generate "asmlinkage long sys_access(const char __user * filename, int mode) [SEE "System calls in Linux Kernel 2.6".]
{
return sys_faccessat(AT_FDCWD, filename, mode);
*** AT_FDCWD defined in include/linux/fcntl.h:
#define AT_FDCWD -100 /* Special value used to indicate
openat should use the current
working directory. */
}
*** What is __user? It's defined in include/linux/compiler.h and for its meaning, SEE here and here. Besides, address_space(N) is a SPARSE (Semantic Parser for C: this really lacks of documentations unfortunately) thing so that you wouldn't get any information from GCC. SEE here for address_space(1).
SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, int, mode)
{
long ret;
if (force_o_largefile())
flags |= O_LARGEFILE;
ret = do_sys_open(AT_FDCWD, filename, flags, mode);
/* avoid REGPARM breakage on x86: */
asmlinkage_protect(3, ret, filename, flags, mode);
return ret;
}
SYSCALL_DEFINE4(openat, int, dfd, const char __user *, filename, int, flags,
int, mode)
{
long ret;
if (force_o_largefile())
flags |= O_LARGEFILE;
ret = do_sys_open(dfd, filename, flags, mode);
/* avoid REGPARM breakage on x86: */
asmlinkage_protect(4, ret, dfd, filename, flags, mode);
return ret;
}
SYSCALL_DEFINE3(faccessat, int, dfd, const char __user *, filename, int, mode)
{
struct path path;
struct inode *inode;
int old_fsuid, old_fsgid;
kernel_cap_t uninitialized_var(old_cap); /* !SECURE_NO_SETUID_FIXUP */
int res;
if (mode & ~S_IRWXO) /* where's F_OK, X_OK, W_OK, R_OK? */
return -EINVAL;
...follows more...
}
* What is uninitialized_var? This one, defined in include/linux/compiler-[gcc[3|4]|intel].h, is to remove a compiler error/warning.
{
struct path path;
struct inode *inode;
int old_fsuid, old_fsgid;
kernel_cap_t uninitialized_var(old_cap); /* !SECURE_NO_SETUID_FIXUP */
int res;
if (mode & ~S_IRWXO) /* where's F_OK, X_OK, W_OK, R_OK? */
return -EINVAL;
...follows more...
}
* What is uninitialized_var? This one, defined in include/linux/compiler-[gcc[3|4]|intel].h, is to remove a compiler error/warning.
* What is issecure(SECURE_NO_SETUID_FIXUP)? Assume this is always false. Let's take a deep look at it. SECURE_NO_SETUID_FIXUP and issecure() are defined in include/linux/securebits.h. Any process created is set to have securebits of SECUREBITS_DEFAULT, i.e. 0. prctl() system call can change the value, but the argument passed to set the capability is either 0 or 1. So, issecure(SECURE_NO_SETUID_FIXUP) always returns either 0 or 16 and, as a result, issecure() is set to 0. Be aware that there's a chance for a process to have securebits set to 16 in security/commoncap.c if arg2 of 0 is passed. SEE here for prctl().
* What is prefetch()? SEE GCC 5.49.
SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, int, mode)
{
long ret;
if (force_o_largefile())
flags |= O_LARGEFILE;
ret = do_sys_open(AT_FDCWD, filename, flags, mode);
/* avoid REGPARM breakage on x86: */
asmlinkage_protect(3, ret, filename, flags, mode);
return ret;
}
SYSCALL_DEFINE4(openat, int, dfd, const char __user *, filename, int, flags,
int, mode)
{
long ret;
if (force_o_largefile())
flags |= O_LARGEFILE;
ret = do_sys_open(dfd, filename, flags, mode);
/* avoid REGPARM breakage on x86: */
asmlinkage_protect(4, ret, dfd, filename, flags, mode);
return ret;
}
Subscribe to:
Posts (Atom)
