V.Kompany's Blog: Linux


2008-01-15

VM performance

2008-01-07 14:20:27

The article below discusses some metrics for measuring VM performance...

Why doesn't free memory go down

I wrote a little while ago about interpreting swap usage on Linux. A related question is why Linux always seems to have so little free memory. Does this indicate some kind of problem in Linux or the application? No.

Someone at work asked (paraphrasing):

I have a process that uses a lot of memory while it's running, so the free memory (shown by free or top) goes right down to 60MB out of 8100MB. But when the process exits, the free memory doesn't go back up. Why isn't memory released when the process exits?

The short answer is that you should never worry about the amount of free memory on Linux. The kernel attempts to keep this slightly above zero by keeping the cache as large as possible. This is a feature not a bug.
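To put numbers on this, here is a minimal sketch that adds the cache back onto the free figure. It assumes the counters come from /proc/meminfo-style text (lines such as "MemFree: 61440 kB"); the helper names meminfo_kb and roughly_available_kb are made up for illustration, and the sum is only a rough estimate of what the kernel could reclaim.

```c
#include <stdio.h>
#include <string.h>

/* Return the value in kB for `key` (e.g. "MemFree") from /proc/meminfo-style
 * text, or -1 if the key is not present. Hypothetical helper. */
long meminfo_kb(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        /* The key must start the line and be followed by ':' */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long kb;
            if (sscanf(line + klen + 1, "%ld", &kb) == 1)
                return kb;
            return -1;
        }
        line = strchr(line, '\n');   /* advance to the next line */
        if (line)
            line++;
    }
    return -1;
}

/* Free memory plus buffers and page cache: a rough figure for what the
 * kernel could hand to applications if it had to. */
long roughly_available_kb(const char *text)
{
    long free_kb = meminfo_kb(text, "MemFree");
    long buffers = meminfo_kb(text, "Buffers");
    long cached  = meminfo_kb(text, "Cached");

    if (free_kb < 0 || buffers < 0 || cached < 0)
        return -1;
    return free_kb + buffers + cached;
}
```

Feeding it the contents of /proc/meminfo on a machine like the one in the question would typically show a small MemFree next to a much larger combined figure.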

If you are concerned about VM performance then the most useful thing to watch is the page in/out rate, shown by the "bi" and "bo" columns in vmstat. Another useful measure (2.6 only) is the "wa" column, showing the amount of CPU time spent waiting for IO. "wa" is probably the one you have to worry about most, because it shows CPU cycles that are essentially wasted because VM is too slow.

As you said, Linux keeps the free memory in the buffer cache. But when no process is running, how come the buffer cache still holds 4GB, and how is 3GB of it later released to free memory?

Disk cache is maintained globally, not per-process. Files can remain in cache even after the process that was using them exited, because they might be used by another process. Freeing the cache would mean discarding cached data. There's no reason to do that until the data is obsolete (e.g. files are deleted) or the memory is needed for some other purpose.

After a while the free memory goes back up again.

Pages only become free when they're evicted to build up the free pool (see below), or when nothing useful can be stored in them. If there are gigabytes of free memory then the main cause is that the kernel doesn't have anything to cache in them.

This can happen when, for example, a file that was cached was deleted, or a filesystem is unmounted: there's no point keeping those pages cached because they can't be accessed. (Note that the kernel can still cache a file which is just unlinked, but still in use by applications.)

A similar case is that an application has allocated a lot of anonymous memory and then either exited or freed the memory. That data is discarded, so the pages are free.

Note that flushing the data to disk makes the pages clean, but not free. They can still be kept in memory in case they're read in the future. (Clean means the in-memory page is the same as the on-disk page.)

The guy in the second row asks:

So if Linux tries to keep the cache as large as possible, why is there 60MB free rather than zero? Wouldn't it be better to cache an additional 60MB?

Linux keeps a little bit of memory free so that it is ready as soon as it needs to allocate more memory. If the extra 60MB was used for cache too then when a new allocation was required, the kernel would have to go through the cache and work out what to evict. Possibly it would need to wait for a page to be written out. This would make allocation slower and more complex. So there is a tradeoff where the page cache is made slightly smaller so that allocation can be faster and simpler. The kernel keeps just a few free pages prepared in advance.

(If you have any questions mail me and I'll try to answer them here.)

posted Fri 13 Aug 2004 in /software/linux-kernel

Linux filemon, where are you?

2007-12-28 01:16:25

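Today I ran into a problem, or more precisely, a problem a colleague ran into caught my interest: the log shows that one file inside User Mode Linux is being accessed at an unusually high frequency. This kind of frequent file IO should not be happening, so how do we find the program (or programs) accessing that file?

On Windows this would be fairly easy: just install filemon (written by the folks at sysinternals). Note: sysinternals has many similar tools that can monitor files, the network, the registry and so on, and they are handy aids for everyday debugging.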

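What about Linux? For a Linux newbie like me this was a puzzle. After a quick search, my conclusion is that things seem somewhat complicated on Linux...
Early versions of Linux kernel 2.6.x provide the dnotify mechanism for file system monitoring, or more precisely directory monitoring. It gives applications a way to watch accesses to the files under a directory, but:
- It can only watch a directory, not one specific file; our file lives under /tmp/.
- It does not seem able to tell the application which program is accessing the file. The handler does have an si_pid item, but I found that inside UML, even when I open the file with different programs, the handler always prints the same PID?!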
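For reference, a minimal dnotify sketch, assuming a Linux system with glibc and following the F_NOTIFY description in the fcntl(2) man page. The function name watch_directory, the handler name and the choice of SIGRTMIN are illustrative, not taken from the original experiment. The field that man page documents for these signals is si_fd (the descriptor of the watched directory); nothing in the interface identifies the process that touched the file.

```c
#define _GNU_SOURCE            /* exposes F_NOTIFY, F_SETSIG and the DN_* flags */
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Descriptor of the directory that raised the most recent notification. */
static volatile sig_atomic_t last_event_fd = -1;

/* With SA_SIGINFO, dnotify reports the watched directory's fd in si_fd. */
static void on_dir_event(int sig, siginfo_t *si, void *unused)
{
    (void)sig;
    (void)unused;
    last_event_fd = si->si_fd;
}

/* Start watching `dir` for access, modification and file creation.
 * Returns the directory descriptor, or -1 on error. */
int watch_directory(const char *dir)
{
    struct sigaction sa;
    int fd;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_dir_event;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    if (sigaction(SIGRTMIN, &sa, NULL) == -1)
        return -1;

    fd = open(dir, O_RDONLY);
    if (fd == -1)
        return -1;

    /* Ask for SIGRTMIN instead of SIGIO, then register the watch.
     * DN_MULTISHOT keeps the watch active after the first event. */
    if (fcntl(fd, F_SETSIG, SIGRTMIN) == -1 ||
        fcntl(fd, F_NOTIFY,
              DN_ACCESS | DN_MODIFY | DN_CREATE | DN_MULTISHOT) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would keep the returned descriptor open, wait (for example with pause()), and compare last_event_fd against it after each signal.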

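Recent versions of Linux kernel 2.6.x provide the inotify mechanism to make up for some of dnotify's shortcomings (exactly which ones I am not yet sure, since both are new to me). But the UML we use is based on 2.6.10, so inotify is out of the question.

Later I saw someone on a forum say that LSM (Linux Security Module) could meet my need, so I downloaded a paper from Usenix Security 2002, Linux Security Module Framework (Chris Wright and Crispin Cowan, Stephen Smalley, James Morris and Greg Kroah-Hartman), which looks like a conference paper. According to the paper, LSM provides many kinds of hooks, including file hooks, to implement its security framework. Delighted, I got ready to try it out. According to the paper:
"LSM is available as a kernel patch for both the 2.4 and 2.5 Linux kernels. The patches are available from http://lsm.immunix.org."
Unfortunately, http://lsm.immunix.org cannot be reached. A network problem? Has the project been discontinued?

Then I googled LSM and, what do you know, found that Linux has many security solutions, chiefly LSM and SELinux. According to a Japanese poster on a forum, LSM is path-based security while SELinux is label-based security. But it seems that LSM has still not been merged into mainline, so a patch is still required.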

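The tool-hunting route seems to be something of a dead end, and it leaves me with many questions:
- Where is the LSM patch?
- Can the LSM patch be applied to UML (linux-kernel-2.6.10)?
- Even if LSM works, can its file hooks provide enough information, such as which process is accessing which file?

Because our problem domain is inside UML, we also have this option:
- Hook the file IO functions;
- Rebuild UML;
- Redeploy our application in UML;
- Reproduce the problem;

But I still lean toward the first approach.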

why do we use while (0)?

2007-12-26 16:02:05

Today I saw a question on the Linux Kernel Newbie mailing list. The original text:

Recently I started looking into linux kernel and trying to understand the code.
I am working with linux-2.6.9.
In file include/linux/list.h I found something like this.

#define INIT_LIST_HEAD(ptr) do { \
    (ptr)->next = (ptr); (ptr)->prev = (ptr); \
} while (0)

My question is why do we use a loop when we actually know that it is not going to execute more than once? Cannot we simply do -

#define INIT_LIST_HEAD(ptr) {(ptr)->next = (ptr); (ptr)->prev = (ptr);}

Do we get some kind of optimization by using while (0)?

I found this interesting and started following the thread. Replies came quickly; the most authoritative was the kernelnewbies.org FAQ:
http://kernelnewbies.org/FAQ/DoWhile0

Why do a lot of #defines in the kernel use do { ... } while(0)?

There are a couple of reasons:

*(from Dave Miller) Empty statements give a warning from the compiler so this is why you see #define FOO do { } while(0).
*(from Dave Miller) It gives you a basic block in which to declare local variables.
*(from Ben Collins) It allows you to use more complex macros in conditional code. Imagine a macro of several lines of code like:

#define FOO(x) \
    printf("arg is %s\n", x); \
    do_something_useful(x);

Now imagine using it like:

if (blah == 2)
    FOO(blah);

This interprets to:

if (blah == 2)
    printf("arg is %s\n", blah);
do_something_useful(blah);;

As you can see, the if then only encompasses the printf(), and the do_something_useful() call is unconditional (not within the scope of the if), which is not what you wanted. So, by using a block like do { ... } while(0), you would get this:

if (blah == 2)
    do {
        printf("arg is %s\n", blah);
        do_something_useful(blah);
    } while (0);

Which is exactly what you want.
*(from Per Persson) As both Miller and Collins point out, you want a block statement so you can have several lines of code and declare local variables. But then the natural thing would be to just use for example:

#define exch(x,y) { int tmp; tmp=x; x=y; y=tmp; }

However that wouldn't work in some cases. The following code is meant to be an if-statement with two branches:

if (x > y)
    exch(x,y); // Branch 1
else
    do_something(); // Branch 2

But it would be interpreted as an if-statement with only one branch:

if (x > y) { // Single-branch if-statement!!!
    int tmp; // The one and only branch consists
    tmp = x; // of the block.
    x = y;
    y = tmp;
}
; // empty statement
else // ERROR!!! "parse error before else"
    do_something();

The problem is the semi-colon (;) coming directly after the block. The solution for this is to sandwich the block between do and while (0). Then we have a single statement with the capabilities of a block, but not considered as being a block statement by the compiler. Our if-statement now becomes:

if (x > y)
    do {
        int tmp;
        tmp = x;
        x = y;
        y = tmp;
    } while(0);
else
    do_something();
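The if/else case can be checked with a small compilable sketch. The macro name EXCH and the function sort_pair are made up for this illustration; the point is only that the macro call followed by a semicolon forms a single statement, so the else still pairs with the if.

```c
/* Swap two ints. The do { } while (0) wrapper turns the body into exactly
 * one statement once the caller adds the trailing semicolon. */
#define EXCH(x, y) do { int tmp = (x); (x) = (y); (y) = tmp; } while (0)

/* Put *a and *b in ascending order; returns 1 if they were swapped. */
int sort_pair(int *a, int *b)
{
    if (*a > *b)
        EXCH(*a, *b);   /* the ';' here completes the do-while */
    else
        return 0;       /* compiles: the else still belongs to the if */
    return 1;
}
```

Replacing the macro body with a bare { ... } block makes the same function fail to compile, for the reason the FAQ gives.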

*(from Bart Trojanowski) gcc adds Statement-Expressions which provide an alternative to the do-while-0 block. They provide the above mentioned benefits and are slightly more legible.

#define FOO(arg) ({ \
    typeof(arg) lcl; \
    lcl = bar(arg); \
    lcl; \
})
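A statement expression can also return a value, which do { } while (0) cannot. A small sketch, assuming GCC or Clang (this is a compiler extension, not standard C); MAX_OF and bump_and_max are hypothetical names:

```c
/* GCC/Clang extension: the value of the last expression inside ({ ... })
 * becomes the value of the whole macro. Each argument is evaluated once. */
#define MAX_OF(a, b) ({          \
    __typeof__(a) a_ = (a);      \
    __typeof__(b) b_ = (b);      \
    a_ > b_ ? a_ : b_;           \
})

/* Increment *i and return the larger of its old value and `lower`. */
int bump_and_max(int *i, int lower)
{
    return MAX_OF((*i)++, lower);   /* (*i)++ runs exactly once */
}
```

With a plain ((a) > (b) ? (a) : (b)) macro the increment could run twice; copying the arguments into locals avoids that.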

The highlight of Beginning Linux Programming (Third Edition)

2007-12-26 12:33:49

This is definitely a good book on Linux programming for beginners. I highlight key sentences from it below…

Note: 50% gray marks the original text; 100% black marks my questions.

Chapter 1. Getting started

Linux applications are represented by two special types of files: executables and scripts.

Q. The program execution path issue
Since this is the first program we’ve run, it’s a good time to point something out. The hello program will probably be in your home directory. If PATH doesn’t include a reference to your home directory, the shell won’t be able to find hello. Furthermore, if one of the directories in PATH contains another program called hello, that program will be executed instead. This would also happen if such a directory is mentioned in PATH before your home directory. To get around this potential problem, we can prefix program names with ./ (e.g., ./hello). This specifically instructs the shell to execute the program in the current directory with the given name.
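The left-to-right search the shell performs can be sketched in a few lines of C. This is only an illustration of the idea, not how any particular shell implements it; find_in_path is a hypothetical helper, and it skips empty PATH entries for simplicity.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Search the colon-separated list `path` from left to right for an
 * executable called `name`. On success the full path is written to `out`
 * and 0 is returned; otherwise -1. The first match wins, which is why an
 * earlier directory can hide a program of the same name in a later one. */
int find_in_path(const char *path, const char *name, char *out, size_t outlen)
{
    while (path && *path) {
        size_t dirlen = strcspn(path, ":");   /* length of this entry */
        int n = snprintf(out, outlen, "%.*s/%s", (int)dirlen, path, name);

        if (dirlen > 0 && n > 0 && (size_t)n < outlen &&
            access(out, X_OK) == 0)
            return 0;

        path += dirlen;
        if (*path == ':')
            path++;                           /* step over the separator */
    }
    return -1;
}
```

Calling it with getenv("PATH") and "hello" would show which hello the shell picks when the name is typed without ./ in front.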

Q. On Windows I like to use Source Insight to browse code; on Linux I have not found a really good tool yet. It looks like the author mainly uses the grep command. Impressive.
It’s often convenient to use the grep command to search header files for particular definitions and function prototypes.

Q. Handling library files: how to specify a library (full path or -l), and how to make the compiler search a given path for the libraries it needs (-L)
The libraries usually exist in both static and shared formats, as a quick ls /usr/lib will show. We can instruct the compiler to search a library either by giving it the full path name or by using the -l flag.

The -lm (no space between the l and the m) is shorthand (shorthand is much valued in UNIX circles) for the library called libm.a in one of the standard library directories (in this case /usr/lib). An additional advantage of the -lm notation is that the compiler will automatically choose the shared library when it exists.

Although libraries are usually found in standard places in the same way as header files, we can add to the search directories by using the -L (uppercase letter) flag to the compiler. For example,

$ gcc -o x11fred -L/usr/openwin/lib x11fred.c -lX11

will compile and link a program called x11fred using the version of the library libX11 found in the /usr/openwin/lib directory.

The -L. option tells the compiler to look in the current directory (.) for libraries. The -lfoo option tells the compiler to use a library called libfoo.a (or a shared library, libfoo.so, if one is present).

Q. How to create static library files (Static Libraries)
We can create and maintain our own static libraries very easily by using the ar (for archive) program and compiling functions separately with gcc -c.

We do this by invoking the C compiler with the -c option, which prevents the compiler from trying to create a complete program. Trying to create a complete program would fail because we haven’t defined a function called main.

The author gives a simple example (note: this example has a bug)

- fred.c
#include <stdio.h>
void fred(int arg)
{
    printf("fred: you passed %d\n", arg);
}

- bill.c
#include <stdio.h>
void bill(char *arg)
{
    printf("bill: you passed %s\n", arg);
}

$ gcc -c bill.c fred.c

$ ls *.o

bill.o fred.o

bill.c and fred.c are the library's source files. Neither has a main function, but compilation does not fail because we used the -c option.

- lib.h
/*
This is lib.h.
*/
void bill(char *);
void fred(int);

- program.c
#include <stdlib.h>
#include "lib.h"
int main()
{
    bill("Hello World");
    exit(0);
}

The main program, which calls the functions exposed by the library.

$ gcc -c program.c
$ gcc -o program program.o bill.o
$ ./program
bill: we passed Hello World
// YM: a minor bug here. It should output 'bill: you passed Hello World'

Now let’s create and use a library. We use the ar program to create the archive and add our object files to it. The program is called ar because it creates archives, or collections, of individual files placed together in one large file. Note that we can also use ar to create archives of files of any type. (Like many UNIX utilities, ar is a generic tool.)

$ ar crv libfoo.a bill.o fred.o // generate a static library named libfoo.a
a - bill.o
a - fred.o

The library is created and the two object files added. To use the library successfully, some systems, notably those derived from Berkeley UNIX, require that a table of contents be created for the library. We do this with the ranlib command. In Linux, this step isn’t necessary (but it is harmless) when we’re using the GNU software development tools.
$ ranlib libfoo.a

Q. Drawbacks of static libraries
One disadvantage of static libraries is that when we run many applications at the same time and they all use functions from the same library, we may end up with many copies of the same functions in memory and indeed many copies in the program files themselves. This can consume a large amount of valuable memory and disk space.

Q. How to see which shared libraries a program depends on
We can see which shared libraries are required by a program by running the utility ldd. For example, if we try running it on our example application, we get the following:

$ ldd program
libc.so.6 => /lib/libc.so.6 (0x4002a000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

Chapter 4 The Linux Environment
Q. Some basic environment-related concepts you need to know when programming on Linux:

Passing arguments to programs
Environment variables
Finding out what the time is
Temporary files
Getting information about the user and the host computer
Causing and configuring log messages
Discovering the limits imposed by the system

Q. The Linux APIs covered in this chapter

1. Program arguments

#include <unistd.h>
int getopt(int argc, char *const argv[], const char *optstring);
extern char *optarg;
extern int optind, opterr, optopt;

2. Environment variables

#include <stdlib.h>
char *getenv(const char *name);
int putenv(const char *string);
extern char **environ;
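A short sketch of reading and setting a variable. The variable name BLP_DEMO and both function names are hypothetical. Note that putenv keeps the pointer it is given rather than copying the string, so the string must stay valid after the call.

```c
#include <stdlib.h>

/* Return the value of `name`, or `fallback` if the variable is unset. */
const char *env_or_default(const char *name, const char *fallback)
{
    const char *value = getenv(name);
    return value ? value : fallback;
}

/* putenv() stores this very pointer in the environment, so the string
 * has static storage rather than living on the stack. */
static char demo_setting[] = "BLP_DEMO=fred";

/* Add BLP_DEMO=fred to the environment; returns 0 on success. */
int install_demo_setting(void)
{
    return putenv(demo_setting);
}
```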

3. Time and Date

#include <time.h>
time_t time(time_t *tloc);
double difftime(time_t time1, time_t time2);
struct tm *gmtime(const time_t *timeval);
struct tm *localtime(const time_t *timeval);
time_t mktime(struct tm *timeptr);
char *asctime(const struct tm *timeptr);
char *ctime(const time_t *timeval);
size_t strftime(char *s, size_t maxsize, const char *format, struct tm *timeptr);
char *strptime(const char *buf, const char *format, struct tm *timeptr);
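As a usage sketch, the function below combines gmtime and strftime to turn a time_t into readable text. The name format_utc and the chosen format string are illustrative choices.

```c
#include <stddef.h>
#include <time.h>

/* Format `t` as "YYYY-MM-DD HH:MM:SS" in UTC. Returns the number of
 * characters written (excluding the terminating NUL), or 0 on failure. */
size_t format_utc(time_t t, char *buf, size_t buflen)
{
    struct tm *tm_ptr = gmtime(&t);   /* gmtime takes a pointer to time_t */

    if (tm_ptr == NULL)
        return 0;
    return strftime(buf, buflen, "%Y-%m-%d %H:%M:%S", tm_ptr);
}
```

Passing time(NULL) as the first argument formats the current time; swapping gmtime for localtime would give local time instead.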

4. Temporary files

#include <stdio.h>    /* tmpnam, tmpfile */
#include <stdlib.h>   /* mktemp, mkstemp */
char *tmpnam(char *s);
FILE *tmpfile(void);
char *mktemp(char *template);
int mkstemp(char *template);
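Of these, mkstemp is usually the safest choice because it creates and opens the file in one step. A minimal sketch, assuming /tmp is writable; the file name prefix and the function name make_temp_with are made up:

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Create a unique temporary file, write `text` to it and return the open
 * descriptor, or -1 on error. The name actually used is copied into
 * `name_out` so the caller can unlink it later. */
int make_temp_with(const char *text, char *name_out, size_t name_len)
{
    char template[] = "/tmp/blp_demo_XXXXXX";   /* must end in XXXXXX */
    size_t len = strlen(text);
    int fd = mkstemp(template);                 /* fills in the XXXXXX */

    if (fd == -1)
        return -1;
    if (strlen(template) >= name_len ||
        write(fd, text, len) != (ssize_t)len) {
        close(fd);
        unlink(template);
        return -1;
    }
    strcpy(name_out, template);
    return fd;
}
```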

5. User Information

#include <sys/types.h>
#include <unistd.h>
#include <pwd.h>
uid_t getuid(void);
char *getlogin(void);

struct passwd *getpwuid(uid_t uid);
struct passwd *getpwnam(const char *name);

void endpwent(void);
struct passwd *getpwent(void);

void setpwent(void);
uid_t geteuid(void);
gid_t getgid(void);
gid_t getegid(void);

int setuid(uid_t uid);
int setgid(gid_t gid);

6. Host Information
#include <unistd.h>
#include <sys/utsname.h>

int gethostname(char *name, size_t namelen);
int uname(struct utsname *name);
long gethostid(void);
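A small sketch that combines gethostname and uname into one description string. The function name describe_host and the output layout are illustrative only.

```c
#include <stdio.h>
#include <sys/utsname.h>
#include <unistd.h>

/* Write "host: sysname release (machine)" into buf.
 * Returns 0 on success, -1 on error or if buf is too small. */
int describe_host(char *buf, size_t buflen)
{
    char host[256];
    struct utsname uts;
    int n;

    if (gethostname(host, sizeof host - 1) != 0 || uname(&uts) < 0)
        return -1;
    host[sizeof host - 1] = '\0';   /* a truncated name may lack the NUL */

    n = snprintf(buf, buflen, "%s: %s %s (%s)",
                 host, uts.sysname, uts.release, uts.machine);
    return (n > 0 && (size_t)n < buflen) ? 0 : -1;
}
```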

7. Logging
#include <syslog.h>
#include <sys/types.h>
#include <unistd.h>

void syslog(int priority, const char *message, arguments...);
void closelog(void);
void openlog(const char *ident, int logopt, int facility);
int setlogmask(int maskpri);

pid_t getpid(void);
pid_t getppid(void);

8. Resources and limits
#include <sys/resource.h>
int getpriority(int which, id_t who);
int setpriority(int which, id_t who, int priority);
int getrlimit(int resource, struct rlimit *r_limit);
int setrlimit(int resource, const struct rlimit *r_limit);
int getrusage(int who, struct rusage *r_usage);

Limit Constant   What They’re For

NAME_MAX         The maximum number of characters in a filename
CHAR_BIT         The number of bits in a char value
CHAR_MAX         The maximum char value
INT_MAX          The maximum int value

Resource Parameter   Description

RLIMIT_CORE          The core dump file size limit, in bytes
RLIMIT_CPU           The CPU time limit, in seconds
RLIMIT_DATA          The data () segment limit, in bytes
RLIMIT_FSIZE         The file size limit, in bytes
RLIMIT_NOFILE        The limit on the number of open files
RLIMIT_STACK         The limit on stack size, in bytes
RLIMIT_AS            The limit on address space (stack and data), in bytes
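A brief sketch of reading and adjusting one of these limits with getrlimit and setrlimit, using RLIMIT_NOFILE as the example. The function names are hypothetical. Each limit has a soft (current) and a hard (maximum) value, and an unprivileged process may move the soft value anywhere up to the hard one.

```c
#include <sys/resource.h>

/* Read the soft (current) and hard (maximum) limit on open files.
 * Returns 0 on success, -1 on error. */
int open_file_limits(rlim_t *soft, rlim_t *hard)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
        return -1;
    *soft = lim.rlim_cur;
    *hard = lim.rlim_max;
    return 0;
}

/* Set the soft limit on open files to `wanted`, clamped to the hard
 * limit so the call does not need extra privileges. */
int set_soft_open_files(rlim_t wanted)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
        return -1;
    lim.rlim_cur = wanted < lim.rlim_max ? wanted : lim.rlim_max;
    return setrlimit(RLIMIT_NOFILE, &lim);
}
```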
