2008-03-14

Memory Aliasing

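Memory aliasing. I think of it as "memory confusion". The Chinese edition of Computer Systems: A Programmer's Perspective (深入理解计算机系统) translates it as "存储器使用别名" (Chapter 5, Section 5.2). The book introduces the concept while discussing the limits on how far a compiler can optimize program performance. Below is how the Intel Itanium CPU documentation describes it.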

Memory Aliasing on Itanium(r)-based Systems

Memory aliasing is the single largest issue affecting the optimizations in the Intel(r) compiler for Itanium(r)-based systems. Memory aliasing is writing to a given memory location with more than one pointer. The compiler has to be very cautious to not optimize too aggressively in these cases; if the compiler optimizes too aggressively, unpredictable behavior is expected (for example, incorrect results, abnormal termination, etc.).
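In my words: memory aliasing is a major obstacle to optimization for the Intel compiler. Memory aliasing means that one memory address is pointed to by several pointers. In that situation the compiler had better not optimize aggressively; if it does, unpredictable problems may follow.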
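
To make the "incorrect results" concrete, here is a minimal sketch modeled on the well-known twiddle example from CS:APP (names and details as I recall them, so treat them as assumptions). The two functions look interchangeable, but they disagree when both pointers refer to the same location, which is why a compiler may not rewrite the first into the second on its own.

```c
/* Adds *yp to *xp twice, re-reading *yp from memory each time. */
void twiddle1(int *xp, int *yp) {
    *xp += *yp;
    *xp += *yp;
}

/* Looks equivalent and needs fewer memory accesses,
   but reads *yp only once. */
void twiddle2(int *xp, int *yp) {
    *xp += 2 * *yp;
}
```

With distinct variables both functions give the same answer. With `xp == yp` and an initial value of 1, `twiddle1` leaves 4 behind while `twiddle2` leaves 3.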

Since the compiler usually optimizes on a module-by-module, function-by-function basis, the compiler does not have an overall perspective with respect to variable use for global variables or variables that are passed into a function; therefore, the compiler usually assumes that any pointers passed into a function are likely to be aliased.
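Because the compiler usually optimizes one module at a time and one function at a time, it lacks a whole-program view, especially of global variables and of function arguments (here mainly arguments passed as pointers). So the compiler has to assume that the address behind every pointer argument is also pointed to by some other pointer. In other words, it assumes that every pointer argument is aliased.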
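
A small sketch of what this assumption costs, using hypothetical function names of my own (`scale`, `scale_hoisted`). The compiler would like to load `*factor` once before the loop, but it cannot do so by itself because `factor` might point into `dst`.

```c
#include <stddef.h>

/* Straightforward version: *factor is re-read on every iteration,
   because a store to dst[i] might change it. */
void scale(int *dst, const int *factor, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] *= *factor;
}

/* The transformation an optimizer would like to apply:
   load *factor once. Only valid if factor does not alias dst. */
void scale_hoisted(int *dst, const int *factor, size_t n) {
    const int f = *factor;
    for (size_t i = 0; i < n; i++)
        dst[i] *= f;
}
```

If `factor` points at `dst[0]` and the array holds {2, 3, 4}, `scale` produces {4, 12, 16} while `scale_hoisted` produces {4, 6, 8}. From inside the function the compiler cannot rule that call out, so it has to keep the slower form.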

The compiler makes this assumption even for pointers you know are not aliased. This behavior means that perfectly safe loops do not get pipelined or vectorized, and performance suffers. There are several ways to instruct the compiler that pointers are not aliased:
1. Use a comprehensive compiler option, such as -fno-alias (Linux*) or /Oa (Windows*). These options instruct the compiler that no pointers in any module are aliased, placing the responsibility of program correctness directly with the developer.
2. Use a less comprehensive option, like -fno-fnalias (Linux) or /Ow (Windows). These options instruct the compiler that no pointers passed through function arguments are aliased. Function arguments are a common example of potential aliasing that you can clarify for the compiler. You may know that the arguments passed to a function do not alias, but the compiler is forced to assume that they do. Using these options tells the compiler it is now safe to assume that these function arguments are not aliased. This option is still a somewhat bold statement to make, as it affects all functions in the module(s) compiled with the -fno-fnalias (Linux) or /Ow (Windows) option.
3. Use the ivdep pragma. Alternatively, you might use a pragma that applies to a specified loop in a function. This is more precise than specifying an entire function. The pragma asserts that, for a given loop, there are no vector dependencies. Essentially, this is the same as saying that no pointers are aliasing in a given loop.
4. Use the restrict keyword. An even more precise method of disambiguating pointers is the restrict keyword. The restrict keyword is used to identify individual pointers as not being aliased. You would use the restrict keyword to tell the compiler that a given memory location is not written to by any other pointer.
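The compiler takes no chances here: even when you know a pointer has no alias, the compiler still assumes it is aliased. One consequence is that some loops may not get optimized. The following methods tell the compiler which pointers are not aliased.
1. A compiler option with global effect. Linux: -fno-alias; Windows: /Oa. These options tell the compiler that no module contains aliased pointers. This puts the whole responsibility on the programmer.
2. A compiler option with narrower effect. Linux: -fno-fnalias; Windows: /Ow. These options tell the compiler that no function's pointer arguments are aliased. That is still rather bold, because it affects every function in every module compiled with the option!
3. Use the ivdep pragma. Mark the code you believe is safe (mainly loops) with this pragma. Below is a real piece of code I came across online:

4. Apply the restrict keyword to a pointer to tell the compiler that the memory it points to has no alias.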
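
As a hedged illustration of option 3, here is a minimal sketch of my own (the function name `shift_scale` and its parameters are hypothetical, not taken from any Intel sample). The loop reads `a[i + k]` and writes `a[i]`. The compiler does not know the sign of `k`, so it must assume a dependence between iterations. If the caller guarantees `k >= 0`, the loop is safe to vectorize, and `#pragma ivdep` passes that promise on for this one loop.

```c
/* Assumption: callers guarantee k >= 0 and that a has at least m + k elements.
   With k < 0 the pragma would be a lie and results could be wrong. */
void shift_scale(int *a, int m, int k, int c) {
#pragma ivdep  /* Intel compiler spelling; other compilers may ignore or warn about it */
    for (int i = 0; i < m; i++)
        a[i] = a[i + k] * c;
}
```

The pragma covers only the loop that follows it, so the rest of the function and the rest of the module keep the compiler's cautious default.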

The following example demonstrates using the restrict keyword to tell the compiler that the memory address pointed to by z is not written to by any other pointer. With this new information the compiler can then vectorize or SWP the loop as follows:

// single-dimension array example
void foo (int *x, int *y, int * restrict z) {  // tells the compiler that z has no alias
    int i;
    double temp;

    for (i = 0; i < 100; i++) {
        /* ... */
    }
}

// two-dimension array example
void bar (int a[][100], int b[restrict][100]);

The above is an example of using the restrict keyword. Of course, you need a compiler option to make the compiler accept the restrict keyword. Linux: -restrict; Windows: /Qrestrict.
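
Since the loop body of `foo` did not survive in the listing, here is a complete sketch that can be compiled as C99. Only the signature comes from the quoted example; the loop body (`z[i] = 2 * x[i] - y[i]`) is my own assumption, chosen just to give the loop something to do.

```c
/* restrict on z promises that, inside foo, the memory written through z
   is not accessed through x or y. The body below is an assumed example. */
void foo(int *x, int *y, int *restrict z) {
    int i;
    double temp;

    for (i = 0; i < 100; i++) {
        temp = 2.0 * x[i];          /* read inputs */
        z[i] = (int)(temp - y[i]);  /* write only through z */
    }
}
```

The promise is the caller's to keep: passing the same array for `x` and `z` breaks it, and the behavior is then undefined. That is the same trade as with the compiler options, only limited to one pointer.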

My question:
- Do the remedies above for loops that cannot be optimized apply only to Itanium(r)-based systems, or to Intel-based systems in general?
