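Lecture 3 continues with memory management and covers double, arrays, struct, &, *, and big-endian versus little-endian byte order. Jerry draws memory as intuitive little box diagrams. High-level languages ship with many ready-made templates, such as binary search, linear search, merge sort and quicksort; once you understand what memory management really is, you can implement these patterns yourself.
“Whatever bit pattern happened to reside there before is now pretending to be a character for the lifetime of this statement right here.” Jerry keeps repeating this as a mantra for cast expressions. Paraphrased: "take the address, reinterpret that address as a different pointer type, then dereference, regardless of what the variable's bit pattern used to mean."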
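e.g.2. is also a cast, but once &s has been reinterpreted as a double *, the dereference reads 8 bytes starting at s. The 6 bytes that follow s may not be valid memory, so the program can crash.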
Someone asked about copying data between big-endian and little-endian machines. For a plain bit copy nothing changes, because byte order only comes into play when a value is stored across several bytes; when the bytes are copied and then read back as a multi-byte value, the high-order and low-order bytes appear swapped.
Student: Are these examples gonna behave differently on little endian? Jerry: Certainly, yeah. As far as bit copying is concerned, no. This ampersand (&), right here, is always the address of the lowest byte. But endianness has more to do with interpretation and placement of bytes relative to one another. There were these phrases in the handout that I kind of de-emphasized, but they’re there.
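e.g.3. and Fig. 2 introduce memory operations on structs:
Structs and arrays are introduced together, which leads to e.g.4, an example that manages a struct's memory as if it were an array: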
Student: So you use the address of pi and treat it as an array there. Is it gonna know that the size of each element is a fraction? Jerry: Yep. It uses the data typing of whatever pi is right there, and because ampersand of pi is an int star…
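C/C++ do not provide array bounds checking, so several out-of-bounds examples follow to show the side effects of manipulating memory directly. Java does check bounds, but CS107 does not cover that.
e.g.4. Array access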
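    int array[10];
    array[0] = 44;
    array[10] = 1;
    array[25] = 25;  // it's ok: compiles, no bounds check
    array[-4] = 77;  // it's ok: compiles, no bounds check

e.g.4. is only an example, not good code. In C and C++ the size in the declaration only tells the compiler how many ints to set aside; the array does not store its own length, so you should always pass the length along with a raw array. That keeps the memory picture clear; otherwise memory management becomes chaotic and the length cannot be recovered. Here are Jerry's original words, a passage I listened to many times and found hard to follow: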
Jerry: It doesn’t in C and C++, not at all. All it does is instruction for that one declaration as to how many ints to set aside space for. But once you do that, like, there’s no length of the array is that. But the length of the memory figure, it’s not exposed to you. So there’s no way to recover it. That’s why you always pass around the length with a raw array in C and C++.
Jerry: You use vectors more than you did raw arrays in C in 106B, but we’re gonna be more C programmers than C++.
Jerry stresses that managing memory with arrays in C is different from managing it with the C++ STL vector template used in 106B. A vector provides higher-level memory management, while an array deals with the underlying memory directly.
Q&A: Why give the array a size at declaration at all?
Student: Say an array 10, like, you initialize it 10 by spaces, but like, what’s the point of initializing it if it’s just going to do, what’s basically do what you want when you inside of it. Jerry: That is true, actually. This right here is really just documentation for how much space is being allocated. And then you’re supposed to write code, I’m not saying this is good code. I’m just saying it's code. Okay? You’re supposed to write code that’s consistent with the amount of space that you legally have. But this, this and this, just work because there’s no bounds checking. It doesn’t look arbitrarily far backwards to figure out whether or not it’s an in-range index. So when you get away with this, and it compiles, and it runs, it’s just gonna put a one where it assumes that the eleventh entry would be, or the twenty-sixth entry, or the negative fourth entry. Okay?
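e.g.4. Syntax of array memory operations:
    array         is equivalent to  &array[0]
    array + k     is equivalent to  &array[k]
    *array        is equivalent to  array[0]
    *(array + k)  is equivalent to  array[k]
    *(array - 4)  is equivalent to  array[-4]

The pointer arithmetic inside the parentheses is evaluated first, so an offset of 4 does not move 4 bytes but 16 bytes, lands on one of the little boxes and puts a value there. Thanks to the type system, the compiler knows that array is an int *, so the offset from the array's base address is k times the size of an int, in bytes. e.g.5 introduces some casts on arrays: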
e.g.5 Casting an array
    int arr[5];
    arr[3] = 128;
    ((short*)arr)[6] = 2;
    cout<<arr[3]<<endl;

Casting the array to short * lets you address it in units of a different basic size. The memory operations of e.g.5 are shown in Figure 4.:
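The lecture gives the result as 512+128, but on my own machine the program prints 2, and working it out on paper also gives 2. I first put this down to endianness; a junior labmate then suggested that the 2008 video assumed a 2-byte int, a 4-byte long int and a 1-byte short. Those sizes do not hold up: C guarantees that short and int are at least 16 bits and long at least 32, and on the 32-bit machines the course is built on, int is normally 4 bytes and short 2. With those sizes, ((short *)arr)[6] covers the first two bytes of arr[3]. On a little-endian machine these are the low-order bytes, so the 128 is overwritten and the output is 2; on a big-endian machine they are the high-order bytes, so the 128 survives and the 2 lands above it. Both byte order and type sizes therefore affect the result, and many classic C books stress exactly this point about sizes: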
Stephen G. Kochan, Programming in C (4th Edition)
“You should never write programs that make any assumptions about the size of your data types. You are, however, guaranteed that a minimum amount of storage will be set aside for each basic type. For example, it’s guaranteed that an integer value will be stored in a minimum of 32 bits of storage, which is the size of a “word” on many computers.”
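In addition, out-of-bounds writes affect the data stored just before and after the array in memory. Take a function's scope: it is a region of memory carved out on the stack while the function runs:
    function:
        int a;
        int array[5];
        double d;

An out-of-bounds write to array may modify int a or double d. This model is how the machine really works: all of a function's local variables are packed into one small unit called an "activation record" (a memory block for all local variables in a function).
The next example declares a 16-byte struct C++ style (on a 32-bit machine: a 4-byte pointer, 8 chars and a 4-byte int). Pure C has no string type, so a name is represented as a character array terminated by a 0 character, or null character.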
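    struct student {
        char *name;      // dynamically allocated character array (char star)
        char suid[8];    // static character array of length 8
        int numUnits;
    };
    student pupils[4];
    pupils[0].numUnits = 21;
    pupils[2].name = strdup("Adam");
    strcpy(pupils[1].suid, "2014110xxx");  // 11 bytes including '\0': overruns the 8-byte suid
    // evaluate the pointer expression; the + 6 is pointer arithmetic on a char *
    pupils[3].name = pupils[0].suid + 6;
    strcpy(pupils[3].name, "123456");

Here strdup is shorthand for "string duplicate": it dynamically allocates just enough space to store the string. strdup takes no destination address; it allocates a block on the heap, writes "Adam" into it and returns the address of that block. strcpy allocates nothing; it copies into the destination address you give it. Inside strcpy a for loop copies the characters one after another, up to and including the terminating 0. strdup is essentially malloc followed by a copy, so the block it returns should eventually be freed.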
Q&A: In what form is a struct stored in memory?
Jerry:“In 106B and 106X, and maybe 106A as well, you drew them as the somewhat loose rectangles around two boxes. Okay, I want to be a little bit more structured than that. I want to recognize that the amount of memory that’s set aside for the struct fraction, not surprisingly, is eight bytes. Okay? It’s obviously the sum of some of its parts, and it actually packs all of those bytes as tightly as possible.”
How a struct is stored in memory depends on the order in which its fields are laid out and on the byte alignment between them. The figure shows a struct's layout in memory:
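In the pointer expression (e.g. 5, line 13), pupils[0].suid + 6 is the base address of a run of character slots and is assigned to pupils[3].name, which now holds an address value such as 0x1000 or 0x1001. strcpy then takes that base address and fills in the little boxes one after another from there; printing the string walks the boxes in the same way.
Fig. 5. Memory diagram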
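Generics in C essentially come down to a simple function + advanced memory terminology. Take the familiar swap function:
    void swap(int *ap, int *bp)
    {
        int temp = *ap;
        *ap = *bp;      // evaluate what bp addresses and store it where ap points
        *bp = temp;
    }
    int x = 7;
    int y = 117;
    swap(&x, &y);

This exchange through a temporary really has nothing to do with int; it could just as well be double, char, struct student and so on. Writing generics in C is rarely necessary, but it is one way to do it, especially once you understand how memory works. The next lecture introduces this kind of generic code and builds a new understanding in terms of generic pointers and generic bytes.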