Handling Arrays Inside Structs in C#

2024-07-21 02:19:11

Source: reposted

Contributed by: a site user

C/C++ code is full of structures that mix ordinary types with arrays. For example, the IMAGE_OPTIONAL_HEADER structure that describes the PE file header is defined as follows:

Code:

typedef struct _IMAGE_DATA_DIRECTORY {
    DWORD VirtualAddress;
    DWORD Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;

#define IMAGE_NUMBEROF_DIRECTORY_ENTRIES 16

typedef struct _IMAGE_OPTIONAL_HEADER {

    WORD Magic;

    //...

    DWORD NumberOfRvaAndSizes;
    IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];

} IMAGE_OPTIONAL_HEADER32, *PIMAGE_OPTIONAL_HEADER32;

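Using an array inside a structure like this is perfectly valid in C/C++, because the array is part of the structure itself and is accessed directly within the memory block the structure occupies. In a language such as C#, however, it cannot be written this way, because an array is a special kind of reference type. Take this definition:
Code: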

public struct IMAGE_DATA_DIRECTORY
{
    public uint VirtualAddress;
    public uint Size;
}

public struct IMAGE_OPTIONAL_HEADER
{
    public const int IMAGE_NUMBEROF_DIRECTORY_ENTRIES = 16;

    public ushort Magic;

    //...

    public uint NumberOfRvaAndSizes;

    public IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
}

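Declaring an array inside a struct this way is an error in C#; the compiler reports CS0650:

Quote:

error CS0650: Syntax error, bad array declarator. To declare a managed array the rank specifier precedes the variable's identifier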

If we switch to the declaration syntax C# uses for reference-type arrays, such as
Code:

public struct IMAGE_OPTIONAL_HEADER
{
    public const int IMAGE_NUMBEROF_DIRECTORY_ENTRIES = 16;

    public ushort Magic;

    //...

    public uint NumberOfRvaAndSizes;

    public IMAGE_DATA_DIRECTORY[] DataDirectory = new IMAGE_DATA_DIRECTORY[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
}

we get a CS0573 error instead:

Quote:

error CS0573: 'IMAGE_OPTIONAL_HEADER.DataDirectory': cannot have instance field initializers in structs

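This is because a struct cannot have instance field initializers (in the C# versions discussed here, that is, before C# 10), which differs from how a class is initialized. The array therefore has to be initialized in a constructor, and on top of that a struct cannot declare a parameterless default constructor. What a hassle, heh.
Code: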

public struct IMAGE_OPTIONAL_HEADER
{
    public const int IMAGE_NUMBEROF_DIRECTORY_ENTRIES = 16;

    public ushort Magic;

    public uint NumberOfRvaAndSizes;

    public IMAGE_DATA_DIRECTORY[] DataDirectory;

    public IMAGE_OPTIONAL_HEADER(IntPtr ptr)
    {
        Magic = 0;
        NumberOfRvaAndSizes = 0;

        DataDirectory = new IMAGE_DATA_DIRECTORY[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
    }
}

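This now looks as if it works, but check it with Marshal.SizeOf(typeof(IMAGE_OPTIONAL_HEADER)) and you will find that its size is nothing like the size of the C/C++ definition. The problem is still the array. Although the array appears to be defined inside the struct, the struct actually contains only a pointer to an IMAGE_DATA_DIRECTORY[] array; the array contents that should be stored at the position of DataDirectory live on the managed heap.
So the question becomes how to place a reference-type array inside a value-type struct.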
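
To make the point concrete, here is a minimal sketch showing that a struct with an array field holds only a reference. The `Holder` type, its `Items` field, and the `ReferenceDemo` class are hypothetical names chosen for illustration; they do not come from the original article. Copying the struct copies the reference, so both copies end up sharing one array on the managed heap.

```csharp
using System;

// Hypothetical struct used only for illustration.
public struct Holder
{
    public int[] Items; // stores a reference; the elements live on the managed heap
}

public static class ReferenceDemo
{
    // Returns true when a copy of the struct still shares the original array.
    public static bool CopySharesArray()
    {
        Holder first = new Holder();
        first.Items = new int[3];

        Holder second = first;   // copies the struct, i.e. only the array reference
        second.Items[0] = 42;    // writes through the shared array

        return first.Items[0] == 42 && ReferenceEquals(first.Items, second.Items);
    }

    public static void Main()
    {
        Console.WriteLine(CopySharesArray()); // prints "True"
    }
}
```

In C/C++ the same copy would duplicate all the elements, which is exactly the difference in layout that the rest of the article tries to work around.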

There are many ways to solve this. One is to use StructLayout to specify the size of the struct explicitly:
Code:

[StructLayout(LayoutKind.Sequential, Size=xxx)]
public struct IMAGE_OPTIONAL_HEADER
{
    public const int IMAGE_NUMBEROF_DIRECTORY_ENTRIES = 16;

    public ushort Magic;

    public uint NumberOfRvaAndSizes;

    public IMAGE_DATA_DIRECTORY DataDirectory;
}

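Note that Size in StructLayout specifies the length of the entire struct. Because DataDirectory is already the last field, the remaining 15 array elements are stored in the unnamed space at the end of the struct. Using it is a little more work: you have to read the whole struct in one go and then use unsafe pointer operations to reach the array elements that follow the DataDirectory field.
The advantage of this approach is that the definition is simple, but using it depends on unsafe pointer code, and the array field has to come last. You can of course also use LayoutKind.Explicit to specify the position of every field and so simulate several embedded arrays in one struct, but that means calculating each field offset by hand, which is tedious.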
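
The following is a rough sketch of this first approach, not a definitive implementation. It uses the trimmed-down header from above, so `Size = 134` is an assumption that only holds for this three-field layout with `Pack = 1` (2 + 4 + 16 × 8 bytes). To stay compilable without the unsafe switch, it swaps the unsafe pointer arithmetic for `Marshal.OffsetOf` and `Marshal.PtrToStructure` on a pinned byte array; the idea of stepping past the first element is the same. The names `SizedLayoutDemo`, `ReadEntry`, and `BuildSample`, and the sample values, are hypothetical.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct IMAGE_DATA_DIRECTORY
{
    public uint VirtualAddress;
    public uint Size;
}

// Assumption: trimmed header, so 134 bytes covers the fields plus 16 entries.
[StructLayout(LayoutKind.Sequential, Pack = 1, Size = 134)]
public struct IMAGE_OPTIONAL_HEADER
{
    public ushort Magic;
    public uint NumberOfRvaAndSizes;
    public IMAGE_DATA_DIRECTORY DataDirectory; // element 0; elements 1..15 follow it
}

public static class SizedLayoutDemo
{
    // Reads directory entry `index` from a raw header image.
    public static IMAGE_DATA_DIRECTORY ReadEntry(byte[] raw, int index)
    {
        if (index < 0 || index >= 16)
            throw new ArgumentOutOfRangeException("index");
        if (raw.Length < Marshal.SizeOf(typeof(IMAGE_OPTIONAL_HEADER)))
            throw new ArgumentException("Buffer is smaller than the header.", "raw");

        int first = Marshal.OffsetOf(typeof(IMAGE_OPTIONAL_HEADER), "DataDirectory").ToInt32();
        int stride = Marshal.SizeOf(typeof(IMAGE_DATA_DIRECTORY));

        GCHandle handle = GCHandle.Alloc(raw, GCHandleType.Pinned);
        try
        {
            // Step past element 0 into the unnamed space behind it.
            IntPtr entry = IntPtr.Add(handle.AddrOfPinnedObject(), first + index * stride);
            return (IMAGE_DATA_DIRECTORY)Marshal.PtrToStructure(entry, typeof(IMAGE_DATA_DIRECTORY));
        }
        finally
        {
            handle.Free();
        }
    }

    // Builds a made-up 134-byte header image for the demo.
    public static byte[] BuildSample()
    {
        using (MemoryStream stream = new MemoryStream())
        using (BinaryWriter writer = new BinaryWriter(stream))
        {
            writer.Write((ushort)0x10b);
            writer.Write((uint)16);
            for (uint i = 0; i < 16; i++)
            {
                writer.Write(i * 0x1000u); // VirtualAddress
                writer.Write(i);           // Size
            }
            writer.Flush();
            return stream.ToArray();
        }
    }

    public static void Main()
    {
        IMAGE_DATA_DIRECTORY entry = ReadEntry(BuildSample(), 3);
        Console.WriteLine("{0:x} {1}", entry.VirtualAddress, entry.Size); // prints "3000 3"
    }
}
```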

Another solution relies on marshaling support to state explicitly how much room the array elements take up, for example
Code:

[StructLayout(LayoutKind.Sequential, Pack=1)]
public struct IMAGE_OPTIONAL_HEADER
{
    public const int IMAGE_NUMBEROF_DIRECTORY_ENTRIES = 16;

    public ushort Magic;

    public uint NumberOfRvaAndSizes;

    [MarshalAs(UnmanagedType.ByValArray, SizeConst=IMAGE_NUMBEROF_DIRECTORY_ENTRIES)]
    public IMAGE_DATA_DIRECTORY[] DataDirectory;
}

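This approach is somewhat more elegant. An attribute supported by the marshaling mechanism gives the array value semantics, and using it differs little from using an ordinary array. The array definition above compiles to the following IL:
Code: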

.field public marshal( fixed array [16]) valuetype IMAGE_DATA_DIRECTORY[] DataDirectory

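Although the type is still valuetype IMAGE_DATA_DIRECTORY[], the marshal( fixed array [16]) modifier switches the array from reference semantics to value semantics as far as marshaling is concerned: its 16 elements are laid out inline in the unmanaged representation, while the managed field still holds a reference. This approach has limitations of its own, though: it cannot be nested across multiple levels, it costs some performance in use, and so on.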
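
As a hedged illustration of this second approach, the sketch below marshals a raw byte image into the struct with `Marshal.PtrToStructure` and checks the marshaled size. It again uses the trimmed-down header, so the expected size of 134 bytes applies only to this layout. The `ByValArrayDemo` class, its `FromBytes` and `BuildSample` helpers, and the sample values are hypothetical.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct IMAGE_DATA_DIRECTORY
{
    public uint VirtualAddress;
    public uint Size;
}

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct IMAGE_OPTIONAL_HEADER
{
    public const int IMAGE_NUMBEROF_DIRECTORY_ENTRIES = 16;

    public ushort Magic;
    public uint NumberOfRvaAndSizes;

    [MarshalAs(UnmanagedType.ByValArray, SizeConst = IMAGE_NUMBEROF_DIRECTORY_ENTRIES)]
    public IMAGE_DATA_DIRECTORY[] DataDirectory;
}

public static class ByValArrayDemo
{
    // Marshals a raw header image into the managed struct.
    public static IMAGE_OPTIONAL_HEADER FromBytes(byte[] raw)
    {
        if (raw.Length < Marshal.SizeOf(typeof(IMAGE_OPTIONAL_HEADER)))
            throw new ArgumentException("Buffer is smaller than the header.", "raw");

        GCHandle handle = GCHandle.Alloc(raw, GCHandleType.Pinned);
        try
        {
            // The marshaler allocates the managed array and copies the 16 inline entries into it.
            return (IMAGE_OPTIONAL_HEADER)Marshal.PtrToStructure(
                handle.AddrOfPinnedObject(), typeof(IMAGE_OPTIONAL_HEADER));
        }
        finally
        {
            handle.Free();
        }
    }

    // Builds a made-up header image for the demo.
    public static byte[] BuildSample()
    {
        using (MemoryStream stream = new MemoryStream())
        using (BinaryWriter writer = new BinaryWriter(stream))
        {
            writer.Write((ushort)0x10b);
            writer.Write((uint)16);
            for (uint i = 0; i < 16; i++)
            {
                writer.Write(i * 0x1000u); // VirtualAddress
                writer.Write(i);           // Size
            }
            writer.Flush();
            return stream.ToArray();
        }
    }

    public static void Main()
    {
        Console.WriteLine(Marshal.SizeOf(typeof(IMAGE_OPTIONAL_HEADER))); // prints "134"
        IMAGE_OPTIONAL_HEADER header = FromBytes(BuildSample());
        Console.WriteLine(header.DataDirectory.Length);                   // prints "16"
    }
}
```

Each call copies the data out of the buffer, which is one concrete form of the performance cost mentioned above.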

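Besides these two solutions, which work on the struct definition itself, we can also work on the operations performed on the struct.

Apart from accessing the array inside it, the main operation on this kind of struct is reading the whole thing from a memory block or an input stream. So we can use the binary serialization support provided by the CLR and implement custom serialization functions to load and save the data, for example:
Code: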

[Serializable]
public struct IMAGE_OPTIONAL_HEADER : ISerializable
{
    public const int IMAGE_NUMBEROF_DIRECTORY_ENTRIES = 16;

    public ushort Magic;

    public uint NumberOfRvaAndSizes;

    public IMAGE_DATA_DIRECTORY[] DataDirectory;

    public IMAGE_OPTIONAL_HEADER(IntPtr ptr)
    {
        Magic = 0;
        NumberOfRvaAndSizes = 0;

        DataDirectory = new IMAGE_DATA_DIRECTORY[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
    }

    // Struct members cannot be virtual, so the method is declared without that modifier.
    [SecurityPermissionAttribute(SecurityAction.Demand, SerializationFormatter=true)]
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        // perform the serialization here
    }
}

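This solution completely separates loading and storing the struct from its internal representation. The struct only holds an array reference internally, but users do not need to care. The drawback is that serialization support code has to be written for every struct, which is tedious to write and to maintain.

Along similar lines is a solution I rather like: a common utility base class handles everything uniformly through reflection, for example:
Code: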

public class IMAGE_OPTIONAL_HEADER : BinaryBlock
{
    public const int IMAGE_NUMBEROF_DIRECTORY_ENTRIES = 16;

    public ushort Magic;

    public uint NumberOfRvaAndSizes;

    public IMAGE_DATA_DIRECTORY[] DataDirectory = new IMAGE_DATA_DIRECTORY[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
}

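Note that the original struct has become a class here, because with this approach there is no longer any need to hold on to the memory model of a value type. BinaryBlock is a common utility base class that provides loading and storing of a type through reflection, for example
Code: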

// Outline only: the parts marked //... and (...) are elided.
public class BinaryBlock
{
    private static readonly ILog _log = LogManager.GetLogger(typeof(BinaryBlock));

    public BinaryBlock()
    {
    }

    // Loads one value of type objType from the reader, recursing into fields.
    public static object LoadFromStream(BinaryReader reader, Type objType)
    {
        if (objType.Equals(typeof(char)))
        {
            return reader.ReadChar();
        }
        else if (objType.Equals(typeof(byte)))
        {
            return reader.ReadByte();
        }
        //...
        else if (objType.Equals(typeof(double)))
        {
            return reader.ReadDouble();
        }
        else if (objType.IsArray)
        {
            // handle the array case
        }
        else
        {
            // obj is the instance being populated; its creation is elided here
            foreach (FieldInfo field in objType.GetFields())
            {
                field.SetValue(obj, LoadFromStream(...));
            }
        }

        return true;
    }

    public bool LoadFromStream(Stream stream)
    {
        return LoadFromStream(new BinaryReader(stream), this);
    }
}

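LoadFromStream is a recursive method that loads a value from the stream according to the given field type. To use it, simply call it on the whole type; it then walks all fields of the class through reflection and handles each one, and nested definitions are handled directly as well. With this approach the type definition itself hardly needs to worry about how loading and storing work; deriving from BinaryBlock is enough. Interested readers can extend the class further to support the binary serialization mechanism.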
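
The outline can be fleshed out into something runnable along the following lines. This is only one possible sketch under stated assumptions, not the author's actual BinaryBlock: array fields must be pre-allocated so that their length tells the loader how many elements to read, only byte, ushort, and uint are handled, and fields are ordered by `MetadataToken` because `Type.GetFields` does not promise any particular order (ordering by token matches declaration order in practice). The `LoadValue` helper, the `ReflectionLoadDemo` class, and the sample data are hypothetical.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Reflection;

public struct IMAGE_DATA_DIRECTORY
{
    public uint VirtualAddress;
    public uint Size;
}

// Sketch of a reflection-driven base class (assumption-laden, see the text above).
public class BinaryBlock
{
    // Fills the public instance fields of this object from the stream.
    public void LoadFromStream(Stream stream)
    {
        LoadValue(new BinaryReader(stream), GetType(), this);
    }

    // Loads one value of `type`; `existing` is the current value of the slot (may be null).
    private static object LoadValue(BinaryReader reader, Type type, object existing)
    {
        if (type == typeof(byte)) return reader.ReadByte();
        if (type == typeof(ushort)) return reader.ReadUInt16();
        if (type == typeof(uint)) return reader.ReadUInt32();
        if (type.IsPrimitive)
            throw new NotSupportedException("Unhandled primitive type: " + type);

        if (type.IsArray)
        {
            // Assumption: the array was pre-allocated, so its length is the element count.
            Array array = existing as Array;
            if (array == null)
                throw new InvalidOperationException("Array fields must be pre-allocated.");
            Type elementType = type.GetElementType();
            for (int i = 0; i < array.Length; i++)
                array.SetValue(LoadValue(reader, elementType, array.GetValue(i)), i);
            return array;
        }

        // Composite type: load every public instance field in declaration order.
        object target = existing ?? Activator.CreateInstance(type);
        foreach (FieldInfo field in type.GetFields(BindingFlags.Public | BindingFlags.Instance)
                                        .OrderBy(f => f.MetadataToken))
        {
            field.SetValue(target, LoadValue(reader, field.FieldType, field.GetValue(target)));
        }
        return target;
    }
}

public class IMAGE_OPTIONAL_HEADER : BinaryBlock
{
    public const int IMAGE_NUMBEROF_DIRECTORY_ENTRIES = 16;

    public ushort Magic;
    public uint NumberOfRvaAndSizes;
    public IMAGE_DATA_DIRECTORY[] DataDirectory =
        new IMAGE_DATA_DIRECTORY[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
}

public static class ReflectionLoadDemo
{
    // Builds a made-up header image for the demo.
    public static byte[] BuildSample()
    {
        using (MemoryStream stream = new MemoryStream())
        using (BinaryWriter writer = new BinaryWriter(stream))
        {
            writer.Write((ushort)0x10b);
            writer.Write((uint)16);
            for (uint i = 0; i < 16; i++)
            {
                writer.Write(i * 0x1000u); // VirtualAddress
                writer.Write(i);           // Size
            }
            writer.Flush();
            return stream.ToArray();
        }
    }

    public static void Main()
    {
        IMAGE_OPTIONAL_HEADER header = new IMAGE_OPTIONAL_HEADER();
        header.LoadFromStream(new MemoryStream(BuildSample()));
        Console.WriteLine(header.DataDirectory[3].Size); // prints "3"
    }
}
```

Struct elements are loaded into a boxed copy and then written back into the array, which is why the array branch calls `SetValue` with the returned object.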

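In addition, C# 2.0 introduces a new fixed array mechanism (fixed-size buffers) to address this kind of problem. It lets you define an embedded array with value semantics directly inside a struct, for example
Code: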

unsafe struct Data
{
    public int header;
    public fixed int values[10];
}

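At compile time the compiler translates the array field into a separate nested value-type struct in order to get the right memory layout, for example
Code: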

.class private sequential ansi sealed beforefieldinit Data
       extends [mscorlib]System.ValueType
{
  .class sequential ansi sealed nested public beforefieldinit '<values>e__FixedBuffer0'
         extends [mscorlib]System.ValueType
  {
    .pack 0
    .size 40
    .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 )
    .field public int32 FixedElementField
  } // end of class '<values>e__FixedBuffer0'

  .field public int32 header
  .field public valuetype Data/'<values>e__FixedBuffer0' values
  .custom instance void [mscorlib]System.Runtime.CompilerServices.FixedBufferAttribute::.ctor(class [mscorlib]System.Type, int32) = ( ...)
} // end of class Data

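As you can see, the values field is compiled into a value type, and that value type follows much the same idea as the first solution above: it forces the size of the struct. Using it is also entirely a matter of unsafe operations, just like the first solution; access to the array, for instance, is compiled into unsafe pointer operations:
Code: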

// Before compilation
for(int i=0; i<10; i++)
    d.values[i] = i;

// After compilation (as shown by a decompiler)
for(int i=0; i<10; i++)
    &data1.values.FixedElementField[(((IntPtr) i) * 4)] = i;
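
For reference, a complete program using the fixed buffer might look like the sketch below. It has to be compiled with unsafe code enabled (for example the compiler's `/unsafe` switch or `AllowUnsafeBlocks` in the project file). The `FixedBufferDemo` class and its methods are hypothetical names added for illustration.

```csharp
using System;

public unsafe struct Data
{
    public int header;
    public fixed int values[10]; // 10 ints stored inline, 40 bytes
}

public static class FixedBufferDemo
{
    // Size of the struct: 4 bytes for header plus 40 bytes for the inline buffer.
    public static unsafe int SizeOfData()
    {
        return sizeof(Data);
    }

    // Fills the buffer with 0..9 and returns the sum of its elements.
    public static unsafe int SumAfterFill()
    {
        Data d = new Data();

        // d is a local variable, so its buffer can be indexed directly.
        for (int i = 0; i < 10; i++)
            d.values[i] = i;

        int sum = 0;
        for (int i = 0; i < 10; i++)
            sum += d.values[i];
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(SizeOfData());   // prints "44"
        Console.WriteLine(SumAfterFill()); // prints "45"
    }
}
```

Note that the buffer is not bounds-checked, so indexing past the tenth element silently touches whatever memory follows.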

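Unfortunately this approach has to be compiled in unsafe mode, because internally it is implemented entirely with unsafe code. It also handles only one level of nesting: converting the IMAGE_OPTIONAL_HEADER definition this way produces a CS1663 error:
Code: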

error CS1663: Fixed sized buffer type must be one of the following: bool, byte, short, int, long, char, sbyte, ushort, uint, ulong, float or double

Eric Gunnerson has an article, Arrays inside of structures, that briefly introduces this limited syntax enhancement in C# 2.0.
