DOMXML: An Alternative to Expat

2024-05-04 23:00:27

Source: reposted

Contributed by: a reader

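Overview

There are many PHP XML tutorials on the web, but only a few show how to parse XML with DOM. I would like to take this opportunity to show that PHP programmers have an alternative to the widely used SAX implementation.

DOM (Document Object Model) and SAX (Simple API for XML) take different approaches to parsing XML. A SAX engine is entirely event-driven: when it encounters a tag, it calls the appropriate function to handle it. This makes SAX very fast and efficient. However, it can feel like being trapped in an endless loop, and you find yourself using too many global variables and conditional statements.

The DOM approach, on the other hand, uses somewhat more memory. It loads the entire XML document into memory as a hierarchical structure. In other words, all the data forms a family tree that is available to the programmer. This approach is more intuitive, easier to use, and more readable.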
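The contrast between the two styles can be sketched in a few lines. This is only an illustration under an assumption: it uses the xml parser and DOM extensions that ship with current PHP versions (the `DOMDocument` class), not the legacy domxml functions covered in the rest of this article, which were replaced by the DOM extension in PHP 5. The variable names are made up for the example.

```php
<?php
// Sketch only: current PHP xml (SAX-style) and dom extensions,
// not the legacy domxml API this article describes.
$xml = '<book type="paperback"><title>red nails</title><price>$12.99</price></book>';

// SAX style: handlers fire once per event, so we track state by hand.
$current  = '';
$saxTitle = '';
$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0); // keep tag names lowercase
xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$current) { $current = $name; }, // start tag
    function ($p, $name) use (&$current) { $current = ''; }             // end tag
);
xml_set_character_data_handler(
    $parser,
    function ($p, $data) use (&$current, &$saxTitle) {
        if ($current === 'title') {
            $saxTitle .= $data;
        }
    }
);
xml_parse($parser, $xml, true);

// DOM style: the whole tree is in memory and can be queried directly.
$doc = new DOMDocument();
$doc->loadXML($xml);
$domTitle = $doc->getElementsByTagName('title')->item(0)->textContent;

echo "SAX: $saxTitle\n";
echo "DOM: $domTitle\n";
```

Both approaches find the same title, but the SAX version needs a state variable shared between handlers, which is the bookkeeping the paragraph above complains about.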

To use the DOM functions, you must configure PHP with the '--with-dom' option. They are not part of the standard configuration. Here is a simple way to build it.

%> ./configure --with-dom --with-apache=../apache_1.3.12
%> make
%> make install

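Translator's note: On Win32, DOM support is enabled as follows. First, copy php_domxml.dll from the dlls directory of the download package into the system directory (system32 on NT and Win2K, system on 9x). Next, edit php.ini: point the extension_dir parameter under "paths and directories" at the directory containing php_domxml.dll, for example extension_dir = c:/winnt/system32, and uncomment the line extension=php_domxml.dll under "dynamic extensions".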

How DOM structures XML

Because DOM loads a complete string or file into memory as a tree, we can work with the data as a whole. We will use this XML document as an example.

<?xml version="1.0"?>

<book type="paperback">
  <title>red nails</title>
  <price>$12.99</price>
  <author>
    <name first="robert" middle="e" last="howard"/>
    <birthdate>9/21/1977</birthdate>
  </author>
</book>

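The data is organized into a tree: book is the root, with title, price, and author as its children, and name and birthdate as children of author.

Any text enclosed by tags is a node in its own right. For example, "red nails" is a child node of title, and "$12.99" is a child node of price.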
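To make the point about text nodes concrete, here is a minimal sketch. It assumes the `DOMDocument` class from the DOM extension in current PHP versions rather than the legacy domxml objects described below, and it uses a shortened copy of the book document written without whitespace between tags.

```php
<?php
// Sketch only: current PHP DOM extension, not the legacy domxml API.
$doc = new DOMDocument();
$doc->loadXML('<book type="paperback"><title>red nails</title><price>$12.99</price></book>');

$title = $doc->documentElement->firstChild; // the <title> element
$text  = $title->firstChild;                // its only child: a text node

$isText = ($text->nodeType === XML_TEXT_NODE);
$value  = $text->nodeValue;

echo $title->nodeName, " has a text child: ", $value, "\n";
```

The title element itself carries no text; the string lives one level down, in a child node whose type is `XML_TEXT_NODE`.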

Objects used in DOM

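You may be wondering what a domnode is. This is where we start discussing the objects in the DOM model. DOM defines five objects: domdocument, domnode, domattribute, domdtd, and domnamespace. We will focus on domdocument and domnode, since they are the most commonly used.

The node object

Here is an overview of what a domnode object contains.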

class domnode
  properties:
    name
    content
    type
  methods:
    lastchild()
    children()
    parent()
    new_child( $name,$content )
    getattr( $name )
    setattr( $name,$value )
    attributes()

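The properties need some explanation.

·name is the name of the node's tag. A node that refers to the title tag would have 'title' as its name.

·content is usually empty. Text nodes, however, use it to hold their text.

·type is a constant that identifies what kind of object the node is. There are several types of domnode objects; the list of type constants is available online at http://www.php.net/manual/ref.domxml.php. For example, a node holding text content would have the type XML_TEXT_NODE.

The methods also need some explanation.

·lastchild() returns the last child of a node.

·parent() returns the parent of a node. For example, the parent of our title node is 'book'.

·children() returns an array of all of a node's children. For example, the children of the author node are 'name' and 'birthdate' (leaving aside whitespace-only text nodes, which are covered later in this article).

·new_child() adds a new child node; it takes a name and some content as arguments.

·getattr() and setattr() handle attributes: one gets an attribute's value, the other sets it.

·attributes() returns an array of domattribute objects.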
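The legacy methods listed above were replaced by the DOM extension in PHP 5. As a rough sketch, assuming that extension's `DOMDocument` and `DOMElement` classes, the same operations look like this. The pairing of old and new names in the comments is my own approximate mapping, and the `note` element is a hypothetical value added purely for illustration.

```php
<?php
// Sketch only: approximate modern counterparts of the domnode methods.
$doc = new DOMDocument();
$doc->loadXML('<author><name first="robert" last="howard"/><birthdate>9/21/1977</birthdate></author>');
$author = $doc->documentElement;

$last       = $author->lastChild;            // roughly lastchild()
$parentName = $last->parentNode->nodeName;   // roughly parent()

$childNames = [];
foreach ($author->childNodes as $child) {    // roughly children()
    $childNames[] = $child->nodeName;
}

$name  = $author->firstChild;                // the <name> element
$first = $name->getAttribute('first');       // roughly getattr()
$name->setAttribute('middle', 'e');          // roughly setattr()
$attrCount = $name->attributes->length;      // roughly attributes()

// roughly new_child(): create a hypothetical <note> element and attach it
$author->appendChild($doc->createElement('note', 'example'));

echo implode(', ', $childNames), "\n";
```

Note that the sample string has no whitespace between tags, so the child list contains only the two elements.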

The domdocument object

The domdocument object is also important.

class domdocument
  properties:
    version
    encoding
    standalone
    type
  methods:
    root()
    children()
    add_root( $node )
    dtd()
    dumpmem()

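The property names are self-explanatory.

·'version' is the XML version number of the document.

·'encoding' is the text encoding.

·'standalone' is a boolean that indicates whether the document is standalone.

The methods are also quite simple.

·root() returns the root node of the document. If we load the earlier XML example as a domdocument object, the root node is 'book'.

·children() works the same as children() in domnode.

·add_root() adds a new root node to the XML document. Use it if you want to replace the 'book' node with some other node.

·dtd() returns the DTD of the XML document.

·dumpmem() returns a string representation of the XML data. Translator's note: dumpmem() serializes the whole domdocument object into a string and returns it.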
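For readers on a current PHP version, the document-level operations above can be approximated with the `DOMDocument` class of the DOM extension. This is a sketch under that assumption, not the legacy API itself; the comments give my own rough mapping between old and new names.

```php
<?php
// Sketch only: approximate modern counterparts of the domdocument members.
$doc = new DOMDocument();
$doc->loadXML('<?xml version="1.0"?><book type="paperback"><title>red nails</title></book>');

$rootName   = $doc->documentElement->nodeName; // roughly root()
$version    = $doc->xmlVersion;                // roughly the version property
$standalone = $doc->xmlStandalone;             // roughly the standalone property
$dump       = $doc->saveXML();                 // roughly dumpmem()

echo "root: $rootName, version: $version\n";
echo $dump;
```

`saveXML()` returns the serialized document as a string, which is the role the translator's note describes for dumpmem().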

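The domdocument object returned by xmltree()

xmltree() returns a different kind of domdocument object, which may trip you up. This object has no methods; it uses properties in their place. It is a true tree structure.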

class domdocument
  properties:
    version
    encoding
    standalone
    name
    content
    type
    attributes
    children

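It is easy to use. For example, instead of calling a method to get a node's children, you simply read its 'children' property. Both 'children' and 'attributes' are arrays.

Other objects

I list the other objects with their properties and methods for reference. We will not use them in this article.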

class attribute
  properties:
    name
    content
  methods:
    name()

class dtd
  properties:
    extid
    sysid
    name

class namespace

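Using the objects

The DOM extension has only three functions: xmldoc(), xmldocfile(), and xmltree(). The rest of the time we work with the objects themselves. All three functions return a domdocument object. Here are some examples of loading XML data into a PHP script.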

<?php

  # Load XML from a string with either of these two functions

  $doc = xmldoc( $xmlstr );
  $tree = xmltree( $xmlstr );

  # Load XML from a file
  $doc = xmldocfile( $xmlfile );

?>

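These functions raise an error if the XML cannot be parsed. DOM will not validate the XML for you; you have to do that some other way, perhaps with a separate program such as xmllint. Translator's note: Microsoft IE has a built-in XML parser, so opening an XML document in IE is a quick way to check it.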
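Handling a parse failure can be sketched as follows. The example assumes the DOM and libxml extensions of current PHP versions, not the legacy xmldoc() family, and the broken input string is made up for the demonstration.

```php
<?php
// Sketch only: detecting malformed XML with the current DOM extension.
libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings

$doc = new DOMDocument();
$ok  = $doc->loadXML('<employee><name>matt</employee>'); // mismatched tag: not well-formed

$errors = libxml_get_errors();
libxml_clear_errors();

if (!$ok) {
    echo "parse failed: ", trim($errors[0]->message), "\n";
}
```

This only checks that the document is well-formed. Validating against a DTD or schema is a separate step, as the paragraph above points out.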

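A simple example

Let's tie together what we have covered so far with a simple example.

<?php

  # Build a sample XML document for the demonstration
  $xmlstr = "<" . "?" . "xml version=\"1.0\"" . "?" . ">";
  $xmlstr .=
  "
    <employee>
      <name>matt</name>
      <position type=\"contract\">web guy</position>
    </employee>
  ";

  # Load the XML data ($doc becomes an instance of a domdocument object)
  $doc = xmldoc($xmlstr);

  # Get the root node, "employee"
  $employee = $doc->root();

  # Get the children of the employee node ("name", "position")
  $nodes = $employee->children();

  # We want the "position" node,
  # so we loop over the children of employee to find it
  while ($node = array_shift($nodes))
  {
    if ($node->name == "position")
    {
      $position = $node;
      break;
    }
  }

  # Get the type attribute of position
  $type = $position->getattr("type");

  # Get the text enclosed by the position tags:
  # it is held in the first child of the position node
  # Translator's note: see the last paragraph of "How DOM structures XML".
  # array_shift() takes its argument by reference, so store the array first
  $position_children = $position->children();
  $text_node = array_shift($position_children);

  # Read the content property of the text node
  $text = $text_node->content;

  # Print the position and type
  echo "position: $text<br>";
  echo "type: $type";

?>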

This example produces the following output.

position: web guy
type: contract

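In the example above, the while loop is what locates the position node. The employee node actually has five children: three text nodes, one name, and one position. The text nodes hold the line breaks at the ends of the lines. This looks odd at first, but DOM treats any string (even one containing only whitespace) as text and creates a node for it.

If you want the employee node to have exactly two children, you have to write the XML with no whitespace between the tags, like this.

<employee><name>matt</name><position type="contract">web guy</position></employee>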
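The child counts are easy to check. The sketch below assumes the `DOMDocument` class of the current DOM extension instead of the legacy domxml objects. It loads the indented employee document twice, once as-is and once with the `preserveWhiteSpace` property turned off, which is an alternative to stripping the whitespace from the XML by hand.

```php
<?php
// Sketch only: counting whitespace text nodes with the current DOM extension.
$xml = "<employee>\n  <name>matt</name>\n  <position type=\"contract\">web guy</position>\n</employee>";

// Default behaviour: whitespace between tags becomes text nodes.
$doc = new DOMDocument();
$doc->loadXML($xml);
$withWhitespace = $doc->documentElement->childNodes->length;

// With preserveWhiteSpace disabled before loading, blank text nodes are dropped.
$doc2 = new DOMDocument();
$doc2->preserveWhiteSpace = false;
$doc2->loadXML($xml);
$withoutWhitespace = $doc2->documentElement->childNodes->length;

echo "with whitespace: $withWhitespace, without: $withoutWhitespace\n";
```

The first count is 5 (text, name, text, position, text) and the second is 2, matching the explanation above.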

A longer example

Here is a longer example of extracting information from an XML document. Suppose we have a file, employees.xml, that holds employee information.

<?xml version="1.0"?>

<employees company="zoomedia.com">
  <employee>
    <name>matt</name>
    <position type="contract">web guy</position>
  </employee>

  <employee>
    <name>george</name>
    <position type="full time">mad hacker</position>
  </employee>

  <employee>
    <name>wookie</name>
    <position type="part time">hairy sysadmin</position>
  </employee>
</employees>

Here is how to extract this information in a PHP script.

<?php

  # Walk an array of child nodes, find a text node, and return its content
  function get_content($parent)
  {
    $nodes = $parent->children();
    while ($node = array_shift($nodes))
    {
      if ($node->type == XML_TEXT_NODE)
        return $node->content;
    }
    return "";
  }

  # Get the content of a named child node
  function find_content($parent, $name)
  {
    $nodes = $parent->children();
    while ($node = array_shift($nodes))
    {
      if ($node->name == $name)
        return get_content($node);
    }
    return "";
  }

  # Get an attribute of a named child node
  function find_attr($parent, $name, $attr)
  {
    $nodes = $parent->children();
    while ($node = array_shift($nodes))
    {
      if ($node->name == $name)
        return $node->getattr($attr);
    }
    return "";
  }

  # Load the XML document
  $doc = xmldocfile("employees.xml") or die("what employees?");

  # Get the root node (employees)
  $root = $doc->root();

  # Get the array of children of employees, which includes every employee node
  $employees = $root->children();

  # Step through the array and print some employee data
  while ($employee = array_shift($employees))
  {
    # Skip the whitespace text nodes between the employee elements
    if ($employee->type == XML_TEXT_NODE)
      continue;

    $name = find_content($employee, "name");
    $pos = find_content($employee, "position");
    $type = find_attr($employee, "position", "type");

    echo "$name the $pos, $type employee<br>";
  }

?>

You will see the following output in your browser.

matt the web guy, contract employee
george the mad hacker, full time employee
wookie the hairy sysadmin, part time employee
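For comparison, here is a sketch of the same extraction on a current PHP version. It assumes the `DOMDocument` class of the DOM extension rather than the legacy functions used above, and it embeds the employee data in a string so that it runs on its own. `getElementsByTagName()` replaces the hand-written search helpers.

```php
<?php
// Sketch only: the employee listing with the current DOM extension.
$xml = <<<XML
<?xml version="1.0"?>
<employees company="zoomedia.com">
  <employee><name>matt</name><position type="contract">web guy</position></employee>
  <employee><name>george</name><position type="full time">mad hacker</position></employee>
  <employee><name>wookie</name><position type="part time">hairy sysadmin</position></employee>
</employees>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$lines = [];
foreach ($doc->getElementsByTagName('employee') as $employee) {
    // Look up the two child elements by tag name; whitespace nodes are never matched.
    $name     = $employee->getElementsByTagName('name')->item(0)->textContent;
    $position = $employee->getElementsByTagName('position')->item(0);
    $lines[]  = sprintf(
        '%s the %s, %s employee',
        $name,
        $position->textContent,
        $position->getAttribute('type')
    );
}

echo implode("\n", $lines), "\n";
```

Because elements are selected by name, there is no need to skip text nodes explicitly as the legacy loop does.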

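Another example (adding data)

Because the XML is loaded into memory as a tree, we can manipulate the data easily, adding branches or nodes as needed.

Say we want to add an employee to the XML file.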