PHP DOM & XPath
{{TOCright}}
The <code>Document Object Model</code> (<code>DOM</code>) is a programming API for HTML and XML documents that defines their logical structure.
<code>XPath</code> (XML Path Language) is based on a tree representation of the XML/HTML document and provides the ability to navigate around the tree, selecting nodes by a variety of criteria.
== Introduction ==
Together, the <code>Document Object Model</code> (<code>DOM</code>) and <code>XPath</code> are powerful tools for screen scraping, programming and data manipulation (and much more).
Unfortunately, many of their features are little known, often poorly documented and covered by very few tutorials.
The DOM and XPath can be used both in PHP and in JavaScript, that is, on the server as well as in the client browser.
Keep the following points in mind when working with XPath:
* XPath is a core component of the XSLT standard.
* XSLT cannot work without XPath.
* XPath is the basis of XQuery and XPointer.
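As a minimal sketch of how the two work together in PHP, the snippet parses an HTML string with <code>DOMDocument::loadHTML()</code> and queries it with <code>DOMXPath</code>. It assumes the standard <code>dom</code> extension is enabled (it is in most PHP builds); the HTML string and the variable names are illustrative only.

```php
<?php
// Minimal sketch: parse an HTML string into a DOM tree, then query it with XPath.
$html = '<html><body><h1>DOM Lesson one</h1>'
      . '<p class="intro">Hello world!</p><p>Second paragraph</p></body></html>';

$doc = new DOMDocument();
// Collect libxml parse warnings internally instead of printing them;
// real-world HTML is often not well-formed.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// All <p> elements, wherever they occur in the document.
$paragraphs = $xpath->query('//p');
foreach ($paragraphs as $p) {
    echo $p->textContent, PHP_EOL;
}

// Only the <p> element whose class attribute is "intro".
$intro = $xpath->query('//p[@class="intro"]')->item(0);
echo $intro->textContent, PHP_EOL;
```

<code>DOMXPath::query()</code> returns a <code>DOMNodeList</code>, which can be iterated with <code>foreach</code> or indexed with <code>item()</code>.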
== Structuring XML and HTML ==
=== DOM Nodes ===
According to the W3C HTML DOM standard, everything in an HTML document is a node:
* The entire document is a document node,
* Every HTML element is an element node,
* The text inside HTML elements consists of text nodes,
* Every HTML attribute is an attribute node (deprecated),
* All comments are comment nodes.
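In PHP's DOM extension each of these node kinds is represented by its own class, all derived from <code>DOMNode</code>. The sketch uses a small, hypothetical XML-style string, fetches one node of each kind and prints its class name. <code>loadXML()</code> is used so that the tree contains exactly the nodes written, with nothing added by the parser.

```php
<?php
// Each kind of node listed above corresponds to a class in PHP's DOM extension.
$doc = new DOMDocument();
$doc->loadXML('<html><!-- a comment --><body class="main">Hello</body></html>');

$root    = $doc->documentElement;              // element node <html>
$comment = $root->firstChild;                  // comment node
$body    = $root->lastChild;                   // element node <body>
$text    = $body->firstChild;                  // text node "Hello"
$attr    = $body->getAttributeNode('class');   // attribute node class="main"

foreach ([$doc, $root, $comment, $text, $attr] as $node) {
    echo get_class($node), PHP_EOL;
}
// Prints DOMDocument, DOMElement, DOMComment, DOMText and DOMAttr, one per line.
```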
=== Node Relations ===
[[File:DOM-Relations.png|thumb|1050px|right|Node Relations]]
The nodes in the node tree have a hierarchical relationship to each other, described by the following terms:
* '''Root''' : the top node of the tree,
* '''Parent''' : every node has exactly <u>one parent</u> except the root, which has no parent,
* '''Child''' : a node can have any number of children,
* '''Sibling''' : siblings (brothers or sisters) are nodes with the <u>same parent</u>.
{| class="wikitableharm" width="700px"
|-
! colspan="2" | From the HTML in the figure on the right you can read:
|-
| style="vertical-align:top;" width="50%" |
* {{FormFCTW|8|blue|bold|html}} is the root node,
* {{FormFCTW|8|blue|bold|html}} has no parent,
* {{FormFCTW|8|blue|bold|html}} is the parent of {{FormFCTW|8|blue|bold|head}} and {{FormFCTW|8|blue|bold|body}},
* {{FormFCTW|8|blue|bold|head}} is the first child of {{FormFCTW|8|blue|bold|html}},
* {{FormFCTW|8|blue|bold|body}} is the last child of {{FormFCTW|8|blue|bold|html}}.
| style="vertical-align:top;" width="50%" |
* {{FormFCTW|8|blue|bold|head}} has one child: {{FormFCTW|8|blue|bold|title}},
* {{FormFCTW|8|blue|bold|title}} has one child (a text node): "DOM Tutorial",
* {{FormFCTW|8|blue|bold|body}} has two children: {{FormFCTW|8|blue|bold|h1}} and {{FormFCTW|8|blue|bold|p}},
* {{FormFCTW|8|blue|bold|h1}} has one child: "DOM Lesson one",
* {{FormFCTW|8|blue|bold|p}} has one child: "Hello world!",
* {{FormFCTW|8|blue|bold|h1}} and {{FormFCTW|8|blue|bold|p}} are siblings.
|-
! colspan="2" | Use the following node properties to navigate between nodes:
|-
| colspan="2" |
* parentNode
* childNodes[nodenumber]
* firstChild
* lastChild
* nextSibling
* previousSibling
|}
* Examples: https://www.w3schools.com/js/js_htmldom_navigation.asp
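The navigation properties can be tried out in PHP on the document described above. The HTML string is a hypothetical reconstruction from the bullet points (the original markup is only shown in the figure) and is written without whitespace between tags, because such whitespace can otherwise appear as extra text nodes among the children. PHP exposes <code>childNodes</code> as a <code>DOMNodeList</code>, so <code>childNodes[nodenumber]</code> is written as <code>childNodes->item(nodenumber)</code>; <code>documentElement</code> is used to reach the html element because the parser may place a document type node before it.

```php
<?php
// Hypothetical markup reconstructed from the bullet list; no whitespace
// between tags, so every child node is an element or real text.
$html = '<html><head><title>DOM Tutorial</title></head>'
      . '<body><h1>DOM Lesson one</h1><p>Hello world!</p></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$root  = $doc->documentElement;        // <html>, the root element
$head  = $root->firstChild;            // <head>
$body  = $root->lastChild;             // <body>
$title = $head->childNodes->item(0);   // <title>, i.e. childNodes[0]
$h1    = $body->firstChild;            // <h1>
$p     = $h1->nextSibling;             // <p>

echo $head->nodeName, ' ', $body->nodeName, PHP_EOL;  // head body
echo $title->firstChild->nodeValue, PHP_EOL;          // DOM Tutorial
echo $p->previousSibling->nodeName, PHP_EOL;          // h1
echo $p->parentNode->nodeName, PHP_EOL;               // body
```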
=== nodeName ===
The nodeName property specifies the name of a node:
* nodeName is read-only,
* nodeName of an <u>element node</u> is the same as the <u>tag name</u>,
* nodeName of an <u>attribute node</u> is the <u>attribute name</u>,
* nodeName of a <u>text node</u> is always {{FormFCTW|8|blue|bold|#text}},
* nodeName of the <u>document node</u> is always {{FormFCTW|8|blue|bold|#document}}.
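A short sketch of these nodeName rules in PHP, using a one-element XML string as hypothetical sample data:

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<h1 class="heading">W3Schools</h1>');

$h1   = $doc->documentElement;           // element node
$attr = $h1->getAttributeNode('class');  // attribute node
$text = $h1->firstChild;                 // text node

echo $h1->nodeName, PHP_EOL;    // h1 (the tag name)
echo $attr->nodeName, PHP_EOL;  // class (the attribute name)
echo $text->nodeName, PHP_EOL;  // #text
echo $doc->nodeName, PHP_EOL;   // #document
```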
=== nodeValue ===
The nodeValue property specifies the value of a node:
* nodeValue for element nodes is {{FormFCTW|8|blue|bold|null}} (in PHP, however, an element's nodeValue returns its text content),
* nodeValue for text nodes is the text itself,
* nodeValue for attribute nodes is the attribute value.
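The same kind of sample shows nodeValue in PHP. Note that PHP deliberately deviates from the W3C rule for element nodes: the PHP manual documents that the nodeValue of a <code>DOMElement</code> equals its <code>textContent</code> instead of null. The sample string is hypothetical.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<h1 class="heading">W3Schools</h1>');

$h1   = $doc->documentElement;
$text = $h1->firstChild;
$attr = $h1->getAttributeNode('class');

var_dump($text->nodeValue);  // string(9) "W3Schools" (the text itself)
var_dump($attr->nodeValue);  // string(7) "heading"   (the attribute value)

// PHP-specific: an element's nodeValue is its text content, not null.
var_dump($h1->nodeValue);    // string(9) "W3Schools"
```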
=== nodeType ===
The nodeType property returns the type of a node:
* nodeType is read-only,
* nodeType is a number that identifies the kind of node.
The most important nodeType values are:
{| class="wikitableharm"
|-
! width="150px" style="text-align:left;" | Node
! width="050px" style="text-align:center;"| Type
! width="350px" style="text-align:left;" | Example
|-
| ELEMENT_NODE
| style="text-align:center;" | 1
| <nowiki><h1 class="heading">W3Schools</h1></nowiki>
|-
| ATTRIBUTE_NODE
| style="text-align:center;" | 2
| class="heading" (deprecated)
|-
| TEXT_NODE
| style="text-align:center;" | 3
| W3Schools
|-
| COMMENT_NODE
| style="text-align:center;" | 8
| <nowiki><!-- This is a comment --></nowiki>
|-
| DOCUMENT_NODE
| style="text-align:center;" | 9
| The HTML document itself (the parent of <nowiki><html></nowiki>)
|-
| DOCUMENT_TYPE_NODE
| style="text-align:center;" | 10
| <nowiki><!DOCTYPE html></nowiki>
|}
# '''Remark''': Type 2 is deprecated in the HTML DOM (but still works). It is not deprecated in the XML DOM.
# '''See''' the full list: https://www.php.net/manual/en/dom.constants.php
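PHP exposes these type numbers as predefined constants such as <code>XML_ELEMENT_NODE</code> (1), <code>XML_ATTRIBUTE_NODE</code> (2), <code>XML_TEXT_NODE</code> (3), <code>XML_COMMENT_NODE</code> (8), <code>XML_DOCUMENT_NODE</code> (9) and <code>XML_DOCUMENT_TYPE_NODE</code> (10). The sketch combines the table's examples into one hypothetical XML string, parsed with <code>loadXML()</code> so the doctype and the comment are kept exactly as written, and prints each node's nodeType.

```php
<?php
// Hypothetical sample combining the examples from the table.
$doc = new DOMDocument();
$doc->loadXML('<!DOCTYPE html><html><!-- This is a comment -->'
    . '<h1 class="heading">W3Schools</h1></html>');

$h1 = $doc->getElementsByTagName('h1')->item(0);

$nodes = [
    'element'   => $h1,
    'attribute' => $h1->getAttributeNode('class'),
    'text'      => $h1->firstChild,
    'comment'   => $doc->documentElement->firstChild,
    'document'  => $doc,
    'doctype'   => $doc->doctype,
];

foreach ($nodes as $label => $node) {
    echo $label, ': ', $node->nodeType, PHP_EOL;   // 1, 2, 3, 8, 9, 10 in this order
}

// Prefer the named constants over bare numbers when testing a node's type.
if ($h1->nodeType === XML_ELEMENT_NODE) {
    echo 'h1 is an element node', PHP_EOL;
}
```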
== XPath syntax ==
The overview below is based on the XPath tutorials by example from the ZVON website<ref name="zvon">[http://www.zvon.org ZVON.org], Tutorials by Example on HTML, CSS, XPath, XML, Schemas and much more.</ref><ref>[http://zvon.org/xxl/XPathTutorial/General/examples.html ZVON XPath Tutorial], XPath Tutorial by Example.</ref> and on the XPath operators and functions listed in the Sybase Infocenter<ref>[http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.dc30020_1251/html/xmlb/xmlb31.htm Infocenter Sybase], XPath operators and functions.</ref>. The examples refer to the numbered XML text on the right.
{| class="wikitableharm" width="1500px"
|-
| width="1250px" |
{| class="wikitableharm" width="1250px"
|-
! width="150px" | Syntax
! width="400px" | Description
! width="300px" | Example result
! width="400px" | Links
|-
| /
| {{FormFCT|9|blue|Child}}: Absolute path to the required element.
| /AAA => <AAA>
| http://zvon.org/xxl/XPathTutorial/Output/example1.html
|-
| //
| {{FormFCT|9|blue|Descendants}}: Selects all elements matching the following step, wherever they occur in the document.
| //BBB => <BBB/> on 2, 4, 5, 7
| http://zvon.org/xxl/XPathTutorial/Output/example2.html
|-
| .
| Indicates the current context node.
|
|
|-
| ..
| Indicates the parent of the current context node.
|
|
|-
| :
| Namespace separator.
|
|
|-
| ()
| Groups operations to explicitly establish precedence.
|
|
|-
| <nowiki>*</nowiki>
| {{FormFCT|9|blue|Collect}}: Selects all elements located by the preceding path.
| /AAA/DDD/* => <BBB/> on 7
| http://zvon.org/xxl/XPathTutorial/Output/example3.html
|-
| []
| {{FormFCT|9|blue|Filter}}: A number in the brackets gives the position of the element in the selected set; [last()] selects the last element.
| /AAA/BBB[1] => <BBB/> on 2
| http://zvon.org/xxl/XPathTutorial/Output/example4.html
|-
| @
| {{FormFCT|9|blue|Attribute}}: Specifies an attribute.
|
| http://zvon.org/xxl/XPathTutorial/Output/example5.html
|-
| @attrib='xx'
| Specifies the value of an attribute.
|
| http://zvon.org/xxl/XPathTutorial/Output/example6.html
|-
| count()
| Returns the number of selected elements.
| count(//BBB) => 4
| http://zvon.org/xxl/XPathTutorial/Output/example7.html
|-
| name()='xx'
| Returns the name of the element; combine it with {{FormFCTW|8|blue|bold|starts-with}} and {{FormFCTW|8|blue|bold|contains}} for partial matches.
| //*[name()='BBB'] => <BBB/> on 2, 4, 5, 7
| http://zvon.org/xxl/XPathTutorial/Output/example8.html
|-
| string-length()
| Returns the number of characters in a string; string-length(name()) gives the length of the element name.
| //*[string-length(name()) = 3] => all elements (every element name in the sample has three letters).
| http://zvon.org/xxl/XPathTutorial/Output/example9.html
|-
| <nowiki>|</nowiki>
| {{FormFCT|9|blue|Union}}: Combines several paths; the result contains the nodes selected by any of them.
| <nowiki>//AAA|//BBB</nowiki>
| http://zvon.org/xxl/XPathTutorial/Output/example10.html
|-
| child
| Children of the context node. The child axis is the default axis and can be omitted.
| //child::AAA => <AAA>
| http://zvon.org/xxl/XPathTutorial/Output/example11.html
|-
| descendant
| Descendants of the context node; a descendant is a child, a child of a child and so on.
| /AAA/DDD/descendant::* => <BBB/> on 7
| http://zvon.org/xxl/XPathTutorial/Output/example12.html
|-
| parent
| The parent of the context node, if there is one.
| //BBB/parent::* => <AAA>, <DDD>
| http://zvon.org/xxl/XPathTutorial/Output/example13.html
|-
| ancestor
| Ancestors of the context node: its parent, the parent's parent and so on up to the root.
| /AAA/DDD/BBB/ancestor::* => <AAA>, <DDD>
| http://zvon.org/xxl/XPathTutorial/Output/example14.html
|-
| following-sibling
| All siblings after the context node.
| /AAA/DDD/following-sibling::* => <CCC/> on 9
| http://zvon.org/xxl/XPathTutorial/Output/example15.html
|-
| preceding-sibling
| All siblings before the context node.
| /AAA/CCC[1]/preceding-sibling::* => <BBB/> on 2
| http://zvon.org/xxl/XPathTutorial/Output/example16.html
|-
| following
| All nodes after the context node in document order, excluding its descendants.
| /AAA/DDD/following::* => <CCC/> on 9
| http://zvon.org/xxl/XPathTutorial/Output/example17.html
|-
| preceding
| All nodes before the context node in document order, excluding its ancestors.
| /AAA/DDD/BBB/preceding::* => <BBB/> on 2, 4, 5 and <CCC/> on 3
| http://zvon.org/xxl/XPathTutorial/Output/example18.html
|-
| descendant-or-self
| The context node and its descendants.
| /AAA/DDD/descendant-or-self::* => <DDD>, <BBB/> on 7
| http://zvon.org/xxl/XPathTutorial/Output/example19.html
|-
| ancestor-or-self
| The context node and its ancestors.
| /AAA/DDD/BBB/ancestor-or-self::* => <AAA>, <DDD>, <BBB/> on 7
| http://zvon.org/xxl/XPathTutorial/Output/example20.html
|-
| ancestor::*
|
|
| http://zvon.org/xxl/XPathTutorial/Output/example21.html
|-
| mod
| Returns the remainder of a truncating division ({{FormFCTW|8|blue|bold|div}} performs floating-point division).
| //BBB[position() mod 2 = 0 ] => <BBB/> on 4
| http://zvon.org/xxl/XPathTutorial/Output/example22.html
|}
| style="vertical-align:top;" width="250px" |
'''XML-text'''
<pre>
 1 <AAA>
 2   <BBB/>
 3   <CCC/>
 4   <BBB/>
 5   <BBB/>
 6   <DDD>
 7     <BBB/>
 8   </DDD>
 9   <CCC/>
10 </AAA>
</pre>
|}
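The sketch below runs several expressions from the table against the XML text above, using PHP's <code>DOMXPath</code>, and prints how many nodes each one selects. The sample is loaded with <code>loadXML()</code> (line numbers removed) so the upper-case element names are kept. <code>count()</code> yields a number rather than a node set, so it goes through <code>DOMXPath::evaluate()</code> instead of <code>query()</code>. In <code>//BBB[position() mod 2 = 0]</code>, position() counts each BBB among the BBB children of its own parent, so only the BBB on line 4 matches.

```php
<?php
// The XML text from the table, without the line numbers.
$xml = '<AAA><BBB/><CCC/><BBB/><BBB/><DDD><BBB/></DDD><CCC/></AAA>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$expressions = [
    '/AAA',                              // line 1
    '//BBB',                             // lines 2, 4, 5, 7
    '/AAA/DDD/*',                        // line 7
    '/AAA/BBB[1]',                       // line 2
    '/AAA/BBB[last()]',                  // line 5
    '//CCC | //DDD',                     // lines 3, 6, 9 (union)
    '//*[string-length(name()) = 3]',    // all 8 elements
    '/AAA/DDD/descendant::*',            // line 7
    '//BBB/parent::*',                   // AAA and DDD
    '/AAA/DDD/BBB/ancestor::*',          // AAA and DDD
    '/AAA/DDD/following-sibling::*',     // line 9
    '/AAA/CCC[1]/preceding-sibling::*',  // line 2
    '//BBB[position() mod 2 = 0]',       // line 4
];

$counts = [];
foreach ($expressions as $expr) {
    // query() returns a DOMNodeList for location paths.
    $counts[$expr] = $xpath->query($expr)->length;
    printf("%-36s %d node(s)\n", $expr, $counts[$expr]);
}

// count() returns a number, so evaluate() is used instead of query().
$total = $xpath->evaluate('count(//BBB)');
echo 'count(//BBB) = ', $total, PHP_EOL;
```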
== DOM Overview ==
[[File:DOM-Overview.png]]
== See also ==
* [https://riptutorial.com/Download/xpath.pdf Riptutorial], ''Learning XPath'' (''XPath.pdf''), an excellent free eBook tutorial that focuses solely on XPath, without XML explanations.
* [https://devhints.io/xpath#prefixes devhints.io], ''XPath cheatsheet'', with useful examples of XPath queries.
== Reference ==
<references/>
Latest revision as of 09:22, 16 October 2020