Difference between revisions of "PHP DOM & XPath"


Revision as of 08:45, 14 September 2020

The Document Object Model (DOM) is a programming API for HTML and XML documents that defines their logical structure.

XPath (XML Path Language) is based on a tree representation of the XML/HTML document, and provides the ability to navigate around the tree, selecting nodes by a variety of criteria.
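As a minimal sketch of what "selecting nodes by a variety of criteria" can look like in PHP, the example below uses the built-in DOMDocument and DOMXPath classes. The sample XML (a library of books) and all variable names are made up for illustration and are not part of the original page.

```php
<?php
// Hypothetical sample document, invented for this illustration.
$xml = '<library>'
     . '<book year="2001"><title>First</title></book>'
     . '<book year="2015"><title>Second</title></book>'
     . '</library>';

$doc = new DOMDocument();
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);

// Select by location in the tree: every title directly below a book.
$titles = [];
foreach ($xpath->query('/library/book/title') as $node) {
    $titles[] = $node->textContent;
}

// Select by criterion: titles of books whose year attribute exceeds 2010.
$recent = [];
foreach ($xpath->query('//book[@year > 2010]/title') as $node) {
    $recent[] = $node->textContent;
}

print_r($titles);
print_r($recent);
```

The first query walks a fixed path from the root; the second adds a predicate in square brackets, which is how XPath filters nodes by attribute or content.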

Introduction

The Document Object Model (DOM), in combination with XPath, is a powerful tool for screen scraping, programming and data manipulation (and much more). Unfortunately, many of their features are little known, often poorly documented, and covered by very few tutorials.

The DOM and XPath can be used both in PHP and in JavaScript, that is, on the server as well as in the client browser.

Structuring XML and HTML

Node Relations

[Figure: Node Relations (DOM-Relations.png)]

The nodes in the node tree have a hierarchical relationship to each other, which is described with the following terms:

  • Root : the top node of the tree,
  • Parent : every node has exactly one parent except the root, which has no parent,
  • Child : a node can have any number of children,
  • Sibling : Siblings (brothers or sisters) are nodes with the same parent.

From the HTML in the figure you can read:
  • html is the root node,
  • html has no parent element (in the DOM its parentNode is the document object itself),
  • html is the parent of head and body,
  • head is the first child of html,
  • body is the last child of html,
  • head has one child: title
  • title has one child (a text node): "DOM Tutorial"
  • body has two children: h1 and p
  • h1 has one child (a text node): "DOM Lesson one"
  • p has one child (a text node): "Hello world!"
  • h1 and p are siblings
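
The relations listed above can be checked with a short PHP sketch. The markup below is a reconstruction from the bullet points (the figure itself is not reproduced here), written without whitespace between tags so that no extra whitespace text nodes appear. The helper name dumpTree is hypothetical.

```php
<?php
// Reconstructed sample page; an assumption based on the list above.
$markup = '<html><head><title>DOM Tutorial</title></head>'
        . '<body><h1>DOM Lesson one</h1><p>Hello world!</p></body></html>';

$doc = new DOMDocument();
$doc->loadXML($markup);

// Hypothetical helper: returns the subtree below $node as indented text.
function dumpTree(DOMNode $node, int $depth = 0): string
{
    // Text nodes are shown by their quoted content, other nodes by name.
    $label = $node->nodeType === XML_TEXT_NODE
        ? '"' . $node->nodeValue . '"'
        : $node->nodeName;
    $out = str_repeat('  ', $depth) . $label . "\n";

    // Only descend when the node actually has children.
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $child) {
            $out .= dumpTree($child, $depth + 1);
        }
    }
    return $out;
}

$root = $doc->documentElement;               // the html element
$bodyChildCount = $root->lastChild->childNodes->length;

$tree = dumpTree($root);
echo $tree;
```

Each level of indentation in the printed tree corresponds to one parent/child step, and nodes printed at the same depth under the same parent are siblings.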

Use the following node properties to navigate between nodes:
  • parentNode
  • childNodes[nodenumber]
  • firstChild
  • lastChild
  • nextSibling
  • previousSibling
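
A minimal sketch of these properties in PHP, assuming the same sample page as described in the list of relations (reconstructed here, since the figure is not available). In PHP, childNodes is a DOMNodeList, so the example reads an entry with item(); the variable names are illustrative only.

```php
<?php
// Reconstructed sample page, written without whitespace between tags.
$doc = new DOMDocument();
$doc->loadXML(
    '<html><head><title>DOM Tutorial</title></head>'
    . '<body><h1>DOM Lesson one</h1><p>Hello world!</p></body></html>'
);

$html = $doc->documentElement;   // root element
$body = $html->lastChild;        // body is the last child of html
$h1   = $body->firstChild;       // h1 is the first child of body
$p    = $h1->nextSibling;        // p follows h1 under the same parent

// One lookup per navigation property, collected by property name.
$found = [
    'parentNode'      => $p->parentNode->nodeName,
    'childNodes'      => $body->childNodes->item(1)->nodeName,
    'firstChild'      => $html->firstChild->nodeName,
    'lastChild'       => $html->lastChild->nodeName,
    'nextSibling'     => $h1->nextSibling->nodeName,
    'previousSibling' => $p->previousSibling->nodeName,
];
print_r($found);

// At the edge of the tree the properties yield null or the document node.
$afterLast  = $p->nextSibling;            // no sibling follows p
$aboveRoot  = $html->parentNode->nodeName; // the document object
```

Because a missing neighbour is reported as null, code that walks siblings in a loop would normally test for null before reading properties such as nodeName.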
