The DOM Crawler
A Crawler instance is returned each time you make a request with the Client. It allows you to traverse HTML or XML documents: select nodes, find links and forms, and retrieve attributes or contents.
Traversing
Like jQuery, the Crawler has methods to traverse the DOM of an HTML/XML document. For example, the following finds all input[type=submit] elements, selects the last one on the page, and then selects its immediate parent element:
    $newCrawler = $crawler->filter('input[type=submit]')
        ->last()
        ->parents()
        ->first()
    ;
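The chain above can be tried outside of a functional test by building a Crawler from an HTML string. This is a minimal sketch, assuming symfony/dom-crawler and symfony/css-selector are installed through Composer; the markup and the form ids are made up for illustration. The fallback to ancestors() is an assumption about newer DomCrawler releases, where parents() appears to have been replaced.

```php
<?php
// Assumption: Composer autoloader with symfony/dom-crawler and symfony/css-selector.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical page with two forms, each with its own submit button.
$html = <<<'HTML'
<html><body>
<form id="search"><input type="submit" value="Search"></form>
<form id="login"><input type="text" name="user"><input type="submit" value="Log in"></form>
</body></html>
HTML;

$crawler = new Crawler($html);

// Last submit button on the page.
$lastSubmit = $crawler->filter('input[type=submit]')->last();

// Ancestors are listed from the nearest parent upward.
// Assumption: newer releases expose ancestors() instead of parents().
$ancestors = method_exists($lastSubmit, 'ancestors')
    ? $lastSubmit->ancestors()
    : $lastSubmit->parents();

// first() therefore yields the immediate parent element.
$parent = $ancestors->first();

echo $parent->nodeName(), '#', $parent->attr('id'), PHP_EOL; // prints "form#login"
```

Because the ancestor list starts at the nearest parent, first() is what turns "all ancestors" into "immediate parent".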
Many other methods are also available:
filter('h1.title')
    Nodes that match the CSS selector.
filterXPath('h1')
    Nodes that match the XPath expression.
eq(1)
    Node at the specified (zero-based) index.
first()
    First node.
last()
    Last node.
siblings()
    Siblings.
nextAll()
    All following siblings.
previousAll()
    All preceding siblings.
parents()
    Returns the parent nodes.
children()
    Returns children nodes.
reduce($lambda)
    Nodes for which the callable does not return false.
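The methods listed above can be compared side by side on a small list. This is a sketch under the assumption that symfony/dom-crawler and symfony/css-selector are installed through Composer; the markup and class names are invented for the example.

```php
<?php
// Assumption: Composer autoloader with symfony/dom-crawler and symfony/css-selector.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical list with four items.
$html = <<<'HTML'
<ul>
  <li class="a">one</li>
  <li class="b">two</li>
  <li class="c">three</li>
  <li class="d">four</li>
</ul>
HTML;

$crawler = new Crawler($html);
$items = $crawler->filter('ul > li');

$second = $items->eq(1);                    // indexes start at 0

echo $items->first()->text(), PHP_EOL;      // one
echo $second->text(), PHP_EOL;              // two
echo $items->last()->text(), PHP_EOL;       // four

echo count($second->siblings()), PHP_EOL;    // 3 (every <li> except "two")
echo count($second->nextAll()), PHP_EOL;     // 2 ("three" and "four")
echo count($second->previousAll()), PHP_EOL; // 1 ("one")

// children() only returns element nodes, so whitespace between tags is skipped.
echo count($crawler->filter('ul')->children()), PHP_EOL; // 4

// The same kind of selection expressed as XPath.
echo $crawler->filterXPath('//li[@class="c"]')->text(), PHP_EOL; // three
```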
Since each of these methods returns a new Crawler instance, you can narrow down your node selection by chaining the method calls:
    $crawler
        ->filter('h1')
        ->reduce(function ($node, $i) {
            // drop <h1> nodes that have no class attribute;
            // any return value other than false keeps the node
            if (!$node->attr('class')) {
                return false;
            }
        })
        ->first()
    ;
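To see what the chained reduce() call keeps, the same filter can be run against a few headings. This is a minimal sketch, assuming symfony/dom-crawler and symfony/css-selector are installed through Composer; the headings are made up, and the explicit "return true" is added here only to make the keep/drop decision visible.

```php
<?php
// Assumption: Composer autoloader with symfony/dom-crawler and symfony/css-selector.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical markup: one heading without a class, two with one.
$html = '<body><h1>Plain</h1><h1 class="title">Styled</h1><h1 class="sub">Other</h1></body>';

$crawler = new Crawler($html);

$withClass = $crawler
    ->filter('h1')
    ->reduce(function (Crawler $node, $i) {
        // Returning false removes the node from the resulting Crawler.
        if (!$node->attr('class')) {
            return false;
        }

        return true;
    });

echo count($withClass), PHP_EOL;           // 2
echo $withClass->first()->text(), PHP_EOL; // Styled
```

Each step returns a new Crawler, so the original $crawler still contains the whole document after the chain runs.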
Tip
Use the count() function to get the number of nodes stored in a Crawler: count($crawler)
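A short sketch of the tip, assuming symfony/dom-crawler and symfony/css-selector are installed through Composer (the list markup is invented). It also shows that a selector with no matches yields an empty Crawler rather than an error.

```php
<?php
// Assumption: Composer autoloader with symfony/dom-crawler and symfony/css-selector.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler('<ul><li>a</li><li>b</li><li>c</li></ul>');

$items = $crawler->filter('li');
$missing = $crawler->filter('table');

echo count($items), PHP_EOL;   // 3
echo count($missing), PHP_EOL; // 0, no node matched the selector
```

Checking count() before calling text() or attr() is one way to avoid acting on an empty selection.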
Extracting Information
The Crawler can extract information from the nodes:
    // returns the attribute value for the first node
    $crawler->attr('class');

    // returns the node value for the first node
    $crawler->text();

    // returns the default text if the node does not exist
    $crawler->text('Default text content');

    // pass TRUE as the second argument of text() to remove all extra white spaces, including
    // the internal ones (e.g. "  foo\n  bar    baz \n " is returned as "foo bar baz")
    $crawler->text(null, true);

    // extracts an array of attributes for all nodes
    // (_text returns the node value);
    // returns one array per element in the crawler,
    // each holding the node value and the href
    $info = $crawler->extract(['_text', 'href']);

    // executes a lambda for each node and returns an array of results
    $data = $crawler->each(function ($node, $i) {
        return $node->attr('href');
    });
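The extraction calls above can be checked against a small document. This is a sketch, assuming symfony/dom-crawler and symfony/css-selector are installed through Composer; the links and the paragraph are made up for illustration.

```php
<?php
// Assumption: Composer autoloader with symfony/dom-crawler and symfony/css-selector.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical markup: two links and a paragraph with messy whitespace.
$html = <<<'HTML'
<div>
  <a class="home" href="/">Home</a>
  <a class="blog" href="/blog">Blog</a>
</div>
<p>  foo
   bar baz
</p>
HTML;

$crawler = new Crawler($html);
$links = $crawler->filter('a');

// attr() and text() read from the first node of the selection.
$class = $links->attr('class');                        // "home"
$label = $links->text();                               // "Home"

// No <h2> exists, so the default text is returned.
$heading = $crawler->filter('h2')->text('No heading'); // "No heading"

// Whitespace normalization collapses line breaks and repeated spaces.
$clean = $crawler->filter('p')->text(null, true);      // "foo bar baz"

// One [text, href] pair per link.
$info = $links->extract(['_text', 'href']);            // [['Home', '/'], ['Blog', '/blog']]

// each() collects the closure's return values into an array.
$hrefs = $links->each(function (Crawler $node, $i) {
    return $node->attr('href');
});                                                    // ['/', '/blog']

var_dump($class, $label, $heading, $clean, $info, $hrefs);
```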