The DOM Crawler

A Crawler instance is returned each time you make a request with the Client. It allows you to traverse HTML or XML documents: select nodes, find links and forms, and retrieve attributes or contents.

Traversing

Like jQuery, the Crawler has methods to traverse the DOM of an HTML/XML document. For example, the following finds all input[type=submit] elements, selects the last one on the page, and then selects its immediate parent element:

    $newCrawler = $crawler->filter('input[type=submit]')
        ->last()
        ->parents()
        ->first()
    ;

Many other methods are also available:

filter('h1.title'): Nodes that match the CSS selector.
filterXPath('h1'): Nodes that match the XPath expression.
eq(1): The node at the specified zero-based index.
first(): The first node.
last(): The last node.
siblings(): All sibling nodes.
nextAll(): All following siblings.
previousAll(): All preceding siblings.
parents(): The ancestor nodes (parent, grandparent, and so on).
children(): The child nodes.
reduce($lambda): Nodes for which the callable does not return false.
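
The selection methods listed above can be tried on a small document. The following is a minimal sketch rather than part of the original text: it assumes the symfony/dom-crawler and symfony/css-selector packages are installed through Composer (filter() relies on the latter), and the HTML fragment and variable names are invented for illustration.

```php
<?php

// Assumption: a Composer autoloader with symfony/dom-crawler and
// symfony/css-selector available next to this script.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical markup used only for this illustration.
$html = <<<'HTML'
<html>
    <body>
        <ul id="menu">
            <li class="home">Home</li>
            <li class="docs">Docs</li>
            <li class="blog">Blog</li>
        </ul>
    </body>
</html>
HTML;

$crawler = new Crawler($html);

// Select the three list items with a CSS selector.
$items = $crawler->filter('#menu > li');

$total = count($items);                  // number of selected nodes
$second = $items->eq(1)->text();         // index is zero-based
$firstText = $items->first()->text();
$lastText = $items->last()->text();

// Sibling-based traversal starts from the first node of the selection.
$siblingCount = count($items->eq(1)->siblings());
$followingCount = count($items->first()->nextAll());
$precedingCount = count($items->last()->previousAll());

// The same list reached through XPath, then its element children.
$childCount = count($crawler->filterXPath('//ul')->children());

echo $second, PHP_EOL;
```

Each call returns a new Crawler, so the original `$items` selection is left untouched and can be reused for every lookup.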

Since each of these methods returns a new Crawler instance, you can narrow down your node selection by chaining the method calls:

    $crawler
        ->filter('h1')
        ->reduce(function ($node, $i) {
            // returning false drops the node; any other return value keeps it
            if (!$node->attr('class')) {
                return false;
            }
        })
        ->first()
    ;
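
To see what such a chain selects, here is a hedged, self-contained sketch. It assumes symfony/dom-crawler and symfony/css-selector are installed through Composer; the markup and the `$withClass` variable are hypothetical. The callback returns an explicit boolean, which is one way to make the keep-or-drop decision visible.

```php
<?php

// Assumption: Composer autoloader with symfony/dom-crawler and
// symfony/css-selector installed.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical markup: two headings carry a class, one does not.
$html = '<html><body>'
    .'<h1 class="title">Alpha</h1>'
    .'<h1>Beta</h1>'
    .'<h1 class="subtitle">Gamma</h1>'
    .'</body></html>';

$crawler = new Crawler($html);

// Keep only the h1 nodes that have a non-empty class attribute.
$withClass = $crawler
    ->filter('h1')
    ->reduce(function (Crawler $node, $i) {
        // attr() yields null when the attribute is absent
        return (bool) $node->attr('class');
    });

$keptCount = count($withClass);
$firstKept = $withClass->first()->text();
$lastKept = $withClass->last()->text();

echo $keptCount, ' headings kept', PHP_EOL;
```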

Tip

Use the count() function to get the number of nodes stored in a Crawler: count($crawler)

Extracting Information

The Crawler can extract information from the nodes:

    // returns the attribute value for the first node
    $crawler->attr('class');

    // returns the node value for the first node
    $crawler->text();

    // returns the default text if the node does not exist
    $crawler->text('Default text content');

    // pass TRUE as the second argument of text() to remove all extra whitespace, including
    // internal whitespace (e.g. "  foo\n  bar    baz \n " is returned as "foo bar baz")
    $crawler->text(null, true);

    // extracts the listed attributes from every node
    // (_text stands for the node value);
    // returns one array per node in the crawler,
    // each holding the node value and the href
    $info = $crawler->extract(['_text', 'href']);

    // executes a lambda for each node and returns an array of results
    $data = $crawler->each(function ($node, $i) {
        return $node->attr('href');
    });
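
The calls above can be exercised against a concrete document to see the shape of each result. This is a sketch under stated assumptions, not the library's reference usage: it presumes symfony/dom-crawler and symfony/css-selector are installed through Composer and a version in which text() accepts a default value and a whitespace flag, as in the snippet above. The markup and variable names are hypothetical.

```php
<?php

// Assumption: Composer autoloader with symfony/dom-crawler and
// symfony/css-selector installed.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical markup used only for this illustration.
$html = "<html><body>"
    ."<p class=\"intro\">  Hello\n   world  </p>"
    ."<a href=\"/a\">First</a>"
    ."<a href=\"/b\">Second</a>"
    ."</body></html>";

$crawler = new Crawler($html);
$links = $crawler->filter('a');

// attr() and text() read from the first node of the selection.
$firstHref = $links->attr('href');
$firstText = $links->text();

// No h2 exists here, so the default text is returned instead.
$heading = $crawler->filter('h2')->text('No heading');

// Collapse leading, trailing, and internal whitespace.
$intro = $crawler->filter('p.intro')->text(null, true);

// One [text, href] pair per link.
$info = $links->extract(['_text', 'href']);

// One href per link, collected through a callback.
$hrefs = $links->each(function (Crawler $node, $i) {
    return $node->attr('href');
});

print_r($info);
```

With more than one attribute name, extract() returns a nested array (one inner array per node), whereas each() returns whatever the callback returns for each node.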