dom - PHP XPath Child Concat And New Line Issues
I am using DOMXPath to query nodes in an HTML document in order to extract their content.
I have the following HTML document:
<p class="data"> immediate text <br> text in second line <br> e-mail: <script>some script tag</script> <a href="#"> <script>another script tag</script> link in third line </a> <br> text in last line </p>
I would like to receive the following result:
immediate text\r\ntext in second line\r\ne-mail: link in third line\r\ntext in last line
So far I have the following PHP code:
#...
libxml_use_internal_errors(true);
$dom = new \DOMDocument();
if (!$dom->loadHTML($html)) {
    #...
}
$xpath = new \DOMXPath($dom);
$result = $xpath->query("(//p[@class='data'])[1]/text()[not(parent::script)]");
Problems:
- It does not include the child nodes' texts.
- It does not include line breaks.
By using the child axis (/) in /text(), you only get text nodes that are direct children of the current context node. To get descendants as well, use // (short for /descendant-or-self::node()/) instead.
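The difference between the two can be seen in a minimal sketch like the one below. It uses a shortened version of the markup from the question, and `joinText` is a hypothetical helper introduced only for this illustration; it is not part of the DOM extension.

```php
<?php
// Shortened version of the question's markup, used only for illustration.
$html = '<p class="data"> immediate text <br> e-mail: <a href="#"> link in third line </a></p>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Hypothetical helper: joins the trimmed, non-empty text nodes a query returns.
function joinText(DOMXPath $xpath, string $query): string
{
    $parts = [];
    foreach ($xpath->query($query) as $node) {
        $text = trim($node->nodeValue);
        if ($text !== '') {
            $parts[] = $text;
        }
    }
    return implode(' | ', $parts);
}

// Child axis: only text nodes directly under <p>.
$childOnly = joinText($xpath, "(//p[@class='data'])[1]/text()");

// Descendant lookup: also reaches the text inside <a>.
$descendants = joinText($xpath, "(//p[@class='data'])[1]//text()");

echo $childOnly, "\n";
echo $descendants, "\n";
```

With the child axis, the link text inside `<a>` should be missing from the output; with `//text()` it should be present.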
To get both text nodes and <br> elements, you can try using //node() and then filter the result further, either by node type (text nodes) or by name (elements named br):
(//p[@class='data'])[1]//node()[self::text() or self::br][not(parent::script)]
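As a rough sketch of how the nodes returned by that expression could be turned into the desired string, the loop below starts a new line for every br element and appends trimmed text to the current line. The trimming, the single-space joining, and the variable names are assumptions made for this example, not something the question specifies.

```php
<?php
$html = '<p class="data"> immediate text <br> text in second line <br> e-mail: '
      . '<script>some script tag</script> <a href="#"> <script>another script tag</script> '
      . 'link in third line </a> <br> text in last line </p>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Text nodes and <br> elements below the first p.data, skipping script content.
$nodes = $xpath->query(
    "(//p[@class='data'])[1]//node()[self::text() or self::br][not(parent::script)]"
);

$lines = [''];
foreach ($nodes as $node) {
    if ($node->nodeName === 'br') {
        // A <br> starts a new output line.
        $lines[] = '';
        continue;
    }
    $text = trim($node->nodeValue);
    if ($text === '') {
        continue; // ignore whitespace-only text nodes
    }
    // Append to the current line, separated by one space (an assumption).
    $last = count($lines) - 1;
    $lines[$last] = $lines[$last] === '' ? $text : $lines[$last] . ' ' . $text;
}

$result = implode("\r\n", $lines);
echo $result, "\n";
```

Building the lines in PHP rather than in XPath keeps the expression simple; XPath 1.0, which DOMXPath implements, has no built-in way to join a node set into a single string with separators.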