dom - PHP XPath Child Concat And New Line Issues


I am using DOMXPath to query nodes in an HTML document in order to extract their content.

I have the following HTML document:

<p class="data">
    immediate text
    <br>
    text in second line
    <br>
    e-mail:
    <script>some script tag</script>
    <a href="#">
        <script>another script tag</script>
        link in third line
    </a>
    <br>
    text in last line
</p>

I would like to receive the following result:

immediate text\r\ntext in second line\r\ne-mail: link in third line\r\ntext in last line

So far I have the following PHP code:

#...
libxml_use_internal_errors(true);
$dom = new \DOMDocument();
if (!$dom->loadHTML($html)) {
    #...
}

// DOMXPath must be instantiated with "new"
$xpath = new \DOMXPath($dom);
$result = $xpath->query("(//p[@class='data'])[1]/text()[not(parent::script)]");

Problems:

  • It does not include the child nodes' texts.
  • It does not include the line breaks.

By using the child axis (/) in /text(), you get only the direct children of the current context node. To get all descendants, use the descendant axis (//) instead.
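The difference between the two axes can be seen on a much smaller document. The sketch below is only an illustration: the one-line markup and the nodeValues() helper are made up for this example and are not part of the question.

```php
<?php
// Hypothetical minimal markup, chosen only to illustrate the two axes.
$dom = new DOMDocument();
$dom->loadHTML('<p>a<b>b</b>c</p>');
$xpath = new DOMXPath($dom);

// Illustrative helper: collect the string value of each matched node.
function nodeValues(DOMNodeList $nodes): array
{
    $values = [];
    foreach ($nodes as $node) {
        $values[] = $node->nodeValue;
    }
    return $values;
}

// Child axis: only text nodes directly under <p>.
$children = nodeValues($xpath->query('//p/text()'));

// Descendant axis: text nodes at any depth under <p>, including the one inside <b>.
$descendants = nodeValues($xpath->query('//p//text()'));
```

Here $children should hold "a" and "c", while $descendants should also contain the "b" nested inside the <b> element.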

To get both the text nodes and the <br> elements, you can try using //node() and then filter further by node type (keeping text nodes) or by name (keeping elements named br):

(//p[@class='data'])[1]//node()[self::text() or self::br][not(parent::script)]
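The query only returns a node list, so the nodes still have to be joined into one string. One possible way to do that is sketched below. The function name extractTextWithBreaks() and the whitespace handling (collapsing runs of whitespace, skipping empty text nodes, emitting "\r\n" for each <br>) are assumptions made for this example, not something stated in the question.

```php
<?php
// Sample markup copied from the question.
$html = <<<'HTML'
<p class="data">
    immediate text
    <br>
    text in second line
    <br>
    e-mail:
    <script>some script tag</script>
    <a href="#">
        <script>another script tag</script>
        link in third line
    </a>
    <br>
    text in last line
</p>
HTML;

// Hypothetical helper: joins text nodes and <br> elements into one string.
function extractTextWithBreaks(string $html): string
{
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    if (!$dom->loadHTML($html)) {
        return '';
    }
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query(
        "(//p[@class='data'])[1]//node()[self::text() or self::br][not(parent::script)]"
    );

    $result = '';
    foreach ($nodes as $node) {
        if ($node instanceof DOMElement) {
            // The only elements the query lets through are <br>.
            $result .= "\r\n";
            continue;
        }
        // Collapse internal whitespace and drop whitespace-only text nodes.
        $text = trim(preg_replace('/\s+/', ' ', $node->nodeValue));
        if ($text === '') {
            continue;
        }
        // Separate adjacent text fragments on the same line with one space.
        if ($result !== '' && substr($result, -1) !== "\n") {
            $result .= ' ';
        }
        $result .= $text;
    }
    return $result;
}

$result = extractTextWithBreaks($html);
```

With the markup from the question, $result should match the desired string shown above, with "e-mail:" and "link in third line" ending up on the same line.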
