PowerShell - HtmlAgilityPack: get tables based on cell value


I have 1000+ HTML documents that each contain various tables, and I am using PowerShell to process them.

I am looking to extract specific tables. These can be identified by their first row, which is used for headings, where one of the cells has the word "measurement".

Since the HTML is a .doc export, the word can be nested in a <span> or <p>, so ideally I would be able to ignore the level of nesting.

I've tried something like:

$tables = $doc.DocumentNode.SelectNodes("//table[* = 'measurement']")

but I get nothing back.

Here's some more of the HTML. Unfortunately I cannot post all of it; it's an MS Word document exported as HTML:

<table class=msonormaltable border=1 cellspacing=0 cellpadding=0
   style='border-collapse:collapse;mso-table-layout-alt:fixed;border:none;
   mso-border-alt:double windowtext 1.5pt;mso-padding-alt:0in 5.4pt 0in 5.4pt'>
<tr style='mso-yfti-irow:0;mso-yfti-firstrow:yes'>
   <td width=192 valign=top style='width:2.0in;border:solid windowtext 1.0pt;
      padding:0in 5.4pt 0in 5.4pt'>
      <p class=msoheading9><span lang=en-ca>areas</span></p>
   </td>
   <td width=288 valign=top style='width:3.0in;border:solid windowtext 1.0pt;
      border-left:none;mso-border-left-alt:solid windowtext 1.0pt;padding:0in 5.4pt 0in 5.4pt'>
      <p class=msoheading9><span lang=en-ca>measurements</span></p>
   </td>
   <td width=346 valign=top style='width:3.6in;border:solid windowtext 1.0pt;
      border-left:none;mso-border-left-alt:solid windowtext 1.0pt;padding:0in 5.4pt 0in 5.4pt'>
      <p class=msoheading9><span lang=en-ca>objectives</span></p>
   </td>
</tr>

Without further information or sample HTML markup, I can suggest using the descendant axis (//) to match descendant nodes no matter how deeply they are nested within the <table> node:

//table[.//* = 'measurement'] 
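
To see how the predicate behaves, here is a minimal sketch that runs it against a trimmed, well-formed stand-in for the sample. It assumes .NET's System.Xml.XmlDocument in place of HtmlAgilityPack so that it needs no external DLL; the ids t1 and t2 and the contains() variant are illustrative additions, not part of the original markup or expression. With HtmlAgilityPack the same XPath strings would be passed to $doc.DocumentNode.SelectNodes(...).

```powershell
# Stand-in for one exported document. System.Xml needs well-formed markup,
# so this trimmed sample quotes its attributes and closes every tag.
# The id attributes are hypothetical, added only to label the results.
$html = @'
<html><body>
<table id="t1"><tr>
  <td><p><span>areas</span></p></td>
  <td><p><span>measurements</span></p></td>
</tr></table>
<table id="t2"><tr>
  <td><p>name</p></td>
  <td><p>value</p></td>
</tr></table>
</body></html>
'@

$doc = New-Object System.Xml.XmlDocument
$doc.LoadXml($html)

# Exact comparison: true only when some descendant's whole text is 'measurement'.
$exact = $doc.SelectNodes("//table[.//* = 'measurement']")

# Substring comparison (an assumption, not the expression above):
# also accepts 'measurements', at any nesting depth.
$partial = $doc.SelectNodes("//table[.//*[contains(., 'measurement')]]")

"exact matches:   $($exact.Count)"
"partial matches: $($partial.Count)"
$partial | ForEach-Object { $_.GetAttribute('id') }
```

On this stand-in the exact comparison selects no table, because the cell text is 'measurements' rather than 'measurement', while the contains() form selects t1 only. XPath string comparison is also case-sensitive, so the literal has to match the capitalisation used in the actual documents.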

Update:

After looking at the sample HTML, I think there might be a more efficient way using a more specific XPath, for example:

//table[tr/td//* = 'measurement'] 

But a more specific XPath brings more risk of leaving out tables that are supposed to be selected. The decision is yours, according to the entire document structure and how much efficiency is needed.
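
The trade-off can be sketched with three hypothetical table shapes. The markup, the ids plain, wrapped and header, and the helper Get-TableIds are invented for illustration, and System.Xml.XmlDocument again stands in for HtmlAgilityPack. Whether the real documents contain <tbody> wrappers or <th> cells is something to check before choosing the narrower expression.

```powershell
# Three hypothetical layouts that all carry the heading text 'measurement'.
$html = @'
<html><body>
<table id="plain"><tr><td><p><span>measurement</span></p></td></tr></table>
<table id="wrapped"><tbody><tr><td><p><span>measurement</span></p></td></tr></tbody></table>
<table id="header"><tr><th><p><span>measurement</span></p></th></tr></table>
</body></html>
'@

$doc = New-Object System.Xml.XmlDocument
$doc.LoadXml($html)

# Hypothetical helper: returns the id of every table selected by an XPath.
function Get-TableIds([string]$xpath) {
    $doc.SelectNodes($xpath) | ForEach-Object { $_.GetAttribute('id') }
}

# Narrow form: requires tr directly under table and td directly under tr.
$specific = @(Get-TableIds "//table[tr/td//* = 'measurement']")

# Broad form: accepts the text at any depth below the table.
$general = @(Get-TableIds "//table[.//* = 'measurement']")

"specific: $($specific -join ', ')"
"general:  $($general -join ', ')"
```

Here the narrow form returns only plain, because it skips the table whose rows sit inside <tbody> and the one whose heading cells are <th>. The broad form returns all three.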

