How to convert partial XML to hash in Ruby
I have a string that contains plain text, spaces, and carriage returns, with XML-like tags mixed in:
string = "hi there. <set-topic> initiate </set-topic> <setprofile> <key>name</key> <value>joe</value> </setprofile> <setprofile> <key>email</key> <value>email@hi.com</value> </setprofile> <get-relations> <collection>goals</collection> <value>walk upstairs</value> </get-relations> think? true? "
I want to parse this, similar to how Nori, Nokogiri, or Ox convert XML to a hash.
My goal is to be able to pull out the top-level tags as keys and know their elements, like:
keys = ['setprofile', 'setprofile', 'set-topic', 'get-object']
values[0] = [{name => joe}, {email => email@hi.com}]
values[3] = [{collection => goals}, {value => walk up}]
I have seen several functions that work for true XML, but all of mine is partial.
I started going down this line of thinking:
parsed = doc.search('*').each_with_object({}) do |n, h|
  (h[n.name] ||= []) << n.text
end
I'd do something along these lines if I wanted the keys and values variables:
require 'nokogiri'

string = "hi there. <set-topic> initiate </set-topic> <setprofile> <key>name</key> <value>joe</value> </setprofile> <setprofile> <key>email</key> <value>email@hi.com</value> </setprofile> <get-relations> <collection>goals</collection> <value>walk upstairs</value> </get-relations> think? true? "

# Wrap the fragment in a root element so it parses as a single document.
doc = Nokogiri::XML('<root>' + string + '</root>', nil, nil, Nokogiri::XML::ParseOptions::NOBLANKS)

# Skip the top-level text nodes and map each tag to [name, {child name => child content}].
nodes = doc.root.children.reject { |n| n.is_a?(Nokogiri::XML::Text) }.map { |node|
  [
    node.name,
    node.children.map { |c| [c.name, c.content] }.to_h
  ]
}

nodes
# => [["set-topic", {"text"=>" initiate "}],
#     ["setprofile", {"key"=>"name", "value"=>"joe"}],
#     ["setprofile", {"key"=>"email", "value"=>"email@hi.com"}],
#     ["get-relations", {"collection"=>"goals", "value"=>"walk upstairs"}]]
From nodes it's possible to grab the rest of the detail:
keys = nodes.map(&:first)
# => ["set-topic", "setprofile", "setprofile", "get-relations"]

values = nodes.map(&:last)
# => [{"text"=>" initiate "},
#     {"key"=>"name", "value"=>"joe"},
#     {"key"=>"email", "value"=>"email@hi.com"},
#     {"collection"=>"goals", "value"=>"walk upstairs"}]

values[0]
# => {"text"=>" initiate "}
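Because setprofile appears more than once, it can help to group the parsed pairs by tag name and to fold each key/value pair into a single entry, which is closer to the shape asked for in the question. The following is a minimal sketch in plain Ruby, not part of the original answer: collapse_pair and grouped are hypothetical names, and nodes is written out literally so the snippet runs on its own.

```ruby
# The [name, hash] pairs produced by the parsing step, written out literally.
nodes = [
  ["set-topic",     { "text" => " initiate " }],
  ["setprofile",    { "key" => "name",  "value" => "joe" }],
  ["setprofile",    { "key" => "email", "value" => "email@hi.com" }],
  ["get-relations", { "collection" => "goals", "value" => "walk upstairs" }]
]

# Hypothetical helper: turn {"key"=>k, "value"=>v} into {k=>v};
# any other hash is returned unchanged.
def collapse_pair(attrs)
  return attrs unless attrs.keys.sort == %w[key value]
  { attrs["key"] => attrs["value"] }
end

# Group by tag name so repeated tags accumulate in one array.
grouped = nodes.each_with_object({}) do |(name, attrs), h|
  (h[name] ||= []) << collapse_pair(attrs)
end

grouped["setprofile"]
# => [{"name"=>"joe"}, {"email"=>"email@hi.com"}]
```

Tags that do not follow the key/value layout, such as get-relations, keep their original hash.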
If you'd rather, it's possible to pre-process the DOM and remove the top-level text:
doc.root.children.select { |n| n.is_a?(Nokogiri::XML::Text) }.map(&:remove)
doc.to_xml
# => "<root><set-topic> initiate </set-topic><setprofile><key>name</key><value>joe</value></setprofile><setprofile><key>email</key><value>email@hi.com</value></setprofile><get-relations><collection>goals</collection><value>walk upstairs</value></get-relations></root>\n"
That makes it easier to work with the XML.
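If Nokogiri is not available, the same wrap-in-a-root idea may also work with REXML, which ships with Ruby (as a bundled gem since Ruby 3.0). This is only a sketch of an alternative, not the approach used above: rexml_nodes is a hypothetical name, the input is shortened, and REXML is a strict parser, so it assumes the surrounding text contains no stray < or & characters.

```ruby
require 'rexml/document'

string = "hi there. <set-topic> initiate </set-topic> " \
         "<setprofile> <key>name</key> <value>joe</value> </setprofile> think? true? "

# Wrap the fragment in a synthetic <root> so it parses as one document.
doc = REXML::Document.new("<root>#{string}</root>")

# `elements` yields only child elements, so top-level text is skipped.
rexml_nodes = doc.root.elements.map do |el|
  children = el.elements.map { |c| [c.name, c.text] }.to_h
  # Leaf elements have no child elements; keep their own stripped text instead.
  children = { 'text' => el.text.to_s.strip } if children.empty?
  [el.name, children]
end

rexml_nodes.map(&:first)
# => ["set-topic", "setprofile"]
```

Since only child elements are iterated, there is no need to filter out or remove the top-level text nodes first.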