I've been working on a little project for a while now: an Hpricot-style Ruby interface to libxml. I started it to learn the Ruby C extension API and to build an easy-to-use XML library for Ruby. Hpricot is great, but it isn't a full-fledged XML library and isn't intended to be one. libxml already has Ruby bindings, but they mirror the libxml C API, which is very un-Ruby-like (IMHO). REXML is just plain slow.

So far my hacking has produced a library that can load XML from strings, arrays, or whatever else into an object (from what I've seen, libxml-ruby doesn't support loading from memory/strings). I can run XPath searches and do XSLT transforms. I'm cleaning up the API to match Hpricot's, and then I'll probably release the parse/read-only version as v0.1 in the next few weeks.

Here's a snippet of benchmark output comparing the libraries (run on a late-2006 MacBook with 2 GB of RAM):
(in /Users/segfault/Devel/fastxml)
ruby ./benchmarks/speedtest.rb
                     user     system      total        real
fastxml.new      0.000000   0.000000   0.000000 (  0.001102)
fastxml.to_s     0.000000   0.000000   0.000000 (  0.000629)
fastxml.search   0.000000   0.000000   0.000000 (  0.000207)

hpricot.new      0.010000   0.000000   0.010000 (  0.012319)
hpricot.to_s     0.000000   0.000000   0.000000 (  0.003164)
hpricot.search   0.010000   0.000000   0.010000 (  0.000603)

libxml.new       0.000000   0.000000   0.000000 (  0.001287)
libxml.to_s      0.000000   0.000000   0.000000 (  0.000698)
libxml.search    0.000000   0.000000   0.000000 (  0.000073)

REXML.new        0.020000   0.000000   0.020000 (  0.024030)
REXML.to_s       0.010000   0.000000   0.010000 (  0.011971)
REXML.xpath      0.000000   0.000000   0.000000 (  0.001092)

xpath expression: /feed/entry
fastxml nodes: 15
 libxml nodes: 0
hpricot nodes: 15
  REXML nodes: 15
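The user/system/total/real columns above match what Ruby's standard `Benchmark.bm` prints. The actual `benchmarks/speedtest.rb` isn't shown, so the following is only a hypothetical sketch of how a harness like it could be structured, limited to REXML (shipped with Ruby) so it runs without the other gems. The inline `FEED` document and the `count_entries` helper are made-up stand-ins, not the real test data or script.

```ruby
require 'benchmark'
require 'rexml/document'

# Hypothetical stand-in for the real test feed: 15 <entry> elements,
# deliberately with no namespace declaration.
ENTRIES = 15
FEED = "<feed>" +
       (1..ENTRIES).map { |i| "<entry><title>Entry #{i}</title></entry>" }.join +
       "</feed>"

# Hypothetical helper: parse the string and count nodes matching the XPath.
def count_entries(xml, xpath = "/feed/entry")
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, xpath).size
end

doc = nil
# Label width 15 lines the numbers up roughly like the output above.
Benchmark.bm(15) do |x|
  x.report("REXML.new")   { doc = REXML::Document.new(FEED) }
  x.report("REXML.to_s")  { doc.to_s }
  x.report("REXML.xpath") { REXML::XPath.match(doc, "/feed/entry") }
end

puts "xpath expression: /feed/entry"
puts "  REXML nodes: #{count_entries(FEED)}"
```

Each library under test would get its own group of `x.report` lines in the same block; timings this small are close to the timer's resolution, so looping each operation many times would give steadier numbers.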

2 Responses to “easy fast ruby libxml interface”

  1. James Kassemi Says:
    I'm really looking forward to this. XPath support in Ruby's libxml binding is dismal (I've yet to get it to work when namespaces are involved), and REXML, although perfectly functional, is dog slow. I just stumbled across your Trac for the project... I'll see if I can help out. Take it easy, James
  2. segfault Says:
    James, I'm happy to have any help. I've tried to tackle (read: hack) the default namespace handling for XPath searches. Rock on.
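The default-namespace problem mentioned in these comments can be illustrated with REXML. Under XPath 1.0, an unprefixed name such as `entry` only matches elements in no namespace. If the benchmark feed was an Atom document declaring a default namespace (an assumption; the feed itself isn't shown), a strictly conforming engine would find nothing for `/feed/entry`, which may explain the `libxml nodes: 0` line above. The portable workaround is to bind a prefix to the namespace URI and use it in the expression. The sample document and the prefix name `atom` below are illustrative choices, not anything from the project.

```ruby
require 'rexml/document'

ATOM_NS = "http://www.w3.org/2005/Atom"
# The prefix "atom" is arbitrary and illustrative; only the URI has to match.
NS = { "atom" => ATOM_NS }

# Illustrative Atom-style feed: every element inherits the default namespace.
xml = <<~XML
  <feed xmlns="#{ATOM_NS}">
    <entry><title>First</title></entry>
    <entry><title>Second</title></entry>
  </feed>
XML

doc = REXML::Document.new(xml)

# Qualify each step with the bound prefix, as XPath 1.0 expects for
# elements that live in a namespace.
entries = REXML::XPath.match(doc, "/atom:feed/atom:entry", NS)
titles  = entries.map { |e| REXML::XPath.first(e, "atom:title", NS).text }

puts "entries: #{entries.size}"
puts "titles:  #{titles.join(', ')}"
```

Any engine that follows XPath 1.0 should accept this prefix-binding approach; a library that wants `/feed/entry` to "just work" on such documents has to map the default namespace to a prefix behind the scenes.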
