6.42. Ruby XML, XSLT and XPath tutorials

6.42.1. What is XML?

XML stands for Extensible Markup Language.

XML is a subset of the Standard Generalized Markup Language (SGML). It is a markup language for tagging electronic documents so that they have a clear structure.

It can be used to mark up data and to define data types. It is a meta-language that lets users define their own markup languages. It is well suited to transmission over the World Wide Web and provides a uniform way to describe and exchange structured data independently of any application or vendor.

For more information, please see our XML tutorial.

6.42.2. XML parser structure and API

There are two main kinds of XML parser: DOM and SAX.

The SAX parser is event-based: it scans the XML document once, from beginning to end. Each time it encounters a syntactic structure (a start tag, a run of text, an end tag, and so on), it calls the event handler for that structure, which delivers the event to the application.
DOM (Document Object Model) parsing builds the document's hierarchical structure as a DOM tree in memory, in which every node is represented as an object. Once parsing is complete, the entire DOM tree of the document is held in memory.
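To make the difference concrete, here is a minimal sketch using the REXML library introduced in the next section. The three-item document and the ItemCounter class are made up for illustration. The SAX-style listener only sees events as they stream past and keeps nothing but a counter, while the DOM-style Document holds the whole tree, so it can be indexed freely after parsing.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Illustrative inline document (not the movies.xml file used later).
SOURCE = '<list><item>a</item><item>b</item><item>c</item></list>'

# SAX style: the parser calls tag_start for each start tag as it reads.
# Only what the listener chooses to keep (a counter) stays in memory.
class ItemCounter
  include REXML::StreamListener
  attr_reader :count

  def initialize
    @count = 0
  end

  def tag_start(name, _attrs)
    @count += 1 if name == 'item'
  end
end

counter = ItemCounter.new
REXML::Document.parse_stream(SOURCE, counter)
puts "SAX count: #{counter.count}"

# DOM style: the whole tree is built first, then navigated at will.
doc = REXML::Document.new(SOURCE)
puts "DOM count: #{doc.root.elements.size}"        # child elements of <list>
puts "Second item: #{doc.root.elements[2].text}"   # Elements#[] is 1-based
```

Both approaches count three items; only the DOM version can jump straight to the second item afterwards.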

6.42.3. Parsing and creating XML in Ruby

The REXML library can be used to parse XML documents in Ruby.

REXML is an XML toolkit for Ruby, written in pure Ruby and compliant with the XML 1.0 specification.

REXML has been part of the Ruby standard library since Ruby 1.8. Since Ruby 3.0 it ships as a bundled gem: it is still installed together with Ruby, but projects managed with Bundler must list rexml in their Gemfile.

The library is loaded with: require 'rexml/document'

All of its classes and methods are defined inside the REXML module.
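A minimal sketch of what the module layout means in practice, using a made-up one-element document: without include REXML every class needs the REXML:: prefix, and after including the module the short names resolve as well.

```ruby
require 'rexml/document'

# Without include, every class is referenced through the REXML module.
doc = REXML::Document.new('<greeting lang="en">Hello</greeting>')
puts doc.root.name                # element name
puts doc.root.attributes['lang']  # attribute value
puts doc.root.text                # text content

# After including the module, the short names resolve as well.
include REXML
same = Document.new('<greeting lang="en">Hello</greeting>')
puts same.root.class              # the element class lives in REXML
```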

Compared with other parsers, REXML has the following advantages:

It is written 100% in Ruby.
It supports both SAX-style (stream) and DOM-style (tree) parsing.
It is lightweight, with fewer than 2000 lines of code.
Its methods and classes are easy to understand.
It provides a SAX2-based API and full XPath support.
It is installed together with Ruby, so no separate installation is needed.

The examples below use the following XML, saved as movies.xml:

```xml
<collection shelf="New Arrivals">
  <movie title="Enemy Behind">
    <type>War, Thriller</type>
    <format>DVD</format>
    <year>2003</year>
    <rating>PG</rating>
    <stars>10</stars>
    <description>Talk about a US-Japan war</description>
  </movie>
  <movie title="Transformers">
    <type>Anime, Science Fiction</type>
    <format>DVD</format>
    <year>1989</year>
    <rating>R</rating>
    <stars>8</stars>
    <description>A schientific fiction</description>
  </movie>
  <movie title="Trigun">
    <type>Anime, Action</type>
    <format>DVD</format>
    <episodes>4</episodes>
    <rating>PG</rating>
    <stars>10</stars>
    <description>Vash the Stampede!</description>
  </movie>
  <movie title="Ishtar">
    <type>Comedy</type>
    <format>VHS</format>
    <rating>PG</rating>
    <stars>2</stars>
    <description>Viewable boredom</description>
  </movie>
</collection>
```

6.42.4. DOM parser

Let's start by parsing the XML data. First we require the rexml/document library; usually we also include the REXML module in the top-level namespace so that its classes can be used without the REXML:: prefix:

Example

```ruby
#!/usr/bin/ruby -w
require 'rexml/document'
include REXML

# Parse the whole file into an in-memory DOM tree (the file is closed afterwards)
xmldoc = File.open("movies.xml") { |xmlfile| Document.new(xmlfile) }

# Get the root element and print its "shelf" attribute
root = xmldoc.root
puts "Root element : " + root.attributes["shelf"]

# Print the title attribute of every movie
xmldoc.elements.each("collection/movie") { |e| puts "Movie Title : " + e.attributes["title"] }

# Print the type of every movie
xmldoc.elements.each("collection/movie/type") { |e| puts "Movie Type : " + e.text }

# Print the description of every movie
xmldoc.elements.each("collection/movie/description") { |e| puts "Movie Description : " + e.text }
```

The above example produces the following output:

```
Root element : New Arrivals
Movie Title : Enemy Behind
Movie Title : Transformers
Movie Title : Trigun
Movie Title : Ishtar
Movie Type : War, Thriller
Movie Type : Anime, Science Fiction
Movie Type : Anime, Action
Movie Type : Comedy
Movie Description : Talk about a US-Japan war
Movie Description : A schientific fiction
Movie Description : Vash the Stampede!
Movie Description : Viewable boredom
```
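The section title also mentions creating XML. Below is a minimal sketch of building a document in memory with REXML and pretty-printing it. The element values are only an abbreviated entry modeled on movies.xml, and the 2-space indentation is an arbitrary choice.

```ruby
require 'rexml/document'
require 'rexml/formatters/pretty'

# Build a small document element by element.
doc = REXML::Document.new
doc.add(REXML::XMLDecl.new('1.0', 'UTF-8'))

collection = doc.add_element('collection', 'shelf' => 'New Arrivals')
movie = collection.add_element('movie', 'title' => 'Ishtar')
movie.add_element('format').text = 'VHS'
movie.add_element('stars').text = '2'

# Serialize with 2-space indentation into a String.
formatter = REXML::Formatters::Pretty.new(2)
formatter.compact = true   # keep short text-only elements on one line
output = String.new
formatter.write(doc, output)
puts output
```

Setting compact to true keeps elements such as <format>VHS</format> on a single line; without it, the pretty formatter writes their text on its own indented line.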
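6.42.5. SAX parser

To process the same data file, movies.xml, in a stream-oriented way, we define a listener class whose methods the parser calls back as it reads the document. SAX-style parsing is not really worthwhile for a file this small; the example below is only a demonstration: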

Example

```ruby
#!/usr/bin/ruby -w
require 'rexml/document'
require 'rexml/streamlistener'
include REXML

# Listener whose methods the stream parser calls back while reading
class MyListener
  include REXML::StreamListener

  # Called for every start tag with the tag name and a Hash of attributes
  def tag_start(*args)
    puts "tag_start: #{args.map(&:inspect).join(', ')}"
  end

  # Called for every run of text; skip whitespace-only text between tags
  def text(data)
    return if data.strip.empty?
    abbrev = data.length > 40 ? "#{data[0, 40]}..." : data
    puts "  text   :   #{abbrev.inspect}"
  end
end

list = MyListener.new
File.open("movies.xml") { |xmlfile| Document.parse_stream(xmlfile, list) }
```

The above output is as follows:

```
tag_start: "collection", {"shelf"=>"New Arrivals"}
tag_start: "movie", {"title"=>"Enemy Behind"}
tag_start: "type", {}
  text   :   "War, Thriller"
tag_start: "format", {}
  text   :   "DVD"
tag_start: "year", {}
  text   :   "2003"
tag_start: "rating", {}
  text   :   "PG"
tag_start: "stars", {}
  text   :   "10"
tag_start: "description", {}
  text   :   "Talk about a US-Japan war"
tag_start: "movie", {"title"=>"Transformers"}
tag_start: "type", {}
  text   :   "Anime, Science Fiction"
tag_start: "format", {}
  text   :   "DVD"
tag_start: "year", {}
  text   :   "1989"
tag_start: "rating", {}
  text   :   "R"
tag_start: "stars", {}
  text   :   "8"
tag_start: "description", {}
  text   :   "A schientific fiction"
tag_start: "movie", {"title"=>"Trigun"}
tag_start: "type", {}
  text   :   "Anime, Action"
tag_start: "format", {}
  text   :   "DVD"
tag_start: "episodes", {}
  text   :   "4"
tag_start: "rating", {}
  text   :   "PG"
tag_start: "stars", {}
  text   :   "10"
tag_start: "description", {}
  text   :   "Vash the Stampede!"
tag_start: "movie", {"title"=>"Ishtar"}
tag_start: "type", {}
  text   :   "Comedy"
tag_start: "format", {}
  text   :   "VHS"
tag_start: "rating", {}
  text   :   "PG"
tag_start: "stars", {}
  text   :   "2"
tag_start: "description", {}
  text   :   "Viewable boredom"
```
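REXML also offers a SAX2-style API. A minimal sketch of the same stream-oriented idea with REXML::Parsers::SAX2Parser, which registers blocks through listen instead of defining a listener class. It uses a trimmed inline copy of movies.xml (two movies) so that it runs on its own; the in_format flag is just one simple way to tie text to the element it belongs to.

```ruby
require 'rexml/parsers/sax2parser'

# Trimmed inline version of movies.xml so the example is self-contained.
source = <<~XML
  <collection shelf="New Arrivals">
    <movie title="Enemy Behind"><format>DVD</format></movie>
    <movie title="Ishtar"><format>VHS</format></movie>
  </collection>
XML

parser = REXML::Parsers::SAX2Parser.new(source)
titles = []
formats = []
in_format = false

# Called for every start tag; attributes arrive as a Hash.
parser.listen(:start_element) do |_uri, localname, _qname, attributes|
  titles << attributes['title'] if localname == 'movie'
  in_format = (localname == 'format')
end

parser.listen(:end_element) do |_uri, localname, _qname|
  in_format = false if localname == 'format'
end

# Called for all text, including whitespace between tags.
parser.listen(:characters) do |text|
  formats << text if in_format
end

parser.parse
p titles
p formats
```

Running it prints ["Enemy Behind", "Ishtar"] followed by ["DVD", "VHS"].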

6.42.6. XPath and Ruby

XPath is a language for finding information in XML documents (see our XPath tutorial), and Ruby lets us use it to query XML.

XPath, the XML Path Language, is used to address parts of an XML document. It is based on the tree structure of XML and provides the ability to locate nodes in that tree.

Ruby supports XPath through REXML's XPath class, which operates on the parsed document tree (the DOM).

Example

```ruby
#!/usr/bin/ruby -w
require 'rexml/document'
include REXML

xmldoc = File.open("movies.xml") { |xmlfile| Document.new(xmlfile) }

# Information about the first movie in the document
movie = XPath.first(xmldoc, "//movie")
p movie

# Print the type of every movie
XPath.each(xmldoc, "//type") { |e| puts e.text }

# Collect the text of every <format> element into an array
names = XPath.match(xmldoc, "//format").map { |x| x.text }
p names
```

The above example produces the following output:

```
<movie title='Enemy Behind'> ... </>
War, Thriller
Anime, Science Fiction
Anime, Action
Comedy
["DVD", "DVD", "DVD", "VHS"]
```
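XPath expressions can also filter nodes with predicates. A minimal sketch, again on a trimmed inline excerpt of movies.xml, showing an attribute predicate, a child-element predicate and an @ step that selects attributes directly:

```ruby
require 'rexml/document'

# Inline excerpt of movies.xml (two of the four movies).
doc = REXML::Document.new(<<~XML)
  <collection shelf="New Arrivals">
    <movie title="Trigun"><format>DVD</format><episodes>4</episodes></movie>
    <movie title="Ishtar"><format>VHS</format><stars>2</stars></movie>
  </collection>
XML

# Attribute predicate: the movie whose title attribute is "Trigun".
trigun = REXML::XPath.first(doc, "//movie[@title='Trigun']")
puts trigun.elements['episodes'].text

# Child-element predicate: titles of movies whose <format> text is "VHS".
vhs = REXML::XPath.match(doc, "//movie[format='VHS']").map { |m| m.attributes['title'] }
p vhs

# @ step: select every title attribute, then read its value.
all_titles = REXML::XPath.match(doc, '//movie/@title').map(&:value)
p all_titles
```

It prints 4, then ["Ishtar"], then ["Trigun", "Ishtar"].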

6.42.7. XSLT and Ruby

Two XSLT processors are available for Ruby; they are briefly described below:

6.42.8. Ruby-Sablotron

This parser is written and maintained by Masayoshi Takahashi. It is written mainly for the Linux operating system and requires the following libraries:

Sablot
Iconv
Expat

You can find these libraries at Ruby-Sablotron.

6.42.9. XSLT4R

XSLT4R was written by Michael Neumann. It has a simple command-line interface, and it can also be used from within third-party applications to transform XML documents.