S-XML is a simple XML parser implemented in Common Lisp. It was originally written by Sven Van Caekenberghe and is now maintained by Sven Van Caekenberghe, Rudi Schlatte and Brian Mastenbrook. S-XML is used by S-XML-RPC and CL-PREVALENCE.
This XML parser implementation has the following features:
This XML parser implementation has the following limitations:
You can download the LLGPL source code and documentation as s-xml.tgz (signature: s-xml.tgz.asc for which the public key can be found in the common-lisp.net keyring) (build and/or install with ASDF).
You can view the CVS Repository or get anonymous CVS access as follows:
$ cvs -d:pserver:anonymous@common-lisp.net:/project/s-xml/cvsroot login
(Logging in to anonymous@common-lisp.net)
CVS password: anonymous
$ cvs -d:pserver:anonymous@common-lisp.net:/project/s-xml/cvsroot co s-xml
The plain API exported by the package S-XML (automatically generated by LispDoc) is available in S-XML.html.
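Using a DOM parser is easier, but usually less efficient: see the next sections. To use the event-based API of the parser, call the function start-parse-xml on a stream, passing an xml-parser-state instance that specifies up to three hook functions: new-element-hook, called with (name attributes seed); finish-element-hook, called with (name attributes parent-seed seed); and text-hook, called with (string seed). Each hook returns a new seed.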
As an example, consider the following tracer that shows how the different hooks are called:
(defun trace-xml-new-element-hook (name attributes seed)
  (let ((new-seed (cons (1+ (car seed)) (1+ (cdr seed)))))
    (trace-xml-log (car seed)
                   "(new-element :name ~s :attributes ~:[()~;~:*~s~] :seed ~s) => ~s"
                   name attributes seed new-seed)
    new-seed))

(defun trace-xml-finish-element-hook (name attributes parent-seed seed)
  (let ((new-seed (cons (1- (car seed)) (1+ (cdr seed)))))
    (trace-xml-log (car parent-seed)
                   "(finish-element :name ~s :attributes ~:[()~;~:*~s~] :parent-seed ~s :seed ~s) => ~s"
                   name attributes parent-seed seed new-seed)
    new-seed))

(defun trace-xml-text-hook (string seed)
  (let ((new-seed (cons (car seed) (1+ (cdr seed)))))
    (trace-xml-log (car seed)
                   "(text :string ~s :seed ~s) => ~s"
                   string seed new-seed)
    new-seed))

(defun trace-xml (in)
  "Parse and trace a toplevel XML element from stream in"
  (start-parse-xml in
                   (make-instance 'xml-parser-state
                                  :seed (cons 0 0)
                                  ;; seed car is xml element nesting level
                                  ;; seed cdr is ever increasing from element to element
                                  :new-element-hook #'trace-xml-new-element-hook
                                  :finish-element-hook #'trace-xml-finish-element-hook
                                  :text-hook #'trace-xml-text-hook)))
This is the output of the tracer on two small XML documents. The seed is a CONS that keeps track of the nesting level in its CAR and of its flow through the hooks, as an ever increasing number, in its CDR:
S-XML 31 > (with-input-from-string (in "<FOO X='10' Y='20'><P>Text</P><BAR/><H1><H2></H2></H1></FOO>") (trace-xml in))
(new-element :name :FOO :attributes ((:Y . "20") (:X . "10")) :seed (0 . 0)) => (1 . 1)
(new-element :name :P :attributes () :seed (1 . 1)) => (2 . 2)
(text :string "Text" :seed (2 . 2)) => (2 . 3)
(finish-element :name :P :attributes () :parent-seed (1 . 1) :seed (2 . 3)) => (1 . 4)
(new-element :name :BAR :attributes () :seed (1 . 4)) => (2 . 5)
(finish-element :name :BAR :attributes () :parent-seed (1 . 4) :seed (2 . 5)) => (1 . 6)
(new-element :name :H1 :attributes () :seed (1 . 6)) => (2 . 7)
(new-element :name :H2 :attributes () :seed (2 . 7)) => (3 . 8)
(finish-element :name :H2 :attributes () :parent-seed (2 . 7) :seed (3 . 8)) => (2 . 9)
(finish-element :name :H1 :attributes () :parent-seed (1 . 6) :seed (2 . 9)) => (1 . 10)
(finish-element :name :FOO :attributes ((:Y . "20") (:X . "10")) :parent-seed (0 . 0) :seed (1 . 10)) => (0 . 11)
(0 . 11)
S-XML 32 > (with-input-from-string (in "<FOO><UL><LI>1</LI><LI>2</LI><LI>3</LI></UL></FOO>") (trace-xml in))
(new-element :name :FOO :attributes () :seed (0 . 0)) => (1 . 1)
(new-element :name :UL :attributes () :seed (1 . 1)) => (2 . 2)
(new-element :name :LI :attributes () :seed (2 . 2)) => (3 . 3)
(text :string "1" :seed (3 . 3)) => (3 . 4)
(finish-element :name :LI :attributes () :parent-seed (2 . 2) :seed (3 . 4)) => (2 . 5)
(new-element :name :LI :attributes () :seed (2 . 5)) => (3 . 6)
(text :string "2" :seed (3 . 6)) => (3 . 7)
(finish-element :name :LI :attributes () :parent-seed (2 . 5) :seed (3 . 7)) => (2 . 8)
(new-element :name :LI :attributes () :seed (2 . 8)) => (3 . 9)
(text :string "3" :seed (3 . 9)) => (3 . 10)
(finish-element :name :LI :attributes () :parent-seed (2 . 8) :seed (3 . 10)) => (2 . 11)
(finish-element :name :UL :attributes () :parent-seed (1 . 1) :seed (2 . 11)) => (1 . 12)
(finish-element :name :FOO :attributes () :parent-seed (0 . 0) :seed (1 . 12)) => (0 . 13)
(0 . 13)
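The way the seed travels between hooks in the traces above can be sketched without the parser. The driver below is hypothetical and not part of S-XML: walk-with-hooks and demo-walk are made-up names, and the input is a nested list of the form (name attributes child*) rather than XML text. It only assumes the calling order visible in the trace: the seed returned by the new-element hook goes to the first child, each child receives the seed returned by its previous sibling, and the finish-element hook receives both the seed from before the element and the seed left by the last child.

```lisp
;; Hypothetical driver, not S-XML code: it walks a nested list instead of
;; parsing XML, but calls the hooks in the order shown in the trace.
(defun walk-with-hooks (node seed new-element-hook finish-element-hook text-hook)
  "NODE is either a string (text) or a list (name attributes child*)."
  (if (stringp node)
      (funcall text-hook node seed)
      (destructuring-bind (name attributes &rest children) node
        ;; The seed returned by the new-element hook is handed to the first child.
        (let ((child-seed (funcall new-element-hook name attributes seed)))
          ;; Each child receives the seed returned by its previous sibling.
          (dolist (child children)
            (setf child-seed
                  (walk-with-hooks child child-seed
                                   new-element-hook finish-element-hook text-hook)))
          ;; The finish hook gets the seed from before the element (parent-seed)
          ;; and the seed left behind by the last child.
          (funcall finish-element-hook name attributes seed child-seed)))))

(defun demo-walk ()
  "Run the tracer's seed arithmetic, without logging, over a tree shaped like the first traced document."
  (walk-with-hooks
   '(:foo ((:y . "20") (:x . "10")) (:p () "Text") (:bar ()) (:h1 () (:h2 ())))
   (cons 0 0)
   (lambda (name attributes seed)
     (declare (ignore name attributes))
     (cons (1+ (car seed)) (1+ (cdr seed))))
   (lambda (name attributes parent-seed seed)
     (declare (ignore name attributes parent-seed))
     (cons (1- (car seed)) (1+ (cdr seed))))
   (lambda (string seed)
     (declare (ignore string))
     (cons (car seed) (1+ (cdr seed))))))
```

Calling (demo-walk) ends with the seed (0 . 11), the same final value as the first trace: the nesting level is back at 0 after eleven hook calls.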
The following example counts tags, attributes and characters:
(defclass count-xml-seed ()
  ((elements :initform 0)
   (attributes :initform 0)
   (characters :initform 0)))

(defun count-xml-new-element-hook (name attributes seed)
  (declare (ignore name))
  (incf (slot-value seed 'elements))
  (incf (slot-value seed 'attributes) (length attributes))
  seed)

(defun count-xml-text-hook (string seed)
  (incf (slot-value seed 'characters) (length string))
  seed)

(defun count-xml (in)
  "Parse a toplevel XML element from stream in, counting elements, attributes and characters"
  (start-parse-xml in
                   (make-instance 'xml-parser-state
                                  :seed (make-instance 'count-xml-seed)
                                  :new-element-hook #'count-xml-new-element-hook
                                  :text-hook #'count-xml-text-hook)))

(defun count-xml-file (pathname)
  "Parse XML from the file at pathname, counting elements, attributes and characters"
  (with-open-file (in pathname)
    (let ((result (count-xml in)))
      (with-slots (elements attributes characters) result
        (format t
                "~a contains ~d XML elements, ~d attributes and ~d characters.~%"
                pathname elements attributes characters)))))
This example removes XML markup:
(defun remove-xml-markup (in)
  (let* ((state (make-instance 'xml-parser-state
                               :text-hook #'(lambda (string seed) (cons string seed))))
         (result (start-parse-xml in state)))
    (apply #'concatenate 'string (nreverse result))))
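One design note on the last line of this example: apply passes every collected text fragment as a separate argument, and the number of arguments a function call may take is bounded by the implementation-dependent constant call-arguments-limit (the standard only guarantees at least 50). For documents with many text fragments, the strings can instead be written to a string stream. The following is a minimal sketch of that alternative; join-strings is a hypothetical helper and not part of S-XML.

```lisp
;; Hypothetical helper, not part of S-XML: joins a list of strings without
;; passing them as individual arguments to a single function call.
(defun join-strings (strings)
  "Concatenate the strings in the list STRINGS, in order, into one new string."
  (with-output-to-string (out)
    (dolist (piece strings)
      (write-string piece out))))
```

In remove-xml-markup, (join-strings (nreverse result)) could then take the place of the apply form.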
The next example is from the xml-element struct DOM implementation, where the SSAX parser hook functions are building the actual DOM:
(defun standard-new-element-hook (name attributes seed)
  (declare (ignore name attributes seed))
  '())

(defun standard-finish-element-hook (name attributes parent-seed seed)
  (let ((xml-element (make-xml-element :name name
                                       :attributes attributes
                                       :children (nreverse seed))))
    (cons xml-element parent-seed)))

(defun standard-text-hook (string seed)
  (cons string seed))

(defmethod parse-xml-dom (stream (output-type (eql :xml-struct)))
  (car (start-parse-xml stream
                        (make-instance 'xml-parser-state
                                       :new-element-hook #'standard-new-element-hook
                                       :finish-element-hook #'standard-finish-element-hook
                                       :text-hook #'standard-text-hook))))
The parser state can be used to specify the initial seed value (nil by default) and the set of known entities (by default the 5 standard entities lt, gt, amp, quot and apos, plus nbsp).
Using a DOM parser is easier, but usually less efficient. Currently three different DOMs are supported: :lxml, :sxml and :xml-struct.
There is a generic API that is identical for each type of DOM, with an extra parameter input-type or output-type used to specify the type of DOM. The default DOM type is :lxml. Here are some examples:
? (in-package :s-xml)
#<Package "S-XML">
? (setf xml-string "<foo id='top'><bar>text</bar></foo>")
"<foo id='top'><bar>text</bar></foo>"
? (parse-xml-string xml-string)
((:|foo| :|id| "top") (:|bar| "text"))
? (parse-xml-string xml-string :output-type :sxml)
(:|foo| (:@ (:|id| "top")) (:|bar| "text"))
? (parse-xml-string xml-string :output-type :xml-struct)
#S(XML-ELEMENT :NAME :|foo| :ATTRIBUTES ((:|id| . "top"))
:CHILDREN (#S(XML-ELEMENT :NAME :|bar|
:ATTRIBUTES NIL
:CHILDREN ("text"))))
? (print-xml * :pretty t :input-type :xml-struct)
<foo id="top">
<bar>text</bar>
</foo>
NIL
? (print-xml '(p "Interesting stuff at " ((a href "http://slashdot.org") "SlashDot")))
<P>Interesting stuff at <A HREF="http://slashdot.org">SlashDot</A></P>
NIL
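Because the default :lxml DOM is made of plain lists, ordinary list functions are enough to take it apart. The sketch below works on the literal value printed by parse-xml-string above and does not call the parser. The lxml-sketch-* names are hypothetical helpers, not part of the S-XML API, and treating a bare keyword as an element with no attributes and no children is an assumption that the transcript does not show.

```lisp
;; Hypothetical helpers, not part of S-XML. They assume an LXML element is
;; one of:
;;   ((tag attr-name attr-value ...) child ...)   ; element with attributes
;;   (tag child ...)                              ; element without attributes
;;   tag                                          ; assumed: bare element
(defun lxml-sketch-tag (node)
  "Return the tag keyword of NODE."
  (cond ((symbolp node) node)
        ((symbolp (first node)) (first node))
        (t (first (first node)))))

(defun lxml-sketch-attributes (node)
  "Return the attributes of NODE as a property list, or NIL if there are none."
  (if (and (consp node) (consp (first node)))
      (rest (first node))
      '()))

(defun lxml-sketch-children (node)
  "Return the list of child elements and strings of NODE."
  (if (consp node)
      (rest node)
      '()))

(defun demo-lxml-access ()
  "Take apart the LXML value from the transcript above."
  (let* ((root '((:|foo| :|id| "top") (:|bar| "text")))
         (child (first (lxml-sketch-children root))))
    (list (lxml-sketch-tag root)
          (getf (lxml-sketch-attributes root) :|id|)
          (lxml-sketch-tag child)
          (lxml-sketch-children child))))
```

Here (demo-lxml-access) collects the root tag, the value of its id attribute, the tag of the first child and that child's own children.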
Tag and attribute names are converted to keywords. Because XML is case-sensitive, the original case of each name is preserved, which is why lowercase names are printed using Common Lisp's escaped symbol syntax, as in :|foo|.
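The effect can be reproduced with standard Common Lisp alone. With the standard readtable the reader upcases unescaped symbol names, so :foo names the symbol "FOO", while a keyword interned from the lowercase string "foo" is a different symbol that has to be printed with vertical bars. The sketch below illustrates this; name-to-keyword is a hypothetical helper, and S-XML's own interning code is not shown here.

```lisp
;; Hypothetical helper, not S-XML code: intern NAME as a keyword, keeping its case.
(defun name-to-keyword (name)
  (intern name :keyword))

(defun demo-keyword-case ()
  (list (symbol-name :foo)                    ; the reader upcased the name
        (eq (name-to-keyword "FOO") :foo)     ; uppercase string: same symbol
        (eq (name-to-keyword "foo") :foo)     ; lowercase string: another symbol
        (eq (name-to-keyword "foo") :|foo|)   ; vertical bars keep the case
        ;; Print with the standard printer settings to show the escaped form.
        (with-standard-io-syntax
          (prin1-to-string (name-to-keyword "foo")))))
```

Under the standard reader and printer settings, (demo-keyword-case) returns ("FOO" T NIL T ":|foo|").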
2005-02-03 Sven Van Caekenberghe <svc@mac.com>
* release 5 (cvs tag RELEASE_5)
* added :start and :end keywords to print-string-xml
* fixed a bug: in a tag containing whitespace, like <foo> </foo>, the parser collapsed
and ignored all whitespace and considered the tag to be empty;
this is now fixed and a unit test has been added
* cleaned up xml character escaping a bit: single quotes and all normal whitespace
(newline, return and tab) are preserved; a unit test for this has been added
* IE doesn't understand the &apos; XML entity, so I've commented that out for now.
Also, using actual newlines for newlines is probably better than using #xA,
which won't get any end of line conversion by the server or user agent.
June 2004 Sven Van Caekenberghe <svc@mac.com>
* release 4
* project moved to common-lisp.net, renamed to s-xml
* added examples counter, tracer and remove-markup, improved documentation
13 Jan 2004 Sven Van Caekenberghe <svc@mac.com>
* release 3
* added ASDF systems
* optimized print-string-xml
10 Jun 2003 Sven Van Caekenberghe <svc@mac.com>
* release 2
* added echo-xml function: we are no longer taking the car when
the last seed is returned from start-parse-xml
25 May 2003 Sven Van Caekenberghe <svc@mac.com>
* release 1
* first public release of working code
* tested on OpenMCL
* rewritten to be event-based, to improve efficiency and
to optionally use different DOM representations
* more documentation
end of 2002 Sven Van Caekenberghe <svc@mac.com>
* release 0
* as part of an XML-RPC implementation
CVS version $Id: index.html,v 1.10 2005/02/03 08:36:05 scaekenberghe Exp $