module Xml_lexer:sig..end
This module provides an ocamllex lexer for XML files. It only
supports the most basic features of the XML specification.
The lexer ignores the following 'events' altogether: comments, processing instructions, the XML prolog and the doctype declaration.
The predefined entities (&amp;, &lt;, etc.) are supported. The
replacement text for other entities whose entity value consists of
character data can be provided to the lexer (see
Xml_lexer.entities). Internal entity declarations are not
taken into account (the lexer simply skips the doctype declaration).
CDATA sections and character references are supported.
See Xml_lexer.strip_ws for whitespace handling.
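The entity handling described above can be illustrated with a small resolver for the text between `&` and `;`. This is a hypothetical sketch, not Xml_lexer's implementation: `resolve` and `parse_digits` are illustrative names, the `user` argument stands in for the pairs stored in `Xml_lexer.entities`, resolved character references are assumed to be encoded as UTF-8, and checking predefined entities before user-supplied ones is an assumption.

```ocaml
(* Hypothetical helper, not Xml_lexer's code: resolves the text between
   '&' and ';' of an entity or character reference. *)

(* The five predefined XML entities. *)
let predefined =
  [ ("amp", "&"); ("lt", "<"); ("gt", ">"); ("quot", "\""); ("apos", "'") ]

(* Assumption: resolved character references are encoded as UTF-8. *)
let utf8_of_code_point cp =
  let b = Buffer.create 4 in
  Buffer.add_utf_8_uchar b (Uchar.of_int cp);
  Buffer.contents b

(* Parse a non-empty string of digits in base 10 or 16; None otherwise. *)
let parse_digits base s =
  let value c =
    match c with
    | '0' .. '9' -> Some (Char.code c - Char.code '0')
    | 'a' .. 'f' when base = 16 -> Some (Char.code c - Char.code 'a' + 10)
    | 'A' .. 'F' when base = 16 -> Some (Char.code c - Char.code 'A' + 10)
    | _ -> None
  in
  let acc = ref (Some 0) in
  String.iter
    (fun c ->
      match (!acc, value c) with
      (* Give up once the value exceeds the Unicode range (avoids overflow). *)
      | Some n, Some d when n <= 0x10FFFF -> acc := Some ((n * base) + d)
      | _ -> acc := None)
    s;
  if s = "" then None else !acc

(* [user] plays the role of the (name, replacement) pairs in
   Xml_lexer.entities; predefined entities are checked first (assumption). *)
let resolve ~user name =
  let len = String.length name in
  if len > 0 && name.[0] = '#' then begin
    (* Character reference: &#NNN; (decimal) or &#xHHHH; (hexadecimal). *)
    let code =
      if len > 1 && name.[1] = 'x' then
        parse_digits 16 (String.sub name 2 (len - 2))
      else parse_digits 10 (String.sub name 1 (len - 1))
    in
    match code with
    | Some cp when Uchar.is_valid cp -> Some (utf8_of_code_point cp)
    | _ -> None
  end
  else
    match List.assoc_opt name predefined with
    | Some s -> Some s
    | None -> List.assoc_opt name user
```

For example, `resolve ~user:[] "#x41"` and `resolve ~user:[] "#65"` both yield `Some "A"`, while an entity that is neither predefined nor listed in `user` yields `None`, which a lexer would typically report as a bad entity.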
type error =
  | Illegal_character of ...
  | Bad_entity of ...
  | Unterminated of ...
  | Tag_expected
  | Attribute_expected
  | Other of ...
val error_string : error -> string
exception Error of error * int
The int argument indicates the character position in
the buffer. Note that some non-conforming XML documents might not
trigger an error.
type token =
  | Tag of ...
      (* Tag (name, attributes, empty) denotes an opening tag
         with the specified name and attributes. If empty,
         then the tag ended in "/>", meaning that it has no
         sub-elements. *)
  | Chars of ...
      (* Some text between the tags *)
  | Endtag of ...
      (* A closing tag *)
  | EOF
      (* End of input *)
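To show how the four token constructors fit together, here is a sketch that rebuilds an element tree from a list of tokens. It assumes payload types (`Tag of string * (string * string) list * bool`, `Chars of string`, `Endtag of string`) that the declaration above does not show; `tree`, `parse` and `Mismatch` are hypothetical names. A `Tag` whose empty flag is true has no children and no matching `Endtag`, while any other `Tag` must be closed by an `Endtag` carrying the same name.

```ocaml
(* Local mirror of Xml_lexer.token; the payload types are assumptions. *)
type token =
  | Tag of string * (string * string) list * bool
  | Chars of string
  | Endtag of string
  | EOF

type tree =
  | Element of string * (string * string) list * tree list
  | Text of string

(* Raised with the name of an opening tag that is not properly closed. *)
exception Mismatch of string

(* Collect sibling nodes until an Endtag, EOF or the end of the list;
   return them in document order together with the remaining tokens. *)
let rec parse_nodes acc tokens =
  match tokens with
  | Chars s :: rest -> parse_nodes (Text s :: acc) rest
  | Tag (name, attrs, true) :: rest ->
      (* Empty tag "<name .../>": no children, no matching Endtag. *)
      parse_nodes (Element (name, attrs, []) :: acc) rest
  | Tag (name, attrs, false) :: rest -> (
      let children, rest = parse_nodes [] rest in
      match rest with
      | Endtag n :: rest when n = name ->
          parse_nodes (Element (name, attrs, children) :: acc) rest
      | _ -> raise (Mismatch name))
  | (Endtag _ :: _ | EOF :: _ | []) as rest -> (List.rev acc, rest)

let parse tokens =
  match parse_nodes [] tokens with
  | nodes, (EOF :: _ | []) -> nodes
  | _, Endtag n :: _ -> raise (Mismatch n) (* stray closing tag *)
  | _, (Tag _ | Chars _) :: _ ->
      assert false (* unreachable: parse_nodes never stops on these *)
```

Keeping the parser over a plain list makes it easy to test without a lexbuf; with the real lexer one would instead pull tokens one at a time from `Xml_lexer.token` until `EOF`.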
val strip_ws : bool Pervasives.ref
If strip_ws is true (the default),
whitespace next to a tag is ignored. Character data consisting
only of whitespace is thus suppressed (i.e. Chars "" tokens are
skipped).
val entities : (string * string) list Pervasives.ref
Replacement text for entities, as (name, replacement text) pairs (for instance ["amp", "&"; "lt", "<" ...]).
val token : Lexing.lexbuf -> token
Raises Error in case of an invalid XML document.
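The effect documented for strip_ws can be mimicked as a filter over an already-lexed token list. This is only an illustration of the documented behaviour, not how the lexer implements it: the token type is a local mirror with assumed payload types, `strip_ws_filter` is a hypothetical name, and `String.trim`'s notion of whitespace (which also includes form feed) only approximates XML's.

```ocaml
(* Local mirror of Xml_lexer.token; the payload types are assumptions. *)
type token =
  | Tag of string * (string * string) list * bool
  | Chars of string
  | Endtag of string
  | EOF

(* Emulates strip_ws = true on a token list: character data sits between
   tags, so whitespace at both ends of a Chars token is trimmed, and a
   token left empty (whitespace-only data) is dropped altogether. *)
let strip_ws_filter tokens =
  List.filter_map
    (function
      | Chars s -> (
          match String.trim s with "" -> None | s' -> Some (Chars s'))
      | tok -> Some tok)
    tokens
```

Whitespace inside the character data is kept, so `Chars "  hello world \n"` becomes `Chars "hello world"`, while a run of indentation between two tags disappears entirely.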