This document has been reviewed by W3C Members and other interested parties and has been endorsed by the Director as a W3C Recommendation. It is a stable document and may be used as reference material or cited as a normative reference from another document. W3C's role in making the Recommendation is to draw attention to the specification and to promote its widespread deployment. This enhances the functionality and interoperability of the Web.
This document has been produced by the
This version of this document incorporates some editorial changes from earlier versions.
Please report errors in this document to
The English version of this specification is the only normative
version. Information about translations of this document is available
at
A list of current W3C Recommendations and other technical documents can be found at
The
The table below offers two typical examples of XML instances in which datatypes are implicit: the instance on the left represents a billing invoice, the instance on the right a memo or perhaps an email message in XML.
Data oriented | Document oriented |
---|---|
|
|
The invoice contains several dates and telephone numbers, the postal abbreviation for a state (which comes from an enumerated list of sanctioned values), and a ZIP code (which takes a definable regular form). The memo contains many of the same types of information: a date, telephone number, email address and an "importance" value (from an enumerated list, such as "low", "medium" or "high"). Applications which process invoices and memos need to raise exceptions if something that was supposed to be a date or telephone number does not conform to the rules for valid dates or telephone numbers.
In both cases, validity constraints exist on the content of the instances that are not expressible in XML DTDs. The limited datatyping facilities in XML have prevented validating XML processors from supplying the rigorous type checking required in these situations. The result has been that individual applications writers have had to implement type checking in an ad hoc manner. This specification addresses the need of both document authors and applications writers for a robust, extensible datatype system for XML which could be incorporated into XML processors. As discussed below, these datatypes could be used in other XML-related standards as well.
The
provide for primitive data typing, including byte, date, integer, sequence, SQL and Java primitive datatypes, etc.;
define a type system that is adequate for import/export from database systems (e.g., relational, object, OLAP);
distinguish requirements relating to lexical data representation vs. those governing an underlying information set;
allow creation of user-defined datatypes, such as datatypes that are derived from existing datatypes and which may constrain certain of their properties (e.g., range, precision, length, format).
This portion of the XML Schema Language discusses datatypes that can be
used in an XML Schema. These datatypes can be specified for element
content that would be specified as
The terminology used to describe XML Schema Datatypes is defined in the body of this specification. The terms defined in the following list are used in building those definitions and in describing the actions of a datatype processor:
A feature of this specification included solely to ensure that schemas
which use this feature remain compatible with
Conforming documents and processors are permitted to but need not behave as described.
(Of strings or names:) Two strings or names being compared must be identical. Characters with multiple possible representations in ISO/IEC 10646 (e.g. characters with both precomposed and base+diacritic forms) match only if they have the same representation in both strings. No case folding is performed. (Of strings and rules in the grammar:) A string matches a grammatical production if it belongs to the language generated by that production.
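As a small illustration of the string-matching rule above, the following Python sketch compares strings code point by code point. The helper name `strings_match` is hypothetical and is not defined by this specification.

```python
import unicodedata


def strings_match(a: str, b: str) -> bool:
    """Match in the sense described above: identical code point sequences,
    with no case folding and no Unicode normalization applied."""
    return a == b


precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# Same abstract character, different representations: not a match.
print(strings_match(precomposed, decomposed))
# No case folding is performed: not a match.
print(strings_match("Name", "name"))
# Normalizing first would make the two forms compare equal;
# the matching rule above does not perform that step.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```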
Conforming documents and processors are required to behave as
described; otherwise they are in
A violation of the rules of this specification; results are undefined.
Conforming software
This specification provides three different kinds of normative statements about schema components, their representations in XML and their contribution to the schema-validation of information items:
Constraints on the schema components themselves, i.e. conditions
components
Constraints on the representation of schema components in XML. Some but
not all of these are expressed in
Constraints expressed by schema components which information
items
This section describes the conceptual framework behind the type system
defined in this specification. The framework has been influenced by the
The datatypes discussed in this specification are computer
representations of well known abstract concepts such as
The
defined axiomatically from fundamental notions (intensional definition)
[see
enumerated outright (extensional definition)
[see
defined by restricting the
defined as a combination of values from one or more already defined
In addition to its
For example, "100" and "1.0E2" are two different literals from the
The literals in the
The number of literals for each value has been kept small; for many datatypes there is a one-to-one mapping between literals and values. This makes it easy to exchange the values between different systems. In many cases, conversion from locale-dependent representations will be required on both the originator and the recipient side, both for computer processing and for interaction with humans.
Textual, rather than binary, literals are used. This makes hand editing, debugging, and similar activities possible.
Where possible, literals correspond to those found in common programming languages and libraries.
While the datatypes defined in this specification have, for the most part,
a single lexical representation i.e. each value in the datatype's
The facets of a datatype serve to distinguish those aspects of
one datatype which
Facets are of two types:
All
Constraining the
All
It is useful to categorize the datatypes defined in this specification along various dimensions, forming a set of characterization dichotomies.
The first distinction to be made is that between
For example, a single token which
Several type systems (such as the one described in
A
In the above example, the value of the
When a datatype is
For each of
The
The
A prototypical example of a
Any number (greater than 1) of
The order in which the
For example, given the definition below, the first instance of the <size> element
validates correctly as an
The
A datatype which is
Next, we distinguish between
For example, in this specification,
The datatypes defined by this specification fall into both
the
In the example above,
A datatype which is
As described in more detail in
A
One datatype can be
Conceptually there is no difference between the
A datatype which is
Each built-in datatype in this specification (both
the base URI is the URI of the XML Schema namespace
the fragment identifier is the name of the datatype
For example, to address the
http://www.w3.org/2001/XMLSchema#int
Additionally, each facet definition element can be uniquely addressed via a URI constructed as follows:
the base URI is the URI of the XML Schema namespace
the fragment identifier is the name of the facet
For example, to address the maxInclusive facet, the URI is:
http://www.w3.org/2001/XMLSchema#maxInclusive
Additionally, each facet usage in a built-in datatype definition can be uniquely addressed via a URI constructed as follows:
the base URI is the URI of the XML Schema namespace
the fragment identifier is the name of the datatype, followed by a period (".") followed by the name of the facet
For example, to address the usage of the maxInclusive facet in the definition of int, the URI is:
http://www.w3.org/2001/XMLSchema#int.maxInclusive
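The three URI construction rules above can be sketched in Python as follows. The function names are hypothetical; only the namespace URI and the fragment-identifier patterns come from the text.

```python
XSD_NAMESPACE = "http://www.w3.org/2001/XMLSchema"


def datatype_uri(datatype: str) -> str:
    """Base URI plus the datatype name as the fragment identifier."""
    return f"{XSD_NAMESPACE}#{datatype}"


def facet_uri(facet: str) -> str:
    """Base URI plus the facet name as the fragment identifier."""
    return f"{XSD_NAMESPACE}#{facet}"


def facet_usage_uri(datatype: str, facet: str) -> str:
    """Base URI plus 'datatype.facet' as the fragment identifier."""
    return f"{XSD_NAMESPACE}#{datatype}.{facet}"


print(datatype_uri("int"))
print(facet_uri("maxInclusive"))
print(facet_usage_uri("int", "maxInclusive"))
```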
The
http://www.w3.org/2001/XMLSchema
To facilitate usage in specifications other than the XML Schema definition language,
such as those that do not want to know anything about aspects of the
XML Schema definition language other than the datatypes, each
http://www.w3.org/2001/XMLSchema-datatypes
This applies to both
Each
The
Many human languages have writing systems that require
child elements for control of aspects such as bidirectional formatting or
ruby annotation (see
As noted in
An instance of a datatype that is defined as
The canonical representation for
All
-1.23, 12678967.543233, +100000.00, 210
.
The canonical representation for
A literal in the
The 0, -0, INF, -INF and NaN, respectively.
For example, -1E4, 1267.43233E12, 12.78e-2, 12 and INF
are all legal literals for
The canonical representation for
A literal in the
The 0, -0, INF, -INF and NaN, respectively.
For example, -1E4, 1267.43233E12, 12.78e-2, 12 and INF
are all legal literals for
The canonical representation for
The lexical representation for
The values of the
Year, Month, Day, Hour and Minutes components are not restricted but
allow an arbitrary integer. Similarly, the value of the Seconds component
allows an arbitrary decimal. Thus, the lexical representation of
An optional preceding minus sign ('-') is
allowed, to indicate a negative duration. If the sign is omitted a
positive duration is indicated. See also
For example, to indicate a duration of 1 year, 2 months, 3 days, 10
hours, and 30 minutes, one would write: P1Y2M3DT10H30M.
One could also indicate a duration of minus 120 days as: -P120D.
Reduced precision and truncated representations of this format are allowed provided they conform to the following:
If the number of years, months, days, hours, minutes, or seconds in any
expression equals zero, the number and its corresponding designator
The seconds part
The designator 'T' shall be absent if all of the time items are absent. The designator 'P' must always be present.
For example, P1347Y, P1347M and P1Y2MT2H are all allowed; P0Y1347M and P0Y1347M0D are allowed. P-1347M is not allowed although -P1347M is allowed. P1Y2MT is not allowed.
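The lexical rules and examples above can be checked with a regular expression. The following Python sketch is one possible encoding, not a normative grammar; the names `DURATION_RE` and `is_duration_literal` are hypothetical.

```python
import re

DURATION_RE = re.compile(
    r"-?P(?=\d|T\d)"                 # optional sign, mandatory 'P', at least one item follows
    r"(?:\d+Y)?(?:\d+M)?(?:\d+D)?"   # date items, each optional, in fixed order
    r"(?:T(?=\d)"                    # 'T' only when at least one time item follows
    r"(?:\d+H)?(?:\d+M)?(?:\d+(?:\.\d+)?S)?)?"
)


def is_duration_literal(text: str) -> bool:
    """True if the whole string has the shape of a duration literal."""
    return DURATION_RE.fullmatch(text) is not None


for literal in ["P1Y2M3DT10H30M", "-P120D", "P0Y1347M0D", "P-1347M", "P1Y2MT"]:
    print(literal, is_duration_literal(literal))
```

The two lookaheads carry the constraints stated in prose: 'P' must be followed by at least one item, and 'T' must be absent when all time items are absent.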
In general, the 1696-09-01T00:00:00Z 1697-02-01T00:00:00Z 1903-03-01T00:00:00Z 1903-07-01T00:00:00Z
The following table shows the strongest relationship that can be determined
between example durations. The symbol <> means that the order relation is
indeterminate. Note that because of leap-seconds, a seconds field can vary
from 59 to 60. However, because of the way that addition is defined in
Relation | | | | | | |
---|---|---|---|---|---|---|
P1Y | > P364D | <> P365D | <> P366D | < P367D | | |
P1M | > P27D | <> P28D | <> P29D | <> P30D | <> P31D | < P32D |
P5M | > P149D | <> P150D | <> P151D | <> P152D | <> P153D | < P154D |
Implementations are free to optimize the computation of the ordering relationship. For example, the following table can be used to compare durations of a small number of months against days.
Months | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | ... |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Days | Minimum | 28 | 59 | 89 | 120 | 150 | 181 | 212 | 242 | 273 | 303 | 334 | 365 | 393 | ... |
 | Maximum | 31 | 62 | 92 | 123 | 153 | 184 | 215 | 245 | 276 | 306 | 337 | 366 | 397 | ... |
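The minimum and maximum values in the table above can be reproduced by brute force. The sketch below is an illustration only, under the assumption that the bounds are taken over every possible starting month of the Gregorian calendar; it enumerates one full 400-year cycle. The function names are hypothetical.

```python
from datetime import date
from typing import Tuple


def first_of_month(month_index: int) -> date:
    """Map a running month count (year * 12 + zero-based month) to a date."""
    return date(month_index // 12, month_index % 12 + 1, 1)


def day_bounds(months: int) -> Tuple[int, int]:
    """Smallest and largest number of days spanned by `months` whole months."""
    starts = range(2000 * 12, 2400 * 12)  # every month of one 400-year cycle
    spans = [
        (first_of_month(s + months) - first_of_month(s)).days for s in starts
    ]
    return min(spans), max(spans)


for m in range(1, 14):
    print(m, day_bounds(m))
```

A processor could precompute such bounds once and use them to decide comparisons between a months-only duration and a days-only duration without any date arithmetic at validation time.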
In comparing
Certain derived datatypes of durations can be guaranteed to have a total order. For this, they must have fields from only one row in the list below and the time zone must either be required or prohibited.
year, month
day, hour, minute, second
For example, a datatype could be defined to correspond to the
A single lexical representation, which is a subset of the lexical
representations allowed by
The CCYY field must have at least four digits, the MM, DD, SS, hh, mm and ss fields exactly two digits each (not counting fractional seconds); leading zeroes must be used if the field would otherwise have too few digits.
This representation may be immediately followed by a "Z" to indicate
Coordinated Universal Time (UTC) or, to indicate the time zone, i.e. the
difference between the local time and Coordinated Universal Time,
immediately followed by a sign,
+ or -, followed by the difference from UTC represented as hh:mm (note:
the minutes part is required).
See
For example, to indicate 1:20 pm on May the 31st, 1999 for Eastern
Standard Time which is 5 hours behind Coordinated Universal Time (UTC), one
would write: 1999-05-31T13:20:00-05:00.
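As a rough illustration of the lexical shape described above, the following Python sketch checks a literal against a regular expression and converts the example to UTC. The pattern checks only the shape of the literal (digit counts and separators), not field ranges, and the names `DATETIME_RE` and `looks_like_datetime` are hypothetical.

```python
import re
from datetime import datetime, timezone

DATETIME_RE = re.compile(
    r"-?\d{4,}-\d{2}-\d{2}"      # CCYY (four or more digits), MM, DD
    r"T\d{2}:\d{2}:\d{2}"        # hh:mm:ss, two digits each
    r"(?:\.\d+)?"                # optional fractional seconds
    r"(?:Z|[+-]\d{2}:\d{2})?"    # optional 'Z' or signed hh:mm offset
)


def looks_like_datetime(text: str) -> bool:
    """True if the whole string has the lexical shape of a dateTime."""
    return DATETIME_RE.fullmatch(text) is not None


literal = "1999-05-31T13:20:00-05:00"
print(looks_like_datetime(literal))

# Five hours behind UTC means the same instant is 18:20 in UTC.
as_utc = datetime.fromisoformat(literal).astimezone(timezone.utc)
print(as_utc.isoformat())
```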
The canonical representation for
In general, the
The following definition uses the notation S[year] to represent the year
field of S, S[month] to represent the month field, and so on. The notation (Q
& "-14:00") means adding the timezone -14:00 to Q, where Q did not
already have a timezone.
The ordering between two
A. Normalize P and Q. That is, if there is a timezone present, but
it is not Z, convert it to Z using the addition operation defined in
Thus 2000-03-04T23:00:00+03:00 normalizes to 2000-03-04T20:00:00Z
B. If P and Q either both have a time zone or both do not have a time zone, compare P and Q field by field from the year field down to the second field, and return a result as soon as it can be determined. That is:
For each i in {year, month, day, hour, minute, second}
  If P[i] and Q[i] are both not specified, continue to the next i
  If P[i] is not specified and Q[i] is, or vice versa, stop and return P <> Q
  If P[i] < Q[i], stop and return P < Q
  If P[i] > Q[i], stop and return P > Q
Stop and return P = Q
C. Otherwise, if P contains a time zone and Q does not, compare as follows:
P < Q if P < (Q with time zone +14:00)
P > Q if P > (Q with time zone -14:00)
P <> Q otherwise, that is, if (Q with time zone +14:00) < P < (Q with time zone -14:00)
D. Otherwise, if P does not contain a time zone and Q does, compare as follows:
P < Q if (P with time zone -14:00) < Q.
P > Q if (P with time zone +14:00) > Q.
P <> Q otherwise, that is, if (P with time zone +14:00) < Q < (P with time zone -14:00)
Examples:
Determinate | Indeterminate |
---|---|
2000-01-15T00:00:00 < 2000-02-15T00:00:00 | 2000-01-01T12:00:00 <> 1999-12-31T23:00:00Z |
2000-01-15T12:00:00 < 2000-01-16T12:00:00Z | 2000-01-16T12:00:00 <> 2000-01-16T12:00:00Z |
 | 2000-01-16T00:00:00 <> 2000-01-16T12:00:00Z |
Certain derived types from
Since the lexical representation allows an optional time zone
indicator,
The lexical representation for
The canonical representation for
Since the lexical representation allows an optional time zone
indicator,
The lexical representation for
For example, to indicate May the 31st, 1999, one would write: 1999-05-31.
See also
Since the lexical representation allows an optional time zone
indicator,
Because month/year combinations in one calendar only rarely correspond to month/year combinations in other calendars, values of this type are not, in general, convertible to simple values corresponding to month/year combinations in other calendars. This type should therefore be used with caution in contexts where conversion to other calendars is desired.
The lexical representation for
For example, to indicate the month of May 1999, one would write: 1999-05.
See also
Since the lexical representation allows an optional time zone
indicator,
Because years in one calendar only rarely correspond to years in other calendars, values of this type are not, in general, convertible to simple values corresponding to years in other calendars. This type should therefore be used with caution in contexts where conversion to other calendars is desired.
The lexical representation for
For example, to indicate 1999, one would write: 1999.
See also
Since the lexical representation allows an optional time zone
indicator,
Because day/month combinations in one calendar only rarely correspond to day/month combinations in other calendars, values of this type do not, in general, have any straightforward or intuitive representation in terms of most other calendars. This type should therefore be used with caution in contexts where conversion to other calendars is desired.
The lexical representation for
This datatype can be used to represent a specific day in a month, for example to say that my birthday occurs on the 14th of September every year.
This datatype can be used to represent a specific day of the month, for example to say that I get my paycheck on the 15th of each month.
Since the lexical representation allows an optional time zone
indicator,
Because days in one calendar only rarely correspond to days in other calendars, values of this type do not, in general, have any straightforward or intuitive representation in terms of most other calendars. This type should therefore be used with caution in contexts where conversion to other calendars is desired.
The lexical representation for
This datatype can be used to represent a specific month, for example to say that Thanksgiving falls in the month of November.
Since the lexical representation allows an optional time zone
indicator,
Because months in one calendar only rarely correspond to months in other calendars, values of this type do not, in general, have any straightforward or intuitive representation in terms of most other calendars. This type should therefore be used with caution in contexts where conversion to other calendars is desired.
The lexical representation for
The canonical representation for
The mapping from
Each URI scheme imposes specialized syntax rules for URIs in
that scheme, including restrictions on the syntax of allowed fragment
identifiers. Because it is
impractical for processors to check that a value is a
context-appropriate URI reference, this specification follows the
lead of
The
Spaces are, in principle, allowed in the
The mapping between literals in the
It is an
For compatibility (see
This section gives conceptual definitions for all
For compatibility (see
For compatibility (see
For compatibility (see
For compatibility (see
For compatibility (see
The
For compatibility (see
The
For compatibility (see
The canonical representation for
The canonical representation for
The canonical representation for
The canonical representation for
The canonical representation for
The canonical representation for
The canonical representation for
The canonical representation for
The canonical representation for
The canonical representation for
The canonical representation for
The canonical representation for
The canonical representation for
The following sections provide full details on the properties and
significance of each kind of schema component involved in datatype
definitions. For each property, the kinds of values it is allowed to have is
specified. Any property not identified as optional is required to
be present; optional properties which are not present have
For more information on the notion of datatype (schema) components,
see
Simple Type definitions provide for:
Establishing the
Attaching a unique name (actually a
The Simple Type Definition schema component has the following properties:
Datatypes are identified by their
If
If
The value of
The value of
If
The XML representation for a
name [attribute], if present,
otherwise final [attribute], if present, otherwise of the actual value of the
finalDefault [attribute] the ancestor
the empty set;
a set with members drawn from the set above, each being present or absent depending on whether the string contains an equivalently named space-delimited substring.
Although the finalDefault [attribute] of
targetNamespace [attribute] of the parent schema element information item.
A
base [attribute] or the
An electronic commerce schema might define a datatype called
In this case,
itemType [attribute] or the
A
A system might want to store lists of floating point values.
In this case,
As mentioned in
regardless of the
For each of
memberTypes [attribute], if any, in order, followed by the
A
As an example, taken from a typical display oriented text markup language,
one might want to express font sizes as an integer between 8 and 72, or with
one of the tokens "small", "medium" or "large". The
this is a test
As mentioned in
regardless of the
Unless otherwise specifically allowed by this specification
(
Either the itemType [attribute] or the
Either the base [attribute] or the simpleType [child] of the
Either the memberTypes [attribute] of the simpleType [child].
A value in a
the value is facet-valid with respect to the particular
A string is datatype-valid with respect to a datatype definition if:
it
if
if
if
if
if
the value denoted by the literal
The
If
If
There is a simple type definition nearly equivalent to the simple version
of the
Every
for any
there is no pair
for all
for any
for any
for any
Note that a consequence of the above is that, given
On every datatype, the operation Equal is defined in terms of the equality
property of the
There is no schema component corresponding to the
A
for no
for all
for all
The notation
A
for all
The fact that this specification does not define an
indicating whether an
When
When
When
indicating whether a
When
When
When
It
is sometimes useful to categorize
indicating whether the
When
When
one of
all of the following are true:
one of
one of
either of the following are true:
When
When
indicating whether a
When
When
When
For
For
Constraining a
The following is the definition of a
If
The XML representation for a
value [attribute]
fixed [attribute], if present, otherwise false
A value in a
if the
if
if
if the
It is an
It is an
For
For
Constraining a
The following is the definition of a
If
The XML representation for a
value [attribute]
fixed [attribute], if present, otherwise false
A value in a
if the
if
if
if the
If both
It is an
For
For
Constraining a
The following is the definition of a
If
The XML representation for a
value [attribute]
fixed [attribute], if present, otherwise false
A value in a
if the
if
if
if the
It is an
Constraining a
The following is the definition of a
The XML representation for a
value [attribute]
If multiple
It is a consequence of the schema representation constraint
Thus, to impose two
A literal in a
the literal is among the set of character sequences denoted by
the
Constraining a
The following example is a datatype definition for a
The XML representation for an
value [attribute]
If multiple
A value in a
It is an
No normalization is done, the value is not changed (this is the
behavior required by
All occurrences of #x9 (tab), #xA (line feed) and #xD (carriage return) are replaced with #x20 (space)
After the processing implied by
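The three normalization behaviours listed above can be sketched in Python. The replace step follows the text directly. The collapse step is written on the assumption, based on the usual definition of this facet, that runs of #x20 are reduced to a single #x20 and leading and trailing #x20 characters are removed after replacement. The function name is hypothetical.

```python
import re


def normalize_whitespace(value: str, mode: str) -> str:
    """Apply 'preserve', 'replace' or 'collapse' normalization to a string."""
    if mode == "preserve":
        return value  # no normalization, value unchanged
    if mode not in ("replace", "collapse"):
        raise ValueError(f"unknown whiteSpace value: {mode!r}")
    # replace: tab, line feed and carriage return each become a space
    replaced = re.sub(r"[\t\n\r]", " ", value)
    if mode == "replace":
        return replaced
    # collapse (assumed behaviour): squeeze runs of spaces, then trim both ends
    return re.sub(r" +", " ", replaced).strip(" ")


sample = "  a \t\n b  "
print(repr(normalize_whitespace(sample, "preserve")))
print(repr(normalize_whitespace(sample, "replace")))
print(repr(normalize_whitespace(sample, "collapse")))
```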
The notation #xA used here (and elsewhere in this specification) represents
the Universal Character Set (UCS) code point hexadecimal A
(line feed), which is denoted by
U+000A. This notation is to be distinguished from &#xA;,
which is the XML
collapse
and cannot be changed by a schema author; for
preserve
; for any type collapse
and cannot
be changed by a schema author. For all datatypes
For more information on
Constraining a
The following example is the datatype definition for
the
{preserve, replace, collapse}
.
If
The XML representation for a
value [attribute]
fixed [attribute], if present, otherwise false
There are no
It is an
Constraining a
The following is the definition of a
If
The XML representation for a
value [attribute]
fixed [attribute], if present, otherwise false
A value in an
if the
if the
It is an
It is an
Constraining a
The following is the definition of a
Note that the
If
The XML representation for a
value [attribute]
fixed [attribute], if present, otherwise false
A value in an
if the
if the
It is an
It is an
It is an
Constraining a
The following is the definition of a
Note that the
If
The XML representation for a
value [attribute]
fixed [attribute], if present, otherwise false
A value in an
if the
if the
It is an
It is an
It is an
Constraining a
The following is the definition of a
If
The XML representation for a
value [attribute]
fixed [attribute], if present, otherwise false
A value in an
if the
if the
It is an
It is an
Constraining a
The following is the definition of a
If
The XML representation for a
value [attribute]
fixed [attribute], if present, otherwise false
A value in a
the number of decimal digits in the value is less than or equal to
It is an
Constraining a
The following is the definition of a
If
The XML representation for a
value [attribute]
fixed [attribute], if present, otherwise false
A value in a
the number of decimal digits in the fractional part of the
value is less than or equal to
It is an
This specification describes two levels of conformance for datatype processors. The first is required of all processors. Support for the other will depend on the application environments for which the processor is intended.
By separating the conformance requirements relating to the concrete
syntax of XML schema documents, this specification admits processors
which validate using schemas stored in optimized binary representations,
dynamically created schemas represented as programming language data
structures, or implementations in which particular schemas are compiled
into executable code such as C or Java. Such processors can be said to
be
The following table shows the values of the fundamental facets
for each
The
C -- represents a digit used in the thousands and hundreds components, the "century" component, of the time element "year". Legal values are from 0 to 9.
Y -- represents a digit used in the tens and units components of the time element "year". Legal values are from 0 to 9.
M -- represents a digit used in the time element "month". The two digits in a MM format can have values from 1 to 12.
D -- represents a digit used in the time element "day". The two digits in a DD format can have values from 1 to 28 if the month value equals 2, 1 to 29 if the month value equals 2 and the year is a leap year, 1 to 30 if the month value equals 4, 6, 9 or 11, and 1 to 31 if the month value equals 1, 3, 5, 7, 8, 10 or 12.
h -- represents a digit used in the time element "hour". The two digits in a hh format can have values from 0 to 23.
m -- represents a digit used in the time element "minute". The two digits in a mm format can have values from 0 to 59.
s -- represents a digit used in the time element "second". The two
digits in a ss format can have values from 0 to 60. In the formats
described in this specification the whole number of seconds
Strictly speaking, a value of
60 or more is not sensible unless the month and day could
represent March 31, June 30, September 30, or December 31
For all the information items indicated by the above characters, leading zeros are required where indicated.
In addition to the above, certain characters are used as designators and appear as themselves in lexical formats.
T -- is used as time designator to indicate the start of the
representation of the time of day in
Z -- is used as time-zone designator, immediately (without a space)
following a data element expressing the time of day in Coordinated
Universal Time (UTC) in
In the lexical format for
P -- is used as the time duration designator, preceding a data element representing a given duration of time.
Y -- follows the number of years in a time duration.
M -- follows the number of months or minutes in a time duration.
D -- follows the number of days in a time duration.
H -- follows the number of hours in a time duration.
S -- follows the number of seconds in a time duration.
The values of the
Year, Month, Day, Hour and Minutes components are not restricted but
allow an arbitrary integer. Similarly, the value of the Seconds component
allows an arbitrary decimal. Thus, the lexical format for
An optional minus sign is allowed immediately preceding, without a space,
the lexical representations for
The year "0000" is an illegal year value.
To accommodate year values greater than 9999, more than four digits are
allowed in the year representations of
Given a
fQuotient(a, b) = the greatest integer less than or equal to a/b
fQuotient(-1,3) = -1
fQuotient(0,3)...fQuotient(2,3) = 0
fQuotient(3,3) = 1
fQuotient(3.123,3) = 1
modulo(a, b) = a - fQuotient(a,b)*b
modulo(-1,3) = 2
modulo(0,3)...modulo(2,3) = 0...2
modulo(3,3) = 0
modulo(3.123,3) = 0.123
fQuotient(a, low, high) = fQuotient(a - low, high - low)
fQuotient(0, 1, 13) = -1
fQuotient(1, 1, 13) ... fQuotient(12, 1, 13) = 0
fQuotient(13, 1, 13) = 1
fQuotient(13.123, 1, 13) = 1
modulo(a, low, high) = modulo(a - low, high - low) + low
modulo(0, 1, 13) = 12
modulo(1, 1, 13) ... modulo(12, 1, 13) = 1...12
modulo(13, 1, 13) = 1
modulo(13.123, 1, 13) = 1.123
maximumDayInMonthFor(yearValue, monthValue) =
M := modulo(monthValue, 1, 13)
Y := yearValue + fQuotient(monthValue, 1, 13)
Return a value based on M and Y:
31 | M = January, March, May, July, August, October, or December |
30 | M = April, June, September, or November |
29 | M = February AND (modulo(Y, 400) = 0 OR (modulo(Y, 100) != 0) AND modulo(Y, 4) = 0) |
28 | Otherwise |
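The helper functions defined above translate almost directly into Python. In this sketch the two-argument and three-argument forms are given separate names because Python has no overloading; the names with the `_range` suffix are therefore hypothetical spellings of the three-argument forms.

```python
from math import floor


def f_quotient(a, b):
    """Greatest integer less than or equal to a/b."""
    return floor(a / b)


def modulo(a, b):
    return a - f_quotient(a, b) * b


def f_quotient_range(a, low, high):
    return f_quotient(a - low, high - low)


def modulo_range(a, low, high):
    return modulo(a - low, high - low) + low


def maximum_day_in_month_for(year_value, month_value):
    """Days in the given month; month_value may lie outside 1..12."""
    m = modulo_range(month_value, 1, 13)
    y = year_value + f_quotient_range(month_value, 1, 13)
    if m in (1, 3, 5, 7, 8, 10, 12):
        return 31
    if m in (4, 6, 9, 11):
        return 30
    is_leap = modulo(y, 400) == 0 or (modulo(y, 100) != 0 and modulo(y, 4) == 0)
    return 29 if is_leap else 28


print(maximum_day_in_month_for(2000, 2))   # February of a leap year
print(maximum_day_in_month_for(1900, 2))   # century year that is not a leap year
print(maximum_day_in_month_for(2001, 0))   # month 0 wraps to December 2000
```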
Essentially, this calculation is equivalent to separating D into <year,month>
and <day,hour,minute,second> fields. The <year,month> is added to S.
If the day is out of range, it is
Leap seconds are handled by the computation by treating them as overflows. Essentially, a value of 60 seconds in S is treated as if it were a duration of 60 seconds added to S (with a zero seconds field). All calculations thereafter use 60 seconds per minute.
Thus the addition of either PT1M or PT60S to any dateTime will always produce the same result. This is a special definition of addition which is designed to match common practice, and -- most importantly -- be stable over time.
A definition that attempted to take leap-seconds into account would need to
be constantly updated, and could not predict the results of future
implementation's additions. The decision to introduce a leap second in UTC
is the responsibility of the
The following is the precise specification. These steps must be followed in the same order. If a field in D is not specified, it is treated as if it were zero. If a field in S is not specified, it is treated in the calculation as if it were the minimum allowed value in that field; however, after the calculation is concluded, the corresponding field in E is removed (set to unspecified).
temp := S[month] + D[month]
E[month] := modulo(temp, 1, 13)
carry := fQuotient(temp, 1, 13)
E[year] := S[year] + D[year] + carry
E[zone] := S[zone]
temp := S[second] + D[second]
E[second] := modulo(temp, 60)
carry := fQuotient(temp, 60)
temp := S[minute] + D[minute] + carry
E[minute] := modulo(temp, 60)
carry := fQuotient(temp, 60)
temp := S[hour] + D[hour] + carry
E[hour] := modulo(temp, 24)
carry := fQuotient(temp, 24)
if S[day] > maximumDayInMonthFor(E[year], E[month])
  tempDays := maximumDayInMonthFor(E[year], E[month])
else if S[day] < 1
  tempDays := 1
else
  tempDays := S[day]
E[day] := tempDays + D[day] + carry
START LOOP
  IF E[day] < 1
    E[day] := E[day] + maximumDayInMonthFor(E[year], E[month] - 1)
    carry := -1
  ELSE IF E[day] > maximumDayInMonthFor(E[year], E[month])
    E[day] := E[day] - maximumDayInMonthFor(E[year], E[month])
    carry := 1
  ELSE EXIT LOOP
  temp := E[month] + carry
  E[month] := modulo(temp, 1, 13)
  E[year] := E[year] + fQuotient(temp, 1, 13)
  GOTO START LOOP
dateTime | duration | result |
---|---|---|
2000-01-12T12:13:14Z | P1Y3M5DT7H10M3.3S | 2001-04-17T19:23:17.3Z |
2000-01 | -P3M | 1999-10 |
2000-01-12 | PT33H | 2000-01-13 |
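The addition algorithm can be sketched in Python as follows. This is an illustration under simplifying assumptions, not a reference implementation: every field of S is assumed to be specified, S is passed as a tuple, the duration fields are passed as keyword arguments (negative numbers for a negative duration), and the time zone, which the algorithm simply copies, is left out. All function and parameter names are hypothetical.

```python
from decimal import Decimal
from math import floor


def f_quotient(a, b, high=None):
    """fQuotient(a, b), or fQuotient(a, low, high) when three arguments are given."""
    if high is not None:
        a, b = a - b, high - b
    return floor(a / b)


def modulo(a, b, high=None):
    """modulo(a, b), or modulo(a, low, high) when three arguments are given."""
    if high is not None:
        return modulo(a - b, high - b) + b
    return a - f_quotient(a, b) * b


def max_day(year, month):
    m = modulo(month, 1, 13)
    y = year + f_quotient(month, 1, 13)
    if m in (1, 3, 5, 7, 8, 10, 12):
        return 31
    if m in (4, 6, 9, 11):
        return 30
    leap = y % 400 == 0 or (y % 100 != 0 and y % 4 == 0)
    return 29 if leap else 28


def add_duration(s, years=0, months=0, days=0, hours=0, minutes=0, seconds=0):
    """Add a duration to s = (year, month, day, hour, minute, second)."""
    s_year, s_month, s_day, s_hour, s_minute, s_second = s
    # Months and years
    temp = s_month + months
    month = modulo(temp, 1, 13)
    year = s_year + years + f_quotient(temp, 1, 13)
    # Seconds, minutes, hours, each carrying into the next field
    temp = s_second + seconds
    second, carry = modulo(temp, 60), f_quotient(temp, 60)
    temp = s_minute + minutes + carry
    minute, carry = modulo(temp, 60), f_quotient(temp, 60)
    temp = s_hour + hours + carry
    hour, carry = modulo(temp, 24), f_quotient(temp, 24)
    # Days: pin the starting day into the new month, then roll over as needed
    if s_day > max_day(year, month):
        temp_days = max_day(year, month)
    elif s_day < 1:
        temp_days = 1
    else:
        temp_days = s_day
    day = temp_days + days + carry
    while True:
        if day < 1:
            day += max_day(year, month - 1)
            carry = -1
        elif day > max_day(year, month):
            day -= max_day(year, month)
            carry = 1
        else:
            break
        temp = month + carry
        month = modulo(temp, 1, 13)
        year += f_quotient(temp, 1, 13)
    return (year, month, day, hour, minute, second)


start = (2000, 1, 12, 12, 13, Decimal("14"))
print(add_duration(start, years=1, months=3, days=5, hours=7, minutes=10,
                   seconds=Decimal("3.3")))
```

Seconds are passed as `Decimal` values here so that fractional seconds such as 3.3 are added exactly; the floor is computed with `math.floor` because Python's `//` operator on `Decimal` truncates toward zero rather than rounding down.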
Time durations are added by simply adding each of their fields, respectively, without overflow.
The order of addition of durations to instants
((dateTime + duration1) + duration2) != ((dateTime + duration2) + duration1)
(2000-03-30 + P1D) + P1M = 2000-03-31 + P1M = 2000-04-30
(2000-03-30 + P1M) + P1D = 2000-04-30 + P1D = 2000-05-01
A
|
characters.
For all |
Denoting the set of strings |
---|---|
(empty string) | the set containing just the empty string |
all strings in |
|
all strings in |
For all |
Denoting the set of strings |
---|---|
all strings in |
|
all strings |
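For all atoms S and non-negative integers n, m such that n <= m, valid pieces R are: | Denoting the set of strings L(R) containing: |
---|---|
S | all strings in L(S) |
S? | the empty string, and all strings in L(S) |
S* | All strings in L(S?) and all strings st with s in L(S*) and t in L(S) |
S+ | All strings st with s in L(S) and t in L(S*) |
S{n,m} | All strings st with s in L(S) and t in L(S{n-1,m-1}) |
S{n} | All strings in L(S{n,n}) |
S{n,} | All strings in L(S{n}S*) |
S{0,m} | All strings st with s in L(S?) and t in L(S{0,m-1}) |
S{0,0} | The set containing only the empty string |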
The regular expression language in the Perl Programming Language does not include a quantifier of the form S{,m}, since it is logically equivalent to S{0,m}. We have, therefore, left this logical possibility out of the regular expression language defined by this specification. We welcome further input from implementors and schema authors on this issue.
A quantifier is one of ?, *, +, {n,m} or {n,}, which have the meanings defined in the table above.
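The quantifier meanings in the table can be checked informally with Python's `re` module. This is only a stand-in: `re` is not an XML Schema regular expression engine, but its syntax for these particular quantifiers coincides. The helper names `in_language` and `accepted` are made up for the example, and `re.fullmatch` is used so that the whole string must belong to L(R).

```python
import re


def in_language(pattern, candidate):
    """True if the entire candidate string is matched by the pattern."""
    return re.fullmatch(pattern, candidate) is not None


STRINGS = ["", "a", "aa", "aaa", "aaaa"]


def accepted(pattern):
    """Subset of STRINGS that belongs to the language of the pattern."""
    return [s for s in STRINGS if in_language(pattern, s)]


for pattern in ["a?", "a*", "a+", "a{2,3}", "a{2}", "a{2,}", "a{0,0}"]:
    print(pattern, accepted(pattern))
```

Note that Python also accepts the form `a{,3}`, which the regular expression language of this specification leaves out.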
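For all normal characters c, character classes C, and regular expressions S, valid atoms R are: | Denoting the set of strings L(R) containing: |
---|---|
c | the single string consisting only of c |
C | all strings in L(C) |
(S) | all strings in L(S) |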
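A metacharacter is either ., \, ?, *, +, {, }, (, ), [ or ]. These characters have special meanings in regular expressions, but can be escaped to form atoms that denote the sets of strings containing only themselves.
Note that a normal character can be represented either as itself, or with a character reference.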
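A character class is either a character class escape or a character class expression.
A character class expression is a character group surrounded by [ and ] characters. For all character groups G, [G] is a valid character class expression, identifying the set of characters C([G]) = C(G).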
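For all character ranges R, all character class escapes E, and all positive character groups P, valid positive character groups G are: | Identifying the set of characters C(G) containing: |
---|---|
R | all characters in C(R) |
E | all characters in C(E) |
RP | all characters in C(R) and all characters in C(P) |
EP | all characters in C(E) and all characters in C(P) |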
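A negative character group is a positive character group preceded by the ^ character. For all positive character groups P, ^P is a valid negative character group, and C(^P) contains all XML characters that are not in C(P).
A character class subtraction is a character class expression subtracted from a positive character group or negative character group, using the - character.
For any positive character group or negative character group G, and any character class expression C, G-C is a valid character class subtraction, identifying the set of all characters in C(G) that are not also in C(C).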
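Character groups can be pictured as plain sets of characters. The sketch below models ranges, negative groups and subtraction with Python sets. It is an illustration only: the universe is restricted to Basic Latin instead of all XML characters, and the function names are invented for the example.

```python
def char_range(start, end):
    """Every character whose code point lies between start and end, inclusive."""
    return {chr(cp) for cp in range(ord(start), ord(end) + 1)}


# Assumption for the example: a small universe instead of all XML characters.
UNIVERSE = char_range("\u0000", "\u007f")


def negative(positive_group):
    """Models ^P: every character of the universe that is not in P."""
    return UNIVERSE - positive_group


def subtract(group, class_expression):
    """Models G-C: characters in G that are not also in C."""
    return group - class_expression


lowercase = char_range("a", "z")
consonants = subtract(lowercase, set("aeiou"))  # models [a-z-[aeiou]]
non_digits = negative(char_range("0", "9"))     # models [^0-9]
print(sorted(consonants))
```

Python's own `re` module has no subtraction syntax, which is why the groups are modelled as sets here instead of being compiled as patterns.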
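A single XML character is a character range that identifies the set of characters containing only itself. All XML characters are valid character ranges, except as follows:
The [, ], and \ characters are not valid character ranges;
The ^ character is only valid at the beginning of a positive character group if it is part of a negative character group; and
The - character is a valid character range only at the beginning or end of a positive character group.
A character range may also be written in the form s-e, identifying the set that contains all XML characters with UCS code points greater than or equal to the code point of s, but not greater than the code point of e.
s-e is a valid character range iff:
s is a single character escape, or an XML character;
s is not \
If s is the first character in a character class expression, then s is not ^
e is a single character escape, or an XML character;
e is not \ or [; and
The code point of e is greater than or equal to the code point of s;
The code point of a single character escape is the code point of the single character in the set of characters that it identifies.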
The valid single character escapes are: | Identifying the set of characters C(R) containing: |
---|---|
\n | the newline character (#xA) |
\r | the return character (#xD) |
\t | the tab character (#x9) |
\\ | \ |
\| | | |
\. | . |
\- | - |
\^ | ^ |
\? | ? |
\* | * |
\+ | + |
\{ | { |
\} | } |
\( | ( |
\) | ) |
\[ | [ |
\] | ] |
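One practical use of the table above is embedding literal text in a pattern. The following sketch escapes every character that has a single character escape in the table. The helper name `escape_literal` is hypothetical, and the function is an illustration, not part of any XML Schema API.

```python
# Characters from the table that are escaped by prefixing a backslash.
ESCAPABLE = set("\\|.-^?*+{}()[]")
# Control characters that have a dedicated letter escape in the table.
CONTROL = {"\n": "\\n", "\r": "\\r", "\t": "\\t"}


def escape_literal(text):
    """Return text with each escapable character replaced by its escape."""
    out = []
    for ch in text:
        if ch in CONTROL:
            out.append(CONTROL[ch])
        elif ch in ESCAPABLE:
            out.append("\\" + ch)
        else:
            out.append(ch)
    return "".join(out)


print(escape_literal("a.b|c"))
```

Escaping more characters than strictly necessary in a given position is a deliberate simplification of this sketch.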
The set containing all characters that have property X can be identified with a category escape \p{X}. The complement of this set is specified with the category escape \P{X}. ([\P{X}] = [^\p{X}]).
The following table specifies the recognized values of the "General Category" property.
Category | Property | Meaning |
---|---|---|
Letters | L | All Letters |
 | Lu | uppercase |
 | Ll | lowercase |
 | Lt | titlecase |
 | Lm | modifier |
 | Lo | other |
Marks | M | All Marks |
 | Mn | nonspacing |
 | Mc | spacing combining |
 | Me | enclosing |
Numbers | N | All Numbers |
 | Nd | decimal digit |
 | Nl | letter |
 | No | other |
Punctuation | P | All Punctuation |
 | Pc | connector |
 | Pd | dash |
 | Ps | open |
 | Pe | close |
 | Pi | initial quote (may behave like Ps or Pe depending on usage) |
 | Pf | final quote (may behave like Ps or Pe depending on usage) |
 | Po | other |
Separators | Z | All Separators |
 | Zs | space |
 | Zl | line |
 | Zp | paragraph |
Symbols | S | All Symbols |
 | Sm | math |
 | Sc | currency |
 | Sk | modifier |
 | So | other |
Other | C | All Others |
 | Cc | control |
 | Cf | format |
 | Co | private use |
 | Cn | not assigned |
The properties mentioned above exclude the Cs property. The Cs property identifies "surrogate" characters, which do not occur at the level of the "character abstraction" that XML instance documents operate on.
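Python's `re` module does not support \p{X}, but the General Category of a character is available through the standard `unicodedata` module. The sketch below approximates category escapes with it. The function name `in_category` is invented for the example, and the results depend on the Unicode version bundled with the Python interpreter, which may differ from the one this specification refers to.

```python
import unicodedata


def in_category(ch, prop):
    """Approximate \\p{prop} for a one- or two-letter General Category."""
    category = unicodedata.category(ch)
    if category == "Cs":
        # Surrogates are excluded from the properties listed above.
        return False
    # A one-letter property such as "L" covers Lu, Ll, Lt, Lm and Lo.
    return category.startswith(prop)


def not_in_category(ch, prop):
    """Approximate \\P{prop}, the complement of \\p{prop}."""
    return not in_category(ch, prop)


print(in_category("A", "Lu"), in_category("a", "L"), in_category("1", "Nd"))
```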
The set containing all characters that have block name X (with all white space stripped out) can be identified with a block escape \p{IsX}. The complement of this set is specified with the block escape \P{IsX}. ([\P{IsX}] = [^\p{IsX}]).
The following table specifies the recognized block names (for more information, see the "Blocks.txt" file in the Unicode Database).
Start Code | End Code | Block Name | Start Code | End Code | Block Name | |
---|---|---|---|---|---|---|
#x0000 | #x007F | BasicLatin | #x0080 | #x00FF | Latin-1Supplement | |
#x0100 | #x017F | LatinExtended-A | #x0180 | #x024F | LatinExtended-B | |
#x0250 | #x02AF | IPAExtensions | #x02B0 | #x02FF | SpacingModifierLetters | |
#x0300 | #x036F | CombiningDiacriticalMarks | #x0370 | #x03FF | Greek | |
#x0400 | #x04FF | Cyrillic | #x0530 | #x058F | Armenian | |
#x0590 | #x05FF | Hebrew | #x0600 | #x06FF | Arabic | |
#x0700 | #x074F | Syriac | #x0780 | #x07BF | Thaana | |
#x0900 | #x097F | Devanagari | #x0980 | #x09FF | Bengali | |
#x0A00 | #x0A7F | Gurmukhi | #x0A80 | #x0AFF | Gujarati | |
#x0B00 | #x0B7F | Oriya | #x0B80 | #x0BFF | Tamil | |
#x0C00 | #x0C7F | Telugu | #x0C80 | #x0CFF | Kannada | |
#x0D00 | #x0D7F | Malayalam | #x0D80 | #x0DFF | Sinhala | |
#x0E00 | #x0E7F | Thai | #x0E80 | #x0EFF | Lao | |
#x0F00 | #x0FFF | Tibetan | #x1000 | #x109F | Myanmar | |
#x10A0 | #x10FF | Georgian | #x1100 | #x11FF | HangulJamo | |
#x1200 | #x137F | Ethiopic | #x13A0 | #x13FF | Cherokee | |
#x1400 | #x167F | UnifiedCanadianAboriginalSyllabics | #x1680 | #x169F | Ogham | |
#x16A0 | #x16FF | Runic | #x1780 | #x17FF | Khmer | |
#x1800 | #x18AF | Mongolian | #x1E00 | #x1EFF | LatinExtendedAdditional | |
#x1F00 | #x1FFF | GreekExtended | #x2000 | #x206F | GeneralPunctuation | |
#x2070 | #x209F | SuperscriptsandSubscripts | #x20A0 | #x20CF | CurrencySymbols | |
#x20D0 | #x20FF | CombiningMarksforSymbols | #x2100 | #x214F | LetterlikeSymbols | |
#x2150 | #x218F | NumberForms | #x2190 | #x21FF | Arrows | |
#x2200 | #x22FF | MathematicalOperators | #x2300 | #x23FF | MiscellaneousTechnical | |
#x2400 | #x243F | ControlPictures | #x2440 | #x245F | OpticalCharacterRecognition | |
#x2460 | #x24FF | EnclosedAlphanumerics | #x2500 | #x257F | BoxDrawing | |
#x2580 | #x259F | BlockElements | #x25A0 | #x25FF | GeometricShapes | |
#x2600 | #x26FF | MiscellaneousSymbols | #x2700 | #x27BF | Dingbats | |
#x2800 | #x28FF | BraillePatterns | #x2E80 | #x2EFF | CJKRadicalsSupplement | |
#x2F00 | #x2FDF | KangxiRadicals | #x2FF0 | #x2FFF | IdeographicDescriptionCharacters | |
#x3000 | #x303F | CJKSymbolsandPunctuation | #x3040 | #x309F | Hiragana | |
#x30A0 | #x30FF | Katakana | #x3100 | #x312F | Bopomofo | |
#x3130 | #x318F | HangulCompatibilityJamo | #x3190 | #x319F | Kanbun | |
#x31A0 | #x31BF | BopomofoExtended | #x3200 | #x32FF | EnclosedCJKLettersandMonths | |
#x3300 | #x33FF | CJKCompatibility | #x3400 | #x4DB5 | CJKUnifiedIdeographsExtensionA | |
#x4E00 | #x9FFF | CJKUnifiedIdeographs | #xA000 | #xA48F | YiSyllables | |
#xA490 | #xA4CF | YiRadicals | #xAC00 | #xD7A3 | HangulSyllables | |
#xD800 | #xDB7F | HighSurrogates | #xDB80 | #xDBFF | HighPrivateUseSurrogates | |
#xDC00 | #xDFFF | LowSurrogates | #xE000 | #xF8FF | PrivateUse | |
#xF900 | #xFAFF | CJKCompatibilityIdeographs | #xFB00 | #xFB4F | AlphabeticPresentationForms | |
#xFB50 | #xFDFF | ArabicPresentationForms-A | #xFE20 | #xFE2F | CombiningHalfMarks | |
#xFE30 | #xFE4F | CJKCompatibilityForms | #xFE50 | #xFE6F | SmallFormVariants | |
#xFE70 | #xFEFE | ArabicPresentationForms-B | #xFEFF | #xFEFF | Specials | |
#xFF00 | #xFFEF | HalfwidthandFullwidthForms | #xFFF0 | #xFFFD | Specials | |
#x10300 | #x1032F | OldItalic | #x10330 | #x1034F | Gothic | |
#x10400 | #x1044F | Deseret | #x1D000 | #x1D0FF | ByzantineMusicalSymbols | |
#x1D100 | #x1D1FF | MusicalSymbols | #x1D400 | #x1D7FF | MathematicalAlphanumericSymbols | |
#x20000 | #x2A6D6 | CJKUnifiedIdeographsExtensionB | #x2F800 | #x2FA1F | CJKCompatibilityIdeographsSupplement | |
#xE0000 | #xE007F | Tags | #xF0000 | #xFFFFD | PrivateUse | |
#x100000 | #x10FFFD | PrivateUse |
For example, the block escape for identifying the ASCII characters is \p{IsBasicLatin}.
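A block escape amounts to a code point range test. The sketch below copies a handful of rows from the table above into a dictionary and checks membership. The names `BLOCKS` and `in_block` are hypothetical, and only a few blocks are included for brevity.

```python
# A few (start, end) code point ranges taken from the block table.
BLOCKS = {
    "BasicLatin": (0x0000, 0x007F),
    "Latin-1Supplement": (0x0080, 0x00FF),
    "Greek": (0x0370, 0x03FF),
    "Cyrillic": (0x0400, 0x04FF),
    "Hiragana": (0x3040, 0x309F),
}


def in_block(ch, block_name):
    """Approximate \\p{IsX}: is the code point of ch inside block X?"""
    start, end = BLOCKS[block_name]
    return start <= ord(ch) <= end


print(in_block("A", "BasicLatin"), in_block("\u03b1", "Greek"))
```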
Character sequence | Equivalent |
---|---|
. | [^\n\r] |
\s | [#x20\t\n\r] |
\S | [^\s] |
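\i | the set of initial name characters, those matched by Letter, '_' or ':' |
\I | [^\i] |
\c | the set of name characters, those matched by NameChar |
\C | [^\c] |
\d | \p{Nd} |
\D | [^\d] |
\w | [#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}] (all characters except the set of "punctuation", "separator" and "other" characters) |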
\W | [^\w] |
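Several of these escapes differ from their namesakes in other regular expression languages. The sketch below restates four rows of the table as Python predicates on a single character, using the standard `unicodedata` module. The function names are invented for the example, and category data comes from the Unicode version bundled with Python.

```python
import unicodedata


def xsd_space(ch):
    """\\s: exactly space, tab, newline and return."""
    return ch in ("\x20", "\t", "\n", "\r")


def xsd_dot(ch):
    """. : any character except newline and return."""
    return ch not in ("\n", "\r")


def xsd_digit(ch):
    """\\d: characters with General Category Nd."""
    return unicodedata.category(ch) == "Nd"


def xsd_word(ch):
    """\\w: everything outside the P, Z and C categories."""
    return unicodedata.category(ch)[0] not in "PZC"


# "_" is connector punctuation (Pc), so it is not a \w character here,
# while "$" is a currency symbol (Sc) and therefore is one.
print(xsd_word("_"), xsd_word("$"))
```

This contrasts with Python's own `\w`, which includes the underscore and excludes symbols such as "$".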
The
The listing below is for the benefit of readers of a printed version of this document: it collects together all the definitions which appear in the document above.
The following have contributed material to this draft:
Co-editor Ashok Malhotra's work on this specification from March 1999 until February 2001 was supported by IBM.
The editors acknowledge the members of the XML Schema Working Group, the members of other W3C Working Groups, and industry experts in other forums who have contributed directly or indirectly to the process or content of creating this document. The Working Group is particularly grateful to Lotus Development Corp. and IBM for providing teleconferencing facilities.
The current members of the XML Schema Working Group are:
The XML Schema Working Group has benefited in its work from the participation and contributions of a number of people not currently members of the Working Group, including in particular those named below. Affiliations given are those current at the time of their work with the WG.