Topicmaps.net's Processing Model for XTM 1.0, version 1.0.1: A Processing Model for XML Topic Maps

The informative Topic Maps website maintained by Michel Biezunski (InfoLoom) and Steven R. Newcomb (Coolheads Consulting)

Steven R. Newcomb, srn@coolheads.com, and Michel Biezunski, mb@infoloom.com

This version is dated March 20, 2001.

Topicmaps.net's Processing Model for XTM 1.0 explains the meaning of XTM syntax in a way that is entirely true to the vision that has guided the authors in discovering, teaching, developing and testing the topic map paradigm.

This version of Topicmaps.net's Processing Model for XTM 1.0 illustrates only the processing of topic map documents that conform to the XTM 1.0 Specification (i.e., XTM <topicMap> elements). Future efforts will also discuss the processing of other syntaxes for interchanging topic map information, including the interchange syntax (meta-DTD) specified by ISO/IEC 13250:2000.

The authors gratefully acknowledge the contributions and counsel of Sam Hunting, Victoria T. Newcomb and Peter Newcomb.

Previous versions of this material once appeared in drafts of the XTM 1.0 Specification published at http://www.topicmaps.org. This version is licensed to the public for all purposes and in every way. The authors request that all copies and translations of Topicmaps.net's Processing Model for XTM 1.0 be complete and correct. This includes this and all other notices, and attribution to the authors by name and e-mail address. The authors also request that any claims of conformance to Topicmaps.net's Processing Model be accurate.
Either a processing system conforms to the model exactly and comprehensively in every detail, or it does not conform, and no claim of conformance is justified.

A verbose tutorial-style glossary is attached.

1.0 Purpose of This Processing Model

Topicmaps.net's Processing Model for XTM 1.0 defines a set of rules for processing topic map documents in order to reconstitute the meaning of the information they are intended to convey to their recipients. It could be used as a partial blueprint for a topic maps application, but that is not its primary purpose. Its primary purpose is merely to illustrate, in a rigorous fashion, the authors' deepest understanding of the meaning of topic map information.

In Topicmaps.net's Processing Model for XTM 1.0, the result of processing <topicMap> elements is described in terms of "topic map graphs" that consist of "nodes" and "arcs" which connect the nodes in certain ways.

Note: Although Topicmaps.net's Processing Model for XTM 1.0 illustrates the meaning of XTM 1.0 syntax in terms of graphs and their component nodes and arcs, there is no intention to constrain all implementations to a graph paradigm. It is possible to implement the relationships and semantics illustrated by this Processing Model accurately using relational databases, for example.

The topic map graph (or other representation of the full understanding of the interchanged information) must be fully completed before any other processing is done.
After the topic map graph is constructed, topic map applications may perform additional processing to exploit the information conveyed by interchangeable topic map documents. Examples include answering queries and rendering one or more finding aids. Such processing goes beyond what Topicmaps.net's Processing Model for XTM 1.0 describes, and the model does not constrain it.

2.0 Topic Map Information and <topicMap> Elements are Differently Structured

Topic map information is inherently multidimensional. For interchange, topic map information has to be flattened into a sequence of characters. After interchange, the information must be reconstituted so that it is multidimensional once more. Topicmaps.net's Processing Model for XTM 1.0 is wholly concerned with the reconstitution process, and with the result of the reconstitution process (a "topic map graph").

The W3C DOM API, for example, cannot provide direct access to the topic map information represented by a <topicMap> element. It can only provide access to the syntactic representation of that information, i.e., the <topicMap> element itself. Applications that reconstitute topic map information from <topicMap> elements can use the DOM. However, a DOM tree made from a <topicMap> element is not ready-to-use topic map information.

3.0 What's in a Topic Map Graph?

A topic map graph consists of nodes and arcs that connect the nodes.

3.1 There are three kinds of nodes:

"t-node" (for topic node),
"a-node" (for association node), and
"s-node" (for scope node).

The only thing that characterizes a node is the fact that it serves as the endpoint of one or more arcs.
In Topicmaps.net's Processing Model for XTM 1.0, the nodes themselves have no other properties or characteristics.

The three node types have rules about which ends of how many of which kinds of arcs they are allowed to serve as (see Table 1, "Matrix of constraints on the service of types of nodes as specific endpoints of types of arcs"). Other than these rules, no formal features distinguish the three kinds of nodes, for purposes of Topicmaps.net's Processing Model for XTM 1.0.

3.2 There are four kinds of arcs:

(1) association member

"Association member" arcs have two ends, called the "association" end and the "member" end.

The "association" end is always an a-node. (A-nodes always represent associations.)

The "member" end is a node that represents a player of a specific role in the association represented by the a-node.

The "association member" arc is the only kind of arc that has a label. The label is always a t-node that represents the topic whose subject is the role being played by the member (represented by the node at the "member" end) in the association (represented by the a-node at the "association" end).

(2) association scope

"Association scope" arcs have two ends, called the "association" end and the "scope" end.

The "association" end is always an a-node. (A-nodes always represent associations.)

The "scope" end is always an s-node. S-nodes represent the scopes within which associations are valid. In the graph, all associations are represented as a-nodes, and all a-nodes have at least one scope. An "association scope" arc represents the fact that an association (an a-node) has a certain scope. Only a-nodes have scope. In conformance with the XTM Conceptual Model, all topic characteristics are represented in the graph as a-nodes. This applies to memberships in associations that were syntactically represented as <association> elements, and equally to all topic names and topic occurrences.
Therefore, the word "association" has a somewhat broader meaning in the XTM Conceptual Model and in Topicmaps.net's Processing Model for XTM 1.0 than it has as an element type name (<association>, in the XTM Interchange Syntax).

(3) association template

"Association template" arcs have two ends, called the "association" end and the "template" end.

The "association" end is always an a-node. (A-nodes always represent associations. Remember: we're using the word "association" here in a broad sense. It covers the kinds of things that are represented syntactically by <association> elements, and also those represented syntactically by <occurrence> elements and <baseName> elements.)

The "template" end is always a t-node whose subject is the model against which the a-node at the "association" end will be validated for conformance. (The model is represented in the graph by a set of a-nodes of a certain kind, together with their members. All of these a-nodes have this t-node as a member playing a certain role, as described later.)

(4) scope component

"Scope component" arcs have two ends, called the "scope" end and the "component" end.

The "scope" end is always an s-node. The "component" end is always a t-node or an a-node. The scope represented by an s-node is the set of t-nodes (and/or a-nodes) at the "component" ends of the "scope component" arcs whose "scope" end is that s-node.

Topicmaps.net's Processing Model for XTM 1.0 constrains the construction of topic map graphs so that only certain node types can serve as certain endpoints of certain arc types. In some cases, there are also numeric constraints on the number of arc endpoints that a given node can serve as. The following table sets forth these constraints:

3.3 Table 1 - Matrix of constraints on the service of types of nodes as specific endpoints of types of arcs.
| May a given node serve as... | t-node | a-node | s-node |
|---|---|---|---|
| the "association" endpoint of "association member" arcs? | no | * | no |
| the "member" endpoint of "association member" arcs? | * | * | no |
| the label of "association member" arcs? | * | no | no |
| the "association" endpoint of "association scope" arcs? | no | + | no |
| the "scope" endpoint of "association scope" arcs? | no | no | * (Note 1) |
| the "association" endpoint of "association template" arcs? | no | ? | no |
| the "template" endpoint of "association template" arcs? | * | no | no |
| the "scope" endpoint of "scope component" arcs? | no | no | * |
| the "component" endpoint of "scope component" arcs? | * | * | no |

Legend for the above table:

no  Instances of the given node type cannot serve as this endpoint or label.
?   Instances of the given node type may serve as one such endpoint (zero or one).
*   Instances of the given node type may serve as any number of such endpoints (zero or more).
+   Instances of the given node type must serve as one such endpoint, and may serve as more such endpoints (one or more).

Note 1: Topicmaps.net's Processing Model for XTM 1.0 neither prohibits nor requires the existence of s-nodes that are not used as the scope of any a-nodes.

3.4 Elements vs. Nodes

In a topic map graph, <topic> elements are represented by t-nodes. Similarly, <association> elements are represented by a-nodes.

The number of t-nodes that appear in the topic map graph constructed from a <topicMap> element is typically not the same as the number of <topic> elements in that <topicMap>. Similarly, but for different reasons, the number of a-nodes in the topic map graph constructed from a <topicMap> element is not necessarily the same as the number of <association> elements in that <topicMap>.

For example, there can be more nodes than elements: The contents of a <topicMap> element may demand the existence of t-nodes and a-nodes by means other than (and in addition to) <topic> and <association> elements.
The existence of such nodes is said to be "implicitly demanded". Only <topic> and <association> elements explicitly demand the existence of corresponding t-nodes and a-nodes; the existence of all other t-nodes and a-nodes is implicitly demanded. For example, when an information resource is referenced by means of a <resourceRef> element, the existence of a corresponding t-node is implicitly demanded. (Its subject is the referenced information resource. In the case of <resourceRef> elements, the subject is the resource itself, not what the resource signifies.) In the case of associations, the existence of a-nodes may be implicitly demanded by <instanceOf>, <occurrence>, and <baseName> elements.

For example, there can be fewer nodes than elements: Topic map processing may also create a single t-node by merging topic characteristics declared via multiple <topic> elements. This must happen whenever the processing system has determined that the elements have the same subject. (For more about topic merging rules, see the sections "The Subject-based Merging Rule" and "The Name-based Merging Rule".) Multiple <association> elements and other a-node demanders that represent the exact same association are also merged into a single a-node.

For any given t-node or a-node, the total number of "node demanders" is always greater than or equal to one.

3.5 t-nodes and subjects

A t-node always represents ("reifies") exactly one subject, just as a <topic> element in a <topicMap> element does.

In a <topicMap> element, there can be any number of <topic> elements that have the same subject. In a topic map graph, however, there can be only one t-node that has that subject. (See the "merging" glossary entry for more information.)

3.6 Comparison of t-nodes vs. a-nodes

In some sense, there isn't very much difference between t-nodes and a-nodes. For purposes of participating in associations, an a-node is treatable as a t-node.
In fact, just like a t-node, an a-node always represents (reifies) exactly one subject. In the case of an a-node, however, the subject is always a specific relationship between other subjects, each of which is represented in the graph by either a t-node or an a-node.

Since they represent subjects, a-nodes can be members of other associations (i.e., they can serve as the "member" endpoints of "association member" arcs). (Indeed, a single a-node can serve as both ends of a single "association member" arc; in this case, the association itself is participating in the relationship that it represents.)

Under certain circumstances, it is even possible for a t-node to be merged with an a-node. The result is always an a-node.

These are the differences between t-nodes and a-nodes:

Only an a-node can serve as the "association" end of one or more "association member" arcs.

Only a-nodes can represent relationships in which the members are represented by topics in the topic map. (A t-node can represent any subject, so it can represent a subject that is a relationship. However, if the relationship has members that are represented by topics in the topic map, an a-node must represent it. Only a-nodes can serve as the "association" end of "association member" arcs, and without such arcs, the graph cannot connect the relationship to its participants.)

Only a-nodes can have scope (i.e., can serve as the "association" end of one or more "association scope" arcs). (Indeed, every a-node must have scope, i.e., it must serve as the "association" end of at least one "association scope" arc.)

Only t-nodes can be association templates.
It is a reportable error if an a-node (i.e., a node that serves as the "association" end of any arc type) also serves as the "template" end of an "association template" arc.

4.0 Subject identity points

In addition to serving as the endpoints of arcs, t-nodes and a-nodes have sets of subject identity points. Subject identity points are always addressable resources outside the topic map graph. In other words, for purposes of Topicmaps.net's Processing Model for XTM 1.0, the connections between the nodes and their identity points are not arcs, and the identity points themselves are not nodes. Subject identity points are entirely in the realm of implementations, as are the connections between t-nodes (and a-nodes) and their subject identity points. Topicmaps.net's Processing Model for XTM 1.0 imposes only one constraint on them: implementers must provide for them in a way that supports the merging requirements set forth herein. (See the glossary entry for "subject identity points" for more information.)

Note: It seems reasonable to implement t-nodes and a-nodes so that users can retrieve the addresses of their subject identity points. These could include all of the addresses that were used to reference each subject identity point. It also seems reasonable to implement topic map applications so that they support traversal to any subject indicator or subject constituting resource. Indeed, the name "topic maps" carries an implicit promise. Keeping that promise suggests supporting the initiation of traversal from any subject identity point, as well as to any subject identity point. Topicmaps.net's Processing Model for XTM 1.0, however, imposes no such requirement.

It is a reportable error if a single t-node appears to have more than one subject constituting resource.

5.0 Association templates

Associations are not required to have association templates.
In topic map graph terms, this means that an a-node is not required to serve as an endpoint of any "association template" arc.

If a t-node serves as the "template" end of one or more "association template" arcs, it is called an "association template t-node". An association template t-node establishes all of the roles that members of an a-node can play. If an a-node has a template (i.e., if it serves as the "association" end of an "association template" arc), it is a reportable error if a member of that a-node does not play one of the roles specified by the template.

Association template t-nodes must play the role of "template" in one or more "template-role-rpc" associations. It is a reportable error if a t-node that serves as the "template" end of any "association template" arc does not also play the role of "template" in one or more "template-role-rpc" associations.

All "template-role-rpc" association a-nodes were themselves templated in the original version of the XTM 1.0 Specification, which may or may not still be available at http://www.topicmaps.org/xtm/1.0/core.html#xtmmaps. The published subject indicator is http://www.topicmaps.org/xtm/1.0/psi1.xtm#at-template-role-rpc.

Each of the "template-role-rpc" associations in which an association template t-node plays the role of "template" establishes:

(the subject that is) the template itself, represented by exactly one t-node that plays the "template" role in the "template-role-rpc" association;
(the subject that is) one of the member roles of the template, represented by exactly one t-node that plays the "role" role in the "template-role-rpc" association; and
(the subject that is) the class of topic of which all players of the role must be instances, optionally represented by exactly one t-node that plays the "rpc" ("role player constraints") role in the "template-role-rpc" association.
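As an illustration only, the template rules above can be sketched in Python. The sketch assumes a hypothetical, flattened representation. Each "template-role-rpc" association is reduced to a (template, role, rpc) record. Each a-node is reduced to a list of (role, player) pairs taken from its "association member" arcs. The names TemplateRoleRpc, ANode and template_errors, and the example topics, are inventions for this sketch. They are not part of XTM or of this Processing Model.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TemplateRoleRpc:
    """One "template-role-rpc" association, flattened (hypothetical layout)."""
    template: str               # t-node playing the "template" role
    role: str                   # t-node playing the "role" role
    rpc: Optional[str] = None   # t-node playing the optional "rpc" role


@dataclass
class ANode:
    """An a-node, flattened (hypothetical layout)."""
    name: str
    template: Optional[str] = None  # "template" end of its association template arc, if any
    members: List[Tuple[str, str]] = field(default_factory=list)  # (role label, member node)


def template_errors(anodes, template_role_rpcs):
    """Return messages describing the template-related reportable errors."""
    errors = []
    for a in anodes:
        if a.template is None:
            continue  # associations are not required to have templates
        allowed = {t.role for t in template_role_rpcs if t.template == a.template}
        if not allowed:
            # The template t-node plays "template" in no template-role-rpc association.
            errors.append(f"{a.template}: not the 'template' in any template-role-rpc association")
            continue
        for role, member in a.members:
            if role not in allowed:
                errors.append(f"{a.name}: {member} plays {role!r}, which {a.template} does not define")
    return errors
```

The sketch returns messages rather than raising exceptions. The model says only that these conditions are "reportable errors"; it does not say how they must be reported.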
(Note: The following incredibly convoluted sentence has been deliberately formatted in a broken-up fashion in order to make it easier to parse. If you must re-format this difficult material, please take pity on our readers and figure out some way to make it at least as clear, and hopefully clearer.)

The fact that a topic that plays the role being templated
  is an instance of the topic that plays the "rpc" role in the template
must be represented in the topic map by a class-instance association
  in which the topic that plays the role being templated also plays the "instance" role,
  and the topic that plays the "class" role is either:
    the topic that plays the "rpc" role in the template, or
    a superclass of the topic that plays the "rpc" role in the template.

The "class-instance" association mentioned in the above paragraph must be an instance of the XTM-defined "class-instance" association template, whose published subject indicator may or may not still be available at http://www.topicmaps.org/xtm/1.0/psi1.xtm#at-class-instance.

If many classes of topics must be allowed to play the role, the topic that plays the "rpc" role must be a subclass of all of them. In that way, any appropriate topic that plays the role being templated will, in effect, be known to be appropriate by virtue of being an instance of at least one of them.

The fact that the topic that plays the "rpc" role is a subclass of some other topic must be represented by an association (a-node) with two properties. First, it is an instance of the XTM-defined "superclass-subclass" association template, whose published subject indicator may or may not still be available at http://www.topicmaps.org/xtm/1.0/psi1.xtm#at-superclass-subclass. Second, its "subclass" role is played by the topic that also plays the "rpc" role.
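The class-instance check described above can be sketched as follows. This is a hypothetical simplification. Each "class-instance" association becomes a (class, instance) pair, and each "superclass-subclass" association becomes a (superclass, subclass) pair. Following the text literally, the sketch accepts only two kinds of class. One is the "rpc" topic itself. The other is a direct superclass of it, meaning a topic in a "superclass-subclass" pair whose "subclass" is the "rpc" topic. It does not settle whether superclasses of superclasses should also count, and it does not model "Apply to Set" semantics. A missing "rpc" topic is treated as imposing no constraint.

```python
def acceptable_classes(rpc, superclass_subclass):
    """The "rpc" topic plus its direct superclasses: every topic playing
    "superclass" in a (superclass, subclass) pair whose subclass is rpc."""
    return {rpc} | {sup for (sup, sub) in superclass_subclass if sub == rpc}


def player_conforms(player, rpc, class_instance, superclass_subclass):
    """Minimum role-player check under the assumptions stated above.

    class_instance:      set of (class, instance) pairs (hypothetical flattening)
    superclass_subclass: set of (superclass, subclass) pairs (hypothetical flattening)
    """
    if rpc is None:
        return True  # no topic plays "rpc": no validation constraint on this role
    return any((cls, player) in class_instance
               for cls in acceptable_classes(rpc, superclass_subclass))
```

For instance, if superclass_subclass contains ("legal-entity", "person"), then an instance of "legal-entity" satisfies an "rpc" of "person" under this literal reading.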
Suppose one of the superclasses of the topic that plays the "rpc" role is an instance of the XTM-defined "Apply to Set" class. Then the constraints imposed by that superclass apply to the entire set of topics that play the role being templated and that are instances of it. If a superclass is not an instance of the "Apply to Set" class, its constraints apply to each such topic individually.

Note: As of this writing, no one has yet provided published subject indicators for the purpose of constraining the number of topics that must (or may) play a particular role in a templated association. Users of Topicmaps.net's Processing Model for XTM 1.0 are free to do that in whatever way they desire. To be consistent with Topicmaps.net's Processing Model for XTM 1.0, though, such topics should be instances of the "Apply to Set" topic.

A template-role-rpc association specifies a particular role in an association template. If no topic plays the "rpc" role in it, there are no validation constraints on which topics may play that role in associations that are instances of the template.

At minimum, checking that a role player conforms to its role player constraints means checking that a class-instance association exists between the topic that plays the role and the topic whose subject is the role player constraints. All other role player constraints are necessarily application-defined, as is all other processing to check the conformance of role players to their respective constraints.

6.0 S-nodes and Topic Namespaces

S-nodes are, in effect, topic namespaces. Topic namespaces are like topical indexes, in which topics can be "looked up" if the user knows their names. In a given topic namespace, each name corresponds to exactly one topic.
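A topic namespace can be pictured as a lookup table from basename strings to topics, with one table per scope. The sketch below is a hypothetical illustration. It takes flattened (topic, basename, scope components) triples, each standing for one topic-basename a-node. It identifies each scope by the set of its "component" nodes, following the definition of "scope component" arcs in section 3.2. It also reports any name that would map to two different topics. That is the situation the Name-based Merging Rule in section 8.0 resolves by merging.

```python
from collections import defaultdict


def build_namespaces(topic_basenames):
    """topic_basenames: iterable of (topic, basename, scope components) triples,
    each standing for one topic-basename a-node (hypothetical flattening).

    Returns (namespaces, collisions). namespaces maps each scope, keyed by the
    frozenset of its components, to a {basename: topic} index.
    """
    namespaces = defaultdict(dict)
    collisions = []
    for topic, basename, components in topic_basenames:
        scope = frozenset(components)
        index = namespaces[scope]
        owner = index.get(basename)
        if owner is not None and owner != topic:
            # Two different topics would share one name in one namespace.
            collisions.append((scope, basename, owner, topic))
        else:
            index[basename] = topic
    return namespaces, collisions
```

Once the namespaces are built, looking a topic up by name in a given scope is a dictionary access, e.g. namespaces[frozenset({"english"})]["Paris"].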
Consider the topic-basename association a-nodes that serve as the "association" ends of "association scope" arcs whose "scope" end is a given s-node. That set is, in effect, the set of topic basenames in the topic namespace represented by that s-node, together with the t-nodes that have those basenames.

Note: The authors believe that, for the sake of global knowledge interchange, there must be a minimum basename string length that all applications supporting a given topic map interchange syntax are required to support. They suggest that those responsible for creating and maintaining such interchange syntaxes consider two possible values. One is 31, the maximum key field length in some RDBMS implementations. The other is 255, a nice long name length that happens to be the maximum field length in some RDBMS implementations. If the minimum guaranteed-to-be-supported basename length is too short, topic map authors may be forced to abbreviate the names in basenames, which will compromise the intent of the Spec. On the other hand, if the minimum supported basename length is too long, the support of topic namespaces will incur an unfortunate amount of overhead in some implementations.

7.0 The Subject-based Merging Rule

A fully-processed topic map graph should have exactly one t-node per subject. This is an ideal state that may or may not be fully achievable automatically, due to limitations on the information available to the topic map processing system. The Subject-based Merging Rule requires conforming topic map processing systems to merge t-nodes that are known to such systems to have the same subject, on the basis of whatever information is available to them.
In addition, the Subject-based Merging Rule requires conforming topic map processing systems to conclude, under certain conditions, that two t-nodes have the same subject and must therefore be merged into a single t-node.

In many situations, a human being must intervene, on the basis of that person's knowledge, in order to cause two t-nodes to be merged. For example, suppose two topics have no subject indicator resources. One of them has "Buster Keaton" as a basename topic characteristic, and the other has "The Great Stone Face". A human being with considerable knowledge of the history of American cinema might reasonably conclude that both topics have the same subject (the Hollywood actor whose name was "Buster Keaton"). Accordingly, that person might intervene in topic map processing to cause these two t-nodes to be merged into a single t-node.

Conforming topic map processing systems are not required to provide for such human intervention. However, implementers of topic map applications are strongly encouraged to consider how best to include human beings in the creation, interchange, processing, use, and maintenance of topic map information.

Topic map authors are strongly encouraged to use common Published Subject Indicators. Doing so minimizes the need for human intervention and makes maximal use of the minimum automated merging capabilities that all topic map processing systems must support. Organizations that serve communities of interest are strongly encouraged to create Published Subject Indicators for the subjects that their communities use, and to promote their use. There will then be common subject identity points around which relevant materials can be automatically "gathered" via merging operations performed by topic map processing systems that conform to Topicmaps.net's Processing Model for XTM 1.0.
Such organizations should commit themselves to preserving the long-term validity of the published addresses of such identity points, in order to protect the value and mergeability of the topic maps that use them.

According to Topicmaps.net's Processing Model for XTM 1.0, the minimum merging operations that must be performed by all conforming topic map processing systems under the Subject-based Merging Rule are:

Whenever two t-nodes both have identity points that are subject constituting resources, they must be merged if and only if the two subject constituting resources are known to the processing system to be one and the same resource, regardless of how that resource may have been differently addressed. In other words, merging is required if and only if the two addresses are known to the processing system to be equivalent.

All t-nodes have at least one subject indicator resource. (If nothing else, a t-node must at least have the syntactic construct that demanded its existence as one of its subject indicators.) Two t-nodes that do not have subject constituting resources shall be merged if and only if either:
  one of the two t-nodes has at least one subject indicator resource that is known to the processing system to be the same resource that serves as one of the subject indicators of the other t-node, or
  a subject indicator resource of one t-node and a subject indicator resource of the other are known to the processing system (on account of machine intelligence or human intervention) to describe the same subject.

For purposes of the Subject-based Merging Rule, it is irrelevant whether two subject indicator resources, or two subject constituting resources, contain the same data or are the same string. A simple string comparison of the two subject indicator resources is not, in the general case, a reliable indication of whether or not the same subject is being described.
For example, different products in different sales catalogs may coincidentally have the same catalog number, and a comparison of the two catalog numbers does not indicate that they are the same product. Therefore, the Subject-based Merging Rule is not based on comparing the data content of the resources that serve as identity points. Merging must occur if and only if both of the following hold:
  the two subject identity points are of the same kind: both are subject indicators, or both are subject constituters (i.e., the kinds can't be mixed); and
  they are one and the same resource. That is, they exist in the exact same addressable context, even though several equivalent addressing expressions may arrive at that resource in that context.

Note: No merging should occur if the addressed information turns out to be different, because in such a case, it's obvious that the two resources are not the same resource. However, the point of this discussion is that the fact that the addressed information turns out to be the same string cannot be regarded as an indication that merging should occur.

Note: If merging on the basis of string comparisons is desired, exploitation of the Name-based Merging Rule should be considered. That, after all, is its purpose!

Topicmaps.net's Processing Model for XTM 1.0 requires topic map applications to be able to compare internet addresses, under the normal rules of internet addressing, in order to determine whether they address the same resource. For example, case is universally nonsignificant in some parts of an internet address, such as domain names. When comparing addresses, topic map processing systems are required to ignore case differences in those parts.
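The address comparison and the minimum subject-based merging test can be sketched with Python's standard urllib.parse module. This is a partial, hypothetical sketch, not a complete implementation of "the normal rules of internet addressing". It lower-cases only the scheme and host, drops the default port for http and https, and treats an empty http(s) path as "/". Path, query and fragment stay case-sensitive, and no heuristics or catalog lookups are applied. The dictionary layout of the nodes ('constituter', 'indicators') is invented for the example. Merges that depend on machine intelligence or human judgment are not modelled.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_address(address):
    """Partially normalize an internet address for equivalence testing."""
    parts = urlsplit(address)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""            # .hostname is already lower-cased
    if ":" in host:
        host = f"[{host}]"                 # restore brackets around IPv6 literals
    netloc = host
    if parts.port is not None and DEFAULT_PORTS.get(scheme) != parts.port:
        netloc = f"{host}:{parts.port}"    # keep non-default ports
    if "@" in parts.netloc:                # keep any userinfo unchanged
        netloc = parts.netloc.rpartition("@")[0] + "@" + netloc
    path = parts.path
    if scheme in DEFAULT_PORTS and path == "":
        path = "/"
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


def must_merge_by_subject(a, b):
    """Minimum automatic Subject-based merging test for two t-nodes.

    Each node is a dict with an optional 'constituter' address and a set of
    'indicators' addresses (a hypothetical layout for this sketch).
    """
    ca, cb = a.get("constituter"), b.get("constituter")
    if ca is not None and cb is not None:
        # Both have subject constituting resources: merge iff same resource.
        return normalize_address(ca) == normalize_address(cb)
    if ca is None and cb is None:
        # Neither has one: merge iff they share a subject indicator resource.
        ia = {normalize_address(x) for x in a.get("indicators", ())}
        ib = {normalize_address(x) for x in b.get("indicators", ())}
        return not ia.isdisjoint(ib)
    # Mixed case: the minimum operations listed earlier do not call for a
    # merge here, so this sketch does not merge (an interpretive choice).
    return False
```

Comparing normalized strings is only a stand-in for "known to be the same resource". A conforming processor could also use catalogs or other knowledge about address equivalence.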
Note: Topic map processors may also, but are not required to, apply various heuristics. One example is to assume automatically that an address with no scheme name that begins with "www." should be regarded as beginning with "http://". Topic map processors may also take advantage of cataloging services and resources in order to establish whether or not two addresses are equivalent. This is an appropriate arena for competition between system vendors whose systems conform to Topicmaps.net's Processing Model for XTM 1.0.

During topic map processing, it may be necessary to apply the Subject-based Merging Rule repeatedly. This is because merging may also occur on the basis of the Name-based Merging Rule, and the effect of such merging may require further merging under the Subject-based Merging Rule. Note: And vice versa.

8.0 The Name-based Merging Rule

The "topic naming constraint" applies to all topic maps, and the Name-based Merging Rule is based on it. It can be expressed in terms of Topicmaps.net's Processing Model for XTM 1.0 in the following way:

No two t-nodes and/or a-nodes can have the same basename in the same topic namespace (i.e., the same scope). (To "have a basename" is to play the "topic" role in a "topic-basename" association in which the basename is the addressable subject (the subject constituting resource) of the topic that plays the "basename" role.
The scope of the "topic-basename" association is, in effect, a namespace consisting of all of the topic-basename associations that have that scope.)

The Name-based Merging Rule applies when, during topic map processing, two or more t-nodes (and/or a-nodes) are found to have the same basename in the same scope. Those nodes must then be merged into a single node, which becomes the only t-node or a-node that has that name in that scope (topic namespace).

Syntactically (i.e., within a <topicMap> element), each basename is the content of a <baseNameString> element.

Note: Remember, as with all other subject identity points, Topicmaps.net's Processing Model for XTM 1.0 does not define the nature of the connection, if any, between two things. One is the topic whose subject is the content of a <baseNameString> element, and that also plays the "basename" role in a "topic-basename" association. The other is the actual content of that <baseNameString> element.

In the topic map graph, the scope of a "topic-basename" association (i.e., the

Note: Even if no <scope> element specifies the scope of a characteristic assignment, the scope of that characteristic assignment in the topic map graph may nevertheless not be the unconstrained scope, on account of the impact of any applicable <mergeMap> elements.

variant
(See variant name.)

variant name
[Synonym: variant.] An alternative form of a basename, intended for use in a particular processing context, such as sorting or display. Variant names are not subject to the Name-based Merging Rule; they are not found in topic namespaces.
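Merging under one rule can enable merging under the other (see the end of section 7.0). A processor can therefore repeat both rules until nothing changes. The sketch below illustrates this under several assumptions; it is not a prescribed algorithm. It uses a union-find structure over node identifiers. It simplifies the Subject-based Merging Rule to "nodes sharing an already-normalized subject indicator merge", ignoring subject constituting resources. It takes basenames as (node, scope components, basename string) triples. It omits variant names, which, per the glossary entry above, are not subject to the Name-based Merging Rule. Scope components are resolved through the union-find. Merging two topics that serve as scope components can therefore make two scopes identical and trigger further name-based merges.

```python
class UnionFind:
    """Disjoint sets of node identifiers; each set is one merged node."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        # Keep the smaller identifier as representative, for deterministic output.
        self.parent[max(ra, rb)] = min(ra, rb)
        return True


def merge_until_stable(indicators, basenames):
    """indicators: {node: set of already-normalized subject indicator addresses}
    basenames:  list of (node, scope components, basename string) triples
    Returns the merged groups of original node identifiers, sorted."""
    uf = UnionFind()
    for node in indicators:
        uf.find(node)
    for node, _, _ in basenames:
        uf.find(node)
    changed = True
    while changed:
        changed = False
        # Subject-based rule (simplified): a shared indicator means one subject.
        first_owner = {}
        for node, addresses in indicators.items():
            for address in addresses:
                if address in first_owner:
                    changed |= uf.union(first_owner[address], node)
                else:
                    first_owner[address] = node
        # Name-based rule: the same basename in the same scope means one node.
        # Components are resolved via find(), so merged scope topics can make
        # two previously different scopes identical.
        name_owner = {}
        for node, components, name in basenames:
            key = (frozenset(uf.find(c) for c in components), name)
            if key in name_owner:
                changed |= uf.union(name_owner[key], node)
            else:
                name_owner[key] = node
    groups = {}
    for node in list(uf.parent):
        groups.setdefault(uf.find(node), set()).add(node)
    return sorted(sorted(group) for group in groups.values())
```

The loop terminates because every pass that reports a change reduces the number of groups by at least one.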