Types of Tests Used in Rules Archives - Stilo

Style Test

Eric Tubby — Mon, 15 Mar 2021 00:00:00 +0000

Migrate is typically used to convert documents from a highly-styled visual authoring environment to semantic XML. Documents therefore often have styles carefully applied to them such as Heading1, Heading2, Heading3 for titles at different levels. The style test is used to check the styling of an element.

Example: simple style test

Example: style test with a pattern

A style test accepts a pattern as its argument. A simple text string, as shown in the previous example, is a very simple pattern, but more complex patterns are allowed. The following rule will match text styled anything from Heading1 to Heading99 and apply the appropriate topic annotation. For example, Heading3 text would be annotated as p.title.topic(3). See Patterns for more information on patterns.
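
The rule itself is not reproduced here, so the following Python sketch only illustrates the idea. The pattern `Heading([1-9][0-9]?)` and the function name `topic_annotation` are assumptions made for illustration, and Python's `re` module stands in for Migrate's pattern engine.

```python
import re

# Assumed pattern: "Heading" followed by a number from 1 to 99.
HEADING = re.compile(r"Heading([1-9][0-9]?)")

def topic_annotation(style_name):
    """Return a p.title.topic(n) annotation string, or None if the style does not match."""
    # fullmatch mimics the way Migrate anchors patterns at both ends.
    m = HEADING.fullmatch(style_name)
    if m is None:
        return None
    return f"p.title.topic({m.group(1)})"

print(topic_annotation("Heading3"))     # p.title.topic(3)
print(topic_annotation("Heading100"))   # None
print(topic_annotation("Subheading3"))  # None
```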

The post Style Test appeared first on Stilo.

Property Test

Eric Tubby — Mon, 15 Mar 2021 00:00:00 +0000

A content element may have several properties. A style, in fact, is simply a collection of properties controlling the look to apply to text (e.g. font size, font weight, margins, etc.). Style information is not the only source of properties. FrameMaker markers and conditions (as in conditional text) also appear as properties of the element. All properties can be checked from a rule’s condition using the property test.

Example: property test

Logical Group Test

Eric Tubby — Mon, 15 Mar 2021 00:00:00 +0000

For more complex situations, you can create a condition with multiple parts. These parts are combined using logical conjunction, disjunction, and negation. Migrate provides only the following four possibilities, but they are sufficient.

test	description
all of the following are true	a logical and of the contained tests
some of the following are true	a logical or of the contained tests
none of the following are true	the negation of the logical or of the contained tests
not all of the following are true	the negation of the logical and of the contained tests
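
As a rough sketch of how the four groups behave, the Python functions below combine a list of already-evaluated test results. The function names are invented for illustration and are not part of Migrate.

```python
def all_of(results):
    """all of the following are true: logical and."""
    return all(results)

def some_of(results):
    """some of the following are true: logical or."""
    return any(results)

def none_of(results):
    """none of the following are true: no contained test is true."""
    return not any(results)

def not_all_of(results):
    """not all of the following are true: at least one contained test is false."""
    return not all(results)

results = [True, False]
print(all_of(results))      # False
print(some_of(results))     # True
print(none_of(results))     # False
print(not_all_of(results))  # True
```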

Example: And test

Example: And with nested or test

Content Test

Eric Tubby — Mon, 15 Mar 2021 00:00:00 +0000

A content test allows you to create a rule that checks the text of an element to see if it applies. The following variations are available:

test	description
contains	the element contains the indicated text anywhere within it
ends with	the element ends with the indicated text
equals	the element contains precisely the indicated text with nothing extra before or after
matches	the element’s content matches the provided pattern
starts with	the element begins with the indicated text

For more information on patterns, see Patterns.
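
The five variations can be approximated in Python as follows. This is a sketch only: the `content_test` helper and its dictionary are hypothetical, and `re.fullmatch` is used to imitate the anchored behaviour of Migrate patterns.

```python
import re

# Hypothetical mapping from test name to a Python equivalent.
CONTENT_TESTS = {
    "contains": lambda text, arg: arg in text,
    "ends with": lambda text, arg: text.endswith(arg),
    "equals": lambda text, arg: text == arg,
    "matches": lambda text, arg: re.fullmatch(arg, text) is not None,
    "starts with": lambda text, arg: text.startswith(arg),
}

def content_test(kind, text, arg):
    """Evaluate one content test against the text of an element."""
    return CONTENT_TESTS[kind](text, arg)

text = "Note: save your work"
print(content_test("starts with", text, "Note:"))  # True
print(content_test("equals", text, "Note:"))       # False
print(content_test("matches", text, r"Note:.*"))   # True
```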

Example: content starts with test

Context Test

Eric Tubby — Mon, 15 Mar 2021 00:00:00 +0000

The context tracking annotation may be used to track arbitrary contextual information about your document. The Migrate conversion does not use this annotation, but you can set context and then test it from other rules to help get annotations applied correctly. For more information on setting the context with annotations see Context Annotations.

Typically, context is used to track information that came earlier in the document and that you no longer have access to. For example, you may have decided that you are in a task topic based on the styling of a title. This information may be useful when attempting to apply annotations to the text of that topic. But the following paragraphs only have information about themselves, their styles and their properties. In order to remember that you are in a task topic, you can set some context and then test for it. The context you choose to set is entirely up to you. Migrate does not interpret it.

Don’t forget to clear or reset your context at the right time. If your task topic is finished, you want to make sure to clear that from your topic type context. If you can’t find a way to do this reliably, you probably should not make use of the context tracking mechanism.

You can also choose to track more than one type of context. For example, you can track both topic type and list nesting context. To keep these things separate, you organize your context into classes.

Each piece of context information has an associated label and depth. Topics in your source document may be structured hierarchically. You can use the label to capture the topic type, and the depth to capture the nesting level. These are the parameters for testing context.

parameter	description	required or optional
class	arbitrary class name used to distinguish different contexts tracked	required
label	a label describing the value of the context entry	required
depth	the depth of the context entry	optional

If the depth parameter is not provided, the deepest context entry for the specified class will be tested.
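
One way to picture context tracking is as a small store keyed by class, with a label recorded at each depth. The `ContextTracker` class below is a hypothetical Python model written for illustration. It is not how Migrate stores context internally, and the default depth of 1 is an assumption.

```python
class ContextTracker:
    """Illustrative model: class name -> {depth: label}."""

    def __init__(self):
        self._entries = {}

    def set(self, cls, label, depth=1):
        """Record a label for the given class at the given depth."""
        self._entries.setdefault(cls, {})[depth] = label

    def clear(self, cls):
        """Forget everything tracked for the given class."""
        self._entries.pop(cls, None)

    def test(self, cls, label, depth=None):
        """Test a label; with no depth, the deepest entry of the class is tested."""
        entries = self._entries.get(cls)
        if not entries:
            return False
        if depth is None:
            depth = max(entries)
        return entries.get(depth) == label

ctx = ContextTracker()
ctx.set("topic-type", "concept", depth=1)
ctx.set("topic-type", "task", depth=2)
ctx.set("list", "ordered", depth=1)   # a second, independent class

print(ctx.test("topic-type", "task"))              # True (deepest entry)
print(ctx.test("topic-type", "concept", depth=1))  # True
ctx.clear("topic-type")                            # the task topic is finished
print(ctx.test("topic-type", "task"))              # False
```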

Example: context test

Contains Element Test

Eric Tubby — Mon, 15 Mar 2021 00:00:00 +0000

The elements presented in the content panel are quite flat, but there is a certain amount of nesting possible. For example, a span element may appear within a paragraph element or nested within another span element. Sometimes, this nesting is significant in correctly identifying content to convert.

The contains element test allows a nested element to be tested. Thus, your rule can test outer and nested elements in combination. Your rule can also place annotations on either the outer element or the nested elements. By default, annotations are placed on the outer element, but you can provide a label for the nested elements and place annotations on them explicitly.

Example: contains element test

The following rule attempts to identify content that should be tagged as uicontrol. It is not enough for the term to be bold in the input document. However, bold text found in a paragraph styled as Step is determined to reliably identify a uicontrol.

Example: annotating both the inner and outer elements

The following applies an annotation to the paragraph as well as the nested span.

We want the paragraph to be annotated as a step regardless of whether or not there happen to be any uicontrols. So we need to make sure that the entire test does not fail because the contains element test fails. That is, we don’t really care whether the contains element test evaluates to true or false. That’s why we have placed the contains element test under a logical or-group that includes the literal true. This is a common idiom that you may find useful in your rules.
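
The idiom can be sketched in Python as follows. The dictionary layout for paragraphs and spans and the `annotate` function are assumptions made for illustration. Only the shape of the condition, a style test combined with an or-group containing the literal true, comes from the text above.

```python
def annotate(paragraph):
    """Return annotations for a paragraph, or None if the rule does not apply."""
    is_step = paragraph["style"] == "Step"
    bold_spans = [s for s in paragraph["spans"] if s.get("bold")]
    contains_bold = bool(bold_spans)

    # Or-group with the literal True: the nested test can never fail the rule.
    if not (is_step and (contains_bold or True)):
        return None

    return {
        "paragraph": "step",
        "spans": [(s["text"], "uicontrol") for s in bold_spans],
    }

with_bold = {"style": "Step", "spans": [{"text": "Click "}, {"text": "OK", "bold": True}]}
without_bold = {"style": "Step", "spans": [{"text": "Wait for the download."}]}
body = {"style": "Body", "spans": [{"text": "OK", "bold": True}]}

print(annotate(with_bold))     # {'paragraph': 'step', 'spans': [('OK', 'uicontrol')]}
print(annotate(without_bold))  # {'paragraph': 'step', 'spans': []}
print(annotate(body))          # None
```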

Patterns

Eric Tubby — Mon, 15 Mar 2021 00:00:00 +0000

Purpose

Patterns are used for matching text when the text to identify is not known exactly. If you know the exact text you are looking for, you can use a simple string comparison such as is equal to. If there are variations of the text you are interested in (e.g. optional parts), a simple text comparison is insufficient. This is where patterns play an important role in helping you identify content of interest.

Patterns are often used to identify content in the text flow of your document. But, you can also use them to match the value of any property or to match style names.

When you use a pattern in a rule condition’s data content test, the pattern is tested on the data content of all elements to which the rule applies. So, for paragraph-based rules, paragraph content is what the pattern is tested on. If you use patterns within a contains element test, then the data content is the content of the contained element.

Migrate patterns are based on the regular expression support of W3C XML Schema Datatypes.

Simple String Pattern

Do not use patterns for simple text comparisons. Migrate offers the following simple string comparisons:
- is equal to
- is not equal to
- starts with
- does not start with
- contains
- does not contain
- ends with
- does not end with

It is more efficient to use a simple comparison than a matches comparison.

Patterns in Migrate are anchored to the start and end of the content they are evaluated on. Thus the pattern B will not match the inputs AB, BC, or ABC. In particular, take care to account for white space at the start or end of your input, or a colon at the end, as appropriate. For example, the pattern Example will not match the input Example: because of the trailing colon.
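
Anchoring can be demonstrated with Python's `re.fullmatch`, which also requires the whole input to match. This is an approximation of the behaviour described above, not Migrate's own engine, and the helper name `anchored_match` is invented for illustration.

```python
import re

def anchored_match(pattern, text):
    """True only if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

for text in ["B", "AB", "BC", "ABC"]:
    print(text, anchored_match("B", text))
# B True
# AB False
# BC False
# ABC False

print(anchored_match("Example", "Example:"))             # False
# Allowing for surrounding white space and an optional trailing colon:
print(anchored_match(r"\s*Example:?\s*", " Example: "))  # True
```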

Alternatives in Patterns

A pattern can be used to match one of several specified alternatives. The pipe symbol | is used to separate the alternatives. For example, cat|dog will match either of the inputs cat or dog. Note that the pattern will not match catdog; it will only match one of the alternatives.
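
A quick check of this behaviour, again using Python's `re.fullmatch` as a stand-in for Migrate's anchored matching (the helper name is an assumption):

```python
import re

def whole_match(pattern, text):
    """True only if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

for text in ["cat", "dog", "catdog"]:
    print(text, whole_match("cat|dog", text))
# cat True
# dog True
# catdog False
```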

Quantifiers

Quantifiers are another mechanism for broadening the range of content identified by a pattern. Quantifiers allow you to identify repetition of content. The simplest case indicates that 0 or 1 repetitions are acceptable, that is, that the content is optional. Here is the complete list of quantifiers.

Quantifier	Description	Example
?	the content is optional and may appear 0 or 1 times	A? matches only the empty string or the string A
*	the content may appear 0 or more times	A* matches the empty string, or the strings A, AA, AAA etc.
+	the content may appear 1 or more times	A+ matches the strings A, AA, AAA etc.
{n,m}	the content must appear at least n times and at most m times	A{2,4} matches only the strings AA, AAA, AAAA
{n}	the content must appear exactly n times	A{4} matches only the string AAAA
{n,}	the content must appear at least n times	A{2,} matches the strings AA, AAA, AAAA etc.
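
The table can be checked with a short Python sketch that lists which runs of the letter A (up to five characters) each pattern accepts. The `accepted` helper is hypothetical. Python's `re` module happens to support the same six quantifiers.

```python
import re

def accepted(pattern, max_len=5):
    """Return the strings '', 'A', 'AA', ... (up to max_len) that the pattern fully matches."""
    candidates = ["A" * n for n in range(max_len + 1)]
    return [c for c in candidates if re.fullmatch(pattern, c)]

print(accepted("A?"))      # ['', 'A']
print(accepted("A+"))      # ['A', 'AA', 'AAA', 'AAAA', 'AAAAA']
print(accepted("A{2,4}"))  # ['AA', 'AAA', 'AAAA']
print(accepted("A{4}"))    # ['AAAA']
print(accepted("A{2,}"))   # ['AA', 'AAA', 'AAAA', 'AAAAA']
```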

The quantifiers *, +, and even ? are greedy. They will consume as much input as they can. Migrate extends the usual set of quantifiers with non-greedy versions for your convenience.

Quantifier	Description
??	non-greedy version of ?
*?	non-greedy version of *
+?	non-greedy version of +

Example: greedy vs non-greedy patterns

Pattern	Input	Explanation
(.*)(\d+)	abc123	successful match, but the first capture contains abc12 and the second 3
(.*?)(\d+)	abc123	successful match, but the first capture contains abc and the second 123
(1?)(\d+)	123	successful match, but the first capture contains 1 and the second 23
(1??)(\d+)	123	successful match, but the first capture contains nothing and the second 123
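
The four rows above can be reproduced in Python, whose `re` module uses the same notation for non-greedy quantifiers. This is an illustration in a different engine, not a statement about Migrate's implementation. The `captures` helper is invented for the example.

```python
import re

def captures(pattern, text):
    """Return the captured groups of an anchored match, or None if there is no match."""
    m = re.fullmatch(pattern, text)
    return m.groups() if m else None

print(captures(r"(.*)(\d+)", "abc123"))   # ('abc12', '3')
print(captures(r"(.*?)(\d+)", "abc123"))  # ('abc', '123')
print(captures(r"(1?)(\d+)", "123"))      # ('1', '23')
print(captures(r"(1??)(\d+)", "123"))     # ('', '123')
```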

Example: unanchored patterns

This example shows a very common technique for writing unanchored patterns. As previously mentioned, patterns are anchored to the start and end of the content they are tested on. If you are looking for some text that may appear anywhere inside this content, simply place .* before and after your pattern. This effectively means that anything can precede or follow the text you are interested in. To be precise, . does not match carriage return and newline characters, but this is not usually an issue. In Migrate, the content you’re matching against typically does not contain these characters, at least for the content of paragraph, span, image and title elements.

Pattern	Explanation
.*Ice Nine.*	matches content that contains the string “Ice Nine”
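
A small Python sketch of the technique follows. The helper `contains_pattern` is hypothetical, and it simply wraps a fragment in `.*` on both sides before applying an anchored match.

```python
import re

def contains_pattern(fragment, text):
    """Anchored match of .*fragment.* , which behaves like an unanchored search."""
    return re.fullmatch(".*" + fragment + ".*", text) is not None

sentence = "The lab kept a sample of Ice Nine in cold storage."
print(re.fullmatch("Ice Nine", sentence) is not None)  # False: anchored at both ends
print(contains_pattern("Ice Nine", sentence))          # True
```

A fragment that uses alternatives would need to be grouped in parentheses before being wrapped, otherwise the .* would attach only to the first and last alternative.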

Character Classes

A character class identifies a set of characters that a pattern should match. There are a few possibilities.

Description	Example
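explicit list of the characters to match	[aeiou] will match content consisting of a single lower case vowel
range of characters to match	[0-9] will match any (base 10) digit
complement of character class	[^aeiou] will match any single character that is not a lower case vowel (consonants, but also digits, punctuation and upper case letters)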
combination of two ranges	[a-zA-Z] will match any upper or lower case letter
combination of a range and explicit characters	[_:a-z] will match underscore, colon or any lower case letter
character class subtraction	[\S-[:-]] will match any non-white space character except for colons and dashes
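
The rows above can be tried out in Python. One caveat: Python's `re` module has no character class subtraction, so the last row is approximated here with the negated class `[^\s:\-]`, which is an assumption about an equivalent and not Migrate syntax.

```python
import re

def single(pattern, ch):
    """True if the one-character string ch is matched by the character class."""
    return re.fullmatch(pattern, ch) is not None

print(single("[aeiou]", "e"))    # True
print(single("[0-9]", "7"))      # True
print(single("[^aeiou]", "b"))   # True
print(single("[^aeiou]", "E"))   # True: the complement is not limited to consonants
print(single("[a-zA-Z]", "Q"))   # True
print(single("[_:a-z]", ":"))    # True

# Stand-in for [\S-[:-]]: any non-white-space character except colon and dash.
NOT_SPACE_COLON_DASH = r"[^\s:\-]"
print(single(NOT_SPACE_COLON_DASH, "x"))  # True
print(single(NOT_SPACE_COLON_DASH, ":"))  # False
```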

There are some builtin character classes. You can use these inside a character class definition (i.e. inside the square brackets) or outside.

Builtin Class	Description
\n	new line character (#xA)
\r	carriage return character (#xD)
\t	tab character (#x9)
.	anything except a newline or carriage return (i.e. [^\n\r])
\s	space, tab, newline or carriage return (i.e. [#x20\t\n\r])
\S	non-space character (i.e. [^\s])
\i	a letter, underscore or colon
\I	not a letter, underscore or colon (i.e. [^\i])
\d	same as [0-9]
\D	same as [^\d]
\w	common characters found in words, excludes punctuation and other separators
\W	same as [^\w]

Example: character class

Pattern	Explanation
ABC\tDEF	matches content that starts with ABC, is followed by tab, and then ends with DEF

Grouping

It is sometimes necessary to group contiguous parts of your pattern. For example, if a quantifier is intended to apply to more than one part of your pattern, you need to group these parts. This is done using parentheses.

Example: grouping

Pattern	Explanation
([A-Z][a-z]*)+	Matches camel-cased strings (e.g. ThisIsAVeryLongIdentifier). Note that the * applies just to the character class for lower case letters, but the + applies to the combination of the two grouped character classes.
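
The effect of the group can be seen by comparing the pattern with and without the outer parentheses and +. The sketch uses Python's `re.fullmatch` to imitate anchored matching, and the helper name is invented.

```python
import re

def whole_match(pattern, text):
    """True only if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

grouped = r"([A-Z][a-z]*)+"   # one or more capitalised words
ungrouped = r"[A-Z][a-z]*"    # exactly one capitalised word

print(whole_match(grouped, "ThisIsAVeryLongIdentifier"))    # True
print(whole_match(ungrouped, "ThisIsAVeryLongIdentifier"))  # False
print(whole_match(ungrouped, "This"))                       # True
print(whole_match(grouped, "thisStartsLowerCase"))          # False
```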

Metacharacters

As we have seen, some characters have special meaning when constructing patterns. If you want to refer to these characters literally, you need to escape them with a backslash. So, if you’d like to match a question mark, you must type \?. Here is the full list of metacharacters:

. \ ? * + { } ( ) [ ] |

Anonymous Captures

Patterns are useful for identifying relevant content. They can also be used to pick out some of the interesting parts for use. This ability to tease apart content is really very powerful when it comes to creating your rules. In Migrate, these content references can be used in annotation arguments.

Anonymous captures are created by surrounding parts of your patterns in parentheses. You can then reference the matched portions with the backslash notation: \1 for the first capture, \2 for the second, and so on. The captures are counted left-to-right within a pattern, and top-down if you have more than one pattern in your rule. It is the placement of these parentheses in your patterns that permits the matched content to be referenced in this way. This is why they are called captures — you are capturing content.

Remember that parentheses are also used for grouping, as described earlier. This does not cause trouble in practice. Just identify your captures by counting opening parentheses from left to right in your pattern.

Example: anonymous captures

Pattern	Use	Explanation
(.*)\.tiff	set-attribute(src=\1.jpg)	Change the extension of .tiff images to .jpg.
Version:\s+(\d+)\.(\d+)	p.map.product-version(version=\1;release=\2)	Extract the major and minor numbers from a version string in order to populate map metadata.
\((\d{3})\)\s*\d{3}-\d{4}	prolog.meta.othermeta((area code)(\1))	Extract the area code of a phone number and place it in prolog metadata. Note that in this example the pattern had to match literal parentheses. These have been escaped by a backslash. When escaped, the parentheses lose their special meaning for grouping and indicating captures. The unescaped parentheses around \d{3} create the capture that \1 refers to.
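
To see how capture references are filled in, the sketch below uses Python's `Match.expand`, which substitutes \1, \2 and so on into a template string. The `apply_rule` helper is hypothetical, and the sample inputs are made up. The phone pattern wraps the area-code digits in an unescaped pair of parentheses so that \1 has a capture to refer to.

```python
import re

def apply_rule(pattern, use, text):
    """Match text against an anchored pattern and fill capture references into the use string."""
    m = re.fullmatch(pattern, text)
    return m.expand(use) if m else None

print(apply_rule(r"(.*)\.tiff", r"set-attribute(src=\1.jpg)", "diagram.tiff"))
# set-attribute(src=diagram.jpg)

print(apply_rule(r"Version:\s+(\d+)\.(\d+)",
                 r"p.map.product-version(version=\1;release=\2)",
                 "Version: 4.2"))
# p.map.product-version(version=4;release=2)

print(apply_rule(r"\((\d{3})\)\s*\d{3}-\d{4}",
                 r"prolog.meta.othermeta((area code)(\1))",
                 "(613) 555-0123"))
# prolog.meta.othermeta((area code)(613))
```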

Named Captures

If you like, you can name your captures in your patterns. Doing so means that you won’t have to worry about counting your groups. This is also more robust because the count can be thrown off if you add a capture to the rule at some later time. A meaningful name also indicates the purpose of the capture more clearly to others working with the rule set.

You name a capture by placing the name in curly braces immediately after the opening parenthesis of the capture. You reference the capture with a backslash followed by the capture name in braces.

Example: named captures

Pattern	Use
Version:\s+({major}\d+)\.({minor}\d+)	p.map.product-version(version=\{major};release=\{minor})
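
Python writes named captures differently, as `(?P<name>...)` in the pattern and `\g<name>` in the template. The sketch below translates the Migrate notation shown above into the Python notation so that the example can be run. Both translation functions are hypothetical helpers that handle only simple cases (for instance, they do not check whether the opening parenthesis is itself escaped).

```python
import re

def to_python_pattern(migrate_pattern):
    """Rewrite ({name}...) as (?P<name>...)."""
    return re.sub(r"\(\{(\w+)\}", r"(?P<\1>", migrate_pattern)

def to_python_template(migrate_use):
    r"""Rewrite \{name} as \g<name>."""
    return re.sub(r"\\\{(\w+)\}", r"\\g<\1>", migrate_use)

pattern = to_python_pattern(r"Version:\s+({major}\d+)\.({minor}\d+)")
use = to_python_template(r"p.map.product-version(version=\{major};release=\{minor})")

m = re.fullmatch(pattern, "Version: 4.2")
print(m.group("major"), m.group("minor"))  # 4 2
print(m.expand(use))                       # p.map.product-version(version=4;release=2)
```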

Character Class Escapes

See the W3C XML Schema Datatypes specification for more details on character class escapes.

On

Eric Tubby — Mon, 15 Mar 2021 00:00:00 +0000

The annotation qualifier on is used to indicate which element will receive an annotation when there is more than one possibility.

Overview

The on qualifier may be specified for any annotation. It is used to specify which element is to receive the annotation when there is more than one possibility. Recall that it is the rule’s condition which identifies elements in your content for the purposes of annotation. But, sometimes, there is more than one element identified by the condition. This is when you need to use on.

The uses of the on qualifier are twofold:

- in conjunction with the contains element test
- to annotate spans created by parsing data content with regular expressions

The type ahead feature of the annotation entry box is sensitive to the content element type. The list of available annotations differs for paragraphs, spans, tables, etc. So, it is a good idea to first set the on qualifier if you need it, and then enter the annotation.

Use with contains element test

A contains element test, by its nature, identifies more than one element – an outer element, and one or more nested elements. The on qualifier can be used on an annotation to indicate that it is the inner elements (all the matching ones), rather than the outer element, which should receive the annotation.

Span creation with patterns

Patterns may be used to split up textual content and create span content elements around the various portions that have been teased apart. The purpose of this, of course, is to allow your rule to place annotations on the newly created spans. Use the on qualifier on annotations that are meant to apply to these created spans.
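
As a loose illustration of the idea, the Python sketch below treats each capture in a pattern as a candidate span and records its text and position. The function name, the dictionary layout and the sample sentence are all assumptions, and the exact way Migrate derives spans from a pattern may differ.

```python
import re

def spans_from_captures(pattern, text):
    """Return one span description per capture that took part in an anchored match."""
    m = re.fullmatch(pattern, text)
    if m is None:
        return []
    return [
        {"capture": i, "text": m.group(i), "start": m.start(i), "end": m.end(i)}
        for i in range(1, m.re.groups + 1)
        if m.group(i) is not None
    ]

for span in spans_from_captures(r"Press (\w+) to (\w+)\.", "Press Enter to continue."):
    print(span)
# {'capture': 1, 'text': 'Enter', 'start': 6, 'end': 11}
# {'capture': 2, 'text': 'continue', 'start': 15, 'end': 23}
```

An annotation qualified with on would then be directed at one of these created spans instead of at the enclosing paragraph.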
