Extended code reference processing

DITA-OT provides additional code reference processing support beyond that which is mandated by the DITA specification. These extensions can be used to define character encodings or line ranges for use in code blocks.

Character set definition

DITA-OT supports defining the code reference target file encoding using the @format attribute. The supported format is:

format (";" space* "charset=" charset)?

If a character set is not defined, the system default character set will be used. If the character set is not recognized or supported, the DOTJ052E error is thrown and the system default character set is used as a fall-back.

<coderef href="unicode.txt" format="txt; charset=UTF-8"/>

Line range extraction

Code references can be limited to extract only a specified line range by defining the line-range pointer in the URI fragment. The format is:

uri ("#line-range(" start ("," end)? ")" )?

Start and end line numbers start from 1 and are inclusive. If the end range is omitted, the range ends on the last line of the file.

<coderef href="Parser.scala#line-range(5, 10)" format="scala"/>

Only lines from 5 to 10 will be included in the output.

RFC 5147

DITA-OT also supports the line position and range syntax from RFC 5147. The format for line range is:

uri ("#line=" start? "," end? )?

Start and end line numbers start from 0 and are inclusive and exclusive, respectively. If the start range is omitted, the range starts from the first line; if the end range is omitted, the range ends on the last line of the file. The format for line position is:

uri ("#line=" position )?

Position line number starts from 0.

<coderef href="Parser.scala#line=4,10" format="scala"/>

Only lines from 5 to 10 will be included in the output.

Line range by content

Instead of specifying line numbers, you can also select lines to include in the code reference by specifying keywords (or “tokens”) that appear in the referenced file.

DITA-OT supports the token pointer in the URI fragment to extract a line range based on the file content. The format for referencing a range of lines by content is:

uri ("#token=" start? ("," end)? )?

Lines identified using start and end tokens are exclusive: the lines that contain the start token and end token will be not be included. If the start token is omitted, the range starts from the first line in the file; if the end token is omitted, the range ends on the last line of the file.

Given a Haskell source file named fact.hs with the following content,

-- START-FACT
fact :: Int -> Int
fact 0 = 1
fact n = n * fact (n-1)
-- END-FACT
main = print $ fact 7

a range of lines can be referenced as:

<coderef href="fact.hs#token=START-FACT,END-FACT"/>

to include the range of lines that follows the START-FACT token, up to (but not including) the line that contains the END-FACT token. The resulting <codeblock> would contain the following lines:

fact :: Int -> Int
fact 0 = 1
fact n = n * fact (n-1)
Tip: This approach can be used to reference code samples that are frequently edited. In these cases, referencing line ranges by line number can be error-prone, as the target line range for the reference may shift if preceding lines are added or removed. Specifying ranges by line content makes references more robust, as long as the token keywords are preserved when the referenced resource is modified.