XML Format for Paint-by-Number Puzzles

Version 0.3

Jan Wolter

Introduction

"Paint-By-Number" is one of many names for a type of graphical logic puzzle originating in Japan. Other common names for these puzzles are Nonograms, Griddlers, Hanjie, and Picross. The puzzles consist of a blank grid with numbers along the top and one side. The numbers give the sizes of the blocks of the foreground color in that row or column. To solve the puzzle, you must reconstruct the image by figuring out how the color blocks must be placed in each row and column.

One place to find examples of these puzzles is my web site, webpbn.com. That site offers a tool that allows one to export any puzzle in any of several formats. Many other sites provide puzzles to solve on-line (e.g., griddlers.net), and various books and magazines publish these puzzles regularly. Quite a few people have also written programs that allow puzzles to be solved on a home computer or a PDA, and various other people have written programs to solve puzzles automatically. Steve Simpson's Nonogram site includes a good collection of links to paint-by-number resources.

In surveying these, I noticed the lack of a good, standardized file format for paint-by-number puzzles. Every program seems to invent its own file format. So I decided to come up with an XML file format that would be rich enough to handle most puzzle-storing applications.

I offer this with some trepidation, as XML formats are typically the subject of endless bickering, and I have very little experience with designing them, but I think this is OK and I presume future versions will be better.

The present format has the following charms to recommend it:

Obviously not every program that reads or writes these file formats needs to support all features. None of my software, for example, supports triddlers.

Another XML format for paint-by-number puzzles has been described by Steve Simpson. He designed his own "so that a future solver will be able to deal with multicoloured and non-rectangular puzzles," which is a bit perplexing, since my format has always supported both those variations. However, his format is fine and if there weren't multiple incompatible XML formats for the same thing, then it wouldn't be XML.

Example

Below is a sample XML paint-by-number puzzle description file. This is a simple two-color puzzle. These files can contain multiple puzzles, but this example contains just one. This example includes both the clues for the puzzle and the intended solution. It would be possible to omit either the clues (the two <clues> tags and all their content) or the solution (the <solution> tag and all its content).

    <?xml version="1.0"?>
    <!DOCTYPE pbn SYSTEM "https://webpbn.com/pbn-0.3.dtd">

    <puzzleset>

    <puzzle type="grid" defaultcolor="black">

    <source>webpbn.com</source>
    <id>#1</id>
    <title>Sample Puzzle</title>
    <author>Jan Wolter</author>
    <authorid>jan</authorid>
    <copyright>&copy; 2004 by Jan Wolter</copyright>
    <description>
    A dancing stick figure man.
    </description>

    <color name="white" char=".">fff</color>
    <color name="black" char="X">000</color>

    <clues type="columns">
    <line><count>2</count><count>1</count></line>
    <line><count>2</count><count>1</count><count>3</count></line>
    <line><count>7</count></line>
    <line><count>1</count><count>3</count></line>
    <line><count>2</count><count>1</count></line>
    </clues>

    <clues type="rows">
    <line><count>2</count></line>
    <line><count>2</count><count>1</count></line>
    <line><count>1</count><count>1</count></line>
    <line><count>3</count></line>
    <line><count>1</count><count>1</count></line>
    <line><count>1</count><count>1</count></line>
    <line><count>2</count></line>
    <line><count>1</count><count>1</count></line>
    <line><count>1</count><count>2</count></line>
    <line><count>2</count></line>
    </clues>

    <solution type="goal">
    <image>
    |.XX..|
    |.XX.X|
    |..X.X|
    |.XXX.|
    |X.X..|
    |X.X..|
    |..XX.|
    |.X.X.|
    |.X.XX|
    |XX...|
    </image>
    </solution>

    </puzzle>
    </puzzleset>
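
As a rough illustration of how a program might read such a file, the sketch below uses Python's standard xml.etree.ElementTree module to pull the clue numbers out of a cut-down copy of the sample. The function name read_clues is made up for this example. The DOCTYPE line and the &copy; entity are left out of the embedded text on the assumption that the standard-library parser will not fetch an external DTD and so would not know what &copy; means.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the sample above: metadata, colors and solution omitted.
SAMPLE = """<?xml version="1.0"?>
<puzzleset>
<puzzle type="grid" defaultcolor="black">
<title>Sample Puzzle</title>
<clues type="columns">
<line><count>2</count><count>1</count></line>
<line><count>2</count><count>1</count><count>3</count></line>
<line><count>7</count></line>
<line><count>1</count><count>3</count></line>
<line><count>2</count><count>1</count></line>
</clues>
<clues type="rows">
<line><count>2</count></line>
<line><count>2</count><count>1</count></line>
<line><count>1</count><count>1</count></line>
<line><count>3</count></line>
<line><count>1</count><count>1</count></line>
<line><count>1</count><count>1</count></line>
<line><count>2</count></line>
<line><count>1</count><count>1</count></line>
<line><count>1</count><count>2</count></line>
<line><count>2</count></line>
</clues>
</puzzle>
</puzzleset>
"""


def read_clues(puzzle):
    """Map each clue-set type to a list of lines, each a list of block lengths."""
    clues = {}
    for clue_set in puzzle.findall("clues"):
        lines = []
        for line in clue_set.findall("line"):
            lines.append([int(count.text) for count in line.findall("count")])
        clues[clue_set.get("type")] = lines
    return clues


root = ET.fromstring(SAMPLE)      # the <puzzleset> element
puzzle = root.find("puzzle")      # first (and only) puzzle in the set
clues = read_clues(puzzle)
print(len(clues["columns"]), "columns,", len(clues["rows"]), "rows")
```

This ignores the color= attribute on <count> tags, so it is only adequate for two-color puzzles like the sample.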

Details

A DTD file for this format is available from https://webpbn.com/pbn-0.3.dtd.

The character entities that may appear in the file are identical to those that may be used in HTML files.

In my experience, character entities in XML files are a pain. I recommend not using them if possible.

The following tags appear in the document:

<puzzleset>
The root tag for the document. Should be included even if there is only one puzzle.

<puzzle>
One puzzle in the set of puzzles. Appears inside <puzzleset>. Attributes are:
type=
puzzle type. Defaults to "grid" if omitted. Current legal values are:
"grid"
Cells are square and there will be a set of row clues and a set of column clues.

"triddler"
Cells are triangular and the puzzle is a big hexagon, with clues along six sides.
Other values may be defined in the future.

defaultcolor=
The name of the color to use for any <count> tag that does not include a color= attribute. If this is not defined, the default default color is "black".

backgroundcolor=
The name of the background color. This defaults to "white".

<source>
Where does the puzzle come from? This string should be different for each possible publisher of puzzles. Appears inside <puzzle> and/or <puzzleset>. If placed in the <puzzleset> this is the default source for all puzzles that don't define a source. May be omitted.

<id>
An identifier for the puzzle that is unique within the source. Syntax depends on the source. Can only appear in a <puzzle> tag, not in a <puzzleset> tag. May be omitted.

<title>
In a <puzzle>, this is the title of the puzzle. In a <puzzleset>, this is the title of the collection of puzzles. May be omitted in either place.

<author>
In a <puzzle>, this is the name of the author of the puzzle. If placed in a <puzzleset> tag, this is the default author name for all puzzles that don't define an author name. May be omitted.

<authorid>
In a <puzzle>, this is an identifier for the author of the puzzle which is unique within the source. Syntax depends on the source. If placed in a <puzzleset> tag, this is the default author id for all puzzles that don't define an author id. May be omitted.

<copyright>
In a <puzzle>, this is a copyright message for the puzzle. If placed in the <puzzleset> tag, this is the default copyright message for all puzzles that don't define a copyright message. May be omitted.
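
The fallback rule shared by <source>, <author>, <authorid> and <copyright> (a value in the <puzzleset> is the default for puzzles that do not define their own) could be implemented roughly as follows. This is a sketch only: the helper name inherited_text and the second author name in the test document are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Tags whose <puzzleset> value acts as a default for each <puzzle>.
INHERITED = ("source", "author", "authorid", "copyright")


def inherited_text(puzzleset, puzzle, tag):
    """Return the puzzle's own value for tag, else the set-level default, else None."""
    if tag not in INHERITED:
        raise ValueError(f"<{tag}> is not inherited from <puzzleset>")
    for parent in (puzzle, puzzleset):
        node = parent.find(tag)          # direct children only
        if node is not None and node.text is not None:
            return node.text.strip()
    return None


# Hypothetical two-puzzle set; only the second puzzle names its own author.
DOC = """<puzzleset>
<source>webpbn.com</source>
<author>Jan Wolter</author>
<puzzle><id>#1</id></puzzle>
<puzzle><id>#2</id><author>Someone Else</author></puzzle>
</puzzleset>"""

root = ET.fromstring(DOC)
first, second = root.findall("puzzle")
print(inherited_text(root, first, "author"), "/", inherited_text(root, second, "author"))
```

<title> is deliberately not in the list, since a <title> in a <puzzleset> names the collection rather than supplying a default for its puzzles.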

<description>
A description of the puzzle. The sort of thing you might want to display to the solver after they have solved the puzzle. Can only appear in a <puzzle> tag. May be omitted.

<color>
Defines a color name used in this puzzle. Must be in a <puzzle> tag. There may be multiple color definitions for a puzzle. Each color definition must have a name= attribute.

name=
The name of the color. This can be any text string.

char=
A one-character representation for the color. This is used to represent the color in solutions. May be omitted, especially if there are no solutions. White space characters are not legal. Neither are the characters '\', '/', '|', '?', '[' or ']'.

The content of the color tag is a color value, typically an RGB color code. This is usually a 3- or 6-digit hexadecimal number, like "3cc" or "210fbe". A three-digit string like "123" is equivalent to the six-digit string "112233".

For puzzles with triangles, the value can contain two hexadecimal color codes, separated by a "/" or "\", like "000/fff" or "ffffff\000000".
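
A reader might normalize these values along the following lines. This is a minimal sketch that handles only the hexadecimal forms described here (the text says "typically", so other kinds of value may exist and would be rejected by this code). The function names expand_hex and parse_color_value are my own.

```python
HEX_DIGITS = "0123456789abcdef"


def expand_hex(code):
    """Return a 6-digit lowercase hex code, doubling each digit of a 3-digit one."""
    code = code.strip().lower()
    if len(code) == 3:
        code = "".join(digit * 2 for digit in code)
    if len(code) != 6 or any(digit not in HEX_DIGITS for digit in code):
        raise ValueError(f"not a 3- or 6-digit hex color: {code!r}")
    return code


def parse_color_value(value):
    """Return (codes, separator).

    A plain color gives one code and separator None. A two-triangle color
    gives two codes and the "/" or "\\" that separated them.
    """
    for separator in ("/", "\\"):
        if separator in value:
            first, second = value.split(separator, 1)
            return [expand_hex(first), expand_hex(second)], separator
    return [expand_hex(value)], None


print(expand_hex("123"))
print(parse_color_value("000/fff"))
```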

Two color names are predefined. (Because of this, the color declarations in the sample are redundant and could be omitted.)

Name    Value   Char
black   000     X
white   fff     .

Any other color used in the puzzle must be defined by a <color> tag.

<clues>
A set of clues used in the puzzle. Must be in a <puzzle> tag. The "type=" attribute is required. For a puzzle of type "grid" we expect to see a set of clues with type "columns" and a set of clues of type "rows".

For a puzzle of type "triddler" we expect to see six sets of clues, with types "top", "topright", "bottomright", "bottom", "bottomleft" and "topleft". This labeling assumes that the puzzle is oriented so that there are horizontal lines separating cells, but not vertical lines, and that the horizontal clues are at the left of the puzzle, like this:

	        /   / 2 /
	       / 3 / 1 /
         ___  ________  3 /
      1 1 1  /\  /\  /\  /
       ___  /__\/__\/__\
      2 3  /\  /\  /\  /
     ___  /__\/ _\/__\/  \
       1  \  /\  /\  /  2
       ___ \/__\/__\/  \
                      3 \
	     \ 1 \ 2 \
	      \   \ 1 \
     
The "topleft" and "bottomleft" clues are clues for horizontal rows of cells above and below the bend on the left side of the puzzle. The "top" and "topright" clues are for lines in the / direction. The "bottom" and "bottomright" clues are for lines in the \ direction. It is possible for some clue-sets to be empty (if the puzzle has a sharp corner).

The puzzle above would be represented like:

     <clues type="topleft">
     <line><count>1</count><count>1</count><count>1</count></line>
     <line><count>2</count><count>3</count></line>
     </clues>

     <clues type="bottomleft">
     <line><count>1</count></line>
     </clues>

     <clues type="top">
     <line><count>3</count></line>
     <line><count>2</count><count>1</count></line>
     </clues>

     <clues type="topright">
     <line><count>3</count></line>
     </clues>

     <clues type="bottom">
     <line><count>1</count></line>
     <line><count>2</count><count>1</count></line>
     </clues>

     <clues type="bottomright">
     <line><count>3</count></line>
     <line><count>2</count></line>
     </clues> 

<line>
Each clue set contains one or more lines of clues. Line tags must be in a <clues> tag. The order of the lines within the clue set is from top to bottom for the horizontal clues, and from left to right for vertical or diagonal clues.

<count>
Each line contains zero or more counts. Count tags must be in a <line> tag. The order of the counts is from left to right for horizontal lines, and from top to bottom for vertical or diagonal lines. The content of a count tag is normally a positive integer, which gives the length of a block. Zero values are used to indicate blotted clues. Counts have an optional attribute called color:
color=
The color for the clue. This can be any color name defined by a <color> tag. If the color= attribute is omitted, then it defaults to the value set by the defaultcolor= attribute on the <puzzle> tag.
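
The two-level default (color= on the <count>, then defaultcolor= on the <puzzle>, then "black") might be resolved like this. It is a sketch under assumptions: the helper name line_clues and the colors "red" and "green" in the test fragment are invented, and a real file would also need <color> tags defining them.

```python
import xml.etree.ElementTree as ET


def line_clues(puzzle, line):
    """Return (length, color name) pairs for one <line>, applying the defaults."""
    default = puzzle.get("defaultcolor", "black")
    return [
        (int(count.text), count.get("color", default))
        for count in line.findall("count")
    ]


# Hypothetical fragment: the first count relies on the puzzle's defaultcolor.
PUZZLE = ET.fromstring(
    '<puzzle defaultcolor="red">'
    '<clues type="rows">'
    '<line><count>2</count><count color="green">1</count></line>'
    '</clues>'
    '</puzzle>'
)
line = PUZZLE.find("clues/line")
print(line_clues(PUZZLE, line))
```

A zero length comes through unchanged, so the caller can treat it as a blotted clue.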

<solution>
A <puzzle> tag may contain solutions. Each solution can have a type= attribute which can take one of the following values:

type=goal
This is the goal solution intended by the designer of the puzzle. If a solver reaches this solution, then the puzzle is considered "solved". Well-designed puzzles typically have only one goal solution, but the file format allows for multiple goal solutions, in which case reaching any goal would constitute "success". This is the default if no type attribute is given.

type=solution
This is a solution, but not necessarily a goal. If a solving program found 16 solutions to a puzzle, it might write them each out with type="solution".

type=saved
This is a saved solution to the puzzle. It is not necessarily correct or complete. If you had partially solved the puzzle, and wanted to save it for further work later, then that would be written with type="saved".

Solutions may also have an "id=" attribute, which can be used to label particular solutions. They can contain <image> and <note> tags.
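
Picking out solutions by type, with a missing type= treated as "goal", could look something like this. The function name solutions_of_type, the tiny 2x2 images and the id value "attempt-1" are all made up for the example.

```python
import xml.etree.ElementTree as ET


def solutions_of_type(puzzle, wanted="goal"):
    """Return the <solution> elements of one type; no type= counts as "goal"."""
    return [
        solution
        for solution in puzzle.findall("solution")
        if solution.get("type", "goal") == wanted
    ]


# Hypothetical puzzle fragment with one solution of each kind.
PUZZLE = ET.fromstring("""<puzzle>
<solution><image>|X.||.X|</image></solution>
<solution type="saved" id="attempt-1"><image>|X.||..|</image></solution>
<solution type="solution"><image>|.X||X.|</image></solution>
</puzzle>""")

goals = solutions_of_type(PUZZLE)
saved = solutions_of_type(PUZZLE, "saved")
print(len(goals), "goal,", len(saved), "saved")
```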

<image>
Each <solution> tag contains exactly one <image> tag. The <image> tag gives that solution as a string of characters. All white space characters (space, tab, newline, carriage return, line feed) are ignored. All other characters must be either:

The latter two forms are acceptable only in type="saved" puzzles, not in type="goal" or type="solution" puzzles.

Probably to be really XMLy, there should be some fancy substructure to this tag, but I felt it was simpler just to have it contain an image of the puzzle.

For grid type puzzles, the solution is given row-by-row. Each row starts and ends with a | character. There may be line-feeds separating the rows, but there need not be. The solution in the sample above could equally well be given as:

	<solution type="goal">
	<image>
	|.XX..||.XX.X||..X.X||.XXX.||X.X..||X.X..||..XX.||.X.X.||.X.XX||XX...|
	</image>
	</solution>
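
To show that the two layouts really are interchangeable, here is a minimal sketch of reading a grid image and recomputing the clues from it. It assumes every cell is a single color character, so it would not cope with the extra forms allowed in type="saved" images. The names parse_grid_image and block_lengths are my own.

```python
from itertools import groupby


def parse_grid_image(text):
    """Split the content of an <image> tag for a grid puzzle into row strings."""
    compact = "".join(text.split())      # white space carries no meaning
    if len(compact) < 2 or compact[0] != "|" or compact[-1] != "|":
        raise ValueError("each row must start and end with '|'")
    rows = compact[1:-1].split("||")     # "||" is one row's end plus the next row's start
    if any("|" in row for row in rows):
        raise ValueError("unbalanced '|' row markers")
    if len({len(row) for row in rows}) != 1:
        raise ValueError("rows differ in length")
    return rows


def block_lengths(cells, background="."):
    """Return the lengths of the runs of non-background cells, in order."""
    return [len(list(run)) for char, run in groupby(cells) if char != background]


IMAGE = """
|.XX..|
|.XX.X|
|..X.X|
|.XXX.|
|X.X..|
|X.X..|
|..XX.|
|.X.X.|
|.X.XX|
|XX...|
"""

rows = parse_grid_image(IMAGE)
row_clues = [block_lengths(row) for row in rows]
column_clues = [block_lengths(column) for column in zip(*rows)]
print(row_clues)
print(column_clues)
```

Comparing the recomputed lists with the <clues> in the file is a cheap consistency check for a two-color puzzle. For multicolor puzzles the lengths alone are not enough, since each clue also carries a color.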
     
For triddlers the solution is also stored row-by-row, but the line starting and line ending characters are / or \ depending on the slope of the edge of the puzzle. Basically if the puzzle looks like this:
	      ________
	     /\ B/\D /\
	    /A_\/C_\/E_\
	   /\G /\I /\K /
	  /F_\/H_\/J_\/
	  \L /\ N/\ P/
	   \/_M\/_O\/ 
then we save it like this (except, of course, that the letters are replaced by whatever symbol indicates the color for that cell):
         /ABCDE\
        /FGHIJK/
        \LMNOP/ 
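
Because "/" and "\" are not legal color characters, a triddler image can be cut into rows by treating those characters as alternating row-start and row-end markers. The sketch below does that with a regular expression. The function name parse_triddler_image is invented, and, like the grid case, it assumes one character per cell.

```python
import re

# One row: an edge marker, any number of cell characters, another edge marker.
ROW = re.compile(r"([/\\])([^/\\]*)([/\\])")


def parse_triddler_image(text):
    """Return a list of (left edge, cells, right edge) tuples, one per row."""
    compact = "".join(text.split())      # white space carries no meaning
    rows = []
    position = 0
    while position < len(compact):
        match = ROW.match(compact, position)
        if match is None:
            raise ValueError(f"expected a row starting at offset {position}")
        rows.append(match.groups())
        position = match.end()
    return rows


# The lettered example above, kept as a raw string so "\" is taken literally.
IMAGE = r"""
 /ABCDE\
/FGHIJK/
\LMNOP/
"""

for left, cells, right in parse_triddler_image(IMAGE):
    print(left, cells, right)
```

The edge markers are kept in the result because they record the slope of the puzzle's edge on that row, which a program would need in order to lay the cells out again.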

<note>
Notes are always optional. They can appear in <puzzleset>, <puzzle> or <solution> tags.

Release History

Version 0.1 - Jul 26, 2007
Original release.

Version 0.2 - Jan 14, 2009
Added <authorid> tag.

Version 0.3 - Jan 20, 2009
Generalized use of <note> tag.