Motion Data
The aim of the project was to incorporate functionality to view and edit motion files, so here is a brief description of the files used in motion capture systems.
Before the process of parsing and interpreting motion capture files can begin, certain tools need to be in place. If you spend a little time creating these tools you will find them extremely useful in all your file IO work. Complete code examples are not included here, but from these descriptions it should be a simple matter to construct them.
Most motion capture files are ASCII (the only one that isn't is the tracking format C3D), so the creation of a good line parsing routine should be your first project. I have a function called parse_string() which takes the string as input and returns an argv, argc list just like the one you get from the main() function. This routine separates the tokens on a line much as a command shell does: the characters "{}[](),;" are always returned as separate tokens and everything else is space delimited. I've never needed any more than this to parse a motion capture file.
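A minimal sketch of such a tokenizer is given below, in Python rather than C. It is not the author's routine: the name parse_string() is borrowed from the text, and the result is a plain list, so len() of the list plays the role of argc.

```python
SEPARATORS = "{}[](),;"

def parse_string(line):
    """Split a line into tokens.

    Each separator character becomes a token of its own;
    everything else is split on whitespace.
    """
    tokens = []
    current = []
    for ch in line:
        if ch in SEPARATORS:
            if current:
                tokens.append("".join(current))
                current = []
            tokens.append(ch)
        elif ch.isspace():
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens
```

For example, the line "limits (-160.0 20.0)" comes back as five tokens: the keyword, the opening parenthesis, the two numbers and the closing parenthesis.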
There are a number of references you can use to help you create the math routines that you will need to handle the motion capture data effectively. At various times you will need to represent rotation data as a matrix, a quaternion or as Euler's angles, so you will need to create conversion routines between these different representations. You will also need to construct transformation matrices from translation, rotation and scale values, to multiply matrices together, to take the inverse of a matrix, to multiply a vector by a matrix and to decompose a matrix into its constituent parts of translation, rotation and scale. Graphics Gems II, Ken Shoemake's paper on quaternions, and most any book on graphics should provide enough source material for creating these routines.
The nomenclature I use when writing math expressions is left to right. For example:
v' = v M
Here the vector, v, is on the left of the matrix, M, which transforms it. Saying this now avoids confusion, particularly when discussing Euler's angles. When you encounter rotation data expressed as Euler's angles you must know the order in which the rotations are applied. This is often simply stated as "XYZ" order or "YZX" order; however, this isn't enough. You must also know if the vector is on the left or the right of the transformation matrix. If X, Y and Z represent rotation matrices with just a single rotation about the given axis, then the composite matrix M might be composed as follows:
v M = v XYZ
or using a right to left notation you might have
M v = XYZ v
These two equations give entirely different results but can both be called "XYZ" rotation order.
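The difference can be checked numerically. The sketch below assumes right-handed rotations with angles in degrees. The helper names are hypothetical, and the single-axis matrices are written for column vectors and transposed for the row-vector case.

```python
import math

def rot_x(deg):
    """Rotation about X for column vectors (v' = M v)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(deg):
    """Rotation about Y for column vectors."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(deg):
    """Rotation about Z for column vectors."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def transpose(m):
    return [list(row) for row in zip(*m)]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def xyz_left_to_right(v, ax, ay, az):
    """v' = v X Y Z with a row vector: X is applied first."""
    X, Y, Z = (transpose(r) for r in (rot_x(ax), rot_y(ay), rot_z(az)))
    m = mat_mul(mat_mul(X, Y), Z)
    return [sum(v[k] * m[k][j] for k in range(3)) for j in range(3)]

def xyz_right_to_left(v, ax, ay, az):
    """v' = X Y Z v with a column vector: Z is applied first."""
    m = mat_mul(mat_mul(rot_x(ax), rot_y(ay)), rot_z(az))
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]
```

With v = (1, 0, 0) and rotations of 90, 90 and 0 degrees about X, Y and Z, the left to right form gives (0, 0, -1) and the right to left form gives (0, 1, 0), up to rounding, even though both are "XYZ" order.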
Keep in mind that for the Euler's angles representation many different rotation values can give the same result; for example, 0 degrees and 360 degrees give you the same rotation. The same is true for quaternions: a quaternion and its negative (where each component is negated) give you the same rotation. Only with the matrix representation do you get a one-to-one mapping of rotation to representation.
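The quaternion case is easy to see from the standard quaternion to matrix conversion, sketched below for a unit quaternion (w, x, y, z). The function name is hypothetical and the matrix is laid out for the row-vector convention v' = v M used in this text. Every entry is built from products of two components, so negating all four components changes nothing.

```python
def quat_to_matrix(w, x, y, z):
    """3x3 rotation matrix for a unit quaternion, row-vector convention."""
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y + w * z), 2 * (x * z - w * y)],
        [2 * (x * y - w * z), 1 - 2 * (x * x + z * z), 2 * (y * z + w * x)],
        [2 * (x * z + w * y), 2 * (y * z - w * x), 1 - 2 * (x * x + y * y)],
    ]

# A quaternion and its negative produce the same matrix.
q = (0.5, 0.5, 0.5, 0.5)          # 120 degrees about the (1, 1, 1) axis
negated = tuple(-c for c in q)
same = quat_to_matrix(*q) == quat_to_matrix(*negated)
```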
All elements of a motion capture file have to be referenced in some fashion. There are usually names given to each element, but that isn't always the case. Sometimes the elements are merely numbered. Use the names whenever possible to store and reference your data. Often the order of appearance of names in a header section will be different from the order in the motion section of a file, so do not assume that they will be listed the same way. If you allow for this you will have far fewer problems when you substitute the motion data from one file for the motion data from another file while keeping the same skeleton for each.
Sample rate information is given as either the number of samples per second or as the length of time between each sample. Formats which are used by optical motion capture systems always have a constant rate of sampling for their data, generally 30 or 60 samples a second. Some optical systems can achieve even higher rates of sampling. You can assume that the rate of sampling in these cases doesn't vary. This is not true for magnetic motion capture systems. The timing of data can, and usually does, change throughout the data file if you are handling the raw data as it has been acquired from the motion capture system. For older magnetic systems it's even possible for each sensor to provide different rates of data from the other sensors in the same motion capture session.
Some of the file formats will provide units information to indicate how the values should be interpreted. If you like, you can ignore the units for translation data since the difference between one set of units and another is simply a matter of scale. However, you cannot ignore rotation units because you must know the unit type before applying your rotation calculations. If the units are not given for rotation you can be comfortably certain that the rotations are expressed in degrees.
Character Skeleton Definitions
The skeleton of a character is typically defined as a hierarchy of segments. Each segment has a parent segment (except for segments known as "roots") and possibly one or more child segments. The motion of a child segment depends on the motion of its parent segment; the transformation data of the child relative to its parent is known as the local transform. The global transform of the child is the motion of the child relative to the global coordinate system. The motion for a segment consists of translation, rotation and scale values. A local transformation matrix for a segment is created by first constructing a separate matrix for each of the translation, rotation and scale values. These matrices are then composed to give you the full child transformation. For example:
M = SRT
It's possible to vary the composition order of the component matrices, but that is almost never done with motion capture data. Always using the SRT composition order makes it possible to construct the composite transform by first creating the rotation matrix (which fills the upper left 3x3 of the matrix), then poking the translation values into the bottom row of the matrix (in positions [3][0], [3][1] and [3][2]), and then multiplying the top three rows of the matrix by the scale values (the top row is scaled by X, the second by Y and the third by Z).
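The poking procedure can be sketched as follows, assuming the rotation is already available as a 3x3 matrix in the row-vector convention. The function names are hypothetical.

```python
def compose_srt(rotation, translation, scale):
    """Build the 4x4 matrix M = S R T for row vectors (v' = v M).

    rotation    -- 3x3 rotation matrix, row-vector convention
    translation -- (x, y, z)
    scale       -- (x, y, z)
    """
    m = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        for j in range(3):
            # Row i of the rotation is multiplied by scale component i.
            m[i][j] = rotation[i][j] * scale[i]
    # Translation goes into the bottom row.
    m[3][0], m[3][1], m[3][2] = translation
    m[3][3] = 1.0
    return m

def transform_point(p, m):
    """Apply a 4x4 matrix to the point p as a row vector (x, y, z, 1)."""
    v = [p[0], p[1], p[2], 1.0]
    return [sum(v[k] * m[k][j] for k in range(4)) for j in range(3)]
```

With an identity rotation, a scale of (2, 2, 2) and a translation of (1, 2, 3), the point (1, 0, 0) is first scaled to (2, 0, 0) and then moved to (3, 2, 3).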
The global position of a segment is determined by composing the child transform with its parent's transform, its parent's parent's transform, and so on. Like so:
M_{global} = M_{local} M_{parent} M_{grandparent} …
Conversely, if you know the global transform of an object and you know the parent's global transform, you can construct the local transform by using the inverse of the parent's global matrix:
M_{local} = M_{global} [M_{parent}]^{-1}
This is not something you'll do when importing a motion capture file but it's a calculation you will need when exporting motion capture data.
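Both directions are sketched below with plain 4x4 lists in the row-vector convention. The helper names are hypothetical, and the general Gauss-Jordan inverse is only one of several ways to invert the parent matrix.

```python
def mat_mul(a, b):
    """Product of two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    a = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        scale = a[col][col]
        a[col] = [x / scale for x in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]

def global_transform(local_chain):
    """Compose local matrices listed child first, root last."""
    result = local_chain[0]
    for parent in local_chain[1:]:
        result = mat_mul(result, parent)
    return result

def local_from_global(global_m, parent_global):
    """Local matrix = global matrix times inverse of the parent's global."""
    return mat_mul(global_m, invert(parent_global))

def translation(x, y, z):
    """4x4 translation matrix with the offset in the bottom row."""
    return [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [float(x), float(y), float(z), 1.0]]
```

For a root at (0, 10, 0) with a child offset of (0, -4, 0), the child's global position works out to (0, 6, 0), and local_from_global() recovers the original child offset from the two global matrices.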
In addition to the transformation information for a segment there is often information about how the segment is to be displayed or drawn. This usually takes the form of a segment axis (or direction) and a length value. Nearly always this information will coincide with the translation offset information of a child of that segment. For example, the axis of the forearm segment will go from the elbow (the 0,0,0 point of the forearm segment) to the wrist (for this example let's say that is at location 0,10,0 relative to the elbow). So in this case the axis of the forearm will be the positive Y axis and the length of the forearm is 10. In any case, the information is often redundant, but it's important to keep it separate because there are places where you would want to draw the segment differently from how the actual transformation information is composed. This will become clear after contrasting some of the different file formats. It's also important to note that some formats use scale information to convey length. It's usually a bad idea to do it this way, but it does happen, so be aware of it.
When a skeleton is defined it is specified as a set of parent-child relationships with the translation offset of each child from its parent being known. The translation offset of a child from its parent almost never changes; however, some of the more flexible formats provide for those cases when you do want the translation data to change (usually only slightly). This initial skeleton definition doesn't have any rotation data (that is, the rotation data values are zero) and is often referred to as the "zero pose" or the "basis pose" of the skeleton. The zero pose can be of any configuration you want. Some prefer to use the "Da Vinci" pose for a character, where the arms and legs are splayed out in an X formation. Others use a pose that has the arms straight out to the front (called the "zombie" pose) with the legs straight up and down; sometimes the arms are also straight down, or they can be to the side. It really doesn't matter: the resulting motion will be identical, and it's only the representation of the data that changes.
The BVH file format was originally developed by Biovision, a motion capture services company, as a way to provide motion capture data to their customers. The name BVH stands for Biovision hierarchical data. This format mostly replaced an earlier format that they developed, the BVA format which is discussed in the next section, as a way to provide skeleton hierarchy information in addition to the motion data. The BVH format is an excellent all around format. Its only drawback is the lack of a full definition of the basis pose (this format has only translational offsets of child segments from their parent; no rotational offset is defined). It also lacks explicit information for how to draw the segments, but that has no bearing on the definition of the motion.
A BVH file has two parts: a header section which describes the hierarchy and initial pose of the skeleton, and a data section which contains the motion data. Examine the example BVH file called "Example1.bvh". The header section begins with the keyword "HIERARCHY". The following line starts with the keyword "ROOT" followed by the name of the root segment of the hierarchy to be defined. After this hierarchy is described it is permissible to define another hierarchy; this too would be denoted by the keyword "ROOT". In principle, a BVH file may contain any number of skeleton hierarchies. In practice the number of segments is limited by the format of the motion section: one sample in time for all segments is on one line of data, and this will cause problems for readers which assume a limit to the size of a line in a file.
The BVH format now becomes a recursive definition. Each segment of the hierarchy contains some data relevant to just that segment, then it recursively defines its children. The line following the ROOT keyword contains a single left curly brace '{'; the brace is lined up with the "ROOT" keyword. The line following a curly brace is indented by one tab character. These indentations are mostly there to make the file more human readable, but there are some BVH file parsers that expect the tabs, so if you create a BVH file be sure to make them tabs and not merely spaces. The first piece of information of a segment is the offset of that segment from its parent; in the case of the root object the offset will generally be zero. The offset is specified by the keyword "OFFSET" followed by the X, Y and Z offset of the segment from its parent. The offset information also indicates the length and direction used for drawing the parent segment. In the BVH format there isn't any explicit information about how a segment should be drawn. This is usually inferred from the offset of the first child defined for the parent. Typically, only the root and the upper body segments will have multiple children.
The line following the offset contains the channel header information. This has the "CHANNELS" keyword followed by a number indicating the number of channels and then a list of that many labels indicating the type of each channel. The BVH file reader must keep track of the channel count and the types of channels encountered as the hierarchy information is parsed. Later, when the motion information is parsed, this ordering will be needed to parse each line of motion data. This format appears to have the flexibility to allow for segments which have any number of channels which can appear in any order. If you write your parser to handle this then so much the better, however, I have never encountered a BVH file that didn't have 6 channels for the root object and 3 channels for every other object in the hierarchy.
You can see that the order of the rotation channels appears a bit odd: it goes Z rotation, followed by the X rotation and finally the Y rotation. This is not a mistake; the BVH format uses a somewhat unusual rotation order. Place the data elements into your data structure in this order.
On the line of data following the channels specification there can be one of two keywords: either you will find the "JOINT" keyword or you will see the "End Site" keyword. A joint definition is identical to the root definition except for the number of channels. This is where the recursion takes place; the rest of the parsing of the joint information proceeds just like a root. The end site information ends the recursion and indicates that the current segment is an end effector (has no children). The end site definition provides one more bit of information: it gives the length of the preceding segment, just as the offset of a child defines the length and direction of its parent's segment.
The end of any joint, end site or root definition is denoted by a right curly brace '}'. This curly brace is lined up with its corresponding left curly brace.
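The recursive structure described above maps naturally onto a recursive descent parser. The following is a minimal sketch, not a complete reader: it splits the text on whitespace (so it ignores the tab convention and does not support names containing spaces), and the dictionary layout and function name are hypothetical.

```python
def parse_hierarchy(text):
    """Parse the HIERARCHY part of a BVH file.

    Returns (roots, channel_order): a list of nested joint dicts and the
    (joint name, channel label) pairs in the order they were encountered.
    """
    tokens = text.split()
    pos = 0
    channel_order = []

    def expect(word):
        nonlocal pos
        if tokens[pos] != word:
            raise ValueError("expected %r, found %r" % (word, tokens[pos]))
        pos += 1

    def read_offset():
        nonlocal pos
        expect("OFFSET")
        offset = [float(t) for t in tokens[pos:pos + 3]]
        pos += 3
        return offset

    def parse_joint(name):
        nonlocal pos
        joint = {"name": name, "channels": [], "children": [], "end_site": None}
        expect("{")
        joint["offset"] = read_offset()
        while tokens[pos] != "}":
            keyword = tokens[pos]
            pos += 1
            if keyword == "CHANNELS":
                count = int(tokens[pos])
                joint["channels"] = tokens[pos + 1:pos + 1 + count]
                channel_order.extend((name, c) for c in joint["channels"])
                pos += 1 + count
            elif keyword == "JOINT":
                child_name = tokens[pos]
                pos += 1
                joint["children"].append(parse_joint(child_name))
            elif keyword == "End":
                expect("Site")
                expect("{")
                joint["end_site"] = read_offset()
                expect("}")
            else:
                raise ValueError("unexpected keyword %r" % keyword)
        expect("}")
        return joint

    expect("HIERARCHY")
    roots = []
    while pos < len(tokens) and tokens[pos] == "ROOT":
        root_name = tokens[pos + 1]
        pos += 2
        roots.append(parse_joint(root_name))
    return roots, channel_order
```

The channel_order list is the piece to hold on to, because it fixes how each line of the motion section is laid out.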
One last note about the BVH hierarchy: the world space is defined as a right handed coordinate system with the Y axis as the world up vector. Thus you will typically find that BVH skeletal segments are aligned along the Y or negative Y axis (since the characters often have a zero pose where the character stands straight up with the arms straight down to the side).
The motion section begins with the keyword "MOTION" on a line by itself. This line is followed by a line indicating the number of frames; this line uses the "Frames:" keyword (the colon is part of the keyword) and a number indicating the number of frames, or motion samples, that are in the file. On the line after the frames definition is the "Frame Time:" definition, which indicates the sampling rate of the data. In the example BVH file the frame time is given as 0.033333 seconds, which is 30 frames a second, the usual rate of sampling in a BVH file.
The rest of the file contains the actual motion data. Each line is one sample of motion data. The numbers appear in the order of the channel specifications as the skeleton hierarchy was parsed.
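A sketch of reading the motion section follows. It assumes the channel order has already been collected while parsing the hierarchy, as a list of (segment name, channel label) pairs. The function name and the returned layout are hypothetical.

```python
def parse_motion(lines, channel_order):
    """Read the MOTION section of a BVH file.

    lines         -- the lines of the file starting at the MOTION keyword
    channel_order -- (segment, channel) pairs in hierarchy order
    Returns (frame_time, frames); each frame maps (segment, channel) to a value.
    """
    rows = iter(line.strip() for line in lines if line.strip())
    if next(rows) != "MOTION":
        raise ValueError("expected MOTION keyword")
    frames_line = next(rows)
    if not frames_line.startswith("Frames:"):
        raise ValueError("expected Frames: line")
    frame_count = int(frames_line.split(":", 1)[1])
    time_line = next(rows)
    if not time_line.startswith("Frame Time:"):
        raise ValueError("expected Frame Time: line")
    frame_time = float(time_line.split(":", 1)[1])

    frames = []
    for row in rows:
        values = [float(t) for t in row.split()]
        if len(values) != len(channel_order):
            raise ValueError("sample does not match the channel count")
        frames.append(dict(zip(channel_order, values)))
    if len(frames) != frame_count:
        raise ValueError("frame count does not match the Frames: line")
    return frame_time, frames
```

The sample rate in frames per second is the reciprocal of the frame time, so a frame time of 0.033333 rounds to 30 frames a second.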
Interpreting The Data
To calculate the position of a segment you first create a transformation matrix from the local translation and rotation information for that segment. For any joint segment the translation information will simply be the offset as defined in the hierarchy section. The rotation data comes from the motion section. For the root object, the translation data will be the sum of the offset data and the translation data from the motion section. The BVH format doesn't account for scales so it isn't necessary to worry about including a scale factor calculation.
A straightforward way to create the rotation matrix is to create 3 separate rotation matrices, one for each axis of rotation. Then concatenate the matrices from left to right in the order Y, X and Z:
v R = v YXZ
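The three-matrix approach can be sketched as below. The single-axis matrices are written for row vectors and assume right-handed rotations with angles in degrees; that sign convention and the helper names are assumptions, not something the BVH file states.

```python
import math

def _cos_sin(deg):
    r = math.radians(deg)
    return math.cos(r), math.sin(r)

def rot_x(deg):
    """Rotation about X for row vectors (v' = v M)."""
    c, s = _cos_sin(deg)
    return [[1, 0, 0], [0, c, s], [0, -s, c]]

def rot_y(deg):
    """Rotation about Y for row vectors."""
    c, s = _cos_sin(deg)
    return [[c, 0, -s], [0, 1, 0], [s, 0, c]]

def rot_z(deg):
    """Rotation about Z for row vectors."""
    c, s = _cos_sin(deg)
    return [[c, s, 0], [-s, c, 0], [0, 0, 1]]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def bvh_rotation(z_deg, x_deg, y_deg):
    """Composite R = Y X Z; arguments follow the channel order Z, X, Y."""
    return mat_mul(mat_mul(rot_y(y_deg), rot_x(x_deg)), rot_z(z_deg))

def rotate(v, m):
    """Apply a 3x3 matrix to v as a row vector."""
    return [sum(v[k] * m[k][j] for k in range(3)) for j in range(3)]
```

Note that the arguments are taken in the order the channels appear in the file (Z, X, Y) while the matrices are multiplied in the order Y, X, Z.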
An alternative method is to compute the rotation matrix directly. A method for doing this is described in Graphics Gems II, p 322.
Adding the offset information is simple: just poke the X, Y and Z translation data into the proper locations of the matrix. Once the local transformation is created, concatenate it with the local transformation of its parent, then its grandparent, and so on.
v M = v M_{child} M_{parent} M_{grandparent} …
Acclaim ASF/AMC
Acclaim is a game company which has been doing research into motion capture for games for many years. They developed their own methods for creating skeleton motion from optical tracker data and subsequently devised a file format, actually two files, for storing the skeleton data. Later they put the format description in the public domain for anyone to use. Oxford Metrics, makers of the Vicon motion capture system, elected to use the Acclaim format as the output format of their software.
The Acclaim format is made up of two files, a skeleton file and a motion file. This was done knowing that most of the time a single skeleton works for many different motions and rather than storing the same skeleton in each of the motion files it should be stored just once in another file. The skeleton file is the ASF file (Acclaim Skeleton File). The motion file is the AMC file (Acclaim Motion Capture data).
Parsing the ASF file
In the ASF file a base pose is defined for the skeleton that is the starting point for the motion data. Each segment has information regarding the way the segment is to be drawn as well as information that can be used for physical dynamics programs, inverse kinematic programs or skinning programs. One of the peculiar features of the ASF file is the requirement that there be no gaps in the skeleton. No child can have a non-zero offset from the end of the parent segment. This has the effect of creating more skeletal segments than are usually found in other file formats. A limitation of the ASF definition is that only one root may exist in the scene. This doesn't prevent a file from cleverly containing two skeletons attached to the root, but it does make such a construction clumsy.
Examine the example file "Walk.asf". In there you will see that keywords in the file all start with a colon ":". Keywords will either set global values or they will indicate the beginning of a section of data.
The ":version" keyword indicates the version of the skeleton definition. This document is for version 1.10.
The ":name" keyword allows the skeleton to be named something other than the file name.
The ":units" keyword denotes a section that defines the units to be used for various types of data. It also defines default values for other parameters. Any number of specifications may be found here, and the use of these values is often program specific. Ideally you should store these values and then write them out again if you make modifications to a file. If you intend to just read the motion data in, then you can ignore those values that don't interest you. In this section you will find the units used for angles and sometimes the default values for mass and length of segments.
The ":documentation" section allows for the storage of documentation information that will persist from one file creation to the next. Simple comment information in the file is not guaranteed to be retained if the file is read into memory and then saved to another file, possibly with modifications.
The ":root" section defines a special segment of the scene (well, it's special to the way the file format is defined, you can really treat this just like any other segment in all other ways). This is the root segment of the skeleton hierarchy. It is much like all the other segments but doesn't contain direction and length information. The "axis" keyword in the root section defines the rotation order of the root object. The "order" keyword specifies the channels of motion that are applied to the root and in what order they will appear in the AMC file. The "position" and "orientation" keywords are each followed by a triplet of numbers indicating the starting position and orientation of the root. These are typically, but not always, zero.
The ":bonedata" section contains a description of each of the segments in the hierarchy. These descriptions are for just the segments. The hierarchy section, which comes next, will describe the parenting organization of the segments. The segment definition is bracketed by a "begin" and "end" pair of keywords (note the lack of a colon in each keyword). Within the segment definition you will find:
- "id" This is a number which provides a unique id for the segment. This really isn't necessary since each segment is also named and the name is used for both the hierarchy section and in the AMC file.
- "name" This gives the name of the segment. Each segment must have a unique name. Often you will see segments with similar names such as "hips" and "hips1". The segments that have numbers at the end will usually be the children of the segment with the same name but no number. The segments with the numbers are usually there simply to fill the gap between the parent segment and a child segment. Often the segments with a number at the end will not have any motion capture data. You don't treat these in any special way, this is just noted as an interesting feature.
- "direction" This is the direction of the segment. This defines how the segment should be drawn and it also defines the direction from the parent to the child segment(s). The direction and length of a segment determine the offset of the child from the parent, if there is a child of the segment.
- "length" The length of the segment. Together with the direction value, this gives the information needed for drawing the segment.
- "axis" This gives an axis of rotation for the segment. By specifying this as a separate value the motion data can be independent of the drawing and hierarchy information. This is particularly useful for those applications which might provide motion editing tools that are sensitive to gimbal lock.
- "dof" This stands for "Degrees of Freedom" and specifies the number of motion channels and in what order they appear in the AMC file. If the dof keyword doesn't appear then the segment doesn't get any motion data. No translation channels will ever appear here, only rotation channels and sometimes a length channel.
- "limits" This specification provides for putting limits on each of the channels in the dof specification. For each channel that appears there will be a pair of numbers inside parentheses indicating the minimum and maximum allowed value for that channel. This information is not used for interpreting the motion data; it is useful only for those applications which might apply motion editing functions that put limits on rotation. It also does not guarantee that the data in the AMC file stays within the given numbers.
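The keywords listed above can be collected with a small line-based reader. The sketch below makes several assumptions that are not spelled out in the text: each keyword sits on its own line, the axis line carries its rotation order as a fourth value, and additional limit pairs appear on continuation lines that start with a parenthesis. The function name and dictionary layout are hypothetical.

```python
def parse_bonedata(lines):
    """Parse the lines of a :bonedata section into a dict keyed by name."""
    bones = {}
    bone = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # Parentheses only group the limit pairs, so drop them.
        tokens = line.replace("(", " ").replace(")", " ").split()
        if line == "begin":
            bone = {"dof": [], "limits": []}
        elif line == "end":
            bones[bone["name"]] = bone
            bone = None
        elif bone is None:
            continue  # outside a begin/end pair
        elif line.startswith("("):
            # Assumed continuation line holding another (min max) pair.
            bone["limits"].append((float(tokens[0]), float(tokens[1])))
        else:
            key, args = tokens[0], tokens[1:]
            if key == "id":
                bone["id"] = int(args[0])
            elif key == "name":
                bone["name"] = args[0]
            elif key == "direction":
                bone["direction"] = [float(a) for a in args]
            elif key == "length":
                bone["length"] = float(args[0])
            elif key == "axis":
                bone["axis"] = [float(a) for a in args[:3]]
                bone["axis_order"] = args[3] if len(args) > 3 else None
            elif key == "dof":
                bone["dof"] = args
            elif key == "limits":
                bone["limits"].append((float(args[0]), float(args[1])))
            else:
                bone[key] = args  # keep unknown keywords untouched
    return bones
```

Segments are stored by name rather than by id, which matches the advice earlier in this document to reference data by name whenever possible.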
Parsing the AMC file
The AMC file contains the motion data for a skeleton defined by an ASF file. The motion data is given a sample at a time. Each sample consists of a number of lines, a segment per line, containing the data. The start of a sample is denoted by the sample number alone on a line. For each segment the segment name appears followed by the numbers in the order specified by the dof keyword in the ASF file.
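A reader for this layout can be sketched in a few lines. It assumes that any header or comment lines start with ':' or '#' and can be skipped; the function name and the returned structure are hypothetical.

```python
def parse_amc(lines):
    """Read AMC samples.

    Returns a list of (sample_number, data) pairs, where data maps each
    segment name to its list of values in dof order.
    """
    samples = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line or line[0] in "#:":
            continue  # assumed header or comment line
        tokens = line.split()
        if len(tokens) == 1 and tokens[0].isdigit():
            # A sample number alone on a line starts a new sample.
            current = {}
            samples.append((int(tokens[0]), current))
        elif current is not None:
            current[tokens[0]] = [float(t) for t in tokens[1:]]
    return samples
```

The values for each segment are kept as a plain list; pairing them with channel names requires the dof entry for that segment from the ASF file.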
Interpreting the data
For each segment it is useful to precalculate some of the transformation matrices that will be used to construct a global transform for a segment. First create a matrix C from the axis, using the axis order to determine the order in which the rotation values are composed. In the ASF file the order is given left to right so that an order of "XYZ" is:
v M = v XYZ
Do this same calculation for the root but use the orientation value with the axis order for the root. After calculating C take the inverse of C, call it Cinv, and save it.
Next create a matrix B from the translation offset from the segment's parent. The translation offset is the direction and length of the parent segment. For the root use the position value. This concludes the precalculation step.
When constructing the transformation matrix of motion for a segment first create a matrix, M, of the motion data. When creating M construct a separate matrix for each dof specification and multiply them together left to right. Compose the local transform, L, by multiplying M on the left by Cinv and on the right by C then B:
L = Cinv M C B
Like with other formats create the full transform by traversing the hierarchy and multiplying on the right by each parent in the skeleton.
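The precalculation and composition steps can be sketched as follows, using 4x4 matrices in the row-vector convention. Several things here are assumptions: the channel names "rx", "ry" and "rz" for the rotation dofs, right-handed rotations in degrees, and the use of the transpose as the inverse of C (valid because C is a pure rotation). Only rotation channels are handled, and all function names are hypothetical.

```python
import math

def identity():
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def rotation(axis, deg):
    """Single-axis rotation for row vectors; axis is 'X', 'Y' or 'Z'."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    i, j = {"X": (1, 2), "Y": (2, 0), "Z": (0, 1)}[axis]
    m = identity()
    m[i][i] = c
    m[j][j] = c
    m[i][j] = s
    m[j][i] = -s
    return m

def translation(x, y, z):
    m = identity()
    m[3][0], m[3][1], m[3][2] = float(x), float(y), float(z)
    return m

def euler(angles, order):
    """Compose single-axis rotations left to right, e.g. 'XYZ' gives X Y Z."""
    m = identity()
    for axis in order:
        m = mat_mul(m, rotation(axis, angles["XYZ".index(axis)]))
    return m

def local_transform(axis_angles, axis_order, dof, values, offset):
    """L = Cinv M C B for one segment and one sample of motion data."""
    C = euler(axis_angles, axis_order)
    Cinv = transpose(C)  # inverse of a pure rotation
    M = identity()
    for channel, value in zip(dof, values):
        if len(channel) != 2 or channel[0] != "r":
            raise ValueError("only rotation channels are handled here")
        M = mat_mul(M, rotation(channel[1].upper(), value))
    B = translation(*offset)
    return mat_mul(mat_mul(mat_mul(Cinv, M), C), B)

def transform_point(p, m):
    v = [p[0], p[1], p[2], 1.0]
    return [sum(v[k] * m[k][j] for k in range(4)) for j in range(3)]
```

In practice C, Cinv and B would be computed once per segment and reused for every sample; they are rebuilt on each call here only to keep the sketch short. When all motion values are zero, Cinv and C cancel and L reduces to B.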
The RenderMan Interface Bytestream (RIB) is a complete specification of the required interface between modelers and renderers. In a distributed modeling and rendering environment RIB serves well as a rendering file format. As RIB files are passed from one site to another, utilities for shader management, scene editing, and rendering job dispatching (referred to hereafter as Render Managers) can benefit from additional information not strictly required by rendering programs. Additional information relating to User Entities, resource requirements and accounting can be embedded in the RIB file by a modeler through the "proper" use of RIB in conjunction with some simple file structuring conventions.
RenderMan RIB-Structure
This entry should be the first line in a conforming RIB file. Its inclusion indicates full conformance to these specifications. The addition of the special keyword, Entity, specifies that the file conforms to the User Entity conventions.
##Scene name
This entry allows a scene name to be associated with the RIB file.
##Creator name
Indicates the file creator (usually the name of the modeling or animation software).
##CreationDate time
Indicates the time that the file was created. It is expressed as a string of characters and may be in any format.
##For name
Indicates the user name or user identifier (network address) of the individual for whom the frames are intended.
##Frames number
Indicates the number of frames present in the file.
##Shaders shader1, shader2, ...
Indicates the names of nonstandard shaders required. When placed in the header of a RIB file, any nonstandard shaders that appear in the entire file should be listed. When placed within a frame block, any nonstandard shaders that appear in that frame must be listed.
##Textures texture1, texture2, ...
Lists any preexisting textures required in the file. When placed in the header of a RIB file, any preexisting textures that appear anywhere in the file should be listed. When placed within a frame block, any preexisting textures that appear in that frame must be listed.
##CapabilitiesNeeded feature1, feature2 , ...
Indicates any RenderMan Interface optional capabilities required in the file (when located in the header) or required in the frame (when located at the top of a frame block). The optional capabilities are:
- Area Light Sources
- Motion Blur
- Special Camera Projections
- Bump Mapping
- Programmable Shading
- Spectral Colors
- Deformations
- Radiosity
- Texture Mapping
- Displacements
- Ray Tracing
- Trim Curves
- Environment Mapping
- Shadow Depth Mapping
- Volume Shading
- Level Of Detail
- Solid Modeling
Frame information
Frame-local information must be located directly after a FrameBegin RIB request and be contiguous. These comments should provide frame-local information that contains administrative and resource hints.
##CameraOrientation eyex eyey eyez atx aty atz [ upx upy upz ]
Indicates the location and orientation of the camera for the current frame in World Space coordinates. The up vector is optional and the default value is [0 1 0].
##Shaders shader1, shader2, ...
Lists the nonstandard shaders required in the current frame.
##Textures texture1, texture2, ...
Lists the nonstandard textures required in the current frame.
##CapabilitiesNeeded feature1, feature2, ...
Lists the special capabilities required in the current frame from among those listed under Header Information.
Body Information
Body information may be located anywhere in the RIB file.
##Include filename
This entry allows the specification of a file name for inclusion in the RIB stream. Note that the Include keyword itself does not cause the inclusion of the specified file. As with all structural hints, the Include keyword serves only as a special hint for render management systems. As such, the Include keyword should only be used if render management facilities are known to exist.
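Since all of these structural hints share the same "##keyword value" shape, a render management utility could gather them with a scan like the one sketched below. The function names are hypothetical, and the sketch makes no attempt to interpret the hints or to track which frame block they belong to.

```python
def read_structure_hints(lines):
    """Collect (keyword, value) pairs from lines that start with '##'."""
    hints = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("##"):
            continue  # ordinary RIB request or comment
        keyword, _, value = line[2:].partition(" ")
        hints.append((keyword, value.strip()))
    return hints

def split_list(value):
    """Split a comma-separated hint value such as a shader list."""
    return [item.strip() for item in value.split(",") if item.strip()]
```

Hints such as Shaders, Textures and CapabilitiesNeeded carry comma-separated lists, which split_list() turns into individual names.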