TaskChampion

TaskChampion is a personal task-tracking tool. It works from the command line, with simple commands like ta add "fix the kitchen sink". It can synchronize tasks on multiple devices, and does so in an "offline" mode so you can update your tasks even when you can't reach the server. If you've heard of TaskWarrior, this tool is very similar, but with some different design choices and greater reliability.

Getting Started

NOTE: TaskChampion is still in development and not yet feature-complete. This section is limited to completed functionality.

Once you've installed TaskChampion, your interface will be via the ta command. Start by adding a task:

$ ta add learn how to use taskchampion
added task ba57deaf-f97b-4e9c-b9ab-04bc1ecb22b8

You can see all of your pending tasks with ta next, or just ta for short:

$ ta
 Id Description                    Active  Tags
 1  learn how to use taskchampion

Tell TaskChampion you're working on the task, using the shorthand id:

$ ta start 1

and when you're done with the task, mark it as complete:

$ ta done 1

Synchronizing

Even if you don't have a server, it's a good idea to sync your task database periodically. This acts as a backup and also enables some internal house-cleaning.

$ ta sync

Typically sync is run from a crontab, on whatever schedule fits your needs.

To synchronize multiple replicas of your tasks, you will need a sync server and a client key for that server. Configure these in ~/.config/taskchampion.toml, for example:

server_client_key = "f8d4d09d-f6c7-4dd2-ab50-634ed20a3ff2"
server_origin = "https://taskchampion.example.com"

The next run of ta sync will upload your task history to that server. Configuring another device identically and running ta sync will download that task history, and continue to stay in sync with subsequent runs of the command.

See Usage for more detailed information on using TaskChampion.

Installation

As TaskChampion is still in development, installation is by cloning the repository and running cargo build.

Using the Task Command

The main interface to your tasks is the ta command, which supports various subcommands such as add, modify, start, and done. Customizable reports are also available as subcommands, such as next. The command reads a configuration file for its settings, including where to find the task database. The sync subcommand synchronizes tasks with a sync server.

NOTE: the task interface does not precisely match that of TaskWarrior.

Subcommands

The sections below describe each subcommand of the ta command. The syntax of [filter] is defined in filters, and that of [modification] in modifications. You can also find a summary of all subcommands, as well as filters, built-in reports, and so on, with ta help.

ta version - Show the TaskChampion version

ta version

Show the version of the TaskChampion binary

ta config set - Set a configuration value

ta config set <key> <value>

Update the TaskChampion configuration file to set key = value.

ta add - Add a new task

ta add [modification]

Add a new, pending task to the list of tasks. The modification must include a description.

ta modify - Modify tasks

ta <filter> modify [modification]

Modify all tasks matching the required filter.

ta prepend - Prepend task description

ta <filter> prepend [modification]

Modify all tasks matching the required filter by inserting the given description before each task's description.

ta append - Append task description

ta <filter> append [modification]

Modify all tasks matching the required filter by adding the given description to the end of each task's description.

ta start - Start tasks

ta <filter> start [modification]

Start all tasks matching the required filter, additionally applying any given modifications.

ta stop - Stop tasks

ta <filter> stop [modification]

Stop all tasks matching the required filter, additionally applying any given modifications.

ta done - Mark tasks as completed

ta <filter> done [modification]

Mark all tasks matching the required filter as completed, additionally applying any given modifications.

ta delete - Mark tasks as deleted

ta <filter> delete [modification]

Mark all tasks matching the required filter as deleted, additionally applying any given modifications. Deleted tasks remain until they are expired in a 'ta gc' operation at least six months after their last modification.

ta annotate - Annotate a task

ta <filter> annotate [modification]

Add an annotation to all tasks matching the required filter.

ta info - Show tasks

ta [filter] info

Show information about all tasks matching the filter.

ta debug - Show task debug details

ta [filter] debug

Show all key/value properties of the tasks matching the filter.

ta gc - Perform 'garbage collection'

ta gc

Perform 'garbage collection'. This refreshes the list of pending tasks and their short IDs.

ta sync - Synchronize this replica

ta sync

Synchronize this replica locally or against a remote server, as configured. Synchronization is a critical part of maintaining the task database, and should be done regularly, even if only locally. It is typically run from a cron task.

ta import-tw - Import tasks from TaskWarrior export

ta import-tw

Import tasks into this replica. The tasks must be provided in the TaskWarrior JSON format on stdin. If tasks in the import already exist, they are 'merged'. Because TaskChampion lacks the information about the types of UDAs that is stored in the TaskWarrior configuration, UDA values are imported as simple strings, in the format they appear in the JSON export. This may cause undesirable results.

ta import-tdb2 - Import tasks from the TaskWarrior data directory

ta import-tdb2 <directory>

Import tasks into this replica from a TaskWarrior data directory. If tasks in the import already exist, they are 'merged'. This mode of import supports UDAs better than the import-tw subcommand, but requires access to the "raw" TaskWarrior data. This command supports task directories written by TaskWarrior 2.6.1 or later.

ta undo - Undo the latest change made on this replica

ta undo

Undo the latest change made on this replica. Changes cannot be undone once they have been synchronized.

ta report - Show a report

ta [filter] [report-name] *or* [report-name] [filter]

Show the named report, including only tasks matching the filter

ta next - Show the 'next' report

ta [filter]

Show the report named 'next', including only tasks matching the filter

Reports

As a to-do list manager, listing tasks is an important TaskChampion feature. Reports are tabular displays of tasks, and allow very flexible filtering, sorting, and customization of columns.

TaskChampion includes several "built-in" reports, as well as supporting custom reports in the configuration file.

Built-In Reports

The next report is the default, and lists all pending tasks:

$ ta
Id Description              Active Tags              
1  learn about TaskChampion        +next
2  buy wedding gift         *      +buy
3  plant tomatoes                  +garden

The Id column contains short numeric IDs that are assigned to pending tasks. These IDs are easy to type, such as to mark task 2 done (ta 2 done).

The list report lists all tasks, with a similar set of columns.

Custom Reports

Custom reports are defined in the configuration file's reports table. This is a mapping from each report's name to its definition. Each definition has the following properties:

  • filter - criteria for the tasks to include in the report (optional)
  • sort - how to order the tasks (optional)
  • columns - the columns of information to display for each task

For example:

[reports.garden]
sort = [
    { sort_by = "description" }
]
filter = [
    "status:pending",
    "+garden"
]
columns = [
    { label = "ID", property = "id" },
    { label = "Description", property = "description" },
]

The filter property is a list of filters. It will be merged with any filters provided on the command line when the report is invoked.

The sort order is defined by an array of tables containing a sort_by property and an optional ascending property. Tasks are compared by the first criterion, and if that is equal by the second, and so on. If ascending is given, it can be true for the default sort order, or false for the reverse.

In most cases tasks are just sorted by one criterion, but a more advanced example might look like:

[reports.garden]
sort = [
    { sort_by = "description" },
    { sort_by = "uuid", ascending = false }
]
...

The available values of sort_by are:

  • id

    Sort by the task's shorthand ID

  • uuid

    Sort by the task's full UUID

  • wait

    Sort by the task's wait date, with non-waiting tasks first

  • description

    Sort by the task's description

Finally, the columns configuration specifies the list of columns to display. Each element has a label and a property, as shown in the example above.

The available properties are:

  • id

    The task's shorthand ID

  • uuid

    The task's full UUID

  • active

    * if the task is active (started)

  • wait

    Wait date of the task

  • description

    The task's description

  • tags

    The task's tags
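
Putting these pieces together, a fuller custom report might use most of the available properties. This is an illustrative example; the report name "upcoming" and its filter are invented, not built in:

```toml
[reports.upcoming]
filter = ["status:pending"]
sort = [
    { sort_by = "wait" },
    { sort_by = "description" },
]
columns = [
    { label = "ID", property = "id" },
    { label = "Active", property = "active" },
    { label = "Description", property = "description" },
    { label = "Tags", property = "tags" },
    { label = "Wait", property = "wait" },
]
```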

Tags

Each task has a collection of associated tags. Tags are short words that categorize tasks, typically written with a leading +, such as +next or +jobsearch.

Tags are useful for filtering tasks in reports or on the command line. For example, when it's time to continue the job search, ta +jobsearch will show pending tasks with the jobsearch tag.

Allowed Tags

Specifically, tags must be at least one character long and cannot contain whitespace or any of the characters +-*/(<>^! %=~. The first character cannot be a digit, and : is not allowed after the first character. All-capital tags are reserved for synthetic tags (below) and cannot be added to or removed from tasks.

Synthetic Tags

Synthetic tags are present on tasks that meet specific criteria and are commonly used for filtering. For example, WAITING is set for tasks that are currently waiting. These tags cannot be added to or removed from a task; they appear and disappear as the task changes. The following synthetic tags are defined:

  • WAITING - set if the task is waiting (has a wait property with a date in the future)
  • ACTIVE - set if the task is active (has been started and not stopped)
  • PENDING - set if the task is pending (not completed or deleted)
  • COMPLETED - set if the task has been completed
  • DELETED - set if the task has been deleted (but not yet flushed from the task list)

Filters

Filters are used to select specific tasks for reports or to specify tasks to be modified. When more than one filter is given, only tasks which match all of the filters are selected. When no filter is given, the command implicitly selects all tasks.

Filters can have the following forms:

  • TASKID[,TASKID,..] - Specific tasks

Select only specific tasks. Multiple tasks can be specified either separated by commas or as separate arguments. Each task may be specified by its working-set index (a small number) or by its UUID. Partial UUIDs, broken on a hyphen, are also supported, such as b5664ef8-423d or b5664ef8.

  • +TAG - Tagged tasks

    Select tasks with the given tag.

  • -TAG - Un-tagged tasks

    Select tasks that do not have the given tag.

  • status:pending, status:completed, status:deleted - Task status

    Select tasks with the given status.

  • all - All tasks

When specified alone for task-modification commands, all matches all tasks. For example, ta all done will mark all tasks as done.

Modifications

Modifications can have the following forms:

Timestamps

Times may be specified in a wide variety of convenient formats.

  • RFC3339 timestamps, such as 2019-10-12 07:20:50.12Z
  • A date of the format YYYY-MM-DD is interpreted as the local midnight at the beginning of the given date. Single-digit month and day are accepted, but the year must contain four digits.
  • now refers to the exact current time
  • yesterday, today, and tomorrow refer to the local midnight at the beginning of the given day
  • Any duration (described below) may be used as a timestamp, and is considered relative to the current time.

Times are stored internally as UTC.
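
TaskChampion is implemented in Rust; as a rough Python sketch of a few of the rules above (the function name is invented, and RFC3339 timestamps and durations are omitted for brevity):

```python
from datetime import datetime, timedelta

def interpret_timestamp(text: str, now: datetime) -> datetime:
    """Illustrative sketch of some timestamp rules.

    `now` is passed explicitly so behavior is deterministic; times are
    naive "local" datetimes for simplicity.
    """
    # today/yesterday/tomorrow refer to local midnight
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if text == "now":
        return now
    if text == "today":
        return midnight
    if text == "yesterday":
        return midnight - timedelta(days=1)
    if text == "tomorrow":
        return midnight + timedelta(days=1)
    # YYYY-MM-DD: local midnight at the beginning of that date
    parts = text.split("-")
    if len(parts) == 3 and all(p.isdigit() for p in parts) and len(parts[0]) == 4:
        year, month, day = (int(p) for p in parts)
        return datetime(year, month, day)
    raise ValueError(f"unrecognized timestamp: {text}")

now = datetime(2019, 10, 12, 7, 20, 50)
assert interpret_timestamp("2019-10-12", now) == datetime(2019, 10, 12)
assert interpret_timestamp("yesterday", now) == datetime(2019, 10, 11)
```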

Durations

Durations can be given in a dizzying array of units. Each can be preceded by a whole number or a decimal multiplier, e.g., 3days. The multiplier is optional with the singular forms of the units; for example day is allowed. Some of the units allow an adjectival form, such as daily or annually; this form is more readable in some cases, but otherwise has the same meaning.

  • s, second, or seconds
  • min, mins, minute, or minutes (note that m is not allowed, as it might also mean month)
  • h, hour, or hours
  • d, day, or days
  • w, week, or weeks
  • mo, month, or months (always 30 days, regardless of calendar month)
  • y, year, or years (365 days, regardless of leap days)

ISO 8601 standard durations are also allowed. While the standard does not specify the length of "P1Y" or "P1M", TaskChampion treats those as 365 and 30 days, respectively.
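
The unit table above can be sketched in Python as follows (illustrative only; adjectival forms and ISO 8601 durations are omitted, and the function name is invented):

```python
import re
from datetime import timedelta

# Seconds per unit, following the rules above: "mo" is always 30 days
# and "y" is always 365 days.
UNITS = {
    "s": 1, "second": 1, "seconds": 1,
    "min": 60, "mins": 60, "minute": 60, "minutes": 60,
    "h": 3600, "hour": 3600, "hours": 3600,
    "d": 86400, "day": 86400, "days": 86400,
    "w": 604800, "week": 604800, "weeks": 604800,
    "mo": 30 * 86400, "month": 30 * 86400, "months": 30 * 86400,
    "y": 365 * 86400, "year": 365 * 86400, "years": 365 * 86400,
}

def parse_duration(text: str) -> timedelta:
    # The numeric multiplier is optional, as in "day" meaning one day.
    m = re.fullmatch(r"(\d+(?:\.\d+)?)?\s*([a-z]+)", text)
    if not m or m.group(2) not in UNITS:
        raise ValueError(f"unrecognized duration: {text}")
    multiplier = float(m.group(1)) if m.group(1) else 1.0
    return timedelta(seconds=multiplier * UNITS[m.group(2)])

assert parse_duration("3days") == timedelta(days=3)
assert parse_duration("day") == timedelta(days=1)
assert parse_duration("2mo") == timedelta(days=60)
```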

Named Timestamps

Some commonly used named timestamps are also recognized:

  • today - Start of today
  • yesterday - Start of yesterday
  • tomorrow - Start of tomorrow
  • sod - Start of today
  • eod - End of today
  • sow - Start of the next week
  • eow - End of the week
  • eoww - End of the work week
  • soww - Start of the next work week

Configuration

The ta command will work out-of-the-box with no configuration file, using default values.

Configuration is read from taskchampion.toml in your config directory. On Linux systems, that directory is ~/.config. On OS X, it's ~/Library/Preferences. On Windows, it's AppData/Roaming in your home directory. This can be overridden by setting TASKCHAMPION_CONFIG to the configuration filename.

The file format is TOML. For example:

data_dir = "/home/myuser/.tasks"

Directories

  • data_dir - path to a directory containing the replica's task data (which will be created if necessary). Default: taskchampion in the local data directory.

Command-Line Preferences

  • modification_count_prompt - when a modification will affect more than this many tasks, the ta command will prompt for confirmation. A value of 0 will disable the prompts entirely. Default: 3.

Sync Server

If using a local server:

  • server_dir - path to a directory containing the local server's data. This is only used if server_origin or server_client_key are not set. Default: taskchampion-sync-server in the local data directory.

If using a remote server:

  • server_origin - Origin of the TaskChampion sync server, e.g., https://taskchampion.example.com. If not set, then sync is done to a local server.
  • encryption_secret - Secret value used to encrypt all data stored on the server. This should be a long random string. If you have openssl installed, a command like openssl rand -hex 35 will generate a suitable value. This value is only used when synchronizing with a remote server -- local servers are unencrypted. Treat this value as a password.
  • server_client_key - Client key to identify this replica to the sync server (a UUID). If not set, then sync is done to a local server.

Snapshots

  • avoid_snapshots - If running on a CPU-, memory-, or bandwidth-constrained device, set this to true. The effect is that this replica will wait longer to produce a snapshot, in the hopes that other replicas will do so first.

Reports

  • reports - a mapping of each report's name to its definition. See Reports for details.

Editing

As a shortcut, the simple, top-level configuration values can be edited from the command line:

ta config set data_dir /home/myuser/.taskchampion

Environment Variables

Configuration

Set TASKCHAMPION_CONFIG to the location of a configuration file in order to override the default location.

Terminal Output

Taskchampion uses termcolor to color its output. This library interprets TERM and NO_COLOR to determine how it should behave, when writing to a tty. Set NO_COLOR to any value to force plain-text output.

Debugging

Both ta and taskchampion-sync-server use env-logger and can be configured to log at various levels with the RUST_LOG environment variable. For example:

$ RUST_LOG=taskchampion=trace ta add foo

The output may provide valuable clues in debugging problems.

Undo

It's easy to make a mistake: mark the wrong task as done, or hit enter before noticing a typo in a tag name. The ta undo command makes it just as easy to fix the mistake, by effectively reversing the most recent change. Multiple invocations of ta undo can be used to undo multiple changes.

The limit of this functionality is that changes which have been synchronized to the server (via ta sync) cannot be undone.

Synchronization

A single TaskChampion task database is known as a "replica". A replica "synchronizes" its local information with other replicas via a sync server. Many replicas can thus share the same task history.

This operation is triggered by running ta sync. Typically this runs frequently in a cron task. Synchronization is quick, especially if no changes have occurred.

Each replica expects to be synchronized frequently, even if no server is involved. Without periodic syncs, the storage space used for the task database will grow quickly, and performance will suffer.

Local Sync

By default, TaskChampion syncs to a "local server", as specified by the server_dir configuration parameter. This defaults to taskchampion-sync-server in your data directory, but can be customized in the configuration file.

Remote Sync

For remote synchronization, you will need a few pieces of information. From the server operator, you will need an origin and a client key. Configure these with

ta config set server_origin "<origin from server operator>"
ta config set server_client_key "<client key from server operator>"

You will need to generate your own encryption secret. This is used to encrypt your task history, so treat it as a password. The following will use the openssl utility to generate a suitable value:

ta config set encryption_secret $(openssl rand -hex 35)

Every replica sharing a task history should have precisely the same configuration for server_origin, server_client_key, and encryption_secret.
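
After running the three ta config set commands above, the relevant portion of taskchampion.toml would look something like this (the values shown are examples; the encryption secret would be your own generated value):

```toml
server_origin = "https://taskchampion.example.com"
server_client_key = "f8d4d09d-f6c7-4dd2-ab50-634ed20a3ff2"
encryption_secret = "..."  # e.g., output of `openssl rand -hex 35`; treat as a password
```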

Adding a New Replica

Synchronizing a new replica to an existing task history is easy: begin with an empty replica, configured for the remote server, and run ta sync. The replica will download the entire task history.

Upgrading a Locally-Sync'd Replica

It is possible to switch a single replica to a remote server by simply configuring for the remote server and running ta sync. The replica will upload the entire task history to the server. Once this is complete, additional replicas can be configured with the same settings in order to share the task history.

Running the Sync Server

NOTE: TaskChampion is still in development and not yet feature-complete. The server is functional, but lacks any administrative features.

Run taskchampion-sync-server to start the sync server. Use --port to specify the port it should listen on, and --data-dir to specify the directory in which it should store its data. It serves only HTTP; the expectation is that a frontend proxy will be used for HTTPS support.

The server has optional parameters --snapshot-days and --snapshot-version, giving the target number of days and versions, respectively, between snapshots of the client state. The default values for these parameters are generally adequate.

Internal Details

The following sections get into the details of how TaskChampion works. None of this information is necessary to use TaskChampion, but might be helpful in understanding its behavior. Developers of TaskChampion and of tools that integrate with TaskChampion should be familiar with this information.

Data Model

A client manages a single offline instance of a single user's task list, called a replica. This section covers the structure of that data. Note that this data model is visible only on the client; the server does not have access to client data.

Replica Storage

Each replica has a storage backend. The interface for this backend is given in crate::taskstorage::Storage and StorageTxn.

The storage is transaction-protected, with the expectation of a serializable isolation level. The storage contains the following information:

  • tasks: a set of tasks, indexed by UUID
  • base_version: the number of the last version sync'd from the server (a single integer)
  • operations: all operations performed since base_version
  • working_set: a mapping from integer -> UUID, used to keep stable small-integer indexes into the tasks for users' convenience. This data is not synchronized with the server and does not affect any consistency guarantees.

Tasks

The tasks are stored as an unordered collection, keyed by task UUID. Each task in the database is represented by a key-value map. See Tasks for details on the content of that map.

Operations

Every change to the task database is captured as an operation. In other words, operations act as deltas between database states. Operations are crucial to synchronization of replicas, described in Synchronization Model.

Operations are entirely managed by the replica, and some combinations of operations are described as "invalid" here. A replica must not create invalid operations, but should be resilient to receiving invalid operations during a synchronization operation.

Each operation has one of the forms

  • Create(uuid)
  • Delete(uuid, oldTask)
  • Update(uuid, property, oldValue, newValue, timestamp)
  • UndoPoint()

The Create form creates a new task. It is invalid to create a task that already exists.

Similarly, the Delete form deletes an existing task. It is invalid to delete a task that does not exist. The oldTask property contains the task data from before it was deleted.

The Update form updates the given property of the given task, where the property and values are strings. The oldValue gives the old value of the property (or None to create a new property), while newValue gives the new value (or None to delete a property). It is invalid to update a task that does not exist. The timestamp on updates serves as additional metadata and is used to resolve conflicts.

Application

Each operation can be "applied" to a task database in a natural way:

  • Applying Create creates a new, empty task in the task database.
  • Applying Delete deletes a task, including all of its properties, from the task database.
  • Applying Update modifies the properties of a task.
  • Applying UndoPoint does nothing.
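
Since TaskChampion itself is written in Rust, the following is only a Python sketch of how applying an operation to the task set might look, using tuples for operations and a dict from UUID to key/value map for tasks:

```python
def apply_operation(tasks: dict, op: tuple) -> None:
    """Apply one replica operation to `tasks` (UUID -> key/value map).
    Invalid operations are rejected with assertions, as a stand-in for
    the real implementation's validity checks."""
    kind = op[0]
    if kind == "Create":
        _, uuid = op
        assert uuid not in tasks, "invalid: task already exists"
        tasks[uuid] = {}  # a new, empty task
    elif kind == "Delete":
        _, uuid, old_task = op
        assert uuid in tasks, "invalid: no such task"
        del tasks[uuid]  # removes the task and all its properties
    elif kind == "Update":
        _, uuid, prop, old_value, new_value, timestamp = op
        assert uuid in tasks, "invalid: no such task"
        if new_value is None:
            tasks[uuid].pop(prop, None)  # None deletes the property
        else:
            tasks[uuid][prop] = new_value
    elif kind == "UndoPoint":
        pass  # no effect on task state

tasks = {}
apply_operation(tasks, ("Create", "abc-d123"))
apply_operation(tasks, ("Update", "abc-d123", "description",
                        None, "get groceries", "2020-11-23T14:21:22Z"))
assert tasks == {"abc-d123": {"description": "get groceries"}}
```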

Undo

Each operation also contains enough information to reverse its application:

  • Undoing Create deletes a task.
  • Undoing Delete creates a task, including all of the properties in oldTask.
  • Undoing Update modifies the properties of a task, reverting to oldValue.
  • Undoing UndoPoint does nothing.

The UndoPoint operation serves as a marker of points in the operation sequence to which the user might wish to undo. For example, creation of a new task with several properties involves several operations, but is a single step from the user's perspective. An "undo" command reverses operations, removing them from the operations sequence, until it reaches an UndoPoint operation.
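
As a Python sketch of this undo process (illustrative only; TaskChampion's real implementation is in Rust), reversing operations from the end of the sequence until an UndoPoint is reached:

```python
def undo(tasks: dict, operations: list) -> None:
    """Reverse operations from the end of `operations`, removing them,
    until (and including) an UndoPoint."""
    while operations:
        op = operations.pop()
        kind = op[0]
        if kind == "Create":
            del tasks[op[1]]  # undoing Create deletes the task
        elif kind == "Delete":
            _, uuid, old_task = op
            tasks[uuid] = dict(old_task)  # restore the saved properties
        elif kind == "Update":
            _, uuid, prop, old_value, new_value, timestamp = op
            if old_value is None:
                tasks[uuid].pop(prop, None)  # property did not exist before
            else:
                tasks[uuid][prop] = old_value
        elif kind == "UndoPoint":
            break  # reached the user-visible boundary

# Undo a single user step that created a task and set its description.
tasks = {"t1": {"description": "get groceries"}}
operations = [
    ("UndoPoint",),
    ("Create", "t1"),
    ("Update", "t1", "description", None, "get groceries", "ts"),
]
undo(tasks, operations)
assert tasks == {} and operations == []
```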

Synchronizing Operations

After operations are synchronized to the server, they can no longer be undone. As such, the synchronization model uses simpler operations. Replica operations are converted to sync operations as follows:

  • Create(uuid) -> Create(uuid) (no change)
  • Delete(uuid, oldTask) -> Delete(uuid)
  • Update(uuid, property, oldValue, newValue, timestamp) -> Update(uuid, property, newValue, timestamp)
  • UndoPoint() -> Ø (dropped from operation sequence)

Once a sequence of operations has been synchronized, there is no need to store those operations on the replica. The current implementation deletes operations at that time. An alternative approach is to keep operations for existing tasks, and provide access to those operations as a "history" of modifications to the task.
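
The conversion table above can be sketched in Python as follows (illustrative; the real conversion happens in the Rust implementation):

```python
def to_sync_operations(replica_ops: list) -> list:
    """Convert replica operations (as tuples) to the simpler sync
    operations: drop old values, old task data, and UndoPoints."""
    sync_ops = []
    for op in replica_ops:
        kind = op[0]
        if kind == "Create":
            sync_ops.append(op)  # unchanged
        elif kind == "Delete":
            _, uuid, old_task = op
            sync_ops.append(("Delete", uuid))  # oldTask dropped
        elif kind == "Update":
            _, uuid, prop, old_value, new_value, ts = op
            sync_ops.append(("Update", uuid, prop, new_value, ts))
        # UndoPoint is dropped from the sequence entirely

    return sync_ops

ops = [
    ("UndoPoint",),
    ("Create", "t1"),
    ("Update", "t1", "description", None, "buy seeds", "ts1"),
]
assert to_sync_operations(ops) == [
    ("Create", "t1"),
    ("Update", "t1", "description", "buy seeds", "ts1"),
]
```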

Task Database

The task database is a layer of abstraction above the replica storage layer, responsible for maintaining some important invariants. While the storage is pluggable, there is only one implementation of the task database.

Reading Data

The task database provides read access to the data in the replica's storage through a variety of methods on the struct. Each read operation is executed in a transaction, so data may not be consistent between read operations. In practice, this is not an issue for TaskChampion's purposes.

Working Set

The task database maintains the working set. The working set maps small integers to current tasks, for easy reference by command-line users. This is done in such a way that the task numbers remain stable until the working set is rebuilt, at which point gaps in the numbering, such as for completed tasks, are removed by shifting all higher-numbered tasks downward.

The working set is not replicated, and is not considered a part of any consistency guarantees in the task database.

Modifying Data

Modifications to the data set are made by applying operations. Operations are described in Replica Storage.

Each operation is added to the list of operations in the storage, and simultaneously applied to the tasks in that storage. Operations are checked for validity as they are applied.

Deletion and Expiration

Deletion of a task merely changes the task's status to "deleted", leaving it in the task database. Actual removal of tasks from the task database takes place as part of expiration, triggered by the user as part of a garbage-collection process. Expiration removes tasks with a modified property more than 180 days in the past, by creating a Delete(uuid) operation.

Tasks

Tasks are stored internally as a key/value map with string keys and values. All fields are optional: the Create operation creates an empty task. Display layers should apply appropriate defaults where necessary.

Atomicity

The synchronization process does not support read-modify-write operations. For example, suppose tags are updated by reading a list of tags, adding a tag, and writing the result back. This would be captured as an Update operation containing the amended list of tags. Suppose two such Update operations are made in different replicas and must be reconciled:

  • Update("d394be59-60e6-499e-b7e7-ca0142648409", "tags", "oldtag,newtag1", "2020-11-23T14:21:22Z")
  • Update("d394be59-60e6-499e-b7e7-ca0142648409", "tags", "oldtag,newtag2", "2020-11-23T15:08:57Z")

The result of this reconciliation will be oldtag,newtag2, while the user almost certainly intended oldtag,newtag1,newtag2.

The key names given below avoid this issue, allowing user updates such as adding a tag or deleting a dependency to be represented in a single Update operation.
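
For example, because each tag is stored under its own key (tag_<tag>, described below), two replicas adding different tags touch different properties, and both changes survive reconciliation. A Python sketch of this, using sync-style Update operations:

```python
# Two replicas each add a different tag to the same task. Each tag is
# its own key, so the updates do not conflict with one another.
replica_a_op = ("Update", "d394be59-60e6-499e-b7e7-ca0142648409",
                "tag_newtag1", "", "2020-11-23T14:21:22Z")
replica_b_op = ("Update", "d394be59-60e6-499e-b7e7-ca0142648409",
                "tag_newtag2", "", "2020-11-23T15:08:57Z")

task = {"tag_oldtag": ""}
for _, uuid, prop, value, ts in (replica_a_op, replica_b_op):
    task[prop] = value

# All three tags are present, as the user intended.
assert set(task) == {"tag_oldtag", "tag_newtag1", "tag_newtag2"}
```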

Validity

Any key/value map is a valid task. Consumers of task data must make a best effort to interpret any map, even if it contains apparently contradictory information. For example, a task with status "completed" but no "end" key present should be interpreted as completed at an unknown time.

Representations

Integers are stored in decimal notation.

Timestamps are stored as UNIX epoch timestamps, in the form of an integer.

Keys

The following keys, and key formats, are defined:

  • status - one of P for a pending task (the default), C for completed or D for deleted
  • description - the one-line summary of the task
  • modified - the time of the last modification of this task
  • start - the most recent time at which this task was started (a task with no start key is not active)
  • end - if present, the time at which this task was completed or deleted (note that this key may not agree with status: it may be present for a pending task, or absent for a deleted or completed task)
  • tag_<tag> - indicates this task has tag <tag> (value is an empty string)
  • wait - indicates the time before which this task should be hidden, as it is not actionable
  • entry - the time at which the task was created
  • annotation_<timestamp> - value is an annotation created at the given time

The following are not yet implemented:

  • dep_<uuid> - indicates this task depends on <uuid> (value is an empty string)
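
Putting the keys above together, a display layer's interpretation of a task map might be sketched like this (a Python illustration with an invented function name; the real display logic is in the Rust implementation):

```python
def describe(task: dict) -> dict:
    """Interpret a task's key/value map, applying the defaults
    described above: status defaults to pending, and a task with no
    "start" key is not active."""
    return {
        "status": {"P": "pending", "C": "completed",
                   "D": "deleted"}.get(task.get("status", "P")),
        "description": task.get("description", ""),
        "active": "start" in task,
        "tags": sorted(k[len("tag_"):] for k in task if k.startswith("tag_")),
    }

task = {
    "description": "plant tomatoes",
    "tag_garden": "",
    "entry": "1605968457",  # UNIX epoch timestamp, stored as an integer
}
assert describe(task) == {
    "status": "pending",
    "description": "plant tomatoes",
    "active": False,
    "tags": ["garden"],
}
```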

UDAs

Any unrecognized keys are treated as "user-defined attributes" (UDAs). These attributes can be used to store additional data associated with a task. For example, applications that synchronize tasks with other systems such as calendars or team planning services might store unique identifiers for those systems as UDAs. The application defining a UDA defines the format of the value.

UDAs should have a namespaced structure of the form <namespace>.<key>, where <namespace> identifies the application defining the UDA. For example, a service named "DevSync" synchronizing tasks from GitHub might use UDAs like devsync.github.issue-id. Note that many existing UDAs for TaskWarrior integrations do not follow this pattern; these are referred to as legacy UDAs.

Synchronization and the Sync Server

This section covers synchronization of replicas containing the same set of tasks. A replica can perform all operations locally without connecting to a sync server, then share those operations with other replicas when it connects. Sync is a critical feature of TaskChampion, allowing users to consult and update the same task list on multiple devices, without requiring constant connection.

This is a complex topic, and the section is broken into several chapters, beginning at the lower levels of the implementation and working up.

Synchronization Model

The task database also implements synchronization. Synchronization occurs between disconnected replicas, mediated by a server. The replicas never communicate directly with one another. The server does not have access to the task data; it sees only opaque blobs of data with a small amount of metadata.

The synchronization process is a critical part of the task database's functionality, and it cannot function efficiently without occasional synchronization operations.

Operational Transforms

Synchronization is based on operational transformation. This section will assume some familiarity with the concept.

State and Operations

At a given time, the set of tasks in a replica's storage is the essential "state" of that replica. All modifications to that state occur via operations, as defined in Replica Storage. We can draw a network, or graph, with the nodes representing states and the edges representing operations. For example:

  o -- State: {abc-d123: 'get groceries', priority L}
  |
  | -- Operation: set abc-d123 priority to H
  |
  o -- State: {abc-d123: 'get groceries', priority H}

For those familiar with distributed version control systems, a state is analogous to a revision, while an operation is analogous to a commit.

Fundamentally, synchronization involves all replicas agreeing on a single, linear sequence of operations and the state that those operations create. Since the replicas are not connected, each may have additional operations that have been applied locally, but which have not yet been agreed on. The synchronization process uses operational transformation to "linearize" those operations. This process is analogous (vaguely) to rebasing a sequence of Git commits.

Sync Operations

The Replica Storage model contains additional information in its operations that is not included in operations synchronized to other replicas. In this document, we will be discussing "sync operations" of the form

  • Create(uuid)
  • Delete(uuid)
  • Update(uuid, property, value, timestamp)

Versions

Occasionally, database states are given a name (that takes the form of a UUID). The system as a whole (all replicas) constructs a branch-free sequence of versions and the operations that separate each version from the next. The version with the nil UUID is implicitly the empty database.

The server stores the operations to change a state from a "parent" version to a "child" version, and provides that information as needed to replicas. Replicas use this information to update their local task databases, and to generate new versions to send to the server.

Replicas generate a new version to transmit local changes to the server. The changes are represented as a sequence of operations with the state resulting from the final operation corresponding to the version. In order to keep the versions in a single sequence, the server will only accept a proposed version from a replica if its parent version matches the latest version on the server.

In the non-conflict case (such as with a single replica), then, a replica's synchronization process involves gathering up the operations it has accumulated since its last synchronization; bundling those operations into a version; and sending that version to the server.

Replica Invariant

The replica's storage contains the current state in tasks, the as-yet un-synchronized operations in operations, and the last version at which synchronization occurred in base_version.

The replica's un-synchronized operations are already reflected in its local tasks, so the following invariant holds:

Applying operations to the set of tasks at base_version gives a set of tasks identical to tasks.
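The invariant can be checked by replaying the pending operations against the base state. A minimal sketch, using plain dicts for tasks and a hypothetical apply() that is not TaskChampion's actual code:

```python
# A hypothetical apply(): mutate the task set according to one operation.
def apply(tasks, op):
    kind, args = op
    if kind == "Create":
        tasks.setdefault(args["uuid"], {})
    elif kind == "Delete":
        tasks.pop(args["uuid"], None)
    elif kind == "Update":
        t = tasks.get(args["uuid"])
        if t is not None:
            if args["value"] is None:
                t.pop(args["property"], None)
            else:
                t[args["property"]] = args["value"]
    return tasks

# State at base_version, the un-synchronized operations, and the
# current local tasks:
base = {}
operations = [
    ("Create", {"uuid": "abc-d123"}),
    ("Update", {"uuid": "abc-d123", "property": "priority", "value": "H"}),
]
tasks = {"abc-d123": {"priority": "H"}}

replayed = base
for op in operations:
    replayed = apply(replayed, op)
assert replayed == tasks  # the invariant holds
```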

Transformation

When the latest version on the server contains operations that are not present in the replica, then the states have diverged. For example:

  o  -- version N
 w|\a
  o o
 x|  \b
  o   o
 y|    \c
  o     o -- replica's local state
 z|
  o -- version N+1

(diagram notation: o designates a state, lower-case letters designate operations, and versions are presented as if they were numbered sequentially)

In this situation, the replica must "rebase" its local operations onto the latest version from the server and try again. This rebase is performed using operational transformation (OT). The result is two new sequences of operations: one based on the latest version (to send to the server), and one the replica can apply to its local task database so that both reach the same state. Continuing the example above, the transformed operations are marked with a prime ('):

  o  -- version N
 w|\a
  o o
 x|  \b
  o   o
 y|    \c
  o     o -- replica's intermediate local state
 z|     |w'
  o-N+1 o
 a'\    |x'
    o   o
   b'\  |y'
      o o
     c'\|z'
        o  -- version N+2

The replica applies w' through z' locally, and sends a' through c' to the server as the operations to generate version N+2. Either path through this graph, a-b-c-w'-x'-y'-z' or a'-b'-c'-w-x-y-z, must generate precisely the same final state at version N+2. Careful selection of the operations and the transformation function ensure this.
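To illustrate why convergence can hold, consider a toy transform for two concurrent Update operations on the same task property, resolved last-writer-wins by timestamp. This is a sketch only; TaskChampion's actual transform function is in the source code.

```python
# Toy transform: two concurrent Updates to the same task property are
# resolved last-writer-wins by timestamp; the losing operation becomes
# a no-op (None). Illustrative only, not TaskChampion's transform.

def transform(op1, op2):
    """Return (op1', op2') such that op2 followed by op1' and
    op1 followed by op2' produce the same state."""
    same_target = (op1["uuid"] == op2["uuid"]
                   and op1["property"] == op2["property"])
    if not same_target:
        return op1, op2      # independent operations commute
    if op1["timestamp"] >= op2["timestamp"]:
        return op1, None     # op1 wins; op2 is dropped
    return None, op2

a = {"uuid": "u1", "property": "priority", "value": "H",
     "timestamp": "2021-10-11T12:47:07Z"}
w = {"uuid": "u1", "property": "priority", "value": "L",
     "timestamp": "2021-10-11T12:00:00Z"}
a_prime, w_prime = transform(a, w)
# a is newer, so both paths converge on priority H.
```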

See the comments in the source code for the details of how this transformation process is implemented.

Synchronization Process

To perform a synchronization, the replica first requests the child version of base_version from the server (GetChildVersion). It applies that version to its local tasks, rebases its local operations as described above, and updates base_version. The replica repeats this process until the server indicates no additional child versions exist. If there are no un-synchronized local operations, the process is complete.

Otherwise, the replica creates a new version containing its local operations, giving its base_version as the parent version, and transmits that to the server (AddVersion). In most cases, this will succeed, but if another replica has created a new version in the interim, then the new version will conflict with that other replica's new version and the server will respond with the new expected parent version. In this case, the process repeats. If the server indicates a conflict twice with the same expected base version, that is an indication that the replica has diverged (something serious has gone wrong).
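The two phases of this process can be sketched against a hypothetical in-memory server. The method names follow the transactions described later (GetChildVersion, AddVersion), but the classes here are illustrative, not TaskChampion's API; application and transformation of operations are elided.

```python
from types import SimpleNamespace

# Hypothetical in-memory server, for illustration only.
class Server:
    def __init__(self):
        self.versions = {}   # parent version ID -> (version ID, operations)
        self.latest = "nil"

    def get_child_version(self, parent_id):
        return self.versions.get(parent_id)

    def add_version(self, parent_id, operations):
        if parent_id != self.latest:
            return None      # conflict: caller must catch up first
        version_id = "v%d" % (len(self.versions) + 1)
        self.versions[parent_id] = (version_id, operations)
        self.latest = version_id
        return version_id

def sync(server, replica):
    while True:
        # Phase 1: download any new versions, rebasing local operations
        # onto each (application and transformation elided here).
        child = server.get_child_version(replica.base_version)
        if child is not None:
            replica.base_version = child[0]
            continue
        if not replica.operations:
            return
        # Phase 2: propose the local operations as a new version.
        new_id = server.add_version(replica.base_version, replica.operations)
        if new_id is not None:
            replica.base_version = new_id
            replica.operations = []
            return
        # On conflict, loop: fetch the new child version and retry.

replica = SimpleNamespace(base_version="nil", operations=[("Create", "u1")])
server = Server()
sync(server, replica)
```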

Servers

A replica depends on periodic synchronization for performant operation. Without synchronization, its list of pending operations would grow indefinitely, and tasks could never be expired. So all replicas, even "singleton" replicas which do not replicate task data with any other replica, must synchronize periodically.

TaskChampion provides a LocalServer for this purpose. It implements the get_child_version and add_version operations as described, storing data on-disk locally, all within the ta binary.

Snapshots

The basic synchronization model described in the previous page has a few shortcomings:

  • servers must store an ever-increasing quantity of versions
  • a new replica must download all versions since the beginning in order to derive the current state

Snapshots allow TaskChampion to avoid both of these issues. A snapshot is a copy of the task database at a specific version. It is created by a replica, encrypted, and stored on the server. A new replica can simply download a recent snapshot and apply any additional versions synchronized since that snapshot was made. Servers can delete and reclaim space used by older versions, as long as newer snapshots are available.
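New-replica initialization can then be sketched as: fetch the latest snapshot, then walk forward through the child versions synchronized after it. The stub server below is hypothetical, standing in for the GetSnapshot and GetChildVersion transactions described later, and the task-application step is a placeholder.

```python
# Sketch of new-replica initialization from a snapshot (hypothetical).
def initialize(server):
    snapshot = server.get_snapshot()
    if snapshot is None:
        base_version, tasks = "nil", {}   # fall back to the empty database
    else:
        base_version, tasks = snapshot
    while True:
        child = server.get_child_version(base_version)
        if child is None:
            return base_version, tasks
        base_version, operations = child
        for op in operations:
            tasks.update(op)              # placeholder for real operation apply

class StubServer:
    def get_snapshot(self):
        return "v9", {"u1": {"description": "a task"}}

    def get_child_version(self, parent_id):
        # one version synchronized after the snapshot
        if parent_id == "v9":
            return "v10", [{"u2": {"description": "another task"}}]
        return None

base, tasks = initialize(StubServer())
```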

Snapshot Heuristics

A server implementation must answer a few questions:

  • How often should snapshots be made?
  • When can versions be deleted?
  • When can snapshots be deleted?

A critical invariant is that at least one snapshot must exist for any database that does not have a child of the nil version. This ensures that a new replica can always derive the latest state.

Aside from that invariant, the server implementation can vary in its answers to these questions, with the following considerations:

Snapshots should be made frequently enough that a new replica can initialize quickly.

Existing replicas will fail to synchronize if they request a child version that has been deleted. This failure can cause data loss if the replica had local changes. It's conceivable that replicas may not sync for weeks or months if, for example, they are located on a home computer while the user is on holiday.

Requesting New Snapshots

The server requests snapshots from replicas, indicating an urgency for the request. Some replicas, such as those running on PCs or servers, can produce a snapshot even at low urgency. Other replicas, in more restricted environments such as mobile devices, will only produce a snapshot at high urgency. This saves resources in these restricted environments.

A snapshot must be made on a replica with no unsynchronized operations. As such, it only makes sense to request a snapshot in response to a successful AddVersion request.

Handling Deleted Versions

When a replica requests a child version, the response must distinguish two cases:

  1. No such child version exists because the replica is up-to-date.
  2. No such child version exists because it has been deleted, and the replica must re-initialize itself.

The details of this logic are covered in the Server-Replica Protocol.

Server-Replica Protocol

The server-replica protocol is defined abstractly in terms of request/response transactions from the replica to the server. This is made concrete in an HTTP representation.

The protocol builds on the model presented in the previous chapter, and in particular on the synchronization process.

Clients

From the server's perspective, replicas accessing the same task history are indistinguishable, so this protocol uses the term "client" to refer generically to all replicas replicating a single task history.

Each client is identified and authenticated with a "client key", known only to the server and to the replicas replicating the task history.

Server

For each client, the server is responsible for storing the task history, in the form of a branch-free sequence of versions. It also stores the latest snapshot, if any exists.

  • versions: a set of {versionId: UUID, parentVersionId: UUID, historySegment: bytes}
  • latestVersionId: UUID
  • snapshotVersionId: UUID
  • snapshot: bytes

For each client, it stores a set of versions as well as the latest version ID, defaulting to the nil UUID. Each version has a version ID, a parent version ID, and a history segment (opaque data containing the operations for that version). The server should maintain the following invariants for each client:

  1. latestVersionId is nil or exists in the set of versions.
  2. Given versions v1 and v2 for a client, with v1.versionId != v2.versionId and v1.parentVersionId != nil, v1.parentVersionId != v2.parentVersionId. In other words, versions do not branch.
  3. If snapshotVersionId is nil, then there is a version with parentVersionId == nil.
  4. If snapshotVersionId is not nil, then there is a version with parentVersionId = snapshotVersionId.

Note that versions form a linked list beginning with the latestVersionId stored for the client. This linked list need not continue back to a version with v.parentVersionId = nil. It may end at any point when v.parentVersionId is not found in the set of Versions. This observation allows the server to discard older versions. The third invariant prevents the server from discarding versions if there is no snapshot. The fourth invariant prevents the server from discarding versions newer than the snapshot.
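The four invariants can be written as a checking function over a simple in-memory representation of one client's data. This representation is hypothetical, for illustration only.

```python
NIL = "nil"  # standing in for the nil UUID

def check_invariants(client):
    versions = {v["versionId"]: v for v in client["versions"]}
    # 1. latestVersionId is nil or exists in the set of versions.
    assert client["latestVersionId"] == NIL or client["latestVersionId"] in versions
    # 2. No two versions share a non-nil parent (no branching).
    parents = [v["parentVersionId"] for v in versions.values()
               if v["parentVersionId"] != NIL]
    assert len(parents) == len(set(parents))
    if client["snapshotVersionId"] == NIL:
        # 3. Without a snapshot, history must reach back to the nil version.
        assert any(v["parentVersionId"] == NIL for v in versions.values())
    else:
        # 4. With a snapshot, some version must build on the snapshot.
        assert any(v["parentVersionId"] == client["snapshotVersionId"]
                   for v in versions.values())

client = {
    "latestVersionId": "v2",
    "snapshotVersionId": "v1",
    "versions": [
        {"versionId": "v1", "parentVersionId": NIL},
        {"versionId": "v2", "parentVersionId": "v1"},
    ],
}
check_invariants(client)  # all four invariants hold
```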

Data Formats

Encryption

The client configuration includes an encryption secret of arbitrary length and a client key to identify itself. This section describes how that information is used to encrypt and decrypt data sent to the server (versions and snapshots).

Key Derivation

The client derives the 32-byte encryption key from the configured encryption secret using PBKDF2 with HMAC-SHA256 and 100,000 iterations. The salt is the SHA256 hash of the 16-byte form of the client key.
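The derivation above can be sketched with Python's standard library. The client key and secret here are example values, not real credentials.

```python
import hashlib
import uuid

# Example inputs (not real credentials).
client_key = uuid.UUID("f8d4d09d-f6c7-4dd2-ab50-634ed20a3ff2")
encryption_secret = b"hunter2"

# Salt: SHA256 of the 16-byte form of the client key.
salt = hashlib.sha256(client_key.bytes).digest()

# 32-byte key via PBKDF2 with HMAC-SHA256 and 100,000 iterations.
key = hashlib.pbkdf2_hmac("sha256", encryption_secret, salt, 100_000, dklen=32)
assert len(key) == 32
```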

Encryption

The client uses AEAD, with algorithm CHACHA20_POLY1305. The client should generate a random nonce, noting that AEAD is not secure if a nonce is used repeatedly for the same key.

AEAD supports additional authenticated data (AAD) which must be provided for both open and seal operations. In this protocol, the AAD is always 17 bytes of the form:

  • app_id (byte) - always 1
  • version_id (16 bytes) - 16-byte form of the version ID associated with this data
    • for versions (AddVersion, GetChildVersion), the parent version_id
    • for snapshots (AddSnapshot, GetSnapshot), the snapshot version_id

The app_id field is for future expansion to handle other, non-task data using this protocol. Including it in the AAD ensures that such data cannot be confused with task data.

Although the AEAD specification distinguishes ciphertext and tags, for purposes of this specification they are considered concatenated into a single bytestring as in BoringSSL's EVP_AEAD_CTX_seal.

Representation

The final byte-stream has the following structure:

  • version (byte) - format version (always 1)
  • nonce (12 bytes) - encryption nonce
  • ciphertext (remaining bytes) - ciphertext from sealing operation

The version field identifies this data format, and future formats will have a value other than 1 in this position.
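Both byte layouts can be sketched with the standard library. The nonce and ciphertext below are dummy placeholders; real values come from a random nonce and the CHACHA20_POLY1305 sealing operation.

```python
import struct
import uuid

version_id = uuid.UUID("56e0be07-c61f-494c-a54c-bdcfdd52d2a7")

# 17-byte AAD: app_id byte (always 1), then the 16-byte version ID.
aad = struct.pack("B", 1) + version_id.bytes
assert len(aad) == 17

nonce = b"\x00" * 12      # must be randomly generated in practice
ciphertext = b"..."       # sealed ciphertext with appended tag

# Envelope: format version (always 1), 12-byte nonce, ciphertext.
envelope = struct.pack("B", 1) + nonce + ciphertext
assert envelope[0] == 1 and envelope[1:13] == nonce
```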

Version

The decrypted form of a version is a JSON array containing operations in the order they should be applied. Each operation has the form {TYPE: DATA}, for example:

  • {"Create":{"uuid":"56e0be07-c61f-494c-a54c-bdcfdd52d2a7"}}
  • {"Delete":{"uuid":"56e0be07-c61f-494c-a54c-bdcfdd52d2a7"}}
  • {"Update":{"uuid":"56e0be07-c61f-494c-a54c-bdcfdd52d2a7","property":"prop","value":"v","timestamp":"2021-10-11T12:47:07.188090948Z"}}
  • {"Update":{"uuid":"56e0be07-c61f-494c-a54c-bdcfdd52d2a7","property":"prop","value":null,"timestamp":"2021-10-11T12:47:07.188090948Z"}} (to delete a property)

Timestamps are in RFC3339 format with a Z suffix.
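The payload can be round-tripped with Python's json module. The operations below are illustrative, not captured from a real synchronization.

```python
import json

# A decrypted version payload: a JSON array of operations.
payload = json.dumps([
    {"Create": {"uuid": "56e0be07-c61f-494c-a54c-bdcfdd52d2a7"}},
    {"Update": {"uuid": "56e0be07-c61f-494c-a54c-bdcfdd52d2a7",
                "property": "prop", "value": "v",
                "timestamp": "2021-10-11T12:47:07.188090948Z"}},
])

# Each operation is a single-key object of the form {TYPE: DATA}.
ops = json.loads(payload)
for op in ops:
    (op_type, data), = op.items()
```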

Snapshot

The decrypted form of a snapshot is a JSON object mapping task IDs to task properties. For example (pretty-printed for clarity):

{
 "56e0be07-c61f-494c-a54c-bdcfdd52d2a7": {
   "description": "a task",
   "priority": "H"
 },
 "4b7ed904-f7b0-4293-8a10-ad452422c7b3": {
   "description": "another task"
 }
}

Transactions

AddVersion

The AddVersion transaction requests that the server add a new version to the client's task history. The request contains the following:

  • parent version ID
  • history segment

The server determines whether the new version is acceptable, atomically with respect to other requests for the same client. If it has no versions for the client, it accepts the version. If it already has one or more versions for the client, then it accepts the version only if the given parent version ID matches its stored latest version ID.

If the version is accepted, the server generates a new version ID for it, adds the version to the client's set of versions, and sets the client's latest version ID to the new version ID. The new version ID is returned in the response to the client. The response may also include a request for a snapshot, with an associated urgency.

If the version is not accepted, the server makes no changes, but responds to the client with a conflict indication containing the latest version ID. The client may then "rebase" its operations and try again. Note that if a client receives two conflict responses with the same parent version ID, it is an indication that the client's version history has diverged from that on the server.
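The acceptance rule can be sketched over hypothetical in-memory state; a real server must apply it atomically per client.

```python
import uuid

# Sketch of the AddVersion acceptance rule (hypothetical state layout).
def add_version(state, parent_version_id, history_segment):
    if state["versions"] and parent_version_id != state["latestVersionId"]:
        # Conflict: report the expected parent so the client can rebase.
        return {"conflict": state["latestVersionId"]}
    version_id = str(uuid.uuid4())
    state["versions"][version_id] = {
        "parentVersionId": parent_version_id,
        "historySegment": history_segment,
    }
    state["latestVersionId"] = version_id
    return {"ok": version_id}

state = {"versions": {}, "latestVersionId": "nil"}
r1 = add_version(state, "nil", b"ops-1")  # accepted: no versions yet
r2 = add_version(state, "nil", b"ops-2")  # conflict: stale parent
```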

GetChildVersion

The GetChildVersion transaction is a read-only request for a version. The request consists of a parent version ID. The server searches its set of versions for a version with the given parent ID. If found, it returns the version's

  • version ID,
  • parent version ID (matching that in the request), and
  • history segment.

The response is one of success (a version), not-found, or gone, as determined by the first of the following rules to apply:

  • If a version with parentVersionId equal to the requested parentVersionId exists, it is returned.
  • If the requested parentVersionId is the nil UUID:
    • if snapshotVersionId is nil, the response is not-found (the client has no versions);
    • if snapshotVersionId is not nil, the response is gone (the first version has been deleted).
  • If a version with versionId equal to the requested parentVersionId exists, the response is not-found (the client is up-to-date).
  • Otherwise, the response is gone (the requested version has been deleted).
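These rules, applied in order, can be expressed as a function over a hypothetical in-memory list of versions:

```python
NIL = "nil"  # standing in for the nil UUID

# The four GetChildVersion rules, applied in order.
def get_child_version(versions, snapshot_version_id, parent_version_id):
    for v in versions:
        if v["parentVersionId"] == parent_version_id:
            return "success", v
    if parent_version_id == NIL:
        if snapshot_version_id == NIL:
            return "not-found", None   # the client has no versions
        return "gone", None            # the first version was deleted
    if any(v["versionId"] == parent_version_id for v in versions):
        return "not-found", None       # the client is up-to-date
    return "gone", None                # the requested version was deleted

versions = [{"versionId": "v2", "parentVersionId": "v1"}]
assert get_child_version(versions, "v1", "v1")[0] == "success"
assert get_child_version(versions, "v1", "v2")[0] == "not-found"
assert get_child_version(versions, "v1", NIL)[0] == "gone"
```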

AddSnapshot

The AddSnapshot transaction requests that the server store a new snapshot, generated by the client. The request contains the following:

  • version ID at which the snapshot was made
  • snapshot data (opaque to the server)

The server should validate that the snapshot is for an existing version and is newer than any existing snapshot. It may also validate that the snapshot is for a "recent" version (e.g., one of the last 5 versions). If a snapshot already exists for the given version, the server may keep or discard the new snapshot but should return a success indication to the client.

The server response is empty.

GetSnapshot

The GetSnapshot transaction requests that the server provide the latest snapshot. The response contains the snapshot version ID and the snapshot data, if those exist.

HTTP Representation

The transactions above are realized for an HTTP server at <origin> using the HTTP requests and responses described here. The origin should be an HTTPS endpoint on general principle, but nothing in the functionality or security of the protocol depends on connection encryption.

The replica identifies itself to the server using a clientKey in the form of a UUID. This value is passed with every request in the X-Client-Id header, in its dashed-hex format.

AddVersion

The request is a POST to <origin>/v1/client/add-version/<parentVersionId>. The request body contains the history segment, optionally encoded using any encoding supported by actix-web. The content-type must be application/vnd.taskchampion.history-segment.

The success response is a 200 OK with an empty body. The new version ID appears in the X-Version-Id header. If included, a snapshot request appears in the X-Snapshot-Request header with value urgency=low or urgency=high.

On conflict, the response is a 409 CONFLICT with an empty body. The expected parent version ID appears in the X-Parent-Version-Id header.

Other error responses (4xx or 5xx) may be returned and should be treated according to their meanings in the HTTP specification.

GetChildVersion

The request is a GET to <origin>/v1/client/get-child-version/<parentVersionId>.

The response is determined as described above. The not-found response is 404 NOT FOUND. The gone response is 410 GONE. Neither has a response body.

On success, the response is a 200 OK. The version's history segment is returned in the response body, with content-type application/vnd.taskchampion.history-segment. The version ID appears in the X-Version-Id header. The response body may be encoded, in accordance with any Accept-Encoding header in the request.

On failure, a client should treat a 404 NOT FOUND as indicating that it is up-to-date. Clients should treat a 410 GONE as a synchronization error. If the client has pending changes to send to the server, based on a now-removed version, then those changes cannot be reconciled and will be lost. The client should, optionally after consulting the user, download and apply the latest snapshot.

AddSnapshot

The request is a POST to <origin>/v1/client/add-snapshot/<versionId>. The request body contains the snapshot data, optionally encoded using any encoding supported by actix-web. The content-type must be application/vnd.taskchampion.snapshot.

If the version is invalid, as described above, the response should be 400 BAD REQUEST. The server response should be 200 OK on success.

GetSnapshot

The request is a GET to <origin>/v1/client/snapshot.

The response is a 200 OK. The snapshot is returned in the response body, with content-type application/vnd.taskchampion.snapshot. The version ID appears in the X-Version-Id header. The response body may be encoded, in accordance with any Accept-Encoding header in the request.

After downloading and decrypting a snapshot, a client must replace its entire local task database with the content of the snapshot. Any local operations that had not yet been synchronized must be discarded. After the snapshot is applied, the client should begin the synchronization process again, starting from the snapshot version.

Planned Functionality

This section is a bit of a to-do list for additional functionality to add to the synchronization system. Each feature has some discussion of how it might be implemented.

Snapshots

As designed, storage required on the server would grow with time, as would the time required for new clients to update to the latest version. As an optimization, the server also stores "snapshots" containing a full copy of the task database at a given version. Based on configurable heuristics, it may delete older operations and snapshots, as long as enough data remains for active clients to synchronize and for new clients to initialize.

Since snapshots must be computed by clients, the server may "request" a snapshot when providing the latest version to a client. This request comes with a number indicating how much it "wants" the snapshot. Clients which can easily generate and transmit a snapshot should be generous to the server, while clients with more limited resources can wait until the server's requests are more desperate. The intent is, where possible, to prefer snapshots created on well-connected desktop clients over mobile and low-power clients.

Encryption and Signing

From the server's perspective, all data except for version numbers are opaque binary blobs. Clients encrypt and sign these blobs using a symmetric key known only to the clients. This secures the data at-rest on the server. Note that privacy is not complete, as the server still has some information about users, including source and frequency of synchronization transactions and size of those transactions.

Backups

In this design, the server is little more than an authenticated storage for encrypted blobs provided by the client. To allow for failure or data loss on the server, clients are expected to cache these blobs locally for a short time (a week), along with a server-provided HMAC signature. When data loss is detected -- such as when a client expects the server to have a version N or higher, and the server only has N-1 -- the client can send those blobs to the server. The server can validate the HMAC and, if successful, add the blobs to its datastore.

Expiration

Deleted tasks remain in the task database, and are simply hidden in most views. All tasks have an expiration time after which they may be flushed, preventing unbounded increase in task database size. However, purging of a task does not satisfy the necessary OT guarantees, so some further formal design work is required before this is implemented.