SubcloneSeeker
1.0.0
Subclone deconvolution software framework
Computational framework for reconstructing tumor clone structures
Clonal heterogeneity is a defining feature of many tumor types, and the elucidation of the subclones that exist within the tumor sample, together with the corresponding cellular fractions, is essential to understand tumorigenesis, relapse, and metastasis. We developed a software framework, SubcloneSeeker, that examines somatic variation events (such as copy number changes, loss of heterozygosity, or point mutations) in order to identify the underlying subclone structure, i.e. the subclones including the normal (non-cancerous) cells and their cellular frequencies within the tumor tissue.
Please refer to the documentation to get started or to run the examples:
http://yiq.github.io/SubcloneSeekerDoc/
'src' directory contains all the classes that make up the SubcloneSeeker core library, which defines the basic data types and logic for representing somatic mutations, mutation clusters and subclones.
'utils' directory contains the source code for many command-line utilities that use the SubcloneSeeker core library.
'test' directory contains the test cases for the core library.
'doc' directory contains the project's documentation, generated by Doxygen. A pre-built version can be read at http://yiq.github.io/SubcloneSeekerDoc/
'contrib' directory contains third-party libraries that the project uses.
SubcloneSeeker is distributed under the MIT license.
The MIT License (MIT)
Copyright (c) 2013 Yi Qiao
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Descriptions and usage of the included command-line utilities can be found on the Command line utilities page.
There are four examples prepared for you in an easy-to-run fashion. Each example corresponds to a figure in the article. To reproduce the figures, make sure you have R installed and that its binary can be found through PATH lookup. If you run the following command
which R
and see an absolute path, such as '/usr/bin/R', then the requirement is satisfied.
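The same check can be scripted, for instance from a wrapper that runs the examples. The following is a minimal Python sketch using only the standard library; the helper name find_binary is an illustrative choice and is not part of SubcloneSeeker.

```python
import shutil


def find_binary(name):
    """Return the path of `name` if it is found through PATH lookup, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    r_path = find_binary("R")
    if r_path is None:
        print("R was not found on PATH; the plotting steps will not work")
    else:
        print("R found at", r_path)
```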
The example tarball contains the four results presented in the paper. The tarball is at the root directory of the source tree:
https://github.com/yiq/SubcloneSeeker/blob/master/examples.tar.gz
Navigate to the example directory on the command line. Edit the file 'setup.sh' so that it reflects how you organized the project files. Make sure that you have compiled the binaries in the SubcloneSeeker project. If you haven't done so, please refer to the 'INSTALLATION' section in this document.
Directory: 01-simulation
The first example simulates 1000 tumors, each with 3, 4, ..., 8 subclones. Subclone deconvolution is then performed on all the samples, and statistics are collected on how many possible solutions exist for each tumor. You can start the simulation by running the 'sim.sh' script:
./sim.sh
Note that this will take some time to finish, especially for the cases with 7 or 8 subclones per tumor. Results will be stored in the 'result' directory. To plot the results, launch R in the example directory and run the 'plot.r' script:
R
source('plot.r');
A result set from a run executed on my computer is also provided, in case you just want to plot the results. In that case, please use the 'plot_prerun.r' script.
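To make the idea of a simulated tumor concrete, here is a small Python sketch of one possible way to generate a random subclone tree with cell fractions. It is not the code used by 'sim.sh': the sampling scheme (each node picks a random earlier node as its parent, and fractions are normalized random weights), the choice to count the root as one of the subclones, and the function name simulate_tumor are all assumptions made for illustration.

```python
import random


def simulate_tumor(n_subclones, rng):
    """Build a random tree with `n_subclones` nodes and random cell fractions.

    Node 0 is the root. Every other node picks its parent uniformly among
    the nodes created before it. This scheme is an illustrative assumption,
    not the procedure used by the shipped simulation script.
    """
    parents = {0: None}
    for node in range(1, n_subclones):
        parents[node] = rng.randrange(node)

    # Draw positive weights and normalize them so the fractions sum to 1.
    weights = [rng.random() + 1e-9 for _ in range(n_subclones)]
    total = sum(weights)
    fractions = {node: w / total for node, w in enumerate(weights)}
    return parents, fractions


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs give the same trees
    for n in range(3, 9):    # 3, 4, ..., 8 subclones, as in the example
        parents, fractions = simulate_tumor(n, rng)
        print(n, parents, {k: round(v, 3) for k, v in fractions.items()})
```

A real simulation would then derive the observable mutation cluster frequencies from each tree and run the deconvolution on them, which is the part the example scripts take care of.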
Directory: 02-snuc
In this example, a snuc cancer cell line (thus very high purity) was sequenced, and its sequencing reads were combined at various ratios with reads from the paired normal sample. Subclone deconvolution is performed on each of the digitally created samples, and the purity estimate is reported. To run the example, simply type
make
in the example directory, as it will take care of removing old results (if you have run it before), creating the required directories, running the deconvolution and plotting the result. After the execution, a file 'SamplePurity.pdf' should be created in the same directory, which contains the plotted result.
execute
make
As an example, the following is part of the output:
SAMPLE UPN933124: Primary tree 11 is compatible with Secondary tree 1
This means that, for the sample UPN933124, the primary tree, with a root node whose id is 11, is compatible with the relapse (secondary) tree, with a root node whose id is 1. To further investigate the actual structures, the utility treeprint can be used. The following commands assume that an environment variable SSHOME exists that points to the directory of SubcloneSeeker.
$SSHOME/utils/treeprint -l results/subclones/UPN933124-pri.sqlite
1 6 11 16 21 26
There are 6 subclone structures (or root nodes, more precisely) in the database that resulted from reconstruction on the primary sample. Since tree 11 is compatible, let's look at that one in particular:
$SSHOME/utils/treeprint -r 11 results/subclones/UPN933124-pri.sqlite
0,(0.127401,(0.531157,0.29044,(0.051003,)))
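It indicates that the structure of tree 11 is as follows: the root node (fraction 0) has a single child at 0.127401, which has two children at 0.531157 and 0.29044, and the latter has one child at 0.051003.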
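The nested notation can also be unpacked programmatically. The following is a minimal Python sketch, not part of SubcloneSeeker. The grammar it assumes (each node is a number followed by a comma, optionally followed by a parenthesized list of child nodes) is inferred from the single output line shown above, and the names parse_tree, show and total_fraction are illustrative.

```python
def parse_tree(text):
    """Parse treeprint-style nested output into (fraction, children) tuples.

    Assumed grammar, inferred from one example line only:
        node := NUMBER "," [ "(" node* ")" ]
    """
    pos = 0

    def parse_node():
        nonlocal pos
        comma = text.index(",", pos)       # every node ends its number with ","
        fraction = float(text[pos:comma])
        pos = comma + 1
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1                       # consume "("
            while text[pos] != ")":
                children.append(parse_node())
            pos += 1                       # consume ")"
        return (fraction, children)

    root = parse_node()
    if pos != len(text):
        raise ValueError("unexpected trailing text: " + text[pos:])
    return root


def show(node, depth=0):
    """Print one node per line as a percentage, indented by tree depth."""
    fraction, children = node
    print("  " * depth + f"{fraction * 100:.1f}%")
    for child in children:
        show(child, depth + 1)


def total_fraction(node):
    """Sum the fractions of a node and all of its descendants."""
    fraction, children = node
    return fraction + sum(total_fraction(child) for child in children)


if __name__ == "__main__":
    tree = parse_tree("0,(0.127401,(0.531157,0.29044,(0.051003,)))")
    show(tree)
    print("sum of fractions:", total_fraction(tree))
```

Summing the fractions is a quick sanity check: for a complete structure they should add up to approximately 1.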
A more visualization-friendly output can be produced with the -g option of treeprint, which prints the structure in graphviz format:
$SSHOME/utils/treeprint -r 11 -g results/subclones/UPN933124-pri.sqlite
digraph {
    n11 [label="n11: 0%"];
    n12 [label="n12: 12.7%"];
    n13 [label="n13: 53.1%"];
    n14 [label="n14: 29%"];
    n15 [label="n15: 5.1%"];
    n11->n12;
    n12->n13;
    n12->n14;
    n14->n15;
}
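If the graphviz text needs to be post-processed rather than rendered, the node labels and edges can be pulled out with two regular expressions. This Python sketch only handles simple output of the shape shown above (one label attribute per node, plain "a->b" edges); it is not a general DOT parser, and the function name read_dot is an illustrative choice.

```python
import re

# The graphviz text from the example above, stored as one string.
DOT_TEXT = (
    'digraph { n11 [label="n11: 0%"]; n12 [label="n12: 12.7%"]; '
    'n13 [label="n13: 53.1%"]; n14 [label="n14: 29%"]; '
    'n15 [label="n15: 5.1%"]; n11->n12; n12->n13; n12->n14; n14->n15; }'
)


def read_dot(text):
    """Return (labels, children) extracted from simple digraph text."""
    # Node declarations look like: n11 [label="n11: 0%"]
    labels = dict(re.findall(r'(\w+)\s*\[label="([^"]*)"\]', text))
    children = {name: [] for name in labels}
    # Edges look like: n11->n12
    for parent, child in re.findall(r'(\w+)\s*->\s*(\w+)', text):
        children.setdefault(parent, []).append(child)
        children.setdefault(child, [])
    return labels, children


if __name__ == "__main__":
    labels, children = read_dot(DOT_TEXT)
    for name in labels:
        print(labels[name], "->", children[name])
```

To render a picture instead, the text can be saved to a file and passed to the dot program, assuming Graphviz is installed.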
Currently the clusters are not labeled when they are provided to the main reconstruction algorithm, so that the output cannot be labeled either (a future update will remedy this), and they are simply labeled as n (as in node) + their subclone ID. But given the subclone frequencies, it is easy to assign the mutations back to each of the subclones, working from the bottom up.
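The bottom-up assignment can be sketched as follows. The sketch assumes that a mutation arising in a subclone is carried by all of its descendants, so that the cell fraction of a mutation cluster equals the summed fractions of a node and its subtree; this is an interpretation, not something stated above. The tree is tree 11 from this example, while the observed cluster value in the usage lines and the function names are hypothetical.

```python
# Tree 11 from the example: node -> (cell fraction, list of children).
TREE = {
    "n11": (0.0, ["n12"]),
    "n12": (0.127401, ["n13", "n14"]),
    "n13": (0.531157, []),
    "n14": (0.29044, ["n15"]),
    "n15": (0.051003, []),
}


def subtree_fractions(tree, root):
    """Return {node: summed fraction of the node and all its descendants}."""
    totals = {}

    def visit(node):
        fraction, children = tree[node]
        # Children are visited first, so sums accumulate from the leaves up.
        totals[node] = fraction + sum(visit(child) for child in children)
        return totals[node]

    visit(root)
    return totals


def assign_cluster(observed, totals, root):
    """Pick the non-root node whose subtree sum is closest to `observed`."""
    candidates = [node for node in totals if node != root]
    return min(candidates, key=lambda node: abs(totals[node] - observed))


if __name__ == "__main__":
    totals = subtree_fractions(TREE, "n11")
    for node, value in totals.items():
        print(node, round(value, 3))
    # A hypothetical cluster seen in about 34% of cells.
    print(assign_cluster(0.34, totals, "n11"))
```

The root is excluded from the candidates because it carries no somatic mutations of its own; its subtree sum simply covers the whole sample.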
execute
make
The details of the resulting structures can be examined in the same way as in the previous example.