C++ API#

Exception Types#

class ApiMisuseFailure : public std::runtime_error#: Exception thrown when the GRG API is used incorrectly (bad parameters, incorrect preconditions, etc.).

class BadInputFileFailure : public std::runtime_error#: Exception thrown when the input file does not meet conditions needed for GRG.

class SerializationFailure : public std::runtime_error#: Exception thrown when a GRG fails to serialize/deserialize.

Main GRG Classes#

class GRG : public std::enable_shared_from_this<GRG>#

Abstract GRG base class.

Subclassed by grgl::CSRGRG, grgl::MutableGRG

Public Types

using NodeAndMut = std::pair<NodeID, MutationId>#: A node and a mutation attached to that node.

using NodeMutMiss = std::tuple<NodeID, MutationId, NodeID>#: A node and a mutation attached to that node, and the missingness node associated with the mutation.

using MutAndNode = std::pair<MutationId, NodeID>#: A mutation and node, where the mutation comes first.

using MutNodeMiss = std::tuple<MutationId, NodeID, NodeID>#: A mutation, node, and missingness node, where the mutation comes first.

Public Functions

inline virtual bool isSample(const NodeID nodeId) const#

Is the given nodeID associated with a sample node?

In most GRGs, the first numSamples node IDs are reserved for samples, so you can also just use nodeId < grg.numSamples() if you are using an immutable GRG (CSRGRG) or more generally if samplesAreOrdered() returns true.

Parameters:: nodeId – [in] The node ID to check.

inline virtual NodeIDSizeT numSamples() const#: The number of samples (haplotypes) in the graph.

inline NodeIDSizeT numIndividuals() const#: The number of individuals (haplotypes/ploidy) in the graph.

inline uint16_t getPloidy() const#

How many haploid samples are there per individual?

Returns:: The ploidy, usually 1 or 2. Individual coalescence support only works when ploidy==2.

inline bool isPhased() const#

Is the dataset phased?

If not, GRG still represents the data as a mapping between mutations and haploid samples, but you cannot depend on any haplotype-based analyses representing the true haplotypes. Individual-based analyses are still completely accurate.

virtual bool nodesAreOrdered() const = 0#

Returns true if nodes are ordered by ID in bottom-up topological order.

This means you can just iterate the node IDs from 0…(numNodes - 1) to perform a topological traversal.

virtual bool nodesAreTopo() const = 0#

Returns true if nodes are ordered in bottom-up topological order, If this is true and nodesAreOrdered is false, then you cannot iterate by NodeID (which are always positive), but you can iterate by using getOrderedNodes().

Most users will not need this method, unless they are heavily modifying the GRG (e.g., by adding negative nodes to the graph).

virtual size_t numNodes() const = 0#: Number of nodes (including sample nodes) in the graph.

virtual size_t numEdges() const = 0#: Number of edges (total) in the graph.

virtual size_t numDownEdges(NodeID nodeId) = 0#: Number of down edges for a particular node in the graph.

virtual size_t numUpEdges(NodeID nodeId) = 0#: Number of up edges for a particular node in the graph.

virtual NodeIDList getDownEdges(NodeID nodeId) = 0#: Get list of down edges for a particular node in the graph.

virtual NodeIDList getUpEdges(NodeID nodeId) = 0#: Get list of up edges for a particular node in the graph.

virtual NodeIDList getOrderedNodes(TraversalDirection direction, bool allowSort = false) = 0#

Get all of the node IDs in topological order for the given direction (UP means do the leaves first, DOWN means do the roots first).

Parameters:

direction – [in] Direction for the order.
allowSort – [in] Set to true to allow performing a topological sort to get the result, if the nodes are not already ordered. This can impose a substantial performance hit and is only necessary when the graph has been modified in ways that did not preserve the topological order! It is almost always possible to modify the graph so that the topological order is preserved, so this option should rarely (if ever) be used. Default: false.

virtual bool hasUpEdges() const = 0#: True if this graph stores up edges, false if only down edges are stored.

virtual bool edgesAreOrdered() const = 0#: True if the edges in this graph are always in ascending NodeID order.

inline bool mutationsAreOrdered() const#: True iff the mutations are ordered according to position and the alternate allele (in that order).

inline std::pair<BpPosition, BpPosition> getBPRange()#: Get the base-pair range this GRG covers as a pair {first, last+1} where first is the base-pair position of the first mutation, and last is similarly defined. In general we treat all ranges as left-inclusive and right-exlusive, hence the +1 on the last.

inline std::pair<BpPosition, BpPosition> getSpecifiedBPRange() const#: Get the specified base-pair range this GRG covers as a pair {first, last+1}. Unlike getBPRange, this function just returns a pair of values that were specified by a tool/user at some point.

inline void setSpecifiedBPRange(std::pair<BpPosition, BpPosition> bpRange)#: Set the specified base-pair range this GRG covers as a pair {first, last+1}, i.e. the lower value is inclusive and the higher value is exclusive.

inline virtual NodeIDList getSampleNodes() const#: Get the NodeIDList for all samples in the graph, in order of sample ID (from the 0th sample to the (N-1)th sample, where N is the number of haplotypes).

inline NodeIDList getRootNodes()#: Get the NodeIDList for all roots in the graph. Note: this is an O(numNodes) operation.

inline bool hasMissingData() const#: Does this GRG have Mutations which have missing data represented?

template<typename T = NodeAndMut> inline NodeAndMutIterator<T> getNodesAndMutations(const bool allowSort = true)#

Get the list of all corresponding nodes and mutations, for iterating over the mapping between graph nodes and Mutations. By default, just returns the (NodeID, MutationID) pairs, but using the template argument of GRG::NodeMutMiss will return a tuple (NodeID, MutationID, NodeID) where the additional NodeID is the missingness node (or INVALID_NODE_ID) associated with this Mutation.

Parameters:: allowSort – [in] Allow the mutations-to-node mapping to be resorted; if false and the mapping is not sorted then an exception will be thrown. Sorting repeatedly is slow (obviously), so this flag can help avoid/debug performance issues.
Returns:: An iterator over (NodeID, MutationId) pairs or (NodeID, MutationId, NodeID) tuples, depending on the function template argument.

template<typename T = MutationId> inline std::vector<T> getUnmappedMutations()#

Get the mutations associated with no nodes. By default, the template argument is MutationId, so only the vector of mutations is returned. If you set the template argument to GRG::MutAndNode then the mutation and it’s corresponding missingness node will be returned.

Returns:: A vector of either MutationId or GRG::MutAndNode (a std::pair), depending on the template parameter.

template<typename T = MutationId> inline std::vector<T> getMutationsForNode(const NodeID nodeId, const bool allowSort = true)#

Get the mutations associated with the node. By default, the template argument is MutationId, so only the vector of mutations is returned. If you set the template argument to GRG::MutAndNode then the mutation and it’s corresponding missingness node will be returned.

Parameters:

nodeId – [in] The node to lookup.
allowSort – [in] Allow the mutations-to-node mapping to be resorted; if false and the mapping is not sorted then an exception will be thrown. Sorting repeatedly is slow (obviously), so this flag can help avoid/debug performance issues.

Returns:

A vector of either MutationId or GRG::MutAndNode (a std::pair), depending on the template parameter.

inline bool nodeHasMutations(const NodeID nodeId, const bool allowSort = true)#

Does the node have at least one Mutation associated with it?

Parameters:

nodeId – [in] The node to lookup.
allowSort – [in] Allow the mutations-to-node mapping to be resorted; if false and the mapping is not sorted then an exception will be thrown. Sorting repeatedly is slow (obviously), so this flag can help avoid/debug performance issues.

Returns:

true if the node has one or more Mutation.

template<typename T = MutAndNode> inline std::vector<T> getMutationsToNodeOrdered()#

Get corresponding mutation IDs and node IDs, ordered by the mutation position + allele (ascending). By default, a MutAndNode is returned (std::pair). If you set the template argument to MutIdNodeMiss (std::tuple) then the third item in the returned tuple will be the missingness node associated with the Mutation.

Returns:: A vector of pairs, MutationID and NodeID (in that order), or a vector of tuples of (MutationID, NodeID, NodeID), depending on function template argument.

void visitBfs(GRGVisitor &visitor, TraversalDirection direction, const NodeIDList &seedList, ssize_t maxQueueWidth = -1)#

Visit nodes breadth-first, starting at the given nodes and following up or down edges.

Parameters:

visitor – [in] The visitor that will be called back. Typically owns any state that is being computed.
direction – [in] Whether to go up or down the graph.
seedList – [in] The nodes to start traversal from.
maxQueueWidth – [in] The maximum width of the queue; restricts the number of end-to-end paths that will be visited.

void visitDfs(GRGVisitor &visitor, TraversalDirection direction, const NodeIDList &seedList, bool forwardOnly = false)#

Visit nodes depth-first, starting at the given nodes and following up or down edges. A node is never visited more than one time in either pass (forward or backward), unless forwardOnly is true.

Parameters:

visitor – [in] The visitor that will be called back. Typically owns any state that is being computed.
direction – [in] Whether to go up or down the graph.
seedList – [in] The nodes to start traversal from.
maxPathDepth – [in] The maximum depth of any path that will be explored. Traversal stops when this is reached. Typically only useful when forwardOnly is true.
forwardOnly – [in] Typically depth-first search reaches a given node, computes (recursively) the values for all successors, and then computes the current node value. This causes each node to be visited twice. However, setting forwardOnly will only visit nodes in the forward direction. It also causes nodes to be visited an arbitrary number of times.

inline size_t addPopulation(std::string populationDescription)#: Add a population to the GRG.

inline const std::vector<std::string> &getPopulations() const#: The populations (their descriptions) represented by this GRG.

MutationId addMutation(const Mutation &mutation, NodeID nodeId, NodeID missNodeId = INVALID_NODE_ID)#

Add the given mutation to the GRG, associated with the given node. If there is no node associated with the mutation, use INVALID_NODE_ID.

Parameters:

mutation – [in] The Mutation to add.
nodeId – [in] The nodeId to associated it with, or INVALID_NODE_ID.
missNodeId – [in] The nodeId for the missingness node of this Mutation, or INVALID_NODE_ID if none.

Returns:

The MutationId for the newly added Mutation.

void removeMutation(MutationId mutId, NodeID nodeId)#

Remove the Mutation with the given ID and nodeId from the graph. If the Mutation has a node associated with it, the mapping of that Mutation to its node will also be removed. This is a memory efficient operation; it leaves an empty Mutation at the MutationID. Therefore, if you want the MutationID ordering to match position/allele ordering you must call sortMutations().

Parameters:

mutId – [in] The MutationId.
nodeId – [in] The NodeId.

void sortMutations()#: If needed, sort the mutations by their (position, allele) and renumber them so that the MutationId ascending order matches this order. Can be a memory-intensive operation. Alternatively, you can just write the GRG to disk (saveGrg()) and then reload it (loadImmutableGRG()) and it will use less memory (but be slower).

bool mutationsAreUnique() const#

Check whether the Mutations in GRG are unique: there is a single Mutation (with MutationId) for each unique (position, ref allele, alt allele) combination. Since the mapping between MutationId and nodes is many-to-1 (not many-to-many), having unique mutations means that each Mutation is associated with a single node in the graph. When a GRG is constructed via grg construct this is always true of the resulting GRG. When a GRG is constructed arbitrarily from the API, it may not be true.

This operation is O(M), where M is the number of mutations.

template<typename T> inline std::vector<T> matMul(const std::vector<T> &inputMatrix, const size_t numRows, TraversalDirection direction)#

Compute one of two possible matrix multiplications across the entire graph. The input matrix \(V\) can be either \(K \times N\) ( \(N\) is number of samples) or \(K \times M\) ( \(M\) is number of mutations). The given direction determines which input matrix is expected. Let \(X\) be the \(N \times M\) genotype matrix. For an \(K \times N\) input \(V\), the product performed is \(V \times X\) which gives a \(K \times M\) result. I.e., the input matrix is a column per sample and the output matrix is a column per mutation. For an \(K \times M\) input \(V\), the product performed is \(V \times X^T\) which gives a \(K \times N\) result. I.e., the input matrix is a column per mutation and the output matrix is a column per sample.

The simplest case to consider is a vector input (e.g., a \(1 \times N\) matrix). This vector-matrix product in the graph works by seeding the input nodes (samples in this example) with the corresponding values from the input vector and then traversing the graph in the relevant direction (up or down). The ancestor/descendant values are summed at each node, until the terminal nodes (mutations in this example) are reached. The values at the terminal nodes are then the output vector. When a \(K\)-row matrix is input, instead of a vector, the only difference is that each node stores \(K\) values instead of 1.

Note: the RAM used will be \(O(K * nodes)\) where \(nodes\) is the total number of nodes in the graph.

Parameters:

inputMatrix – [in] The \(K \times N\) or \(K \times M\) input matrix in row-major order. For samples, the \(i\)th column are the values for sample \(i\). Similarly for mutations, the ordering follows the MutationId ordering.
numRows – [in] The number of rows in the matrix. The size of the input must be divisible by numRows, and determines the number of columns.
direction – [in] Use TraversalDirection::DIRECTION_DOWN for mutations as input, and TraversalDirection::DIRECTION_UP for samples as input.

Returns:

The resulting matrix as described above, in row-major order.

inline PopulationID getPopulationId(NodeID nodeId)#

Get the ID of the population associated with the node. Will be POPULATION_UNSPECIFIED if the node is not a sample, or if there is no population associated.

Parameters:: nodeId – [in] The node to retrieve.

inline void setPopulationId(NodeID nodeId, PopulationID popId)#

Set the ID of the population associated with the node.

Parameters:

nodeId – [in] The node to modify.
popId – [in] The population ID, which is already associated with a population description via addPopulation().

inline NodeIDSizeT getNumIndividualCoals(NodeID nodeId)#

Get the number of individuals that coalescence at this node (not below it, but exactly at it).

Parameters:: nodeId – [in] The node to retrieve.
Returns:: The number of individuals. For diploid data this is a number between 0…numSamples()/2.

inline void setNumIndividualCoalsGrow(NodeID nodeId, NodeIDSizeT coals)#

Set the number of individuals that coalesce at this node, growing the underlying data size if needed.

NOT THREADSAFE.

Parameters:

nodeId – [in] The node to modify.
coals – [in] The number of individuals that coalesce, between 0…numSamples()/ploidy.

inline void setNumIndividualCoals(NodeID nodeId, NodeIDSizeT coals)#

Set the number of individuals that coalesce at this node.

The value is updated atomically, so this can be used from threaded code (as long as the rest of the GRG, e.g. number of nodes, is not changing).

Parameters:

nodeId – [in] The node to modify.
coals – [in] The number of individuals that coalesce, between 0…numSamples()/ploidy.

inline bool hasIndividualCoals() const#: Does this dataset have any individual coalescence information? If not, you won’t be able to quickly get the exact sample variance for each mutation (diploid genotypes only).

inline void addIndividualId(const std::string &identifier)#

Add the next individual’s identifier. Must be called in order, for individuals 0…(N-1).

Parameters:: identifier – The identifier to add.

inline void clearIndividualIds()#: Clear all individual identifiers.

inline bool hasIndividualIds() const#

Are there any string identifiers for the individuals in this GRG?

Returns:: true if individual identifiers are available.

inline std::string getIndividualId(NodeIDSizeT individualIndex)#

Get the string identifier for the k’th individual. Returns empty string if there is no identifier for the given individual.

Returns:: String identifier for the given individual.

template<typename T> class NodeAndMutIterator#

Iterator for traversing the NodeID<–>MutationID mapping, along with the (sometimes present) NodeID for the missingness node associated with the Mutation.

Each Mutation can have two different “kinds” of nodes associated with it:

The Mutation Node, which defines the mapping between a Mutation and all of the samples that have that Mutation.
The Missingness Node, which defines the mapping between a Mutation and all of the samples that are missing an allele at the site associated with the Mutation.

Public Functions

inline size_t size() const#: The number of items in the vector underlying the iterator.

inline VectIter3 begin() const#: The beginning of the vector.

inline VectIter3 end() const#: The end of the vector.

class VectIter3#

class MutableGRG : public grgl::GRG #

A MutableGRG can be changed by adding/removing nodes and edges.

Public Functions

inline explicit MutableGRG(NodeIDSizeT numSamples, uint16_t ploidy, bool phased = true, NodeIDSizeT initialNodeCapacity = DEFAULT_NODE_CAPACITY, bool useUpEdges = true)#

Construct a GRG for a given number of samples.

Parameters:

The – [in] number of samples that will be used to construct the graph.
(Optional) – [in] the initial capacity of the node vector. If you know in advance roughly how many nodes will be created this can improve performance.

inline virtual NodeIDSizeT numSamples() const override#: The number of samples (haplotypes) in the graph.

inline virtual bool isSample(const NodeID nodeId) const override#

Is the given nodeID associated with a sample node?

Parameters:: nodeId – [in] The node ID to check.

inline virtual NodeIDList getSampleNodes() const override#: Get the NodeIDList for all samples in the graph, in order of sample ID (from the 0th sample to the (N-1)th sample, where N is the number of haplotypes).

inline virtual bool nodesAreOrdered() const override#

Returns true if nodes are ordered by ID in bottom-up topological order.

This means you can just iterate the node IDs from 0…(numNodes - 1) to perform a topological traversal.

inline virtual bool nodesAreTopo() const override#

Most users will not need this method, unless they are heavily modifying the GRG (e.g., by adding negative nodes to the graph).

inline virtual size_t numNodes() const override#: Number of nodes (including sample nodes) in the graph.

inline virtual size_t numEdges() const override#: Number of edges (total) in the graph.

inline virtual size_t numDownEdges(NodeID nodeId) override#: Number of down edges for a particular node in the graph.

inline virtual size_t numUpEdges(NodeID nodeId) override#: Number of up edges for a particular node in the graph.

inline virtual NodeIDList getDownEdges(NodeID nodeId) override#: Get list of down edges for a particular node in the graph.

inline virtual NodeIDList getUpEdges(NodeID nodeId) override#: Get list of up edges for a particular node in the graph.

inline virtual bool hasUpEdges() const override#: True if this graph stores up edges, false if only down edges are stored.

inline virtual bool edgesAreOrdered() const override#: True if the edges in this graph are always in ascending NodeID order.

virtual NodeIDList getOrderedNodes(TraversalDirection direction, bool allowSort = false) override#

Get all of the node IDs in topological order for the given direction (UP means do the leaves first, DOWN means do the roots first).

Parameters:

direction – [in] Direction for the order.
allowSort – [in] Set to true to allow performing a topological sort to get the result, if the nodes are not already ordered. This can impose a substantial performance hit and is only necessary when the graph has been modified in ways that did not preserve the topological order! It is almost always possible to modify the graph so that the topological order is preserved, so this option should rarely (if ever) be used. Default: false.

inline NodeID makeNode(const size_t count = 1, bool negative = false)#

Create a new node in the GRG.

This is the only valid way to construct a GRG node. When you create a GRG you specify the number of sample nodes, and those nodes are created right away. Each call to makeNode() after that will generate sequential-ID nodes. Use connect() to connect the newly created node to other nodes.

Parameters:

count – [in] The number of nodes to create.
negative – [in] For “normal” nodes, the topological order is based on counting up with NodeIDs but for negative nodes this is not the case, they count (and build) down. If you set this true, the node will be added to a special list of prepended nodes so that the topological order can be maintained (via getOrderedNodes()… the integer iteration will no longer work). This does not return a negative value for NodeID (it is unsigned), and the NodeID increments just like a normal node. It is up to the user to use the NodeID appropriately - which means passing it as a negative value to connect() whenever edges are created with it.

void connect(SignedNodeID srcId, SignedNodeID tgtId)#

Create a graph edge from one node (the source) to another node (the target).

In GRG, “source node” always refers to the node “above” the other node. I.e., this function constructs a down edge from source to target, and an up edge from target to source. In ARG parlance, the source node is the parent and the target is the child.

Parameters:

srcId – [in] The ID of the source node.
tgtId – [in] The ID of the target node.

bool disconnect(NodeID srcId, NodeID tgtId)#

Remove a graph edge between two nodes.

The source and target nodes have the same meaning as in connect().

Parameters:

srcId – [in] The ID of the source node.
tgtId – [in] The ID of the target node.

NodeIDSizeT ensureUniqueMutations()#

Ensure that every Mutation in the GRG is unique: there is a single Mutation (with MutationId) for each unique (position, ref allele, alt allele) combination. It does so by finding duplicate Mutations, adding a new node that becomes a parent to them, and creating a new (single) mutation representing them all. At the end of this method, the mutations will be unordered, and either serializing the GRG to disk or calling sortMutations() will be necessary to get the new, correct MutationIds.

This operation is O(M), where M is the number of mutations.

Returns:: The number of nodes that were added to the graph.

void merge(const std::list<std::string> &otherGrgFiles, bool combineNodes = true, bool useSampleSets = false, bool verbose = false, bool ignoreRangeViolations = false, std::vector<BpPosition> positionAdjust = {})#

Merge one or more GRGs into this one. Only succeeds if all GRGs have the same number of samples.

This assumes that the GRGs were constructed from the same sampleset — e.g., they could be constructed from two subsets of the same sampleset (as long as both were constructed with the same sample node numbering) or from a subset of mutations against the same sampleset.

The specified range of the resulting GRG will be (min(range of any input), max(range of any input)), even if the provided GRGs do not span a contiguous region. It is up to the caller of this method to ensure that either (1) the span is contiguous or (2) they adjust the specified range appropriately afterwards.

Parameters:

otherGrgFiles – [in] The list of GRG filenames to load and merge.
combineNodes – [in] Set to false to never combine nodes from the graphs.
useSampleSets – [in] Set to true to use the slower, more RAM intensive version that tracks sets of samples beneath each graph node, and combines nodes that have the same sample set. The default algorithm just uses the (mapped) children to determine if two nodes can be combined, which combines fewer nodes overall, but also retains more hierarchy in the final graph.
verbose – [in] Emit more information to stdout.
ignoreRangeViolations – [in] Merge graphs that might have overlapping positions.
positionAdjust – [in] Adjust each otherGrgFiles GRG by the given basepair position. I.e., for each Mutation in otherGrgFiles[i], add positionAdjust[i] to its mutation position.

inline void setSamples(const NodeIDList &sampleNodes)#

Change which nodes in the graph are the sample nodes. These nodes do not need to be leaf nodes, but they should not be reachable from each other (this is not checked: the user must ensure).

The order that the NodeIDs are passed in the input list is the order that the samples will be numbered. So for example, sampleNodes[0] is sample 0, samplesNodes[1] is 1, etc. All methods that deal with samples will treat them in this order, and when the GRG is written to disk the nodes will be renumbered such that sampleNodes[0] becomes NodeID=0, and so on.

Parameters:: sampleNodes – [in] The ordered list of new sample nodes. This list length must be evenly divisible by getPloidy().

inline GRGNode &getNode(const NodeID nodeId) const#

Retrieve a node by ID.

Parameters:: nodeId – [in] The ID of the node in question.

void compact(NodeID nodeId = INVALID_NODE_ID)#: Compact the edge vectors in the GRG. If done infrequently won’t affect the amortized edge addition cost but will reduce overall RAM usage by a non-trivial amount.

class Mutation#

A mutation in the GRG.

Public Functions

inline std::pair<std::string, std::string> getBothAlleles() const#: Get a pair containing the reference allele (first) and the alternate allele (second) as strings.

inline std::string alleleStorageString() const#: Returns the concatenation of the reference allele with the alternate allele. There is no delimiter, see getAltIndex() to get the index of the first character in the alternate allele.

inline void setAlleleStorageString(const std::string &storageStr)#: INTERNAL SERIALIZATION METHOD.

inline size_t getAltIndex() const#: The position of the first character of the alternate allele in the allele storage string.

Warning

doxygenclass: Cannot find class “grgl::GRG::NodeAndMut” in doxygen xml output for project “grgl” from directory: doc/xml/

Warning

doxygenclass: Cannot find class “grgl::GRG::NodeMutMiss” in doxygen xml output for project “grgl” from directory: doc/xml/

Warning

doxygenclass: Cannot find class “grgl::GRG::MutAndNode” in doxygen xml output for project “grgl” from directory: doc/xml/

Warning

doxygenclass: Cannot find class “grgl::GRG::MutNodeMiss” in doxygen xml output for project “grgl” from directory: doc/xml/

Serialization and Conversion#

std::pair<NodeIDSizeT, EdgeSizeT> grgl::writeGrg(const GRGPtr &grg, std::ostream &out, bool allowSimplify)#

Serialize the GRG to the given outstream.

Parameters:

grg – [in] The GRG to be serialized.
out – [in] The (binary) output stream.

MutableGRGPtr grgl::readMutableGrg(IFSPointer &inStream, bool loadUpEdges)#

Deserialize the GRG from the given input stream.

Parameters:: inStream – [in] The (binary) input stream.

GRGPtr grgl::readImmutableGrg(IFSPointer &inStream, bool loadUpEdges)#

MutableGRGPtr grgl::convertTreeSeqToGRG(const tsk_treeseq_t *treeSeq, bool binaryMutations, bool useNodeTimes, bool maintainTopology, bool computeCoals, std::pair<size_t, size_t> treeRange)#

Convert a tskit tree-sequence object into a GRG.

Parameters:

treeSeq – [in] The tskit TreeSequence object.
binaryMutations – [in] Set to true to emit Mutations as 0 (ref) and 1 (any alt).
useNodeTimes – [in] Set to true to set the Mutation’s time based on its tskit node time, instead of its tskit mutation time.
maintainTopology – [in] Set to false to save some time and space when converting to GRG, at the cost of not capturing all topology changes due to recombinations “unseen” by Mutations above them. WARNING: This causes inaccurate sample-to-mutation mapping when there are back mutations in the trees (nested mutations at the same site).
computeCoals – [in] Set to true to compute the number of individuals that coalesce at each node. Required if you want to compute the exact variance for each mutation when performing diploid genotype matrix operations (e.g., GWAS) later. However, you can often just use the binomial approximation to variance and leave this as false. Computing the coalescences is expensive on large TreeSequences.
treeRange – [in] Optional range of tree indices. If provided, only trees within that range [first, last) are converted (first is inclusive, last is exclusive).

std::pair<NodeIDSizeT, EdgeSizeT> grgl::simplifyAndSerialize(const GRGPtr &grg, std::ostream &outStream, const GRGOutputFilter &filter, bool allowSimplify)#

Visitors#

enum grgl::TraversalDirection#

Values:

enumerator DIRECTION_DOWN#

enumerator DIRECTION_UP#

enum grgl::DfsPass#

Values:

enumerator DFS_PASS_NONE#

enumerator DFS_PASS_THERE#

enumerator DFS_PASS_BACK_AGAIN#

class GRGVisitor#

Visitor abstract base class for traversing the nodes of a GRG.

To traverse a GRG, create a class that inherits from GRGVisitor and overrides the visit() function. Then use the GRG::visitDfs(), GRG::visitBfs(), or GRG::visitTopo() functions and pass your visitor as an argument.

Subclassed by AlleleFreqVisitor, ZygosityInfoVisitor, grgl::DfsSampleCountVisitor, grgl::FrontierVisitor, grgl::GRGCoalCalcVisitor, grgl::GRGCoalMarkVisitor, grgl::NodeHasherVisitor, grgl::NodeMapperVisitor, grgl::NodeNumberingVisitor, grgl::RenumberAndWriteVisitor, grgl::TopoCandidateCollectorVisitor, grgl::TopoCoalCounter, grgl::TopoOrderVisitor, grgl::TopoSampleSetVisitor

Public Functions

virtual bool visit(const GRGPtr &grg, NodeID node, TraversalDirection direction, DfsPass dfsPass = DFS_PASS_NONE) = 0#

Visit a node in the GRG.

Parameters:

grg – [in] The graph.
node – [in] The current node we are visiting.
direction – [in] Whether we’re traversing bottom-up (DIRECTION_UP) or top-down (DIRECTION_DOWN).
dfsPass – [in] For DFS traversals, is this the forward (DFS_PASS_THERE) pass or the backward (DFS_PASS_BACK_AGAIN) pass after we have computed values for all successors.