Abstract
Boolean networks are commonly used to model biological pathways and processes, in part because there are analyses for finding all of their possible long-term outcomes. Here we describe a Boolean network analysis that captures both the long-term outcomes of a heterogeneous population and the transient behavior leading up to those outcomes. In contrast to other approaches, our method gives an explicit simulation through time using the composition of the mixed population, without having to track each subpopulation individually. This technique accurately models the dynamics of large populations of deterministic, probabilistic, or continuous-time Boolean networks that use either synchronous or asynchronous updating. Our method works by treating the network dynamics as a linear system in a variable space that includes products of the Boolean state variables. We show that these product-basis analyses can help find very rare subpopulations or behaviors that sampling-based analyses would likely miss. Such rare events are critical in processes such as the initiation and progression of cancer, and the development of treatment resistance.
Mathematical models are an important part of testing and extrapolating our knowledge of biological systems [1], but they can be difficult to fully analyze. A straightforward way to study a model is to simulate individual instances of that model using a random sampling technique such as Monte Carlo [2]. These simulations can be run very efficiently, allowing the use of complex models extending even to whole-cell simulations [3]. A major drawback to random sampling is that simulations have difficulty capturing rare events, such as those that initiate biological processes leading to novel and potentially disease-related cellular phenotypes [4]. For example, tumor initiation, progression, and the survival of select cells following drug treatment all require rare alterations to arise, clonally expand, and eventually dominate the population [5–7]. While one can bias Monte Carlo to oversample certain outcomes by artificially raising or lowering global parameters such as a mutation rate, and then post-correct for the biased sampling (a strategy known as importance sampling [8]), this does not help find outcomes that are rare simply because they require a very particular starting state or set of mutations.
An alternative approach to random sampling techniques is to study a model analytically in order to learn about all possible behaviors or outcomes even if they are rare. Analytic results are difficult to obtain with complex models, but significant advances have been made in analyzing Boolean networks [9–16], which are very simple models built entirely from on/off variables. In particular, the focus has been on extracting the possible long-term outcomes, or attractors, of Boolean models. An attractor may be a stable state (steady state) or else a repeating sequence of states (limit cycle). Attractors have been found using network-reduction algorithms that find simple networks encoding the long-term behavior of more complex networks [9, 11, 17], methods that solve steady states as zeros of a polynomial equation [18], SAT methods [13, 14, 19], and binary decision diagrams [15, 16, 20]. See the introduction of Ref. [10] for a review of these techniques.
In between Monte Carlo simulations of individuals and attractor analyses of populations, there remains an unaddressed challenge: methods dealing directly with populations do not explicitly track their dynamics in the way that a simulation does. Therefore the power of exact analyses has not been applied to the early-time ‘transient’ behavior of populations, or used to connect initial states of different subpopulations to the attractors they fall into. Here we present an analytical method for simulating a heterogeneous population of Boolean networks, without having to simulate each network individually. The simulations are exact, so they capture every subpopulation of the model and every event that occurs, no matter how rare. The feasibility of building these simulations depends on the size and topology of the network (for complex or highly-recurrent networks it can be an exponential problem), though we can simplify the difficult cases by ignoring the first few time steps of the simulation. In the limit where we focus only on the longest-term behavior, the output describes the attractors of the network.
Our method applies to deterministic [1], probabilistic [21] and continuous-time [22] Boolean networks, and finds all attractors with equal computational effort (some attractor-finding techniques have more difficulty with limit cycles than steady states). In this paper we only analyze synchronous Boolean network models (i.e. networks whose variables all update together in discrete time steps), but this is completely general in the large-population limit because any infinite population evolves deterministically even if the individuals are stochastic. One major benefit to our approach is its simplicity, as it follows only two rules: 1) work in a linear basis whose variables are products of the Boolean state variables, and 2) ignore quickly-decaying modes if we are looking at late-time behavior. Based on (1) we refer to our analysis as a product-basis method.
Results
To demonstrate our method, we applied it to the T-cell activation network described in Ref. [23] (see Figure 10 and Table 2 of that paper). This is a deterministic, 40-node, 54-edge network containing multiple feedback loops, whose attractors include both steady states and limit cycles. To use our method, we first provided a target set of variables to follow in time, which the product-basis algorithm used to generate a set of time-evolution equations involving those variables (along with other variables that were added automatically to close the system of equations). We chose to track three variables: the ligand-binding state of the T-cell receptor TCR_bound, the phosphorylation state TCR_p, and the co-occurrence of binding and phosphorylation. The starting population we considered was a uniform mixture of all 2^40 (≈ 10^12) possible initial states of the Boolean network. We generated the product-basis time-evolution equations, and used them to track the population-level average of each of the three variables of interest for 50 time steps (Figure 1A). It should be noted that the co-occurrence variable TCR_bound and TCR_p is not simply the product TCR_bound × TCR_p. For example, if TCR_bound = 0.5 and TCR_p = 0.5, then TCR_bound and TCR_p could equal 0.25 if the binding and phosphorylation states are uncorrelated in the population, 0 if they are perfectly anticorrelated, 0.5 if they are perfectly correlated, or any other value on the interval [0, 0.5].
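A toy illustration of this point (hypothetical two-variable populations, not the 40-node network): the population average of a co-occurrence variable depends on the correlation structure of the population, not just the two marginal averages.

```python
# Toy illustration: the average of a co-occurrence variable is not
# determined by the individual averages alone.
from itertools import product

def averages(population):
    """population: list of (bound, phos) Boolean pairs, one per cell."""
    n = len(population)
    x_bound = sum(b for b, p in population) / n
    x_phos  = sum(p for b, p in population) / n
    x_both  = sum(b and p for b, p in population) / n
    return x_bound, x_phos, x_both

# Perfectly anticorrelated: half bound-only, half phosphorylated-only.
anti = [(1, 0)] * 50 + [(0, 1)] * 50
# Uncorrelated: all four combinations equally represented.
uncorr = [pair for pair in product([0, 1], repeat=2) for _ in range(25)]

print(averages(anti))    # (0.5, 0.5, 0.0)
print(averages(uncorr))  # (0.5, 0.5, 0.25)
```

Both populations have identical marginal averages of 0.5, yet the co-occurrence fraction differs, which is why the product-basis method must track product variables explicitly.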
Next we demonstrated the ability of the product-basis method to analyze mutations in the network by including the full set of possible gene knock-outs in the T-cell activation network. We did this by adding a set of ‘wild-type’ variables to the network, one for each original variable in the system, and included the wild-type variables in the update rules using an and operation. For example, an update rule reading A ← B or C became A ← (B or C) and AWT. We also adjusted the initial populations so that knocked-out genes always began in the off state. The initial population contained each possible combination of knock-out mutations at a 1% mutation rate per variable and each possible combination of starting states compatible with each knockout set, spanning on the order of 10^24 different subpopulations. We followed the time course of the co-occurrence variable TCR_bound and TCR_p and compared it to our original wild-type result (Figure 1B). We also validated the result (to within statistical error) using Monte Carlo.
Our results from the T cell network demonstrate several important aspects of our method. First, we are able to simulate extremely heterogeneous populations, involving far more subpopulations than could be analyzed one-by-one. Second, although our method only deals with heterogeneity in the states of the Boolean variables, we can still simulate a genetically-heterogeneous population by augmenting the Boolean network with mutation variables. Third, we can exactly model subpopulations that are present at very low levels, and our exact result tracked these rare subpopulations over time far more precisely than Monte Carlo simulations can (see the error bars in Figure 1B). For example, the contribution of each triple-mutant was factored in even though a given triple-mutant was present in only 0.0001% of the population. While one might artificially raise the Monte Carlo mutation rate to oversample the mutations [8], this has the disadvantage of overweighting the effect of multiple mutants, even though realistic evolutionary paths take one or very few mutational steps at a time [24]. In contrast, our exact result is dominated by the evolutionarily-accessible subpopulations that are closest to wild-type.
The code used to generate these results is named tCellActivationEx.m, and is available for download at https://github.com/CostelloLab/ProductBasis. The equation-generating process for Figure 1A took ~0.4 seconds using our code (written in MATLAB R2015b 8.6.0.267246, running on a 2.6 GHz Intel Core i7 Mac with OS 10.9.5). We checked this exact result against n_runs = 10^4 Monte Carlo runs, which took longer (~290 seconds). Note that Monte Carlo error is proportional to 1/√n_runs.
Methods
Simulations of mixed populations
The principle behind our method is to write the time evolution of each variable in our network using a linear equation. Doing so guarantees that the dynamical equations we derive for a single cell also fully describe the dynamics of a mixed population of cells, owing to the superposition property of linear equations.
The key to writing linear equations is to introduce a new variable to represent each nonlinear term in a naïve update rule, acknowledging that we will have to solve for the dynamics of this new variable as well. In our case, each nonlinear term is a product of Boolean variables, so the update rule for its respective introduced variable will be a product of the constituent Boolean update rules. We demonstrate this procedure using Example 1.
Suppose we want to track the time evolution of variable A in the network shown in Figure 2. Since this network evolves by discrete time steps, we write xA(t + 1) = fA(xB), where fA performs the not operation. A linear equation implementing the not gate is:

fA = 1 − xB.    (1.1)
Evidently, in order to follow xA over all time we must also track the state of its input variable B over time. B implements an and gate, which is a nonlinear operation: fB = xA · xC. To make the equation linear, we introduce xAC = xA · xC, which is 1 if and only if both A and C are on, and write fB in terms of this new variable:

fB = xAC.    (1.2)
We still need to calculate fAC for our new variable xAC, which is simply the product fA · fC. (Proof: fAC = xAC(t + 1) = xA(t + 1) · xC(t + 1) = fA · fC.) fC implements an or gate whose linear equation involves yet another product variable xAB:

fC = xA + xB − xAB,    (1.3)

giving

fAC = fA · fC = xA − xAB.    (1.4)
The formula for fAC made use of the fact that any Boolean value squared equals itself: for example xB · xB = xB, and xB · xAB = xB · xA · xB = xAB.
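The index-merging rule can be sketched concretely by representing each product variable as a frozenset of indices and each polynomial as a dictionary mapping index sets to coefficients (a Python sketch, not the paper's MATLAB implementation):

```python
# Sketch of the product-basis algebra: multiplying two variables takes the
# union of their index sets, so repeated Boolean indices collapse
# automatically (x_B * x_B = x_B, x_B * x_AB = x_AB).
from collections import defaultdict

def poly_mul(p, q):
    out = defaultdict(float)
    for sp, cp in p.items():
        for sq, cq in q.items():
            out[sp | sq] += cp * cq   # x_S * x_T = x_{S union T}
    return {s: c for s, c in out.items() if c != 0}

xB  = {frozenset('B'): 1.0}
xAB = {frozenset('AB'): 1.0}
assert poly_mul(xB, xB) == xB      # any Boolean squared equals itself
assert poly_mul(xB, xAB) == xAB    # x_B * x_A * x_B = x_AB

# Example 1's product f_A * f_B with f_A = 1 - x_B and f_B = x_AC:
fA = {frozenset(): 1.0, frozenset('B'): -1.0}
fB = {frozenset('AC'): 1.0}
fAB = poly_mul(fA, fB)
assert fAB == {frozenset('AC'): 1.0, frozenset('ABC'): -1.0}  # x_AC - x_ABC
```

The last assertion reproduces the closed-system rule for fAB derived in Example 1.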
The process of replacing product terms with new variables, and then solving for the time evolution of those new variables, continues until the equations form a closed system, in which each variable’s time evolution is written in terms of other variables in the system:

fAB = fA · fB = xAC − xABC,    (1.5)
fABC = fA · fB · fC = xAC − xABC.    (1.6)
This gives us a closed linear system. To avoid the constant term we can rewrite Eq. (1.1) as fA = x∅ − xB, where x∅ = 1 updates according to:

f∅ = x∅.
Equations 1.1–1.6 together with an initial state in (xA, xB, xAC, xAB, xABC) describe the time evolution of these quantities in a single Boolean network as a sequence of 0s and 1s in each variable. The final step is to reinterpret these equations as describing the dynamics of a mixed population of networks, by assigning to each variable the fraction of that population having that variable set to 1. So whereas for a single network the value of xA should always be either 0 or 1, for a mixture of networks in which 40% of the population has gene A set on we would set xA = 0.4. Owing to the superposition property of linear systems, Equations 1.1–1.6 that were derived in the context of a single network also exactly model any mixed ensemble of these networks.
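As a sanity check, Example 1's closed system can be run numerically. The sketch below (Python rather than the paper's MATLAB) writes the closed equations as a matrix, with coefficients worked out from Figure 2's rules (A ← not B, B ← A and C, C ← A or B); it verifies both the single-network limit cycle and the superposition property.

```python
import numpy as np

# Variables: (x_0, x_A, x_B, x_AC, x_AB, x_ABC), with x_0 = 1 (x-empty).
# Rows encode Example 1's closed linear update rules, e.g. f_A = x_0 - x_B.
F = np.array([
    [1, 0,  0, 0,  0,  0],   # f_0    = x_0
    [1, 0, -1, 0,  0,  0],   # f_A    = x_0 - x_B
    [0, 0,  0, 1,  0,  0],   # f_B    = x_AC
    [0, 1,  0, 0, -1,  0],   # f_AC   = x_A - x_AB
    [0, 0,  0, 1,  0, -1],   # f_AB   = x_AC - x_ABC
    [0, 0,  0, 1,  0, -1],   # f_ABC  = x_AC - x_ABC
], dtype=float)

def embed(a, b, c):
    """Lift a single network state (x_A, x_B, x_C) into the product basis."""
    return np.array([1, a, b, a * c, a * b, a * b * c], dtype=float)

# A single network starting at (1,0,0) returns after exactly 5 steps.
v = embed(1, 0, 0)
for _ in range(5):
    v = F @ v
assert np.allclose(v, embed(1, 0, 0))

# Superposition: a 50/50 mixture of two networks evolves as the average.
mix = 0.5 * embed(1, 0, 0) + 0.5 * embed(0, 1, 1)
assert np.allclose(F @ mix,
                   0.5 * (F @ embed(1, 0, 0)) + 0.5 * (F @ embed(0, 1, 1)))
```

Because F is linear, the mixture check holds for any population weights, which is exactly why one set of equations serves an entire heterogeneous population.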
Eqs. 1.1–1.6 in Example 1 contain a single linear dependency: fAB = fABC. Therefore after the first time step xAB will equal xABC: we write xAB(t) = xABC(t) for all t ≥ 1. We use this fact to eliminate xAB, giving a new set of steady state equations:

fA = x∅ − xB    (1.1)
fB = xAC    (1.2)
fAC = xA − xABC    (1.4′)
fABC = xAC − xABC    (1.6)
f∅ = x∅
Our new set of equations has the same non-zero eigenspace as the original set (1.1–1.6), except Eq. 1.4′ is only valid from the second time step onwards. However, the equations lack the null eigenspace because we removed the only linear dependency. States lying in the null eigenspace by definition decay and therefore correspond to transients in the time evolution, whereas eigenvectors whose eigenvalues have magnitude 1 do not decay and are part of the final attractor states. The 5 eigenvalues all have phases that are multiples of 2π/5, indicating that the sole attractor is a limit cycle with a period of 5 time steps. The states are: (100) → (101) → (111) → (011) → (001) →…at which point the sequence repeats.
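The eigenvalue claim can be confirmed numerically. This sketch writes out the reduced five-variable system (x∅, xA, xB, xAC, xABC), with coefficients worked out from Example 1, and checks that all five eigenvalues are fifth roots of unity:

```python
import numpy as np

# Reduced system after eliminating x_AB (valid from time step 2 onward).
# Variables: (x_0, x_A, x_B, x_AC, x_ABC).
F = np.array([
    [1, 0,  0, 0,  0],   # f_0   = x_0
    [1, 0, -1, 0,  0],   # f_A   = x_0 - x_B
    [0, 0,  0, 1,  0],   # f_B   = x_AC
    [0, 1,  0, 0, -1],   # f_AC  = x_A - x_ABC   (Eq. 1.4')
    [0, 0,  0, 1, -1],   # f_ABC = x_AC - x_ABC
], dtype=float)

eigs = np.linalg.eigvals(F)
assert np.allclose(np.abs(eigs), 1.0)   # no decaying (null-space) modes left
assert np.allclose(eigs**5, 1.0)        # all phases are multiples of 2*pi/5
```

All five eigenvalues satisfying λ^5 = 1 is the signature of the period-5 limit cycle described in the text.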
Any state or mixture of states can be written as a linear combination of product-basis variables {x}, because these variables form a complete basis spanning the state space (see Appendix 1 for a proof; also Ref. [18] proves a similar result for a slightly different Boolean algebra). Since each time-evolution function f is a sum over all states causing a ‘1’ in the output variable when written in the state basis, it follows that each f is also a linear combination of our x variables. Therefore our procedure for modeling a mixed population always works in principle, even if some networks require too many equations for this method to be practical.
We can extend mixed-population modeling to probabilistic [21] and continuous-time [22] Boolean networks. Probabilistic Boolean networks require no changes in the algorithm; the only difference is that the polynomial coefficients in our equations may not be integers. For example, if the not gate in Fig. 2 is leaky, with A turning on with a probability of 0.9/0.2 if B was off/on at the last time step, then the transition rule for gene A becomes fA = 0.9 − 0.7xB. As before, a linear equation can always be written because a) a linear equation can still be written in the state space basis, and b) our x variables are just a different basis covering the state space. Probabilistic networks give one way to incorporate rate information into our model; another way is to work in continuous time using differential equations: fA = dxA/dt. The differential form does require one change in our method: the rate of change of a higher-order variable is found by using the product rule of derivatives. Whereas under a discrete update fABC… is the product fA · fB · fC · …, for the differential case we compute:

fABC… = dxABC…/dt = fA · xB · xC · … + xA · fB · xC · … + xA · xB · fC · … + …
Also, under discrete updates the trivial function is f∅ = 1 but with differential updates it is f∅ = 0.
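To illustrate the differential product rule, here is a sketch on a hypothetical two-node continuous-time network (the rules fA = xB − xA and fB = xA − xB are illustrative assumptions, not from the text), again representing polynomials as dictionaries of index sets:

```python
# Product rule for a differential product variable, with idempotent
# index reduction: f_AB = f_A * x_B + x_A * f_B.
from collections import defaultdict

def mul(p, q):
    out = defaultdict(int)
    for sp, cp in p.items():
        for sq, cq in q.items():
            out[sp | sq] += cp * cq   # idempotent index union
    return dict(out)

def add(p, q):
    out = defaultdict(int, p)
    for s, c in q.items():
        out[s] += c
    return {s: c for s, c in out.items() if c != 0}

xA, xB = {frozenset('A'): 1}, {frozenset('B'): 1}
fA = {frozenset('B'): 1, frozenset('A'): -1}   # dx_A/dt = x_B - x_A (assumed)
fB = {frozenset('A'): 1, frozenset('B'): -1}   # dx_B/dt = x_A - x_B (assumed)

fAB = add(mul(fA, xB), mul(xA, fB))
# After idempotent reduction: f_AB = x_A + x_B - 2 x_AB
assert fAB == {frozenset('A'): 1, frozenset('B'): 1, frozenset('AB'): -2}
```

Note how idempotency (xB · xB = xB, xA · xA = xA) produces the linear terms that a naive calculus product rule would not show.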
Long-term behaviors
A mixed-population simulation may or may not be practical, depending on whether the system of linear equations closes with a manageable number of variables n. In the worst case, a significant fraction of the entire x-variable space is involved. By counting subscripts we know that there are n = 2^N product variables associated with an N-Boolean network, which is expected because our x-variables are simply a change of basis from the state space (see Appendix 1). Therefore the problem has potentially exponential complexity.
One way to make progress even when a closed system of equations is unmanageable is to focus on the attractors (steady states or limit cycles). The attractors are governed by a linear space whose size is determined by the number of attractor states, which for biological networks is usually much smaller than the full equation space. Mathematically, this means that our linear equations form a very degenerate system: if there are only n* attractor states then there are only n* non-zero eigenvalues and n* linearly independent equations. So for a 50-node network with a single steady state attractor we might have n = 2^50 ≈ 10^15 in the worst case, but n* = 1, which is a vastly smaller linear system. To find only the structure of the final-state space we select n* linearly independent variables, substitute them for the other variables in the time-evolution functions, and do an eigenvalue analysis on the much-smaller n* × n* system. A continuation of Example 1 gives a simple demonstration of this procedure.
There would be no time savings if we only eliminated a variable after we had already computed its time-evolution function. Fortunately, each linear dependency constrains not only the dependent variables, but also any variable that is factorized by a dependent variable (i.e. has all of the dependent variable’s subscripts). Thus a dependency involving a low-order variable with few indices can exclude a significant fraction of the variable space. We find these constraints concurrently with the process of adding equations, and thereby avoid having to evaluate a significant fraction of our variable space.
Constraints are traditionally enforced by substitution; however, there are two problems with substituting constraint expressions in our case. First, there is no guarantee that when applying two constraints the second will not undo the work of the first, for example by reintroducing an index that the first eliminated. That is, there may be no self-consistent solution that uses all constraints. The second problem is that two dependencies might constrain overlapping indices on the same variable. For example, in the network of Figure 3, variable xABC is subject to constraints from both xAB and xBC, and substituting either constraint would eliminate the subscript B that the other constraint requires. We avoid both these problems by multiplying constraints rather than substituting them, using the fact that we can freely duplicate (or remove duplicates of) constrained indices, because a Boolean raised to any positive power equals itself. This process forces the removal of all variables containing certain indices and lacking others, of which there is always at least one.
Suppose we want to find the long-time behavior of Boolean variables A, C and E in the network of Figure 3. After two iterations of solving for time-evolution functions we have:
At this point there are two linear dependencies: fAB = fD and fBC = fB, implying that xAB(t) = xD(t) and xBC(t) = xB(t) for all t ≥ 1.
Since we are only interested in the long-time behavior, we will use these constraints to simplify our equations. For example, if we were to retain xABC it would be affected by the relationships involving xAB, xB and xBC, and it would not be possible to enforce all of these by substitution because there is only one B-index on xABC. But our method enforces constraints by multiplying them: for example, multiplying xABC by the constraint xAB = xD gives xABC = xABC · xD = xABCD.
More generally, the first constraint attaches an AB index to every variable containing a D, and a D index to each variable with AB indices, and the second constraint adds a C index to every variable with a B index.
Constraining our system and eliminating disused variables gives us
Our new equations require us to solve for another variable (while applying the constraints):
The system is now closed (5 equations involving 5 variables), so if our goal is to produce a simulation (valid from time step 2 onwards) then we are done. However, if our objective is to find the attractors then we must remove another dependency that was unmasked by the last equation, which implies three useful constraints.
Each constraint reduces the size of the variable space: for example, the first eliminates all variables containing index A with no C, E or BC. We did not solve for xBC because doing so would not eliminate any variables, because the xC term does not attach any new indices.
After applying the new constraints we obtain:
This produces new dependencies, implying additional constraints.
With these constraints our example ends with the following equations:
The attractor is always reached at or before time step 3. The constraints map our original variables to linear combinations of variables in the final system: for example xA is mapped by constraint C4 to xAC + xAE − xABC, then mapped using constraints C1, C7 and C8 to xAC, whose dynamics are given by the final time-evolution equations. The eigenvalues of this final system are (−1, 1, 1), implying a steady state along with a period-2 cycle.
Each dependency produces at least one constraint that permanently eliminates at least one of the dependent variables, and usually many other variables as well. Proof: if the dependency contains only one term then that variable is zero, eliminating it and all variables it factorizes. If there is more than one term in the dependency then each lowest-index variable (which may be x∅ = 1) accumulates at least one index from any other variable in the dependency. Therefore each lowest-index variable is always eliminated. Corollary: the calculation eventually terminates because there are a finite number of variables, and each new linear dependency removes at least one further variable.
Applying the equation-reduction method to each dependency in our linear system F = {fi} is guaranteed to eliminate the entire null space of F, simply because variables will be removed from the system until there are no more dependencies. On the other hand, our equation-reduction method does not affect the non-null eigenspace involving the variables of interest, because the constraints map those variables to variables in the final equations: X_final = C · X_interest. Therefore the long-time behavior is accurately modeled by F_interest = C^T F_final C, which contains all the persistent eigenmodes.
All eigenvalues of F have modulus either 0 or 1 in a deterministic Boolean network, owing to the fact that the state space (and therefore by Appendix 1 our product-basis space) is finite. Proof: every state has a unique decomposition in terms of a set of generalized eigenvectors, so a state at time step a that repeats at time step b will have the same component of each eigenvector. Since there are a finite number of states one can always find such time steps a and b. Thus for each eigenvalue λ we have λ^a = λ^b, which implies that either λ = 0, or else |λ| = 1 and arg(λ) = 2πk/(b − a) for integer k.
Probabilistic and asynchronous Boolean networks
Our method supports modeling large populations of probabilistic Boolean networks (PBNs) [21, 25], in which several state transitions are possible at each time step, and the various transitions may have different probabilities. In the limit where the population of PBNs becomes infinite, each possible state transition occurs in a fraction of the population proportional to its likelihood in an individual. From the standpoint of our method, this implies that the coefficients in the fi equations of a PBN become real-valued, but the process of building the fi equations is unchanged.
The time-evolution equations F of a PBN in general contain eigenmodes whose eigenvalues have modulus on the interval [0, 1]. (A modulus larger than 1 would represent a mode growing without bound, which is impossible because in the state-space basis the fraction of the population in each state bi is restricted to the interval [0, 1].) Therefore, unlike a deterministic network, a PBN can have slowly-decaying modes with eigenvalue modulus between 0 and 1. For PBNs we generalize our equation-reduction method to identify decaying modes before all fi have been solved (i.e. before F is a square matrix), by identifying modes m having the property m · F = λ[m 0]. We discard these modes after they have become sufficiently small, defined by |λ|^(t − tf) < ϵ, where the tf are the ‘starting times’ of the involved equations and ϵ is a user-defined threshold.
Large populations of asynchronous networks behave identically to large populations of PBNs [26] if we define a uniform time step: the likelihoods of the various possible updates give the state-transition weights in the corresponding synchronous PBN. Therefore our analysis also applies to large populations of asynchronous networks. The conversion of an asynchronous updating scheme to a PBN lowers some of the eigenvalues that would otherwise be of magnitude 1 in a synchronous version of the network, and therefore our equation-reduction method can prune asynchronous networks more aggressively than their synchronous counterparts.
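A minimal sketch of such a decaying mode, using the leaky not gate from the text together with an assumed copy gate B ← A (an illustrative two-node network, not from the paper): the resulting three-variable linear system has one persistent eigenvalue and a complex pair of modulus √0.7 < 1.

```python
import numpy as np

# Two-node probabilistic network: A turns on with probability 0.9 / 0.2
# when B was off / on (the leaky NOT from the text), and B copies A
# (an assumed rule for illustration). Variables: (x_0, x_A, x_B).
F = np.array([
    [1.0, 0.0,  0.0],   # f_0 = x_0
    [0.9, 0.0, -0.7],   # f_A = 0.9 - 0.7 x_B
    [0.0, 1.0,  0.0],   # f_B = x_A
])

eigs = np.linalg.eigvals(F)
moduli = np.sort(np.abs(eigs))
assert np.isclose(moduli[-1], 1.0)            # one persistent mode
assert np.allclose(moduli[:2], np.sqrt(0.7))  # slowly-decaying pair (~0.837)
```

In a deterministic network these intermediate moduli cannot occur; here the decaying pair is what the generalized reduction step discards once it falls below the threshold ϵ.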
Calculational notes
When we attempt to simplify a network by removing dependencies, the order in which we calculate the time evolution of new variables and look for new dependencies greatly influences the total number of variables that will be involved before the linear system closes. That is because the constraints coming from different dependencies simplify the system to different degrees. In general, constraints on variables having the fewest indices are most helpful, because they factorize the largest part of the variable space. Following this rule of thumb, our implementation solves only the fewest-index variables between each search for new dependencies. Additionally, we add low-index factors of new variables even if they were not directly involved in earlier equations. In our tests, this prioritization method greatly speeds up the calculation.
Certain constraints can multiply the prefactors of some variables, and if enough such constraints are applied the prefactors can exceed the integer limit. For example, the constraint x1 = x2 + x3 takes x123 → 2x123. Typically the variable being multiplied would be found to be zero later in the calculation, so having a coefficient of 2 is not an error in the math, but it can make calculations impractical if the doubling happens repeatedly and the coefficient becomes very large. Fortunately, one can usually identify these outlier coefficients and remove them. For example, if f4 = 2x123 then we can reason that since f4 and x123 are both bounded by 0 and 1, we have a new constraint x123 = 0, which simplifies f4 → 0. We identify these problematic coefficients using the heuristic that if a large coefficient multiplying a combination of integer-weighted variables is greater than the sum of all other coefficients, then that combination of variables (which is an integer for a homogeneous population) must be zero.
Our product basis has the property that any product of variables is itself a variable in the basis, so polynomials need never contain products of these variables. In the product basis, multiplication is interpreted as a union operation on the indices of the variables being multiplied. To place these rules in a mathematical context, this algebra is a commutative ring with an idempotent multiplication: xS · xT = xS∪T, so in particular xS · xS = xS.
Discussion
Our product-basis method allows the direct simulation of highly heterogeneous populations, including the transient processes that are generally ignored by analytic methods, as well as the steady states and limit cycles. This approach can be used to follow single variables of the system over time, as well as the correlations between these variables that are both necessary and sufficient to fully describe the dynamics of the population. It can account for the effect of mutations as well as variability in network state throughout the population, and exactly accounts for very rare subpopulations. An extension of the method allows the set of equations to be simplified while still capturing the long-term behavior. The only requirement for use of our approach is that the underlying model be built using Boolean variables.
The advance in our method is to write the time-evolution equations as a linear system, but in a different basis than the usual state space basis. Our variables have several advantages over state space variables. First, descriptors of a mixed population naturally use words that correspond more closely to our variables than to individual states. For example, we might specify that half the population starts with both genes A and B on, which implies that xAB = 0.5 but is agnostic about the state of other variables. Another advantage is that our equations often close using relatively few of our product variables for any mixed population, whereas the number of equations required in the state space basis scales with the heterogeneity of the population: the simulations we showed in Figure 1 would require all 2^40 state-space variables. Thus our choice of variables is superior for modeling very heterogeneous populations. Finally, our basis allows some variables to factorize others, allowing us to vastly simplify the calculation in many cases where we only care about the long-term behavior.
We acknowledge that our method can become intractable for complicated networks due to the fact that the construction of these simulations is potentially an exponential problem. A full simulation can require up to 2N equations to model, and even the attractor analysis is known to be NP-hard [27]. Large size, complicated logic rules and certain feedback loops in particular seem to cause the equation set to be large. These are fundamental limitations. However, the attractor analysis depends on an equation-reduction scheme that is somewhat of an art, and we anticipate that future work will greatly improve this part of the calculation for typical network models.
Our method can be applied to any system involving heterogeneous populations, as long as the individuals in a population can be modeled using Boolean logic. Heterogeneity plays a major role in such varied systems as healthy and cancerous tissues, evolution at the organism scale, and the social dynamics of unique individuals [28]. In all of these cases, rare and unexpected dynamics are difficult to capture by simulations of individuals, while pure attractor analyses may miss important aspects of the dynamics. We believe that the methodology outlined here can help to capture these important but elusive events.
Acknowledgments
This work is supported by the Boettcher Foundation (J.C.) and NIH grant 2T15LM009451 (B.R.).
Appendix 1: Correlation variables form a complete and independent basis
For N Boolean variables there are 2^N variables in the state space basis (b variables): this is just the number of states of the system. Likewise there are 2^N variables in the product space basis (x variables), because there are 2^N combinations of subscripts on these variables: each of N subscripts may be present or absent. Therefore the two spaces have the same number of variables, but this does not prove that the product basis spans the entire Boolean space. In order to show that the x-variables form a complete basis in b-space, we imagine explicitly writing the transformation matrix from b-space to x-space. For example, for the case of three Boolean variables this matrix is:

        b∅  bA  bB  bAB  bC  bAC  bBC  bABC
x∅       1   1   1   1    1   1    1    1
xA       0   1   0   1    0   1    0    1
xB       0   0   1   1    0   0    1    1
xAB      0   0   0   1    0   0    0    1
xC       0   0   0   0    1   1    1    1
xAC      0   0   0   0    0   1    0    1
xBC      0   0   0   0    0   0    1    1
xABC     0   0   0   0    0   0    0    1
The transformation matrix is upper-triangular, with ones on the diagonal. The reason is that each x-variable is turned on by the b-variable having the same indices (hence the ones along the diagonal), and by any b-variables containing additional indices implying a higher position in the matrix (since the indices are arranged in binary order of their subscripts); however an x-variable is never turned on by a b-variable that is missing one of its indices, which is why the lower triangular block is empty. Since the matrix is triangular the eigenvalues are the ones on the diagonal, so the determinant is one, the transformation is non-degenerate (and volume-preserving) and therefore the x-space spans the b-space.
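This construction is easy to check mechanically; the sketch below builds the matrix for N = 3 with index sets encoded as bitmasks (so the subset test S ⊆ T becomes S & T == S):

```python
import numpy as np

# Build the b-space -> x-space transformation for N = 3 Booleans: entry
# (S, T) is 1 exactly when state T turns on every index in S (S a subset
# of T), with rows and columns ordered by the binary encoding of the
# index sets.
N = 3
masks = range(2 ** N)
M = np.array([[1 if (S & T) == S else 0 for T in masks] for S in masks])

assert np.array_equal(M, np.triu(M))    # upper-triangular
assert np.all(np.diag(M) == 1)          # ones on the diagonal
assert round(np.linalg.det(M)) == 1     # non-degenerate: x spans b
```

The same three assertions hold for any N, since a subset's bitmask is never numerically larger than its superset's.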
Appendix 2: Algorithm and code
Pseudocode is given in Algorithm 1. The full code is available at: https://github.com/heltilda/ProBaBool.
Algorithm 1: build closed system of equations F
1: Initialize set of unsolved variables with variables of interest: X ← {x1, x2, …, xn}
2: Initialize set of variable update rules: F ← ∅
3: Initialize set of constraints: C ← ∅
4: Initialize simulation start time: tsim ← 1
5: Initialize set of equation start times of F: TF ← ∅
6: Initialize set of constraint start times: TC ← ∅
7: while X is not empty do
8:     % Reduce equations if necessary
9:     if size(F) > equation reduction threshold then
10:        F□ ← square-matrix component of F (i.e. only terms in solved variables for each fi)
11:        λ ← set of eigenvalues of F□
12:        for each λi ∈ λ do
13:            G ← F − λi [I 0]
14:            R ← upper-triangular matrix from QR(G)
15:            Vj ← set of dependent variables j found from |Rjj| < ϵ
16:            D ← set of linear dependencies found from Vj
17:            if Vj = ∅ then
18:                tsim ← min(tsim, min(TF(xk ∈ D)) + max(1, ceiling(log ϵdecay / log |λi|)))
19:                break
20:            end if
21:        end for
22:        for each di ∈ D do
23:            NewConstraints(di, tsim)
24:        end for
25:        Sort C by number of indices
26:        F ← Constrain(F, 1)
27:        X ← X ∪ lowest-index factors of X
28:        F ← Constrain(F, 1)
29:        Sort X by number of indices
30:    end if
31:    % Add new equations
32:    for each xi ∈ X do
33:        fi ← 1
34:        for each Boolean factor xb of xi do
35:            fi ← fi · fb
36:        end for
37:        F ← F ∪ Constrain(fi, 1)
38:    end for
39: end while

40: function NewConstraints(d, tc)
41:    for each xj ∈ d do
42:        RHSj ← solve (d = 0) for xj
43:        if {xj} = Constrain({RHSj}, tc)[1] then
44:            for each prior constraint (xp = RHSp) do
45:                if xp is factorized by xj and xp · RHSj = RHSp then
46:                    C ← C \ {cp}
47:                end if
48:            end for
49:            C ← C ∪ {xj = RHSj}
50:            for each xk = multiples of xj do
51:                Xnew ← new terms in xk · RHSk
52:                X ← X ∪ Xnew
53:                A ← {al = coefficients of xk in F}
54:                F ← F + A · (RHSk − xk)
55:                (tfl for all l such that al ≠ 0) ← tc
56:            end for
57:        end if
58:    end for
59: end function

60: function Constrain(Polys, tc)
61:    for each polyi ∈ Polys do
62:        while ∃ unconstrained terms in polyi do
63:            for each constraint xi = RHSi do
64:                % Multiply all factored variables by the constraint
65:                for each unconstrained variable xj with coefficient aj in polyi do
66:                    if xj is factorized by xi then
67:                        polynew ← xj · RHSi
68:                        if xj ⊅ polynew then
69:                            polyi ← (polyi − aj · xj) ∪ polynew
70:                            tfi ← tc
71:                        end if
72:                    end if
73:                end for
74:                % Check for overlarge coefficients
75:                while ∃ unchecked coefficients in polyi do
76:                    for each coefficient ai in polyi do
77:                        nj ← round(aj / ai)
78:                        if |ai| > Σj |aj − nj ai| then
79:                            NewConstraints(nj, tsim)
80:                            break
81:                        end if
82:                    end for
83:                end while
84:            end for
85:        end while
86:    end for
87:    return Polys
88: end function
Footnotes
* E-mail: james.costello{at}ucdenver.edu