# **Sequential Logic Synthesis**

# Giovanni De Micheli Integrated Systems Centre EPF Lausanne







This presentation can be used for non-commercial purposes as long as this note and the copyright footers are not removed

© Giovanni De Micheli – All rights reserved

# Module 1

## Objective

- ▲ Motivation and assumptions for sequential synthesis
- ▲ Finite-state machine design and optimization

# **Synchronous logic circuits**

## Interconnection of

▲ Combinational logic gates

▲ Synchronous delay elements ▼ Edge-triggered, master/slave

Assumptions

No direct combinational feedback

▲ Single-phase clocking

### Extensions to

▲ Multiple-phase clocking

#### Gated latches

# **Modeling synchronous circuits**

### Circuit are modeled in hardware languages

- ▲ Circuit model may be directly related to FSM model
  - ▼ Description by: switch-case
- ▲ Circuit model may be structural
  - Explicit definition of registers

# Sequential circuit models can be generated from high-level models

▲ Control generation in high-level synthesis

# **Modeling synchronous circuits**

#### State-based model:

- ▲ Model circuits as finite-state machines (FSMs)
- ▲ Represent by state tables/diagrams
- ▲ Apply exact/heuristic algorithms for:
  - ▼ State minimization
  - ▼ State encoding

## Structural model

#### ▲ Represent circuit by synchronous logic network

- ▲ Apply
  - ▼ Retiming
  - ▼ Logic transformations

## **State-based optimization**



# **Modeling synchronous circuits**

## Advantages and disadvantages of models

- State-based model
  - ▲ Explicit notion of state
  - Implicit notion of area and delay

### Structural model

- ▲ Implicit notion of state
- Explicit notion of area and delay

### Transition from a model to another is possible

#### ▲ State encoding

#### ▲ State extraction

(c) Giovanni De Micheli

# **Sequential logic optimization**

# Typical flow

## ▲ Optimize FSM state model first

- ▼Reduce complexity of the model
- ▼E.g., apply state minimization
- ▼ Correlates to area reduction

## ▲ Encode states and obtain a structural model

- ▼Apply retiming and transformations
- ▼Achieve performance enhancement
- ▲Use state extraction for verification purposes

- A set of primary input patterns X
- A set of primary output patterns Y
- A set of states S
- A state transition function:  $\delta$ : X × S  $\rightarrow$  S
- An output function:
  - **\land**  $\lambda$ : X  $\times$  S  $\rightarrow$  Y for Mealy models
  - **\land**  $\lambda$ :  $S \rightarrow Y$  for Moore models

Classic problem

- ▲ Exact and heuristic algorithms are available
- ▲ Objective is to reduce the number of states and hence the area
- Completely-specified finite-state machines
  - ▲ No don 't care conditions
  - ▲ Polynomial-time solutions
- Incompletely-specified finite-state machines
  - Unspecified transitions and/or outputs
    - ▼ Usual case in synthesis
  - ▲ Intractable problem:
    - ▼ Requires binate covering

### State minimization for completely-specified FSMs

## Equivalent states:

▲ Given any input sequence, the corresponding output sequence match

## Theorem:

- ▲ Two states are equivalent if and only if:
  - ▼ They lead to identical outputs and their next-states are equivalent

## Equivalence is transitive

- Partition states into equivalence classes
- ▲ Minimum finite-state machine is unique

#### State minimization for completely-specified FSMs

Stepwise partition refinement:

▲ Initially:

- ▼ All states in the same partition block
- ▲ Iteratively:
  - ▼ Refine partition blocks
- ▲ At convergence:
  - Partition blocks identify equivalent states

#### Refinement can be done in two directions

- ▲ Transitions *from* states in block to other states
  - ▼ Classic method. Quadratic complexity
- ▲ Transitions *into* states of block under consideration
  - ▼ Inverted tables. Hopcroft's algorithm.

# Initial partition:

▲П₁: States belong to the same block when outputs are the same for any input

Iteration:

▲ Π<sub>k+1</sub> : States belong to the same block if they were previously in the same block and their next states are in the same block of Π<sub>k</sub> for any input

# Convergence:

# $\mathbf{A}\mathbf{\Pi}_{k+1} = \mathbf{\Pi}_k$

| INPUT | STATE                 | N-STATE               | OUTPUT |
|-------|-----------------------|-----------------------|--------|
| 0     | $s_1$                 | <i>s</i> 3            | 1      |
| 1     | $s_1$                 | $s_5$                 | 1      |
| 0     | <i>s</i> <sub>2</sub> | <i>s</i> 3            | 1      |
| 1     | <i>s</i> <sub>2</sub> | $s_5$                 | 1      |
| 0     | <i>s</i> 3            | <i>s</i> <sub>2</sub> | 0      |
| 1     | <i>s</i> 3            | $s_1$                 | 1      |
| 0     | $s_4$                 | $s_4$                 | 0      |
| 1     | $s_4$                 | <i>s</i> 5            | 1      |
| 0     | $s_5$                 | $s_4$                 | 1      |
| 1     | $s_5$                 | $s_1$                 | 0      |



- $\mathbf{A}\Pi_1 = \{ \{ \mathbf{s}_1, \mathbf{s}_2 \}, \{ \mathbf{s}_3, \mathbf{s}_4 \}, \{ \mathbf{s}_5 \} \}$

# $\mathbf{A} \Pi_2$ is a partition into equivalence classes

## ▲No further refinement is possible



#### State minimization for incompletely-specified finite-state machines

### Applicable input sequences

▲ All transitions are specified

## Compatible states

Given any applicable input sequence, the corresponding output sequence match

## Theorem:

#### ▲ Two states are compatible if and only if:

- ▼ They lead to identical outputs
  - (when both are specified)
- ▼ And their next state is compatible
  - (when both are specified)

- Compatibility is not an equivalence relation
- Minimum finite-state machine is not unique
- Implication relation make the problem intractable
  - ▲ Two states may be compatible, subject to other states being compatible.
  - ▲ Implications are binate satisfiability clauses
    - ▼ a -> b = a'+b

| INPUT | STATE                 | N-STATE               | OUTPUT |
|-------|-----------------------|-----------------------|--------|
| 0     | $s_1$                 | <i>s</i> 3            | 1      |
| 1     | $s_1$                 | <i>s</i> 5            | *      |
| 0     | <sup>s</sup> 2        | <i>s</i> <sub>3</sub> | *      |
| 1     | <i>s</i> <sub>2</sub> | <i>s</i> <sub>5</sub> | 1      |
| 0     | <i>s</i> 3            | <sup>s</sup> 2        | 0      |
| 1     | 83                    | $s_1$                 | 1      |
| 0     | 84                    | <i>s</i> <sub>4</sub> | 0      |
| 1     | 84                    | <i>s</i> <sub>5</sub> | 1      |
| 0     | <i>s</i> <sub>5</sub> | \$4                   | 1      |
| 1     | <i>s</i> <sub>5</sub> | $s_1$                 | 0      |

## **Trivial method**

## Consider all possible don 't care assignments

- ▲ n don 't care imply
  - ▼ 2<sup>n</sup> completely specified FSMs
  - ▼ 2<sup>n</sup> solutions

## Example:

▲ Replace \* by 1

 $\mathbf{\nabla} \Pi_1 = \{ \{ s_1, s_2 \}, \{ s_3 \}, \{ s_4 \}, \{ s_5 \} \}$ 

▲ Replace \* by 0

$$\mathbf{v}\Pi_1 = \{\{s_1, s_5\}, \{s_2, s_3, s_4\}\}$$

#### Compatibility and implications Example

| Compatible states {s <sub>1</sub> , s <sub>2</sub> }   |
|--------------------------------------------------------|
| If {s <sub>3</sub> , s <sub>4</sub> } are compatible   |
| <b>A</b> Then $\{s_1, s_5\}$ are also compatible       |
| Incompatible states {s <sub>2</sub> , s <sub>5</sub> } |

| INPUT | STATE                 | N-STATE        | OUTPUT |
|-------|-----------------------|----------------|--------|
| 0     | $s_1$                 | <sup>8</sup> 3 | 1      |
| 1     | $s_1$                 | <sup>8</sup> 5 | *      |
| 0     | <sup>s</sup> 2        | <sup>8</sup> 3 | *      |
| 1     | <sup>s</sup> 2        | <sup>8</sup> 5 | 1      |
| 0     | <i>s</i> <sub>3</sub> | <sup>8</sup> 2 | 0      |
| 1     | <i>s</i> 3            | $s_1$          | 1      |
| 0     | 84                    | 84             | 0      |
| 1     | 84                    | <sup>8</sup> 5 | 1      |
| 0     | <i>s</i> <sub>5</sub> | 84             | 1      |
| 1     | s <sub>5</sub>        | $s_1$          | 0      |

TNIDL

#### Compatible pairs:

- $\blacktriangle \{ \mathbf{S}_1, \, \mathbf{S}_2 \}$
- $\blacktriangle \{ \mathtt{S}_1, \, \mathtt{S}_5 \} \leftarrow \{ \mathtt{S}_3, \, \mathtt{S}_4 \}$
- $\blacktriangle \{ s_2, s_4 \} \leftarrow \{ s_3, s_4 \}$
- $\blacktriangle \{ \textbf{s}_2, \, \textbf{s}_3 \} \leftarrow \{ \textbf{s}_1, \, \textbf{s}_5 \}$
- $\blacktriangle \{s_3, s_4\} \leftarrow \{s_2, s_4\} \ U \ \{s_1, s_5\}$

#### Incompatible pairs

- **▲** {**S**<sub>2</sub>, **S**<sub>5</sub>}
- ▲ {s<sub>3</sub>, s<sub>5</sub>}
- **▲** {**s**<sub>1</sub>, **s**<sub>4</sub>}
- **▲** {**s**<sub>4</sub>, **s**<sub>5</sub>}

#### **▲** {**s**<sub>1</sub>, **s**<sub>3</sub>}

| INPOT | STATE          | N-STATE               | 001901 |
|-------|----------------|-----------------------|--------|
| 0     | $s_1$          | <sup>s</sup> 3        | 1      |
| 1     | $s_1$          | s5                    | *      |
| 0     | <sup>s</sup> 2 | <sup>s</sup> 3        | *      |
| 1     | <sup>s</sup> 2 | <sup>s</sup> 5        | 1      |
| 0     | <i>s</i> 3     | <sup>s</sup> 2        | 0      |
| 1     | <i>s</i> 3     | $s_1$                 | 1      |
| 0     | 84             | 84                    | 0      |
| 1     | <i>s</i> 4     | <sup>s</sup> 5        | 1      |
| 0     | <i>s</i> 5     | <i>s</i> <sub>4</sub> | 1      |
| 1     | <i>s</i> 5     | $s_1$                 | 0      |

NSTATE

- A class of compatible states is such that all state pairs are compatible
- A class is maximal

▲ If not subset of another class

## Closure property

A set of classes such that all compatibility implications are satisfied

## The set of maximal compatibility classes

- ▲ Has the closure property
- ▲ May not provide a minimum solution

# **Maximum compatibility classes**

## • Example:

 $\{ \mathbf{S}_1, \mathbf{S}_2 \}$   $\{ \mathbf{S}_1, \mathbf{S}_5 \} \leftarrow \{ \mathbf{S}_3, \mathbf{S}_4 \}$   $\{ \mathbf{S}_2, \mathbf{S}_3, \mathbf{S}_4 \} \leftarrow \{ \mathbf{S}_1, \mathbf{S}_5 \}$ 

# Cover with all MCC has cardinality 3

# **Exact problem formulation**

#### Prime compatibility classes:

▲ Compatibility classes having the property that they are not subset of other classes implying the same (or subset) of classes

#### Compute all prime compatibility classes

#### Select a minimum number of prime classes

- Such that all states are covered
- ▲ All implications are satisfied
- Exact solution requires binate cover
- Good approximation methods exists

#### Stamina

## • Example:

 $\{ s_1, s_2 \}$   $\{ s_1, s_5 \} \leftarrow \{ s_3, s_4 \}$   $\{ s_2, s_3, s_4 \} \leftarrow \{ s_1, s_5 \}$ 

# Minimum cover:

- $\blacktriangle \{s_1, s_5\}, \{s_2, s_3, s_4\}$
- ▲Minimun cover has cardinality 2

## Determine a binary encoding of the states

- ▲ Optimizing some property of the representation (mainly area)
- Two-level model for combinational logic
  - ▲ Methods based on symbolic optimization ▼ Minimize a symbolic cover of the finite state machine ▼ Formulate and solve a constrained encoding problem

## Multiple-level model

Some heuristic methods that look for encoding which privilege cube and/or kernel extraction

#### Weak correlation with area minimality

| INPUT | P-STATE | N-STATE | OUTPUT |
|-------|---------|---------|--------|
| 0     | s1      | s3      | 0      |
| 1     | s1      | s3      | 0      |
| 0     | s2      | s3      | 0      |
| 1     | s2      | s1      | 1      |
| 0     | s3      | s5      | 0      |
| 1     | s3      | s4      | 1      |
| 0     | s4      | s2      | 1      |
| 1     | s4      | s3      | 0      |
| 0     | s5      | s2      | 1      |
| 1     | s5      | s5      | 0      |

#### • Minimum symbolic cover:

| * | s1s2s4 | s3 | 0 |
|---|--------|----|---|
| 1 | s2     | s1 | 1 |
| 0 | s4s5   | s2 | 1 |
| 1 | s3     | s4 | 1 |

#### • Encoded cover :

| * | 1** | 001 | 0 |
|---|-----|-----|---|
| 1 | 101 | 111 | 1 |
| 0 | *00 | 101 | 1 |
| 1 | 001 | 100 | 1 |

### Summary finite-state machine optimization

- FSM optimization has been widely researched
  - ▲ Classic and newer approaches
- State minimization and encoding correlate to area reduction
  - ▲ Useful, but with limited impact
- Performance-oriented FSM optimization has mixed results
  - ▲ Performance optimization is usually done by structural methods

# Module 2

## Objective

#### ▲ Structural representation of sequential circuits

#### ▲ Retiming

▲ Extensions

## **Structural model for sequential circuits**

#### Synchronous logic network

- ▲ Variables
- Boolean equations
- Synchronous delay annotation

### Synchronous network graph

- ▲ Vertices  $\leftrightarrow$  equations  $\leftrightarrow$  I/O, gates
- **A** Edges  $\leftrightarrow$  dependencies  $\leftrightarrow$  nets
- ▲ Weights  $\leftrightarrow$  synchronous delays  $\leftrightarrow$  registers



$$a^{(n)} = i^{(n)} \overline{\oplus} i^{(n-1)}$$
  

$$b^{(n)} = i^{(n-1)} \overline{\oplus} i^{(n-2)}$$
  

$$c^{(n)} = a^{(n)}b^{(n)}$$
  

$$d^{(n)} = c^{(n)} + d'^{(n-1)}$$
  

$$e^{(n)} = d^{(n)}e^{(n-1)} + d'^{(n)}b'^{(n)}$$
  

$$v^{(n)} = c^{(n)}$$
  

$$s^{(n)} = e^{(n-1)}$$

- $a = i \overline{\oplus} i$ @1
- $b = i@1 \overline{\oplus} i@2$
- c = a b

$$d = c + d@1'$$

- e = d e @1 + d' b'
- v = c
- s = e@1

# **Approaches to sequential synthesis**

## Optimize combinational logic only

- ▲ Freeze circuit at register boundary
- Modify equation and network graph topology

## Retiming

- ▲ Move register positions. Change weights on graph
- Preserve network topology

## Synchronous transformations

- Blend combinational transformations and retiming
- ▲ Powerful, but complex to use

## **Example of local retiming**



# Retiming

# Global optimization technique

- Change register positions
  - ▲Affects area:
    - ▼ Retiming changes register count
  - ▲Affects cycle-time
    - ▼ Changes path delays between register pairs

# Retiming algorithms have polynomial-time complexity

## **Retiming assumptions**

#### Delay is constant at each vertex

▲ No fanout delay dependency

#### Graph topology is invariant

▲ No logic transformations

#### Synchronous implementation

- ▲ Cycles have positive weights
  - ▼ Each feedback loop has to be broken by at least one register
- Edges have non-negative weights
  - ▼ Physical registers cannot anticipate time

#### Consider topological paths

▲ No false path analysis

# Retiming

- Retiming of a vertex v
  - ▲ Integer r<sub>v</sub>
  - ▲ Registers moved from output to input: r<sub>v</sub> positive
  - ▲ Registers moved from input to output: r<sub>v</sub> negative
- Retiming of a network
  - ▲ Vector whose entries are the retiming at various vertices
- A family of I/O equivalent networks are specified by:
  - ▲ The original network
  - ▲ A set of vectors satisfying specific constraints
    - ▼ Legal retiming

# Example



# **Definitions and properties**

#### Definitions:

- $\mathbf{A}$  w(v<sub>i</sub>, v<sub>j</sub>) weight on edge (v<sub>i</sub>, v<sub>j</sub>)
- $(v_i, ..., v_i)$  path from  $v_i$  to  $v_i$
- $\mathbf{A}$  w(v<sub>i</sub>, ..., v<sub>i</sub>) weight on path from v<sub>i</sub> to v<sub>i</sub>
- $\mathbf{A}$  d( $\mathbf{v}_i, ..., \mathbf{v}_i$ ) combinational delay on path from  $\mathbf{v}_i$  to  $\mathbf{v}_i$

## Properties:

- ▲ Retiming of an edge (v<sub>i</sub>, v<sub>i</sub>)
  - $\mathbf{v} \quad \hat{\mathbf{w}}_{ii} = \mathbf{w}_{ii} + \mathbf{r}_i \mathbf{r}_i$
- A Retiming of a path ( $v_i, ..., v_i$ )
  - $\mathbf{v}$   $\hat{\mathbf{w}}$  ( $\mathbf{v}_i, ..., \mathbf{v}_i$ ) = w ( $\mathbf{v}_i, ..., \mathbf{v}_i$ ) +  $\mathbf{r}_i \mathbf{r}_i$
- Cycle weights are invariant



# Legal retiming

A retiming vector is legal iff:

▲ No edge weight is negative

 $\mathbf{v} \ \hat{\mathbf{w}}_{ij} (\mathbf{v}_i, \mathbf{v}_j) = \mathbf{w}_{ij} (\mathbf{v}_i, \mathbf{v}_j) + \mathbf{r}_j - \mathbf{r}_i \ge 0$  for all i, j

- **\blacktriangle** Given a clock period  $\phi$ :
- ▲ Each path ( $v_i$ , ...,  $v_j$ ) with d ( $v_i$ , ...,  $v_j$ ) >  $\phi$ has at least one register:

 $\mathbf{v} \ \hat{\mathbf{w}} \ (\mathbf{v}_i, ..., \mathbf{v}_j) = \mathbf{w} \ (\mathbf{v}_i, ..., \mathbf{v}_j) + \mathbf{r}_j - \mathbf{r}_i \ge 1$  for all i, j

 $\blacktriangle$  Equivalently, each combinational path delay is less than  $\phi$ 

## **Refined analysis**

#### Least-register path

 $\mathbf{A} \mathbf{W} (\mathbf{v}_i, \mathbf{v}_j) = \min \mathbf{w} (\mathbf{v}_i, ..., \mathbf{v}_j)$  over all paths between  $\mathbf{v}_i$  and  $\mathbf{v}_j$ 

#### Critical delay:

▲ D  $(v_i, v_j) = \max d (v_i, ..., v_j)$  over all paths between  $v_i$  and  $v_j$  with weight W  $(v_i, v_j)$ 

There exist a vertex pair (v<sub>i</sub>, v<sub>j</sub>) whose delay D (v<sub>i</sub>, v<sub>j</sub>) bounds the cycle time

# Example



- •Vertices: v<sub>a</sub>, v<sub>e</sub>
- •Paths:  $(v_a, v_b, v_c, v_e)$  and  $(v_a, v_b, v_c, v_d, v_e)$
- •W(v<sub>a</sub>, v<sub>e</sub>) = 2

•D(v<sub>a</sub>, v<sub>e</sub>) = 16

## Minimum cycle-time retiming problem

- Find the minimum value of the clock period φ such that there exist a retiming vector where:
  - ▲  $\mathbf{r}_i \mathbf{r}_j \leq \mathbf{w}_{ij}$  for all  $(\mathbf{v}_i, \mathbf{v}_j)$

▼ All registers are implementable

▲  $r_i - r_j \le W(v_i, v_j) - 1$  for all  $(v_i, v_j)$  such that D  $(v_i, v_j) > \phi$ 

▼ All timing path constraints are satisfied

#### Solution

- **ightarrow** Given a value of  $m \phi$
- ▲ Solve linear constraints A r ≤ b
  - Mixed integer-linear program

▲ A set of inequalities has a solution if the constraint graph has no positive cycles

- Bellman-Ford algorithm compute longest path
- ▲ Iterative algorithm
  - ▼ Relaxation

## Minimum cycle-time retiming algorithm

- Compute all pair path weights W (v<sub>i</sub>, v<sub>j</sub>) and delays D (v<sub>i</sub>, v<sub>j</sub>)
  - ▲ Warshall-Floyd algorithms with complexity O( |V|<sup>3</sup> )
- ♦ Sort the elements of D (v<sub>i</sub>, v<sub>j</sub>) in decreasing order
  - $\blacktriangle$  Because an element of D is the minimum  $\phi$
- Binary search for a φ in D (v<sub>i</sub>, v<sub>j</sub>) such that
  - ▲ There exists a legal retiming
  - ▲ Bellman-Ford algorithm with complexity O( |V|<sup>3</sup> )

Remarks

- ▲ Result is a global optimum
- ▲ Overall complexity is O( |V|<sup>3</sup> log |V| )

## **Example: original graph**



Constraints (first type):

•  $r_a - r_b \le 1$  or equivalently  $r_b \ge r_a - 1$ 

• 
$$r_c - r_b \le 1$$
 or equivalently  $r_c \ge r_b - 1$ 

## **Example: constraint graph**



•Constraints (first type):

• 
$$r_a - r_b \le 1$$
 or equivalently  $r_b \ge r_a - 1$ 



• 
$$r_c - r_b \le 1$$
 or equivalently  $r_c \ge r_b - 1$ 

(c) Giovanni De Micheli

# Example

- Sort elements of D:
  - **3**3,30,27,26,24,23,21,20,19,17,16,14,13,12,10,9,7,6,3
- Select  $\varphi$  = 19
  - **▲** 33,30,27,26,24,23,21,20,19,17,16,14,13,12,10,9,7,6,3
  - Pass: legal retiming found
- Select φ = 13
  - **▲** 33,30,27,26,24,23,21,20,19,17,16,14,13,12,10,9,7,6,3
  - Pass: legal retiming found
- Select φ < 13</li>
  - **▲** 33,30,27,26,24,23,21,20,19,17,16,14,13,12,10,9,7,6,3
  - ▲ Fail: no legal retiming found
- Fastest cycle time is  $\phi$  = 13. Corresponding retiming vector is used

## Example $\varphi = 13$

 $r_a - r_e \le 2 - 1$  or equivalently  $r_e \ge r_a - 1$  $r_e - r_f \le 0 - 1$  or equivalently  $r_f \ge r_e + 1$  $r_f - r_g \le 0 - 1$  or equivalently  $r_g \ge r_f + 1$  $r_g - r_f \le 2 - 1$  or equivalently  $r_f \ge r_g - 1$  $r_g - r_c \le 3 - 1$  or equivalently  $r_c \ge r_g - 2$ 



# Example $\phi = 13$

◆Constraint graph:
 ◆Longest path from source

 -1



Retimed graph



## Example $\phi$ = 13



#### Most common algorithm for retiming

- ▲ Avoids storage of matrices W and D
- ▲ Applicable to large circuits

Rationale

- **\triangle** Search for decreasing  $\phi$  in fixed step
  - **v** Look for values of  $\phi$  compatible with peripheral circuits
- ▲ Use efficient method to determine legality
  - ▼ Network graph is often very sparse
- ▲ Can be coupled with topological timing analysis

- Start with a given cycle-time  $\phi$
- Look for paths with excessive delays
- Make such paths shorter
  - ▲ By bringing the terminal register closer
  - ▲ Some other paths may become longer
  - ▲ Namely, those path whose tail has been moved
- Use an iterative approach

#### Define data ready time at each node

- ▲ Total delay from register boundary
- Iterative approach
  - **A** Find vertices with *data ready* >  $\phi$
  - ▲ Retime these vertices by 1

#### Algorithm properties

- ▲ If at some iteration there is no vertex with *data ready* >  $\phi$ , a legal retiming has been found
- ▲ If a legal retiming is not found in |V| iterations, then no legal retiming exists for that  $\phi$

### **Example** $\varphi$ = 13 iteration = 1



## **Example** $\varphi = 13$ iteration = 2



#### **Example** $\varphi$ = 13 iteration = 3



## **Retiming for minimum area**

- Find a retiming vector that minimizes the number of registers
- Simple area modeling
  - ▲ Every edge with a positive weight denotes registers
  - ▲ Total register area is proportional to the sum of all weights
- Register sharing model
  - Every set of positively-weighted edges with common tail is realized by a shift registers with taps
  - ▲ Total register area is proportional to the sum, over all vertices, of the maxima of weights on outgoing edges

## Example









- Register variation at node v
  - $\land$  r<sub>v</sub> (indegree(v) outdegree(v))
- Total area variation:
  - $\Delta \Sigma r_v$  (indegree(v) outdegree(v))
- Area minimization problem:
  - $Min \Sigma r_v ( indegree(v) outdegree(v) )$
  - ▲ Such that  $\mathbf{r}_i \mathbf{r}_j \leq \mathbf{w}_{ij}$  for all  $(\mathbf{v}_i, \mathbf{v}_j)$

#### Area recovery under timing constraint

- ▲ Min  $\Sigma$  r<sub>v</sub> (indegree(v) outdegree(v)) such that:
- $\mathbf{A}\mathbf{r}_{i} \mathbf{r}_{j} \leq \mathbf{w}_{ij}$  for all (  $\mathbf{v}_{i}, \mathbf{v}_{j}$  ) and
- $\mathbf{A} \mathbf{r}_i \mathbf{r}_j \le W (\mathbf{v}_i, \mathbf{v}_j) 1 \text{ for all } (\mathbf{v}_i, \mathbf{v}_j \text{ }) \text{ such that } D (\mathbf{v}_i, \mathbf{v}_j) > \phi$

#### Common implementation is by integer linear program

Problem can alternatively be transformed into a matching problem and solved by Edmonds-Karp algorithm

## Register sharing

▲ Construct auxiliary network and apply this formulation.

Auxiliary network construction takes into account register sharing (c) Giovanni De Micheli

## Other problems related to retiming

#### Retiming pipelined circuits

- ▲ Balance pipe stages by using retiming
- ▲ Trade-off latency versus cycle time

#### Peripheral retiming

- ▲ Use retiming to move registers to periphery of a circuit
- Restore registers after optimizing combinational logic

## Wire pipelining

- ▲ Use retiming to pipeline interconnection wires
- Model sequential and combinational macros
- Consider wire delay and buffering

#### Sequential optimization technique for:

- ▲ Cycle time or register area
- Applicable to
  - ▲ Synchronous logic networks
  - Architectural models of data paths
    - ▼ Vertices represent complex (arithmetic) operators
  - Exact algorithm in polynomial time
- Extension and issues
  - Delay modeling
  - Network granularity

# Module 3

#### Objective

#### ▲ Relating state-based and structural models

▲ State extraction

## **Relating the sequential models**

#### State encoding

▲ Maps a state-based representation into a structural one

#### State extraction

▲ Recovers the state information from a structural model

#### Remark

- ▲ A circuit with n registers may have 2<sup>n</sup> states
- ▲ Unreachable states

## **State extraction**

- State variables: p, q
- Initial state p=0; q=0;
- Four possible states



## **State extraction**

#### Reachability analysis

▲ Given a state, determine which states are reachable for some inputs

▲ Given a state subset, determine the reachable state subset

▲ Start from an initial state

▲ Stop when convergence is reached

#### Notation:

▲ A state (or a state subset) is represented by an expression over the state variables

#### Implicit representation

## **Reachability analysis**

- State transition function: f
- Initial state: r<sub>0</sub>
- States reachable from r<sub>0</sub>
  - ▲ Image of r<sub>0</sub> under f
- States reachable from set r<sub>k</sub>
  - ▲ Image of r<sub>k</sub> under f
- Iteration
  - $\mathbf{A}\mathbf{r}_{k+1} = \mathbf{r}_k \mathbf{U}$  (image of  $\mathbf{r}_k$  under f)

#### Convergence

$$\mathbf{A}\mathbf{r}_{k+1} = \mathbf{r}_k$$
 for some k

(c) Giovanni De Micheli

## Example



• Initial state  $r_0 = p'q'$ .

• The state transition function 
$$\mathbf{f} = \begin{bmatrix} x'p'q' + pq \\ xp' + pq' \end{bmatrix}$$

(c) Giovanni De Micheli

# Example

Image of p' q' under f: ▲ When (p = 0 and q = 0), f reduces to [x' x]<sup>T</sup> ▲ Image is  $[01]^T$  U  $[10]^T$ States reachable from the reset state: (p = 1; q = 0) and (p = 0; q = 1) $Ar_1 = p'q' + pq' + p'q = p' + q'$ ◆States reacheable from r<sub>1</sub>: ▲[00]<sup>T</sup>U[01]<sup>T</sup>U[10]<sup>T</sup> Convergence:  $A s_0 = p' q'; s_1 = pq'; s_2 = p' q;$ 1/1



0/1

s0

## **Completing the extraction**

#### Determine state set

▲ Vertex set

#### Determine transitions and I/O labels

- ▲ Edge set
- ▲ Inverse image computation
- ▲ Look at conditions that lead into a given state

# Example

х  $\bullet$ Transition into  $s_0 = p' q'$ A Patterns that make  $f = [0 0]^T$  are: (x'p'q' + pq)'(xp' + pq')' = x'p'q**\land** Transition from state  $s_2 = p' q$  under q input x' s0 **s1** s<sub>0</sub> ▲ And so on ... 0 1/1 0/0 s2 s2 1/1

z

q

0/1

\*/1

s,

s 3,

\*/1

## Remarks

Extraction is performed efficiently with implicit methods

## ♦ Model transition relation x (i,x,y) with BDDs

▲ This function relates possible triples:

( input, current\_state, next\_state )

▲ Image of r<sub>k</sub>:

 $\bullet$  S<sub>i,x</sub> (  $\chi$ (i,x,y) r<sub>k</sub> (x) )

 $\blacksquare$  Where  $r_k$  depends on inputs x

▲ Smoothing on BDDs can be achieved efficiently

# Summary

#### State extraction can be performed efficiently to:

- ▲ Apply state-based optimization techniques
- ▲ Apply verification techniques

State extraction is based on forward and backward state space traversal:

- ▲ Represent state space implicitly with BDDs
- ▲ Important to manage the space size, which grows exponentially with the number of registers