# Solve the clustering problem using QAOA¶

**Author: Cooper Midroni** cooper@entropicalabs.com

In this notebook we demonstrate a start-to-finish project workflow for using Quantum Approximate Optimization Algorithm to cluster a simple dataset. Along the way, we will explain the major concepts of QAOA and build intuition as to how QAOA can be used to solve clustering problems. This notebook will steer away from heavy mathematical explanations in favor of a higher level view of the algorithm’s core components. It is mainly geared towards users that don’t have physics background but come from computer science.

## Variational Hybrid Algorithms¶

We often take for granted the many decades of progress that lead to
today’s widespread use of classical computers. As memory and compute
power become ever cheapened by Moore’s Law, the pressure to find optimal
resource allocations for algorithms shrinks away. However, with quantum
computers in their early stages, they still feel this daunting
requirement. In response to this, a family of algorithms known as
*variational hybrid quantum-classical algorithms* was created, with the
notion that quantum resources can be made more useful when partnered
with classical routines. The Quantum Approximate Optimization Algorithm
(QAOA), belongs to the family of variatonal hybrid algorithms.

We can infer a lot from merely unpacking this name. The presence of
‘*variational*’ tells us these algorithms will follow an iterative
approach, while ‘*hybrid*’ tells us they will leverage the use of both
quantum and classical computers. In fact, this describes the main flow
of the algorithm, with all that needs be answered is *when* does this
iteration stop and *what* information is passed between devices.

*A visual representation of a generic
variational hybrid quantum-classical algorithm.*

To answer the question of *what*, we note that the main goal of QAOA is
optimize a set of **parameters**, which we denote as
\(\vec{\gamma}\) and \(\vec{\beta}\). You’ll notice that these
symbols are vectors, as such they are \(n-\)length. We discuss
later what aspects of our problem decide the value of \(n\) in the
second notebook.

\(\vec{\gamma}\) and \(\vec{\beta}\) parameterize a **cost
function** which is evaluated with our **Quantum Circuit** to produce a
cost value. This output value is input to the optimizer, and is used to
determine whether the nudging of our parameters is in a direction of
lower cost. We will sometimes call the cost value an **expectation
value**, represented by \(\langle\psi|Cost|\psi\rangle\), which is
the expected value of the cost function \(Cost\) over the **wave
function** \(\psi\). If you were caught off guard by the term ‘wave
function’, then it is equally as effective to think of
\(\langle\psi|Cost|\psi\rangle\) as the notion of cost as in the
more traditional machine learning sense. The **Classical Optimizer**
will return updated parameters to the quantum circuit for re-evaluation,
and the cycle repeats.

*When* does this algorithm stop? Well, once a stopping criterion is met
of course. This criterion is often a pre-defined maximum number of
iterations, or occurs after a repeat number of evaluations land within
the same threshold of convergence (a tolerance for the cost value in
which we consider numbers within an \(\epsilon-\)window the same).
Once this criterion is met, the **optimized parameters** are returned
and used to define the solution.

*A visual representation of QAOA in the format of a
variational hybrid algorithm.*

The above description should leave you with many questions. - How does the above process solve a clustering problem? - How exactly do \(\vec{\gamma}\) and \(\vec{\beta}\) define the solution? - How do we define a meaningful cost function for our problem? - What in the world is a wave function?

We hope to answer these and more. For now, if you feel comfortable with the critical vocabulary of QAOA (the bolded words), then you’ll be well prepared for the explanations below. *** ### Data Preparation Now let’s get to the fun part! We will import our data and define the problem setting as a highly manicured example for this clustering demo.

The dataset we will be using is the **Pokemon dataset**, which can be
found on
Github.
In our journey to Catch ’Em All, we will attempt to cluster Pokemon into
Legendary and non-Legendary classes.

**Import Libraries**

```
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```

**Import Data**

```
df = pd.read_csv('./data/pokemon.csv')
df = df.set_index('#') #index pokemon by their ID number
df = df.rename_axis('ID') #rename axis to 'ID' instead of '#'
df = df.loc[~df.index.duplicated(keep='first')] #drop duplicates
df.head()
```

Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|

ID | ||||||||||||

1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |

2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |

3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |

4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |

5 | Charmeleon | Fire | NaN | 405 | 58 | 64 | 58 | 80 | 65 | 80 | 1 | False |

To avoid the many bells and whistles of later iterations of Pokemon games, we’ll stick to our roots and only consider Pokemon from the first three generations.

```
df = df.loc[df['Generation']<=3]
df.sample(frac=1).head() #sample the whole dataset (frac=1) to shuffle the arrangement
```

Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|

ID | ||||||||||||

266 | Silcoon | Bug | NaN | 205 | 50 | 35 | 55 | 25 | 25 | 15 | 3 | False |

185 | Sudowoodo | Rock | NaN | 410 | 70 | 100 | 115 | 30 | 65 | 30 | 2 | False |

5 | Charmeleon | Fire | NaN | 405 | 58 | 64 | 58 | 80 | 65 | 80 | 1 | False |

380 | Latias | Dragon | Psychic | 600 | 80 | 80 | 90 | 110 | 130 | 110 | 3 | True |

56 | Mankey | Fighting | NaN | 305 | 40 | 80 | 35 | 35 | 45 | 70 | 1 | False |

```
print('Percent of Non-Legendary Pokemon: %.2f' %((df.Legendary.count()-df.Legendary.sum())/df.Legendary.count()))
print('Percent of Legendary Pokemon: %.2f' %((df.Legendary.sum())/df.Legendary.count()))
```

```
Percent of Non-Legendary Pokemon: 0.95
Percent of Legendary Pokemon: 0.05
```

We can see that the classes are quite unevenly distributed. To remedy this, we will randomly select 5 Legendary and 5 Non-Legendary Pokemon to act as our samples to be clustered.

```
legendary = df.loc[df['Legendary'] == True].sample(5)
non_legendary = df.loc[df['Legendary'] == False].sample(5)
pokemon = pd.concat([legendary,non_legendary])
```

To further simplify the problem, and not worry about the encoding of categorical data, we will only consider numerical values in our clustering of the data.

```
numerical_columns = ['Total','HP','Attack','Defense','Sp. Atk','Sp. Def','Speed']
```

```
labels = pokemon['Legendary']
data = pokemon[numerical_columns].copy()
data.head()
```

Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | |
---|---|---|---|---|---|---|---|

ID | |||||||

379 | 580 | 80 | 75 | 150 | 75 | 150 | 50 |

386 | 600 | 50 | 150 | 50 | 150 | 50 | 150 |

243 | 580 | 90 | 85 | 75 | 115 | 100 | 115 |

381 | 600 | 80 | 90 | 80 | 130 | 110 | 110 |

378 | 580 | 80 | 50 | 100 | 100 | 200 | 50 |

We now have a dataset which is ready to be processed, but we may not be exactly clear on what to do with it. For that we must further understand how the QAOA process detailed above is actually used to solve a clustering problem.

## The Maxcut Problem¶

As laid out by Rigetti’s paper on
QAOA, there are a number of
important steps that we must follow to map the problem of clustering
into a format which QAOA can process. Broadly speaking, QAOA solves the
**MAXCUT** problem, in which a graph of \(n\) vertices is separated
into two complementary subsets, \(S\) and \(S^{c}\), such that
the number of edges between \(S\) and \(S^{c}\) is as large as
possible.

*A depiction of the maxcut problem, displaying a cut which separates
white and black vertices. Image
credit:*Wikipedia

This problem can be made more sophisticated by adding numerical values as weights to the edges, such that the best solution maximizes the sum of weights which separate \(S\) and \(S^{c}\). This is precisely the approach we take in using MAXCUT to cluster our data.

We allow the weights associated to each edge to be some notion of distance between points. In this way, the sets dictated by our optimal cut, \(S\) and \(S^{c}\), separate the data into binary clusters which are maximally distant (and hence, maximally dissimilar) from one another.

From our current understanding, we can already begin to formulate some first steps in preparing our data to fit this frameowrk.

We can use the `distances_dataset`

function from
`entropica_qaoa.utilities`

to easily turn this set of points into the
desired matrix of pairwise distances.

```
from entropica_qaoa.utilities import distances_dataset
dist = pd.DataFrame(distances_dataset(data.values),
index=data.index,columns=data.index)
dist.iloc[0:5, 0:5]
```

ID | 379 | 386 | 243 | 381 | 378 |
---|---|---|---|---|---|

ID | |||||

379 | 0.000000 | 206.276513 | 118.953773 | 117.260394 | 79.056942 |

386 | 206.276513 | 0.000000 | 108.627805 | 104.880885 | 220.907220 |

243 | 118.953773 | 108.627805 | 0.000000 | 30.000000 | 128.062485 |

381 | 117.260394 | 104.880885 | 30.000000 | 0.000000 | 122.474487 |

378 | 79.056942 | 220.907220 | 128.062485 | 122.474487 | 0.000000 |

```
df.loc[dist.index].head()
```

Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|

ID | ||||||||||||

379 | Registeel | Steel | NaN | 580 | 80 | 75 | 150 | 75 | 150 | 50 | 3 | True |

386 | DeoxysNormal Forme | Psychic | NaN | 600 | 50 | 150 | 50 | 150 | 50 | 150 | 3 | True |

243 | Raikou | Electric | NaN | 580 | 90 | 85 | 75 | 115 | 100 | 115 | 2 | True |

381 | Latios | Dragon | Psychic | 600 | 80 | 90 | 80 | 130 | 110 | 110 | 3 | True |

378 | Regice | Ice | NaN | 580 | 80 | 50 | 100 | 100 | 200 | 50 | 3 | True |

## From Maxcut to QUBO¶

With an understanding of the Maxcut structure which produces our
clustered output, we ask ourselves how we can turn what is effectively a
graph problem into the setting of an optimization problem. The answer is
to map our Maxcut interpretation into a **Quadratic Unconstrainted
Binary Optimization**
(QUBO)
problem. QUBO problems attempt to minimize a quadratic polynomial with
binary variables. Luckily, MAXCUT already has a well-known QUBO cost
function. This cost function is sophisticated enough to allow for our
pairwise distanes to be meaningfully included, as well as to allow for
the inclusion of bias terms on individual samples.

To explain the notation: - \(\sigma_{i}\) is the cluster class (-1 or 1) of sample \(i\) - \(J_{i j}\) is the distance between sample \(i\) and sample \(j\) - \(h_{j}\) is a bias term on sample \(j\) - \(\mu\) is a universal weight applied to all bias terms

By convention, a negative sign is applied to the cost function, as
above. In quantum mechanics we would denote thie function as
\(H(\sigma)\). The symbol \(H\) stands for *Hamiltonian*, which
is an operator which acts as a sum of the energies of the system. For
the scope of this notebook, thinking of \(Cost\) as any traditional
cost function which we want to minimize will serve us equally as
valuable.

## From QUBO to a Hamiltonian¶

Now we must use our data to create the cost function defined above. To
make a Hamiltonian that is recognizable by pyQuil, we must use the
pyQuil `PauliTerm`

object.

```
from pyquil.api import WavefunctionSimulator
from pyquil.paulis import PauliSum, PauliTerm
```

A `PauliTerm`

object can be quadratic or of order one. In the case of
it being quadratic, it represents the relationship between any two
samples of data. An order one `PauliTerm`

would be an implementation
of a bias term - a cost constraint which only affects one variable.
Below we show some basic functionality of the `PauliTerm`

object.

```
#Constructing a quadratic PauliTerm
i = 3
j = 6
print('Distance between samples %d and %d: %.3f' %(i,j,dist.values[i][j]))
```

```
Distance between samples 3 and 6: 433.417
```

To create the quadratic term we multiply two Paulis together. Each
`PauliTerm`

has an accompanying coefficient which is also multiplied.
For simplicity’s sake, we include the pairwise distance as a coefficient
of one factor, and make the other ‘1.0’.

```
term1 = PauliTerm("Z",i,dist.values[i][j])
term2 = PauliTerm("Z",j,1.0)
term = term1*term2
print(term)
```

```
(433.4166586553867+0j)*Z3*Z6
```

Feel free to play with the coefficient number of `term2`

to see how it
affects the output of the cell.

For those new to quantum computing, you’re likely wondering what the
purpose of the letter ‘Z’ is. It indicates that this `PauliTerm`

is a
Z operator.

You may also note that our sample numbers, \(i=3\) and \(j=6\),
have found their way into the printed output. Including \(i\) and
\(j\) in each `PauliTerm`

tells pyQuil which samples or **qubits**
the operation is applied to. That’s right, in the QAOA setup we consider
each datapoint to be mapped to a qubit. Thus, the above printed
statement actually means *“apply a penalty of :math:`Q` should sample 3
and sample 6 be in the same class”*, where \(Q\) is the coefficient
of the operator product Z3*Z6. Said in a more quantum-intuitive sense:
*“Apply a penalty of :math:`Q` should qubit 3 and qubit 6 both be found
in the same spin state (spin up or spin down)”*.

Thus, as QAOA tries to minimize the cost function, sample 3 and 6 will only appear in the same class if this configuration is optimal. The choice of our weights as the distances between the samples implies, that in a “good” configuration samples that lie far apart will end up in different classes.

We can see now that to make the Hamiltonian for our system we must
iterate over each distance in our distance matrix, and assign it within
a `PauliTerm`

as the interaction strength between the appropriate
qubits. We can readily achieve this using the utility function
`hamiltonian_from_distances`

.

```
from entropica_qaoa.utilities import hamiltonian_from_distances
hamiltonian = hamiltonian_from_distances(dist)
print(hamiltonian)
```

```
(206.27651344736267+0j)*Z0*Z1 + (118.95377253370319+0j)*Z0*Z2 + (117.26039399558574+0j)*Z0*Z3 + (79.05694150420949+0j)*Z0*Z4 + (429.1270208225066+0j)*Z0*Z5 + (414.1255848169731+0j)*Z0*Z6 + (329.6209944769902+0j)*Z0*Z7 + (365.8551625985343+0j)*Z0*Z8 + (95.06839643120105+0j)*Z0*Z9 + (108.62780491200216+0j)*Z1*Z2 + (104.88088481701516+0j)*Z1*Z3 + (220.90722034374522+0j)*Z1*Z4 + (467.9743582719036+0j)*Z1*Z5 + (451.3867521316947+0j)*Z1*Z6 + (364.0741682679506+0j)*Z1*Z7 + (415.45156155681974+0j)*Z1*Z8 + (159.80613254815975+0j)*Z1*Z9 + (30+0j)*Z2*Z3 + (128.06248474865697+0j)*Z2*Z4 + (425.20583250938597+0j)*Z2*Z5 + (410.91361622608713+0j)*Z2*Z6 + (325.1153641401772+0j)*Z2*Z7 + (368.0353243915589+0j)*Z2*Z8 + (77.5112895003044+0j)*Z2*Z9 + (122.47448713915891+0j)*Z3*Z4 + (447.88391353117385+0j)*Z3*Z5 + (433.4166586553867+0j)*Z3*Z6 + (346.04912945996557+0j)*Z3*Z7 + (389.1015291668744+0j)*Z3*Z8 + (96.42613753542138+0j)*Z3*Z9 + (436.3484845854286+0j)*Z4*Z5 + (425.73465914816+0j)*Z4*Z6 + (340.80786375903944+0j)*Z4*Z7 + (372.96112397943034+0j)*Z4*Z8 + (123.44229421069588+0j)*Z4*Z9 + (30.822070014844883+0j)*Z5*Z6 + (113.35784048754634+0j)*Z5*Z7 + (70+0j)*Z5*Z8 + (362.9572977638003+0j)*Z5*Z9 + (96.17692030835673+0j)*Z6*Z7 + (63.245553203367585+0j)*Z6*Z8 + (347.4881292936494+0j)*Z6*Z9 + (65.95452979136459+0j)*Z7*Z8 + (261.3197275369772+0j)*Z7*Z9 + (303.59183124715327+0j)*Z8*Z9
```

The above exercise brings up an important limitation to our present QAOA approach. The number of datapoints we are able to use is limited by the number of qubits we have available.

## Minimize the Hamiltonian with QAOA¶

Now that we have mapped the clustering problem to a Hamiltonian it is time to find the spin class assignments/spin configuration that minimizes our cost function. We do this using the QAOA algorithm. First we need to import the neccesary bits and pieces:

```
# import the neccesary pyquil modules
from entropica_qaoa.qaoa.cost_function import QAOACostFunctionOnQVM, QAOACostFunctionOnWFSim
# import QAOAParameters
from entropica_qaoa.qaoa.parameters import ExtendedParams
# import an optimizer
from scipy.optimize import minimize
#Some utilities for time tracking and measuring our outcomes.
import time
from math import log
from entropica_qaoa.utilities import cluster_accuracy, max_probability_bitstring
```

Now we can set up the *hyperparameters* (problem parameters that remain
fixed for this problem instance):

```
timesteps = 3 # The QAOA p parameter
iters = 500 # Number of classical optimiser iterations
n_qubits = 10 #this number might be defined before your dataset - should equal the number of data points
#The hamiltonian is also a hyperparameter
```

And of course also the parameters need to be chosen. In this QAOA run,
we will use `ExtendedParameters`

. This parameter class provides the
most degrees of freedom for our optimizer to explore the energy
landscape. Conversely, it also has the most parameters to optimize and
thus will take longer to converge.

To instantiate this parameter class, we need to pass in three separate lists of angles. - \(\vec{\beta}\): every timestep requires \(n_{qubits}\) beta rotations. Thus there are \(n_{qubits}\times timesteps\) beta values. - \(\vec{\gamma}_{pairs}\): there is a gamma rotation for every two-qubit interaction. A simple way to come up with this number is to measure the length of your Hamiltonian, subtracted by the number of single qubit bias terms in place. - \(\vec{\gamma}_{singles}\): there is a gamma single rotation for each bias term included in the hamiltonian.

We randomly generate these lists as their initial starting states are somewhat redunant. They will be optimized over 100s of iterations!

```
betas = [round(val,1) for val in np.random.rand(timesteps*n_qubits)]
gammas_singles = [round(val,1) for val in np.random.rand(0)] #we don't want any bias terms
gammas_pairs = [round(val,1) for val in np.random.rand(timesteps*len(hamiltonian))]
hyperparameters = (hamiltonian, timesteps)
parameters = (betas, gammas_singles, gammas_pairs)
params = ExtendedParams(hyperparameters, parameters)
```

Before starting the simulator, make sure you are running Rigetti’s QVM
and Quil Compiler by running `qvm -S`

and `quilc -S`

in two open and
disposable terminals

Let’s begin by running QAOA with \(p=3\) timesteps, and a maximum of 500 optimiser Iterations.

```
# Set up the WavefunctionSimulator from pyQuil
sim = WavefunctionSimulator()
cost_function = QAOACostFunctionOnWFSim(hamiltonian,
params=params,
sim=sim,
enable_logging=True)
```

```
t0 = time.time()
res = minimize(cost_function, params.raw(), tol=1e-3, method='Cobyla',
options={"maxiter": iters})
print('Run complete!\n','Runtime:','{:.3f}'.format(time.time()-t0))
```

```
Run complete!
Runtime: 56.446
```

```
wave_func = cost_function.get_wavefunction(params.raw())
lowest = max_probability_bitstring(wave_func.probabilities())
```

```
true_clusters = [1 if val else 0 for val in labels]
acc = cluster_accuracy(lowest,true_clusters)
```

```
True Labels of samples: [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
Lowest QAOA State: [0, 0, 1, 1, 1, 1, 0, 0, 1, 0]
Accuracy of Original State: 60.0 %
Accuracy of Complement State: 40.0 %
```

We can analyze the optimizer to see whether or not our QAOA run converged. For the full message, run:

```
print(res)
```

```
print('Cost Function Value:', res.fun)
print('Converged?:',res.message)
```

```
Cost Function Value: -446.28427408573975
Converged?: Maximum number of function evaluations has been exceeded.
```

We can see we did not converge. Let’s tighten up our operations by wrapping our QAOA runs in a function and increase the QAOA parameter \(p\).

```
def run_qaoa(hamiltonian, params, timesteps, max_iters, init_state=None):
cost_function = QAOACostFunctionOnWFSim(hamiltonian,
params=params,
initial_state=init_state)
res = minimize(cost_function, params.raw(), tol=1e-3, method='Cobyla',
options={"maxiter" : max_iters})
return cost_function.get_wavefunction(params.raw()), res
```

The cell below will take around 2 to 3 minutes to run:

```
t0 = time.time()
wave_func, res = run_qaoa(hamiltonian, params, timesteps=3, max_iters=1500)
print('Run complete\n','Runtime:','{:.3f}'.format(time.time()-t0))
```

```
Run complete
Runtime: 159.021
```

```
lowest = max_probability_bitstring(wave_func.probabilities())
```

```
true_clusters = [1 if val else 0 for val in labels]
acc = cluster_accuracy(lowest,true_clusters)
print('Cost Function Value:', res.fun)
print('Converged?:',res.message)
```

```
True Labels of samples: [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
Lowest QAOA State: [1, 0, 1, 0, 0, 0, 1, 1, 1, 1]
Accuracy of Original State: 30.0 %
Accuracy of Complement State: 70.0 %
Cost Function Value: -860.5133756234061
Converged?: Optimization terminated successfully.
```

You should typically find that increasing the number of allowed iterations gives a more accurate answer. The precise numbers will depend on which Pokemons are randomly selected at the beginning.