Solve the clustering problem using QAOA¶
Author: Cooper Midroni cooper@entropicalabs.com
In this notebook we demonstrate a start-to-finish project workflow for using the Quantum Approximate Optimization Algorithm (QAOA) to cluster a simple dataset. Along the way, we explain the major concepts of QAOA and build intuition for how it can be used to solve clustering problems. This notebook steers away from heavy mathematical explanations in favor of a higher-level view of the algorithm’s core components. It is mainly geared towards users who come from a computer science background rather than physics.
Variational Hybrid Algorithms¶
We often take for granted the many decades of progress that led to today’s widespread use of classical computers. As memory and compute power become ever cheaper thanks to Moore’s Law, the pressure to find optimal resource allocations for algorithms shrinks away. Quantum computers, however, are still in their early stages and face exactly this daunting requirement. In response, a family of algorithms known as variational hybrid quantum-classical algorithms was created, with the idea that quantum resources can be made more useful when partnered with classical routines. The Quantum Approximate Optimization Algorithm (QAOA) belongs to this family of variational hybrid algorithms.
We can infer a lot from merely unpacking this name. The presence of ‘variational’ tells us these algorithms follow an iterative approach, while ‘hybrid’ tells us they leverage both quantum and classical computers. In fact, this describes the main flow of the algorithm; all that remains to be answered is when the iteration stops and what information is passed between the two devices.
A visual representation of a generic variational hybrid quantum-classical algorithm.
To answer the question of what, we note that the main goal of QAOA is to optimize a set of parameters, which we denote \(\vec{\gamma}\) and \(\vec{\beta}\). Notice that these symbols are vectors, each of length \(n\); we discuss which aspects of our problem determine the value of \(n\) in the second notebook.
\(\vec{\gamma}\) and \(\vec{\beta}\) parameterize a cost function which is evaluated on our Quantum Circuit to produce a cost value. This output value is passed to the optimizer, and is used to determine whether nudging the parameters has moved us in a direction of lower cost. We will sometimes call the cost value an expectation value, written \(\langle\psi|Cost|\psi\rangle\): the expected value of the cost function \(Cost\) over the wave function \(\psi\). If you were caught off guard by the term ‘wave function’, it is just as effective to think of \(\langle\psi|Cost|\psi\rangle\) as the cost in the more traditional machine learning sense. The Classical Optimizer then returns updated parameters to the quantum circuit for re-evaluation, and the cycle repeats.
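If it helps to see this concretely, an expectation value is just a probability-weighted average of costs over measurement outcomes. Here is a tiny sketch with made-up probabilities and costs, not tied to any real circuit:
probabilities = {'00': 0.1, '01': 0.4, '10': 0.4, '11': 0.1}  # hypothetical |amplitude|^2 for each bitstring
costs = {'00': 2.0, '01': -1.0, '10': -1.0, '11': 2.0}        # hypothetical Cost(x) for each bitstring
expectation = sum(probabilities[x] * costs[x] for x in probabilities)
print(expectation)  # 0.1*2.0 + 0.4*(-1.0) + 0.4*(-1.0) + 0.1*2.0 = -0.4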
When does this algorithm stop? Once a stopping criterion is met, of course. This criterion is often a pre-defined maximum number of iterations, or is triggered when repeated evaluations land within the same threshold of convergence (a tolerance for the cost value in which we consider numbers within an \(\epsilon\)-window the same). Once the criterion is met, the optimized parameters are returned and used to define the solution.
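To make the loop itself concrete, here is a minimal, purely classical sketch of the cycle described above. Everything in it is a toy stand-in: evaluate_cost plays the role of the quantum circuit (in QAOA it would return \(\langle\psi|Cost|\psi\rangle\) for the given parameters), and the update rule is a naive random search rather than a real optimizer.
import numpy as np

def evaluate_cost(params):
    # Toy stand-in for the quantum evaluation of <psi|Cost|psi>.
    return float(np.sum(np.cos(params)))

rng = np.random.default_rng(42)
params = rng.random(6)                       # initial betas and gammas
best_cost = evaluate_cost(params)

for iteration in range(100):                 # stopping criterion 1: max iterations
    candidate = params + 0.1 * rng.standard_normal(params.shape)
    cost = evaluate_cost(candidate)          # "quantum" evaluation of the candidate
    if cost < best_cost:
        converged = best_cost - cost < 1e-4  # stopping criterion 2: epsilon window
        params, best_cost = candidate, cost  # classical parameter update
        if converged:
            break

print('Optimized cost:', best_cost)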
A visual representation of QAOA in the format of a variational hybrid algorithm.
The above description should leave you with many questions:
- How does the above process solve a clustering problem?
- How exactly do \(\vec{\gamma}\) and \(\vec{\beta}\) define the solution?
- How do we define a meaningful cost function for our problem?
- What in the world is a wave function?
We hope to answer these and more. For now, if you feel comfortable with the critical vocabulary of QAOA (the bolded words), then you’ll be well prepared for the explanations below.
Data Preparation¶
Now let’s get to the fun part! We will import our data and define the problem setting as a highly manicured example for this clustering demo.
The dataset we will be using is the Pokemon dataset, which can be found on Github. In our journey to Catch ’Em All, we will attempt to cluster Pokemon into Legendary and non-Legendary classes.
Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Import Data
df = pd.read_csv('./data/pokemon.csv')
df = df.set_index('#') #index pokemon by their ID number
df = df.rename_axis('ID') #rename axis to 'ID' instead of '#'
df = df.loc[~df.index.duplicated(keep='first')] #drop duplicates
df.head()
ID | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary
---|---|---|---|---|---|---|---|---|---|---|---|---
1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False
2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False
3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False
4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False
5 | Charmeleon | Fire | NaN | 405 | 58 | 64 | 58 | 80 | 65 | 80 | 1 | False
To avoid the many bells and whistles of later iterations of Pokemon games, we’ll stick to our roots and only consider Pokemon from the first three generations.
df = df.loc[df['Generation']<=3]
df.sample(frac=1).head() #sample the whole dataset (frac=1) to shuffle the arrangement
ID | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary
---|---|---|---|---|---|---|---|---|---|---|---|---
266 | Silcoon | Bug | NaN | 205 | 50 | 35 | 55 | 25 | 25 | 15 | 3 | False
185 | Sudowoodo | Rock | NaN | 410 | 70 | 100 | 115 | 30 | 65 | 30 | 2 | False
5 | Charmeleon | Fire | NaN | 405 | 58 | 64 | 58 | 80 | 65 | 80 | 1 | False
380 | Latias | Dragon | Psychic | 600 | 80 | 80 | 90 | 110 | 130 | 110 | 3 | True
56 | Mankey | Fighting | NaN | 305 | 40 | 80 | 35 | 35 | 45 | 70 | 1 | False
print('Fraction of Non-Legendary Pokemon: %.2f' %((df.Legendary.count()-df.Legendary.sum())/df.Legendary.count()))
print('Fraction of Legendary Pokemon: %.2f' %((df.Legendary.sum())/df.Legendary.count()))
Fraction of Non-Legendary Pokemon: 0.95
Fraction of Legendary Pokemon: 0.05
We can see that the classes are quite unevenly distributed. To remedy this, we will randomly select 5 Legendary and 5 Non-Legendary Pokemon to act as our samples to be clustered.
legendary = df.loc[df['Legendary'] == True].sample(5)
non_legendary = df.loc[df['Legendary'] == False].sample(5)
pokemon = pd.concat([legendary,non_legendary])
To further simplify the problem, and to avoid worrying about the encoding of categorical data, we will only consider the numerical columns when clustering the data.
numerical_columns = ['Total','HP','Attack','Defense','Sp. Atk','Sp. Def','Speed']
labels = pokemon['Legendary']
data = pokemon[numerical_columns].copy()
data.head()
ID | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed
---|---|---|---|---|---|---|---
379 | 580 | 80 | 75 | 150 | 75 | 150 | 50
386 | 600 | 50 | 150 | 50 | 150 | 50 | 150
243 | 580 | 90 | 85 | 75 | 115 | 100 | 115
381 | 600 | 80 | 90 | 80 | 130 | 110 | 110
378 | 580 | 80 | 50 | 100 | 100 | 200 | 50
We now have a dataset ready to be processed, but it may not yet be clear what to do with it. For that, we must understand how the QAOA process detailed above is actually used to solve a clustering problem.
The Maxcut Problem¶
As laid out by Rigetti’s paper on QAOA, there are a number of important steps that we must follow to map the problem of clustering into a format which QAOA can process. Broadly speaking, QAOA solves the MAXCUT problem, in which a graph of \(n\) vertices is separated into two complementary subsets, \(S\) and \(S^{c}\), such that the number of edges between \(S\) and \(S^{c}\) is as large as possible.

A depiction of the maxcut problem, displaying a cut which separates white and black vertices.¶
A depiction of the maxcut problem, displaying a cut which separates white and black vertices. Image credit:Wikipedia
This problem can be made more sophisticated by adding numerical values as weights to the edges, such that the best solution maximizes the sum of weights which separate \(S\) and \(S^{c}\). This is precisely the approach we take in using MAXCUT to cluster our data.
We let the weight associated with each edge be some notion of distance between points. In this way, the sets dictated by our optimal cut, \(S\) and \(S^{c}\), separate the data into binary clusters which are maximally distant (and hence, maximally dissimilar) from one another.
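As a toy illustration (with made-up numbers, not our Pokemon data), the value of a weighted cut is simply the sum of the weights of the edges that run between \(S\) and \(S^{c}\):
import numpy as np

# Hypothetical 4-point distance matrix (symmetric, zero diagonal).
J = np.array([[0., 5., 1., 4.],
              [5., 0., 6., 1.],
              [1., 6., 0., 5.],
              [4., 1., 5., 0.]])
assignment = [0, 1, 0, 1]  # candidate partition: which set each point belongs to
cut_value = sum(J[i, j] for i in range(4) for j in range(i + 1, 4)
                if assignment[i] != assignment[j])
print(cut_value)  # 5 + 4 + 6 + 5 = 20; a good cut separates distant points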
From our current understanding, we can already begin to formulate some first steps in preparing our data to fit this framework.
We can use the distances_dataset function from entropica_qaoa.utilities to easily turn this set of points into the desired matrix of pairwise distances.
from entropica_qaoa.utilities import distances_dataset
dist = pd.DataFrame(distances_dataset(data.values),
                    index=data.index, columns=data.index)
dist.iloc[0:5, 0:5]
ID | 379 | 386 | 243 | 381 | 378
---|---|---|---|---|---
379 | 0.000000 | 206.276513 | 118.953773 | 117.260394 | 79.056942
386 | 206.276513 | 0.000000 | 108.627805 | 104.880885 | 220.907220
243 | 118.953773 | 108.627805 | 0.000000 | 30.000000 | 128.062485
381 | 117.260394 | 104.880885 | 30.000000 | 0.000000 | 122.474487
378 | 79.056942 | 220.907220 | 128.062485 | 122.474487 | 0.000000
df.loc[dist.index].head()
ID | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary
---|---|---|---|---|---|---|---|---|---|---|---|---
379 | Registeel | Steel | NaN | 580 | 80 | 75 | 150 | 75 | 150 | 50 | 3 | True
386 | DeoxysNormal Forme | Psychic | NaN | 600 | 50 | 150 | 50 | 150 | 50 | 150 | 3 | True
243 | Raikou | Electric | NaN | 580 | 90 | 85 | 75 | 115 | 100 | 115 | 2 | True
381 | Latios | Dragon | Psychic | 600 | 80 | 90 | 80 | 130 | 110 | 110 | 3 | True
378 | Regice | Ice | NaN | 580 | 80 | 50 | 100 | 100 | 200 | 50 | 3 | True
From Maxcut to QUBO¶
With an understanding of the Maxcut structure which produces our clustered output, we ask ourselves how to turn what is effectively a graph problem into an optimization problem. The answer is to map our Maxcut interpretation onto a Quadratic Unconstrained Binary Optimization (QUBO) problem. QUBO problems minimize a quadratic polynomial over binary variables. Luckily, MAXCUT already has a well-known QUBO cost function. This cost function is sophisticated enough to allow our pairwise distances to be meaningfully included, as well as to allow for bias terms on individual samples:
\[Cost(\sigma) = -\sum_{i<j} J_{ij}\,\sigma_{i}\sigma_{j} - \mu\sum_{j} h_{j}\,\sigma_{j}\]
To explain the notation:
- \(\sigma_{i}\) is the cluster class (-1 or 1) of sample \(i\)
- \(J_{ij}\) is the distance between sample \(i\) and sample \(j\)
- \(h_{j}\) is a bias term on sample \(j\)
- \(\mu\) is a universal weight applied to all bias terms
By convention, a negative sign is applied to the cost function, as above. In quantum mechanics we would denote this function by \(H(\sigma)\). The symbol \(H\) stands for Hamiltonian, an operator which represents the total energy of the system. For the scope of this notebook, thinking of \(Cost\) as any traditional cost function which we want to minimize serves us equally well.
From QUBO to a Hamiltonian¶
Now we must use our data to create the cost function defined above. To make a Hamiltonian that is recognizable by pyQuil, we must use the pyQuil PauliTerm object.
from pyquil.api import WavefunctionSimulator
from pyquil.paulis import PauliSum, PauliTerm
A PauliTerm object can be quadratic or of order one. When quadratic, it represents the relationship between two samples of data. An order-one PauliTerm implements a bias term - a cost constraint which only affects one variable. Below we show some basic functionality of the PauliTerm object.
#Constructing a quadratic PauliTerm
i = 3
j = 6
print('Distance between samples %d and %d: %.3f' %(i,j,dist.values[i][j]))
Distance between samples 3 and 6: 433.417
To create the quadratic term we multiply two Paulis together. Each PauliTerm has an accompanying coefficient, and these coefficients are also multiplied. For simplicity’s sake, we include the pairwise distance as the coefficient of one factor, and set the other to 1.0.
term1 = PauliTerm("Z",i,dist.values[i][j])
term2 = PauliTerm("Z",j,1.0)
term = term1*term2
print(term)
(433.4166586553867+0j)*Z3*Z6
Feel free to play with the coefficient of term2 to see how it affects the output of the cell.
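For completeness, an order-one (bias) PauliTerm acts on a single sample. We will not use bias terms in our clustering Hamiltonian, so the example below is purely illustrative:
# Constructing an order-one (bias) PauliTerm on sample i - illustrative only
bias_term = PauliTerm("Z", i, 0.5)
print(bias_term)  # (0.5+0j)*Z3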
For those new to quantum computing, you’re likely wondering what the purpose of the letter ‘Z’ is. It indicates that this PauliTerm is a Z operator.
You may also note that our sample numbers, \(i=3\) and \(j=6\), have found their way into the printed output. Including \(i\) and \(j\) in each PauliTerm tells pyQuil which samples or qubits the operation is applied to. That’s right, in the QAOA setup we consider each datapoint to be mapped to a qubit. Thus, the printed statement above actually means “apply a penalty of \(Q\) should sample 3 and sample 6 be in the same class”, where \(Q\) is the coefficient of the operator product Z3*Z6. Said in a more quantum-intuitive sense: “Apply a penalty of \(Q\) should qubit 3 and qubit 6 both be found in the same spin state (spin up or spin down)”.
Thus, as QAOA tries to minimize the cost function, samples 3 and 6 will only appear in the same class if that configuration is optimal. The choice of the distances between samples as our weights implies that in a “good” configuration, samples that lie far apart will end up in different classes.
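We can check this penalty logic directly in plain Python. With spin values of \(\pm 1\), the ZZ term built above contributes \(+J\) to the cost when samples 3 and 6 are assigned to the same class, and \(-J\) when they are separated:
# Sign of the ZZ contribution for equal vs. opposite spin assignments
J_36 = dist.values[3][6]
for s3, s6 in [(+1, +1), (+1, -1)]:
    print('spins (%+d, %+d): contribution %.3f' % (s3, s6, J_36 * s3 * s6))
# spins (+1, +1): contribution 433.417
# spins (+1, -1): contribution -433.417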
We can see now that to build the Hamiltonian for our system we must iterate over each distance in our distance matrix, and assign it within a PauliTerm as the interaction strength between the appropriate qubits. We can readily achieve this using the utility function hamiltonian_from_distances.
from entropica_qaoa.utilities import hamiltonian_from_distances
hamiltonian = hamiltonian_from_distances(dist)
print(hamiltonian)
(206.27651344736267+0j)*Z0*Z1 + (118.95377253370319+0j)*Z0*Z2 + (117.26039399558574+0j)*Z0*Z3 + (79.05694150420949+0j)*Z0*Z4 + (429.1270208225066+0j)*Z0*Z5 + (414.1255848169731+0j)*Z0*Z6 + (329.6209944769902+0j)*Z0*Z7 + (365.8551625985343+0j)*Z0*Z8 + (95.06839643120105+0j)*Z0*Z9 + (108.62780491200216+0j)*Z1*Z2 + (104.88088481701516+0j)*Z1*Z3 + (220.90722034374522+0j)*Z1*Z4 + (467.9743582719036+0j)*Z1*Z5 + (451.3867521316947+0j)*Z1*Z6 + (364.0741682679506+0j)*Z1*Z7 + (415.45156155681974+0j)*Z1*Z8 + (159.80613254815975+0j)*Z1*Z9 + (30+0j)*Z2*Z3 + (128.06248474865697+0j)*Z2*Z4 + (425.20583250938597+0j)*Z2*Z5 + (410.91361622608713+0j)*Z2*Z6 + (325.1153641401772+0j)*Z2*Z7 + (368.0353243915589+0j)*Z2*Z8 + (77.5112895003044+0j)*Z2*Z9 + (122.47448713915891+0j)*Z3*Z4 + (447.88391353117385+0j)*Z3*Z5 + (433.4166586553867+0j)*Z3*Z6 + (346.04912945996557+0j)*Z3*Z7 + (389.1015291668744+0j)*Z3*Z8 + (96.42613753542138+0j)*Z3*Z9 + (436.3484845854286+0j)*Z4*Z5 + (425.73465914816+0j)*Z4*Z6 + (340.80786375903944+0j)*Z4*Z7 + (372.96112397943034+0j)*Z4*Z8 + (123.44229421069588+0j)*Z4*Z9 + (30.822070014844883+0j)*Z5*Z6 + (113.35784048754634+0j)*Z5*Z7 + (70+0j)*Z5*Z8 + (362.9572977638003+0j)*Z5*Z9 + (96.17692030835673+0j)*Z6*Z7 + (63.245553203367585+0j)*Z6*Z8 + (347.4881292936494+0j)*Z6*Z9 + (65.95452979136459+0j)*Z7*Z8 + (261.3197275369772+0j)*Z7*Z9 + (303.59183124715327+0j)*Z8*Z9
The above exercise brings up an important limitation of our present QAOA approach: the number of datapoints we can use is limited by the number of qubits we have available.
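In fact, with only 10 samples the instance is still small enough to minimize exactly by classical brute force, which gives us a useful reference point for what QAOA should be aiming for. The sketch below evaluates the energy of the Hamiltonian above, \(\sum_{i<j} J_{ij}\,\sigma_{i}\sigma_{j}\), for all \(2^{10}\) spin assignments; of course, this approach stops being feasible as soon as the number of datapoints (and qubits) grows.
# Exhaustive classical check: evaluate every spin assignment (2**10 = 1024 of them)
import itertools
J = dist.values
n = len(J)
best_cost, best_assignment = float('inf'), None
for spins in itertools.product([+1, -1], repeat=n):
    cost = sum(J[i, j] * spins[i] * spins[j]
               for i in range(n) for j in range(i + 1, n))
    if cost < best_cost:
        best_cost, best_assignment = cost, spins
print(best_cost, best_assignment)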
Minimize the Hamiltonian with QAOA¶
Now that we have mapped the clustering problem to a Hamiltonian, it is time to find the spin class assignments (spin configuration) that minimize our cost function. We do this using the QAOA algorithm. First we need to import the necessary bits and pieces:
# import the necessary cost functions from entropica_qaoa
from entropica_qaoa.qaoa.cost_function import QAOACostFunctionOnQVM, QAOACostFunctionOnWFSim
# import QAOAParameters
from entropica_qaoa.qaoa.parameters import ExtendedParams
# import an optimizer
from scipy.optimize import minimize
#Some utilities for time tracking and measuring our outcomes.
import time
from math import log
from entropica_qaoa.utilities import cluster_accuracy, max_probability_bitstring
Now we can set up the hyperparameters (problem parameters that remain fixed for this problem instance):
timesteps = 3 # The QAOA p parameter
iters = 500 # Number of classical optimiser iterations
n_qubits = 10 # determined by your dataset - should equal the number of data points
#The hamiltonian is also a hyperparameter
Of course, the parameters themselves also need to be chosen. In this QAOA run, we will use ExtendedParams. This parameter class provides the most degrees of freedom for our optimizer to explore the energy landscape. Conversely, it also has the most parameters to optimize, and will thus take longer to converge.
To instantiate this parameter class, we need to pass in three separate lists of angles:
- \(\vec{\beta}\): every timestep requires \(n_{qubits}\) beta rotations, so there are \(n_{qubits}\times timesteps\) beta values in total.
- \(\vec{\gamma}_{pairs}\): every timestep requires a gamma rotation for each two-qubit interaction. A simple way to count these interactions is to take the length of your Hamiltonian and subtract the number of single-qubit bias terms.
- \(\vec{\gamma}_{singles}\): every timestep requires a single-qubit gamma rotation for each bias term included in the Hamiltonian.
We generate these lists randomly, since their initial starting values matter relatively little - they will be optimized over hundreds of iterations!
betas = [round(val,1) for val in np.random.rand(timesteps*n_qubits)]
gammas_singles = [round(val,1) for val in np.random.rand(0)] #we don't want any bias terms
gammas_pairs = [round(val,1) for val in np.random.rand(timesteps*len(hamiltonian))]
hyperparameters = (hamiltonian, timesteps)
parameters = (betas, gammas_singles, gammas_pairs)
params = ExtendedParams(hyperparameters, parameters)
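As an optional sanity check, we can confirm that these lists have the sizes described above; the Hamiltonian printed earlier contains 45 two-qubit terms and no bias terms:
assert len(betas) == timesteps * n_qubits                 # 3 * 10 = 30 beta angles
assert len(gammas_pairs) == timesteps * len(hamiltonian)  # 3 * 45 = 135 pair gammas
assert len(gammas_singles) == 0                           # no bias terms
print(len(betas), len(gammas_pairs), len(gammas_singles))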
Before starting the simulator, make sure you are running Rigetti’s QVM and Quil Compiler, by running qvm -S and quilc -S in two separate terminals.
Let’s begin by running QAOA with \(p=3\) timesteps and a maximum of 500 optimiser iterations.
# Set up the WavefunctionSimulator from pyQuil
sim = WavefunctionSimulator()
cost_function = QAOACostFunctionOnWFSim(hamiltonian,
                                        params=params,
                                        sim=sim,
                                        enable_logging=True)
t0 = time.time()
res = minimize(cost_function, params.raw(), tol=1e-3, method='Cobyla',
               options={"maxiter": iters})
print('Run complete!\n','Runtime:','{:.3f}'.format(time.time()-t0))
Run complete!
Runtime: 56.446
wave_func = cost_function.get_wavefunction(params.raw())
lowest = max_probability_bitstring(wave_func.probabilities())
true_clusters = [1 if val else 0 for val in labels]
acc = cluster_accuracy(lowest,true_clusters)
True Labels of samples: [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
Lowest QAOA State: [0, 0, 1, 1, 1, 1, 0, 0, 1, 0]
Accuracy of Original State: 60.0 %
Accuracy of Complement State: 40.0 %
We can inspect the optimizer result to see whether or not our QAOA run converged. For the full message, run:
print(res)
print('Cost Function Value:', res.fun)
print('Converged?:',res.message)
Cost Function Value: -446.28427408573975
Converged?: Maximum number of function evaluations has been exceeded.
We can see that we did not converge. Let’s tighten up our operations by wrapping our QAOA runs in a function, and increase the maximum number of optimiser iterations.
def run_qaoa(hamiltonian, params, timesteps, max_iters, init_state=None):
    cost_function = QAOACostFunctionOnWFSim(hamiltonian,
                                            params=params,
                                            initial_state=init_state)
    res = minimize(cost_function, params.raw(), tol=1e-3, method='Cobyla',
                   options={"maxiter": max_iters})
    return cost_function.get_wavefunction(params.raw()), res
The cell below will take around 2 to 3 minutes to run:
t0 = time.time()
wave_func, res = run_qaoa(hamiltonian, params, timesteps=3, max_iters=1500)
print('Run complete\n','Runtime:','{:.3f}'.format(time.time()-t0))
Run complete
Runtime: 159.021
lowest = max_probability_bitstring(wave_func.probabilities())
true_clusters = [1 if val else 0 for val in labels]
acc = cluster_accuracy(lowest,true_clusters)
print('Cost Function Value:', res.fun)
print('Converged?:',res.message)
True Labels of samples: [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
Lowest QAOA State: [1, 0, 1, 0, 0, 0, 1, 1, 1, 1]
Accuracy of Original State: 30.0 %
Accuracy of Complement State: 70.0 %
Cost Function Value: -860.5133756234061
Converged?: Optimization terminated successfully.
You should typically find that increasing the number of allowed iterations gives a more accurate answer. The precise numbers will depend on which Pokemon are randomly selected at the beginning.