CVXPY
One of the major goals of the modern enterprise of data science and analytics is to solve complex optimization problems for business and technology companies to maximize their profit.
In my article “Linear Programming and Discrete Optimization with Python”, we touched on basic discrete optimization concepts and introduced a Python library PuLP for solving such problems.
Although a linear programming (LP) problem is defined only by a linear objective function and linear constraints, it can be applied to a surprisingly wide variety of problems in diverse domains ranging from healthcare to economics, business to military.
In this notebook, we show one such application using Python in the area of economic planning: maximizing the expected profit from a stock market investment portfolio while minimizing the risk associated with it. Strictly speaking, the risk term used here is quadratic in the portfolio weights, so the model we solve is a convex quadratic program rather than a pure LP.
The 1990 Nobel Prize in Economics went to Harry Markowitz, acknowledged for his famous Modern Portfolio Theory (MPT), as it is known in the parlance of financial markets. The original paper was published back in 1952.
The key word here is Balanced.
A good, balanced portfolio must offer both protections (minimizing the risk) and opportunities (maximizing profit).
And, when concepts such as minimization and maximization are involved, it is natural to cast the problem in terms of mathematical optimization theory.
The fundamental idea is rather simple and is rooted in the innate human nature of risk aversion.
In general, stock market statistics show that higher risk is associated with a greater probability of higher return and lower risk with a greater probability of smaller return.
MPT assumes that investors are risk-averse, meaning that given two portfolios that offer the same expected return, investors will prefer the less risky one.
Think about it. You will hold high-risk stocks only if they carry a high probability of a large percentage return.
But how to quantify the risk? It is a murky concept for sure and can mean different things to different people. However, in the generally accepted economic theory, the variability (volatility) of a stock price (defined over a fixed time horizon) is equated with risk.
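As a minimal sketch of this definition, the snippet below computes monthly returns from a short price series and takes their sample standard deviation as the volatility. The prices are hypothetical values chosen for illustration, not the notebook's data, and the standard library is used so the snippet runs without the CSV file.

```python
import statistics

# Hypothetical monthly average prices for one stock (illustrative values only)
prices = [100.0, 110.0, 99.0, 108.9]

# Simple return for each month: (p_t - p_{t-1}) / p_{t-1}
returns = [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]

# Expected return is estimated by the mean of the observed returns
expected_return = statistics.mean(returns)

# Volatility (the "risk") is the sample standard deviation of the returns
volatility = statistics.stdev(returns)

print("Monthly returns:", [round(r, 4) for r in returns])
print("Expected return: {:.4f}, volatility: {:.4f}".format(expected_return, volatility))
```

A stock whose returns swing between +10% and -10% gets a large volatility even when its average return is modest, which is the sense in which variability is treated as risk.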
Therefore, the central optimization problem is to minimize the risk while ensuring a certain amount of return in profits. Alternatively, one can maximize the profit while keeping the risk below a certain threshold.
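One way to write the first form of the problem, assuming portfolio risk is measured by the variance of returns, is shown below. The notation is introduced here for illustration: x is the vector of portfolio weights, r the vector of expected returns, C the covariance matrix of returns, and r_min the minimum acceptable return. The budget constraint (weights sum to 1) and the no-short-selling constraint (non-negative weights) are simplifying assumptions of this version.

```latex
\begin{aligned}
\min_{x}\quad & x^{T} C x \\
\text{subject to}\quad & r^{T} x \ge r_{\min}, \\
& \sum_{i=1}^{n} x_i = 1, \\
& x_i \ge 0, \quad i = 1, \dots, n
\end{aligned}
```

The objective is quadratic in x and the constraints are linear, so this is a convex problem whenever C is positive semidefinite, which holds for any covariance matrix.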
In this article, we will show a very simplified version of the portfolio optimization problem, which can be cast into a convex optimization framework and solved efficiently using simple Python scripting.
The goal is to illustrate the power and possibility of such optimization solvers for tackling complex real-life problems. We work with 24 months of stock prices (monthly averages) for three stocks: Microsoft, Visa, and Walmart. These are older data, but they demonstrate the process well.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from cvxpy import *
mp = pd.read_csv("Data/monthly_prices.csv",index_col=0)
mr = pd.DataFrame()
mp.head()
plt.figure(figsize=(10,5))
plt.plot([i for i in range(1,25)],mp['MSFT'],lw=3,marker='o',markersize=12)
plt.plot([i for i in range(1,25)],mp['V'],lw=3,c='red',marker='^',markersize=12)
plt.plot([i for i in range(1,25)],mp['WMT'],lw=3,marker='*',markersize=12)
plt.legend(mp.columns,fontsize=16)
plt.xlabel("Months",fontsize=18)
plt.ylabel("Stock price (Monthly average)",fontsize=18)
plt.xticks(fontsize=16)
plt.yticks(fontsize=16)
plt.grid(True)
plt.show()
# compute monthly returns: (price_t - price_{t-1}) / price_{t-1}
# DataFrame.set_value was removed in pandas 1.0; pct_change performs the same calculation
mr = mp.pct_change().dropna()
mr.head()
# get symbol names
symbols = mr.columns
# rows = stocks, columns = months (DataFrame.as_matrix was removed in pandas 1.0)
return_data = mr.values.T
plt.figure(figsize=(10,5))
plt.plot([i for i in range(1,24)],100*mr['MSFT'],lw=3,marker='o',markersize=12)
plt.plot([i for i in range(1,24)],100*mr['V'],lw=3,c='red',marker='^',markersize=12)
plt.plot([i for i in range(1,24)],100*mr['WMT'],lw=3,marker='*',markersize=12)
plt.legend(mp.columns,fontsize=16)
plt.xlabel("Months",fontsize=18)
plt.ylabel("Monthly return (%)",fontsize=18)
plt.xticks(fontsize=16)
plt.yticks(fontsize=16)
plt.grid(True)
plt.show()
r = np.asarray(np.mean(return_data, axis=1))
C = np.asmatrix(np.cov(return_data))
C
for j in range(len(symbols)):
    print('%s: Exp ret = %f, Risk = %f' % (symbols[j], r[j], C[j,j]**0.5))
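The per-stock figures above describe each asset in isolation. For a portfolio with weights x, the expected return is the weighted sum r·x and the variance is the quadratic form xᵀCx. The sketch below evaluates both for one fixed weight vector. All numbers in it are hypothetical and are not derived from the notebook's data.

```python
import math

# Hypothetical expected monthly returns and covariance matrix for three assets
r = [0.02, 0.01, 0.015]
C = [
    [0.0040, 0.0010, 0.0005],
    [0.0010, 0.0020, 0.0003],
    [0.0005, 0.0003, 0.0010],
]

def portfolio_return(x, r):
    """Expected portfolio return: sum_i x_i * r_i."""
    return sum(xi * ri for xi, ri in zip(x, r))

def portfolio_variance(x, C):
    """Portfolio variance as the quadratic form x^T C x."""
    n = len(x)
    return sum(x[i] * C[i][j] * x[j] for i in range(n) for j in range(n))

# Hypothetical allocation: 50% / 25% / 25%
x = [0.5, 0.25, 0.25]

exp_ret = portfolio_return(x, r)
variance = portfolio_variance(x, C)
risk = math.sqrt(variance)  # standard deviation, comparable to the per-stock risk

print("Exp ret = {:.5f}, Risk = {:.5f}".format(exp_ret, risk))
```

The off-diagonal entries of C enter the variance, so the portfolio risk depends on how the assets move together and not only on their individual volatilities.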
The library we are going to use for this problem is called CVXPY. It is a Python-embedded modeling language for convex optimization problems. It allows you to express your problem in a natural way that follows the mathematical model, rather than in the restrictive standard form required by solvers.
Note the use of the extremely useful quad_form() function and the Problem() class from the CVXPY framework.
# Number of variables
n = len(symbols)
# The variables vector
x = Variable(n)
# The minimum return
req_return = 0.02
# The return (use @ for matrix multiplication; * is deprecated for this purpose in CVXPY 1.x)
ret = r.T @ x
# The risk in xT.Q.x format
risk = quad_form(x, C)
# The core problem definition with the Problem class from CVXPY
prob = Problem(Minimize(risk), [sum(x)==1, ret >= req_return, x >= 0])
Solving the problem (inside a try/except block)
try:
    prob.solve()
    print("Optimal portfolio")
    print("----------------------")
    for s in range(len(symbols)):
        print(" Investment in {} : {}% of the portfolio".format(symbols[s], round(100*x.value[s], 2)))
    print("----------------------")
    print("Exp ret = {}%".format(round(100*ret.value, 2)))
    print("Expected risk = {}%".format(round(100*risk.value**0.5, 2)))
except Exception as e:
    # Report the failure (e.g. an infeasible problem leaves x.value as None)
    print("Error:", e)
prob.status
x.value
To understand the theory and concepts more clearly, please read my article on this problem.
Optimization with Python: How to make the most amount of money with the least amount of risk?
Needless to say, the setup and simplifying assumptions of our model make this problem sound simpler than it is. But once you understand the basic logic and the mechanics of solving such an optimization problem, you can extend it to multiple scenarios.
You may have to construct more complicated matrices and a longer list of constraints, or use indicator variables to turn this into a mixed-integer problem, but all of these are inherently supported by packages like CVXPY.
Look at the examples page of the CVXPY package to get a sense of the breadth of optimization problems that can be solved using the framework.