Basic portfolio optimization problem using CVXPY

Dr. Tirthajyoti Sarkar, Fremont, CA


Application of linear programming for portfolio optimization

A major goal of modern data science and analytics is to solve complex optimization problems that help business and technology companies maximize their profit.

In my article “Linear Programming and Discrete Optimization with Python”, we touched on basic discrete optimization concepts and introduced PuLP, a Python library for solving such problems.

Although a linear programming (LP) problem is defined only by a linear objective function and linear constraints, it can be applied to a surprisingly wide variety of problems in diverse domains, ranging from healthcare to economics and from business to the military.

In this notebook, we show one such application of LP-style optimization using Python in the area of economic planning: maximizing the expected profit from a stock market investment portfolio while minimizing the risk associated with it.

How to maximize profit and minimize risk in the stock market?

The 1990 Nobel Prize in Economics was shared by Harry Markowitz (with Merton Miller and William Sharpe), who was recognized for his famous Modern Portfolio Theory (MPT), as it is known in the parlance of financial markets. His original paper was published back in 1952.

The key word here is Balanced.

A good, balanced portfolio must offer both protection (minimizing the risk) and opportunity (maximizing the profit).

And, when concepts such as minimization and maximization are involved, it is natural to cast the problem in terms of mathematical optimization theory.

The fundamental idea is rather simple and is rooted in the innate human nature of risk aversion.

In general, stock market statistics show that higher risk is associated with a greater probability of higher return and lower risk with a greater probability of smaller return.

MPT assumes that investors are risk-averse, meaning that given two portfolios that offer the same expected return, investors will prefer the less risky one.

Think about it. You will hold high-risk stocks only if they carry a high probability of a large percentage return.

But how do we quantify risk? It is a murky concept for sure and can mean different things to different people. However, in generally accepted economic theory, the variability (volatility) of a stock's price over a fixed time horizon is equated with risk. In this notebook, it is measured as the standard deviation of the monthly returns.

Therefore, the central optimization problem is to minimize the risk while ensuring a certain level of return. Alternatively, one can maximize the return while keeping the risk below a certain threshold.
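
Written out, the risk-minimizing version of this problem can be sketched as follows. The symbols are chosen to match the code later in this notebook: $x$ is the vector of portfolio weights, $r$ the vector of mean returns, $C$ the covariance matrix of the returns, and $r_{\min}$ the minimum acceptable return. The no-short-selling condition $x \ge 0$ is a modeling choice made in this notebook, not a requirement of the theory.

```latex
\begin{aligned}
\min_{x} \quad & x^{\top} C\, x \\
\text{subject to} \quad & r^{\top} x \ge r_{\min}, \\
& \sum_{i=1}^{n} x_i = 1, \\
& x \ge 0 .
\end{aligned}
```

The objective is the portfolio variance, so its square root is the portfolio's volatility.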

An example problem

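In this notebook, we show a very simplified version of the portfolio optimization problem. Strictly speaking, it is a quadratic program (QP) rather than a pure LP, because the risk term is quadratic in the portfolio weights while the constraints stay linear. It can still be solved efficiently using simple Python scripting.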

The goal is to illustrate the power and possibilities of such optimization solvers for tackling complex real-life problems. We work with 24 months of stock prices (monthly averages) for three stocks: Microsoft, Visa, and Walmart. The data are not recent, but they serve to demonstrate the process.


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from cvxpy import *

Read the data (please make sure that the CSV data file is in a Data subdirectory next to this notebook, since the code reads Data/monthly_prices.csv)

In [4]:
mp = pd.read_csv("Data/monthly_prices.csv",index_col=0)
mr = pd.DataFrame()
In [5]:
mp.head()
Out[5]:
MSFT V WMT
1 44.259998 69.660004 64.839996
2 52.639999 77.580002 57.240002
3 54.349998 79.010002 58.840000
4 55.480000 77.550003 61.299999
5 55.090000 74.489998 66.360001

Plot the data

In [6]:
plt.figure(figsize=(10,5))
plt.plot(range(1,25),mp['MSFT'],lw=3,marker='o',markersize=12)
plt.plot(range(1,25),mp['V'],lw=3,c='red',marker='^',markersize=12)
plt.plot(range(1,25),mp['WMT'],lw=3,marker='*',markersize=12)
plt.legend(mp.columns,fontsize=16)
plt.xlabel("Months",fontsize=18)
plt.ylabel("Stock price (Monthly average)",fontsize=18)
plt.xticks(fontsize=16)
plt.yticks(fontsize=16)
plt.grid(True)
plt.show()

Compute monthly returns

In [ ]:
# Compute monthly simple returns: (p_t - p_{t-1}) / p_{t-1} for each stock.
# pct_change() leaves NaN in the first row (no previous price), so drop it.
mr = mp.pct_change().dropna()
In [9]:
mr.head()
Out[9]:
MSFT V WMT
2 0.189336 0.113695 -0.117212
3 0.032485 0.018433 0.027952
4 0.020791 -0.018479 0.041808
5 -0.007030 -0.039458 0.082545
6 -0.076420 -0.028192 -0.000301
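
As a quick sanity check on the return formula, the sketch below recomputes the first two MSFT returns in plain Python from the prices shown in the mp.head() output. The helper name simple_returns is hypothetical and not part of the notebook.

```python
def simple_returns(prices):
    """Return period-over-period simple returns: (p_t - p_prev) / p_prev."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


# First three MSFT monthly prices, copied from the mp.head() output above
msft_prices = [44.259998, 52.639999, 54.349998]
msft_returns = simple_returns(msft_prices)

# The first return belongs to month 2, since month 1 has no previous price
for month, ret in enumerate(msft_returns, start=2):
    print("month {}: {:.6f}".format(month, ret))
```

The printed values should agree with the first two MSFT entries of mr.head().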

Get symbol names

In [10]:
# get symbol names
symbols = mr.columns

Convert monthly return data frame to a numpy matrix

In [ ]:
return_data = mr.to_numpy().T
In [12]:
plt.figure(figsize=(10,5))
plt.plot(range(1,24),100*mr['MSFT'],lw=3,marker='o',markersize=12)
plt.plot(range(1,24),100*mr['V'],lw=3,c='red',marker='^',markersize=12)
plt.plot(range(1,24),100*mr['WMT'],lw=3,marker='*',markersize=12)
plt.legend(mp.columns,fontsize=16)
plt.xlabel("Months",fontsize=18)
plt.ylabel("Monthly return (%)",fontsize=18)
plt.xticks(fontsize=16)
plt.yticks(fontsize=16)
plt.grid(True)
plt.show()

Mean return

In [13]:
r = np.asarray(np.mean(return_data, axis=1))

Covariance matrix

In [14]:
C = np.asmatrix(np.cov(return_data))
In [15]:
C
Out[15]:
matrix([[ 0.00336865,  0.0016328 , -0.00075249],
        [ 0.0016328 ,  0.00183242, -0.00056339],
        [-0.00075249, -0.00056339,  0.00197676]])
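
For reference, each entry of C is a sample covariance between two return series. The plain-Python sketch below shows the calculation, assuming the n - 1 normalization that np.cov applies by default. The function name sample_cov and the toy numbers are illustrative only. np.cov also treats each row as one variable by default, which is why the return matrix was transposed earlier.

```python
def sample_cov(a, b):
    """Sample covariance of two equal-length series (n - 1 in the denominator)."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


# Toy return series (made-up numbers, not the notebook's data)
series_a = [0.01, 0.03, -0.02, 0.04]
series_b = [0.02, 0.01, -0.01, 0.03]

variance_a = sample_cov(series_a, series_a)     # a diagonal entry of the matrix
covariance_ab = sample_cov(series_a, series_b)  # an off-diagonal entry
print(variance_a, covariance_ab)
```

The diagonal entries are the variances of the individual stocks, so their square roots are the per-stock risk figures.
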
In [16]:
for j in range(len(symbols)):
    print ('%s: Exp ret = %f, Risk = %f' %(symbols[j],r[j], C[j,j]**0.5))
MSFT: Exp ret = 0.024611, Risk = 0.058040
V: Exp ret = 0.018237, Risk = 0.042807
WMT: Exp ret = 0.009066, Risk = 0.044461

Set up the optimization model

The library we are going to use for this problem is called CVXPY. It is a Python-embedded modeling language for convex optimization problems. It allows you to express your problem in a natural way that follows the mathematical model, rather than in the restrictive standard form required by solvers.

Note the use of extremely useful building blocks from the CVXPY framework, such as the quad_form() function and the Problem class.

In [17]:
# Number of variables
n = len(symbols)

# The variables vector
x = Variable(n)

# The minimum return
req_return = 0.02

# The expected portfolio return, r^T x
ret = r @ x

# The risk (portfolio variance) as the quadratic form x^T C x
risk = quad_form(x, C)

# The core problem definition with the Problem class from CVXPY
prob = Problem(Minimize(risk), [sum(x)==1, ret >= req_return, x >= 0])
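
One thing worth checking before solving: with sum(x)==1 and x >= 0, the portfolio's expected return is a weighted average of the individual mean returns, so it can never exceed the largest of them. The sketch below uses the rounded mean returns printed earlier and a hypothetical helper, is_return_reachable, to test whether a target return is attainable at all.

```python
def is_return_reachable(mean_returns, required_return):
    """True if a long-only, fully invested portfolio can reach required_return.

    Such a portfolio's return is a convex combination of the asset returns,
    so the best it can do is put everything into the highest-return asset.
    """
    return required_return <= max(mean_returns)


# Rounded mean monthly returns for MSFT, V, WMT, copied from the output above
mean_returns = [0.024611, 0.018237, 0.009066]

print(is_return_reachable(mean_returns, 0.02))  # the notebook's req_return
print(is_return_reachable(mean_returns, 0.03))  # above every stock's mean return
```

If the target is out of reach, the solver should report the problem as infeasible rather than return a set of weights.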

Try solving the problem (within a try/except block)

In [18]:
try:
    prob.solve()
    print("Optimal portfolio")
    print("----------------------")
    for s in range(len(symbols)):
        print(" Investment in {} : {}% of the portfolio".format(symbols[s],round(100*x.value[s],2)))
    print("----------------------")
    print("Exp ret = {}%".format(round(100*ret.value,2)))
    print("Expected risk    = {}%".format(round(100*risk.value**0.5,2)))
except Exception as err:
    # Show what went wrong (for example, a solver failure) instead of hiding it
    print("Error: {}".format(err))
Optimal portfolio
----------------------
 Investment in MSFT : 58.28% of the portfolio
 Investment in V : 20.43% of the portfolio
 Investment in WMT : 21.29% of the portfolio
----------------------
Exp ret = 2.0%
Expected risk    = 3.83%
In [19]:
prob.status
Out[19]:
'optimal'
In [20]:
x.value
Out[20]:
array([0.58281755, 0.20432414, 0.21285832])
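
The reported figures can be reproduced by hand from the numbers printed in this notebook. The sketch below is plain Python, with the weights, mean returns, and covariance matrix copied from the outputs above (so they carry rounding error). The helper names are hypothetical.

```python
import math

# Values copied from the notebook outputs (rounded as printed)
weights = [0.58281755, 0.20432414, 0.21285832]   # x.value: MSFT, V, WMT
mean_returns = [0.024611, 0.018237, 0.009066]    # r
cov = [
    [0.00336865, 0.0016328, -0.00075249],
    [0.0016328, 0.00183242, -0.00056339],
    [-0.00075249, -0.00056339, 0.00197676],
]


def portfolio_return(w, r):
    """Expected portfolio return: the dot product r^T w."""
    return sum(wi * ri for wi, ri in zip(w, r))


def portfolio_variance(w, c):
    """Portfolio variance: the quadratic form w^T C w."""
    n = len(w)
    return sum(w[i] * c[i][j] * w[j] for i in range(n) for j in range(n))


exp_ret = portfolio_return(weights, mean_returns)
risk = math.sqrt(portfolio_variance(weights, cov))

print("Exp ret = {}%".format(round(100 * exp_ret, 2)))
print("Expected risk = {}%".format(round(100 * risk, 2)))
```

These should match the 2.0% return and 3.83% risk reported by the solver. The return sits right at req_return, which suggests that the minimum-return constraint is the binding one at the optimum.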

Read my detailed article

To understand the theory and concepts more clearly, please read my article on this problem.

Optimization with Python: How to make the most amount of money with the least amount of risk?

Extending the problem

Needless to say, the setup and simplifying assumptions of our model make this problem sound simpler than it is. But once you understand the basic logic and the mechanics of solving such an optimization problem, you can extend it to multiple scenarios:

  • Hundreds of stocks, longer time horizon data
  • Multiple risk/return ratios and thresholds
  • Minimize risk or maximize return (or both)
  • Investing in a group of companies together
  • Either/or scenario: invest either in Coca-Cola or in Pepsi but not in both

You have to construct more complicated matrices and a longer list of constraints, and use indicator variables to turn this into a mixed-integer problem, but all of these can be expressed in packages like CVXPY (mixed-integer problems additionally require a solver that supports integer variables).
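
As an illustration of the either/or case, one common way to write it uses binary indicator variables. In the sketch below, $z_A$ and $z_B$ are hypothetical indicators for two competing stocks $A$ and $B$ (these names are not used elsewhere in this notebook), and $x_A$, $x_B$ are the corresponding portfolio weights.

```latex
\begin{aligned}
& z_A,\ z_B \in \{0, 1\}, \\
& 0 \le x_A \le z_A, \qquad 0 \le x_B \le z_B, \\
& z_A + z_B \le 1 .
\end{aligned}
```

Because each weight is at most 1, the bound $x_i \le z_i$ forces a weight to zero whenever its indicator is zero, and the last constraint allows at most one of the two indicators to be switched on.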

Look at the examples page of the CVXPY package to see the breadth of optimization problems that can be solved using the framework.