Python Finance Fundamentals

Finance Data & Analysis Tools Using Python


Last Updated: February 15, 2021 by Pepe Sandoval





Python Data & Analysis Tools for Finance

Finance Fundamentals

Portfolio Basics

  • A Portfolio is a set of allocations in a variety of assets. In other words, it is a set of weighted assets. E.g. you have 0.1 of your money in BTC, 0.2 in AMZN stocks, 0.3 in AAPL stocks and 0.4 in an ETF

  • Key statistics in a portfolio

    • Daily Return: the percent return from one day to the next
    • Normalized Return: the return from the day we started the investment to a given day, expressed as a ratio (subtract 1 and multiply by 100 to get a percentage). Formula $N_R = \dfrac{\text{Current Stock price}}{\text{Stock price at day of investment}}$
    • Cumulative Return: the total amount returned over a given time period
    • Avg. Daily Return: mean of the daily returns over a period of time (e.g. 15 or 30 days)
    • STD Daily Return: standard deviation of the daily returns (the volatility of the daily return)
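A minimal pandas sketch of these statistics, assuming a made-up series of 5 closing prices (the numbers are illustrative only, not real market data):

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices for 5 consecutive trading days (illustrative data only)
prices = pd.Series([100.0, 102.0, 101.0, 105.0, 110.0])

# Daily return: percent change from one day to the next (the first value is NaN)
daily_returns = prices.pct_change(1)

# Normalized return: each price divided by the price on the day of investment
normalized = prices / prices.iloc[0]

# Cumulative return over the whole period, as a percentage
cumulative_return_pct = 100 * (normalized.iloc[-1] - 1)

# Average and standard deviation (volatility) of the daily returns; NaN is skipped
avg_daily_return = daily_returns.mean()
std_daily_return = daily_returns.std()

print(normalized.tolist())
print(cumulative_return_pct)
print(avg_daily_return, std_daily_return)
```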

Sharpe Ratio

  • Quantifies the relationship between the mean daily return and the std. daily return (volatility)
  • It is a measure of risk-adjusted return: how much return you get per unit of risk taken
  • Formula $SR = \dfrac{R_p - R_f}{\sigma_p}$
    • $R_p$ Expected portfolio return
    • $R_f$ Risk-free return: the return you would receive from a risk-free investment (e.g. a bank account with a constant return)
    • $\sigma_p$ Portfolio standard deviation
  • Annualized Sharpe Ratio (ASR):
    • Allows an investor to analyze how much greater a return they obtain in relation to the additional risk taken to generate that return
    • ASR values of about 1 or more are considered good, 2 or more very good and 3 or more excellent
    • The SR is computed from returns sampled at some frequency; to annualize it, multiply it by the square root of the number of those periods in a year
      • Daily: $ASR = \sqrt{252} \cdot SR$ (252 trading days per year)
      • Weekly: $ASR = \sqrt{52} \cdot SR$
      • Monthly: $ASR = \sqrt{12} \cdot SR$
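A small sketch of the SR and ASR formulas in NumPy. It assumes the risk-free rate is already expressed per period (the scripts in these notes use 0.0), uses the sample standard deviation like pandas does, and the function names are just illustrative:

```python
import numpy as np

# Number of periods in a year for each sampling frequency (from the conversions above)
PERIODS_PER_YEAR = {"daily": 252, "weekly": 52, "monthly": 12}

def sharpe_ratio(returns, risk_free_rate=0.0):
    """SR = mean(R_p - R_f) / std(R_p - R_f), with R_f given per period."""
    excess = np.asarray(returns, dtype=float) - risk_free_rate
    return excess.mean() / excess.std(ddof=1)  # ddof=1: sample std, like pandas

def annualized_sharpe_ratio(returns, frequency="daily", risk_free_rate=0.0):
    """ASR = sqrt(periods per year) * SR."""
    return np.sqrt(PERIODS_PER_YEAR[frequency]) * sharpe_ratio(returns, risk_free_rate)

# Hypothetical daily returns (illustrative data only)
rets = [0.01, -0.005, 0.002, 0.007, -0.001]
sr = sharpe_ratio(rets)
asr = annualized_sharpe_ratio(rets, "daily")
print(sr, asr)
```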

Optimize holdings

  • Randomly guessing allocations and checking them against a statistic (such as the SR) is generally known as a Monte Carlo simulation
    • E.g. we randomly assign weights to each asset in our portfolio, calculate the mean and STD of the daily return, and evaluate the SR and ASR to see which combinations give better results. We can plot return against volatility, colored by SR, to compare the combinations (each combination of allocations maps to a single SR/ASR value)
  • Minimization is just finding the value of the independent variable ($x$) that gives us the minimum value of the dependent variable ($y$)
    • E.g. what value of $x$ minimizes $y$ in the equations 1) $y=x^2$: the answer is $x=0$, which gives $y=0$, and 2) $y=(2-x)^2$: the answer is $x=2$, which gives $y=0$
  • Finding the optimal value for a metric can be done using optimization algorithms, which are based on minimization

  • We want to maximize the Sharpe Ratio, which means we can create an optimizer that minimizes the negative Sharpe Ratio (the SR multiplied by -1)

  • The Border/Efficient Frontier indicates the highest return achievable for a given level of volatility
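A sketch of the two minimization examples above with SciPy's `scipy.optimize.minimize` (default method, starting guesses picked arbitrarily), plus the same trick used for the Sharpe Ratio: maximize a function by minimizing its negative. The function `g` is a made-up example:

```python
from scipy.optimize import minimize

# minimize passes x as a 1-D array, so we use x[0]
def f1(x):
    return x[0] ** 2            # minimum at x = 0

def f2(x):
    return (2 - x[0]) ** 2      # minimum at x = 2

def g(x):
    return -(x[0] - 3) ** 2 + 4  # a function we want to MAXIMIZE (peak of 4 at x = 3)

res1 = minimize(f1, x0=[5.0])
res2 = minimize(f2, x0=[5.0])
# Maximizing g is the same as minimizing -g
res3 = minimize(lambda x: -g(x), x0=[0.0])

print(res1.x, res2.x, res3.x, -res3.fun)
```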

Border

import os
import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt


base_path = "/home/ubuntu/"
YOUR_QUANDL_API_KEY_HERE = "QQQQQQQQQQQQQ"
start, end = pd.to_datetime("2012-01-01"), pd.to_datetime("2017-01-01")

# Get data
aapl = pd.read_csv(os.path.join(base_path, "AAPL.csv"), sep=",", index_col='Date', parse_dates=True)
#aapl = quandl.get("WIKI/AAPL.11", start_date=start, end_date=end, api_key=YOUR_QUANDL_API_KEY_HERE)
aapl.name = "AAPL"

cisco = pd.read_csv(os.path.join(base_path, "CISCO.csv"), sep=",", index_col='Date', parse_dates=True)
#cisco = quandl.get("WIKI/CSCO.11", start_date=start, end_date=end, api_key=YOUR_QUANDL_API_KEY_HERE)
cisco.name = "CISCO"

amzn = pd.read_csv(os.path.join(base_path, "AMZN.csv"), sep=",", index_col='Date', parse_dates=True)
#amzn = quandl.get("WIKI/AMZN.11", start_date=start, end_date=end, api_key=YOUR_QUANDL_API_KEY_HERE)
amzn.name = "AMZN"

ibm = pd.read_csv(os.path.join(base_path, "IBM.csv"), sep=",", index_col='Date', parse_dates=True)
#ibm = quandl.get("WIKI/IBM.11", start_date=start, end_date=end, api_key=YOUR_QUANDL_API_KEY_HERE)
ibm.name ="IBM"

stock_dfs = (aapl, cisco, ibm, amzn)
stocks_allocations = (0.3, 0.2, 0.4, 0.1)
assert np.isclose(sum(stocks_allocations), 1.0)  # allocations must add up to 1 (tolerant to float rounding)
investment_amount = 10000
all_position_values = []
col_names = []
risk_free_rate = 0.0


for df, allo in zip(stock_dfs, stocks_allocations):
    df["Normed Return"] = df["Adj. Close"] / df.iloc[0]["Adj. Close"]
    df['Allocation'] = df["Normed Return"] * allo
    df["Position Values"] = df['Allocation'] * investment_amount
    all_position_values.append(df["Position Values"])
    col_names.append("{} Pos".format(str(df.name)))


portfolio =  pd.concat(all_position_values, axis=1)
portfolio.columns = col_names
portfolio['Total Pos'] = portfolio.sum(axis=1)
portfolio['Daily Returns'] = portfolio['Total Pos'].pct_change(1)
average_daily_return = portfolio['Daily Returns'].mean()
std_daily_return = portfolio['Daily Returns'].std()
cumulative_return_percent = 100*(portfolio['Total Pos'].iloc[-1]/portfolio['Total Pos'].iloc[0] - 1)
total_value = portfolio['Total Pos'].iloc[-1]
SR = (average_daily_return - risk_free_rate)/std_daily_return
ASR = (252**0.5) * SR

# Print portfolio data
#print(aapl.head()) ; print(cisco.head()) ; print(amzn.head()) ; print(ibm.head())
#print(aapl.tail()) ; print(cisco.tail()) ; print(amzn.tail()) ; print(ibm.tail())
#print(portfolio.head())
print("average_daily_return=", average_daily_return)
print("std_daily_return=", std_daily_return)
print("cumulative_return_percent=", cumulative_return_percent, "total_value=", total_value)
print("SR=", SR, "ASR=", ASR)

# Plot portfolio stuff
#portfolio["Total Pos"].plot(figsize=(10,8)).get_figure().savefig(os.path.join(base_path, "total.png"))
#portfolio.drop("Total Pos", axis=1).plot(figsize=(10,8)).get_figure().savefig(os.path.join(base_path, "positions.png"))
#portfolio["Daily Returns"].plot(kind='hist', bins=100, figsize=(4,5))
#portfolio["Daily Returns"].plot(kind='kde', figsize=(4,5)).get_figure().savefig(os.path.join(base_path, "hist_and_kde.png"))

import os
import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt
from scipy.optimize import minimize
np.random.seed(101)

risk_free_rate = 0.0
base_path = "/home/ubuntu/"
YOUR_QUANDL_API_KEY_HERE = "QQQQQQQQQQQQQ"
start, end = pd.to_datetime("2012-01-01"), pd.to_datetime("2017-01-01")

# Get data
aapl = pd.read_csv(os.path.join(base_path, "AAPL.csv"), sep=",", index_col='Date', parse_dates=True)
#aapl = quandl.get("WIKI/AAPL.11", start_date=start, end_date=end, api_key=YOUR_QUANDL_API_KEY_HERE)
aapl.name = "AAPL"

cisco = pd.read_csv(os.path.join(base_path, "CISCO.csv"), sep=",", index_col='Date', parse_dates=True)
#cisco = quandl.get("WIKI/CSCO.11", start_date=start, end_date=end, api_key=YOUR_QUANDL_API_KEY_HERE)
cisco.name = "CISCO"

amzn = pd.read_csv(os.path.join(base_path, "AMZN.csv"), sep=",", index_col='Date', parse_dates=True)
#amzn = quandl.get("WIKI/AMZN.11", start_date=start, end_date=end, api_key=YOUR_QUANDL_API_KEY_HERE)
amzn.name = "AMZN"

ibm = pd.read_csv(os.path.join(base_path, "IBM.csv"), sep=",", index_col='Date', parse_dates=True)
#ibm = quandl.get("WIKI/IBM.11", start_date=start, end_date=end, api_key=YOUR_QUANDL_API_KEY_HERE)
ibm.name ="IBM"

stock_dfs = [aapl, cisco, ibm, amzn]

stocks = pd.concat([df["Adj. Close"] for df in stock_dfs], axis=1) # keep one adjusted close column per stock
stocks.columns = ["{}".format(str(df.name)) for df in stock_dfs]

#print(stocks.head())
#print(stocks.pct_change(1).mean())
#print(stocks.pct_change(1).corr())
#print(stocks.pct_change(1).head()) # daily returns in arithmetic

log_ret = np.log(stocks/stocks.shift(1)) # daily returns in log normalization
#print(log_ret.head())

number_of_portfolios = 500
all_weights = np.zeros((number_of_portfolios, len(stocks.columns))) # matrix with one row per portfolio, each row holding one weight per stock (len(stocks.columns) elements)
expected_returns_array = np.zeros(number_of_portfolios)
expected_volatility_array = np.zeros(number_of_portfolios)
sharpe_ratio_arrays = np.zeros(number_of_portfolios)

def get_returns_volatility_sharpe_ratio(weights):
    expected_returns = np.sum(log_ret.mean() * weights * 252)
    expected_volatility = np.sqrt(np.dot(weights.T, np.dot(log_ret.cov()*252, weights)))
    sharpe_ratio = (expected_returns-risk_free_rate)/expected_volatility

    return np.array([expected_returns, expected_volatility, sharpe_ratio])

def negative_sharpe(weights):
    return get_returns_volatility_sharpe_ratio(weights)[2] * -1

def check_sum(weights):
    return (np.sum(weights) - 1)

# 1. Random allocation
for portfolio_index in range(number_of_portfolios):
    weights = np.array(np.random.random(len(stocks.columns)))
    weights = weights/np.sum(weights) # normalize so they all sum 1
    all_weights[portfolio_index, :] = weights

    ret = get_returns_volatility_sharpe_ratio(weights)
    expected_returns_array[portfolio_index] = ret[0]
    expected_volatility_array[portfolio_index] = ret[1]
    sharpe_ratio_arrays[portfolio_index] = ret[2]

    #print("index", portfolio_index, "weights=", weights) ; print("expected_return=",expected_returns_array[portfolio_index]) ; print("expected_volatility=",expected_volatility_array[portfolio_index]) ; print("sharpe_ratio=",sharpe_ratio_arrays[portfolio_index]) ; print("-"*10 +"\n")

max_sr, max_sr_index = sharpe_ratio_arrays.max(), sharpe_ratio_arrays.argmax()
max_sr_returns = expected_returns_array[max_sr_index]
max_sr_volatility = expected_volatility_array[max_sr_index]

#print("max SR", max_sr, "in index", max_sr_index, "optimal weights=", all_weights[max_sr_index], "max_sr_returns=", max_sr_returns, "max_sr_volatility", max_sr_volatility)


# 2. Optimization scipy function
constraints = ({"type": "eq", "fun": check_sum})
bounds = ((0,1), (0,1), (0,1), (0,1))

initial_guess = [0.25, 0.25, 0.25, 0.25]

opt_results = minimize(fun=negative_sharpe, x0=initial_guess, method="SLSQP", bounds=bounds, constraints=constraints)

print(opt_results)
results_weights = opt_results.x
ret = get_returns_volatility_sharpe_ratio(results_weights)
max_sr_returns, max_sr_volatility, max_sr = ret[0], ret[1], ret[2]
#print("max SR", max_sr, "weights", results_weights, "max_sr_returns=", max_sr_returns, "max_sr_volatility", max_sr_volatility)

# 3. Get optimal returns portfolios for levels of volatility
frontier_y = np.linspace(0, 0.3, 25)

def minimize_volatility(weights):
    return get_returns_volatility_sharpe_ratio(weights)[1]

frontier_volatility = []

for possible_return in frontier_y:
    constraints = ({"type": "eq", "fun": check_sum}, {"type": "eq", "fun": lambda w: get_returns_volatility_sharpe_ratio(w)[0]-possible_return})
    result = minimize(fun=minimize_volatility, x0=initial_guess, method="SLSQP", bounds=bounds, constraints=constraints)
    frontier_volatility.append(result['fun'])

# Do all plotting
plt.figure(figsize=(12,8))
plt.scatter(expected_volatility_array, expected_returns_array, c=sharpe_ratio_arrays, cmap='plasma')
plt.colorbar(label="Sharpe Ratio")
plt.xlabel("Volatility")
plt.ylabel("Return")
plt.scatter(max_sr_volatility, max_sr_returns, c="red", s=50, edgecolors="black")
plt.plot(frontier_volatility, frontier_y, "b--", linewidth=3)
plt.savefig(os.path.join(base_path, "sr_optimal.png"))

Fundamental Financial Concepts

  • Liquidity refers to how easily you can get your money into and out of an investment: high liquidity means it is very easy, low liquidity means it is hard or you need to wait for a certain period of time (e.g. until a contract date is reached)
  • ETF (Exchange Traded Funds)
    • Made up of a combination of other funds, bonds, commodities, stocks, etc.
    • Holdings are public (transparency) and individuals can buy and sell shares of an ETF (high liquidity)
    • Can be seen as a diversified portfolio used for long term investment
    • Can be traded directly by individuals
    • Liquidity: high - Buy/Sell just like a stock
    • Fee: pay an expense ratio from 0.01% to 1% of the AUM (Assets Under Management)
  • Mutual Funds
    • An investment vehicle made of a pool of money collected from many investors for the purpose of investing in stocks, bonds, other market instruments or any other assets
    • These are operated by a money manager who is responsible for investing the fund's capital and attempting to produce gains for the fund's investors
    • They usually need to disclose their holdings periodically, but the frequency can vary (less transparent)
    • Liquidity: medium - Buy/Sell at the end of the day through a broker
    • Fee: pay an expense ratio from 0.5% to 3%
  • Hedge Funds
    • Aggressive/risky investment vehicle that uses pooled funds and employs multiple strategies to earn as much excess return (alpha) as possible for its investors
    • Usually only accessible to accredited investors
    • Don't need to disclose holdings to the public or investors (not so transparent)
    • Liquidity: low - depends on the agreement (could be every week, month, 6 months, year, etc.)
    • Fee: 1% or 2% of the fund's assets plus 10% or 20% of the profits (the "2 and 20" or "1 and 10" rule)
  • Some stocks pay Dividends: for each share a shareholder holds, they receive a payout on a defined date. The price may rise ahead of the payout and typically drops by roughly the dividend amount once the stock goes ex-dividend

  • Stock Splits usually occur when the price of an individual stock becomes really high, so the company splits each share according to a ratio (e.g. 2:1, 3:1, 4:1, 5:1). This is why the Adj. Close price exists: it adjusts the historical prices so they match up across stock splits, and it also takes dividends into account. That is why it is important to use adjusted close/open prices for historical analysis
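A simplified sketch of how a split adjustment works, assuming a hypothetical 2:1 split and made-up prices. It only divides the pre-split prices by the split ratio and ignores dividends, which a real Adj. Close also accounts for:

```python
import pandas as pd

# Hypothetical closing prices around a 2:1 split that takes effect on 2020-01-03
close = pd.Series(
    [200.0, 204.0, 103.0, 104.0],
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-06"]),
)
split_date = pd.Timestamp("2020-01-03")
split_ratio = 2  # 2:1 split: each old share becomes 2 new shares

# Divide every price before the split by the ratio so the whole history is comparable
adjusted = close.copy()
adjusted.loc[adjusted.index < split_date] /= split_ratio

print(adjusted.tolist())  # [100.0, 102.0, 103.0, 104.0]
```

Without the adjustment, the raw series shows a fake 50% "drop" on the split date that never cost shareholders anything.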

  • Survivorship Bias: if you are using the S&P 500 as an indicator, the time period you pick matters because the list of companies in it has changed through the years; using today's list to study the past only includes the companies that survived

  • EMH (Efficient Market Hypothesis) is an investment theory that states it is impossible to "beat the market", so it is not possible for traders to purchase undervalued stocks or sell stocks for inflated prices

Order & Order Books

  • When you want to buy or sell an asset, you are creating an Order with a broker; it then goes to an exchange (or multiple exchanges), and once the exchange receives the order it goes into an Order Book, which is just a list of buy and sell orders
  • Order Information:
    • Buy or sell action type
    • Symbol: pair to trade
    • Number of shares or assets
    • Fulfill type: LIMIT, MARKET, etc.
    • Price (only needed for LIMIT or other types)
  • Orders Scenarios:
    1. You say you want to buy an asset; the order goes to the broker, then to the exchanges
    2. You say you want to buy an asset and another client of the same broker says they want to sell that same asset, so the transaction can be kept internal by the broker, but usually by law the broker also has to guarantee you get the same price you could have gotten on the exchange
    3. You say you want to buy an asset and another client on a different broker says they want to sell that same asset; the brokers may use an intermediary (a dark pool), so the trade is performed but never reaches the exchanges
      • Dark pool: a private exchange that pays brokers to see orders before they hit the exchange. Although it looks shady, it can work as protection when you want to make a massive trade, because once an order hits the exchange it is public and buyers and sellers there can rapidly change their prices to benefit from your trade. E.g. you want to buy 1 billion USD of a certain stock; since the order is so big, if it reaches the exchanges, sellers of that stock can start raising their asking price while it is being filled, which makes you pay more for what you want to buy

Orders Scenarios
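A minimal sketch of the order information and an order book as plain Python data structures. The `Order` and `OrderBook` classes are hypothetical (not any broker's or exchange's API); the book only stores LIMIT orders, keeps the best prices first, and does no matching:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Order:
    side: str                      # "BUY" or "SELL"
    symbol: str                    # asset (or pair) to trade
    quantity: int                  # number of shares or assets
    order_type: str                # e.g. "LIMIT" or "MARKET"
    price: Optional[float] = None  # only needed for LIMIT (and similar) orders

    def __post_init__(self):
        if self.order_type == "LIMIT" and self.price is None:
            raise ValueError("LIMIT orders need a price")

@dataclass
class OrderBook:
    symbol: str
    bids: List[Order] = field(default_factory=list)  # resting buy orders
    asks: List[Order] = field(default_factory=list)  # resting sell orders

    def add(self, order: Order) -> None:
        if order.order_type != "LIMIT":
            raise ValueError("this simplified book only stores LIMIT orders")
        book = self.bids if order.side == "BUY" else self.asks
        book.append(order)
        # Highest bid first for buys, lowest ask first for sells
        book.sort(key=lambda o: o.price, reverse=(order.side == "BUY"))

    def best_bid(self) -> Optional[float]:
        return self.bids[0].price if self.bids else None

    def best_ask(self) -> Optional[float]:
        return self.asks[0].price if self.asks else None

book = OrderBook("AAPL")
book.add(Order("BUY", "AAPL", 10, "LIMIT", 99.5))
book.add(Order("BUY", "AAPL", 5, "LIMIT", 100.0))
book.add(Order("SELL", "AAPL", 8, "LIMIT", 100.5))
print(book.best_bid(), book.best_ask())  # 100.0 100.5
```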

  • High Frequency Trading (HFT) or Latency Arbitrage
    • Technique that tries to take advantage of latency differences due to geographical distances (times are on the order of microseconds)
    • Affects only large orders that need to be divided between exchanges to be filled. In other words, it happens when you place an order so large that it requires multiple exchanges to fill it
    • General process
      1. You place an order for 90 stocks at a certain price
      2. Because it is a large order it is divided between 3 exchanges, but the whole order information is disclosed. E.g. it says I will buy 30 stocks on the first exchange, but this is part of a larger order of 90 stocks
      3. The order reaches the first exchange at a certain time (time 0) with all the information, but it will reach the last exchange 2 ms later
      4. If there is an HFT server close to the first exchange, it knows there will be 2 other orders of 30 stocks each on the other exchanges
      5. Since this HFT server knows it will take you 2 ms to reach the last exchange, and it uses technology to reach that exchange faster (microwave towers, direct fiber, etc.), it buys the stock at market price and then places a sell order at a price just a little higher than what it just paid
      6. Once your order reaches the last exchange, you end up buying the stocks from the server that just bought them and right after that placed a sell order

HFT Scenario
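A rough worked example of what this costs you, with assumed numbers: the market price and the 0.01 USD markup are made up, and it assumes the HFT server front-runs both slices that have not yet arrived at the other exchanges:

```python
# Hypothetical numbers for the scenario above (illustrative only)
total_shares = 90
n_exchanges = 3
shares_per_exchange = total_shares // n_exchanges  # 30 shares per exchange

market_price = 50.00  # assumed price the HFT server pays
hft_markup = 0.01     # assumed amount the HFT server adds when re-offering the shares

# The HFT server sees the first slice and front-runs the slices still in transit
front_run_shares = (n_exchanges - 1) * shares_per_exchange  # 60 shares

price_you_pay = market_price + hft_markup
extra_cost = front_run_shares * hft_markup                      # what you pay above the original price
hft_profit = front_run_shares * (price_you_pay - market_price)  # same amount, earned by the HFT server

print(front_run_shares, round(extra_cost, 2), round(hft_profit, 2))
```

The markup per share is tiny, which is why this only pays off at high volume and microsecond speed.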

  • Short-Selling
    • It is a technique that allows you to profit if a stock drops in price
    • It comes with great risk because there is no limit to the amount you can lose (a stock price can keep rising)
    • The short selling transactions happen through a broker that works as an intermediary. General process:
      1. A certain stock is currently valued at 500 USD, you think it will drop, and somebody has 10 shares of this stock
      2. You borrow the 10 stocks from that somebody and promise to return the stocks at a future date
      3. You sell the shares and get 5000 USD, then place an order to buy 10 stocks at 100 USD each (1000 USD total)
      4. Scenarios:
        • 4.a) You are correct and your order is filled: you get back the 10 stocks, you return them, and you keep the remaining 4000 USD, so everybody is happy
        • 4.b) You are wrong and now each stock costs 1000 USD, so your order is not filled and the date when you said you would return the stocks has passed, so you need to buy the stocks at this higher price (10000 USD, i.e. 5000 USD more than you got from the sale) to return them
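The two scenarios as a small profit/loss function. The function name and the optional `borrow_fee` are illustrative; the scenario above ignores borrowing fees and interest:

```python
def short_sale_pnl(shares, sell_price, buy_back_price, borrow_fee=0.0):
    """Profit (positive) or loss (negative) of a short sale.

    borrow_fee is a hypothetical flat fee charged by the broker.
    """
    proceeds = shares * sell_price           # cash received when selling the borrowed shares
    cost_to_cover = shares * buy_back_price  # cash needed to buy the shares back and return them
    return proceeds - cost_to_cover - borrow_fee

print(short_sale_pnl(10, 500, 100))   # scenario 4.a: 4000
print(short_sale_pnl(10, 500, 1000))  # scenario 4.b: -5000
```

Since `buy_back_price` has no upper bound, neither does the loss.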

CAPM

  • We can define the return of a portfolio at some time $t$ as the following formula for $r_p(t)$ where:
    • $n$: Number of assets, securities or stocks
    • $w_i$: Weight of a particular asset $i$
    • $r_i(t)$: Return of a particular asset $i$ at time $t$

$$r_p(t) = \sum_{i=1}^n w_i r_i(t)$$

  • We can also imagine the whole market as a portfolio where the weight of each asset is its market cap (total number of shares * price per share) divided by the market cap of the whole market, which is the sum of all the market caps

$$w_i = \dfrac{\text{MarketCap}_i}{\sum_{j=1}^n \text{MarketCap}_j}$$
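A small NumPy sketch of market-cap weights and the portfolio return formula, with a hypothetical market of 3 assets (all numbers made up):

```python
import numpy as np

# Hypothetical market of 3 assets (illustrative numbers only)
shares_outstanding = np.array([1_000, 500, 2_000])
share_price = np.array([10.0, 40.0, 5.0])

market_caps = shares_outstanding * share_price  # MarketCap_i = shares * price
weights = market_caps / market_caps.sum()       # w_i = MarketCap_i / sum_j MarketCap_j

# Hypothetical return of each asset at time t
asset_returns = np.array([0.04, 0.01, 0.02])
portfolio_return = np.sum(weights * asset_returns)  # r_p(t) = sum_i w_i r_i(t)

print(weights, portfolio_return)
```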

  • The CAPM equation describes the return of an individual stock $i$: the return of a given stock is the return of the entire market multiplied by a factor particular to this stock, $\beta_i$, plus an adjustment or residual factor $\alpha_i(t)$, which is random and cannot be predicted but which CAPM considers to be zero or close to zero (in other words, it can be ignored)
    • $r_i(t)$ Return of a particular asset $i$ at time $t$
    • $r_m(t)$ Return of the entire market at time $t$
    • $\beta_i$ Market factor. For example, $\beta_i=1$ means the asset moves in line with the market, and $\beta_i=2$ means it moves up and down twice as much as the market
    • $\alpha_i(t)$ Adjustment factor that is considered zero or very close to zero in CAPM

$$r_i(t) = \beta_i r_m(t) + \alpha_i(t)$$

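  • In general, what the CAPM model expresses is that there is a relationship between our portfolio return and the overall market return. Substituting $r_i(t) = \beta_i r_m(t) + \alpha_i(t)$ into $r_p(t) = \sum_{i=1}^n w_i r_i(t)$, this relationship is the portfolio beta $\beta_p = \sum_{i=1}^n w_i \beta_i$ (with $\alpha_p(t) = \sum_{i=1}^n w_i \alpha_i(t)$), and we can express it as:

$$r_p(t) = \beta_p r_m(t) + \alpha_p(t)$$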
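Because $r_p(t)=\sum_{i=1}^n w_i r_i(t)$ and each asset follows $r_i(t)=\beta_i r_m(t)+\alpha_i(t)$, the portfolio's beta is the weighted sum of the individual betas. A quick numeric check with made-up weights, betas and market return:

```python
import numpy as np

# Hypothetical weights and betas for a 3-asset portfolio
w = np.array([0.5, 0.3, 0.2])
beta = np.array([1.2, 0.8, 1.5])
alpha = np.zeros(3)  # CAPM treats alpha_i(t) as (close to) zero

r_m = 0.01                  # hypothetical market return at time t
r_i = beta * r_m + alpha    # CAPM return of each asset

beta_p = np.sum(w * beta)                    # portfolio beta
r_p_direct = np.sum(w * r_i)                 # r_p(t) = sum_i w_i r_i(t)
r_p_capm = beta_p * r_m + np.sum(w * alpha)  # r_p(t) = beta_p r_m(t) + alpha_p(t)

print(beta_p, r_p_direct, r_p_capm)
```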

import os
import datetime
from scipy import stats
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

base_path = "."
SYM = "AMZN"
stock = pd.read_csv(os.path.join(base_path, "{}.csv".format(SYM)), index_col='Date', parse_dates=True)
spy_etf = pd.read_csv(os.path.join(base_path, "SPY.csv"), sep=",", index_col='Date', parse_dates=True)

#stock["close"].plot(label=SYM)
#spy_etf["Close"].plot(label="SPY ETF")

stock["cumulative"] = stock["close"]/stock["close"].iloc[0]
stock["daily return"] = stock["close"].pct_change(1)
spy_etf["Cumulative"] = spy_etf["Close"]/spy_etf["Close"].iloc[0]
spy_etf["Daily Return"] = spy_etf["Close"].pct_change(1)

# Compare cumulative returns
# stock["cumulative"].plot(label=SYM)
# spy_etf["Cumulative"].plot(label="SPY ETF")

# Check correlation, if there is high correlation we expect to see a line
# plt.scatter(stock["daily return"], spy_etf["Daily Return"], alpha=0.25)

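# beta near 1: the asset moves in line with the market; r_value near 1: it tracks the market closely
# CAPM regresses the stock returns (y) on the market returns (x), so the market goes first
beta, alpha, r_value, p_value, std_err = stats.linregress(spy_etf["Daily Return"].iloc[1:], stock["daily return"].iloc[1:])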
print("CAPM for {}\nbeta".format(SYM), beta, "\nalpha", alpha, "\nr_value", r_value, "\np_value", p_value, "\nstd_err", std_err)

# Simulate a stock that behaves a lot like the market
noise = np.random.normal(0, 0.001, len(spy_etf["Daily Return"].iloc[1:]))
fake_stock = spy_etf["Daily Return"].iloc[1:] + noise
beta, alpha, r_value, p_value, std_err = stats.linregress(spy_etf["Daily Return"].iloc[1:], fake_stock)
print("CAPM for fake stock\nbeta", beta, "\nalpha", alpha, "\nr_value", r_value, "\np_value", p_value, "\nstd_err", std_err)
# plt.scatter(fake_stock, spy_etf["Daily Return"].iloc[1:], alpha=0.25)

# print(stock.head())
# print(spy_etf.head())

# plt.legend()
# plt.show()

Algorithm Trading on a Trading platform

  • Blueshift® is a platform that allows coders to write, test and backtest investment algorithms
  • initialize:
    • Called once when our algorithm starts
    • Requires context as input
      • context is an object that is basically an extended dictionary used to maintain the state of your algorithm during the backtest or live trading
      • Used instead of global variables
  • handle_data:
    • Called once per bar (at the end of each minute when running on minute data)
    • Requires context and data as inputs
  • data: an object with methods to check current and historical data (e.g. data.current, data.history) and whether an asset can be traded; orders are placed with API functions such as order_target_percent

from datetime import datetime
AAPL = 0
CSCO = 1
AMZN = 2
IBM = 3

# Zipline
from zipline.api import(    symbol,
                            order_target_percent,
                            schedule_function,
                            date_rules,
                            time_rules,
                            record
                       )

def initialize(context):
    """
        A function to define things to do at the start of the strategy
    """
    context.tech_stocks = [symbol("AAPL") , symbol("CSCO"), symbol("AMZN"), symbol("IBM")]

    ## Uncomment one of the following sets
    # 1 Record
    schedule_function(rebalance, date_rules.every_day(), time_rules.market_open())
    schedule_function(record_vars, date_rules.every_day(), time_rules.market_close())

    # 2 schedule to do something with a certain frequency, like adjusting the portfolio
    #schedule_function(open_positions, date_rules.week_start(), time_rules.market_open())
    #schedule_function(close_positions, date_rules.week_end(), time_rules.market_close())

def rebalance(context, data):
    order_target_percent(context.tech_stocks[AMZN], 0.5)
    order_target_percent(context.tech_stocks[IBM], -0.5) #Short sell ibm

def record_vars(context, data):
    record(amzn_close=data.current(context.tech_stocks[AMZN], 'close'))
    record(ibm_close=data.current(context.tech_stocks[IBM], 'close'))

def open_positions(context, data):
    order_target_percent(context.tech_stocks[AAPL], 0.1)

def close_positions(context, data):
    order_target_percent(context.tech_stocks[AAPL], 0)

def handle_data(context, data):
    print(datetime.now().strftime("%d/%m/%Y %H:%M:%S"))
    print(data.is_stale(symbol("AAPL")), data.can_trade(symbol("AAPL")))
    price_history = data.history(context.tech_stocks, fields="price", bar_count=5, frequency='1d')
    print(price_history)
    tech_close = data.current(context.tech_stocks, 'close')
    print(type(tech_close))
    print(tech_close)
    print("-"*15 + "\n")

  • Pairs Trading is a strategy that takes two or more securities we believe are paired or related to each other in some way (e.g. Pepsi and Coca-Cola, two airlines, two retail companies, etc.) and creates an algorithm based on that relationship
    • A general way to know if there is a relationship is to plot them and see if the graphs look similar, or to compute their correlation matrix with np.corrcoef(series1, series2) and see if their correlation coefficient is close to 1
    • Pairs trading assumes that changes in the spread between the two prices are trading opportunities, i.e. that the spread will revert to its mean
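A self-contained sketch of the correlation check and the spread z-score on synthetic data (two made-up price series built from a shared random walk, not real tickers). The +1/-1 thresholds mirror the ones used in the scripts that follow:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 250
# Synthetic "paired" prices: both follow a shared random walk plus their own noise
common = 50 + np.cumsum(rng.normal(0, 1, n))
price_a = pd.Series(common + rng.normal(0, 0.5, n))
price_b = pd.Series(common + 2 + rng.normal(0, 0.5, n))

# Correlation coefficient: off-diagonal element of the 2x2 correlation matrix
corr = np.corrcoef(price_a, price_b)[0, 1]

# Spread and its z-score: how many standard deviations it is from its mean
spread = price_a - price_b
zscore = (spread - spread.mean()) / spread.std()

# Signal: -1 = short the spread (sell A, buy B), +1 = long the spread, 0 = no position
signal = np.where(zscore > 1.0, -1, np.where(zscore < -1.0, 1, 0))

print(round(corr, 3), signal[:10])
```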
## Research script pure python
import os
import datetime
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt

# takes a time series and normalizes it
def zscore(stock):
    return (stock-stock.mean())/np.std(stock)

base_path = "/home/ubuntu/"
start, end = datetime.datetime(2015, 1, 1), datetime.datetime(2017, 1, 1)

ual = pd.read_csv(os.path.join(base_path, "UAL.csv"), index_col='Date', parse_dates=True)
american = pd.read_csv(os.path.join(base_path, "AMER.csv"), index_col='Date', parse_dates=True)

corr_matrix = np.corrcoef(american['close'], ual['close'])

#print(ual.head()) ; print(american.head())
print(corr_matrix)

dif = american['close'] - ual['close']
spread = zscore(dif)
spread_mavg1 = dif.rolling(1).mean()
spread_mavg30 = dif.rolling(30).mean()
std_30 = dif.rolling(30).std()
zscore_30_1 = (spread_mavg1-spread_mavg30)/std_30

fig = plt.figure()
american['close'].plot(label='AAL')
ual['close'].plot(label='UAL')
plt.legend()
fig.savefig(os.path.join(base_path, "airlines.png"))

fig = plt.figure()
spread.plot(label='Spread')
plt.axhline(spread.mean(), c='b')
plt.axhline(1.0, c='g', ls='--')
plt.axhline(-1.0, c='r', ls='--')
plt.legend()
fig.savefig(os.path.join(base_path, "spread.png"))

fig = plt.figure()
zscore_30_1.plot(label='Rolling 30 day z score')
plt.axhline(0, c='b', ls='--')
plt.axhline(1.0, c='r', ls='--')
fig.savefig(os.path.join(base_path, "zscore_30_1.png"))


## Blueshift Implementation
import numpy as np

# Zipline
from zipline.api import(    symbol,
                            order_target_percent,
                            schedule_function,
                            date_rules,
                            time_rules,
                            record,
                       )

def initialize(context):
    context.aa = symbol("AAL")
    context.ual = symbol("UAL")
    context.long_on_spread = False
    context.short_spread = False
    schedule_function(check_pairs, date_rules.every_day(), time_rules.market_close(minutes=60))

def check_pairs(context, data):
    aa = context.aa
    ual = context.ual

    prices = data.history([aa, ual], 'price', 30, '1d')

    short_prices = prices.iloc[-1:]
    mavg_30 = np.mean(prices[aa] - prices[ual])
    std_30 = np.std(prices[aa] - prices[ual])

    if std_30 > 0:
        mavg_1 = np.mean(short_prices[aa] - short_prices[ual])
        zscore = (mavg_1-mavg_30)/std_30
        if zscore > 1.0 and not context.short_spread:
            order_target_percent(aa, -0.5)
            order_target_percent(ual, 0.5)
            context.short_spread = True
            context.long_on_spread = False
        elif zscore < -1.0 and not context.long_on_spread:
            order_target_percent(aa, 0.5)
            order_target_percent(ual, -0.5)
            context.short_spread = False
            context.long_on_spread = True
        elif abs(zscore) < 0.1:
            order_target_percent(aa, 0)
            order_target_percent(ual, 0)
            context.long_on_spread = False
            context.short_spread = False

        record(zscore=zscore)

## Bollinger Bands Implementation (Blueshift)
import numpy as np

# Zipline
from zipline.api import(    symbol,
                            order_target_percent,
                            schedule_function,
                            date_rules,
                            time_rules,
                            record,
                       )

def initialize(context):
    context.stock = symbol("JNJ")
    schedule_function(check_stock_bands, date_rules.every_day())

def check_stock_bands(context, data):
    stock = context.stock

    current_price = data.current(stock, 'price')
    historical_prices = data.history(stock, 'price', 20, '1d')

    avg = historical_prices.mean()
    std = historical_prices.std()
    lower_band = avg - 2*std
    upper_band = avg + 2*std

    date = historical_prices.index[-1].strftime('%Y-%m-%d')
    if current_price <= lower_band:
        print(date, '- BUYING', stock, 'UPPER', upper_band, 'LOWER', lower_band, 'CURR', current_price, 'MAVG20', avg)
        order_target_percent(stock, 1.0)
    elif current_price >= upper_band:
        print(date, '- SELLING', stock, 'UPPER', upper_band, 'LOWER', lower_band, 'CURR', current_price, 'MAVG20', avg)
        order_target_percent(stock, 0)
    else:
        pass

    record(upper=upper_band, lower=lower_band, curr=current_price, mavg_20=avg)

  • Trading platform pipelines are used for algorithms that follow a structure (a set of steps you need to perform in a certain order). Pipelines provide an API to implement the general structure of these algorithms, which is:
    1. Compute scalar values for all assets (for example 20 MAVG)
    2. Select a smaller set of assets based on the values computed in previous step
    3. Set/adjust desired portfolio weights on the selected assets
    4. Place orders on assets to reflect selection and desired weights
  • A Classifier is a function that transforms an asset and a timestamp into a categorical output. E.g. taking AAPL and 2017 as inputs and returning that it belongs to the tech sector

  • A Factor is a function that takes in an asset and a timestamp and returns a numerical value such as the 20 day Moving Average

  • Filters take in an asset and a timestamp and return a Boolean. Screens let you select the rows where a filter is true, and Masking lets you ignore assets; the main difference is that masks are applied at the beginning of the computation, while screens are applied at the end
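A plain pandas sketch of the 4 pipeline steps, assuming made-up prices for hypothetical tickers. This is not the platform's Pipeline API; it only illustrates the idea of a Factor (20 day moving average), a Filter used as a screen (price above that average), equal weights, and the orders you would then place:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Hypothetical daily prices for 4 made-up tickers over 30 days (illustrative data only)
prices = pd.DataFrame(
    100 + np.cumsum(rng.normal(0, 1, (30, 4)), axis=0),
    columns=["AAA", "BBB", "CCC", "DDD"],
)

# 1. Factor: one numerical value per asset (latest 20 day moving average)
mavg_20 = prices.rolling(20).mean().iloc[-1]
latest = prices.iloc[-1]

# 2. Filter -> screen: keep only the assets whose latest price is above their 20 day average
selected = latest[latest > mavg_20].index

# 3. Weights: equal weights over the selected assets
if len(selected):
    weights = pd.Series(1.0 / len(selected), index=selected)
else:
    weights = pd.Series(dtype=float)

# 4. Orders: on a platform this would be order_target_percent(asset, weight); here we only print them
for asset, w in weights.items():
    print(f"order_target_percent({asset}, {w:.2f})")
```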

  • Leverage is the ability to borrow money for use in investing; in trading it can be seen as investing borrowed money (debt) on top of your own capital to gain a greater return on your investment
    • Leverage is usually provided by the broker (terms and conditions depend on the broker)
    • Measured as a ratio: the sum of the debt and your base capital, divided by your base capital: $\text{Leverage Ratio} = \dfrac{\text{Debt} + \text{Base Capital}}{\text{Base Capital}}$

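A tiny worked example of the leverage ratio with hypothetical amounts, also showing how leverage magnifies a move on your own capital (ignoring interest and broker fees):

```python
def leverage_ratio(debt, base_capital):
    """Leverage Ratio = (Debt + Base Capital) / Base Capital."""
    return (debt + base_capital) / base_capital

# Hypothetical example: 10,000 USD of your own capital plus 20,000 USD borrowed from the broker
base_capital = 10_000
debt = 20_000
ratio = leverage_ratio(debt, base_capital)
print(ratio)  # 3.0

# The whole 30,000 USD position moves with the market, so a 5% move becomes 15% on your capital
position = debt + base_capital
market_move = 0.05
return_on_capital = position * market_move / base_capital  # ignores interest and fees
print(return_on_capital)
```

The same multiplication applies to losses: a 5% drop would cost 15% of your own capital.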
import os
import datetime
from scipy import stats
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels import regression
import statsmodels.api as sm


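def alpha_beta(benchmark_daily_returns, stock_daily_returns):
    """Regress stock daily returns on benchmark daily returns (OLS) and return (alpha, beta)."""
    stock_values = stock_daily_returns.values
    benchmark_values = benchmark_daily_returns.values

    # Add an intercept column so the fitted model is: stock = alpha + beta * benchmark
    benchmark_const = sm.add_constant(benchmark_values)
    model = regression.linear_model.OLS(stock_values, benchmark_const).fit()
    alpha, beta = model.params

    return alpha, beta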

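base_path = "."
SYM = "AMZN"

# Daily price data for the stock and for the SPY ETF (benchmark), indexed by date
stock = pd.read_csv(os.path.join(base_path, "{}_2016.csv".format(SYM)), index_col='Date', parse_dates=True)
spy_etf = pd.read_csv(os.path.join(base_path, "SPY_2016.csv"), sep=",", index_col='Date', parse_dates=True)

# Cumulative return relative to the first day, and daily percent return
# (assumes the stock CSV has a lowercase "close" column and the SPY CSV a "Close" column)
stock["cumulative"] = stock["close"]/stock["close"].iloc[0]
stock["daily return"] = stock["close"].pct_change(1)
spy_etf["Cumulative"] = spy_etf["Close"]/spy_etf["Close"].iloc[0]
spy_etf["Daily Return"] = spy_etf["Close"].pct_change(1)

# Drop the first (NaN) return and keep only the dates present in both series
stock_daily_return = stock["daily return"].dropna()
spy_etf_daily_return = spy_etf["Daily Return"].dropna()
stock_daily_return, spy_etf_daily_return = stock_daily_return.align(spy_etf_daily_return, join="inner")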

alpha, beta = alpha_beta(benchmark_daily_returns=spy_etf_daily_return, stock_daily_returns=stock_daily_return)
print(alpha, beta)

min_spy = spy_etf_daily_return.min()
max_spy = spy_etf_daily_return.max()
spy_line = np.linspace(min_spy, max_spy, 100)
y = spy_line*beta + alpha

plt.figure(figsize=(12, 6))
plt.plot(spy_line, y, 'r')
# Scatter of the NaN-free daily returns used in the regression
plt.scatter(spy_etf_daily_return, stock_daily_return, alpha=0.6, s=50)
plt.xlabel("SPY Ret")
plt.ylabel("{} Ret".format(SYM))

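# Beta hedge: for every dollar in the stock, short beta dollars of SPY
hedged = stock_daily_return - beta*spy_etf_daily_return
hedged_alpha, hedged_beta = alpha_beta(spy_etf_daily_return, hedged)
print("HEDGED", hedged_alpha, hedged_beta)  # hedged beta should be close to 0
print(hedged.mean(), hedged.std())
print(stock_daily_return.mean(), stock_daily_return.std())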

plt.figure(figsize=(12, 6))
hedged.plot(label="{} with Hedge".format(SYM), alpha=0.9)
stock_daily_return.plot(label=SYM, alpha=0.5)
spy_etf_daily_return.plot(label="SPY ETF", alpha=0.5)
plt.xlim(['2016-06-01', '2016-08-01'])
plt.legend()
plt.show()
  • Sentiment Analysis uses NLP (Natural Language Processing) to attempt to detect sentiment in some text
    • Examples of Sentiment Analysis:
      • "This is bad! It is going to fail" -> Negative sentiment
      • "Awesome! Good job you will win" -> Positive sentiment
    • Impact measures how likely a stock price is to change as a result of the sentiment; it usually ranges from 0 to 100, while Sentiment ranges from -1 (negative) to 1 (positive)
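To show what a score in the -1 to 1 range looks like, here is a toy lexicon-based scorer applied to the two example sentences. It is not an NLP model: the word lists and the scoring rule (positive minus negative matches, divided by total matches) are made-up assumptions for illustration, whereas real sentiment analysis relies on trained models or curated lexicons.

```python
import re

# Hypothetical, hand-made word lists (not a real sentiment lexicon)
POSITIVE = {"awesome", "good", "win", "great", "gain"}
NEGATIVE = {"bad", "fail", "loss", "terrible", "lose"}


def sentiment_score(text: str) -> float:
    """Score in [-1, 1]: -1 only negative words, 1 only positive words, 0 neutral or no matches."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)


print(sentiment_score("This is bad! It is going to fail"))  # -1.0
print(sentiment_score("Awesome! Good job you will win"))    # 1.0
```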
  • A Forward Contract is an agreement between two parties to pay a delivery price $K$ for some asset at a future time $T$, while the actual market price of that asset at time $T$ is $S_T$
    • The party selling is in the short position, while the party buying at that future time is in the long position
      • Payoff for the long position is $S_T-K$
      • Payoff for the short position is $K-S_T$
    • These contracts reduce risk for both sides because both parties know in advance the price of the future transaction; however, depending on how the contract is defined, the parties may need to wait until the contract matures to execute transactions (lack of liquidity)
    • Example: Bill (long position, he thinks the price will go up) agrees to pay Sandy (short position, she thinks the price will go down) 100 USD one year from now for a barrel of gasoline; who wins depends on the price of gasoline. Scenarios after one year:
      • The price of a barrel of gasoline is 150 USD, so Sandy loses 50 USD of potential profit and Bill saves 50 USD ($S_T-K = 150-100 = 50$)
      • The price of a barrel of gasoline is 50 USD, so Sandy gains an extra 50 USD and Bill ends up paying 50 USD more than the market price ($S_T-K = 50-100 = -50$)
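The payoff formulas above can be checked against the Bill and Sandy scenarios with a small function; `forward_payoffs` is a hypothetical helper name. It also makes explicit that a forward is zero-sum: whatever the long side gains, the short side loses.

```python
from typing import Tuple


def forward_payoffs(spot_at_maturity: float, delivery_price: float) -> Tuple[float, float]:
    """Return (long payoff, short payoff) = (S_T - K, K - S_T)."""
    long_payoff = spot_at_maturity - delivery_price
    return long_payoff, -long_payoff


K = 100.0  # delivery price agreed today
for S_T in (150.0, 50.0):  # market price of the barrel in one year
    long_p, short_p = forward_payoffs(S_T, K)
    print(f"S_T={S_T:.0f}: Bill (long) {long_p:+.0f} USD, Sandy (short) {short_p:+.0f} USD")
# S_T=150: Bill (long) +50 USD, Sandy (short) -50 USD
# S_T=50: Bill (long) -50 USD, Sandy (short) +50 USD
```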
  • Derivatives are contracts between two or more entities; the value of the contract is based on an agreed-upon underlying financial asset (like a stock, market index, currency, etc.)
  • Futures
    • They are an example of a simple derivative
    • They are forward contracts that have been standardized for trading on an exchange; this means the exchange is a third entity that handles the transaction and manages the contract
    • The key difference between a futures contract and a forward contract is the exchange that acts as a middleman
    • They are settled daily based on the price of the underlying asset, which increases or decreases the amount of money in a margin account
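The daily settlement of a futures position against a margin account can be sketched as follows. This is a simplified illustration under stated assumptions: the entry price, initial margin and settlement prices are hypothetical, `mark_to_market` is an illustrative helper name, and maintenance margins and margin calls are ignored.

```python
from typing import List


def mark_to_market(settlement_prices: List[float], entry_price: float, initial_margin: float,
                   contract_size: float = 1, position: int = 1) -> List[float]:
    """Margin account balance after each daily settlement (position=+1 long, -1 short).

    Simplified: ignores maintenance margin and margin calls.
    """
    balance = initial_margin
    previous = entry_price
    balances = []
    for price in settlement_prices:
        # The day's gain or loss is credited to or debited from the margin account
        balance += position * (price - previous) * contract_size
        balances.append(balance)
        previous = price
    return balances


prices = [102, 97, 105]  # hypothetical daily settlement prices
print(mark_to_market(prices, entry_price=100, initial_margin=20))               # [22, 17, 25]
print(mark_to_market(prices, entry_price=100, initial_margin=20, position=-1))  # [18, 23, 15]
```

The total change in the long account (+5) equals $S_T-K = 105-100$, the same payoff as a forward, but it is realized day by day instead of at maturity.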
