Interview Prep: Pandas

https://pandas.pydata.org/pandas-docs/stable/cookbook.html#cookbook-selection
https://gridwizard.wordpress.com/2018/10/09/interview-prep-pandas/

pandas DataFrame
Creating a DataFrame and adding some rows:
Example 1. No index (or rather, the default index = 0, 1, 2, 3, ...)
import pandas as pd

columns = ['Date', 'direction', 'size', 'ticker', 'tradePrices']
orders = pd.DataFrame(columns=columns)
orders.loc[0] = ['2011-01-10', 'Buy', 1500, 'AAPL', 339.44]
orders.loc[1] = ['2011-01-13', 'Sell', 1500, 'AAPL', 342.64]
orders.loc[2] = ['2011-01-13', 'Buy', 4000, 'IBM', 143.92]
orders.loc[3] = ['2011-01-26', 'Buy', 1000, 'GOOG', 616.50]
orders.loc[4] = ['2011-02-02', 'Sell', 4000, 'XOM', 79.46]
orders.loc[5] = ['2011-02-10', 'Buy', 4000, 'XOM', 79.68]
orders.loc[6] = ['2011-03-03', 'Sell', 1000, 'GOOG', 609.56]
orders.loc[7] = ['2011-03-03', 'Sell', 2200, 'IBM', 158.73]
orders.loc[8] = ['2011-06-03', 'Sell', 3300, 'IBM', 160.97]
orders.loc[9] = ['2011-05-03', 'Buy', 1500, 'IBM', 167.84]
orders.loc[10] = ['2011-06-10', 'Buy', 1200, 'AAPL', 323.03]
orders.loc[11] = ['2011-08-01', 'Buy', 55, 'GOOG', 606.77]
orders.loc[12] = ['2011-08-01', 'Sell', 55, 'GOOG', 606.77]
orders.loc[13] = ['2011-12-20', 'Sell', 1200, 'AAPL', 392.46]

To set the value of a particular cell by position:
orders.iat[0, 1] = "Sell"

Or by label:
orders.at[0, "direction"] = "Buy"

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iat.html
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.at.html
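To contrast the two accessors: `.at` is label-based while `.iat` is position-based (the same split as `.loc` vs `.iloc`, but for single scalars). A minimal, self-contained sketch; the tiny frame here is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"direction": ["Buy", "Sell"], "size": [100, 200]},
                  index=["a", "b"])

df.at["a", "direction"] = "Sell"  # label-based scalar setter: row "a", column "direction"
df.iat[1, 1] = 250                # position-based: row 1, column 1 ("size")

print(df.at["a", "direction"])  # Sell
print(df.iat[1, 1])             # 250
```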

Example 2. Index = dates
columns = ['AAPL', 'GOOG', 'IBM', 'XOM']
index = ['2011-01-10', '2011-01-13', '2011-01-26', '2011-02-02', '2011-02-10', '2011-03-03', '2011-05-03', '2011-06-03', '2011-06-10', '2011-08-01', '2011-12-20']
prices = pd.DataFrame(columns=columns, index=index)
prices.iloc[0] = [339.441, 614.219, 142.781, 71.571]
prices.iloc[1] = [342.642, 616.698, 143.922, 73.083]
prices.iloc[2] = [340.823, 616.507, 155.743, 75.895]
prices.iloc[3] = [341.294, 612.006, 157.934, 79.467]
prices.iloc[4] = [351.425, 616.445, 159.325, 79.689]
prices.iloc[5] = [356.406, 609.564, 158.736, 82.192]
prices.iloc[6] = [345.147, 533.893, 167.847, 82.004]
prices.iloc[7] = [340.428, 523.082, 160.978, 78.196]
prices.iloc[8] = [323.039, 509.511, 159.149, 76.848]
prices.iloc[9] = [393.261, 606.779, 176.281, 76.671]
prices.iloc[10] = [392.462, 630.378, 184.142, 79.973]

orders['streetPrices'] = prices.lookup(orders.Date, orders.ticker)
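Note: `DataFrame.lookup` was deprecated in pandas 1.2 and removed in 2.0. One replacement is to reshape the price grid to long form and merge. A sketch with trimmed-down versions of the frames above:

```python
import pandas as pd

orders = pd.DataFrame({"Date": ["2011-01-10", "2011-01-13"],
                       "ticker": ["AAPL", "IBM"]})
prices = pd.DataFrame({"AAPL": [339.44, 342.64], "IBM": [142.78, 143.92]},
                      index=["2011-01-10", "2011-01-13"])

# Equivalent of prices.lookup(orders.Date, orders.ticker):
# stack prices into long (Date, ticker, price) form, then merge on the pair
long_prices = prices.stack().rename("streetPrice").reset_index()
long_prices.columns = ["Date", "ticker", "streetPrice"]
orders = orders.merge(long_prices, on=["Date", "ticker"], how="left")

print(orders["streetPrice"].tolist())  # [339.44, 143.92]
```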

Basic example, showing:
a) initializing a DataFrame
b) a calculated field "spread" = bid - offer
c) merging/joining DataFrames
d) sorting on multiple fields

from datetime import datetime
import pandas as pd

exchangeBalances = [
    ['ETHBTC', 'binance', 10],
    ['LTCBTC', 'binance', 10],
    ['XRPBTC', 'binance', 10],
    ['ETHBTC', 'bitfinex', 10],
    ['LTCBTC', 'bitfinex', 10],
    ['XRPBTC', 'bitfinex', 10]
]
bidOffers = [
    ['ETHBTC', 'binance', 0.0035, 0.0351, datetime(2018, 9, 1, 8, 15)],
    ['LTCBTC', 'binance', 0.009, 0.092, datetime(2018, 9, 1, 8, 15)],
    ['XRPBTC', 'binance', 0.000077, 0.000078, datetime(2018, 9, 1, 8, 15)],
    ['ETHBTC', 'bitfinex', 0.003522, 0.0353, datetime(2018, 9, 1, 8, 15)],
    ['LTCBTC', 'bitfinex', 0.0093, 0.095, datetime(2018, 9, 1, 8, 15)],
    ['XRPBTC', 'bitfinex', 0.000083, 0.000085, datetime(2018, 9, 1, 8, 15)],
    ['ETHBTC', 'binance', 0.0035, 0.0351, datetime(2018, 9, 1, 8, 30)],
    ['LTCBTC', 'binance', 0.009, 0.092, datetime(2018, 9, 1, 8, 30)],
    ['XRPBTC', 'binance', 0.000077, 0.000078, datetime(2018, 9, 1, 8, 30)],
    ['ETHBTC', 'bitfinex', 0.003522, 0.0353, datetime(2018, 9, 1, 8, 30)],
    ['LTCBTC', 'bitfinex', 0.0093, 0.095, datetime(2018, 9, 1, 8, 30)],
    ['XRPBTC', 'bitfinex', 0.000083, 0.000085, datetime(2018, 9, 1, 8, 30)],
    ['ETHBTC', 'binance', 0.0035, 0.0351, datetime(2018, 9, 1, 8, 45)],
    ['LTCBTC', 'binance', 0.009, 0.092, datetime(2018, 9, 1, 8, 45)],
    ['XRPBTC', 'binance', 0.000077, 0.000078, datetime(2018, 9, 1, 8, 45)],
    ['ETHBTC', 'bitfinex', 0.003522, 0.0353, datetime(2018, 9, 1, 8, 45)],
    ['LTCBTC', 'bitfinex', 0.0093, 0.095, datetime(2018, 9, 1, 8, 45)],
    ['XRPBTC', 'bitfinex', 0.000083, 0.000085, datetime(2018, 9, 1, 8, 45)]
]
dfExchangeBalances = pd.DataFrame(exchangeBalances, columns=['symbol', 'exchange', 'balance'])
dfBidOffers = pd.DataFrame(bidOffers, columns=['ticker', 'exchange', 'bid', 'offer', 'created'])
dfBidOffers["spread"] = dfBidOffers["bid"] - dfBidOffers["offer"]
dfSummary = dfExchangeBalances.merge(dfBidOffers, how='left', left_on=['symbol', 'exchange'], right_on=['ticker', 'exchange'])
dfSummary = dfSummary.sort_values(by=['symbol', 'exchange', 'created'], ascending=[True, True, False])

>>> dfExchangeBalances
symbol exchange balance
0 ETHBTC binance 10
1 LTCBTC binance 10
2 XRPBTC binance 10
3 ETHBTC bitfinex 10
4 LTCBTC bitfinex 10
5 XRPBTC bitfinex 10
>>> dfBidOffers[['ticker', 'exchange', 'bid', 'offer']].head(6)
ticker exchange bid offer
0 ETHBTC binance 0.003500 0.035100
1 LTCBTC binance 0.009000 0.092000
2 XRPBTC binance 0.000077 0.000078
3 ETHBTC bitfinex 0.003522 0.035300
4 LTCBTC bitfinex 0.009300 0.095000
5 XRPBTC bitfinex 0.000083 0.000085
>>> dfBidOffers.head(6)
ticker exchange … created spread
0 ETHBTC binance … 2018-09-01 08:15:00 -3.160000e-02
1 LTCBTC binance … 2018-09-01 08:15:00 -8.300000e-02
2 XRPBTC binance … 2018-09-01 08:15:00 -1.000000e-06
3 ETHBTC bitfinex … 2018-09-01 08:15:00 -3.177800e-02
4 LTCBTC bitfinex … 2018-09-01 08:15:00 -8.570000e-02
5 XRPBTC bitfinex … 2018-09-01 08:15:00 -2.000000e-06

UNION - use pd.concat (note you can concat vertically or horizontally depending on the "axis" argument)
"concat" vs "merge"?
https://www.tutorialspoint.com/python_pandas/python_pandas_concatenation.htm
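A minimal sketch of both directions (the tiny frames and column names are made up for illustration):

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2]})
b = pd.DataFrame({"x": [3, 4]})

# axis=0: stack vertically (like SQL UNION ALL); ignore_index renumbers the rows
v = pd.concat([a, b], axis=0, ignore_index=True)

# axis=1: stick side by side, aligned on the index
h = pd.concat([a.rename(columns={"x": "x1"}),
               b.rename(columns={"x": "x2"})], axis=1)

print(v["x"].tolist())  # [1, 2, 3, 4]
print(h.shape)          # (2, 2)
```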

Reindex:
https://pandas.pydata.org/pandas-docs/stable/advanced.html
https://chrisalbon.com/python/data_wrangling/pandas_dataframe_reindexing/

Example: reindex columns (not rows):
import pandas as pd
import numpy as np

columns = ['BucketLabel', 'price', 'QTY', 'BidOrAsk']
pdBidsBinance = pd.DataFrame(columns=columns)
pdBidsBinance.loc[0] = ['40-41', 40.38, 100, 'BUY']
pdBidsBinance.loc[1] = ['40-41', 40.381, 200, 'BUY']
pdBidsBinance.loc[2] = ['40-41', 40.51, 300, 'BUY']
pdBidsBinance.loc[3] = ['41-42', 41.3, 150, 'BUY']
pdBidsBinance.loc[4] = ['41-42', 41.51, 100, 'BUY']
pdBidsBinance.loc[5] = ['41-42', 41.81, 200, 'BUY']
pdBidsBinance.loc[6] = ['42-43', 42.78, 300, 'BUY']
pdBidsBinance.loc[7] = ['42-43', 42.31, 200, 'BUY']
pdBidsBinance.loc[8] = ['42-43', 42.88, 500, 'BUY']

pdBidsBinance = pdBidsBinance.reindex(columns=["BucketLabel", "price", "QTY"])  # exclude "BidOrAsk" (or any other fields you don't need)
pdBidsBinance["TradeConsideration"] = pdBidsBinance["price"] * pdBidsBinance["QTY"]
pdBidsBinance.set_index("BucketLabel", inplace=True)
pdBidsBinance.columns = pd.MultiIndex.from_product([["Binance"], ["price", "QTY", "TradeConsideration"]])

pdAsksKraken = pd.DataFrame(columns=columns)
pdAsksKraken.loc[0] = ['40-41', 40.28, 200, 'SELL']
pdAsksKraken.loc[1] = ['40-41', 40.181, 200, 'SELL']
pdAsksKraken.loc[2] = ['40-41', 40.31, 100, 'SELL']
pdAsksKraken.loc[3] = ['41-42', 41.1, 500, 'SELL']
pdAsksKraken.loc[4] = ['41-42', 41.21, 150, 'SELL']
pdAsksKraken.loc[5] = ['41-42', 41.21, 700, 'SELL']
pdAsksKraken.loc[6] = ['42-43', 42.68, 100, 'SELL']
pdAsksKraken.loc[7] = ['42-43', 42.11, 200, 'SELL']
pdAsksKraken.loc[8] = ['42-43', 42.3, 300, 'SELL']

pdAsksKraken = pdAsksKraken.reindex(columns=["BucketLabel", "price", "QTY"])  # exclude "BidOrAsk" (or any other fields you don't need)
pdAsksKraken["TradeConsideration"] = pdAsksKraken["price"] * pdAsksKraken["QTY"]
pdAsksKraken.set_index("BucketLabel", inplace=True)
pdAsksKraken.columns = pd.MultiIndex.from_product([["Kraken"], ["price", "QTY", "TradeConsideration"]])

pdTrade = pd.concat([pdBidsBinance, pdAsksKraken], axis=1)

pdTrade["Summary", "Spread"] = pdTrade.loc[:, ("Binance", "price")] - pdTrade.loc[:, ("Kraken", "price")]

>>> pdTrade.head()
Binance Kraken Summary
price QTY TradeConsideration price QTY TradeConsideration Spread
BucketLabel
40-41 40.380 100 4038 40.280 200 8056 0.1
40-41 40.381 200 8076.2 40.181 200 8036.2 0.2
40-41 40.510 300 12153 40.310 100 4031 0.2
41-42 41.300 150 6195 41.100 500 20550 0.2
41-42 41.510 100 4151 41.210 150 6181.5 0.3
>>>

filtering
https://chrisalbon.com/python/data_wrangling/pandas_selecting_rows_on_conditions/
dfSummary = dfSummary[(dfSummary['created'] > datetime(2018, 9, 1, 8, 15)) & (dfSummary['exchange'] == "binance")]

CAUTION: when combining conditions with & or |, you must wrap each condition in parentheses.

Alternatively, you can use "query" (and can even chain queries):
dfSummary = dfSummary.query("'2018-09-01 08:15:00' < created <= '2018-09-01 09:15:00' & exchange == 'binance'")

lookup:
https://stackoverflow.com/questions/52583677/pandas-dataframe-lookup

slice with loc/iloc
dfSummary.loc[:, ['symbol', 'exchange', 'spread']]

iterations
for item in dfSummary.loc[:, ['symbol', 'exchange', 'spread']].itertuples():
    # item[0] row index, e.g. 2 (int)
    # item[1] e.g. 'ETHBTC' (str)
    # item[2] e.g. 'binance' (str)
    # item[3] e.g. -0.031599999999999996 (float)
    itemIndex = item[0]
    symbol = item[1]
    exchange = item[2]
    spread = item[3]

aggregate/group by
groups = dfSummary.groupby(["symbol", "exchange"])

for group in groups:
    groupKey = group[0]
    symbol = groupKey[0]
    exchange = groupKey[1]
    maxBid = group[1]["bid"].max()

Single aggregate measure:
dfSummary = dfSummary.groupby(["symbol", "exchange"])["spread"].max()

>>> dfSummary
symbol exchange
ETHBTC binance -3.160000e-02
bitfinex -3.177800e-02
LTCBTC binance -8.300000e-02
bitfinex -8.570000e-02
XRPBTC binance -1.000000e-06
bitfinex -2.000000e-06
Name: spread, dtype: float64
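Note the result above is a Series with a MultiIndex (symbol, exchange); call `reset_index()` to get a flat DataFrame back. A minimal sketch with dummy data:

```python
import pandas as pd

df = pd.DataFrame({"symbol": ["A", "A", "B"],
                   "exchange": ["x", "y", "x"],
                   "spread": [1.0, 2.0, 3.0]})

s = df.groupby(["symbol", "exchange"])["spread"].max()  # MultiIndexed Series
flat = s.reset_index()                                  # index levels back to columns

print(list(flat.columns))  # ['symbol', 'exchange', 'spread']
```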

https://www.shanelynn.ie/summarising-aggregation-and-grouping-data-in-python-pandas/
Multiple aggregate measures:
dfSummary = dfSummary.groupby(["symbol", "exchange"]).agg({"spread": "max", "bid": "min"})
dfSummary.columns = ["maxSpread", "minBid"]
>>> dfSummary
maxSpread minBid
symbol exchange
ETHBTC binance -3.160000e-02 0.003500
bitfinex -3.177800e-02 0.003522
LTCBTC binance -8.300000e-02 0.009000
bitfinex -8.570000e-02 0.009300
XRPBTC binance -1.000000e-06 0.000077
bitfinex -2.000000e-06 0.000083
>>>

Pivot: groupby vs pd.pivot_table?
They produce the same numbers; in the example below, "dfPivot2" is the same as "dfGrouped" (pivot_table returns a DataFrame, while the groupby returns a Series).
http://pbpython.com/pandas-pivot-table-explained.html
https://stackoverflow.com/questions/34702815/pandas-group-by-and-pivot-table-difference

dfOriginal = pd.DataFrame({"a": [1, 2, 3, 1, 2, 3], "b": [1, 1, 1, 2, 2, 2], "c": np.random.rand(6)})
dfPivot = pd.pivot_table(dfOriginal, index=["a"], columns=["b"], values=["c"], aggfunc=np.sum)
dfPivot2 = pd.pivot_table(dfOriginal, index=["a", "b"], values=["c"], aggfunc=np.sum)
dfGrouped = dfOriginal.groupby(['a', 'b'])['c'].sum()

>>> dfOriginal
a b c
0 1 1 0.486374
1 2 1 0.020761
2 3 1 0.980307
3 1 2 0.105447
4 2 2 0.026814
5 3 2 0.546601
>>> dfPivot
c
b 1 2
a
1 0.486374 0.105447
2 0.020761 0.026814
3 0.980307 0.546601
>>> dfPivot2
c
a b
1 1 0.486374
2 0.105447
2 1 0.020761
2 0.026814
3 1 0.980307
2 0.546601
>>> dfGrouped
a b
1 1 0.486374
2 0.105447
2 1 0.020761
2 0.026814
3 1 0.980307
2 0.546601
Name: c, dtype: float64
>>>

if-then logic:
https://pandas.pydata.org/pandas-docs/stable/cookbook.html#cookbook-selection
In [1]: df = pd.DataFrame(
   ...:     {'AAA': [4,5,6,7], 'BBB': [10,20,30,40], 'CCC': [100,50,-30,-50]}); df
   ...:
Out[1]:
AAA BBB CCC
0 4 10 100
1 5 20 50
2 6 30 -30
3 7 40 -50

An if-then with assignment to 2 columns:

In [3]: df.loc[df.AAA >= 5, ['BBB','CCC']] = 555; df
Out[3]:
AAA BBB CCC
0 4 10 100
1 5 555 555
2 6 555 555
3 7 555 555

Better yet, use np.where to accomplish this:
https://chrisalbon.com/python/data_wrangling/pandas_create_column_using_conditional/

pdPnl["DTDRealizedPnl"] = pdPnl["InceptionRealizedPnl"] - np.where(np.isnan(pdPnl["InceptionRealizedPnl_tm1"]), 0, pdPnl["InceptionRealizedPnl_tm1"])
pdPnl["DTDUnrealizedPnl"] = pdPnl["InceptionUnrealizedPnl"] - np.where(np.isnan(pdPnl["InceptionUnrealizedPnl_tm1"]), 0, pdPnl["InceptionUnrealizedPnl_tm1"])
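The two lines above reference the `pdPnl` frame from the big example later in this post; here is the same pattern as a self-contained sketch (column names made up): fall back to 0 wherever the prior-day column is NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"pnl": [100.0, 150.0],
                   "pnl_tm1": [np.nan, 100.0]})  # no prior day on day one

# np.where(cond, a, b): element-wise if-then-else
df["dtd"] = df["pnl"] - np.where(np.isnan(df["pnl_tm1"]), 0, df["pnl_tm1"])

print(df["dtd"].tolist())  # [100.0, 50.0]
```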

masking:
c1 = dfSummary['exchange'] == "binance"
c2 = dfSummary['created'] >= datetime(2018, 9, 1, 8, 30)
criteria = c1 | c2
maskedSummary = dfSummary.mask(criteria)  # mask() NaNs out rows where criteria is True; where() is the inverse

>>> maskedSummary
symbol exchange balance … offer created spread
2 NaN NaN NaN … NaN NaT NaN
1 NaN NaN NaN … NaN NaT NaN
0 NaN NaN NaN … NaN NaT NaN
11 NaN NaN NaN … NaN NaT NaN
10 NaN NaN NaN … NaN NaT NaN
9 ETHBTC bitfinex 10.0 … 0.035300 2018-09-01 08:15:00 -0.031778
5 NaN NaN NaN … NaN NaT NaN
4 NaN NaN NaN … NaN NaT NaN
3 NaN NaN NaN … NaN NaT NaN
14 NaN NaN NaN … NaN NaT NaN
13 NaN NaN NaN … NaN NaT NaN
12 LTCBTC bitfinex 10.0 … 0.095000 2018-09-01 08:15:00 -0.085700
8 NaN NaN NaN … NaN NaT NaN
7 NaN NaN NaN … NaN NaT NaN
6 NaN NaN NaN … NaN NaT NaN
17 NaN NaN NaN … NaN NaT NaN
16 NaN NaN NaN … NaN NaT NaN
15 XRPBTC bitfinex 10.0 … 0.000085 2018-09-01 08:15:00 -0.000002

None vs np.nan handling: be careful, numpy and pandas treat them differently
import pandas as pd
import numpy as np

nums1 = [0, 10, 20, 30, 40, 50]
nums2 = [np.nan, 10, 20, 30, 40, 50]
nums3 = [None, 10, 20, 30, 40, 50]
nums4 = [num for num in nums1 if num != 0]

npNums1 = np.asarray(nums1, dtype=float)
npNums2 = np.asarray(nums2, dtype=float)
npNums3 = np.asarray(nums3, dtype=float)
npNums4 = np.asarray(nums4, dtype=float)

mean1 = npNums1.mean()  # 25
mean2 = npNums2.mean()  # nan (one item is np.nan; numpy does not skip it)
mean3 = npNums3.mean()  # nan (one item is None, coerced to nan)
mean4 = npNums4.mean()  # 30 (the 0 was filtered out beforehand)

data = {
    "nums1": nums1,
    "nums2": nums2,
    "nums3": nums3
}

pdNums = pd.DataFrame(data)
mean1 = pdNums["nums1"].mean()  # 25
mean2 = pdNums["nums2"].mean()  # 30 (pandas skips the np.nan)
mean3 = pdNums["nums3"].mean()  # 30 (pandas skips the None)

print("done")
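A related check worth knowing: `pd.isna` treats both `None` and `np.nan` as missing, whereas an equality test against `np.nan` is always False (nan never equals itself):

```python
import numpy as np
import pandas as pd

assert np.nan != np.nan                   # nan never equals itself
assert pd.isna(np.nan) and pd.isna(None)  # both count as missing
assert not pd.isna(0)                     # 0 is a real value, not missing

s = pd.Series([None, 10, 20], dtype=float)  # None is coerced to NaN
print(s.mean())  # 15.0 (the NaN is skipped)
```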

Multi-index
https://pandas.pydata.org/pandas-docs/stable/advanced.html

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'], ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = pd.Series(np.random.randn(8), index=index)

>> s
first second
bar one 0.469112
two -0.282863
baz one -1.509059
two -1.135632
foo one 1.212112
two -0.173215
qux one 0.119209
two -1.044236
dtype: float64
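Selecting from a MultiIndexed Series: `.loc` picks from the outer level, `.xs` slices an inner level. A self-contained sketch with fixed values instead of the random ones above:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("baz", "one")],
    names=["first", "second"])
s = pd.Series([1.0, 2.0, 3.0], index=index)

print(s.loc["bar"].tolist())                 # [1.0, 2.0]  (whole outer group)
print(s.xs("one", level="second").tolist())  # [1.0, 3.0]  (slice an inner level)
```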

formula fields
https://pythonprogramming.net/pandas-column-operations-calculations/
https://www.tutorialspoint.com/python_pandas/python_pandas_function_application.htm
Row function: https://stackoverflow.com/questions/13331698/how-to-apply-a-function-to-two-columns-of-pandas-dataframe
pipe (Table level): http://jose-coto.com/pipes-with-pandas
data = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3'])
# add 3 to every cell; note pipe passes the whole DataFrame (not a cell) as the first argument
df = df.pipe(lambda d, x: d + x, 3)

apply (per column, or per row with axis=1): http://jonathansoma.com/lede/foundations/classes/pandas%20columns%20and%20functions/apply-a-function-to-every-row-in-a-pandas-dataframe/
data = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3'])
df = df.apply(lambda col: col * col)
df = df['col3'].apply(lambda cellVal: cellVal * cellVal)
>>> df
col1 col2 col3
0 1 1 1
1 2 2 2
2 3 3 3
>>> df
col1 col2 col3
0 1 1 1
1 4 4 4
2 9 9 9
>>> df
0 1
1 16
2 81

def rowTransform(row):
    return (row["col1"] + row["col3"]) * 3

data = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3'])
df["col4"] = df["col1"] + df["col2"]
df["col5"] = df.apply(rowTransform, axis=1)

applymap (per cell):
df = pd.DataFrame(np.random.randn(5, 3), columns=['col1', 'col2', 'col3'])
df = df.applymap(lambda x: x * 100)
print(df.apply(np.mean))

Create Pandas DataFrame from Object
Step 1. ObjectUtil
import inspect

def objectPropertiesToDictionary(o, excludeSystemMembers=True):
    result = {}
    members = inspect.getmembers(o)
    for member in members:
        key = member[0]
        value = member[1]
        if excludeSystemMembers:
            if "__" not in key:
                result[key] = value
        else:
            result[key] = value
    return result

Step 2. Convert a list of objects to dictionaries using Python reflection (i.e. the inspect package)
pdPnl = pd.DataFrame.from_records([ObjectUtil.objectPropertiesToDictionary(pnl) for pnl in profitLosses], columns=profitLosses[0].to_dict().keys())

From Pandas DataFrame to Dictionary:
https://stackoverflow.com/questions/26716616/convert-a-pandas-dataframe-to-a-dictionary
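For the reverse direction, `to_dict` supports several orientations; the default is column-oriented, while `orient='records'` gives one plain dict per row:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

print(df.to_dict())           # {'a': {0: 1, 1: 2}, 'b': {0: 3, 1: 4}}  column -> {index: value}
print(df.to_dict("records"))  # [{'a': 1, 'b': 3}, {'a': 2, 'b': 4}]    one dict per row
```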

vlookup
https://stackoverflow.com/questions/25935431/pandas-lookup-based-on-value
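A left merge is the usual pandas equivalent of Excel's vlookup. A minimal sketch (the `ref` table and its columns are made up for illustration):

```python
import pandas as pd

orders = pd.DataFrame({"ticker": ["AAPL", "IBM", "AAPL"]})
ref = pd.DataFrame({"ticker": ["AAPL", "IBM"],
                    "sector": ["Tech", "Services"]})

# Left join keeps every order row and pulls in the matching sector
orders = orders.merge(ref, on="ticker", how="left")

print(orders["sector"].tolist())  # ['Tech', 'Services', 'Tech']
```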

Vectorization:
https://stackoverflow.com/questions/13893227/vectorized-look-up-of-values-in-pandas-dataframe
https://engineering.upside.com/a-beginners-guide-to-optimizing-pandas-code-for-speed-c09ef2c6a4d6
https://www.datascience.com/blog/straightening-loops-how-to-vectorize-data-aggregation-with-pandas-and-numpy/
https://realpython.com/numpy-array-programming/
https://stackoverflow.com/questions/52564186/python-pandas-lookup-another-row-calculated-field
https://stackoverflow.com/questions/52583677/pandas-dataframe-lookup
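The short version of the links above: prefer whole-column arithmetic over `apply(axis=1)`. Both forms below compute the same spread; the vectorized one avoids the per-row Python loop:

```python
import pandas as pd

df = pd.DataFrame({"bid": [1.0, 2.0], "offer": [1.5, 2.5]})

slow = df.apply(lambda row: row["bid"] - row["offer"], axis=1)  # Python loop, one call per row
fast = df["bid"] - df["offer"]                                  # single vectorized operation

assert slow.equals(fast)
print(fast.tolist())  # [-0.5, -0.5]
```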

Memory usage optimization:
https://www.dataquest.io/blog/pandas-big-data/
http://pbpython.com/pandas_dtypes.html
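The common tricks from those links, in a minimal sketch: convert repeated strings to `category` (stored as small integer codes plus one copy of each label) and downcast oversized numeric dtypes:

```python
import pandas as pd

df = pd.DataFrame({"exchange": ["binance", "bitfinex"] * 1000,
                   "qty": [100, 200] * 1000})

before = df.memory_usage(deep=True).sum()
df["exchange"] = df["exchange"].astype("category")        # repeated strings -> codes
df["qty"] = pd.to_numeric(df["qty"], downcast="integer")  # int64 -> smallest fitting int
after = df.memory_usage(deep=True).sum()

print(after < before)  # True
```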

BIG EXAMPLE:

@staticmethod
def calculateAnalyticsFromProfitLossSeries(profitLosses):
    if not profitLosses:
        return

    # Why Pandas? It's much quicker than looping in Python:
    # https://engineering.upside.com/a-beginners-guide-to-optimizing-pandas-code-for-speed-c09ef2c6a4d6
    # https://stackoverflow.com/questions/52564186/python-pandas-lookup-another-row-calculated-field
    # Thoughts on the performance of the code below:
    # 1) ObjectUtil.objectPropertiesToDictionary uses reflection (i.e. inspect), which is slow - but unless you hard-code the fields in ProfitLoss.to_dict(), you can't get around it.
    # 2) pdPnl.to_dict() at the end, and the transformation back into list(ProfitLoss), are also costs of using Pandas (as opposed to simply looping over the list in Python)

    # ProfitLoss fields:
    # Id, InstrumentId, TestId, COB, Balance, MarkPrice, AverageCost, InceptionRealizedPnl, InceptionUnrealizedPnl,
    # DTDRealizedPnl, DTDUnrealizedPnl, MTDRealizedPnl, MTDUnrealizedPnl, YTDRealizedPnl, YTDUnrealizedPnl,
    # SharpeRatio, MaxDrawDown, Created, Updated
    pdPnl = pd.DataFrame.from_records([pnl.to_dict() for pnl in profitLosses])
    pdPnl = pdPnl.merge(pdPnl, how='left', left_on=["TM1"], right_on=["COB"], suffixes=('', '_tm1'))
    pdPnl = pdPnl.merge(pdPnl, how='left', left_on=["MonthStart"], right_on=["COB"], suffixes=('', '_MonthStart'))
    pdPnl = pdPnl.merge(pdPnl, how='left', left_on=["QuarterStart"], right_on=["COB"], suffixes=('', '_QuarterStart'))
    pdPnl = pdPnl.merge(pdPnl, how='left', left_on=["YearStart"], right_on=["COB"], suffixes=('', '_YearStart'))

    # Vectorized
    # Note: on day one of trading there are no TM1 records; handle that appropriately.
    # pdPnl["InceptionRealizedPnl_tm1"] = np.where(np.isnan(pdPnl["InceptionRealizedPnl_tm1"]), 0, pdPnl["InceptionRealizedPnl_tm1"])
    # pdPnl["InceptionUnrealizedPnl_tm1"] = np.where(np.isnan(pdPnl["InceptionUnrealizedPnl_tm1"]), 0, pdPnl["InceptionUnrealizedPnl_tm1"])
    # pdPnl["DTDRealizedPnl"] = pdPnl["InceptionRealizedPnl"] - pdPnl["InceptionRealizedPnl_tm1"]
    # pdPnl["DTDUnrealizedPnl"] = pdPnl["InceptionUnrealizedPnl"] - pdPnl["InceptionUnrealizedPnl_tm1"]
    pdPnl["DTDRealizedPnl"] = pdPnl["InceptionRealizedPnl"] - np.where(np.isnan(pdPnl["InceptionRealizedPnl_tm1"]), 0, pdPnl["InceptionRealizedPnl_tm1"])
    pdPnl["DTDUnrealizedPnl"] = pdPnl["InceptionUnrealizedPnl"] - np.where(np.isnan(pdPnl["InceptionUnrealizedPnl_tm1"]), 0, pdPnl["InceptionUnrealizedPnl_tm1"])
    pdPnl["TotalDTD"] = pdPnl["DTDRealizedPnl"] + pdPnl["DTDUnrealizedPnl"]

    # Annualizing DTD return: https://financetrain.com/how-to-calculate-annualized-returns/
    pdPnl["PercentDTDReturn"] = (pdPnl["TotalDTD"] / pdPnl["Balance"]) * 100
    pdPnl["PercentDTDReturn_Annualized"] = ((pdPnl["PercentDTDReturn"] / 100 + 1) ** 365 - 1) * 100

    pdPnl["MTDRealizedPnl"] = pdPnl["InceptionRealizedPnl"] - pdPnl["InceptionRealizedPnl_MonthStart"]
    pdPnl["MTDUnrealizedPnl"] = pdPnl["InceptionUnrealizedPnl"] - pdPnl["InceptionUnrealizedPnl_MonthStart"]
    pdPnl["YTDRealizedPnl"] = pdPnl["InceptionRealizedPnl"] - pdPnl["InceptionRealizedPnl_YearStart"]
    pdPnl["YTDUnrealizedPnl"] = pdPnl["InceptionUnrealizedPnl"] - pdPnl["InceptionUnrealizedPnl_YearStart"]

    # Not yet vectorized
    pdPnl["SharpeRatio"] = pdPnl.apply(lambda rw: PnlCalculatorBase.computeSharpeRatio(pdPnl, rw["COB"]), axis=1)
    pdPnl["MaxDrawDown"] = pdPnl.apply(lambda rw: PnlCalculatorBase.computeMaxDrawDown(pdPnl, rw["COB"]), axis=1)

    pnlDict = pdPnl.to_dict()
    updatedProfitLosses = ProfitLoss.ProfitLoss.from_dict(pnlDict)
    return updatedProfitLosses

Helpers:
@staticmethod
def computeSharpeRatio(pdPnl, cob):
    # BEWARE the difference in how numpy and pandas handle NaN/None (see "None vs np.nan handling" above)!
    # Pandas filters NaN/None under PercentDTDReturn_Annualized automatically when computing "mean"; numpy doesn't.
    # Note also that "std" depends on "mean"!
    # See PnlCalculatorTests.testCalculateAnalyticsFromProfitLossSeries (at the bottom).
    # Note: "is not None" and "!= np.nan" never filter anything on a Series; use .notna() instead.
    pdPnl = pdPnl[(pdPnl['COB'] <= cob) & (pdPnl["PercentDTDReturn_Annualized"].notna())]
    pdPnl = pdPnl.loc[:, ["COB", "PercentDTDReturn_Annualized"]]

    # @todo We don't have a risk-free rate for the Sharpe Ratio calc. This is just average DTD return over its standard deviation.
    # https://en.wikipedia.org/wiki/Sharpe_ratio
    mean = pdPnl["PercentDTDReturn_Annualized"].mean()
    std = pdPnl["PercentDTDReturn_Annualized"].std()
    val = mean / std

    return val

@staticmethod
def computeMaxDrawDown(pdPnl, cob):
    pdPnl = pdPnl[(pdPnl['COB'] <= cob) & (pdPnl["DTDRealizedPnl"] < 0)]
    val = pdPnl["DTDRealizedPnl"].min()
    return val

ObjectUtil.py
import sys
import logging
from random import *
from datetime import date
from datetime import datetime
from datetime import timedelta
import inspect

def objectPropertiesToDictionary(o, excludeSystemMembers=True):
    result = {}
    members = inspect.getmembers(o)
    for member in members:
        key = member[0]
        value = member[1]
        if excludeSystemMembers:
            if "__" not in key:
                result[key] = value
        else:
            result[key] = value
    return result

ProfitLoss.py
import datetime
import time
import math

import pandas as pd
import numpy as np

from Util import ObjectUtil

class ProfitLoss(object):
    def set(self, field, val):
        setattr(self, field, val)

    @staticmethod
    def from_dict(dict):
        if dict is None:
            return None

        profitLosses = []
        for k, v in dict.items():
            numPnl = len(v)
            for i in range(0, numPnl):
                pnl = ProfitLoss()
                profitLosses.append(pnl)
            break

        for k, v in dict.items():
            if k == "from_dict":
                break

            i = 0
            for val in v.values():
                if isinstance(val, pd.Timestamp):
                    val = datetime.datetime(val.year, val.month, val.day)

                # Note: "val == np.nan" is always False (nan never equals itself), so test with math.isnan instead
                if isinstance(val, float) and math.isnan(val):
                    val = None

                profitLosses[i].set(k, val)
                i += 1

        return profitLosses
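The hand-rolled `from_dict` above walks the column-oriented dict (and has to count the rows first). If you control both sides, `to_dict('records')` yields one plain dict per row, shrinking the reconstruction to a loop of `setattr` calls. A sketch, assuming a bare `ProfitLoss`-style class (the `Pnl` class and columns here are made up):

```python
import math
import pandas as pd

class Pnl:
    pass

df = pd.DataFrame({"COB": pd.to_datetime(["2018-09-01"]),
                   "DTDRealizedPnl": [float("nan")]})

pnls = []
for record in df.to_dict("records"):      # one {column: value} dict per row
    obj = Pnl()
    for field, val in record.items():
        if isinstance(val, float) and math.isnan(val):
            val = None                    # normalize NaN to None, as from_dict does
        setattr(obj, field, val)
    pnls.append(obj)

print(pnls[0].DTDRealizedPnl)  # None
```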

Unit testing:

def testCalculateAnalyticsFromProfitLossSeries(self):
    # Clean the table first, or assertEqual won't work
    self.dao.deleteProfitLoss(self.source, self.symbol, cob=None, testId=None)

    instrumentId = self.dao.getInstrumentId(self.source, self.symbol)

    NUM_DAYS_HISTORY = 375  # over one year
    NUM_DUMMY_TRADES = 5

    history_end = datetime.today().replace(hour=0, minute=0, second=0, microsecond=0)
    history_start = history_end - timedelta(days=NUM_DAYS_HISTORY)
    num_days_history = (history_end - history_start).days
    num_days_history += 1
    history_dates = [history_end.date() - timedelta(days=x) for x in range(0, num_days_history)]
    history_dates = list(reversed(history_dates))

    markPrice = 0.035
    i = 0
    for histCob in history_dates:
        # If there's no buy+sell pair, you wouldn't get any "RealizedPnl"
        dummyTrades = MarketDataUtil.generateDummyTrades(NUM_DUMMY_TRADES, self.source, self.symbol, histCob, self.strategyId, self.instrumentId, self.tradeQuantities, self.tradePrices, testId=None, randomBuySell=True)

        pnlTuple = PnlCalculatorBase.PnlCalculatorBase.replayPnlCore(histCob, dummyTrades, markPrice)
        if pnlTuple is not None:
            pnl = PnlCalculatorBase.PnlCalculatorBase.pnlTupleToProfitLoss(pnlTuple)
            pnl.COB = histCob
            pnl.InstrumentId = instrumentId
            pnl.TestId = None

            self.dao.persistProfitLoss(histCob, self.source, self.symbol, pnl, testId=None)
        markPrice += 0.0001
        i += 1

    profitLosses = self.dao.fetchProfitLoss(self.source, self.symbol, testId=None)

    self.assertEqual(len(profitLosses), NUM_DAYS_HISTORY + 1)

    updatedProfitLosses = PnlCalculatorBase.PnlCalculatorBase.calculateAnalyticsFromProfitLossSeries(profitLosses)
    for updatedPnl in updatedProfitLosses:
        self.dao.persistProfitLoss(updatedPnl.COB, self.source, self.symbol, updatedPnl, testId=None)

    # manual checking - DTD/MTD/YTD pnl
    lastPnl = updatedProfitLosses[len(updatedProfitLosses) - 1]
    TM1 = lastPnl.COB - timedelta(days=1)
    MonthStart = lastPnl.COB.replace(day=1)
    YearStart = datetime(lastPnl.COB.year, 1, 1)

    tm1Pnl = list(filter(lambda pnl: pnl.COB == TM1, updatedProfitLosses))[0]
    monthStartPnl = list(filter(lambda pnl: pnl.COB == MonthStart, updatedProfitLosses))[0]
    yearStartPnl = list(filter(lambda pnl: pnl.COB == YearStart, updatedProfitLosses))[0]

    expectedDTDRealizedPnl = lastPnl.InceptionRealizedPnl - tm1Pnl.InceptionRealizedPnl
    expectedDTDUnrealizedPnl = lastPnl.InceptionUnrealizedPnl - tm1Pnl.InceptionUnrealizedPnl
    expectedMTDRealizedPnl = lastPnl.InceptionRealizedPnl - monthStartPnl.InceptionRealizedPnl
    expectedMTDUnrealizedPnl = lastPnl.InceptionUnrealizedPnl - monthStartPnl.InceptionUnrealizedPnl
    expectedYTDRealizedPnl = lastPnl.InceptionRealizedPnl - yearStartPnl.InceptionRealizedPnl
    expectedYTDUnrealizedPnl = lastPnl.InceptionUnrealizedPnl - yearStartPnl.InceptionUnrealizedPnl

    self.assertAlmostEqual(lastPnl.DTDRealizedPnl, expectedDTDRealizedPnl, 8)
    self.assertAlmostEqual(lastPnl.DTDUnrealizedPnl, expectedDTDUnrealizedPnl, 8)
    self.assertAlmostEqual(lastPnl.MTDRealizedPnl, expectedMTDRealizedPnl, 8)
    self.assertAlmostEqual(lastPnl.MTDUnrealizedPnl, expectedMTDUnrealizedPnl, 8)
    self.assertAlmostEqual(lastPnl.YTDRealizedPnl, expectedYTDRealizedPnl, 8)
    self.assertAlmostEqual(lastPnl.YTDUnrealizedPnl, expectedYTDUnrealizedPnl, 8)

    # manual checking - MaxDrawdown
    maxDrawdown = 0
    for updatedPnl in updatedProfitLosses:
        if updatedPnl.DTDRealizedPnl is not None and updatedPnl.DTDRealizedPnl < maxDrawdown:
            maxDrawdown = updatedPnl.DTDRealizedPnl
    self.assertAlmostEqual(lastPnl.MaxDrawDown, maxDrawdown, 8)

    # manual checking - Sharpe
    DTDTotalPnlEntries = list(map(lambda pnl: (pnl.DTDRealizedPnl if pnl.DTDRealizedPnl is not None else 0) + (pnl.DTDUnrealizedPnl if pnl.DTDUnrealizedPnl is not None else 0), updatedProfitLosses))
    npDTDTotalPnlEntries = np.asarray(DTDTotalPnlEntries, dtype=float)
    mean = npDTDTotalPnlEntries.mean()
    std = npDTDTotalPnlEntries.std()
    expectedSharpeRatio = mean / std

    self.assertAlmostEqual(lastPnl.SharpeRatio, expectedSharpeRatio, 7)
