Month: October 2018

Interview Prep: Pandas

https://pandas.pydata.org/pandas-docs/stable/cookbook.html#cookbook-selection
https://gridwizard.wordpress.com/2018/10/09/interview-prep-pandas/

pandas DataFrame
Creating a DataFrame, and some rows:
Example 1. No index (or rather, the default index = 0, 1, 2, 3, ...)
import pandas as pd

columns = ['Date', 'direction', 'size', 'ticker', 'tradePrices']
orders = pd.DataFrame(columns=columns)
orders.loc[0] = ['2011-01-10', 'Buy', 1500, 'AAPL', 339.44]
orders.loc[1] = ['2011-01-13', 'Sell', 1500, 'AAPL', 342.64]
orders.loc[2] = ['2011-01-13', 'Buy', 4000, 'IBM', 143.92]
orders.loc[3] = ['2011-01-26', 'Buy', 1000, 'GOOG', 616.50]
orders.loc[4] = ['2011-02-02', 'Sell', 4000, 'XOM', 79.46]
orders.loc[5] = ['2011-02-10', 'Buy', 4000, 'XOM', 79.68]
orders.loc[6] = ['2011-03-03', 'Sell', 1000, 'GOOG', 609.56]
orders.loc[7] = ['2011-03-03', 'Sell', 2200, 'IBM', 158.73]
orders.loc[8] = ['2011-06-03', 'Sell', 3300, 'IBM', 160.97]
orders.loc[9] = ['2011-05-03', 'Buy', 1500, 'IBM', 167.84]
orders.loc[10] = ['2011-06-10', 'Buy', 1200, 'AAPL', 323.03]
orders.loc[11] = ['2011-08-01', 'Buy', 55, 'GOOG', 606.77]
orders.loc[12] = ['2011-08-01', 'Sell', 55, 'GOOG', 606.77]
orders.loc[13] = ['2011-12-20', 'Sell', 1200, 'AAPL', 392.46]
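A note on the pattern above: appending rows one at a time with .loc reallocates on every append, so for anything beyond toy sizes it is usually faster to build the frame from a list of rows in one shot. A minimal sketch (same columns, first two rows only):

```python
import pandas as pd

columns = ['Date', 'direction', 'size', 'ticker', 'tradePrices']
rows = [
    ['2011-01-10', 'Buy', 1500, 'AAPL', 339.44],
    ['2011-01-13', 'Sell', 1500, 'AAPL', 342.64],
]
orders = pd.DataFrame(rows, columns=columns)  # one allocation instead of one per row
```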

To set a value of a particular cell:
orders.iat[0, 1] = "Sell"

Or:
orders.at[0, "direction"] = "Buy"

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iat.html
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.at.html

Example 2. Index = dates
columns = ['AAPL', 'GOOG', 'IBM', 'XOM']
index = ['2011-01-10', '2011-01-13', '2011-01-26', '2011-02-02', '2011-02-10', '2011-03-03', '2011-05-03', '2011-06-03', '2011-06-10', '2011-08-01', '2011-12-20']
prices = pd.DataFrame(columns=columns, index=index)
prices.iloc[0] = [339.441, 614.219, 142.781, 71.571]
prices.iloc[1] = [342.642, 616.698, 143.922, 73.083]
prices.iloc[2] = [340.823, 616.507, 155.743, 75.895]
prices.iloc[3] = [341.294, 612.006, 157.934, 79.467]
prices.iloc[4] = [351.425, 616.445, 159.325, 79.689]
prices.iloc[5] = [356.406, 609.564, 158.736, 82.192]
prices.iloc[6] = [345.147, 533.893, 167.847, 82.004]
prices.iloc[7] = [340.428, 523.082, 160.978, 78.196]
prices.iloc[8] = [323.039, 509.511, 159.149, 76.848]
prices.iloc[9] = [393.261, 606.779, 176.281, 76.671]
prices.iloc[10] = [392.462, 630.378, 184.142, 79.973]

orders['streetPrices'] = prices.lookup(orders.Date, orders.ticker)
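Note that DataFrame.lookup (used above) was deprecated in pandas 1.2 and removed in 2.0. On a newer pandas, the same per-row lookup can be done with get_indexer plus NumPy fancy indexing; a small sketch with a cut-down, made-up price table:

```python
import pandas as pd

prices = pd.DataFrame(
    {'AAPL': [339.44, 342.64], 'IBM': [143.92, 145.00]},
    index=['2011-01-10', '2011-01-13'])

dates = ['2011-01-10', '2011-01-13']
tickers = ['AAPL', 'IBM']

# Equivalent of prices.lookup(dates, tickers):
rows = prices.index.get_indexer(dates)       # positional row indices
cols = prices.columns.get_indexer(tickers)   # positional column indices
looked_up = prices.to_numpy()[rows, cols]    # one value per (row, col) pair
```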

Basic example, showing:
a) initialize DataFrame
b) Calculated field "spread" = bid - offer
c) Merge/join DataFrames
d) Sort by multiple fields

from datetime import datetime

import pandas as pd

exchangeBalances = [
    ['ETHBTC', 'binance', 10],
    ['LTCBTC', 'binance', 10],
    ['XRPBTC', 'binance', 10],
    ['ETHBTC', 'bitfinex', 10],
    ['LTCBTC', 'bitfinex', 10],
    ['XRPBTC', 'bitfinex', 10]
]
bidOffers = [
    ['ETHBTC', 'binance', 0.0035, 0.0351, datetime(2018, 9, 1, 8, 15)],
    ['LTCBTC', 'binance', 0.009, 0.092, datetime(2018, 9, 1, 8, 15)],
    ['XRPBTC', 'binance', 0.000077, 0.000078, datetime(2018, 9, 1, 8, 15)],
    ['ETHBTC', 'bitfinex', 0.003522, 0.0353, datetime(2018, 9, 1, 8, 15)],
    ['LTCBTC', 'bitfinex', 0.0093, 0.095, datetime(2018, 9, 1, 8, 15)],
    ['XRPBTC', 'bitfinex', 0.000083, 0.000085, datetime(2018, 9, 1, 8, 15)],
    ['ETHBTC', 'binance', 0.0035, 0.0351, datetime(2018, 9, 1, 8, 30)],
    ['LTCBTC', 'binance', 0.009, 0.092, datetime(2018, 9, 1, 8, 30)],
    ['XRPBTC', 'binance', 0.000077, 0.000078, datetime(2018, 9, 1, 8, 30)],
    ['ETHBTC', 'bitfinex', 0.003522, 0.0353, datetime(2018, 9, 1, 8, 30)],
    ['LTCBTC', 'bitfinex', 0.0093, 0.095, datetime(2018, 9, 1, 8, 30)],
    ['XRPBTC', 'bitfinex', 0.000083, 0.000085, datetime(2018, 9, 1, 8, 30)],
    ['ETHBTC', 'binance', 0.0035, 0.0351, datetime(2018, 9, 1, 8, 45)],
    ['LTCBTC', 'binance', 0.009, 0.092, datetime(2018, 9, 1, 8, 45)],
    ['XRPBTC', 'binance', 0.000077, 0.000078, datetime(2018, 9, 1, 8, 45)],
    ['ETHBTC', 'bitfinex', 0.003522, 0.0353, datetime(2018, 9, 1, 8, 45)],
    ['LTCBTC', 'bitfinex', 0.0093, 0.095, datetime(2018, 9, 1, 8, 45)],
    ['XRPBTC', 'bitfinex', 0.000083, 0.000085, datetime(2018, 9, 1, 8, 45)]
]
dfExchangeBalances = pd.DataFrame(exchangeBalances, columns=['symbol', 'exchange', 'balance'])
dfBidOffers = pd.DataFrame(bidOffers, columns=['ticker', 'exchange', 'bid', 'offer', 'created'])
dfBidOffers["spread"] = dfBidOffers["bid"] - dfBidOffers["offer"]
dfSummary = dfExchangeBalances.merge(dfBidOffers, how='left', left_on=['symbol', 'exchange'], right_on=['ticker', 'exchange'])
dfSummary = dfSummary.sort_values(by=['symbol', 'exchange', 'created'], ascending=[True, True, False])

>>> dfExchangeBalances
symbol exchange balance
0 ETHBTC binance 10
1 LTCBTC binance 10
2 XRPBTC binance 10
3 ETHBTC bitfinex 10
4 LTCBTC bitfinex 10
5 XRPBTC bitfinex 10
>>> dfBidOffers
ticker exchange bid offer
0 ETHBTC binance 0.003500 0.035100
1 LTCBTC binance 0.009000 0.092000
2 XRPBTC binance 0.000077 0.000078
3 ETHBTC bitfinex 0.003522 0.035300
4 LTCBTC bitfinex 0.009300 0.095000
5 XRPBTC bitfinex 0.000083 0.000085
>>> dfBidOffers
symbol exchange … created spread
0 ETHBTC binance … 2018-09-01 08:15:00 -3.160000e-02
1 LTCBTC binance … 2018-09-01 08:15:00 -8.300000e-02
2 XRPBTC binance … 2018-09-01 08:15:00 -1.000000e-06
3 ETHBTC bitfinex … 2018-09-01 08:15:00 -3.177800e-02
4 LTCBTC bitfinex … 2018-09-01 08:15:00 -8.570000e-02
5 XRPBTC bitfinex … 2018-09-01 08:15:00 -2.000000e-06

UNION - use pandas.concat (note you can concatenate vertically or horizontally depending on the "axis" argument)
"concat" vs "merge"?
https://www.tutorialspoint.com/python_pandas/python_pandas_concatenation.htm
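A minimal sketch of the axis distinction: axis=0 stacks frames vertically (the SQL-UNION case), while axis=1 places them side by side, aligning on the index.

```python
import pandas as pd

a = pd.DataFrame({'x': [1, 2]})
b = pd.DataFrame({'x': [3, 4]})

tall = pd.concat([a, b], axis=0, ignore_index=True)  # UNION: stack rows
wide = pd.concat([a, b], axis=1)                     # side by side, aligned on index
```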

Reindex:
https://pandas.pydata.org/pandas-docs/stable/advanced.html
https://chrisalbon.com/python/data_wrangling/pandas_dataframe_reindexing/

Example reindex columns (not rows):
import pandas as pd
import numpy as np

columns = ['BucketLabel', 'price', 'QTY', 'BidOrAsk']
pdBidsBinance = pd.DataFrame(columns=columns)
pdBidsBinance.loc[0] = ['40-41', 40.38, 100, 'BUY']
pdBidsBinance.loc[1] = ['40-41', 40.381, 200, 'BUY']
pdBidsBinance.loc[2] = ['40-41', 40.51, 300, 'BUY']
pdBidsBinance.loc[3] = ['41-42', 41.3, 150, 'BUY']
pdBidsBinance.loc[4] = ['41-42', 41.51, 100, 'BUY']
pdBidsBinance.loc[5] = ['41-42', 41.81, 200, 'BUY']
pdBidsBinance.loc[6] = ['42-43', 42.78, 300, 'BUY']
pdBidsBinance.loc[7] = ['42-43', 42.31, 200, 'BUY']
pdBidsBinance.loc[8] = ['42-43', 42.88, 500, 'BUY']

pdBidsBinance = pdBidsBinance.reindex(columns=["BucketLabel", "price", "QTY"])  # exclude "BidOrAsk" (or any other fields you don't need)
pdBidsBinance["TradeConsideration"] = pdBidsBinance["price"] * pdBidsBinance["QTY"]
pdBidsBinance.set_index("BucketLabel", inplace=True)
pdBidsBinance.columns = pd.MultiIndex.from_product([["Binance"], ["price", "QTY", "TradeConsideration"]])

pdAsksKraken = pd.DataFrame(columns=columns)
pdAsksKraken.loc[0] = ['40-41', 40.28, 200, 'SELL']
pdAsksKraken.loc[1] = ['40-41', 40.181, 200, 'SELL']
pdAsksKraken.loc[2] = ['40-41', 40.31, 100, 'SELL']
pdAsksKraken.loc[3] = ['41-42', 41.1, 500, 'SELL']
pdAsksKraken.loc[4] = ['41-42', 41.21, 150, 'SELL']
pdAsksKraken.loc[5] = ['41-42', 41.21, 700, 'SELL']
pdAsksKraken.loc[6] = ['42-43', 42.68, 100, 'SELL']
pdAsksKraken.loc[7] = ['42-43', 42.11, 200, 'SELL']
pdAsksKraken.loc[8] = ['42-43', 42.3, 300, 'SELL']

pdAsksKraken = pdAsksKraken.reindex(columns=["BucketLabel", "price", "QTY"])  # exclude "BidOrAsk" (or any other fields you don't need)
pdAsksKraken["TradeConsideration"] = pdAsksKraken["price"] * pdAsksKraken["QTY"]
pdAsksKraken.set_index("BucketLabel", inplace=True)
pdAsksKraken.columns = pd.MultiIndex.from_product([["Kraken"], ["price", "QTY", "TradeConsideration"]])

pdTrade = pd.concat([pdBidsBinance, pdAsksKraken], axis=1)

pdTrade["Summary", "Spread"] = pdTrade.loc[:, ("Binance", "price")] - pdTrade.loc[:, ("Kraken", "price")]

>>> pdTrade.head()
Binance Kraken Summary
price QTY TradeConsideration price QTY TradeConsideration Spread
BucketLabel
40-41 40.380 100 4038 40.280 200 8056 0.1
40-41 40.381 200 8076.2 40.181 200 8036.2 0.2
40-41 40.510 300 12153 40.310 100 4031 0.2
41-42 41.300 150 6195 41.100 500 20550 0.2
41-42 41.510 100 4151 41.210 150 6181.5 0.3
>>>

filtering
https://chrisalbon.com/python/data_wrangling/pandas_selecting_rows_on_conditions/
dfSummary = dfSummary[(dfSummary['created'] > datetime(2018, 9, 1, 8, 15)) & (dfSummary['exchange'] == "binance")]

Alternatively, you can use "query" (you can even chain queries):
dfSummary = dfSummary.query("'2018-09-01 08:15:00' < created <= '2018-09-01 09:15:00' & exchange == 'binance'")

CAUTION: when combining boolean conditions with & or |, you must parenthesize each condition; otherwise operator precedence binds the comparisons incorrectly.
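A small sketch of query chaining: each .query call returns a DataFrame, so calls compose left to right (the data below is made up for illustration).

```python
import pandas as pd

df = pd.DataFrame({'exchange': ['binance', 'bitfinex', 'binance'],
                   'spread': [-0.03, -0.02, -0.01]})

# First query narrows to one exchange, second narrows by spread
out = df.query("exchange == 'binance'").query("spread > -0.02")
```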

lookup:
https://stackoverflow.com/questions/52583677/pandas-dataframe-lookup

slice loc/iloc (here: all rows, a subset of columns)
dfSummary.loc[:, ['symbol', 'exchange', 'spread']]

iterations
for item in dfSummary.loc[:, ['symbol', 'exchange', 'spread']].itertuples():
    # item[0] = 2 (the index, int)
    # item[1] = 'ETHBTC' (str)
    # item[2] = 'binance' (str)
    # item[3] = -0.031599999999999996 (float)
    itemIndex = item[0]
    symbol = item[1]
    exchange = item[2]
    spread = item[3]

aggregate/group by
groups = dfSummary.groupby(["symbol", "exchange"])

for group in groups:
    groupKey = group[0]
    symbol = groupKey[0]
    exchange = groupKey[1]
    maxBid = group[1]["bid"].max()

Single aggregate measure:
dfSummary = dfSummary.groupby(["symbol", "exchange"])["spread"].max()

>>> dfSummary
symbol exchange
ETHBTC binance -3.160000e-02
bitfinex -3.177800e-02
LTCBTC binance -8.300000e-02
bitfinex -8.570000e-02
XRPBTC binance -1.000000e-06
bitfinex -2.000000e-06
Name: spread, dtype: float64

https://www.shanelynn.ie/summarising-aggregation-and-grouping-data-in-python-pandas/
Multiple aggregate measures:
dfSummary = dfSummary.groupby(["symbol", "exchange"]).agg({"spread": "max", "bid": "min"})
dfSummary.columns = ["maxSpread", "minBid"]
>>> dfSummary
maxSpread minBid
symbol exchange
ETHBTC binance -3.160000e-02 0.003500
bitfinex -3.177800e-02 0.003522
LTCBTC binance -8.300000e-02 0.009000
bitfinex -8.570000e-02 0.009300
XRPBTC binance -1.000000e-06 0.000077
bitfinex -2.000000e-06 0.000083
>>>
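On pandas 0.25+ you can also name the output columns directly inside agg ("named aggregation"), which avoids the separate columns-rename step above. A sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'symbol': ['ETHBTC', 'ETHBTC', 'LTCBTC'],
                   'spread': [-0.03, -0.02, -0.08],
                   'bid': [0.0035, 0.0036, 0.009]})

# keyword = (input column, aggregation function)
summary = df.groupby('symbol').agg(
    maxSpread=('spread', 'max'),
    minBid=('bid', 'min'))
```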

Pivot: groupby/agg vs pd.pivot_table?
They produce the same aggregates; see the example below. "dfPivot2" holds the same data as "dfGrouped" (pivot_table returns a DataFrame, while this groupby returns a Series).
http://pbpython.com/pandas-pivot-table-explained.html
https://stackoverflow.com/questions/34702815/pandas-group-by-and-pivot-table-difference

dfOriginal = pd.DataFrame({"a": [1, 2, 3, 1, 2, 3], "b": [1, 1, 1, 2, 2, 2], "c": np.random.rand(6)})
dfPivot = pd.pivot_table(dfOriginal, index=["a"], columns=["b"], values=["c"], aggfunc=np.sum)
dfPivot2 = pd.pivot_table(dfOriginal, index=["a", "b"], values=["c"], aggfunc=np.sum)
dfGrouped = dfOriginal.groupby(['a', 'b'])['c'].sum()

>>> dfOriginal
a b c
0 1 1 0.486374
1 2 1 0.020761
2 3 1 0.980307
3 1 2 0.105447
4 2 2 0.026814
5 3 2 0.546601
>>> dfPivot
c
b 1 2
a
1 0.486374 0.105447
2 0.020761 0.026814
3 0.980307 0.546601
>>> dfPivot2
c
a b
1 1 0.486374
2 0.105447
2 1 0.020761
2 0.026814
3 1 0.980307
2 0.546601
>>> dfGrouped
a b
1 1 0.486374
2 0.105447
2 1 0.020761
2 0.026814
3 1 0.980307
2 0.546601
Name: c, dtype: float64
>>>
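The equivalence also runs through unstack: taking the groupby result and unstacking the "b" level reproduces the two-dimensional pivot_table layout. A sketch with deterministic data instead of np.random:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 1, 2], 'b': [1, 1, 2, 2],
                   'c': [10.0, 20.0, 30.0, 40.0]})

grouped = df.groupby(['a', 'b'])['c'].sum()  # Series with MultiIndex (a, b)
pivoted = grouped.unstack('b')               # same layout as pivot_table(index=['a'], columns=['b'])
```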

if-then logic:
https://pandas.pydata.org/pandas-docs/stable/cookbook.html#cookbook-selection
In [1]: df = pd.DataFrame(
   ...:     {'AAA': [4, 5, 6, 7], 'BBB': [10, 20, 30, 40], 'CCC': [100, 50, -30, -50]}); df
   ...:
Out[1]:
AAA BBB CCC
0 4 10 100
1 5 20 50
2 6 30 -30
3 7 40 -50

An if-then with assignment to 2 columns:

In [3]: df.loc[df.AAA >= 5, ['BBB', 'CCC']] = 555; df
Out[3]:
AAA BBB CCC
0 4 10 100
1 5 555 555
2 6 555 555
3 7 555 555

Better yet, use np.where to accomplish this:
https://chrisalbon.com/python/data_wrangling/pandas_create_column_using_conditional/

pdPnl["DTDRealizedPnl"] = pdPnl["InceptionRealizedPnl"] - np.where(np.isnan(pdPnl["InceptionRealizedPnl_tm1"]), 0, pdPnl["InceptionRealizedPnl_tm1"])
pdPnl["DTDUnrealizedPnl"] = pdPnl["InceptionUnrealizedPnl"] - np.where(np.isnan(pdPnl["InceptionUnrealizedPnl_tm1"]), 0, pdPnl["InceptionUnrealizedPnl_tm1"])

masking:
c1 = dfSummary['exchange'] == "binance"
c2 = dfSummary['created'] >= datetime(2018, 9, 1, 8, 30)
criteria = c1 | c2
maskedSummary = dfSummary.mask(criteria)

>>> maskedSummary
symbol exchange balance … offer created spread
2 NaN NaN NaN … NaN NaT NaN
1 NaN NaN NaN … NaN NaT NaN
0 NaN NaN NaN … NaN NaT NaN
11 NaN NaN NaN … NaN NaT NaN
10 NaN NaN NaN … NaN NaT NaN
9 ETHBTC bitfinex 10.0 … 0.035300 2018-09-01 08:15:00 -0.031778
5 NaN NaN NaN … NaN NaT NaN
4 NaN NaN NaN … NaN NaT NaN
3 NaN NaN NaN … NaN NaT NaN
14 NaN NaN NaN … NaN NaT NaN
13 NaN NaN NaN … NaN NaT NaN
12 LTCBTC bitfinex 10.0 … 0.095000 2018-09-01 08:15:00 -0.085700
8 NaN NaN NaN … NaN NaT NaN
7 NaN NaN NaN … NaN NaT NaN
6 NaN NaN NaN … NaN NaT NaN
17 NaN NaN NaN … NaN NaT NaN
16 NaN NaN NaN … NaN NaT NaN
15 XRPBTC bitfinex 10.0 … 0.000085 2018-09-01 08:15:00 -0.000002
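Note that mask keeps the frame's shape and blanks out matching rows (as the output above shows), whereas plain boolean indexing drops them. A side-by-side sketch with a cut-down version of the data:

```python
from datetime import datetime

import pandas as pd

df = pd.DataFrame({'exchange': ['binance', 'bitfinex'],
                   'created': [datetime(2018, 9, 1, 8, 15),
                               datetime(2018, 9, 1, 8, 15)]})
criteria = df['exchange'] == 'binance'

masked = df.mask(criteria)   # same shape; matching rows become NaN/NaT
filtered = df[~criteria]     # matching rows dropped entirely
```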

None vs np.nan handling: beware the differences between how numpy and pandas treat missing values.
import pandas as pd
import numpy as np

nums1 = [0, 10, 20, 30, 40, 50]
nums2 = [np.nan, 10, 20, 30, 40, 50]
nums3 = [None, 10, 20, 30, 40, 50]
nums4 = [num for num in nums1 if num!=0]

npNums1 = np.asarray(nums1, dtype = float)
npNums2 = np.asarray(nums2, dtype = float)
npNums3 = np.asarray(nums3, dtype = float)
npNums4 = np.asarray(nums4, dtype = float)

mean1 = npNums1.mean() # 25
mean2 = npNums2.mean() # nan (One item np.nan)
mean3 = npNums3.mean() # nan (One item is None)
mean4 = npNums4.mean() # 30 (Nan item filtered)

data = {
    "nums1": nums1,
    "nums2": nums2,
    "nums3": nums3
}

pdNums = pd.DataFrame.from_records(data)
mean1 = pdNums["nums1"].mean() # 25
mean2 = pdNums["nums2"].mean() # 30 (One item np.nan, skipped)
mean3 = pdNums["nums3"].mean() # 30 (One item None, skipped)

print("done")

Multi-index
https://pandas.pydata.org/pandas-docs/stable/advanced.html

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'], ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = pd.Series(np.random.randn(8), index=index)

>> s
first second
bar one 0.469112
two -0.282863
baz one -1.509059
two -1.135632
foo one 1.212112
two -0.173215
qux one 0.119209
two -1.044236
dtype: float64
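Once the MultiIndex is in place, you can select by a single level: .loc for the outer level, .xs for a cross-section on any named level. A small sketch on a shortened version of the series above, with fixed values:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('bar', 'one'), ('bar', 'two'), ('baz', 'one')],
    names=['first', 'second'])
s = pd.Series([1.0, 2.0, 3.0], index=index)

outer = s.loc['bar']                 # partial selection on level 'first'
inner = s.xs('one', level='second')  # cross-section on level 'second'
```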

formula fields
https://pythonprogramming.net/pandas-column-operations-calculations/
https://www.tutorialspoint.com/python_pandas/python_pandas_function_application.htm
Row function: https://stackoverflow.com/questions/13331698/how-to-apply-a-function-to-two-columns-of-pandas-dataframe
pipe (Table level): http://jose-coto.com/pipes-with-pandas
data = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3'])
# x = 3: add 3 to each cell (note pipe passes the whole DataFrame, not a cell, as the first argument)
df = df.pipe(lambda frame, x: frame + x, 3)

apply (Per row): http://jonathansoma.com/lede/foundations/classes/pandas%20columns%20and%20functions/apply-a-function-to-every-row-in-a-pandas-dataframe/
data = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3'])
df = df.apply(lambda cellVal: cellVal * cellVal)
df = df['col3'].apply(lambda cellVal: cellVal * cellVal)
>>> df
col1 col2 col3
0 1 1 1
1 2 2 2
2 3 3 3
>>> df
col1 col2 col3
0 1 1 1
1 4 4 4
2 9 9 9
>>> df
0 1
1 16
2 81

def rowTransform(row):
    return (row["col1"] + row["col3"]) * 3

data = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3'])
df["col4"] = df["col1"] + df["col2"]
df["col5"] = df.apply(rowTransform, axis=1)


applymap (Per cell):
df = pd.DataFrame(np.random.randn(5, 3), columns=['col1', 'col2', 'col3'])
df = df.applymap(lambda x: x * 100)
print(df.apply(np.mean))

Create Pandas DataFrame from Object
Step 1. ObjectUtil
import inspect

def objectPropertiesToDictionary(o, excludeSystemMembers=True):
    result = {}
    members = inspect.getmembers(o)
    for member in members:
        key = member[0]
        value = member[1]
        if excludeSystemMembers:
            if "__" not in key:
                result[key] = value
        else:
            result[key] = value
    return result

Step 2. Convert list of objects to dictionary using python reflection (i.e. inspect package)
pdPnl = pd.DataFrame.from_records([ObjectUtil.objectPropertiesToDictionary(pnl) for pnl in profitLosses], columns=profitLosses[0].to_dict().keys())

From Pandas DataFrame to Dictionary:
https://stackoverflow.com/questions/26716616/convert-a-pandas-dataframe-to-a-dictionary
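The direction matters because to_dict has several orientations: the default is column-oriented (column -> {index -> value}), while orient='records' gives one dict per row, a shape that round-trips nicely with from_records. A sketch:

```python
import pandas as pd

df = pd.DataFrame({'symbol': ['ETHBTC', 'LTCBTC'], 'balance': [10, 20]})

by_column = df.to_dict()                # {'symbol': {0: 'ETHBTC', 1: 'LTCBTC'}, ...}
by_row = df.to_dict(orient='records')   # [{'symbol': 'ETHBTC', 'balance': 10}, ...]
```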

vlookup
https://stackoverflow.com/questions/25935431/pandas-lookup-based-on-value

Vectorization:
https://stackoverflow.com/questions/13893227/vectorized-look-up-of-values-in-pandas-dataframe
https://engineering.upside.com/a-beginners-guide-to-optimizing-pandas-code-for-speed-c09ef2c6a4d6
https://www.datascience.com/blog/straightening-loops-how-to-vectorize-data-aggregation-with-pandas-and-numpy/
https://realpython.com/numpy-array-programming/
https://stackoverflow.com/questions/52564186/python-pandas-lookup-another-row-calculated-field
https://stackoverflow.com/questions/52583677/pandas-dataframe-lookup
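The core idea behind those links, in miniature: replace a per-row Python loop with one whole-column operation. Both versions below compute the same spread column; the vectorized one performs a single NumPy subtraction instead of one Python-level subtraction per row.

```python
import pandas as pd

df = pd.DataFrame({'bid': [0.0035, 0.009], 'offer': [0.0351, 0.092]})

# Loop version (slow): one Python-level operation per row
loop_spread = [row.bid - row.offer for row in df.itertuples()]

# Vectorized version (fast): one NumPy operation over the whole column
vec_spread = (df['bid'] - df['offer']).tolist()
```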

Memory usage optimization:
https://www.dataquest.io/blog/pandas-big-data/
http://pbpython.com/pandas_dtypes.html
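Both articles above mostly boil down to choosing narrower dtypes. A sketch showing "category" for low-cardinality strings and downcast for integers; the row count here is arbitrary, just enough to make the saving visible:

```python
import pandas as pd

df = pd.DataFrame({'exchange': ['binance', 'bitfinex'] * 1000,
                   'balance': [10, 10] * 1000})

before = df.memory_usage(deep=True).sum()

df['exchange'] = df['exchange'].astype('category')                  # few distinct values
df['balance'] = pd.to_numeric(df['balance'], downcast='integer')    # int64 -> int8 here

after = df.memory_usage(deep=True).sum()
```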

BIG EXAMPLE:

@staticmethod
def calculateAnalyticsFromProfitLossSeries(profitLosses):
    if not profitLosses:
        return

    # Why Pandas? It's much quicker than looping in Python:
    # https://engineering.upside.com/a-beginners-guide-to-optimizing-pandas-code-for-speed-c09ef2c6a4d6
    # https://stackoverflow.com/questions/52564186/python-pandas-lookup-another-row-calculated-field
    # Thoughts on the performance of the code below:
    # 1) ObjectUtil.objectPropertiesToDictionary uses reflection (i.e. inspect), which is slow - but unless you hard-code the fields in ProfitLoss.to_dict(), you can't get around this.
    # 2) pdPnl.to_dict() at the end, and the transformation back into list(ProfitLoss), are also costs of using Pandas (as opposed to simply looping over the list in Python).

    # ProfitLoss fields:
    # Id, InstrumentId, TestId, COB, Balance, MarkPrice, AverageCost, InceptionRealizedPnl, InceptionUnrealizedPnl,
    # DTDRealizedPnl, DTDUnrealizedPnl, MTDRealizedPnl, MTDUnrealizedPnl, YTDRealizedPnl, YTDUnrealizedPnl,
    # SharpeRatio, MaxDrawDown, Created, Updated
    pdPnl = pd.DataFrame.from_records([pnl.to_dict() for pnl in profitLosses])
    pdPnl = pdPnl.merge(pdPnl, how='left', left_on=["TM1"], right_on=["COB"], suffixes=('', '_tm1'))
    pdPnl = pdPnl.merge(pdPnl, how='left', left_on=["MonthStart"], right_on=["COB"], suffixes=('', '_MonthStart'))
    pdPnl = pdPnl.merge(pdPnl, how='left', left_on=["QuarterStart"], right_on=["COB"], suffixes=('', '_QuarterStart'))
    pdPnl = pdPnl.merge(pdPnl, how='left', left_on=["YearStart"], right_on=["COB"], suffixes=('', '_YearStart'))

    # Vectorized
    # Note: on day one of trading there are no TM1 records; np.where maps the resulting NaN to 0.
    pdPnl["DTDRealizedPnl"] = pdPnl["InceptionRealizedPnl"] - np.where(np.isnan(pdPnl["InceptionRealizedPnl_tm1"]), 0, pdPnl["InceptionRealizedPnl_tm1"])
    pdPnl["DTDUnrealizedPnl"] = pdPnl["InceptionUnrealizedPnl"] - np.where(np.isnan(pdPnl["InceptionUnrealizedPnl_tm1"]), 0, pdPnl["InceptionUnrealizedPnl_tm1"])
    pdPnl["TotalDTD"] = pdPnl["DTDRealizedPnl"] + pdPnl["DTDUnrealizedPnl"]

    # Annualizing DTD return: https://financetrain.com/how-to-calculate-annualized-returns/
    pdPnl["PercentDTDReturn"] = (pdPnl["TotalDTD"] / pdPnl["Balance"]) * 100
    pdPnl["PercentDTDReturn_Annualized"] = ((pdPnl["PercentDTDReturn"] / 100 + 1) ** 365 - 1) * 100

    pdPnl["MTDRealizedPnl"] = pdPnl["InceptionRealizedPnl"] - pdPnl["InceptionRealizedPnl_MonthStart"]
    pdPnl["MTDUnrealizedPnl"] = pdPnl["InceptionUnrealizedPnl"] - pdPnl["InceptionUnrealizedPnl_MonthStart"]
    pdPnl["YTDRealizedPnl"] = pdPnl["InceptionRealizedPnl"] - pdPnl["InceptionRealizedPnl_YearStart"]
    pdPnl["YTDUnrealizedPnl"] = pdPnl["InceptionUnrealizedPnl"] - pdPnl["InceptionUnrealizedPnl_YearStart"]

    # Not yet vectorized
    pdPnl["SharpeRatio"] = pdPnl.apply(lambda rw: PnlCalculatorBase.computeSharpeRatio(pdPnl, rw["COB"]), axis=1)
    pdPnl["MaxDrawDown"] = pdPnl.apply(lambda rw: PnlCalculatorBase.computeMaxDrawDown(pdPnl, rw["COB"]), axis=1)

    pnlDict = pdPnl.to_dict()
    updatedProfitLosses = ProfitLoss.ProfitLoss.from_dict(pnlDict)
    return updatedProfitLosses

Helpers:
@staticmethod
def computeSharpeRatio(pdPnl, cob):
    # Please read "BEWARE differences in how Numpy and Pandas handle NaN or None!" above.
    # Pandas filters NaN/None under PercentDTDReturn_Annualized automatically when computing "mean"; numpy doesn't.
    # Note also "std" depends on "mean"!
    # See PnlCalculatorTests.testCalculateAnalyticsFromProfitLossSeries (at the bottom).
    # Note: `series is not None` and `series != np.nan` do NOT detect missing values; use .notna().
    pdPnl = pdPnl[(pdPnl['COB'] <= cob) & pdPnl["PercentDTDReturn_Annualized"].notna()]
    pdPnl = pdPnl.loc[:, ["COB", "PercentDTDReturn_Annualized"]]

    # @todo We don't have a risk-free rate for the Sharpe Ratio calc. This is just the average annualized DTD return over its standard deviation.
    # https://en.wikipedia.org/wiki/Sharpe_ratio
    mean = pdPnl["PercentDTDReturn_Annualized"].mean()
    std = pdPnl["PercentDTDReturn_Annualized"].std()
    val = mean / std

    return val

@staticmethod
def computeMaxDrawDown(pdPnl, cob):
    pdPnl = pdPnl[(pdPnl['COB'] <= cob) & (pdPnl["DTDRealizedPnl"] < 0)]
    val = pdPnl["DTDRealizedPnl"].min()
    return val

ObjectUtil.py
import sys
import logging
from random import *
from datetime import date
from datetime import datetime
from datetime import timedelta
import inspect

def objectPropertiesToDictionary(o, excludeSystemMembers=True):
    result = {}
    members = inspect.getmembers(o)
    for member in members:
        key = member[0]
        value = member[1]
        if excludeSystemMembers:
            if "__" not in key:
                result[key] = value
        else:
            result[key] = value
    return result

ProfitLoss.py
import datetime
import time
import math

import pandas as pd
import numpy as np

from Util import ObjectUtil

class ProfitLoss(object):
    def set(self, field, val):
        setattr(self, field, val)

    @staticmethod
    def from_dict(dict):
        if dict is None:
            return None

        profitLosses = []
        for k, v in dict.items():
            numPnl = len(v)
            for i in range(0, numPnl):
                pnl = ProfitLoss()
                profitLosses.append(pnl)
            break

        for k, v in dict.items():
            if k == "from_dict":
                break

            i = 0
            for val in v.values():
                if isinstance(val, pd.Timestamp):
                    val = datetime.datetime(val.year, val.month, val.day)

                # Note: `val == np.nan` is always False; NaN must be detected with math.isnan
                if isinstance(val, float) and math.isnan(val):
                    val = None

                profitLosses[i].set(k, val)
                i += 1

        return profitLosses

Unit testing:

def testCalculateAnalyticsFromProfitLossSeries(self):
    # Clean the table first, or assertEqual won't work
    self.dao.deleteProfitLoss(self.source, self.symbol, cob=None, testId=None)

    instrumentId = self.dao.getInstrumentId(self.source, self.symbol)

    NUM_DAYS_HISTORY = 375  # over one year
    NUM_DUMMY_TRADES = 5

    history_end = datetime.today().replace(hour=0, minute=0, second=0, microsecond=0)
    history_start = history_end - timedelta(days=NUM_DAYS_HISTORY)
    num_days_history = (history_end - history_start).days
    num_days_history += 1
    history_dates = [history_end.date() - timedelta(days=x) for x in range(0, num_days_history)]
    history_dates = list(reversed(history_dates))

    markPrice = 0.035
    i = 0
    for histCob in history_dates:
        # If there is no buy+sell pair, you wouldn't get "RealizedPnl"
        dummyTrades = MarketDataUtil.generateDummyTrades(NUM_DUMMY_TRADES, self.source, self.symbol, histCob, self.strategyId, self.instrumentId, self.tradeQuantities, self.tradePrices, testId=None, randomBuySell=True)

        pnlTuple = PnlCalculatorBase.PnlCalculatorBase.replayPnlCore(histCob, dummyTrades, markPrice)
        if pnlTuple is not None:
            pnl = PnlCalculatorBase.PnlCalculatorBase.pnlTupleToProfitLoss(pnlTuple)
            pnl.COB = histCob
            pnl.InstrumentId = instrumentId
            pnl.TestId = None

            self.dao.persistProfitLoss(histCob, self.source, self.symbol, pnl, testId=None)
        markPrice += 0.0001
        i += 1

    profitLosses = self.dao.fetchProfitLoss(self.source, self.symbol, testId=None)

    self.assertEqual(len(profitLosses), NUM_DAYS_HISTORY + 1)

    updatedProfitLosses = PnlCalculatorBase.PnlCalculatorBase.calculateAnalyticsFromProfitLossSeries(profitLosses)
    for updatedPnl in updatedProfitLosses:
        self.dao.persistProfitLoss(updatedPnl.COB, self.source, self.symbol, updatedPnl, testId=None)

    # manual checking - DTD/MTD/YTD pnl
    lastPnl = updatedProfitLosses[len(updatedProfitLosses) - 1]
    TM1 = lastPnl.COB - timedelta(days=1)
    MonthStart = lastPnl.COB.replace(day=1)
    YearStart = datetime(lastPnl.COB.year, 1, 1)

    tm1Pnl = list(filter(lambda pnl: pnl.COB == TM1, updatedProfitLosses))[0]
    monthStartPnl = list(filter(lambda pnl: pnl.COB == MonthStart, updatedProfitLosses))[0]
    yearStartPnl = list(filter(lambda pnl: pnl.COB == YearStart, updatedProfitLosses))[0]

    expectedDTDRealizedPnl = lastPnl.InceptionRealizedPnl - tm1Pnl.InceptionRealizedPnl
    expectedDTDUnrealizedPnl = lastPnl.InceptionUnrealizedPnl - tm1Pnl.InceptionUnrealizedPnl
    expectedMTDRealizedPnl = lastPnl.InceptionRealizedPnl - monthStartPnl.InceptionRealizedPnl
    expectedMTDUnrealizedPnl = lastPnl.InceptionUnrealizedPnl - monthStartPnl.InceptionUnrealizedPnl
    expectedYTDRealizedPnl = lastPnl.InceptionRealizedPnl - yearStartPnl.InceptionRealizedPnl
    expectedYTDUnrealizedPnl = lastPnl.InceptionUnrealizedPnl - yearStartPnl.InceptionUnrealizedPnl

    self.assertAlmostEqual(lastPnl.DTDRealizedPnl, expectedDTDRealizedPnl, 8)
    self.assertAlmostEqual(lastPnl.DTDUnrealizedPnl, expectedDTDUnrealizedPnl, 8)
    self.assertAlmostEqual(lastPnl.MTDRealizedPnl, expectedMTDRealizedPnl, 8)
    self.assertAlmostEqual(lastPnl.MTDUnrealizedPnl, expectedMTDUnrealizedPnl, 8)
    self.assertAlmostEqual(lastPnl.YTDRealizedPnl, expectedYTDRealizedPnl, 8)
    self.assertAlmostEqual(lastPnl.YTDUnrealizedPnl, expectedYTDUnrealizedPnl, 8)

    # manual checking - MaxDrawdown
    maxDrawdown = 0
    for updatedPnl in updatedProfitLosses:
        if updatedPnl.DTDRealizedPnl is not None and updatedPnl.DTDRealizedPnl < maxDrawdown:
            maxDrawdown = updatedPnl.DTDRealizedPnl
    self.assertAlmostEqual(lastPnl.MaxDrawDown, maxDrawdown, 8)

    # manual checking - Sharpe
    DTDTotalPnlEntries = list(map(lambda pnl: (pnl.DTDRealizedPnl if pnl.DTDRealizedPnl is not None else 0) + (pnl.DTDUnrealizedPnl if pnl.DTDUnrealizedPnl is not None else 0), updatedProfitLosses))
    npDTDTotalPnlEntries = np.asarray(DTDTotalPnlEntries, dtype=float)
    mean = npDTDTotalPnlEntries.mean()
    std = npDTDTotalPnlEntries.std()
    expectedSharpeRatio = mean / std

    self.assertAlmostEqual(lastPnl.SharpeRatio, expectedSharpeRatio, 7)


Interview Prep: Python core

Python Interview Prep
https://gridwizard.wordpress.com/2018/10/09/interview-prep-python-core/

Python v3 vs v2? Python 3 was released in 2008 and was a breaking change.
http://sebastianraschka.com/Articles/2014_python_2_3_key_diff.html

Python vs C# vs Java?
https://docs.google.com/spreadsheets/d/1paTdgWUtA0uyXpPwoT3ZKIq58U0GVWQQCXQWTABh_c8/edit?usp=sharing

Python performance:
https://www.quora.com/Is-it-possible-that-Python-can-run-faster-than-C-Why
Pypy vs CPython: https://pypy.org/
Vectorization:
https://engineering.upside.com/a-beginners-guide-to-optimizing-pandas-code-for-speed-c09ef2c6a4d6?gi=1d738b7536d4
https://realpython.com/numpy-array-programming/
Ctype: http://www.maxburstein.com/blog/speeding-up-your-python-code/

What's "__init__.py"? It designates a folder as a Python "package". "__init__.py" can be an empty file: https://stackoverflow.com/questions/448271/what-is-init-py-for

__init__ vs __new__?
__new__ creates the instance; it runs first and receives the class (cls), not "self".
__init__ initializes the instance that __new__ returned; it runs afterwards and receives "self".
If __new__ returns something other than an instance of the class, that instance's __init__ method will not be invoked!

http://howto.lintel.in/python-__new__-magic-method-explained/
https://spyhce.com/blog/understanding-new-and-init
https://www.quora.com/What-difference-among-methods-__new__-__init__-and-__call__-in-Python-metaclass
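A minimal sketch of the ordering: a hypothetical Tracked class records which hook runs first.

```python
class Tracked:
    calls = []  # class-level log of which hook ran

    def __new__(cls, *args):
        cls.calls.append('new')       # runs first; receives cls, not self
        return super().__new__(cls)

    def __init__(self, value):
        Tracked.calls.append('init')  # runs on the instance __new__ returned
        self.value = value

t = Tracked(42)
```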

What's __name__?
https://www.geeksforgeeks.org/what-does-the-if-__name__-__main__-do/
print("Always executed")

if __name__ == "__main__":
    print("Executed when invoked directly")
else:
    print("Executed when imported")

Constructor and Destructor?
class TestClass:
    def __init__(self):
        print("constructor")

    def __del__(self):
        print("destructor")

if __name__ == "__main__":
    obj = TestClass()
    del obj
https://helloacm.com/constructor-and-destructor-in-python-classes/

"del" vs setting to "None"? For garbage collection there is little difference: both drop one reference to the object ("del" also unbinds the name, while assigning None keeps it bound).
Note, however, that exceptions raised from "__del__" are ignored by Python.
https://python-3-patterns-idioms-test.readthedocs.io/en/latest/InitializationAndCleanup.html
https://www.holger-peters.de/an-interesting-fact-about-the-python-garbage-collector.html

Also, in the event of a circular reference, __del__ may not be invoked. In the example below, after setting both "a" and "b" to None, __del__ is not called:
>>> class A:
...     def __del__(self):
...         print("__del__ called")
...
>>> a = A()
>>> b = A()
>>> a.other = b
>>> b.other = a
>>> a = None
>>> b = None

In Python 2, since A implements __del__, the garbage collector refuses to collect the cycle: it cannot tell which __del__ method to call first, and rather than do the wrong thing (invoking them in the wrong order) it does nothing. (Since Python 3.4 / PEP 442, such cycles can be collected.)

What's "__call__"? http://hplgit.github.io/primer.html/doc/pub/class/._class-solarized003.html
__call__ is the function-call operator. Once you implement __call__ in your class, you can invoke an instance as if it were a function. For example:
class Foo:
    def __call__(self, a, b):
        pass

f = Foo()
f('hello', 'world')  # This invokes __call__

Another example,
class Derivative(object):
    def __init__(self, f, h=1E-5):
        self.f = f
        self.h = float(h)

    def __call__(self, x):
        f, h = self.f, self.h  # make short forms
        return (f(x + h) - f(x)) / h
>>> from math import sin, cos, pi
>>> df = Derivative(sin)
>>> x = pi
>>> df(x)
-1.000000082740371

Inheritance? super()? See the next example on raising exceptions: GraphicCommandException extends GraphicExceptionBase.
http://amyboyle.ninja/Python-Inheritance

Raise an exception?
Example exception classes:
class GraphicExceptionBase(ValueError):
    def __init__(self, commandString):
        self.commandString = commandString

from mygraphics import GraphicExceptionBase as gex

class GraphicCommandException(gex.GraphicExceptionBase):
    # For things like invalid command-line format

    def __init__(self, commandString):
        super(GraphicCommandException, self).__init__(commandString)

"GraphicCommandParser.py" under the "mygraphics" package (for demo purposes this is not a class):
import re
from mygraphics import GraphicCommandBase as gcmd
from mygraphics import GraphicCommandException as gex

def parse(commandString):
    _REEXPR_COMMAND_CREATE_CANVAS = r"^(C){1,1} (\d+) (\d+)"
    _REEXPR_COMMAND_DRAW_LINE = r"^(L){1,1} (\d+) (\d+) (\d+) (\d+)"
    _REEXPR_COMMAND_DRAW_RECT = r"^(R){1,1} (\d+) (\d+) (\d+) (\d+)"
    _REEXPR_COMMAND_BUCKET_FILL = r"^(B){1,1} (\d+) (\d+) (.){1}"
    _REEXPR_COMMAND_QUIT = r"^(Q){1,1}"

    cmd = gcmd.GraphicCommandBase(commandString)

    expr = _REEXPR_COMMAND_CREATE_CANVAS
    match = re.match(expr, commandString, re.I)
    if match:
        cmd.command = match.group(1)
        cmd.w = int(match.group(2))
        cmd.h = int(match.group(3))
        return cmd
… more code …

"GraphicRenderEngine.py" under the "mygraphics" package (for demo purposes this is a class; note that every method's first parameter is "self"):
class GraphicRenderEngine(object):
    _DOT = "x"
    _HORIZONTAL_BOUND = "-"
    _VERTICAL_BOUND = "|"

    def __init__(self):
        self.created = datetime.datetime.now()

    ... more code ...

    def draw(self, graphicCommand):
        ... more code ...

Then from main.py:
import mygraphics
from mygraphics import GraphicRenderEngine as gengine
from mygraphics import GraphicCommandParser as gparser  # note: this is not a class, "parse" is just a function defined in it
from mygraphics import GraphicCommandBase as gcmd
from mygraphics import GraphicCommandException as gcmdex
from mygraphics import GraphicRenderException as grndex
from mygraphics import GraphicRenderOutofBoundException as grndobex

engine = gengine.GraphicRenderEngine()  # an instance of the GraphicRenderEngine class
keepGoing = True
while keepGoing:
    try:
        commandString = input("Enter draw command, C = Create Canvas, L = draw Line, R = draw Rectangle, B = Bucket Fill, Q = Quit: ")
        graphicCommand = gparser.parse(commandString)
        if graphicCommand.command != "Q":
            engine.draw(graphicCommand)
        else:
            keepGoing = False
    except gcmdex.GraphicCommandException:
        gparser.help()
    except grndobex.GraphicRenderOutofBoundException:
        print("Graphic rendering exception: OutofBound error")
    except grndex.GraphicRenderException:
        print("Graphic rendering exception")
    else:
        pass
    finally:
        pass

Static members and methods?
Static member (class attribute):
class Example:
    staticVariable = 5

# Access through the class
print(Example.staticVariable)  # 5

# Access through an instance
instance = Example()
print(instance.staticVariable)  # still 5

# Assigning through an instance creates an instance attribute that shadows the class attribute
instance.staticVariable = 6
print(instance.staticVariable)  # 6
print(Example.staticVariable)   # 5

# Assigning through the class changes the class attribute
Example.staticVariable = 7
print(instance.staticVariable)  # still 6 (shadowed by the instance attribute)
print(Example.staticVariable)   # now 7

Static method (decorate the method with @staticmethod; note there is no "self" parameter in the method signature): https://docs.python.org/2/library/functions.html#staticmethod
class MyClass(object):
    @staticmethod
    def the_static_method(x):
        print(x)

To invoke it:
MyClass.the_static_method(2)

Warning, this won't work: inside a static method, class attributes like __myName__ are not in the function's scope, so the bare name raises NameError:
class MyClass(object):
    __myName__ = "John"

    @staticmethod
    def static_print(x):
        print(x + " " + __myName__)  # NameError: this will blow up!

MyClass.static_print("hello")

To fix this, qualify the attribute with the class name:
class MyClass(object):
    __myName__ = "John"

    @staticmethod
    def static_print(x):
        print(x + " " + MyClass.__myName__)

http://radek.io/2011/07/21/static-variables-and-methods-in-python/

@property: https://www.quora.com/What-are-the-purposes-of-staticmethod-and-property-decorators
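A minimal sketch of @property: a method accessed like an attribute, optionally with a setter. The Temperature class and its fields are made up for illustration:

```python
class Temperature:
    """Expose a computed value as an attribute via @property (hypothetical example)."""
    def __init__(self, celsius):
        self._celsius = celsius

    @property
    def fahrenheit(self):
        # Computed on access; no parentheses needed at the call site
        return self._celsius * 9 / 5 + 32

    @fahrenheit.setter
    def fahrenheit(self, value):
        # Assignment to t.fahrenheit routes through here
        self._celsius = (value - 32) * 5 / 9

t = Temperature(100)
print(t.fahrenheit)   # 212.0
t.fahrenheit = 32
print(t._celsius)     # 0.0
```

The setter is optional; with only the getter, assignment raises AttributeError, which is a cheap way to make an attribute read-only.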

Python has event/delegates like C#?
Example 1 (least preferable if there are multiple handlers): just pass your handler function
https://stackoverflow.com/questions/2184263/eventhandler-event-delegate-based-programming-in-python-any-example-would-appr
def do_work_and_notify(done_handler):
    # Do some work here…
    done_handler()

def send_email_on_completion():
    email_send('joe@example.com', 'you are done')

do_work_and_notify(send_email_on_completion)
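When multiple handlers need to be notified, a tiny event class holding a list of callables gets close to C# events. This is a sketch; the Event class and its method names are made up for illustration:

```python
class Event:
    """Minimal C#-style event: a list of callables invoked in order (hypothetical sketch)."""
    def __init__(self):
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

    def fire(self, *args, **kwargs):
        # Notify every subscriber with the same arguments
        for handler in self._handlers:
            handler(*args, **kwargs)

calls = []
done = Event()
done.subscribe(lambda msg: calls.append("email: " + msg))
done.subscribe(lambda msg: calls.append("log: " + msg))
done.fire("work finished")
print(calls)  # ['email: work finished', 'log: work finished']
```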

Example 2: Use a decorator (different wrapped functions have different numbers of arguments, so you need different wrappers/decorators)
https://programmingideaswithjake.wordpress.com/2015/05/23/python-decorator-for-simplifying-delegate-pattern/

Example 3. Use a publisher/subscriber library (but isn't a whole library a little heavyweight for something this simple?)
pypubsub
http://pypubsub.readthedocs.io/en/stable/usage/usage_basic.html#quick-start

pydispatcher
http://pydispatcher.sourceforge.net/

Python threading
Basic:
https://www.tutorialspoint.com/python/python_multithreading.htm
Lock, RLock, Event and Condition?
https://hackernoon.com/synchronization-primitives-in-python-564f89fee732
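As a quick illustration of one of these primitives, threading.Event lets one thread block until another signals it. A minimal sketch:

```python
import threading

results = []
ready = threading.Event()

def consumer():
    # Blocks until the producer calls ready.set()
    ready.wait()
    results.append("consumed")

t = threading.Thread(target=consumer)
t.start()
results.append("produced")
ready.set()       # wake the waiting thread
t.join()
print(results)    # ['produced', 'consumed']
```

Because "produced" is appended before set(), the consumer cannot run its append first, so the ordering is deterministic.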
Mutex in multiprocessing?
https://stackoverflow.com/questions/28664720/how-to-create-global-lock-semaphore-with-multiprocessing-pool-in-python
Centralizing lock.acquire and avoiding timeout=-1 (to avoid deadlocks)
https://stackoverflow.com/questions/52645864/python-centralizing-lock-acquire
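One way to centralize acquisition and avoid timeout=-1 (block forever) is a small context manager that fails fast instead of deadlocking. A sketch under the assumption that failing loudly is preferable; acquire_with_timeout is a made-up helper name:

```python
import threading
from contextlib import contextmanager

@contextmanager
def acquire_with_timeout(lock, timeout=5):
    """Acquire `lock` within `timeout` seconds, or raise instead of blocking forever."""
    acquired = lock.acquire(timeout=timeout)
    if not acquired:
        raise TimeoutError("could not acquire lock within %s seconds" % timeout)
    try:
        yield
    finally:
        lock.release()

lock = threading.Lock()
with acquire_with_timeout(lock, timeout=1):
    print("lock held, doing work")

# A second acquisition attempt while the lock is held times out cleanly:
lock.acquire()
try:
    with acquire_with_timeout(lock, timeout=0.1):
        pass
except TimeoutError as e:
    print("timed out:", e)
finally:
    lock.release()
```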

concurrent.futures.ProcessPoolExecutor:
https://docs.python.org/dev/library/concurrent.futures.html
http://masnun.com/2016/03/29/python-a-quick-introduction-to-the-concurrent-futures-module.html
Use "multiprocessing.Pool" instead of ProcessPoolExecutor to avoid "Queue objects should only be shared between processes through inheritance"
https://stackoverflow.com/questions/9908781/sharing-a-result-queue-among-several-processes
https://stackoverflow.com/questions/8804830/python-multiprocessing-pickling-error
https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled
"pathos" - why use pathos? We want parallelism, but concurrent.futures.ProcessPoolExecutor and multiprocessing.Pool impose constraints: for example, the worker function must be picklable, i.e. a static method or a top-level function outside your class.
https://pypi.org/project/pathos/
https://kampta.github.io/Parallel-Processing-in-Python/ <– Must read!
https://stackoverflow.com/questions/51345135/python-pathos-error-errorrootclass-runtimeerror

Example 1. Basic
import threading
import time

class myThread(threading.Thread):
    def __init__(self, threadID, name, counter):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.counter = counter

    def run(self):
        print("Starting " + self.name)
        # Get lock to synchronize threads
        threadLock.acquire()
        print_time(self.name, self.counter, 3)
        # Free lock to release next thread
        threadLock.release()

def print_time(threadName, delay, counter):
    while counter:
        time.sleep(delay)
        print("%s: %s" % (threadName, time.ctime(time.time())))
        counter -= 1

threadLock = threading.Lock()
threads = []

# Create new threads
thread1 = myThread(1, "Thread-1", 1)
thread2 = myThread(2, "Thread-2", 2)

# Start new threads
thread1.start()
thread2.start()

# Add threads to thread list
threads.append(thread1)
threads.append(thread2)

# Wait for all threads to complete
for t in threads:
    t.join()
print("Exiting Main Thread")

Example 2. concurrent.futures.ProcessPoolExecutor
from concurrent.futures import ProcessPoolExecutor
from time import sleep

def return_after_5_secs(message):
    sleep(5)
    return message

if __name__ == "__main__":  # required on platforms that spawn worker processes
    pool = ProcessPoolExecutor(3)
    future = pool.submit(return_after_5_secs, "hello")
    print(future.done())   # False
    sleep(5)
    print(future.done())   # likely True by now
    print("Result: " + future.result())

Example 3. Use multiprocessing.Pool instead of ProcessPoolExecutor to avoid “Queue objects should only be shared between processes through inheritance”
https://stackoverflow.com/questions/9908781/sharing-a-result-queue-among-several-processes
https://stackoverflow.com/questions/8804830/python-multiprocessing-pickling-error
https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled
import multiprocessing

def worker(name, que):
    que.put("%d is done" % name)

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=3)
    m = multiprocessing.Manager()
    q = m.Queue()
    workers = pool.apply_async(worker, (33, q))
    workers.wait()    # wait for the task to finish
    print(q.get())    # "33 is done"

abc and Abstract classes:
from abc import ABC, abstractmethod

class AbstractClassExample(ABC):

    def __init__(self, value):
        self.value = value
        super().__init__()

    @abstractmethod
    def do_something(self):
        pass

class DoAdd42(AbstractClassExample):
    def do_something(self):
        return self.value + 42

class DoMul42(AbstractClassExample):

    def do_something(self):
        return self.value * 42

x = DoAdd42(10)
y = DoMul42(10)
print(x.do_something())  # 52
print(y.do_something())  # 420

https://www.python-course.eu/python3_abstract_classes.php
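Two properties worth remembering: the abstract base class itself cannot be instantiated, while a subclass that implements all abstract methods can. A small sketch (Shape and Square are hypothetical names):

```python
from abc import ABC, abstractmethod

class Shape(ABC):
    @abstractmethod
    def area(self):
        ...

# The ABC cannot be instantiated while it has unimplemented abstract methods
try:
    Shape()
except TypeError as e:
    print("cannot instantiate:", e)

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2

print(Square(3).area())  # 9
```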

Reflection? inspect.getmembers / isclass / isabstract
import inspect
import example
for name, data in inspect.getmembers(example, inspect.isclass):
    print('%s :' % name, repr(data))

https://pymotw.com/2/inspect/
https://bip.weizmann.ac.il/course/python/PyMOTW/PyMOTW/docs/inspect/index.html
http://archive.oreilly.com/oreillyschool/courses/Python4/Python4-09.html

isinstance of?
class Foo:
    a = 5

fooInstance = Foo()
print(isinstance(fooInstance, Foo))                 # True
print(isinstance(fooInstance, (list, tuple)))       # False
print(isinstance(fooInstance, (list, tuple, Foo)))  # True

https://www.programiz.com/python-programming/methods/built-in/isinstance

Python decorators:

From Util.py:
import logging
import datetime

def timing_log_noarg(func):
    def wrapper(arg1):
        start = datetime.datetime.now()
        result = func(arg1)
        finish = datetime.datetime.now()
        msg = str(func) + " invoked. start: " + str(start) + ", finish: " + str(finish)
        logging.info(msg)
        return result
    return wrapper

def timing_log_2arg(func):
    def wrapper(arg1, arg2, arg3):
        start = datetime.datetime.now()
        result = func(arg1, arg2, arg3)
        finish = datetime.datetime.now()
        msg = str(func) + " invoked. start: " + str(start) + ", finish: " + str(finish) + " " + arg2 + " " + arg3
        logging.info(msg)
        return result
    return wrapper

To use it:
import Util as util  # the decorators are module-level functions in Util.py

class TimeConsumingServiceService(object):
    … more code …
    def __init__(self, homeDir):
        … more code

    @util.timing_log_noarg  # well, one arg: that's "self"
    def reload(self):
        self.someLengthyOperation()

    @util.timing_log_2arg
    def load(self, param1, param2):  # well, three arguments in total, including "self"
        self.someOtherLengthyOperation()

Pass by value vs pass by reference? It's "passing a reference by value".
Arguments are passed neither by value nor by reference in Python;
instead they are passed by assignment.
The parameter passed in is actually a reference to an object (as opposed to a reference to a fixed memory location), but the reference itself is passed by value.

https://www.quora.com/Are-arguments-passed-by-value-or-by-reference-in-Python
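The practical consequence: mutating the passed-in object is visible to the caller, but rebinding the parameter name is not. A quick demonstration:

```python
def mutate(items):
    # Mutates the object the reference points to: the caller sees this
    items.append(4)

def rebind(items):
    # Rebinds the LOCAL name to a new object: the caller does NOT see this
    items = [0]

data = [1, 2, 3]
mutate(data)
print(data)   # [1, 2, 3, 4]
rebind(data)
print(data)   # still [1, 2, 3, 4]
```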

Garbage Collection
https://rushter.com/blog/python-garbage-collector/
https://rushter.com/blog/python-memory-managment/
https://www.quora.com/How-does-garbage-collection-in-Python-work-What-are-the-pros-and-cons
https://pythoninternal.wordpress.com/2014/08/04/the-garbage-collector/

Two mechanisms:
a. Reference Counting (basic, not optional)
The reference count increases:
assignment operator
argument passing
appending an object to a list
If reference counting field reaches zero, CPython automatically calls the object-specific deallocation function.

b. Cycle detection (optional; runs when gc.collect is called, and only for container classes, excluding tuples of immutables)
CPython has an algorithm to detect those reference cycles, implemented in the function collect. First of all, it only focuses on container objects (i.e. objects that can contain a reference to one or more objects): arrays, dictionaries, user class instances, etc. As an extra optimization, the GC ignores tuples containing only immutable types (int, strings, … or tuples containing only immutable types)

What's gc_ref? Each Python object has a field, *gc_ref*, which is (I believe) set to NULL for non-container objects. For container objects it is set based on how many references point at it.
Any container object with a *gc_ref* count greater than zero after internal references are subtracted has references coming from outside the candidate set. Those objects are REACHABLE and are removed from consideration as unreachable memory islands; any container object reachable from an object known to be reachable does not need to be freed.
The remaining container objects are UNREACHABLE (except by each other) and should be freed.
https://stackoverflow.com/questions/10962393/how-does-pythons-garbage-collector-detect-circular-references/10962484
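A quick way to see the cycle detector at work (Node is a throwaway class for illustration; gc.disable ensures only our explicit collect runs):

```python
import gc

class Node:
    pass

gc.disable()       # make sure only our explicit collect() runs
a = Node()
b = Node()
a.other = b        # a -> b
b.other = a        # b -> a : a reference cycle

del a, b           # refcounts never reach zero; without the cycle detector these leak
n_collected = gc.collect()   # the detector finds the unreachable island and frees it
gc.enable()
print(n_collected)   # at least 2: both Node instances were unreachable
```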

What's a "weak reference"?
https://mindtrove.info/python-weak-references/
With only "strong references", the object is not deallocated until BOTH references "a" and "b" are deleted:
>>> a = Foo()
created
>>> b = a
>>> del a
>>> del b
destroyed
A weak reference does not keep the object alive: as soon as the last strong reference is deleted, the object is deallocated, even though the weak reference still exists.
>>> import weakref
>>> a = Foo()
created
>>> b = weakref.ref(a)
>>> del a
destroyed
>>> b() is None  # the weak reference is now dead
True

Note also that in the event of a circular reference, __del__ is not invoked when the names are cleared. In the example below, after setting both "a" and "b" to None, __del__ is not called; the cycle collector has to break the cycle (and before Python 3.4 / PEP 442, cycles whose objects defined __del__ were never collected at all).
https://www.holger-peters.de/an-interesting-fact-about-the-python-garbage-collector.html
>>> a = A()
>>> b = A()
>>> a.other = b
>>> b.other = a
>>> a = None
>>> b = None

__iter__?
The __iter__ method is what makes an object iterable. Behind the scenes, the iter() function calls the __iter__ method on the given object.
The return value of __iter__ is an iterator. It should have a __next__ method (next in Python 2) and raise StopIteration when there are no more elements.

class yrange:
    def __init__(self, n):
        self.i = 0
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):  # "next" in Python 2
        if self.i < self.n:
            i = self.i
            self.i += 1
            return i
        else:
            raise StopIteration()

https://www.programiz.com/python-programming/iterator
https://anandology.com/python-practice-book/iterators.html
The Python yield keyword explained

Generator/yield: lazy evaluation of a sequence. In Python 2, range(10000) is not a generator: it returns a list of ten thousand integers, all in memory. xrange(10000), however, yields one integer at a time. (In Python 3, range itself is lazy and xrange is gone.)
for i in range(0, 20):
for i in xrange(0, 20):  # Python 2 only

https://wiki.python.org/moin/Generators

# Using the generator pattern (an iterable)
class firstn(object):
    def __init__(self, n):
        self.n = n
        self.num, self.nums = 0, []

    def __iter__(self):
        return self

    # Python 3 compatibility
    def __next__(self):
        return self.next()

    def next(self):
        if self.num < self.n:
            cur, self.num = self.num, self.num + 1
            return cur
        else:
            raise StopIteration()

sum_of_first_n = sum(firstn(1000000))

# a generator that yields items instead of returning a list
def firstn(n):
    num = 0
    while num < n:
        yield num
        num += 1

sum_of_first_n = sum(firstn(1000000))
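The same laziness is available inline via generator expressions: the syntax of a list comprehension, but with parentheses, producing one item at a time instead of a full list.

```python
# A generator expression: like a list comprehension, but lazy
sum_of_first_n = sum(num for num in range(1000000))
print(sum_of_first_n)   # 499999500000

squares_lazy = (x * x for x in range(10))   # nothing computed yet
print(next(squares_lazy))  # 0
print(next(squares_lazy))  # 1
```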

lambda, map/filter/reduce: https://www.python-course.eu/python3_lambda.php
biggerThanThreshold = lambda x: x > 5
filtered = list(filter(biggerThanThreshold, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))   # [6, 7, 8, 9, 10]
sqSomeNum = lambda x: x ** 2
transformed = list(map(sqSomeNum, [0, 1, 2, 3]))   # [0, 1, 4, 9]
# Note: in Python 3, filter() and map() return lazy iterators, hence the list() calls.
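reduce is mentioned above but not shown; in Python 3 it lives in functools rather than being a builtin:

```python
from functools import reduce   # reduce is no longer a builtin in Python 3

# Fold a sequence down to one value
total = reduce(lambda acc, x: acc + x, [1, 2, 3, 4, 5])
print(total)    # 15

# With an initial value (also protects against empty sequences)
product = reduce(lambda acc, x: acc * x, [1, 2, 3, 4], 1)
print(product)  # 24
```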

sorted – "sorted(list)" vs "list.sort()"? sorted() returns a new sorted list, leaving the original unaffected. list.sort() sorts the list in place (mutating the list) and returns None, like all in-place operations.
https://docs.python.org/2/howto/sorting.html
https://stackoverflow.com/questions/22442378/what-is-the-difference-between-sortedlist-vs-list-sort

>>> sorted([5, 2, 3, 1, 4])
[1, 2, 3, 4, 5]

>>> a = [5, 2, 3, 1, 4]
>>> a.sort()
>>> a
[1, 2, 3, 4, 5]

>>> sorted({1: 'D', 2: 'B', 3: 'B', 4: 'E', 5: 'A'})
[1, 2, 3, 4, 5]

>>> sorted("This is a test string from Andrew".split(), key=str.lower)
['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']

>>> student_tuples = [
...     ('john', 'A', 15),
...     ('jane', 'B', 12),
...     ('dave', 'B', 10),
... ]
>>> sorted(student_tuples, key=lambda student: student[2])   # sort by age
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

>>> class Student:
...     def __init__(self, name, grade, age):
...         self.name = name
...         self.grade = grade
...         self.age = age
...     def __repr__(self):
...         return repr((self.name, self.grade, self.age))
>>> student_objects = [
...     Student('john', 'A', 15),
...     Student('jane', 'B', 12),
...     Student('dave', 'B', 10),
... ]
>>> sorted(student_objects, key=lambda student: student.age)   # sort by age
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
Multiline lambda? Not supported; just use "def" and write a normal function.

List Comprehension: http://www.pythonforbeginners.com/basics/list-comprehensions-in-python
Old way:
biggerThanThreshold = lambda x: x > 5
squares = []
for x in range(10):
    if biggerThanThreshold(x):
        squares.append(x ** 2)
FP way:
biggerThanThreshold = lambda x: x > 5
squares = [x ** 2 for x in range(10) if biggerThanThreshold(x)]
Here x**2 is the map expression, which transforms x to sq(x);
biggerThanThreshold is your filter expression.

Python and LINQ? List Comprehension + generators: http://mark-dot-net.blogspot.com/2014/03/python-equivalents-of-linq-methods.html
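A few common LINQ-to-Python mappings as a sketch (the orders data below is made up for illustration):

```python
from itertools import groupby, islice

orders = [("AAPL", 1500), ("IBM", 4000), ("AAPL", 1200), ("GOOG", 55)]

# Where / Select  ->  comprehension
large = [(t, q) for t, q in orders if q >= 1000]

# Any / All  ->  any() / all() over a generator expression
has_goog = any(t == "GOOG" for t, _ in orders)

# Take  ->  itertools.islice
first_two = list(islice(orders, 2))

# GroupBy  ->  itertools.groupby (NOTE: input must be sorted by the key first!)
by_ticker = {k: [q for _, q in g]
             for k, g in groupby(sorted(orders), key=lambda o: o[0])}
print(by_ticker)   # {'AAPL': [1200, 1500], 'GOOG': [55], 'IBM': [4000]}
```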

HttpRequest and web scraping – use urllib for the HTTP request, then BeautifulSoup to parse the HTML
class DataLoaderBase(object):
    _SOURCE_NAME = "Undefined"

    def load(self):
        data = {}
        return data

import urllib.request
from Data import DataLoaderBase as dl

class WebScrapperBase(dl.DataLoaderBase):
    def load(self, url):
        request = urllib.request.Request(url)
        response = urllib.request.urlopen(request)
        rawData = response.read()
        return rawData

JSON config loading?
Example JSON config:
{
    "countries": [ "germany", "unitedkingdom", "france", "china", "unitedstates", "japan" ],
    "webScrapperConfig": {
        "baseUrl": "https://somewhere.com",
        "queryVar": "$COUNTRY$/forecast"
    }
}
Now the loading part:
import json
import os

class SomeService(object):
    def __init__(self, homeDir):
        path = os.path.join(homeDir, "Config", "someConfig.json")
        with open(path) as json_data_file:
            data = json.load(json_data_file)
        self.countries = data["countries"]
        # Keep the nested dict; access it as self.webScrapperConfig["baseUrl"] etc.
        self.webScrapperConfig = data["webScrapperConfig"]
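The same parsing works from a string via json.loads, which makes a self-contained demo easy (config values below are made up):

```python
import json

raw = """
{
  "countries": ["germany", "france"],
  "webScrapperConfig": {
    "baseUrl": "https://somewhere.com",
    "queryVar": "$COUNTRY$/forecast"
  }
}
"""
data = json.loads(raw)          # parse from a string instead of a file
countries = data["countries"]
base_url = data["webScrapperConfig"]["baseUrl"]
print(countries)   # ['germany', 'france']
print(base_url)    # https://somewhere.com
```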

Unit Tests:
Mocks vs Fakes vs Stubs
https://www.telerik.com/blogs/fakes-stubs-and-mocks
https://stackoverflow.com/questions/346372/whats-the-difference-between-faking-mocking-and-stubbing
https://blog.pragmatists.com/test-doubles-fakes-mocks-and-stubs-1a7491dfa3da
https://martinfowler.com/articles/mocksArentStubs.html
https://www.c-sharpcorner.com/UploadFile/dacca2/understand-stub-mock-and-fake-in-unit-testing/

From Martin Fowler,
In automated testing it is common to use objects that look and behave like their production equivalents, but are actually simplified. This reduces complexity, allows you to verify code independently from the rest of the system, and is sometimes even necessary in order to run self-validating tests at all. "Test Double" is the generic term for these objects.

a. Fake objects actually have working implementations, but usually take some shortcut which makes them not suitable for production (an in memory database is a good example).
b. Spies are stubs that also record some information based on how they were called. One form of this might be an email service that records how many messages it was sent.
c. Stubs provide canned answers to calls made during the test, usually not responding at all to anything outside what’s programmed in for the test.
(i.e. pre-programmed return values – i.e. OUTPUT)
d. Mocks are what we are talking about here: objects pre-programmed with expectations which form a specification of the calls they are expected to receive.
(i.e. Just check if mocks can process expected INPUT)
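The stub/mock distinction maps directly onto unittest.mock: return_value pre-programs OUTPUT (a stub), while assert_called_* verifies the expected INPUT/interaction (a mock). The service names below are hypothetical:

```python
from unittest.mock import Mock

# Stub: pre-programmed output
entitlement_service = Mock()
entitlement_service.get_permissions.return_value = ["view_trades"]
print(entitlement_service.get_permissions("joe"))   # ['view_trades']

# Mock: verify the interaction happened with the expected arguments
def submit(adjustment, dao, mailer):
    # Hypothetical service-layer function under test
    dao.save(adjustment)
    mailer.send("joe@example.com", "submitted")

dao = Mock()
email_service = Mock()
submit({"id": 1}, dao, email_service)
dao.save.assert_called_once_with({"id": 1})
email_service.send.assert_called_once_with("joe@example.com", "submitted")
print("all interactions verified")
```

Nothing is saved and no email is sent; the test only checks that the downstream services were invoked as expected.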

Example,
Stubbing:
– Example 1: Given a fixed set of entitlements/permissions, the access control service should return the expected list of views visible to the user. If there is no entitlement, the list of permissible/visible views should be empty. We stub the entitlement service that feeds our system (it is stubbed to return pre-canned entitlements).
– Example 2: Different views take the same set of raw data from DAOs but aggregate/group/filter/calculate it differently. We stub the DAOs (stubbed to return pre-canned business data) and isolate the test on the calculation logic.

Mocking:
– Example 3: Submission of adjustments. SubmissionService.submit() does a few things: posts the adjustment to the database and dispatches notification emails. We mock both the DAO and the email service; the unit test only validates that SubmissionService.submit() invokes these downstream services as expected. Nothing is saved to the database and no email is sent from the service-layer unit tests.


Example,
from unittest.mock import patch, Mock
import unittest

import os

import Data
from Data import WebScrapperBase as webscrapper
from Data import FeedLoader as fd

# Test HTML parsing only
class TradingEconomicsScrapperBaseTests(unittest.TestCase):

    def setUp(self):
        someConfig = {}
        someConfig["baseUrl"] = "https://somewhere.com"
        someConfig["countryProjectionUrl"] = "$COUNTRY$/forecast"
        self.feedLoaderWrapper = Data.FeedLoaderWrapper(someConfig)

        testDir = os.path.dirname(__file__)
        path = os.path.join(testDir, "raw.HTML")
        with open(path) as f:
            data = f.read()
        self.rawHTML = data

    def tearDown(self):
        pass

    @patch('Data.FeedLoaderWrapper')
    def testParseCountryProjection_MockedHTML(self, mockFeedLoaderWrapper):
        mockFeedLoaderWrapper.fetchHTML.return_value = self.rawHTML
        data = self.feedLoaderWrapper.parseCountryProjection("united-states", mockFeedLoaderWrapper.fetchHTML())
        self.assertIsNotNone(data)
        self.assertEqual(data.Country, "united-states")
        self.assertEqual(data.DataSource, self.feedLoaderWrapper._SOURCE_NAME)
        self.assertEqual(len(data.Data), 13)

    def testParseCountryProjection_NoMock(self):
        # Internally, FeedLoaderWrapper.load calls FeedLoaderWrapper.fetchHTML
        # (patched in the previous test case, but not here), then FeedLoaderWrapper.parseCountryProjection
        data = self.feedLoaderWrapper.load(self.feedLoaderWrapper.DATA_TYPE_COUNTRY_PROJECTION, "united-states")
        self.assertIsNotNone(data)
        self.assertEqual(data.Country, "united-states")
        self.assertEqual(data.DataSource, self.feedLoaderWrapper._SOURCE_NAME)
        self.assertEqual(len(data.Data), 13)
Python and Spring: https://docs.spring.io/spring-python/1.2.x/sphinx/html/

Understanding UnboundLocalError in Python
https://eli.thegreenplace.net/2011/05/15/understanding-unboundlocalerror-in-python
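A minimal demonstration of the error and the usual fix (the increment functions are made up for illustration):

```python
counter = 0

def broken_increment():
    # The assignment below makes `counter` local to the WHOLE function,
    # so reading it on the right-hand side raises UnboundLocalError
    counter = counter + 1
    return counter

def fixed_increment():
    global counter     # explicitly refer to the module-level name
    counter = counter + 1
    return counter

try:
    broken_increment()
except UnboundLocalError as e:
    print("broken:", e)

print(fixed_increment())   # 1
```

The same applies inside nested functions, where `nonlocal` (Python 3) plays the role of `global` for the enclosing scope.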