Interview Prep: Pandas

https://pandas.pydata.org/pandas-docs/stable/cookbook.html#cookbook-selection
https://gridwizard.wordpress.com/2018/10/09/interview-prep-pandas/

pandas DataFrame
Creating a DataFrame and adding some rows:
Example 1. No index (or default index = 0, 1, 2, 3, ...)

import pandas as pd

columns = ['Date', 'direction', 'size', 'ticker', 'tradePrices']
orders = pd.DataFrame(columns=columns)
orders.loc[0] = ['2011-01-10', 'Buy', 1500, 'AAPL', 339.44]
orders.loc[1] = ['2011-01-13', 'Sell', 1500, 'AAPL', 342.64]
orders.loc[2] = ['2011-01-13', 'Buy', 4000, 'IBM', 143.92]
orders.loc[3] = ['2011-01-26', 'Buy', 1000, 'GOOG', 616.50]
orders.loc[4] = ['2011-02-02', 'Sell', 4000, 'XOM', 79.46]
orders.loc[5] = ['2011-02-10', 'Buy', 4000, 'XOM', 79.68]
orders.loc[6] = ['2011-03-03', 'Sell', 1000, 'GOOG', 609.56]
orders.loc[7] = ['2011-03-03', 'Sell', 2200, 'IBM', 158.73]
orders.loc[8] = ['2011-06-03', 'Sell', 3300, 'IBM', 160.97]
orders.loc[9] = ['2011-05-03', 'Buy', 1500, 'IBM', 167.84]
orders.loc[10] = ['2011-06-10', 'Buy', 1200, 'AAPL', 323.03]
orders.loc[11] = ['2011-08-01', 'Buy', 55, 'GOOG', 606.77]
orders.loc[12] = ['2011-08-01', 'Sell', 55, 'GOOG', 606.77]
orders.loc[13] = ['2011-12-20', 'Sell', 1200, 'AAPL', 392.46]

To set the value of a particular cell:
orders.iat[0, 1] = "Sell"

Or:
orders.at[0, "direction"] = "Buy"

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iat.html
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.at.html

Example 2. Index = dates
columns = ['AAPL', 'GOOG', 'IBM', 'XOM']
index = ['2011-01-10', '2011-01-13', '2011-01-26', '2011-02-02', '2011-02-10', '2011-03-03', '2011-05-03', '2011-06-03', '2011-06-10', '2011-08-01', '2011-12-20']
prices = pd.DataFrame(columns=columns, index=index)
prices.iloc[0] = [339.441, 614.219, 142.781, 71.571]
prices.iloc[1] = [342.642, 616.698, 143.922, 73.083]
prices.iloc[2] = [340.823, 616.507, 155.743, 75.895]
prices.iloc[3] = [341.294, 612.006, 157.934, 79.467]
prices.iloc[4] = [351.425, 616.445, 159.325, 79.689]
prices.iloc[5] = [356.406, 609.564, 158.736, 82.192]
prices.iloc[6] = [345.147, 533.893, 167.847, 82.004]
prices.iloc[7] = [340.428, 523.082, 160.978, 78.196]
prices.iloc[8] = [323.039, 509.511, 159.149, 76.848]
prices.iloc[9] = [393.261, 606.779, 176.281, 76.671]
prices.iloc[10] = [392.462, 630.378, 184.142, 79.973]

orders['streetPrices'] = prices.lookup(orders.Date, orders.ticker)  # Note: DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0
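Since DataFrame.lookup no longer exists in current pandas, here is a minimal replacement sketch using positional indexers. The two small frames below are a shortened, hypothetical stand-in for the orders/prices frames built above:

```python
import pandas as pd

# Shortened stand-ins for the prices/orders frames above
prices = pd.DataFrame(
    {"AAPL": [339.44, 342.64], "IBM": [143.92, 158.73]},
    index=["2011-01-10", "2011-01-13"],
)
orders = pd.DataFrame({"Date": ["2011-01-13", "2011-01-10"],
                       "ticker": ["IBM", "AAPL"]})

# Equivalent of the removed prices.lookup(orders.Date, orders.ticker):
# translate row/column labels to positions, then fancy-index the ndarray
ridx = prices.index.get_indexer(orders["Date"])
cidx = prices.columns.get_indexer(orders["ticker"])
orders["streetPrices"] = prices.to_numpy()[ridx, cidx]
print(orders["streetPrices"].tolist())  # [158.73, 339.44]
```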

Basic example, showing:
a) initialize a DataFrame
b) calculated field "spread" = bid - offer
c) merge/join DataFrames
d) sort by multiple fields

from datetime import datetime

exchangeBalances = [
    ['ETHBTC', 'binance', 10],
    ['LTCBTC', 'binance', 10],
    ['XRPBTC', 'binance', 10],
    ['ETHBTC', 'bitfinex', 10],
    ['LTCBTC', 'bitfinex', 10],
    ['XRPBTC', 'bitfinex', 10]
]
bidOffers = [
    ['ETHBTC', 'binance', 0.0035, 0.0351, datetime(2018, 9, 1, 8, 15)],
    ['LTCBTC', 'binance', 0.009, 0.092, datetime(2018, 9, 1, 8, 15)],
    ['XRPBTC', 'binance', 0.000077, 0.000078, datetime(2018, 9, 1, 8, 15)],
    ['ETHBTC', 'bitfinex', 0.003522, 0.0353, datetime(2018, 9, 1, 8, 15)],
    ['LTCBTC', 'bitfinex', 0.0093, 0.095, datetime(2018, 9, 1, 8, 15)],
    ['XRPBTC', 'bitfinex', 0.000083, 0.000085, datetime(2018, 9, 1, 8, 15)],
    ['ETHBTC', 'binance', 0.0035, 0.0351, datetime(2018, 9, 1, 8, 30)],
    ['LTCBTC', 'binance', 0.009, 0.092, datetime(2018, 9, 1, 8, 30)],
    ['XRPBTC', 'binance', 0.000077, 0.000078, datetime(2018, 9, 1, 8, 30)],
    ['ETHBTC', 'bitfinex', 0.003522, 0.0353, datetime(2018, 9, 1, 8, 30)],
    ['LTCBTC', 'bitfinex', 0.0093, 0.095, datetime(2018, 9, 1, 8, 30)],
    ['XRPBTC', 'bitfinex', 0.000083, 0.000085, datetime(2018, 9, 1, 8, 30)],
    ['ETHBTC', 'binance', 0.0035, 0.0351, datetime(2018, 9, 1, 8, 45)],
    ['LTCBTC', 'binance', 0.009, 0.092, datetime(2018, 9, 1, 8, 45)],
    ['XRPBTC', 'binance', 0.000077, 0.000078, datetime(2018, 9, 1, 8, 45)],
    ['ETHBTC', 'bitfinex', 0.003522, 0.0353, datetime(2018, 9, 1, 8, 45)],
    ['LTCBTC', 'bitfinex', 0.0093, 0.095, datetime(2018, 9, 1, 8, 45)],
    ['XRPBTC', 'bitfinex', 0.000083, 0.000085, datetime(2018, 9, 1, 8, 45)]
]
dfExchangeBalances = pd.DataFrame(exchangeBalances, columns=['symbol', 'exchange', 'balance'])
dfBidOffers = pd.DataFrame(bidOffers, columns=['ticker', 'exchange', 'bid', 'offer', 'created'])
dfBidOffers["spread"] = dfBidOffers["bid"] - dfBidOffers["offer"]
dfSummary = dfExchangeBalances.merge(dfBidOffers, how='left', left_on=['symbol', 'exchange'], right_on=['ticker', 'exchange'])
dfSummary = dfSummary.sort_values(by=['symbol', 'exchange', 'created'], ascending=[True, True, False])

>>> dfExchangeBalances
symbol exchange balance
0 ETHBTC binance 10
1 LTCBTC binance 10
2 XRPBTC binance 10
3 ETHBTC bitfinex 10
4 LTCBTC bitfinex 10
5 XRPBTC bitfinex 10
>>> dfBidOffers
ticker exchange bid offer
0 ETHBTC binance 0.003500 0.035100
1 LTCBTC binance 0.009000 0.092000
2 XRPBTC binance 0.000077 0.000078
3 ETHBTC bitfinex 0.003522 0.035300
4 LTCBTC bitfinex 0.009300 0.095000
5 XRPBTC bitfinex 0.000083 0.000085
>>> dfBidOffers
ticker exchange … created spread
0 ETHBTC binance … 2018-09-01 08:15:00 -3.160000e-02
1 LTCBTC binance … 2018-09-01 08:15:00 -8.300000e-02
2 XRPBTC binance … 2018-09-01 08:15:00 -1.000000e-06
3 ETHBTC bitfinex … 2018-09-01 08:15:00 -3.177800e-02
4 LTCBTC bitfinex … 2018-09-01 08:15:00 -8.570000e-02
5 XRPBTC bitfinex … 2018-09-01 08:15:00 -2.000000e-06

UNION: use pd.concat (note you can concatenate vertically or horizontally depending on the "axis" argument)
"concat" vs "merge"?
https://www.tutorialspoint.com/python_pandas/python_pandas_concatenation.htm
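A quick sketch of the difference: concat stacks whole frames along an axis (axis=0 behaves like SQL UNION ALL), while merge joins on key columns.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"a": [3, 4]})

# axis=0 stacks rows (like SQL UNION ALL); ignore_index renumbers 0..n-1
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

# axis=1 places the frames side by side, aligning on the index
side = pd.concat([df1, df2], axis=1)

print(stacked["a"].tolist())  # [1, 2, 3, 4]
print(side.shape)             # (2, 2)
```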

Reindex:
https://pandas.pydata.org/pandas-docs/stable/advanced.html
https://chrisalbon.com/python/data_wrangling/pandas_dataframe_reindexing/

Example: reindex columns (not rows):
import pandas as pd
import numpy as np

columns = ['BucketLabel', 'price', 'QTY', 'BidOrAsk']
pdBidsBinance = pd.DataFrame(columns=columns)
pdBidsBinance.loc[0] = ['40-41', 40.38, 100, 'BUY']
pdBidsBinance.loc[1] = ['40-41', 40.381, 200, 'BUY']
pdBidsBinance.loc[2] = ['40-41', 40.51, 300, 'BUY']
pdBidsBinance.loc[3] = ['41-42', 41.3, 150, 'BUY']
pdBidsBinance.loc[4] = ['41-42', 41.51, 100, 'BUY']
pdBidsBinance.loc[5] = ['41-42', 41.81, 200, 'BUY']
pdBidsBinance.loc[6] = ['42-43', 42.78, 300, 'BUY']
pdBidsBinance.loc[7] = ['42-43', 42.31, 200, 'BUY']
pdBidsBinance.loc[8] = ['42-43', 42.88, 500, 'BUY']

pdBidsBinance = pdBidsBinance.reindex(columns=["BucketLabel", "price", "QTY"])  # exclude "BidOrAsk" (or any other fields you don't need)
pdBidsBinance["TradeConsideration"] = pdBidsBinance["price"] * pdBidsBinance["QTY"]
pdBidsBinance.set_index("BucketLabel", inplace=True)
pdBidsBinance.columns = pd.MultiIndex.from_product([["Binance"], ["price", "QTY", "TradeConsideration"]])

pdAsksKraken = pd.DataFrame(columns=columns)
pdAsksKraken.loc[0] = ['40-41', 40.28, 200, 'SELL']
pdAsksKraken.loc[1] = ['40-41', 40.181, 200, 'SELL']
pdAsksKraken.loc[2] = ['40-41', 40.31, 100, 'SELL']
pdAsksKraken.loc[3] = ['41-42', 41.1, 500, 'SELL']
pdAsksKraken.loc[4] = ['41-42', 41.21, 150, 'SELL']
pdAsksKraken.loc[5] = ['41-42', 41.21, 700, 'SELL']
pdAsksKraken.loc[6] = ['42-43', 42.68, 100, 'SELL']
pdAsksKraken.loc[7] = ['42-43', 42.11, 200, 'SELL']
pdAsksKraken.loc[8] = ['42-43', 42.3, 300, 'SELL']

pdAsksKraken = pdAsksKraken.reindex(columns=["BucketLabel", "price", "QTY"])  # exclude "BidOrAsk" (or any other fields you don't need)
pdAsksKraken["TradeConsideration"] = pdAsksKraken["price"] * pdAsksKraken["QTY"]
pdAsksKraken.set_index("BucketLabel", inplace=True)
pdAsksKraken.columns = pd.MultiIndex.from_product([["Kraken"], ["price", "QTY", "TradeConsideration"]])

pdTrade = pd.concat([pdBidsBinance, pdAsksKraken], axis=1)

pdTrade["Summary", "Spread"] = pdTrade.loc[:, ("Binance", "price")] - pdTrade.loc[:, ("Kraken", "price")]

>>> pdTrade.head()
Binance Kraken Summary
price QTY TradeConsideration price QTY TradeConsideration Spread
BucketLabel
40-41 40.380 100 4038 40.280 200 8056 0.1
40-41 40.381 200 8076.2 40.181 200 8036.2 0.2
40-41 40.510 300 12153 40.310 100 4031 0.2
41-42 41.300 150 6195 41.100 500 20550 0.2
41-42 41.510 100 4151 41.210 150 6181.5 0.3
>>>

filtering
https://chrisalbon.com/python/data_wrangling/pandas_selecting_rows_on_conditions/
dfSummary = dfSummary[(dfSummary['created'] > datetime(2018, 9, 1, 8, 15)) & (dfSummary['exchange'] == "binance")]

Alternatively, you can use "query" (you can even chain queries):
dfSummary = dfSummary.query("'2018-09-01 08:15:00' < created <= '2018-09-01 09:15:00' & exchange == 'binance'")

CAUTION: when combining boolean masks with & or |, you must wrap each condition in parentheses.

lookup:
https://stackoverflow.com/questions/52583677/pandas-dataframe-lookup

slice loc/iloc
dfSummary.loc[:, ['symbol', 'exchange', 'spread']].itertuples()

iterations
for item in dfSummary.loc[:, ['symbol', 'exchange', 'spread']].itertuples():
    # item[0] = 2 (int, the row index)
    # item[1] = 'ETHBTC' (str)
    # item[2] = 'binance' (str)
    # item[3] = -0.031599999999999996 (float)
    itemIndex = item[0]
    symbol = item[1]
    exchange = item[2]
    spread = item[3]

aggregate/group by
groups = dfSummary.groupby(["symbol", "exchange"])

for group in groups:
    groupKey = group[0]
    symbol = groupKey[0]
    exchange = groupKey[1]
    maxBid = group[1]["bid"].max()

Single aggregate measure:
dfSummary = dfSummary.groupby(["symbol", "exchange"])["spread"].max()

>>> dfSummary
symbol exchange
ETHBTC binance -3.160000e-02
bitfinex -3.177800e-02
LTCBTC binance -8.300000e-02
bitfinex -8.570000e-02
XRPBTC binance -1.000000e-06
bitfinex -2.000000e-06
Name: spread, dtype: float64

https://www.shanelynn.ie/summarising-aggregation-and-grouping-data-in-python-pandas/
Multiple aggregate measures:
dfSummary = dfSummary.groupby(["symbol", "exchange"]).agg({"spread": "max", "bid": "min"})
dfSummary.columns = ["maxSpread", "minBid"]
>>> dfSummary
maxSpread minBid
symbol exchange
ETHBTC binance -3.160000e-02 0.003500
bitfinex -3.177800e-02 0.003522
LTCBTC binance -8.300000e-02 0.009000
bitfinex -8.570000e-02 0.009300
XRPBTC binance -1.000000e-06 0.000077
bitfinex -2.000000e-06 0.000083
>>>

Pivot: groupby/agg vs pd.pivot_table?
They produce the same result - see the example below: "dfPivot2" is the same as "dfGrouped".
http://pbpython.com/pandas-pivot-table-explained.html
https://stackoverflow.com/questions/34702815/pandas-group-by-and-pivot-table-difference

dfOriginal = pd.DataFrame({"a": [1, 2, 3, 1, 2, 3], "b": [1, 1, 1, 2, 2, 2], "c": np.random.rand(6)})
dfPivot = pd.pivot_table(dfOriginal, index=["a"], columns=["b"], values=["c"], aggfunc=np.sum)
dfPivot2 = pd.pivot_table(dfOriginal, index=["a", "b"], values=["c"], aggfunc=np.sum)
dfGrouped = dfOriginal.groupby(['a', 'b'])['c'].sum()

>>> dfOriginal
a b c
0 1 1 0.486374
1 2 1 0.020761
2 3 1 0.980307
3 1 2 0.105447
4 2 2 0.026814
5 3 2 0.546601
>>> dfPivot
c
b 1 2
a
1 0.486374 0.105447
2 0.020761 0.026814
3 0.980307 0.546601
>>> dfPivot2
c
a b
1 1 0.486374
2 0.105447
2 1 0.020761
2 0.026814
3 1 0.980307
2 0.546601
>>> dfGrouped
a b
1 1 0.486374
2 0.105447
2 1 0.020761
2 0.026814
3 1 0.980307
2 0.546601
Name: c, dtype: float64
>>>

if-then logic:
https://pandas.pydata.org/pandas-docs/stable/cookbook.html#cookbook-selection
In [1]: df = pd.DataFrame(
…: {'AAA': [4, 5, 6, 7], 'BBB': [10, 20, 30, 40], 'CCC': [100, 50, -30, -50]}); df
…:
Out[1]:
AAA BBB CCC
0 4 10 100
1 5 20 50
2 6 30 -30
3 7 40 -50

An if-then with assignment to 2 columns:

In [3]: df.loc[df.AAA >= 5, ['BBB', 'CCC']] = 555; df
Out[3]:
AAA BBB CCC
0 4 10 100
1 5 555 555
2 6 555 555
3 7 555 555

Better yet, use np.where to accomplish this:
https://chrisalbon.com/python/data_wrangling/pandas_create_column_using_conditional/

pdPnl["DTDRealizedPnl"] = pdPnl["InceptionRealizedPnl"] - np.where(np.isnan(pdPnl["InceptionRealizedPnl_tm1"]), 0, pdPnl["InceptionRealizedPnl_tm1"])
pdPnl["DTDUnrealizedPnl"] = pdPnl["InceptionUnrealizedPnl"] - np.where(np.isnan(pdPnl["InceptionUnrealizedPnl_tm1"]), 0, pdPnl["InceptionUnrealizedPnl_tm1"])

masking:
c1 = dfSummary['exchange'] == "binance"
c2 = dfSummary['created'] >= datetime(2018, 9, 1, 8, 30)
criteria = c1 | c2
maskedSummary = dfSummary.mask(criteria)

>>> maskedSummary
symbol exchange balance … offer created spread
2 NaN NaN NaN … NaN NaT NaN
1 NaN NaN NaN … NaN NaT NaN
0 NaN NaN NaN … NaN NaT NaN
11 NaN NaN NaN … NaN NaT NaN
10 NaN NaN NaN … NaN NaT NaN
9 ETHBTC bitfinex 10.0 … 0.035300 2018-09-01 08:15:00 -0.031778
5 NaN NaN NaN … NaN NaT NaN
4 NaN NaN NaN … NaN NaT NaN
3 NaN NaN NaN … NaN NaT NaN
14 NaN NaN NaN … NaN NaT NaN
13 NaN NaN NaN … NaN NaT NaN
12 LTCBTC bitfinex 10.0 … 0.095000 2018-09-01 08:15:00 -0.085700
8 NaN NaN NaN … NaN NaT NaN
7 NaN NaN NaN … NaN NaT NaN
6 NaN NaN NaN … NaN NaT NaN
17 NaN NaN NaN … NaN NaT NaN
16 NaN NaN NaN … NaN NaT NaN
15 XRPBTC bitfinex 10.0 … 0.000085 2018-09-01 08:15:00 -0.000002

None vs np.nan handling: BEWARE the difference in how NumPy and pandas handle NaN or None!
import pandas as pd
import numpy as np

nums1 = [0, 10, 20, 30, 40, 50]
nums2 = [np.nan, 10, 20, 30, 40, 50]
nums3 = [None, 10, 20, 30, 40, 50]
nums4 = [num for num in nums1 if num!=0]

npNums1 = np.asarray(nums1, dtype = float)
npNums2 = np.asarray(nums2, dtype = float)
npNums3 = np.asarray(nums3, dtype = float)
npNums4 = np.asarray(nums4, dtype = float)

mean1 = npNums1.mean() # 25
mean2 = npNums2.mean() # nan (One item np.nan)
mean3 = npNums3.mean() # nan (One item is None)
mean4 = npNums4.mean() # 30 (Nan item filtered)

data = {
    "nums1": nums1,
    "nums2": nums2,
    "nums3": nums3
}

pdNums = pd.DataFrame.from_records(data)
mean1 = pdNums["nums1"].mean()  # 25
mean2 = pdNums["nums2"].mean()  # 30 (one item is np.nan, filtered by pandas)
mean3 = pdNums["nums3"].mean()  # 30 (one item is None, filtered by pandas)

print("done")

Multi-index
https://pandas.pydata.org/pandas-docs/stable/advanced.html

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = pd.Series(np.random.randn(8), index=index)

>>> s
first second
bar one 0.469112
two -0.282863
baz one -1.509059
two -1.135632
foo one 1.212112
two -0.173215
qux one 0.119209
two -1.044236
dtype: float64

formula fields
https://pythonprogramming.net/pandas-column-operations-calculations/
https://www.tutorialspoint.com/python_pandas/python_pandas_function_application.htm
Row function: https://stackoverflow.com/questions/13331698/how-to-apply-a-function-to-two-columns-of-pandas-dataframe
pipe (Table level): http://jose-coto.com/pipes-with-pandas
data = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3'])
# x = 3; pipe passes the whole DataFrame, so this adds 3 to each cell
df = df.pipe(lambda d, x: d + x, 3)

apply (Per row): http://jonathansoma.com/lede/foundations/classes/pandas%20columns%20and%20functions/apply-a-function-to-every-row-in-a-pandas-dataframe/
data = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3'])
df = df.apply(lambda cellVal: cellVal * cellVal)
df = df['col3'].apply(lambda cellVal: cellVal * cellVal)
>>> df
col1 col2 col3
0 1 1 1
1 2 2 2
2 3 3 3
>>> df
col1 col2 col3
0 1 1 1
1 4 4 4
2 9 9 9
>>> df
0 1
1 16
2 81

def rowTransform(row):
    return (row["col1"] + row["col3"]) * 3

data = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3'])
df["col4"] = df["col1"] + df["col2"]
df["col5"] = df.apply(rowTransform, axis=1)


applymap (Per cell):
df = pd.DataFrame(np.random.randn(5, 3), columns=['col1', 'col2', 'col3'])
df = df.applymap(lambda x: x * 100)  # applymap returns a new frame; assign the result
print(df.apply(np.mean))

Create Pandas DataFrame from Object
Step 1. ObjectUtil
import inspect

def objectPropertiesToDictionary(o, excludeSystemMembers=True):
    result = {}
    members = inspect.getmembers(o)
    for member in members:
        key = member[0]
        value = member[1]
        if excludeSystemMembers:
            if "__" not in key:
                result[key] = value
        else:
            result[key] = value
    return result

Step 2. Convert list of objects to dictionary using python reflection (i.e. inspect package)
pdPnl = pd.DataFrame.from_records([ObjectUtil.objectPropertiesToDictionary(pnl) for pnl in profitLosses], columns=profitLosses[0].to_dict().keys())

From Pandas DataFrame to Dictionary:
https://stackoverflow.com/questions/26716616/convert-a-pandas-dataframe-to-a-dictionary
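The short version of that answer: to_dict's orient parameter controls the output shape. A small sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"symbol": ["ETHBTC", "LTCBTC"], "balance": [10, 20]})

as_nested = df.to_dict()                   # column -> {index -> value}
as_records = df.to_dict(orient="records")  # list of row dicts
as_lists = df.to_dict(orient="list")       # column -> list of values

print(as_records[0])  # {'symbol': 'ETHBTC', 'balance': 10}
```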

vlookup
https://stackoverflow.com/questions/25935431/pandas-lookup-based-on-value
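In pandas, a VLOOKUP is usually a left merge (or Series.map against an indexed column). A small sketch with made-up data:

```python
import pandas as pd

trades = pd.DataFrame({"ticker": ["AAPL", "IBM", "AAPL"]})
names = pd.DataFrame({"ticker": ["AAPL", "IBM"],
                      "name": ["Apple", "IBM Corp"]})

# merge behaves like VLOOKUP: each trade row picks up its matching name
looked_up = trades.merge(names, on="ticker", how="left")

# Same result via map against an indexed Series
looked_up["name2"] = trades["ticker"].map(names.set_index("ticker")["name"])

print(looked_up["name"].tolist())  # ['Apple', 'IBM Corp', 'Apple']
```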

Vectorization:
https://stackoverflow.com/questions/13893227/vectorized-look-up-of-values-in-pandas-dataframe
https://engineering.upside.com/a-beginners-guide-to-optimizing-pandas-code-for-speed-c09ef2c6a4d6
https://www.datascience.com/blog/straightening-loops-how-to-vectorize-data-aggregation-with-pandas-and-numpy/
https://realpython.com/numpy-array-programming/
https://stackoverflow.com/questions/52564186/python-pandas-lookup-another-row-calculated-field
https://stackoverflow.com/questions/52583677/pandas-dataframe-lookup
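The gist of those links in one sketch: replace the Python-level row loop with a single column-wise expression (the column names here are made up):

```python
import pandas as pd

df = pd.DataFrame({"bid": [0.0035, 0.009], "offer": [0.0351, 0.092]})

# Slow: iterate row by row in Python
mids_loop = [(row.bid + row.offer) / 2 for row in df.itertuples()]

# Fast: one vectorized expression over whole columns
df["mid"] = (df["bid"] + df["offer"]) / 2

print(mids_loop == df["mid"].tolist())  # True
```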

Memory usage optimization:
https://www.dataquest.io/blog/pandas-big-data/
http://pbpython.com/pandas_dtypes.html
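One trick from those articles: cast low-cardinality string columns to the category dtype and measure the saving with memory_usage. A minimal sketch:

```python
import pandas as pd

# 2000 rows but only two distinct strings
df = pd.DataFrame({"exchange": ["binance", "bitfinex"] * 1000})

before = df.memory_usage(deep=True).sum()
# category stores each distinct string once; rows hold small integer codes
df["exchange"] = df["exchange"].astype("category")
after = df.memory_usage(deep=True).sum()

print(after < before)  # True
```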

BIG EXAMPLE:

@staticmethod
def calculateAnalyticsFromProfitLossSeries(profitLosses):
    if not profitLosses:
        return

    # Why Pandas? It's much quicker than looping in Python:
    # https://engineering.upside.com/a-beginners-guide-to-optimizing-pandas-code-for-speed-c09ef2c6a4d6
    # https://stackoverflow.com/questions/52564186/python-pandas-lookup-another-row-calculated-field
    # Thoughts on the performance of the code below:
    # 1) ObjectUtil.objectPropertiesToDictionary uses reflection (i.e. inspect), which is slow - but unless you hard-code fields in ProfitLoss.to_dict(), you can't get around this.
    # 2) pdPnl.to_dict() at the end, and the transformation back into list(ProfitLoss), are also costs of using Pandas (as opposed to simply looping over the list in Python).

    # ProfitLoss fields:
    # Id, InstrumentId, TestId, COB, Balance, MarkPrice, AverageCost, InceptionRealizedPnl, InceptionUnrealizedPnl,
    # DTDRealizedPnl, DTDUnrealizedPnl, MTDRealizedPnl, MTDUnrealizedPnl, YTDRealizedPnl, YTDUnrealizedPnl,
    # SharpeRatio, MaxDrawDown, Created, Updated
    pdPnl = pd.DataFrame.from_records([pnl.to_dict() for pnl in profitLosses])
    pdPnl = pdPnl.merge(pdPnl, how='left', left_on=["TM1"], right_on=["COB"], suffixes=('', '_tm1'))
    pdPnl = pdPnl.merge(pdPnl, how='left', left_on=["MonthStart"], right_on=["COB"], suffixes=('', '_MonthStart'))
    pdPnl = pdPnl.merge(pdPnl, how='left', left_on=["QuarterStart"], right_on=["COB"], suffixes=('', '_QuarterStart'))
    pdPnl = pdPnl.merge(pdPnl, how='left', left_on=["YearStart"], right_on=["COB"], suffixes=('', '_YearStart'))

    # Vectorized
    # Note: on day one of trading there are no TM1 records - handle the resulting NaNs appropriately.
    # pdPnl["InceptionRealizedPnl_tm1"] = np.where(np.isnan(pdPnl["InceptionRealizedPnl_tm1"]), 0, pdPnl["InceptionRealizedPnl_tm1"])
    # pdPnl["InceptionUnrealizedPnl_tm1"] = np.where(np.isnan(pdPnl["InceptionUnrealizedPnl_tm1"]), 0, pdPnl["InceptionUnrealizedPnl_tm1"])
    # pdPnl["DTDRealizedPnl"] = pdPnl["InceptionRealizedPnl"] - pdPnl["InceptionRealizedPnl_tm1"]
    # pdPnl["DTDUnrealizedPnl"] = pdPnl["InceptionUnrealizedPnl"] - pdPnl["InceptionUnrealizedPnl_tm1"]
    pdPnl["DTDRealizedPnl"] = pdPnl["InceptionRealizedPnl"] - np.where(np.isnan(pdPnl["InceptionRealizedPnl_tm1"]), 0, pdPnl["InceptionRealizedPnl_tm1"])
    pdPnl["DTDUnrealizedPnl"] = pdPnl["InceptionUnrealizedPnl"] - np.where(np.isnan(pdPnl["InceptionUnrealizedPnl_tm1"]), 0, pdPnl["InceptionUnrealizedPnl_tm1"])
    pdPnl["TotalDTD"] = pdPnl["DTDRealizedPnl"] + pdPnl["DTDUnrealizedPnl"]

    # Annualizing DTD return: https://financetrain.com/how-to-calculate-annualized-returns/
    pdPnl["PercentDTDReturn"] = (pdPnl["TotalDTD"] / pdPnl["Balance"]) * 100
    pdPnl["PercentDTDReturn_Annualized"] = ((pdPnl["PercentDTDReturn"] / 100 + 1) ** 365 - 1) * 100

    pdPnl["MTDRealizedPnl"] = pdPnl["InceptionRealizedPnl"] - pdPnl["InceptionRealizedPnl_MonthStart"]
    pdPnl["MTDUnrealizedPnl"] = pdPnl["InceptionUnrealizedPnl"] - pdPnl["InceptionUnrealizedPnl_MonthStart"]
    pdPnl["YTDRealizedPnl"] = pdPnl["InceptionRealizedPnl"] - pdPnl["InceptionRealizedPnl_YearStart"]
    pdPnl["YTDUnrealizedPnl"] = pdPnl["InceptionUnrealizedPnl"] - pdPnl["InceptionUnrealizedPnl_YearStart"]

    # Not yet vectorized
    pdPnl["SharpeRatio"] = pdPnl.apply(lambda rw: PnlCalculatorBase.computeSharpeRatio(pdPnl, rw["COB"]), axis=1)
    pdPnl["MaxDrawDown"] = pdPnl.apply(lambda rw: PnlCalculatorBase.computeMaxDrawDown(pdPnl, rw["COB"]), axis=1)

    pnlDict = pdPnl.to_dict()
    updatedProfitLosses = ProfitLoss.ProfitLoss.from_dict(pnlDict)
    return updatedProfitLosses

Helpers:
@staticmethod
def computeSharpeRatio(pdPnl, cob):
    # Please read "BEWARE the difference in how NumPy and pandas handle NaN or None!" above.
    # Pandas automatically filters NaN/None in PercentDTDReturn_Annualized when computing "mean"; numpy doesn't.
    # Note also that "std" depends on "mean"!
    # See PnlCalculatorTests.testCalculateAnalyticsFromProfitLossSeries (near the bottom).
    # pdPnl = pdPnl[(pdPnl['COB'] <= cob)]
    # Note: "series is not None" is always True, and "x != np.nan" is always True - use notna() to filter missing values.
    pdPnl = pdPnl[(pdPnl['COB'] <= cob) & (pdPnl["PercentDTDReturn_Annualized"].notna())]
    pdPnl = pdPnl.loc[:, ["COB", "PercentDTDReturn_Annualized"]]

    # @todo We don't have a risk-free rate for the Sharpe Ratio calc; this is just the average DTD return over its standard deviation.
    # https://en.wikipedia.org/wiki/Sharpe_ratio
    mean = pdPnl["PercentDTDReturn_Annualized"].mean()
    std = pdPnl["PercentDTDReturn_Annualized"].std()
    val = mean / std

    return val

@staticmethod
def computeMaxDrawDown(pdPnl, cob):
    pdPnl = pdPnl[(pdPnl['COB'] <= cob) & (pdPnl["DTDRealizedPnl"] < 0)]
    val = pdPnl["DTDRealizedPnl"].min()
    return val

ObjectUtil.py
import sys
import logging
from random import *
from datetime import date
from datetime import datetime
from datetime import timedelta
import inspect

def objectPropertiesToDictionary(o, excludeSystemMembers=True):
    result = {}
    members = inspect.getmembers(o)
    for member in members:
        key = member[0]
        value = member[1]
        if excludeSystemMembers:
            if "__" not in key:
                result[key] = value
        else:
            result[key] = value
    return result

ProfitLoss.py
import datetime
import time
import math

import pandas as pd
import numpy as np

from Util import ObjectUtil

class ProfitLoss(object):
    def set(self, field, val):
        setattr(self, field, val)

    @staticmethod
    def from_dict(dict):
        if dict is None:
            return None

        profitLosses = []
        for k, v in dict.items():
            numPnl = len(v)
            for i in range(0, numPnl):
                pnl = ProfitLoss()
                profitLosses.append(pnl)
            break

        for k, v in dict.items():
            if k == "from_dict":
                break

            i = 0
            for val in v.values():
                if isinstance(val, pd.Timestamp):
                    val = datetime.datetime(val.year, val.month, val.day)

                # Note: "val == np.nan" is always False; use math.isnan to detect NaN.
                if isinstance(val, float) and math.isnan(val):
                    val = None

                profitLosses[i].set(k, val)
                i += 1

        return profitLosses

Unit testing:

def testCalculateAnalyticsFromProfitLossSeries(self):
    # Clean the table first, or assertEqual won't work
    self.dao.deleteProfitLoss(self.source, self.symbol, cob=None, testId=None)

    instrumentId = self.dao.getInstrumentId(self.source, self.symbol)

    NUM_DAYS_HISTORY = 375  # over one year
    NUM_DUMMY_TRADES = 5

    history_end = datetime.today().replace(hour=0, minute=0, second=0, microsecond=0)
    history_start = history_end - timedelta(days=NUM_DAYS_HISTORY)
    num_days_history = (history_end - history_start).days
    num_days_history += 1
    history_dates = [history_end.date() - timedelta(days=x) for x in range(0, num_days_history)]
    history_dates = list(reversed(history_dates))

    markPrice = 0.035
    i = 0
    for histCob in history_dates:
        # If there's no buy+sell pair, you won't get "RealizedPnl"
        dummyTrades = MarketDataUtil.generateDummyTrades(NUM_DUMMY_TRADES, self.source, self.symbol, histCob, self.strategyId, self.instrumentId, self.tradeQuantities, self.tradePrices, testId=None, randomBuySell=True)

        pnlTuple = PnlCalculatorBase.PnlCalculatorBase.replayPnlCore(histCob, dummyTrades, markPrice)
        if pnlTuple is not None:
            pnl = PnlCalculatorBase.PnlCalculatorBase.pnlTupleToProfitLoss(pnlTuple)
            pnl.COB = histCob
            pnl.InstrumentId = instrumentId
            pnl.TestId = None

            self.dao.persistProfitLoss(histCob, self.source, self.symbol, pnl, testId=None)
        markPrice += 0.0001
        i += 1

    profitLosses = self.dao.fetchProfitLoss(self.source, self.symbol, testId=None)

    self.assertEqual(len(profitLosses), NUM_DAYS_HISTORY + 1)

    updatedProfitLosses = PnlCalculatorBase.PnlCalculatorBase.calculateAnalyticsFromProfitLossSeries(profitLosses)
    for updatedPnl in updatedProfitLosses:
        self.dao.persistProfitLoss(updatedPnl.COB, self.source, self.symbol, updatedPnl, testId=None)

    # manual checking - DTD/MTD/YTD pnl
    lastPnl = updatedProfitLosses[len(updatedProfitLosses) - 1]
    TM1 = lastPnl.COB - timedelta(days=1)
    MonthStart = lastPnl.COB.replace(day=1)
    YearStart = datetime(lastPnl.COB.year, 1, 1)

    tm1Pnl = list(filter(lambda pnl: pnl.COB == TM1, updatedProfitLosses))[0]
    monthStartPnl = list(filter(lambda pnl: pnl.COB == MonthStart, updatedProfitLosses))[0]
    yearStartPnl = list(filter(lambda pnl: pnl.COB == YearStart, updatedProfitLosses))[0]

    expectedDTDRealizedPnl = lastPnl.InceptionRealizedPnl - tm1Pnl.InceptionRealizedPnl
    expectedDTDUnrealizedPnl = lastPnl.InceptionUnrealizedPnl - tm1Pnl.InceptionUnrealizedPnl
    expectedMTDRealizedPnl = lastPnl.InceptionRealizedPnl - monthStartPnl.InceptionRealizedPnl
    expectedMTDUnrealizedPnl = lastPnl.InceptionUnrealizedPnl - monthStartPnl.InceptionUnrealizedPnl
    expectedYTDRealizedPnl = lastPnl.InceptionRealizedPnl - yearStartPnl.InceptionRealizedPnl
    expectedYTDUnrealizedPnl = lastPnl.InceptionUnrealizedPnl - yearStartPnl.InceptionUnrealizedPnl

    self.assertAlmostEqual(lastPnl.DTDRealizedPnl, expectedDTDRealizedPnl, 8)
    self.assertAlmostEqual(lastPnl.DTDUnrealizedPnl, expectedDTDUnrealizedPnl, 8)
    self.assertAlmostEqual(lastPnl.MTDRealizedPnl, expectedMTDRealizedPnl, 8)
    self.assertAlmostEqual(lastPnl.MTDUnrealizedPnl, expectedMTDUnrealizedPnl, 8)
    self.assertAlmostEqual(lastPnl.YTDRealizedPnl, expectedYTDRealizedPnl, 8)
    self.assertAlmostEqual(lastPnl.YTDUnrealizedPnl, expectedYTDUnrealizedPnl, 8)

    # manual checking - MaxDrawdown
    maxDrawdown = 0
    for updatedPnl in updatedProfitLosses:
        if updatedPnl.DTDRealizedPnl is not None and updatedPnl.DTDRealizedPnl < maxDrawdown:
            maxDrawdown = updatedPnl.DTDRealizedPnl
    self.assertAlmostEqual(lastPnl.MaxDrawDown, maxDrawdown, 8)

    # manual checking - Sharpe
    DTDTotalPnlEntries = list(map(lambda pnl: (pnl.DTDRealizedPnl if pnl.DTDRealizedPnl is not None else 0) + (pnl.DTDUnrealizedPnl if pnl.DTDUnrealizedPnl is not None else 0), updatedProfitLosses))
    npDTDTotalPnlEntries = np.asarray(DTDTotalPnlEntries, dtype=float)
    mean = npDTDTotalPnlEntries.mean()
    std = npDTDTotalPnlEntries.std()
    expectedSharpeRatio = mean / std

    self.assertAlmostEqual(lastPnl.SharpeRatio, expectedSharpeRatio, 7)


Interview Prep: Python core

Python Interview Prep
https://gridwizard.wordpress.com/2018/10/09/interview-prep-python-core/

Python 3 vs 2? Python 3 was released in 2008 and was a breaking change.
http://sebastianraschka.com/Articles/2014_python_2_3_key_diff.html

Python vs C# vs Java?
https://docs.google.com/spreadsheets/d/1paTdgWUtA0uyXpPwoT3ZKIq58U0GVWQQCXQWTABh_c8/edit?usp=sharing

Python performance:
https://www.quora.com/Is-it-possible-that-Python-can-run-faster-than-C-Why
Pypy vs CPython: https://pypy.org/
Vectorization:
https://engineering.upside.com/a-beginners-guide-to-optimizing-pandas-code-for-speed-c09ef2c6a4d6?gi=1d738b7536d4
https://realpython.com/numpy-array-programming/
Ctype: http://www.maxburstein.com/blog/speeding-up-your-python-code/

What's "__init__.py"? It designates a folder as a Python "package". "__init__.py" can be an empty file: https://stackoverflow.com/questions/448271/what-is-init-py-for

__init__ vs __new__?
__new__ is called to CREATE the instance; it receives the class ("cls"), not "self".
__init__ is called AFTER the instance exists, so it receives the instance as "self".
If __new__ returns something other than an instance of the class, the instance's __init__ method will not be invoked!

http://howto.lintel.in/python-__new__-magic-method-explained/
https://spyhce.com/blog/understanding-new-and-init
https://www.quora.com/What-difference-among-methods-__new__-__init__-and-__call__-in-Python-metaclass
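The calling order above can be sketched as follows (hypothetical class):

```python
class Logged:
    def __new__(cls, *args, **kwargs):
        # __new__ receives the class and must return the new instance
        instance = super().__new__(cls)
        instance.created_by_new = True
        return instance

    def __init__(self, value):
        # __init__ receives the already-created instance as "self"
        self.value = value

obj = Logged(42)
print(obj.created_by_new, obj.value)  # True 42
```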

What's __name__?
https://www.geeksforgeeks.org/what-does-the-if-__name__-__main__-do/

print("Always executed")

if __name__ == "__main__":
    print("Executed when invoked directly")
else:
    print("Executed when imported")

Constructor and Destructor?
class TestClass:
    def __init__(self):
        print("constructor")

    def __del__(self):
        print("destructor")

if __name__ == "__main__":
    obj = TestClass()
    del obj
https://helloacm.com/constructor-and-destructor-in-python-classes/

"del" vs setting to "None"? Effectively no difference for the object: both drop a reference ("del" also removes the name binding).
Note, however, that exceptions raised from "__del__" are ignored by Python.
https://python-3-patterns-idioms-test.readthedocs.io/en/latest/InitializationAndCleanup.html
https://www.holger-peters.de/an-interesting-fact-about-the-python-garbage-collector.html

Also, in the event of a circular reference, __del__ may not be invoked. In the example below, after setting both "a" and "b" to None, __del__ is not called.
>>> a = A()
>>> b = A()
>>> a.other = b
>>> b.other = a
>>> a = None
>>> b = None

However, since A implements __del__, Python (before 3.4) refuses to clean them up, arguing that it cannot tell which __del__ method to call first. Instead of doing the wrong thing (invoking them in the wrong order), Python chooses to do nothing.
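Caveat: that behavior is Python 2 (and Python 3 before 3.4). Since PEP 442, cycles whose objects define __del__ are collected by the garbage collector; a quick check:

```python
import gc

class A:
    deleted = 0

    def __del__(self):
        A.deleted += 1

a, b = A(), A()
a.other, b.other = b, a  # create a reference cycle
a = b = None             # names dropped; the cycle keeps both objects alive

gc.collect()             # on Python 3.4+ the cycle is collected and __del__ runs
print(A.deleted)         # 2
```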

What's "__call__"? http://hplgit.github.io/primer.html/doc/pub/class/._class-solarized003.html
__call__ is the function-call operator. Once you implement __call__ in your class, you can invoke a class instance like a function call. For example:

class Foo:
    def __call__(self, a, b):
        pass

f = Foo()
f('hello', 'world')  # This invokes __call__

Another example:

class Derivative(object):
    def __init__(self, f, h=1E-5):
        self.f = f
        self.h = float(h)

    def __call__(self, x):
        f, h = self.f, self.h  # make short forms
        return (f(x + h) - f(x)) / h

>>> from math import sin, cos, pi
>>> df = Derivative(sin)
>>> x = pi
>>> df(x)
-1.000000082740371

Inheritance? super? See next example on raising exception. GraphicCommandException extends GraphicExceptionBase
http://amyboyle.ninja/Python-Inheritance

Raise exception?
Example,
Exception classes:

class GraphicExceptionBase(ValueError):
    def __init__(self, commandString):
        self.commandString = commandString

from mygraphics import GraphicExceptionBase as gex

class GraphicCommandException(gex.GraphicExceptionBase):
    # For things like invalid command-line format

    def __init__(self, commandString):
        super(GraphicCommandException, self).__init__(commandString)

"GraphicCommandParser.py" under the "mygraphics" package (for demo purposes this is not a class):

import re
from mygraphics import GraphicCommandBase as gcmd
from mygraphics import GraphicCommandException as gex

def parse(commandString):
    # Raw strings avoid invalid-escape warnings for \d
    _REEXPR_COMMAND_CREATE_CANVAS = r"^(C){1,1} (\d+) (\d+)"
    _REEXPR_COMMAND_DRAW_LINE = r"^(L){1,1} (\d+) (\d+) (\d+) (\d+)"
    _REEXPR_COMMAND_DRAW_RECT = r"^(R){1,1} (\d+) (\d+) (\d+) (\d+)"
    _REEXPR_COMMAND_BUCKET_FILL = r"^(B){1,1} (\d+) (\d+) (.){1}"
    _REEXPR_COMMAND_QUIT = r"^(Q){1,1}"

    cmd = gcmd.GraphicCommandBase(commandString)

    expr = _REEXPR_COMMAND_CREATE_CANVAS
    match = re.match(expr, commandString, re.I)
    if match:
        cmd.command = match.group(1)
        cmd.w = int(match.group(2))
        cmd.h = int(match.group(3))
        return cmd
    # ... more code ...

"GraphicRenderEngine.py" under the "mygraphics" package (for demo purposes this is a class; note that the first parameter of every method is "self"):

class GraphicRenderEngine(object):
    _DOT = "x"
    _HORIZONTAL_BOUND = "-"
    _VERTICAL_BOUND = "|"

    def __init__(self):
        self.created = datetime.datetime.now()

    # ... more code ...

    def draw(self, graphicCommand):
        # ... more code ...
        pass

Then from main.py:

import mygraphics
from mygraphics import GraphicRenderEngine as gengine
from mygraphics import GraphicCommandParser as gparser  # note: this is not a class; "parse" is just a function defined in it
from mygraphics import GraphicCommandBase as gcmd
from mygraphics import GraphicCommandException as gcmdex
from mygraphics import GraphicRenderException as grndex
from mygraphics import GraphicRenderOutofBoundException as grndobex

engine = gengine.GraphicRenderEngine()  # engine is an instance of the GraphicRenderEngine class
keepGoing = True
while keepGoing:
    try:
        commandString = input("Enter draw command, C = Create Canvas, L = draw Line, R = draw Rectangle, B = Bucket Fill, Q = Quit: ")
        graphicCommand = gparser.parse(commandString)
        if graphicCommand.command != "Q":
            engine.draw(graphicCommand)
        else:
            keepGoing = False
    except gcmdex.GraphicCommandException:
        gparser.help()
    except grndobex.GraphicRenderOutofBoundException:
        print("Graphic rendering exception: OutOfBound error")
    except grndex.GraphicRenderException:
        print("Graphic rendering exception")
    else:
        pass
    finally:
        pass

Static member and methods?
Static member:
class Example:
    staticVariable = 5  # Access through class

print(Example.staticVariable)  # prints 5

# Access through an instance
instance = Example()
print(instance.staticVariable)  # still 5

# Change within an instance: this creates an instance attribute that shadows the class attribute
instance.staticVariable = 6
print(instance.staticVariable)  # 6
print(Example.staticVariable)   # 5

# Change through the class
Example.staticVariable = 7
print(instance.staticVariable)  # still 6 (the instance attribute shadows it)
print(Example.staticVariable)   # now 7

Static method (Decorate method with @staticmethod, note no “self” parameter in method signature): https://docs.python.org/2/library/functions.html#staticmethod
class MyClass(object):
    @staticmethod
    def the_static_method(x):
        print(x)

To invoke it:
MyClass.the_static_method(2)

Warning, this won’t work — inside the method body, __myName__ is looked up as a global, not as a class attribute:
class MyClass(object):
    __myName__ = "John"

    @staticmethod
    def static_print(x):
        print(x + " " + __myName__)  # This will blow up with NameError!

MyClass.static_print("hello")

To fix this:
class MyClass(object):
    __myName__ = "John"

    @staticmethod
    def static_print(x):
        print(x + " " + MyClass.__myName__)

http://radek.io/2011/07/21/static-variables-and-methods-in-python/

@property: https://www.quora.com/What-are-the-purposes-of-staticmethod-and-property-decorators
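As a quick illustration of @property (a hypothetical Temperature class, not from the linked article) — a getter/setter pair with validation, plus a computed read-only attribute:

```python
class Temperature:
    def __init__(self, celsius):
        self._celsius = celsius

    @property
    def celsius(self):
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        # Validation runs on every assignment: t.celsius = value
        if value < -273.15:
            raise ValueError("below absolute zero")
        self._celsius = value

    @property
    def fahrenheit(self):
        # Computed on access; no setter defined, so it is read-only
        return self._celsius * 9 / 5 + 32

t = Temperature(100)
print(t.fahrenheit)   # 212.0
t.celsius = 0
print(t.fahrenheit)   # 32.0
```

Callers use plain attribute syntax; the class can later add validation or computation without changing its interface.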

Python has event/delegates like C#?
Example 1 (Least preferable if multiple handlers): Just pass your handler function
https://stackoverflow.com/questions/2184263/eventhandler-event-delegate-based-programming-in-python-any-example-would-appr
def do_work_and_notify(done_handler):
    # Do some work here…
    done_handler()

def send_email_on_completion():
    email_send('joe@example.com', 'you are done')

do_work_and_notify(send_email_on_completion)

Example 2: Use a decorator (different wrapped functions have different numbers of arguments, so you need different wrappers/decorators)
https://programmingideaswithjake.wordpress.com/2015/05/23/python-decorator-for-simplifying-delegate-pattern/

Example 3. Use a publisher/subscriber library (But isn’t it a little heavy to need a library for simple things like this?)
pypubsub
http://pypubsub.readthedocs.io/en/stable/usage/usage_basic.html#quick-start

pydispatcher
http://pydispatcher.sourceforge.net/
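If a library feels heavy, a minimal C#-style multicast event can be sketched in a few lines of plain Python (an illustrative sketch, not any library’s API):

```python
class Event:
    """Minimal C#-style event: an ordered list of handlers."""
    def __init__(self):
        self._handlers = []

    def __iadd__(self, handler):      # event += handler, like C#
        self._handlers.append(handler)
        return self

    def __isub__(self, handler):      # event -= handler
        self._handlers.remove(handler)
        return self

    def fire(self, *args, **kwargs):
        # Copy the list so handlers may unsubscribe while we iterate
        for handler in list(self._handlers):
            handler(*args, **kwargs)

class Worker:
    def __init__(self):
        self.on_done = Event()

    def do_work(self):
        # ... do some work here ...
        self.on_done.fire("finished")

w = Worker()
w.on_done += lambda msg: print("handler1:", msg)
w.on_done += lambda msg: print("handler2:", msg)
w.do_work()
```

Multiple handlers attach with `+=` just as C# delegates do; `fire` plays the role of `Invoke`.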

Python threading
Basic:
https://www.tutorialspoint.com/python/python_multithreading.htm
Lock, RLock, Event and Condition?
https://hackernoon.com/synchronization-primitives-in-python-564f89fee732
Mutex in multiprocessing?
https://stackoverflow.com/questions/28664720/how-to-create-global-lock-semaphore-with-multiprocessing-pool-in-python
Centralizing lock.acquire and avoiding timeout=-1 (to avoid deadlocks)
https://stackoverflow.com/questions/52645864/python-centralizing-lock-acquire
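A short sketch of acquiring with a finite timeout instead of timeout=-1 (which blocks forever). Note that a plain Lock is not reentrant, so a second acquire in the same thread would normally deadlock — with a timeout it fails fast instead:

```python
import threading

lock = threading.Lock()

# A plain Lock is not reentrant: a second acquire in the same thread
# would block forever. With a timeout, it returns False instead.
lock.acquire()
acquired_again = lock.acquire(timeout=0.1)
print(acquired_again)   # False: timed out instead of deadlocking
lock.release()

def critical_section():
    # Centralize acquisition with a finite timeout rather than timeout=-1
    if not lock.acquire(timeout=5):
        raise TimeoutError("could not acquire lock within 5s - possible deadlock")
    try:
        pass  # ... protected work ...
    finally:
        lock.release()

critical_section()  # the lock is free here, so this acquires immediately
```

The try/finally guarantees release even if the protected work raises.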

concurrent.futures.ProcessPoolExecutor:
https://docs.python.org/dev/library/concurrent.futures.html
http://masnun.com/2016/03/29/python-a-quick-introduction-to-the-concurrent-futures-module.html
“multiprocessing.Pool” instead of ProcessPoolExecutor to avoid “Queue objects should only be shared between processes through inheritance”
https://stackoverflow.com/questions/9908781/sharing-a-result-queue-among-several-processes
https://stackoverflow.com/questions/8804830/python-multiprocessing-pickling-error
https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled
“pathos” — why use pathos? We want parallelism; however, concurrent.futures.ProcessPoolExecutor and multiprocessing.Pool impose constraints — for example, the worker function must be picklable (a static method or a top-level function outside your class).
https://pypi.org/project/pathos/
https://kampta.github.io/Parallel-Processing-in-Python/ <– Must read!
https://stackoverflow.com/questions/51345135/python-pathos-error-errorrootclass-runtimeerror

Example 1. Basic
import threading
import time

class myThread(threading.Thread):
    def __init__(self, threadID, name, counter):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.counter = counter

    def run(self):
        print("Starting " + self.name)
        # Get lock to synchronize threads
        threadLock.acquire()
        print_time(self.name, self.counter, 3)
        # Free lock to release next thread
        threadLock.release()

def print_time(threadName, delay, counter):
    while counter:
        time.sleep(delay)
        print("%s: %s" % (threadName, time.ctime(time.time())))
        counter -= 1

threadLock = threading.Lock()
threads = []

# Create new threads
thread1 = myThread(1, "Thread-1", 1)
thread2 = myThread(2, "Thread-2", 2)

# Start new Threads
thread1.start()
thread2.start()

# Add threads to thread list
threads.append(thread1)
threads.append(thread2)

# Wait for all threads to complete
for t in threads:
    t.join()
print("Exiting Main Thread")

Example 2. concurrent.futures.ProcessPoolExecutor
from concurrent.futures import ProcessPoolExecutor
from time import sleep

def return_after_5_secs(message):
    sleep(5)
    return message

if __name__ == "__main__":  # required on platforms that spawn worker processes (e.g. Windows)
    pool = ProcessPoolExecutor(3)

    future = pool.submit(return_after_5_secs, "hello")
    print(future.done())   # False - still running
    sleep(5)
    print(future.done())   # likely True by now
    print("Result: " + future.result())

Example 3. Use multiprocessing.Pool instead of ProcessPoolExecutor to avoid “Queue objects should only be shared between processes through inheritance”
https://stackoverflow.com/questions/9908781/sharing-a-result-queue-among-several-processes
https://stackoverflow.com/questions/8804830/python-multiprocessing-pickling-error
https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled
import multiprocessing

def worker(name, que):
    que.put("%d is done" % name)

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=3)
    m = multiprocessing.Manager()
    q = m.Queue()
    result = pool.apply_async(worker, (33, q))
    result.wait()      # block until the task finishes
    print(q.get())     # "33 is done"
    pool.close()
    pool.join()

abc and Abstract classes:
from abc import ABC, abstractmethod

class AbstractClassExample(ABC):

    def __init__(self, value):
        self.value = value
        super().__init__()

    @abstractmethod
    def do_something(self):
        pass

class DoAdd42(AbstractClassExample):
    def do_something(self):
        return self.value + 42

class DoMul42(AbstractClassExample):
    def do_something(self):
        return self.value * 42

x = DoAdd42(10)
y = DoMul42(10)
print(x.do_something())   # 52
print(y.do_something())   # 420

https://www.python-course.eu/python3_abstract_classes.php

Reflection? inspect.getmembers / inspect.isclass / inspect.isabstract
import inspect
import example

for name, data in inspect.getmembers(example, inspect.isclass):
    print('%s :' % name, repr(data))

https://pymotw.com/2/inspect/
https://bip.weizmann.ac.il/course/python/PyMOTW/PyMOTW/docs/inspect/index.html
http://archive.oreilly.com/oreillyschool/courses/Python4/Python4-09.html

isinstance of?
class Foo:
    a = 5

fooInstance = Foo()
print(isinstance(fooInstance, Foo))                 # True
print(isinstance(fooInstance, (list, tuple)))       # False
print(isinstance(fooInstance, (list, tuple, Foo)))  # True

https://www.programiz.com/python-programming/methods/built-in/isinstance

Python decorators:

From Util.py:
import logging
import datetime

def timing_log_noarg(func):
    def wrapper(arg1):
        start = datetime.datetime.now()
        result = func(arg1)
        finish = datetime.datetime.now()
        msg = str(func) + " invoked. start: " + str(start) + ", finish: " + str(finish)
        logging.info(msg)
        return result
    return wrapper

def timing_log_2arg(func):
    def wrapper(arg1, arg2, arg3):
        start = datetime.datetime.now()
        result = func(arg1, arg2, arg3)
        finish = datetime.datetime.now()
        msg = str(func) + " invoked. start: " + str(start) + ", finish: " + str(finish) + " " + arg2 + " " + arg3
        logging.info(msg)
        return result
    return wrapper

To use it:
import Util as util   # the decorators are top-level functions in Util.py

class TimeConsumingService(object):
    … more code …
    def __init__(self, homeDir):
        … more code …

    @util.timing_log_noarg  # well, one arg — that's "self"
    def reload(self):
        self.someLengthyOperation()

    @util.timing_log_2arg  # well, three arguments in total, including "self"
    def load(self, param1, param2):
        self.someOtherLengthyOperation()
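The per-arity wrappers above can be collapsed into one decorator by forwarding *args/**kwargs — a sketch using functools.wraps to preserve the wrapped function’s name:

```python
import datetime
import functools
import logging

def timing_log(func):
    """One timing decorator for any signature: forward *args/**kwargs."""
    @functools.wraps(func)  # keeps func.__name__, __doc__, etc.
    def wrapper(*args, **kwargs):
        start = datetime.datetime.now()
        result = func(*args, **kwargs)
        finish = datetime.datetime.now()
        logging.info("%s invoked. start: %s, finish: %s", func.__name__, start, finish)
        return result
    return wrapper

@timing_log
def add(a, b):
    return a + b

@timing_log
def greet(name, punctuation="!"):
    return "hello " + name + punctuation

print(add(1, 2))        # 3
print(greet("world"))   # hello world!
```

The same decorator now works on methods too, since "self" is just the first positional argument.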

Pass by value vs pass by reference? It’s “passing a reference by value”.
Arguments are passed neither by value nor by reference in Python
– instead they are passed by assignment.
The parameter passed in is actually a reference to an object (as opposed to a reference to a fixed memory location), but the reference itself is passed by value.

https://www.quora.com/Are-arguments-passed-by-value-or-by-reference-in-Python
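A quick demonstration of “passing a reference by value”: mutating the shared object is visible to the caller, while rebinding the parameter name is not:

```python
def mutate(lst):
    lst.append(4)        # mutates the caller's object (shared reference)

def rebind(lst):
    lst = [9, 9, 9]      # rebinds the local name only; the caller is unaffected

a = [1, 2, 3]
mutate(a)
print(a)   # [1, 2, 3, 4]
rebind(a)
print(a)   # still [1, 2, 3, 4]
```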

Garbage Collection
https://rushter.com/blog/python-garbage-collector/
https://rushter.com/blog/python-memory-managment/
https://www.quora.com/How-does-garbage-collection-in-Python-work-What-are-the-pros-and-cons
https://pythoninternal.wordpress.com/2014/08/04/the-garbage-collector/

Two mechanisms:
a. Reference Counting (Basic, not optional)
The reference count increases on:
assignment
argument passing
appending the object to a list
When an object’s reference count reaches zero, CPython automatically calls the object-specific deallocation function.
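The counting is observable (CPython-specific) with sys.getrefcount — note the function call itself temporarily adds one reference:

```python
import sys

x = object()
base = sys.getrefcount(x)       # getrefcount itself adds one temporary reference

y = x                           # assignment increases the count
print(sys.getrefcount(x) == base + 1)   # True

lst = [x]                       # appending to a list increases it again
print(sys.getrefcount(x) == base + 2)   # True

del y, lst                      # dropping references decreases the count
print(sys.getrefcount(x) == base)       # True
```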

b. Cycle detection (Optional — runs periodically by generation thresholds or when gc.collect() is called; considers container objects only, and tuples of immutables are exempted)
CPython has an algorithm to detect those reference cycles, implemented in the function collect. First of all, it only focuses on container objects (i.e. objects that can contain a reference to one or more objects): arrays, dictionaries, user class instances, etc. As an extra optimization, the GC ignores tuples containing only immutable types (int, strings, … or tuples containing only immutable types)

What’s gc_ref? Each GC-tracked Python object has a field – *gc_ref*. During a collection it is (I believe) initialized to the object’s reference count, then decremented for every reference coming from another container object in the candidate set — what remains counts references from outside (non-container objects and roots).
Any container object whose *gc_ref* count remains greater than zero has references that are not container objects. So it is REACHABLE and is removed from consideration as an unreachable memory island, along with every container object reachable from it.
The remaining container objects are UNREACHABLE (except by each other) and should be freed.
https://stackoverflow.com/questions/10962393/how-does-pythons-garbage-collector-detect-circular-references/10962484
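The island-freeing behavior can be observed (CPython-specific) by combining weakref with gc — reference counting alone cannot free a cycle, but the collector can:

```python
import gc
import weakref

class Node:
    pass

gc.disable()              # make the demo deterministic: no automatic collection
a = Node()
b = Node()
a.other = b               # build a reference cycle
b.other = a
ref = weakref.ref(a)      # a weak reference does not keep "a" alive

del a, b                  # both names are gone, but the cycle keeps refcounts at 1
print(ref() is None)      # False: reference counting alone cannot free the cycle

gc.collect()              # the cycle detector finds the unreachable island
print(ref() is None)      # True: the cycle has been freed
gc.enable()
```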

What’s a “weak reference”?
https://mindtrove.info/python-weak-references/
With a “strong reference”, the object is not deallocated until BOTH references “a” and “b” are deleted (assuming a Foo whose __init__/__del__ print “created”/“destroyed”):
>>> a = Foo()
created
>>> b = a
>>> del a
>>> del b
destroyed
With a “weak reference”, the object is deallocated as soon as the one strong reference “a” is deleted — the weak reference “b” does not keep it alive:
>>> import weakref
>>> a = Foo()
created
>>> b = weakref.ref(a)
>>> del a
destroyed
>>> b() is None
True

Note also that, in the event of a circular reference, __del__ historically would not be invoked: in the example below, after setting both “a” and “b” to None, __del__ is not called. (This applies to Python before 3.4 — since PEP 442, the cycle collector can finalize and free objects with __del__.)
https://www.holger-peters.de/an-interesting-fact-about-the-python-garbage-collector.html
>>> a = A()
>>> b = A()
>>> a.other = b
>>> b.other = a
>>> a = None
>>> b = None

__iter__?
The __iter__ method is what makes an object iterable. Behind the scenes, the iter function calls __iter__ method on the given object.
The return value of __iter__ is an iterator. It should have a __next__ method (next in Python 2) and raise StopIteration when there are no more elements.

class yrange:
    def __init__(self, n):
        self.i = 0
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):   # "next" in Python 2
        if self.i < self.n:
            i = self.i
            self.i += 1
            return i
        else:
            raise StopIteration()

https://www.programiz.com/python-programming/iterator
https://anandology.com/python-practice-book/iterators.html
The Python yield keyword explained

Generator/yield: lazy evaluation of a sequence. For example, in Python 2, range(10000) is not a generator — it returns a list of ten thousand integers, all in memory. xrange(10000), however, yields one integer at a time. (In Python 3, range itself is lazy and xrange is gone.)
for i in range(0, 20): ...
for i in xrange(0, 20): ...   # Python 2 only

https://wiki.python.org/moin/Generators

# Using the generator pattern (an iterable)
class firstn(object):
    def __init__(self, n):
        self.n = n
        self.num, self.nums = 0, []

    def __iter__(self):
        return self

    # Python 3 compatibility
    def __next__(self):
        return self.next()

    def next(self):
        if self.num < self.n:
            cur, self.num = self.num, self.num + 1
            return cur
        else:
            raise StopIteration()
sum_of_first_n = sum(firstn(1000000))

# a generator that yields items instead of returning a list
def firstn(n):
    num = 0
    while num < n:
        yield num
        num += 1

sum_of_first_n = sum(firstn(1000000))

lambda map/filter/reduce: https://www.python-course.eu/python3_lambda.php
biggerThanThreshold = lambda x: x > 5
sqSomeNum = lambda x: x ** 2
filtered = list(filter(biggerThanThreshold, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))   # [6, 7, 8, 9, 10]
transformed = list(map(sqSomeNum, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))             # note: filter/map return lazy iterators in Python 3, hence list()
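The third of the trio, reduce, moved into functools in Python 3 — for completeness:

```python
from functools import reduce

nums = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

bigger_than_threshold = lambda x: x > 5
filtered = list(filter(bigger_than_threshold, nums))   # [6, 7, 8, 9, 10]
squared = list(map(lambda x: x ** 2, nums))            # [0, 1, 4, 9, ...]

# reduce folds the list into a single value: ((0 + 1) + 2) + ...
total = reduce(lambda acc, x: acc + x, nums, 0)        # 55
print(filtered, total)
```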

sorted – “sorted(list)” vs “list.sort()”? sorted() returns a new sorted list, leaving the original list unaffected. list.sort() sorts the list in place, mutating it, and returns None (like all in-place operations).
https://docs.python.org/2/howto/sorting.html
https://stackoverflow.com/questions/22442378/what-is-the-difference-between-sortedlist-vs-list-sort

>>> sorted([5, 2, 3, 1, 4])
[1, 2, 3, 4, 5]

>>> a = [5, 2, 3, 1, 4]
>>> a.sort()
>>> a
[1, 2, 3, 4, 5]

>>> sorted({1: 'D', 2: 'B', 3: 'B', 4: 'E', 5: 'A'})   # sorting a dict sorts its keys
[1, 2, 3, 4, 5]

>>> sorted("This is a test string from Andrew".split(), key=str.lower)
['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']

>>> student_tuples = [
...     ('john', 'A', 15),
...     ('jane', 'B', 12),
...     ('dave', 'B', 10),
... ]
>>> sorted(student_tuples, key=lambda student: student[2])   # sort by age
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

>>> class Student:
...     def __init__(self, name, grade, age):
...         self.name = name
...         self.grade = grade
...         self.age = age
...     def __repr__(self):
...         return repr((self.name, self.grade, self.age))
>>> student_objects = [
...     Student('john', 'A', 15),
...     Student('jane', 'B', 12),
...     Student('dave', 'B', 10),
... ]
>>> sorted(student_objects, key=lambda student: student.age)   # sort by age
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
Multiline lambda? Not supported — just use “def” and write a normal function.

List Comprehension: http://www.pythonforbeginners.com/basics/list-comprehensions-in-python
Old way:
biggerThanThreshold = lambda x: x > 5
squares = []
for x in range(10):
    if biggerThanThreshold(x):
        squares.append(x**2)
FP way:
biggerThanThreshold = lambda x: x > 5
squares = [x**2 for x in range(10) if biggerThanThreshold(x)]
Here x**2 is the map expression, which transforms x to sq(x);
biggerThanThreshold is your filter expression.

Python and LINQ? List comprehensions + generators http://mark-dot-net.blogspot.com/2014/03/python-equivalents-of-linq-methods.html
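A few common LINQ operators and their comprehension/builtin equivalents — a sketch with made-up data:

```python
people = [("john", 15), ("jane", 12), ("dave", 10)]

# Where -> a comprehension condition (or filter)
teens_and_up = [p for p in people if p[1] >= 12]

# Select -> a comprehension expression (or map)
names = [name for name, age in people]

# OrderBy -> sorted() with a key
by_age = sorted(people, key=lambda p: p[1])

# FirstOrDefault -> next(..., default) over a generator expression
first_teen = next((p for p in people if p[1] >= 13), None)

# Any / All -> any() / all() with a generator expression
has_teen = any(age >= 13 for _, age in people)

print(names, first_teen, has_teen)
```

Generator expressions keep the pipeline lazy, much like LINQ's deferred execution.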

HttpRequest and web scraping – use urllib for the HTTP request, then BeautifulSoup to parse the HTML.
class DataLoaderBase(object):
    _SOURCE_NAME = "Undefined"

    def load(self):
        data = {}
        return data

import urllib.request
from Data import DataLoaderBase as dl

class WebScrapperBase(dl.DataLoaderBase):
    def load(self, url):
        request = urllib.request.Request(url)
        response = urllib.request.urlopen(request)
        rawData = response.read()
        return rawData
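BeautifulSoup is the usual parsing choice; as a dependency-free sketch of the same idea, the stdlib html.parser can extract links from raw HTML (the sample markup below is made up):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href attributes from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

raw = '<html><body><a href="/forecast">Forecast</a><a href="/gdp">GDP</a></body></html>'
parser = LinkExtractor()
parser.feed(raw)
print(parser.links)   # ['/forecast', '/gdp']
```

With BeautifulSoup the equivalent would be roughly `[a["href"] for a in soup.find_all("a")]`.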

JSON config loading?
Example JSON config:
{
    "countries": [ "germany", "unitedkingdom", "france", "china", "unitedstates", "japan" ],
    "webScrapperConfig": {
        "baseUrl": "https://somewhere.com",
        "queryVar": "$COUNTRY$/forecast"
    }
}
Now the loading part:
import json
import os

class SomeService(object):
    def __init__(self, homeDir):
        path = os.path.join(homeDir, "Config", "someConfig.json")
        with open(path) as json_data_file:
            data = json.load(json_data_file)
        self.countries = data["countries"]
        self.webScrapperConfig = data["webScrapperConfig"]   # nested dict; access via ["baseUrl"] / ["queryVar"]

Unit Tests:
Mocks vs Fakes vs Stubs
https://www.telerik.com/blogs/fakes-stubs-and-mocks
https://stackoverflow.com/questions/346372/whats-the-difference-between-faking-mocking-and-stubbing
https://blog.pragmatists.com/test-doubles-fakes-mocks-and-stubs-1a7491dfa3da
https://martinfowler.com/articles/mocksArentStubs.html
https://www.c-sharpcorner.com/UploadFile/dacca2/understand-stub-mock-and-fake-in-unit-testing/

From Martin Fowler,
In automated testing it is common to use objects that look and behave like their production equivalents, but are actually simplified. This reduces complexity, allows verifying code independently from the rest of the system, and sometimes is even necessary to execute self-validating tests at all. A Test Double is the generic term for these objects.

a. Fake objects actually have working implementations, but usually take some shortcut which makes them not suitable for production (an in memory database is a good example).
b. Spies are stubs that also record some information based on how they were called. One form of this might be an email service that records how many messages it was sent.
c. Stubs provide canned answers to calls made during the test, usually not responding at all to anything outside what’s programmed in for the test.
(i.e. pre-programmed return values – i.e. OUTPUT)
d. Mocks are what we are talking about here: objects pre-programmed with expectations which form a specification of the calls they are expected to receive.
(i.e. Just check if mocks can process expected INPUT)
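The stub-vs-mock distinction can be sketched with unittest.mock (the dao/email_service/submit names below are hypothetical):

```python
from unittest.mock import Mock

# Stub: pre-programmed OUTPUT - a canned answer
dao = Mock()
dao.load_raw_data.return_value = [1, 2, 3]

# Mock: we will verify the expected INPUT / interaction afterwards
email_service = Mock()

def submit(dao, email_service):
    """Hypothetical service-layer function under test."""
    data = dao.load_raw_data()
    total = sum(data)
    email_service.send("joe@example.com", "total=%d" % total)
    return total

result = submit(dao, email_service)
print(result)    # 6

# Mock-style verification: the interaction happened, with the expected arguments.
# No database touched, no email sent.
email_service.send.assert_called_once_with("joe@example.com", "total=6")
```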

Example,
Stubbing:
– Example 1: Given a fixed set of entitlements/permissions, the access control service should return the expected list of views visible to the user. If there is no entitlement, the list of permissible/visible views should be empty. We stub the entitlement service that feeds our system (it is stubbed to return pre-canned entitlements).
– Example 2: Different views take the same set of raw data from the DAO but aggregate/group/filter/calculate differently. We stub the DAOs (stubbed to return pre-canned business data) and isolate the test on the calculation logic.

Mocking:
Submission of adjustments – SubmissionService.submit() does a few things: posts the adjustment to the database and dispatches notification emails. We mock both the DAO and the email service; the unit test only validates that SubmissionService.submit() invokes both downstream services as expected. Nothing is saved to the database and no email is sent from the service-layer unit tests.


Example,
from unittest.mock import patch, Mock
import unittest

import os

import Data
from Data import WebScrapperBase as webscrapper
from Data import FeedLoader as fd

# Test command parsing only
class TradingEconomicsScrapperBaseTests(unittest.TestCase):

    def setUp(self):
        someConfig = {}
        someConfig["baseUrl"] = "https://somewhere.com"
        someConfig["countryProjectionUrl"] = "$COUNTRY$/forecast"
        self.feedLoaderWrapper = Data.FeedLoaderWrapper(someConfig)

        testDir = os.path.dirname(__file__)
        path = os.path.join(testDir, "raw.HTML")
        with open(path) as f:
            self.rawHTML = f.read()

    def tearDown(self):
        pass

    @patch('Data.FeedLoaderWrapper')
    def testParseCountryProjection_MockedHTML(self, mockFeedLoaderWrapper):
        mockFeedLoaderWrapper.fetchHTML.return_value = self.rawHTML
        data = self.feedLoaderWrapper.parseCountryProjection("united-states", mockFeedLoaderWrapper.fetchHTML())
        self.assertIsNotNone(data)
        self.assertEqual(data.Country, "united-states")
        self.assertEqual(data.DataSource, self.feedLoaderWrapper._SOURCE_NAME)
        self.assertEqual(len(data.Data), 13)

    def testParseCountryProjection_NoMock(self):
        # Internally, FeedLoaderWrapper.load calls FeedLoaderWrapper.fetchHTML
        # (patched in the previous test case, but not here), then FeedLoaderWrapper.parseCountryProjection
        data = self.feedLoaderWrapper.load(self.feedLoaderWrapper.DATA_TYPE_COUNTRY_PROJECTION, "united-states")
        self.assertIsNotNone(data)
        self.assertEqual(data.Country, "united-states")
        self.assertEqual(data.DataSource, self.feedLoaderWrapper._SOURCE_NAME)
        self.assertEqual(len(data.Data), 13)
Python and spring: https://docs.spring.io/spring-python/1.2.x/sphinx/html/

Understanding UnboundLocalError in Python
https://eli.thegreenplace.net/2011/05/15/understanding-unboundlocalerror-in-python

SimpleFlowDiagramLib: Simple C# library to Serialize Graph to Xml (And Vice Versa)

In continuation from the previous article https://gridwizard.wordpress.com/2015/03/25/simple-c-library-to-render-graph-to-flowchart/, we’ll explore “SimpleFlowDiagramLib”’s capability to serialize a Graph to Xml (and vice versa). Why would we want to do that? For example, to wire a graph to/from Web Services consumed by, say, a Java client.

Again,
Source code: https://github.com/gridwizard/SimpleFlowDiagram

using System;
using System.IO;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

using SimpleFlowDiagramLib;

namespace DemoSimpleFlowDiagramLib
{
    class Program
    {
        
        static void Main(string[] args)
        {
            // STEP 1. Generate the nodes, render them to XML format (GraphXml)
            IList<Node> Nodes = new List<Node>();
            GenerateNodes(Nodes, 3, 3);
            SimpleFlowDiagramGeneratorCompatibleGraphRender XmlConverter = new SimpleFlowDiagramGeneratorCompatibleGraphRender();
            string GraphXml = XmlConverter.RenderGraph(Nodes);
            string GraphXmlFilePath = "GraphXml.xml";
            System.IO.File.WriteAllText(GraphXmlFilePath, GraphXml);
            Console.WriteLine("Finished writing graph to XML format compatible with SimpleFlowDiagramGenerator.exe");

            // STEP 2. Read back from GraphXml
            MemoryStream Stream = new MemoryStream();
            StreamWriter writer = new StreamWriter(Stream);
            writer.Write(GraphXml);
            writer.Flush();
            Stream.Position = 0;
            System.Xml.XmlReader XmlRdr = System.Xml.XmlReader.Create(Stream);
            IList<Node> ResurrectedNodes = XmlConverter.ReadGraphXml(XmlRdr);
            CanvasDefinition Canvas = DiagramCanvasEngine.GenerateLayout(
                ResurrectedNodes,
                Node.DEFAULT_NODE_HEIGHT / 2,
                CanvasDefinition.LayoutDirection.LeftToRight
                );

            // STEP 3. Render the nodes to an HTML file - this is exactly what "SimpleFlowDiagramGenerator.exe" does.
            // It reads an input xml file which defines the nodes, then renders the flowchart to an HTML file.
            GraphDisplayFormatSettings DisplaySettings = new GraphDisplayFormatSettings();
            IGraphRender Html5Render = new Html5GraphRender();
            Html5Render.RenderGraph(Canvas, ResurrectedNodes, DisplaySettings, "Flowchart.html");
            Console.WriteLine("Finished render to HTML5 to Flowchart.html");
            return;
        }
        public static void GenerateNodes(IList<Node> Nodes, int NumRootNodes, int MaxTreeDepth)
        {
            Node RootNode;
            for (int i = 0; i < NumRootNodes; i++)
            {
                RootNode = new Node();
                RootNode.NodeHeader = "Root_" + i;
                RootNode.NodeDetail = "Some detail ...";
                RootNode.NodeHyperLink = "http://somewhere.com";
                RootNode.Depth = 0;
                Nodes.Add(RootNode);
                GenerateSingleGraph(Nodes, RootNode, MaxTreeDepth);
            }
            return;
        }

        public static void GenerateSingleGraph(IList<Node> Nodes, Node RootNode, int MaxTreeDepth)
        {
            int CurrentDepth = 0;
            RecursiveGenerateGraph(Nodes, RootNode, MaxTreeDepth, ref CurrentDepth);
            return;
        }

        public static void RecursiveGenerateGraph(IList<Node> Nodes, Node Node, int MaxTreeDepth, ref int CurrentDepth)
        {
            CurrentDepth++;
            Random rnd = new Random(DateTime.Now.Second);
            if (CurrentDepth < MaxTreeDepth)
            {
                int NumChildren = rnd.Next(5);
                for (int i = 0; i < NumChildren; i++)
                {
                    Node Child = new Node();
                    Child.NodeHeader = Node.NodeHeader + "." + "Child_Level" + CurrentDepth + "_Num" + i;
                    Child.NodeDetail = "Some detail ...";
                    Child.NodeHyperLink = "http://somewhere.com";
                    Child.Depth = CurrentDepth;
                    Child.ParentNodes.Add(Node);
                    Node.ChildNodes.Add(Child);
                    Nodes.Add(Child);
                    int CopyCurrentDepeth = CurrentDepth;
                    RecursiveGenerateGraph(Nodes, Child, MaxTreeDepth, ref CopyCurrentDepeth);
                }
            }
            return;
        }
    }
}

Happy Coding!

You may also want to check out how to convert DataTable to/from HTML Table – https://gridwizard.wordpress.com/2014/12/17/datatable-to-from-html-table

Simple C# Library to render graph to Flowchart

Simple C# library to render a graph to a flowchart – currently it only renders to HTML5 (with the intention to support Visio in future).

You can render your graph horizontally (left to right) or vertically (top down). The layout, however, is device-independent and agnostic of whether you render to HTML5, WinForms, WPF… The library automatically centers parent nodes and calculates Node.x/y and the overall canvas size (in case you want to render to surfaces other than HTML5 – for example Visio, WPF, WinForms, etc.).

Top-to-Bottom
SimpleFlowDiagramLib.Demo.TopToBottom

Note that we didn’t scale the text to fit the boxes — automatic scaling would make the text too small to read, thereby making things even worse. Also, notice that parent nodes are horizontally center-aligned.

Left-to-Right
SimpleFlowDiagramLib.Demo.LeftToRight

Parent nodes are vertically-center aligned

Source code: https://github.com/gridwizard/SimpleFlowDiagram

Usage:
It can’t be simpler to use — the bulk of the code below just creates dummy data for illustration purposes.
a. Node.x/y are calculated by the call to DiagramCanvasEngine.GenerateLayout (you can use “Nodes” to render on other, non-HTML5 surfaces)
b. Html5Render.RenderGraph renders to HTML5

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

using SimpleFlowDiagramLib;

namespace DemoSimpleFlowDiagramLib
{
    class Program
    {
        
        static void Main(string[] args)
        {
            IList<Node> Nodes = new List<Node>();
            GenerateNodes(Nodes, 3, 3);
            Console.WriteLine("Finished generating dummy nodes, # Nodes: " + Nodes.Count);

            // Node.x/Node.y + Canvas size calculated (You can adjust programmatically as you see fit afterwards)
            CanvasDefinition Canvas = DiagramCanvasEngine.GenerateLayout(
                Nodes,
                Node.DEFAULT_NODE_HEIGHT / 2,
                CanvasDefinition.LayoutDirection.LeftToRight
                );
            Console.WriteLine("Finished calculating layout");

            GraphDisplayFormatSettings DisplaySettings = new GraphDisplayFormatSettings();
            // You can override display font, fore/back color ...etc
            DisplaySettings.NodeHeaderSettings.ForeColorName = "Black";
            DisplaySettings.NodeDetailSettings.ForeColorName = "Black";

            IGraphRender Html5Render = new Html5GraphRender();
            Html5Render.RenderGraph(Canvas, Nodes, DisplaySettings, "Flowchart.html");
            Console.WriteLine("Finished render to HTML5");
            return;
        }
        public static void GenerateNodes(IList<Node> Nodes, int NumRootNodes, int MaxTreeDepth)
        {
            Node RootNode;
            for (int i = 0; i < NumRootNodes; i++)
            {
                RootNode = new Node();
                RootNode.NodeHeader = "Root_" + i;
                RootNode.NodeDetail = "Some detail ...";
                Nodes.Add(RootNode);
                GenerateSingleGraph(Nodes, RootNode, MaxTreeDepth);
            }
            return;
        }

        public static void GenerateSingleGraph(IList<Node> Nodes, Node RootNode, int MaxTreeDepth)
        {
            int CurrentDepth = 0;
            RecursiveGenerateGraph(Nodes, RootNode, MaxTreeDepth, ref CurrentDepth);
            return;
        }

        public static void RecursiveGenerateGraph(IList<Node> Nodes, Node Node, int MaxTreeDepth, ref int CurrentDepth)
        {
            CurrentDepth++;
            Random rnd = new Random(DateTime.Now.Second);
            if (CurrentDepth < MaxTreeDepth)
            {
                int NumChildren = rnd.Next(5);
                for (int i = 0; i < NumChildren; i++)
                {
                    Node Child = new Node();
                    Child.NodeHeader = Node.NodeHeader + "." + "Child_Level" + CurrentDepth + "_Num" + i;
                    Child.NodeDetail = "Some detail ...";
                    Child.ParentNodes.Add(Node);
                    Node.ChildNodes.Add(Child);
                    Nodes.Add(Child);
                    int CopyCurrentDepeth = CurrentDepth;
                    RecursiveGenerateGraph(Nodes, Child, MaxTreeDepth, ref CopyCurrentDepeth);
                }
            }
            return;
        }
    }
}

Happy Coding!

Next, SimpleFlowDiagramLib – LIBRARY TO SERIALIZE GRAPH TO XML (AND VICE VERSA) – https://gridwizard.wordpress.com/2015/03/31/simpleflowdiagramlib-simple-c-library-to-serialize-graph-to-xml-and-vice-versa

Java and dotnet Interop

This article is about Java-dotnet Interop. We’ll explore what options we have for different scenario where interop is required.

First, when we say “Java-dotnet Interop”, there are two possibilities:

1. Java -to- dotnet communications

2. dotnet -to-Java communications

Secondly, we assume that if you’re developing in Java, you’d run it on Linux (simply put: if your application is written in Java, why would it run on Windows?)

Given above, what are our options?

 

1. Socket

Anand Manikiam has written a piece on this subject, http://www.codeproject.com/Articles/11602/Java-and-Net-interop-using-Sockets

The pros for this approach are:

a. No middle-ware

b. Fast

The cons are:

a. Resiliency

b. Casting complex object/classes from byte[]?

c. Message security? Encryption? Anti-tampering? DOS? If these are not implemented, this should be an intranet-only application.

 

2. Web Services

I’ve written an article of consuming Java-ws from dotnet:

https://gridwizard.wordpress.com/2014/12/26/java-ws-and-dotnet-interop-example/

You will also find plenty of discussions on consuming WCF-from-Java:

http://www.codeproject.com/Articles/777036/Consuming-WCF-Service-in-Java-Client

The pros for this approach are:

a. No middle-ware

b. Higher level of compatibility with code coded in more languages (C++/SOAP, Python, R …etc)

The cons are:

a. Less fast than socket

b. Resiliency

c. Message security? Encryption? Anti-tampering? DOS? If these are not implemented, this should be an intranet-only application.

d. Slower than Socket! (Web Services overhead)

 

3. Message Bus

RabbitMQ (http://www.rabbitmq.com) is all about messaging. If you’re developing real-time applications, RabbitMQ offers a high-performance, battle-tested communication platform, and it has an API for just about any language on the planet: C++, dotnet, Java, Perl, Python…

Pros are:

a. Resiliency – producers and consumers can die and crash at any moment.

b. Performance

cons:

a. You need to install middleware, and if you’re a software vendor, you’d need to bundle the installation of RabbitMQ with your application

 

4. Commercial Tools

Depending on what you’re building — if what you’re trying to build is a computing grid, then there are commercial tools which allow you to run jobs on basically any platform, coded in any language.

Appliedalgo.com, for instance, supports:

a. Scheduling, conditional job chaining and Workload Automation

b. Grid Computing – nodes/slaves on any platform/language

c. Automatic persistence of run history, parameters, input and results

(Even configure cell level validations by “IsNumber”, or use of user specified Regular Expression)

d. GUI for you to track run parameters, input and results

However, such tools inevitably introduce execution overhead. So it depends on whether you’re …

a. Executing high number of light weight jobs –> Probably should not use any tool besides a Message bus such as RabbitMQ

b. Executing medium number of medium weight jobs –> Best application of Workload Automation Data Platforms such as Appliedalgo.com

c. Executing low number of heavy weight jobs –> Best custom coded, persistence via BCP (There’s no other way for million rows or #bigdata processing)

 
But this would not be a viable option if, for instance, you’re building a hotel booking system with the web tier built in ASP.NET and the backend in Java with Java-WS.

Happy Coding!

 

 

Install Tomcat on Fedora VM for @msdev

This is continuation from Previous article on how to create a Fedora VM, with SFTP installed/configured (https://gridwizard.wordpress.com/2014/12/26/setup-linux-vm-with-sftp-guide-for-msdev). This article is for @msdev who’re unfamiliar with Linux environment.

STEP 1. Download the Java JDK (it includes the JRE). Then install Java on the Linux box under /usr/java/jdk and /usr/java/jre.
You may download it from your Windows development box, then follow these instructions (https://gridwizard.wordpress.com/2014/12/26/setup-linux-vm-with-sftp-guide-for-msdev) to upload the package to your Linux server via SFTP.
@msdev, if you’re not familiar with Linux commands, the following may be handy.
cd /usr
mkdir java
cd /home/johndoe
mv jdk-8u25-linux-i586.gz /usr/java
cd /usr/java
tar -xvf jdk-8u25-linux-i586.gz
Also, in case you want to delete something:
rmdir ./SomeDirectory    (SomeDirectory must be empty)
rm -rf ./SomeDirectory    (removes a non-empty directory)
rm SomeFile    (deletes a file)
To set the Java environment variables temporarily:
export PATH=/usr/java/jdk1.8.0_25/bin:$PATH

To set the env var permanently, add the same line to ~/.bashrc.

export PATH=/usr/java/jdk1.8.0_25/bin:$PATH
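A sketch of making that permanent change idempotent, so re-running it doesn’t duplicate the line in ~/.bashrc (JDK path assumed from above):

```shell
# Append the PATH export to ~/.bashrc only if it isn't already there.
LINE='export PATH=/usr/java/jdk1.8.0_25/bin:$PATH'
grep -qxF "$LINE" ~/.bashrc 2>/dev/null || echo "$LINE" >> ~/.bashrc
```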

STEP 2. Download and install Apache Tomcat under path /usr/apache/tomcat
Download Tar.gz package from here: http://tomcat.apache.org/download-70.cgi

To configure JAVA_HOME and CATALINA_HOME, place a setenv.sh in the /usr/apache/tomcat/apache-tomcat-7.0.57/bin directory with:

export JAVA_HOME=/usr/java/jdk1.8.0_25
export JRE_HOME=/usr/java/jdk1.8.0_25/jre
export CATALINA_HOME=/usr/apache/tomcat/apache-tomcat-7.0.57

http://stackoverflow.com/questions/1698913/how-to-set-java-home-in-tomcat-config
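As a quick sketch, the file can be created from the shell and made executable (paths assumed from the steps above):

```shell
# Sketch: create setenv.sh in the current directory, then move it into
# Tomcat's bin directory (paths assumed from this article).
cat > setenv.sh <<'EOF'
export JAVA_HOME=/usr/java/jdk1.8.0_25
export JRE_HOME=/usr/java/jdk1.8.0_25/jre
export CATALINA_HOME=/usr/apache/tomcat/apache-tomcat-7.0.57
EOF
chmod +x setenv.sh
# mv setenv.sh /usr/apache/tomcat/apache-tomcat-7.0.57/bin/
```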

STEP 3. Start Tomcat

Navigate to /usr/apache/tomcat/apache-tomcat-7.0.57/bin

./startup.sh

From the Windows box hosting the VM, you can access the default webpage served by Tomcat. Test from a browser:

http://192.168.56.102:8080/

8080 is Tomcat’s default port; it can be changed in server.xml in the conf folder. http://www.mkyong.com/tomcat/how-to-change-tomcat-default-port
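For reference, the relevant entry in conf/server.xml looks roughly like this (attribute values vary between Tomcat versions); change port="8080" to the port you want:

```xml
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />
```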

Next, we’ll discuss how to develop a simple Java-WS (web service), put it on the Fedora VM (on VirtualBox), then consume it from a dotnet Console Application on a Windows box.

Happy Coding!

Java-WS and dotnet Interop Example

This article will show how to create a simple Java Web Service hosted in Tomcat (running on Windows), and consume the Java-WS from dotnet.

STEP 1. Download Tomcat

http://tomcat.apache.org/tomcat-7.0-doc/setup.html#Windows

REF: https://www.youtube.com/watch?v=bP66y108xAc

STEP 2. Set Environment Variables from Computer (Right click) \ Properties \ Advanced system Settings:

JAVA_HOME                        C:\Program Files\Java\jdk1.7.0_45

CATALINA_HOME            C:\apache-tomcat-6.0.43

Open a *new* command prompt (an existing command prompt wouldn’t see the new additions) and verify the settings are correct:

echo %JAVA_HOME%

echo %CATALINA_HOME%

STEP 3. Start Tomcat

From command prompt, navigate to C:\apache-tomcat-6.0.43\bin

Then, from command prompt type: startup.bat

STEP 4. Verify it’s running

From browser:

http://localhost:8080

http://localhost:8080/examples/servlets/

From command prompt:

netstat -a

(You should see that port 8080 is in use)

 

STEP 5. Now, from NetBeans IDE, New Project

JavaWS-dotnet-interop.NetBeanIDE.NewProj

JavaWS-dotnet-interop.NetBeanIDE.NewProj2

 

STEP 6. Add new Web Service

Simply enter Web Service Name “HellowWorldWebService” and Package “com.helloworld”, keep everything else default:

JavaWS-dotnet-interop.NetBeanIDE.NewWs

JavaWS-dotnet-interop.NetBeanIDE.NewWs2

 

STEP 7. Test

Build the project:

JavaWS-dotnet-interop.NetBeanIDE.Build1

Then Deploy (To Tomcat),

JavaWS-dotnet-interop.NetBeanIDE.Deploy

And finally, “Test Web Service” (by default, when you create the web service, a “Hello” method is created automatically)

JavaWS-dotnet-interop.NetBeanIDE.TestWebService

From your browser:

JavaWS-dotnet-interop.NetBeanIDE.TestWebService2

Later, from Visual Studio, you’d add a Service Reference to http://localhost:8080/HelloJavaWs/HellowWorldWebService?wsdl

 

STEP 8. At this point, with Tomcat running (and NetBeans IDE closed), you can create a new Console Application project in Visual Studio, then add a Service Reference to http://localhost:8080/HelloJavaWs/HellowWorldWebService?wsdl

JavaWS-dotnet-interop.VS.Step1

JavaWS-dotnet-interop.VS.Step2

 

Now, run it in the debugger:

JavaWS-dotnet-interop.VS.Step3

So, you think we’re done? No… NetBeans has a bug that will haunt you when you try to deploy to your production server!

http://forums.netbeans.org/topic7615.html

How do you deploy Tomcat and the Java web service on a Linux box? First, you may want to set up a Fedora VM with SFTP: https://gridwizard.wordpress.com/2014/12/26/setup-linux-vm-with-sftp-guide-for-msdev

Then you’d need to install Tomcat and Java on your Linux box: https://gridwizard.wordpress.com/2014/12/28/install-tomcat-on-fedora-vm-for-msdev

Then the rest of the steps are trivial – http://stackoverflow.com/questions/2511547/how-to-manually-deploy-a-web-service-on-tomcat-6

I recap the steps here from the above Stack Overflow post, thanks to Thanh Phong (just in case someone deletes it!)

1. Create the following dir: c:\java\src\ws

2. Create the following file: c:\java\src\ws\Adder.java

// c:\java\src\ws\Adder.java
package ws;
import javax.jws.WebService;

@WebService
public class Adder {
    public double add(double value1, double value2) {
        return value1 + value2;
    }
}

3. Standing at c:\java\src\, execute:

c:\java\src> javac ws\Adder.java

The file c:\java\src\ws\Adder.class will be generated.

4. create the following directory structure with the following files

c:\tomcat6\webapps\adder_ws

META-INF
  context.xml
WEB-INF
  classes
    ws
      Adder.class
  lib
    activation.jar
    webservices-api.jar
    webservices-extra.jar
    webservices-extra-api.jar
    webservices-rt.jar
    webservices-tools.jar
  sun-jaxws.xml
  web.xml
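On the Linux side (or in any POSIX shell), the same skeleton can be created with mkdir -p. A sketch relative to the current directory (on Windows the root is c:\tomcat6\webapps\adder_ws instead):

```shell
# Sketch: recreate the adder_ws layout above under the current directory.
mkdir -p adder_ws/META-INF
mkdir -p adder_ws/WEB-INF/classes/ws
mkdir -p adder_ws/WEB-INF/lib

# Placeholders for the files listed above -- the real contents come from
# the compiled Adder.class, the jars, and the XML files in steps 6-9.
touch adder_ws/META-INF/context.xml
touch adder_ws/WEB-INF/web.xml
touch adder_ws/WEB-INF/sun-jaxws.xml
```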

@msdev, you can download the jar files here 
http://www.java2s.com/Code/Jar/CatalogJar.htm
http://download.java.net/maven/1/javax.activation/jars/

5. Copy the compiled file:

copy c:\java\src\ws\Adder.class c:\tomcat6\webapps\adder_ws\WEB-INF\classes\ws\Adder.class

6. c:\tomcat6\webapps\adder_ws\META-INF\context.xml

<?xml version="1.0" encoding="UTF-8"?>
<Context antiJARLocking="true" path="/adder_ws"/>

7. c:\tomcat6\webapps\adder_ws\WEB-INF\web.xml

<?xml version="1.0" encoding="UTF-8"?>
<web-app version="2.5" xmlns="http://java.sun.com/xml/ns/javaee" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd">
    <listener>
        <listener-class>com.sun.xml.ws.transport.http.servlet.WSServletContextListener</listener-class>
    </listener>
    <servlet>
        <servlet-name>Adder</servlet-name>
        <servlet-class>com.sun.xml.ws.transport.http.servlet.WSServlet</servlet-class>
        <load-on-startup>1</load-on-startup>
    </servlet>
    <servlet-mapping>
        <servlet-name>Adder</servlet-name>
        <url-pattern>/add</url-pattern>
    </servlet-mapping>
<!-- not needed
    <session-config>
        <session-timeout>
            30
        </session-timeout>
    </session-config>
    <welcome-file-list>
        <welcome-file>index.jsp</welcome-file>
    </welcome-file-list>
-->
</web-app>

8. Configure WEB-INF\sun-jaxws.xml

file : c:\tomcat6\webapps\adder_ws\WEB-INF\sun-jaxws.xml

<?xml version="1.0" encoding="UTF-8"?>
<endpoints version="2.0" xmlns="http://java.sun.com/xml/ns/jax-ws/ri/runtime">
  <endpoint implementation="ws.Adder" name="Adder" url-pattern="/add"/>
</endpoints>

9. Copy libraries into c:\tomcat6\webapps\adder_ws\WEB-INF\lib

Copy the NetBeans files from

[netbeans dir]\enterprise\modules\ext\metro\*.*

and

[netbeans dir]\ide\modules\ext\jaxb\activation.jar

10. Restart Tomcat

Shutdown : c:\tomcat6\bin\shutdown.bat

Startup : c:\tomcat6\bin\startup.bat

11. Test

Open a web browser and go to http://localhost:8080/adder_ws/add?wsdl. You can also use a tool like SoapUI (http://www.soapui.org/) to test the web service.

Happy Coding!