Enterprise Architecture

Architecture for Financial Applications – Rethinking Object Oriented Paradigm

Multi-Tier Application Architecture isn’t a new concept to anyone who has done any sort of enterprise development, to the point that nobody asks about it during technical interviews anymore.
At minimum, there are always three basic tiers, whether you’re building a web application or a client-server application:
a. Presentation
b. Application – Business logic
c. Data source
For financial applications, where do you put your calculations? That’s a matter of debate (but it shouldn’t be).
I can’t tell you how many times I have seen applications built with the standard cookie cutter: DAOs load records from the database into entities in the Application tier, fashionably complying with every standard practice, using an OR mapper such as Hibernate, Repositories and DAOs with Spring, and an IoC container for every object. I’m not sure whether people do this because they feel the need to comply with the Golden OO design paradigm, or because they are too afraid to deviate from “Best Practice”. This pattern simply doesn’t apply to all scenarios, and the exceptions are not just “edge cases”.
For starters,
a. What kind of calculation are you running? Derivatives risk and pricing? VaR? Stress testing? Time series analysis, covariance calculations, factor construction in portfolio optimization? These are computationally intensive. Quant libraries are generally written in C++, Python or Java, the load is typically distributed, and the calculations are thus done in the “Application Tier”.
Or are you running simple pnl updates, aggregating position-level pnl/return/risk to book level, funding or trade allocation, reconciliation? These are simple mathematics (no quant library): key concatenation/matching and simple aggregations. This brings us to the next point.
b. Data volume, performance, and proximity to data source. If your calculations source or operate on a lot of data, then unless the calculation is complex or requires quant libraries, there’s probably very little reason to do them in the Application Tier. Databases are extremely good at key concatenation/matching, aggregation and simple arithmetic. If the data is already in the database and you’re processing more than a few thousand rows, performance gains can be realised by running these calculations in database/SQL BEFORE you move data from the database tier to the application tier. Even if you have multiple data sources (even a message bus or non-SQL sources), one can always build simple data feeds and consolidate into a single database. The downside to this approach is that SQL is not portable across different database vendors.
c. Support
If calculations are done in SQL, production troubleshooting can be done without a debugger. This further means that Level One support doesn’t need to bother developers. More importantly, fixes can be simple SQL patches: no need to recompile and redeploy, both of which add risk.
d. Simplicity, Agile, and Maintainability
Let’s keep things simple. You’re adding complexity every time you add a bean, entity or DAO, especially if you have to maintain these mundane pieces of code manually. Imagine the amount of work when new fields need to be added, and worse if the additions touch multiple entities/DAOs (although we no longer need to update hbm files, thank you very much). Speaking of which, #dotnet is full of Agile magic. Linq-to-SQL [completely] automates the process of generating entities & DAO: https://www.youtube.com/watch?v=bsncc8dYIgY
You don’t even need to hand code the entities (unlike Java/Hibernate). The Linq-to-SQL designer, however, does not support updating generated entities after a schema change. Still, one can simply delete the old entity/DAO files, then drag-drop to re-create the entities/DAO in seconds. That said, the Achilles heel of Linq-to-SQL is: it supports Microsoft SQL Server only! (Ouch!!!)
#msdev are blessed with Linq-to-SQL. It doesn’t follow that Linq-to-SQL is the shortest path for all scenarios. Most financial applications deal with data in tabular format – #yesSQL! And most financial calculations are simple additions, subtractions and multiplications which can be done in the SQL layer. Another piece of Agile magic that #dotnet has (and Java has not) is DataTable: there’s nothing simpler than loading a DataTable with a DataAdapter and binding it to a Grid on the front-end. Example: http://www.dotnetperls.com/sqldataadapter
Many times, there’s just no real need to create entities just to comply with the OO paradigm. Just because you don’t “bean” your reports and put them in nice little coffins doesn’t mean it’s dirty coding. In fact, by having fewer source files, your code is cleaner.
In fact, the emphasis of Agile development should not be on Kanban boards, stand-up meetings and micro-managing of developers by incompetent or non-technical Project Managers (the title should more appropriately be renamed Project Administrative Assistant). Instead, the emphasis and focus should be on exploring and utilizing available Open Source and Commercial tooling that actually does the work for you, and on doing things as simply as possible by leveraging such technologies.
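To make points b and d a bit more concrete, here is a minimal sketch of aggregating position-level pnl to book level inside the database, so that only one row per book crosses into the application tier and no entity classes are involved. It uses Python with an in-memory SQLite database purely as a stand-in for whatever database you run; the `positions` table, its columns and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE positions (
        book     TEXT NOT NULL,
        ticker   TEXT NOT NULL,
        quantity REAL NOT NULL,
        cost     REAL NOT NULL,   -- average cost per unit
        price    REAL NOT NULL    -- latest mark per unit
    );
    INSERT INTO positions VALUES
        ('EQ_ASIA', 'AAA', 100, 10.0, 12.0),
        ('EQ_ASIA', 'BBB', 200, 20.0, 19.5),
        ('EQ_US',   'CCC',  50, 40.0, 44.0);
""")


def book_level_pnl(conn):
    """Aggregate position-level P&L to book level inside the database.

    The arithmetic and the GROUP BY run in SQL; the application tier
    receives plain tuples (one per book) rather than mapped entities.
    """
    sql = """
        SELECT book,
               SUM(quantity * (price - cost)) AS pnl,
               SUM(quantity * price)          AS market_value
        FROM positions
        GROUP BY book
        ORDER BY book
    """
    return conn.execute(sql).fetchall()


rows = book_level_pnl(conn)
for book, pnl, market_value in rows:
    print(f"{book}: pnl={pnl:.2f} mv={market_value:.2f}")
```

The returned rows are already in the tabular shape a grid or report wants, which is roughly the role DataTable plays on the dotnet side.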
Don’t Over-Engineer and Happy Coding!

Oh… apparently the last time someone thought about this was back in 2010: http://blog.jot.fm/2010/08/26/ten-things-i-hate-about-object-oriented-programming/

Reverse Engineering Data-flow in a Data Platform with Thousands of tables?

Ever been tasked to inherit or migrate an existing (legacy) Data Platform? There are numerous Open Source (Hadoop/Sqoop, Schedulix …) and Commercial tools (BMC Control-M, Appliedalgo.com, stonebranch …etc) which can help you operate the Data Platform. They typically give you a multitude of platform services:
• Job Scheduling
• ETL
• Load Balancing & Grid Computing
• Data Dictionary / Catalogue
• Execution tracking (track/persist job parameters & output)

A typical large-scale application has hundreds to thousands of input data files, queries and intermediate/output data tables.
[Image: DataPlatform_DataflowMapping]

The Open Source and Commercial packages mentioned above facilitate operation of a Data Platform. Tools which help generate ERD diagrams typically rely on PK-FK relationships being defined, and more often than not they are not. Example? Here’s how you can drag-drop tables in a Microsoft SQL Server onto a canvas to create an ERD – https://www.youtube.com/watch?v=BNx1TYItQn4
[Image: DataPlatform_DataflowMapping_Who]

If you’re tasked to inherit or migrate such a Data Platform, the first order of business is to manually map out the data flow. Why? To put in a fix or enhancement, you first need to understand the data flow before any work can commence.

And, that’s a very expensive, time consuming proposition.

There’re different ways to tackle the problem. Here’s one (Not-so-Smart) option:
• Manually review database queries and stored procedures
• Manually review application source code and extract from it embedded SQL statements

Adding to complexity,
• Dynamic SQL
• Object Relational Mapper (ORM)

The more practical approach would be to employ a SQL Profiler: capture the SQL statements executed, and trace the flow manually. Even then, this typically requires experienced developers to get the job done (which doesn’t help when you want to keep cost down and delivery lead time as short as possible). Such an undertaking is inherently risky, as you can’t really estimate how long it’ll take to map out the flow until you do.

There’s one command line utility, MsSqlDataflowMapper (free) from appliedalgo.com, which can help. Basically, MsSqlDataflowMapper takes a SQL Profiler trace file (xml) as input, analyzes the captured SQL statements, looks for INSERTs and UPDATEs, then automatically dumps the data flow to a flow chart (HTML 5). Behind the scenes, it uses SimpleFlowDiagramLib from Gridwizard to plot the flow chart – https://gridwizard.wordpress.com/2015/03/31/simpleflowdiagramlib-simple-c-library-to-serialize-graph-to-xml-and-vice-versa/
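To give a feel for the general idea (this is NOT how MsSqlDataflowMapper is actually implemented, which I haven’t shown here), below is a rough Python sketch that scans captured SQL statements for INSERTs and UPDATEs and records which tables feed which. It assumes the statements have already been pulled out of the trace file into a list of strings. The regular expressions are deliberately naive, the function name `extract_edges` and the sample statements are made up for illustration, and anything beyond simple statements (CTEs, aliases in UPDATE … FROM, dynamic SQL) would need a real SQL parser.

```python
import re
from collections import defaultdict

# Deliberately naive patterns: real SQL (CTEs, subqueries, quoted names,
# dynamic SQL) needs a proper parser.
_NAME = r"[\w.\[\]]+"
_INSERT = re.compile(rf"\bINSERT\s+INTO\s+({_NAME})", re.IGNORECASE)
_UPDATE = re.compile(rf"\bUPDATE\s+({_NAME})", re.IGNORECASE)
_SOURCE = re.compile(rf"\b(?:FROM|JOIN)\s+({_NAME})", re.IGNORECASE)


def extract_edges(statements):
    """Return {destination_table: set_of_source_tables} for INSERT/UPDATE statements."""
    flow = defaultdict(set)
    for sql in statements:
        match = _INSERT.search(sql) or _UPDATE.search(sql)
        if not match:
            continue  # plain SELECTs, DDL etc. do not write data into a table
        dest = match.group(1)
        sources = {s for s in _SOURCE.findall(sql) if s.lower() != dest.lower()}
        flow[dest] |= sources
    return dict(flow)


# Hypothetical captured statements.
captured = [
    "INSERT INTO stg_trades SELECT * FROM raw_trades",
    "INSERT INTO positions (book, ticker, qty) "
    "SELECT t.book, t.ticker, SUM(t.qty) FROM stg_trades t "
    "JOIN instruments i ON i.ticker = t.ticker GROUP BY t.book, t.ticker",
    "UPDATE positions SET price = p.price FROM prices p "
    "WHERE p.ticker = positions.ticker",
    "SELECT * FROM positions",
]

edges = extract_edges(captured)
for dest in sorted(edges):
    print(dest, "<-", sorted(edges[dest]))
```

Each destination/source pair is one arrow in the flow chart; the resulting dictionary is the kind of graph you would hand to a plotting library.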

Limitations?
• Microsoft SQL Server only (to get around this, you can build your own tool to capture SQL statements against Oracle/Sybase/MySQL…etc, analyze them, look up INSERTs and UPDATEs, then route the result to SimpleFlowDiagramLib to plot the flow chart)
• MsSqlDataflowMapper operates at table level. It identifies source/destination tables in the process of mapping out the flow. However, it doesn’t provide field-level source information (which source tables does a particular field in an output table come from?)
• The tool does NOT automatically *group* related tables into different Regions in the diagram (this requires a lot more intelligence in construction of the tool; as we all know, parsing SQL is actually a very complex task! https://gridwizard.wordpress.com/2014/11/08/looking-for-a-sql-parser-for-c-dotnet). At the end of the day, it still takes a skilled developer to make sense of the flow.

Happy Coding!

Java and dotnet Interop

This article is about Java-dotnet Interop. We’ll explore what options we have for the different scenarios where interop is required.

First, when we say “Java-dotnet Interop”, there are two possibilities:

1. Java -to- dotnet communications

2. dotnet -to-Java communications

Secondly, we assume that if you’re developing in Java, you’d run it on Linux (or simply put, if your application is written in Java, why would it run on Windows?)

Given the above, what are our options?

 

1. Socket

Anand Manikiam has written a piece on this subject, http://www.codeproject.com/Articles/11602/Java-and-Net-interop-using-Sockets

The pros for this approach are:

a. No middle-ware

b. Fast

The cons are:

a. Resiliency

b. Casting complex object/classes from byte[]?

c. Message security? Encryption? Anti-tampering? DOS? If these are not implemented, this should be for Intranet applications only.
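On con (b), one common way around casting complex objects from byte[] is to agree on a simple wire format that both sides can read: a length prefix followed by a text payload such as JSON. Below is a minimal sketch of that idea in Python (used here only as neutral ground; the same framing can be written in Java or dotnet). The function names and the sample message are made up for illustration, and the connected socket pair merely stands in for the two endpoints. The 4-byte length is big-endian, which is what Java’s DataOutputStream.writeInt produces; dotnet’s BinaryWriter writes little-endian, so the dotnet side would need to swap byte order.

```python
import json
import socket
import struct


def send_message(sock, payload):
    """Send one message: 4-byte big-endian length, then a UTF-8 JSON body."""
    body = json.dumps(payload).encode("utf-8")
    sock.sendall(struct.pack(">I", len(body)) + body)


def _recv_exact(sock, n):
    """Read exactly n bytes; a single recv() may return fewer than requested."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_message(sock):
    """Receive one length-prefixed JSON message and decode it."""
    (length,) = struct.unpack(">I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))


# Local demonstration: a connected socket pair stands in for the
# Java and dotnet endpoints.
left, right = socket.socketpair()
send_message(left, {"type": "trade", "ticker": "AAA", "qty": 100})
received = recv_message(right)
left.close()
right.close()
print(received)
```

Note this only addresses framing and serialization. Resiliency and message security (cons a and c) are still yours to build.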

 

2. Web Services

I’ve written an article on consuming Java-ws from dotnet:

https://gridwizard.wordpress.com/2014/12/26/java-ws-and-dotnet-interop-example/

You will also find plenty of discussions on consuming WCF-from-Java:

http://www.codeproject.com/Articles/777036/Consuming-WCF-Service-in-Java-Client

The pros for this approach are:

a. No middle-ware

b. Higher level of compatibility with code written in more languages (C++/SOAP, Python, R …etc)

The cons are:

a. Slower than Socket (Web Services overhead)

b. Resiliency

c. Message security? Encryption? Anti-tampering? DOS? If these are not implemented, this should be for Intranet applications only.

 

3. Message Bus

RabbitMQ (http://www.rabbitmq.com) is all about Messaging. If you’re developing real time applications, RabbitMQ offers a high performance, battle tested communication platform, and it has an API for just about any language on the planet: C++, dotnet, Java, Perl, Python…

Pros are:

a. Resiliency – producers and consumers can die or crash at any moment without taking each other down.

b. Performance

Cons are:

a. You need to install Middleware, and if you’re a software vendor, you’d need to bundle installation of RabbitMQ with your application

 

4. Commercial Tools

This depends on what you’re building: if what you’re trying to build is a computing grid, then there are commercial tools which allow you to run jobs on basically any platform, coded in any language.

Appliedalgo.com, for instance, supports:

a. Scheduling, conditional job chaining and Workload Automation

b. Grid Computing – nodes/slaves on any platform/language

c. Automatic persistence of run history, parameters, input and results

(You can even configure cell-level validations such as “IsNumber”, or a user-specified Regular Expression)

d. GUI for you to track run parameters, input and results

However, such tools inevitably introduce execution overhead. So it depends on whether you’re …

a. Executing high number of light weight jobs –> Probably should not use any tool besides a Message bus such as RabbitMQ

b. Executing medium number of medium weight jobs –> Best application of Workload Automation Data Platforms such as Appliedalgo.com

c. Executing low number of heavy weight jobs –> Best custom coded, with persistence via BCP (there’s no other way for million-row or #bigdata processing)

 
But such commercial grid tools would not be a viable option if, for instance, you’re building a hotel booking system with the web tier built in ASP.NET and the backend in Java with Java-ws.

Happy Coding!

 

 

Multi tiering for Financial Applications

Multi-Tier Application Architecture isn’t a new concept to anyone who has done any sort of enterprise development, to the point that nobody asks about it during technical interviews anymore.
At minimum, there are always three basic tiers, whether you’re building a web application or a client-server application:
a. Presentation
b. Application – Business logic
c. Data source
For financial applications, where do you put your calculations? That’s a matter of debate (but it shouldn’t be).
I can’t tell you how many times I have seen applications built with the standard cookie cutter: DAOs load records from the database into entities in the Application tier, fashionably complying with every standard practice, using an OR mapper such as Hibernate, and Repositories and DAOs with Spring. I’m not sure whether people do this to learn the different technologies, to comply with the Golden OO design paradigm, or because they are too afraid to deviate from “Best Practice”. This pattern simply doesn’t apply to all scenarios, and the exceptions are not just “edge cases”.
For starters,
a. What kind of calculation are you running? Derivatives risk and pricing? VaR? Stress testing? Time series analysis, covariance calculations, factor construction in portfolio optimization? These are computationally intensive. Quant libraries are generally written in C++, Python or Java, the load is typically distributed, and the calculations are thus done in the “Application Tier”.
Or are you running simple pnl updates, aggregating position-level pnl/return/risk to book level, funding or trade allocation, reconciliation? These are simple mathematics (no quant library): key concatenation/matching and simple aggregations. This brings us to the next point.
b. Data volume, performance, and proximity to data source. If your calculations source or operate on a lot of data, then unless the calculation is complex or requires quant libraries, there’s probably very little reason to do them in the Application Tier. Databases are extremely good at key concatenation/matching, aggregation and simple arithmetic. If the data is already in the database and you’re processing more than a few thousand rows, performance gains can be realised by running these calculations in database/SQL. Even if you have multiple data sources (even a message bus or non-SQL sources), one can always build simple data feeds and consolidate into a single database. The downside to this approach is that SQL is not portable across different database vendors.
c. Support
If calculations are done in SQL, production troubleshooting can be done without a debugger. This further means that Level One support doesn’t need to bother developers. More importantly, fixes can be simple SQL patches: no need to recompile and redeploy, both of which add risk.
d. Simplicity, Agile, and Maintainability
Let’s keep things simple. If you are doing simple maths in the application tier, you’re adding complexity every time you add a bean, entity or DAO.
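As a rough illustration of the reconciliation case from point a, here is a sketch of key concatenation/matching done entirely in SQL, returning only the breaks to the application tier. It uses Python with an in-memory SQLite database as a stand-in; the two tables, their columns, the key format and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical internal and broker trade tables, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE internal_trades (trade_date TEXT, account TEXT, ticker TEXT, qty REAL);
    CREATE TABLE broker_trades   (trade_date TEXT, account TEXT, ticker TEXT, qty REAL);
    INSERT INTO internal_trades VALUES
        ('2015-03-31', 'ACC1', 'AAA', 100),
        ('2015-03-31', 'ACC1', 'BBB', 200),
        ('2015-03-31', 'ACC2', 'CCC',  50);
    INSERT INTO broker_trades VALUES
        ('2015-03-31', 'ACC1', 'AAA', 100),
        ('2015-03-31', 'ACC1', 'BBB', 250),
        ('2015-03-31', 'ACC3', 'DDD',  75);
""")

# First SELECT: internal trades missing at the broker, or with a different qty.
# Second SELECT: broker trades with no internal counterpart.
BREAKS_SQL = """
    SELECT i.trade_date || '|' || i.account || '|' || i.ticker AS recon_key,
           i.qty AS internal_qty,
           b.qty AS broker_qty
    FROM internal_trades i
    LEFT JOIN broker_trades b
           ON b.trade_date = i.trade_date
          AND b.account = i.account
          AND b.ticker = i.ticker
    WHERE b.ticker IS NULL OR b.qty <> i.qty
    UNION ALL
    SELECT b.trade_date || '|' || b.account || '|' || b.ticker,
           NULL,
           b.qty
    FROM broker_trades b
    LEFT JOIN internal_trades i
           ON i.trade_date = b.trade_date
          AND i.account = b.account
          AND i.ticker = b.ticker
    WHERE i.ticker IS NULL
    ORDER BY 1
"""

breaks = conn.execute(BREAKS_SQL).fetchall()
for recon_key, internal_qty, broker_qty in breaks:
    print(recon_key, internal_qty, broker_qty)
```

The matched trade never leaves the database; only the three breaks do, and a support person can rerun the same query by hand when something looks off.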
Happy Coding!

Web vs Mobile vs Client-Server application?

What are your options when developing a new application? Both js and WPF/Winform support MVVM and real time data binding, so you can build a modern UI with an MVVM framework in either. You can now debug/compile/unit test javascript with tooling such as gruntjs, much as you can with dotnet (from Visual Studio, Cruise Control/TFS/NUnit) or Java. How’d you choose?

Client Server

Pros

  • Unified, stable API. The API/framework evolves (and becomes obsolete) less quickly in comparison to js frameworks. For instance, the client can be built in dotnet/WPF or Winform, while the backend can be implemented in dotnet, Java, or C++ on a Windows or Linux box.
  • Better organisation of code from a maintenance/support perspective: dotnet/Java support class-based inheritance, whereas Javascript relies on prototype-based inheritance.
  • One less layer to code/develop, more Agile. In a web based application, data needs to be wired to client side/js, and that’s additional dev lead-time
  • Not having to test against different browsers
  • Access to Operating System API whereas js is sandboxed by browser. For example access to camera/mic.
  • Security – proprietary logic can be obfuscated with anti-tampering technologies.

Cons

  • Tied down to particular platform, or operating system

Web

Pros

  • Accessible from different platforms/OS – mobile/tablet/workstations
  • Responsive design – layout adaptable to different client devices, for example with Bootstrap

Cons

  • Lack of unified standards: you’d need to choose among competing js MVVM frameworks (Angular being one example)
  • Testing against different browsers (At least Chrome/Firefox/IE) & clients (Desktop, Android devices & iPhone with various screen sizes)
  • Frameworks/libraries evolve very quickly; a version change may be risky unless thoroughly tested
  • Browser updates may break your application (For example http vs https, security around websocket calls)
  • Very often there’ll be back-end development in dotnet, Java or C++ anyway, so unavoidably you’d need to hire back-end developers
  • Everyone can scrutinize, hack, attack, *borrow* your Client side javascript

It’s quite obvious that for most applications, there will always be a back-end built in dotnet, Java or C++. The question really is what you would choose to build the UI with. There are a lot of subjective preferences around this, but here’s my humble opinion on the subject.

1. Backend

Backend in dotnet or Java. Unless you’re writing device drivers, root kits, or highly optimized algo trading pipelines with DMA (Direct Market Access), there’s little reason to do it in C++. The added complexity in syntax and the scarcer availability of talent add to the cost of choosing C++ for backend implementation. That said, you should be aware that you can implement the *Backend* with node.js now. You can open a database connection from js on node.js, and you can even interface with RabbitMQ from js on node.js. Some market data vendors have even begun to expose APIs for node.js – https://github.com/bloomberg/blpapi-node

So, truth is, you can implement backend in js – question is would you? (I wouldn’t – I prefer a proper OO language with availability of commercial & open source utility libraries. Math.NET for example.)

2. UI

Is this an end user application? What’s the intended Target Audience? Do you want to pay the additional cost of developing the UI in js?

  • If this is a UI for a back-end server, for example a workload automation/computing grid or EDI messaging infrastructure (more in manufacturing), then building the UI in WPF/Winform/Swing/AWT/SWT saves you development cost and the headaches of testing against different platforms, and is thus the preferred option. Bear in mind that while most workstations are Windows, on the server side you can have Linux or Windows boxes. Further, your staff/clients may still prefer a responsive web interface so they can check application status/health on the run from mobile devices. One example is the batch processing/grid system from Appliedalgo (https://appliedalgo.com). They have a client on Windows (https://www.youtube.com/watch?v=QdM9S0Bsc0A), and also a responsive web interface (https://www.youtube.com/watch?v=dtYUona1omo)
  • If this is an enterprise application which requires complex data visualization/manipulation that can only be done on a Workstation (not mobile devices), for example a real time financial derivative risk management system (example: Derivation.co.uk), then building the UI in WPF/Winform (most Workstations are Windows) saves you development cost and the headaches of testing against different platforms, and is thus the preferred option.
  • If this is an enterprise application for sales staff always on the run, for example SAP REX (http://www.sap.com/pc/tech/mobile/software/industry-apps/retail-visit-management-app/index.html), then building a web or mobile application makes your application accessible from mobile devices, with responsive layout, and is the preferred option. If, for instance, compliance requires that all presentations to clients (Investment Propositions, to be specific) be recorded (both audio and video), then you may even need to build the UI natively for Android or iOS. One such example is the CRM module from Axisoft.net.
  • If this is a web portal, for example a hotel booking system (hoteljen.com) or e-commerce portal (Amazon.com), then building a web application makes your application accessible to the widest audience, with responsive layout, and is thus the preferred option.