Dotnet developers more in demand than Java?

Are dotnet developers more in demand than Java developers? More often than not Java has the bigger following, being Open Source and all. And given M$'s failures to deliver anything useful (practical) … remember Silverlight? Don't mention smartphones, or the disappearance of the Start menu – all of it hurts the career prospects of dotnet developers. Yet it appears that after all the M$ bashing (it's still *cool* to beat up M$, isn't it? Apple is cool, M$ is Evil?), the tide is turning. While dotnet has fewer developers and a smaller following, it has a disproportionate number of jobs, particularly in the more lucrative *Investment Banking* industry. It's important to work only for firms that pay, isn't it? Otherwise you might as well work for the kind of firm that emphasizes a "Result Oriented" but not so much a "Reward Oriented" culture.


(Yes, there will always be a high – very high – number of lowly paid, shabby web-dev jobs with a *limited budget*. Always limited, very limited – employers always aim to offshore to the lowest-paid, "Yes Sir" locations where devs say "Yes Sir" to their boss every morning.)

There's no time for complacency for M$ or dotnet developers, however. There are many competing technologies out there. Anything prefixed "apache" – "avro", for example – can threaten anything home-grown from M$, WCF for example. Yes, from now on it's *cool* to define schemas via JSON, not XML. Yes? (Actually this is a forward-looking statement – check out Avro's empty C# documentation section and its footprint in the dotnet development community. Look here: https://issues.apache.org/jira/browse/AVRO-1420)
And if you try to compile a JSON avsc schema with avrogen.exe – well, it's mostly undocumented space, first of all. And if you "import" anything (quite an essential requirement) you will quickly run into "Undefined Name": the utility avrogen.exe (which converts schemas in JSON/avsc files into C# classes) is broken for dotnet (I'm sure the Java/Python versions work):
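To make the "Undefined Name" point concrete, here is roughly the shape of schema that trips it up. This is a hypothetical example – the type names are invented. Suppose the named type com.example.Address is defined in a separate common.avsc, and a second file references it by name:

```json
{
  "type": "record",
  "name": "Counterparty",
  "namespace": "com.example",
  "fields": [
    { "name": "name",    "type": "string" },
    { "name": "address", "type": "com.example.Address" }
  ]
}
```

The Java tooling can resolve com.example.Address if the defining schema is supplied first; at the time of writing, avrogen.exe simply reports "Undefined Name" because it does not resolve named types defined in another file (see AVRO-1420).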
Avro was invented by Java and Open Source developers; dotnet always comes last in their considerations.

M$ invested a lot in "Patterns". Now, it depends on where you're coming from – whether you work for a component vendor, or, for example, a bank. If you're doing "Enterprise" applications, what you develop most often has at most a ten-year – more often a five-year – lifetime; then a new, ambitious CIO will come in, scrap everything and start afresh. Further, your "libraries" will have a very limited audience (in comparison to, say, a component vendor like Infragistics/DevExpress). Patterns get very little done except getting a small number of developers to code with the same GoF (Go5.x anyone?) cookie cutters (Important? Yes… yeah). Over-engineering your applications accomplishes nothing.

Like WPF to define the GUI, or WCF for interprocess communication. Worse is Prism: "What does it do for me, for the learning curve it takes?" (For myself, and for every developer in the team.)

Good software isn't always (50% of cases?) about writing software with the same cookie cutters (repo, Prism, Spring, Hibernate, DAO) as everyone else. Wizards do things that other people can't – not write code complying with GoF, or Prism, or other over-interpreted concepts school teaches you, or whatever the latest fad is – IoC, blah blah (the mere thought of those academic subjects puts me to sleep).

M$ needs to re-focus and stop screwing its developers. Stop doing things like inventing Silverlight or WCF and then ditching them two years later (a joke?). Minimize the budget for the Patterns and Practices team. Perhaps just embrace Android and build on top of it (if you can't conquer it, use it, exploit it).
M$ needs to re-focus on technology that delivers *capability* (or hardware/software that simply *looks good*) – not on imposing more rules on #dotnet devs and screwing her affiliates/vendors. Just how much, and to what extent, have vendors like DevExpress or Infragistics been screwed, having developed product offerings for Silverlight which M$ herself abandoned? And how much has *your own* career been hurt as a result of M$'s stupidity in recent years?

M$, Create Fan Boys, not Enemies.


Reverse Engineering Data-flow in a Data Platform with Thousands of tables?

Ever been tasked to inherit, or migrate, an existing (legacy) Data Platform? There are numerous Open Source (Hadoop/Sqoop, Schedulix …) and commercial tools (BMC Control-M, Appliedalgo.com, Stonebranch …etc.) which can help you operate the Data Platform – typically giving you a multitude of platform services:
• Job Scheduling
• Load Balancing & Grid Computing
• Data Dictionary / Catalogue
• Execution tracking (track/persist job parameters & output)

A typical large-scale application has hundreds to thousands of input data files, queries, and intermediate/output data tables.

The Open Source and commercial packages mentioned above facilitate operation of a Data Platform. Tools which help generate ERD diagrams typically rely on PK-FK relationships being defined – but of course, more often than not, that is not the case. Example? Here's how you can drag-drop tables from a Microsoft SQL Server onto a canvas to create an ERD – https://www.youtube.com/watch?v=BNx1TYItQn4

If you're tasked to inherit or migrate such a Data Platform, the first order of business is to manually map out the data flow. Why? To put in a fix, or an enhancement, you first need to understand the data flow before any work can commence.

And, that’s a very expensive, time consuming proposition.

There are different ways to tackle the problem. Here's one (not-so-smart) option:
• Manually review database queries and stored procedures
• Manually review application source code and extract from it embedded SQL statements

Adding to the complexity:
• Dynamic SQL
• Object Relational Mapper (ORM)

The more practical approach is to employ a SQL Profiler: capture the SQL statements executed, then trace the flow manually. Even then, this typically requires experienced developers to get the job done (which isn't helping when you want to keep costs down and delivery lead times short). Such an undertaking is inherently risky – you can't really estimate how long it'll take to map out the flow until you actually do it.
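Here is a minimal sketch of that idea in Python – illustrative only. The statements and table names are invented, and a regex only catches the simple, common forms of INSERT/UPDATE; real SQL parsing is far harder:

```python
import re

# Crude sketch: pull the target tables out of captured INSERT/UPDATE
# statements. This only catches the simple, common statement forms.
INSERT_RE = re.compile(r"\bINSERT\s+INTO\s+([\w\.\[\]]+)", re.IGNORECASE)
UPDATE_RE = re.compile(r"\bUPDATE\s+([\w\.\[\]]+)", re.IGNORECASE)

def target_tables(sql_statements):
    """Return the set of tables written to by the captured statements."""
    targets = set()
    for sql in sql_statements:
        for pattern in (INSERT_RE, UPDATE_RE):
            for match in pattern.finditer(sql):
                targets.add(match.group(1))
    return targets

# Hypothetical statements, as they might appear in a profiler trace:
captured = [
    "INSERT INTO dbo.PositionSnapshot SELECT * FROM dbo.TradeStaging",
    "UPDATE dbo.BookPnl SET Pnl = Pnl + @delta WHERE BookId = @bookId",
]
print(sorted(target_tables(captured)))
# ['dbo.BookPnl', 'dbo.PositionSnapshot']
```

Pairing each target with the tables read in the same statement gives you the edges of the data-flow graph.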

There's one command-line utility, MsSqlDataflowMapper (free) from appliedalgo.com, which can help. Basically, MsSqlDataflowMapper takes a SQL Profiler trace file as input (XML), analyzes the captured SQL statements, looks for INSERTs and UPDATEs, then automatically dumps the data flow to a flow chart (HTML5). Behind the scenes, it uses SimpleFlowDiagramLib from Gridwizard to plot the flow chart – https://gridwizard.wordpress.com/2015/03/31/simpleflowdiagramlib-simple-c-library-to-serialize-graph-to-xml-and-vice-versa/

Limitations:
• Microsoft SQL Server only. (To get around this, you can build your own tool: capture SQL statements against Oracle/Sybase/MySQL…etc., analyze them, look for INSERTs and UPDATEs, then route the result to SimpleFlowDiagramLib to plot the flow chart.)
• MsSqlDataflowMapper operates at table level. It identifies source/destination tables in the process of mapping out the flow. However, it doesn't provide field-level lineage (which source tables does a particular field in an output table come from?).
• The tool does NOT automatically *group* related tables into different regions in the diagram (this requires a lot more intelligence in the construction of the tool – as we all know, parsing SQL is actually a very complex task! https://gridwizard.wordpress.com/2014/11/08/looking-for-a-sql-parser-for-c-dotnet). At the end of the day, it still takes a skilled developer to make sense of the flow.
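On the grouping point: once you have (source, destination) table pairs, splitting the diagram into regions is essentially connected components over that graph. A sketch in Python, with hypothetical table names:

```python
from collections import defaultdict

def regions(edges):
    """Group tables into regions: union-find over (source, destination) pairs."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    def union(a, b):
        parent[find(a)] = find(b)
    for src, dst in edges:
        union(src, dst)
    groups = defaultdict(set)
    for table in parent:
        groups[find(table)].add(table)
    return [sorted(g) for g in groups.values()]

# Hypothetical data flows extracted from captured SQL:
flows = [("TradeStaging", "Position"), ("Position", "BookPnl"),
         ("FxRates", "FxRatesClean")]
print(sorted(regions(flows)))
# [['BookPnl', 'Position', 'TradeStaging'], ['FxRates', 'FxRatesClean']]
```

This gets you mechanical grouping only – deciding whether a region is *meaningful* still takes a developer who understands the platform.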

Happy Coding!

Parsing Microsoft SQL Profiler Trace XML using *DynamicXmlStream*

This article will show how compact the syntax is for extracting SQL statements from a SQL Profiler trace (Microsoft SQL Server) using Mahesh's DynamicXmlStream – which is *dynamic* (http://www.codeproject.com/Articles/436406/Power-of-Dynamic-Reading-XML-and-CSV-files-made-ea)

And the code to parse it (doesn't get more compact than this!):

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

using LearningMahesh.DynamicIOStream;
using LearningMahesh.DynamicIOStream.Xml;

class Program
{
    static void Main(string[] args)
    {
        string TraceFileName = null;
        string SQLStatement = null;
        IList<string> SQLStatements = null;

        if (args != null && args.Length > 0)
            TraceFileName = args[0];

        #region STEP 1. Extract SQL from profiler trace
        dynamic profilerReader = DynamicXmlStream.Load(new FileStream(TraceFileName, FileMode.Open));

        SQLStatements = new List<string>();

        // Profiler trace XML layout: TraceData/Events/Event, where the Event's
        // name attribute identifies the event class, and each Event holds
        // Column elements; the "TextData" column carries the SQL statement.
        foreach (dynamic Event in
            (profilerReader.TraceData.Events.Event as DynamicXmlStream).AsDynamicEnumerable()
                .Where(Event => Event.name.Value == "SQL:BatchStarting"))
        {
            foreach (dynamic Column in
                (Event.Column as DynamicXmlStream).AsDynamicEnumerable()
                    .Where(Column => Column.name.Value == "TextData"))
            {
                SQLStatement = Column.Value;
                SQLStatements.Add(SQLStatement);
            }
        }
        #endregion
    }
}

Multi-tiering for Financial Applications

Multi-tier application architecture isn't a new concept to anyone who has done any sort of enterprise development – to the point that nobody asks about it during technical interviews anymore.
At minimum, there are always three basic tiers, whether you're building a web application or a client-server application:
a. Presentation
b. Application – Business logic
c. Data source
For financial applications, where do you put your calculations? That's a matter of debate (but it shouldn't be).
I can't tell you how many times I have seen applications get built using the standard cookie cutter: DAOs to load records from the database into entities in the Application tier, fashionably complying with every standard practice using an OR mapper such as Hibernate, plus Repositories and DAOs with Spring. I'm not sure if people do this to learn the different technologies, to comply with the golden OO design paradigm, or because they're too afraid to deviate from "Best Practice". This pattern simply doesn't apply to every scenario – not just "edge cases".
For starters:
a. What kind of calculations are you running? Derivatives risk and pricing? VaR? Stress testing? Time-series analysis, covariance calculations, factor construction in portfolio optimization? These are computationally intensive. Quant libraries are generally in C++, Python or Java, and the load is typically distributed – so these calculations are done in the Application tier.
Or are you running simple pnl updates, aggregating position-level pnl/return/risk to book level? Funding or trade allocation? Reconciliation? These are simple mathematics (no quant library): key concatenation/matching and simple aggregations. Which brings us to the next point.
b. Data volume, performance, and proximity to the data source. If your calculations source or operate on a lot of data – unless the calculation is complex in nature, or requires quant libraries – there's probably very little reason why they should be done in the Application tier. Databases are extremely good at key concatenation/matching, aggregation and simple arithmetic. If the data is already in the database and you're processing more than a few thousand rows, performance gains can be realised by running these calculations in the database/SQL. Even if you have multiple data sources (even message buses or non-SQL sources), one can always build simple data feeds and consolidate into a single database. The downside to this approach: SQL is not portable across database vendors.
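A small illustration of point (b), using an in-memory SQLite database (schema and numbers are made up): the book-level roll-up is a one-line GROUP BY, instead of loading rows into entities and looping over them in the Application tier.

```python
import sqlite3

# Hypothetical position-level pnl table; in practice this already lives
# in your database, fed by upstream jobs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE position_pnl (book TEXT, position TEXT, pnl REAL)")
conn.executemany(
    "INSERT INTO position_pnl VALUES (?, ?, ?)",
    [("RatesDesk", "UST10Y", 125000.0),
     ("RatesDesk", "Bund",   -40000.0),
     ("FxDesk",    "EURUSD",   8000.0)])

# One aggregate query replaces the load-into-entities / loop-and-sum dance.
rows = conn.execute(
    "SELECT book, SUM(pnl) FROM position_pnl GROUP BY book ORDER BY book"
).fetchall()
print(rows)  # [('FxDesk', 8000.0), ('RatesDesk', 85000.0)]
```

The same GROUP BY runs unchanged on millions of rows without ever shipping them across the network to the Application tier.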
c. Support
If calculations are done in SQL, production troubleshooting can be done without a debugger. What this further means is that Level One support doesn't need to bother the developers. More importantly, fixes can be simple SQL patches – no recompiling and redeploying, which would add to the risk.
d. Simplicity, agility, and maintainability
Let's keep things simple. Every time you add a bean, an entity and a DAO just to do simple maths in the Application tier, you're adding complexity.
Happy Coding!

Anti Patterns

In addition to the common, well-covered anti-patterns described on Wikipedia (http://en.wikipedia.org/wiki/Anti-pattern) – spaghetti code, lasagna code, hard coding, lava code, stale code, magic numbers/strings, cut & paste …etc. –
there's a list of things a dev lead should specifically watch for.

Critical – things you'd have a hard time fixing and debugging.
a. Open threads – not using the ThreadPool? Slow build-up of thread count?
b. Dispose – not implemented (or incorrectly/accidentally/unintentionally removed). The result? Slow death by memory leak, and subsequent paging.
c. Open locks – fine-grained vs coarse-grained locking? Is the lock necessary at all?
d. Open transactions – are they needed?
e. Open connections to a resource (database, file …etc.) that are never released.
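Items (b) and (e) have the same cure: deterministic cleanup. In .NET that's IDisposable plus using; here is a Python sketch of the same discipline (a context manager), showing that cleanup runs even when the body throws:

```python
# Analogue of the Dispose/using discipline: cleanup is guaranteed whether
# the body completes normally or throws. The Resource class is hypothetical.
class Resource:
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()   # always runs, exception or not
        return False   # don't swallow the exception

res = Resource()
try:
    with res:
        raise RuntimeError("boom")   # simulate a failure mid-use
except RuntimeError:
    pass
print(res.closed)  # True — the resource was released despite the error
```

The point of the review checklist above is exactly this: any acquire without a guaranteed matching release is a slow leak waiting to surface in production.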
Architectural/Maintenance – things you won't need to fix right away, but which degrade your code quality over time, to the point you'll just need to rewrite from scratch.
a. Duplicate code, cut & paste, code reuse
b. Duplicate classes? Entity vs VM?
c. Two utility libraries (FTP, for example) serving the same purpose in different parts of the application
d. Global variables: excessive use of Context or static variables (aka spaghetti code)
e. Over/under-engineering
– Too much business logic in the UI/presentation tier, God objects (aka spaghetti code)
– Over-layering (aka lasagna code)
f. No/insufficient input validation
(a security concern, as well as an introduction of downstream problems)
g. Unnecessarily tying functionality to specific technologies: for example, validation regular expressions/logic should not be tied to, say, an ASP.NET validation control – it should be a generic static method in a plain C# DLL.
h. Critical execution paths
– unnecessary additions
– row-by-row handling vs BCP (bulk copy)
i. Hard-coded magic constants
j. switch statements not covering all possible cases
k. Incoherency/incongruency: for example, validation expressions/rules differing between client-side and server-side validation
l. Error hiding
m. Stale code (aka "lava code")
Syntax – conventions.
a. Camel vs Pascal casing
Pascal case: GreenButtonType
Camel case: myInt
b. Indent convention