Grid Computing: Performance Tuning

There are many ways to improve the performance of your grid. Before spending money on additional hardware (more CPUs, more memory, faster disks, or extra nodes), there are a couple of things you should check.

Before we proceed, it’s important to emphasize that, in general, the Grid is only responsible for:

  • Scheduling of Jobs and tracking their execution status
  • Distribution of Jobs to nodes

If your jobs are not running fast enough, there are two possibilities:

  • Your job code has not been sufficiently optimized, or there’s a genuine performance issue in your job code.
  • There is a bottleneck in your Grid infrastructure.

The rest of this post is a quick checklist for the latter, using the graphics and grid architecture from Applied Algo (https://appliedalgo.com) as an example.


When troubleshooting performance issues in general, start by asking yourself these two questions:

  • Where is the bottleneck?
  • What is the nature of the problem? Is the CPU maxing out? Is it memory? Is it disk (thrashing, or heavy paging; see http://www.programmerinterview.com/index.php/operating-systems/how-virtual-memory-works)? Is your job code moving too much data across the different tiers?

Scheduling Settings

Are multiple jobs referencing the same data on the same external data source? You may be better off running them sequentially, one after another, so that they do not contend for that source.
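
To make the idea concrete, here is a minimal sketch, not how any particular grid scheduler does it. It assumes jobs arrive as (source name, callable) pairs; that shape and the names `run_grouped_by_source` and `max_parallel_sources` are made up for illustration. Jobs that share a source run one after another, while groups for different sources may run side by side.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_grouped_by_source(jobs, max_parallel_sources=4):
    """Run jobs sharing a data source sequentially; run different sources in parallel.

    `jobs` is a list of (source_name, callable) pairs (a hypothetical shape).
    Returns {source_name: [results in submission order]}.
    """
    groups = defaultdict(list)
    for source, fn in jobs:
        groups[source].append(fn)

    def run_group(fns):
        # Sequential within a group, so the shared source sees one job at a time.
        return [fn() for fn in fns]

    with ThreadPoolExecutor(max_workers=max_parallel_sources) as pool:
        futures = {
            source: pool.submit(run_group, fns) for source, fns in groups.items()
        }
        return {source: future.result() for source, future in futures.items()}
```

Whether serializing actually helps depends on the data source; measure before and after.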

Input Data – Bottleneck in Database Tier?

  • Partitioning strategy: Are your jobs referencing data that resides in the same table or database? Partition your data across multiple tables, data files on separate disks, or multiple database instances/SQL clusters.
  • SQL optimization: Review the query execution plan; add primary keys, indexes, and foreign keys to optimize joins.
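
As a small illustration of reviewing a query plan, the sketch below uses SQLite from Python's standard library, simply because it runs anywhere; your database tier will have its own tooling and plan format. The `trades` table, its columns, and the index name are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (id INTEGER PRIMARY KEY, symbol TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO trades (symbol, qty) VALUES (?, ?)",
    [("SYM%d" % (i % 50), i) for i in range(1000)],
)

QUERY = "SELECT SUM(qty) FROM trades WHERE symbol = ?"


def plan(connection, sql, params):
    """Return the human-readable steps of SQLite's plan for `sql`."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the plan step description.
    return [row[-1] for row in rows]


plan_before = plan(conn, QUERY, ("SYM7",))   # no index on symbol yet
conn.execute("CREATE INDEX idx_trades_symbol ON trades (symbol)")
plan_after = plan(conn, QUERY, ("SYM7",))    # planner can now use the index

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index exists, the plan should report a scan of the whole table; afterwards it should mention `idx_trades_symbol`. The exact wording varies between SQLite versions.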

Grid Load Balancer

  • Node affinity: Run fast jobs in one node group and slow jobs in another.
  • Throttling settings: If a node is already working too hard (for example, it has a long disk queue), pushing it further by queuing up jobs it cannot start won’t help.
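
The throttling point can be sketched as below. This is an illustration under simple assumptions, not the load balancer of any real grid product: `ThrottledNode`, `max_running`, and `dispatch` are hypothetical names, and a running-job count stands in for whatever load signal (disk queue length, CPU) your grid actually exposes.

```python
from collections import deque


class ThrottledNode:
    """Hypothetical node wrapper: accepts work only while below its limit."""

    def __init__(self, name, max_running):
        self.name = name
        self.max_running = max_running
        self.running = 0

    def has_capacity(self):
        return self.running < self.max_running


def dispatch(pending, nodes):
    """Assign queued jobs to nodes that still have spare capacity.

    Jobs that cannot start stay in `pending` (a deque) instead of piling up
    on a busy node. Returns a list of (job, node_name) assignments.
    """
    assignments = []
    while pending:
        candidates = [node for node in nodes if node.has_capacity()]
        if not candidates:
            break  # every node is saturated; leave the rest queued
        # Pick the node with the lowest relative load.
        node = min(candidates, key=lambda n: n.running / n.max_running)
        node.running += 1
        assignments.append((pending.popleft(), node.name))
    return assignments
```

For example, with `nodes = [ThrottledNode("n1", 2), ThrottledNode("n2", 1)]` and five pending jobs, three jobs are assigned and two remain in the queue until a node frees up.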

Nodes

  • Is the actual job code running on the nodes optimized?
  • How close are the nodes to the input data? Is there a slow link? If your grid resides in the Cloud, where does your data source reside? Are you sending too much data over the wire?
  • Perform preliminary operations (filtering and simple aggregations, for example) on input data in SQL before fetching it into the nodes, to minimize traffic and reduce the amount of data each node has to process.
  • Is there excessive logging?
  • Excessive threading won’t help; you’d just get a lot of context switches. Try limiting the number of threads to the number of CPUs.
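
To illustrate the bullet above about filtering and aggregating in SQL before fetching, here is a small sketch using SQLite from Python's standard library. The `readings` table and both function names are invented for the example. Both functions produce the same totals, but the second fetches only the summary rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, day INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("s%d" % (i % 3), i % 10, float(i)) for i in range(300)],
)


def totals_fetch_everything(connection, min_day):
    """Pull every row to the node, then filter and aggregate locally."""
    rows = connection.execute("SELECT sensor, day, value FROM readings").fetchall()
    totals = {}
    for sensor, day, value in rows:
        if day >= min_day:
            totals[sensor] = totals.get(sensor, 0.0) + value
    return totals, len(rows)


def totals_pushed_down(connection, min_day):
    """Let the database filter and aggregate; fetch only the summary rows."""
    rows = connection.execute(
        "SELECT sensor, SUM(value) FROM readings WHERE day >= ? GROUP BY sensor",
        (min_day,),
    ).fetchall()
    return dict(rows), len(rows)
```

Each function returns the totals plus the number of rows that crossed the wire. On this toy table the first version fetches all 300 rows and the second fetches one row per sensor; on a real link between grid and data source the gap can be far larger.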

Happy Coding!
