Storage Made Easy

Jim Liddle

Algorithmic Excel Trading with GigaSpaces

One of the real strengths of the GigaSpaces technology is its client interoperability. As well as being fully interoperable with .Net and C++ (if you are interested in these technologies, I suggest you check out the C++ article here and the .Net articles here and here), GigaSpaces integrates with Microsoft Excel.

This enables organisations that use Excel in a trading scenario to scale their use of Excel to prevent bottlenecks, to parallelize processing, or to automate algorithmic trading that uses Excel.

When using Excel for algorithmic trading or for headless calculations, a data feed can be written into the in-memory GigaSpaces data space, which triggers phases of calculation in distinct unattended Excel spreadsheets. These calculations in turn write values to a data cache, where they can be picked up as events and displayed within an attended Excel sheet.

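Visually, we can view this as below:

[Diagram: the External Data Feed Handler, the Excel Compute Support Processing Units and the attended Excel sheet]
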
The diagram shows:

·      An External Data Feed Handler 

·      A number of Excel Compute Support Processing Units (PUs) which are managed by the GigaSpaces Service Grid. Each PU contains worker components of two types:

1.     workers that select and execute Excel compute tasks.

2.     workers that select and execute manager tasks.

·      An attended Excel sheet with an add-in that enables it to listen to the GigaSpaces data cache and display value changes as a result of data changes in the cache. Both User-Defined Function (UDF) and Real-Time Data (RTD) add-in approaches can be implemented.
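
To make the listening idea concrete, here is a minimal sketch of a cache that notifies subscribers when a value changes. It is not the GigaSpaces or Excel add-in API: `ObservableCache`, `subscribe` and `write` are hypothetical names, and a plain dict stands in for the attended sheet.

```python
class ObservableCache:
    """Hypothetical stand-in for a data cache that raises change events."""

    def __init__(self):
        self._data = {}
        self._listeners = []

    def subscribe(self, listener):
        # A listener is any callable taking (key, value).
        self._listeners.append(listener)

    def write(self, key, value):
        # Only new keys or changed values count as a data change.
        is_change = key not in self._data or self._data[key] != value
        self._data[key] = value
        if is_change:
            for listener in self._listeners:
                listener(key, value)


# The "attended sheet" here is just a dict that mirrors the cache.
sheet_cells = {}
cache = ObservableCache()
cache.subscribe(lambda key, value: sheet_cells.__setitem__(key, value))
cache.write("result", 101.5)
```

A real RTD or UDF add-in would push the value into a cell rather than a dict; the point is only that the sheet reacts to events instead of polling.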

This works as follows:

1.     The External Data Feed Handler reads data from a file and uses it to create an initial data value and a task to manage the overall computation stages. The data value is written to the cache and the manager task is injected using a simple task submission API. A Compute Fabric worker executes the injected manager task.

2.     Execution of the manager task picks up the initial data value and spawns a set of Excel compute tasks. These compute tasks execute a number of parallel calculations using the first sheet. The manager task waits for all spawned tasks to complete.

3.     The manager task collates the results from spawned tasks and uses them as input to a calculation on a second sheet. The manager task spawns another Excel compute task to perform this second calculation.

4.     The manager task converts the result of the second calculation into a result data value that it writes back to the GigaSpaces data cache.

5.     The UDF/RTD add-in in the interactive Excel sheet sees the data change event in the GigaSpaces data cache, obtains the data values and updates the sheet.
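
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual task submission API: a dict stands in for the GigaSpaces data cache, a thread pool stands in for the compute fabric, and `first_sheet_calc`, `second_sheet_calc`, `manager_task` and `run_feed` are hypothetical names with placeholder arithmetic in place of real Excel calculations.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: a plain dict stands in for the GigaSpaces data cache.
cache = {}


def first_sheet_calc(value, scenario):
    # Hypothetical placeholder for a calculation on the first sheet.
    return value * scenario


def second_sheet_calc(partials):
    # Hypothetical placeholder for the calculation on the second sheet.
    return sum(partials)


def manager_task(executor, scenarios):
    # Step 2: pick up the initial value and spawn parallel compute tasks.
    initial = cache["initial"]
    futures = [executor.submit(first_sheet_calc, initial, s) for s in scenarios]
    # Wait for every spawned task to complete.
    partials = [f.result() for f in futures]
    # Step 3: collate the results and run the second calculation as another task.
    result = executor.submit(second_sheet_calc, partials).result()
    # Step 4: write the result value back to the cache.
    cache["result"] = result
    return result


def run_feed(value, scenarios):
    # Step 1: the feed handler writes the initial value, then the manager task runs.
    cache["initial"] = value
    with ThreadPoolExecutor(max_workers=4) as executor:
        # The manager runs on the calling thread so it never occupies a worker
        # slot while waiting for the tasks it spawned.
        return manager_task(executor, scenarios)
```

Step 5 would then be an add-in reacting to the change of `cache["result"]`.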

Key to accomplishing this is an Excel Compute Support Processing Unit, which is managed by the GigaSpaces Service Grid. The diagram below illustrates the components of this processing unit.

[Diagram: components of the Excel Compute Support Processing Unit]

Inside each processing unit (PU) instance, a FederatedWorkerFactory communicates with other FederatedWorkerFactory instances in the grid to form a federated fabric that hosts a number of open Excel workbook instances. The fabric can be instructed to host a number of instances of the same Excel workbook to provide redundancy and failover. The fabric ensures that the Excel workbook instances are distributed as evenly as possible between the PUs available to host them.
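
One simple way to get an even spread is a greedy assignment that always places the next workbook instance on the least-loaded PUs. The sketch below illustrates that idea only. How the fabric actually decides placement is not described here, and the `distribute` function, workbook names and PU ids are all hypothetical.

```python
def distribute(workbooks, pus):
    """Assign workbook instances to PUs as evenly as possible.

    workbooks: dict mapping workbook name -> number of instances wanted.
    pus: list of PU identifiers.
    Returns a dict mapping each PU to the list of workbooks it hosts.
    """
    assignment = {pu: [] for pu in pus}
    for name, count in sorted(workbooks.items()):
        if count > len(pus):
            raise ValueError(f"not enough PUs to host {count} copies of {name}")
        # Pick the `count` least-loaded PUs. They are distinct, so copies of
        # the same workbook never share a PU (assumed redundancy requirement).
        targets = sorted(pus, key=lambda pu: (len(assignment[pu]), pu))[:count]
        for pu in targets:
            assignment[pu].append(name)
    return assignment


layout = distribute(
    {"pricing.xls": 2, "risk.xls": 2, "curve.xls": 1},
    ["pu1", "pu2", "pu3"],
)
```

Re-running the same function with the current list of PUs is a crude way to rebalance after a PU joins or leaves, although a real fabric would presumably try to move as few open workbooks as possible.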

When the FederatedWorkerFactory receives an ownership request for a given workbook, it creates a worker to pick up and execute ExcelComputeTasks that are tagged with the name of the workbook. ExcelComputeTasks delegate calculation to a resident Excel Compute Manager. If PUs are added to or removed from the fabric, the fabric automatically and dynamically rebalances the Excel workbook instances to maintain an optimum balance. Workbooks can be added to (or removed from) the fabric dynamically whilst the system is running, and the number of instances of already hosted workbooks can be raised or lowered.
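
The tag-based pick-up can be illustrated with a small sketch. Only the ExcelComputeTask name comes from the description above. The fields `workbook` and `inputs`, the `WorkbookWorker` class and the list used as a task pool are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ExcelComputeTask:
    workbook: str                      # tag: name of the workbook to use
    inputs: dict = field(default_factory=dict)


class WorkbookWorker:
    """Hypothetical worker created when a PU takes ownership of a workbook."""

    def __init__(self, workbook, calculate):
        self.workbook = workbook
        self.calculate = calculate     # stand-in for the Excel Compute Manager

    def take_and_execute(self, task_pool):
        # Select the first task tagged with this worker's workbook, if any.
        for task in task_pool:
            if task.workbook == self.workbook:
                task_pool.remove(task)
                return self.calculate(task.inputs)
        return None                    # nothing for this workbook right now


pool = [
    ExcelComputeTask("risk.xls", {"x": 1}),
    ExcelComputeTask("pricing.xls", {"x": 2}),
]
worker = WorkbookWorker("pricing.xls", lambda inputs: inputs["x"] * 10)
value = worker.take_and_execute(pool)
```

Tasks for other workbooks stay in the pool until a worker that owns the matching workbook takes them.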

The Excel Compute Manager within the PU manages a configurable pool of “headless” Excel processes. Each Excel process is lightweight and can manage one or more open workbooks.

The fabric ensures that the optimum number of Excel processes and open workbooks is running on each GigaSpaces Service Grid node. Loading a workbook in Excel carries a significant performance overhead relative to a calculation. Workbooks are therefore opened lazily on first request, but then remain open, ensuring that repeat calculation requests to the same workbook are optimized for performance.
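
Lazy opening of this kind amounts to a cache keyed by workbook name. A minimal sketch, assuming a hypothetical `opener` callable that performs the expensive load:

```python
class WorkbookCache:
    """Open each workbook on first request and keep it open afterwards."""

    def __init__(self, opener):
        self._opener = opener      # hypothetical: loads a workbook, slow
        self._open = {}

    def get(self, name):
        if name not in self._open:
            # First request for this workbook: pay the load cost once.
            self._open[name] = self._opener(name)
        return self._open[name]


load_log = []


def slow_open(name):
    # Placeholder for asking an Excel process to open the file.
    load_log.append(name)
    return {"name": name}


workbooks = WorkbookCache(slow_open)
first = workbooks.get("pricing.xls")
second = workbooks.get("pricing.xls")   # served from the cache, no reload
```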

If a given PU manages more than one open workbook, it will perform calculations for each workbook concurrently, whilst ensuring that each Excel instance in the pool is managed in a thread-safe manner. This enables good scalability of concurrent Excel compute both within a single node and across the nodes available in the grid. The configurable Excel pool size determines the upper bound of the number of concurrent workbook calculations that can be performed by a single PU.
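
The relationship between pool size and concurrency can be shown with a blocking queue of idle instances. This is a sketch of the general pooling technique, not the partner implementation. `ExcelPool` and its `calculate` method are hypothetical, and strings stand in for Excel processes.

```python
import queue


class ExcelPool:
    """Fixed-size pool: at most len(instances) calculations run at once."""

    def __init__(self, instances):
        self._idle = queue.Queue()
        for instance in instances:
            self._idle.put(instance)

    def calculate(self, fn, *args):
        # Blocks while every instance is busy, which enforces the upper bound.
        instance = self._idle.get()
        try:
            # Only this thread holds the instance until it is handed back.
            return fn(instance, *args)
        finally:
            self._idle.put(instance)


excel_pool = ExcelPool(["excel-1", "excel-2"])
answer = excel_pool.calculate(lambda instance, x: x * 2, 21)
```

Because an instance is removed from the queue while in use, no two threads ever drive the same Excel process at the same time.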

If an Excel instance is damaged, dies, or is killed, the fabric ensures that it is removed from the pool of available Excel instances and, if necessary, a new instance is spawned. Any in-flight calculation request is automatically retried once the workbook has been re-opened with another Excel instance.
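
The remove, respawn and retry behaviour might look something like the sketch below. Everything in it is an assumption for illustration: instances are modelled as plain callables, `ExcelInstanceDied` is a made-up exception, and the retry limit is arbitrary.

```python
class ExcelInstanceDied(Exception):
    """Hypothetical signal that an Excel process is no longer usable."""


def calculate_with_retry(pool, spawn, request, max_attempts=3):
    """Run `request` on an instance from `pool` (a list of callables).

    A dead instance is dropped from the pool and the request is retried on
    another instance; `spawn` creates a replacement only if the pool is empty.
    """
    for _ in range(max_attempts):
        if not pool:
            pool.append(spawn())       # spawn a new instance only if needed
        instance = pool.pop()
        try:
            result = instance(request)
        except ExcelInstanceDied:
            continue                   # dead instance is not put back
        pool.append(instance)          # healthy instance returns to the pool
        return result
    raise RuntimeError("calculation failed on every attempt")


def dead_instance(request):
    raise ExcelInstanceDied()


def healthy_instance(request):
    return request * 2


instances = [dead_instance]
outcome = calculate_with_retry(instances, lambda: healthy_instance, 21)
```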

The Excel PU, tasks and fabric specific to this solution build upon the GigaSpaces Excel integration and were developed by a partner to give a solution-oriented approach to working with Excel in a grid. If you would like to know more about this, please feel free to contact me.

More Stories By Jim Liddle

Jim is CEO of Storage Made Easy. He has been a regular blogger at SYS-CON.com since 2004, covering mobile, grid, and cloud computing topics.