
Processing a billion+ cells in Planning Analytics

Posted on Dec 14, 2022

This post was updated on 2022/12/20 to correct a couple of statements below, based on the LinkedIn discussion we had, and to add an example of commit timing.

I remember reading

Using a parallel data processing regime/framework, TM1 can load upwards of 50.000 records per second per CPU core. In a 16 CPU core context, this can mean an overall data-load/update speed of roughly 800.000 records per second or 1 Billion records in 21 minutes!

in the “Parallel Data Processing with IBM Planning Analytics” whitepaper by Andreas Kugelmeier (worth reading) a while back, when I was writing about large-scale TM1 models, and thinking:

I have since participated in building a system that processes data on a similar scale (1.2 billion cells in 45 minutes), and it has been in production for about a year, so I thought it was worth sharing some of the learnings.

Why so much data

This is labour forecasting for a large retail chain, so we calculate effort by:

which ends up being 450m cells populated on an average calculation run. And we have multiple versions that are run in parallel.

Pic or it didn’t happen: Stats by Cube, a typical day with ~7 different versions populated for the calculation.

How and some learnings

Some additional notes on top of my earlier ramblings on TI performance (NB: all of them still apply). For example, adding a parameter-driven trace output to the calculation TIs like this:

...
  nRawMinutes =  nTimeValueByNumberOfTimes \ nLabourStandardtotalNumberSlotsPerDay;
  sDebugString = Expand('nTimeValueByNumberOfTimes (%nTimeValueByNumberOfTimes%) \ nLabourStandardtotalNumberSlotsPerDay (%nLabourStandardtotalNumberSlotsPerDay%)');
...
if (pTraceFile @<>'');
  AsciiOutput(pTraceFile, vSite,vTask,vDay,vTimeOfDay,'Raw Minutes', NumberTostring(nRawMinutes) ,sDebugString);
endif;
...

turned out to be very useful for debugging.
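As a side note, here is a minimal variant of the same pattern (a sketch, assuming a pTraceFile string parameter on the process and reusing the variable names from the snippet above) that keeps the Expand inside the check, so the cost of building the debug string is only paid when tracing is switched on:

if (pTraceFile @<> '');
  # build the debug string only when a trace file has been supplied
  sDebugString = Expand('nTimeValueByNumberOfTimes (%nTimeValueByNumberOfTimes%) \ nLabourStandardtotalNumberSlotsPerDay (%nLabourStandardtotalNumberSlotsPerDay%)');
  AsciiOutput(pTraceFile, vSite, vTask, vDay, vTimeOfDay, 'Raw Minutes', NumberToString(nRawMinutes), sDebugString);
endif;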

Should PA be used for this

Back to the headline question: should this be done in PA at all?

I was constantly asking myself that during the implementation (and especially while debugging), and then I had a chance to listen to the product owner’s demonstration of the system to a wider audience. Surprisingly enough, this calculation wasn’t even mentioned, but a lot of the more ‘standard’ PA pieces that we built around it (e.g. driver-based calculations, scenario modeling, top-down adjustments) were really life-changing. None of them would have worked or been possible without that base calculation, and PA was really well suited for building them.

Using another system to do the calculation (Spark is the obvious candidate I kept thinking about) and building the required data integration out of PA and back in would probably have landed us at about the same run time, with the added complexity of maintaining two systems.

And the very same calculation in the actual time-tracking system takes over 15 hours, so our 45 minutes seems like a good outcome at the end of the day.

Commit timing

Picture worth a 1000 words? Stats by Cube

tm1server.log (and any ExecuteProcess used in queue orchestration within TI) thinks the process is finished, whereas the data is still committing for another ~10 seconds (that’s running it with 100m records to write). Hence the need to Sleep before relying on the committed data.
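In TI terms the workaround is just a pause, roughly like the sketch below (the process name is hypothetical, loosely matching the ‘Write data’ sample further down; pNumberOfCellsToWrite is its parameter; the 10 seconds is simply the commit time observed with 100m cells):

# queue orchestration sketch: kick off the write, then pause before the next
# step relies on the freshly written data, since the commit can lag behind the
# 'process finished' signal described above
ExecuteProcess('CommitTest 2 - Write data', 'pNumberOfCellsToWrite', 100000000);
Sleep(10000);   # milliseconds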

The REST API correctly records the full execution time, with Commit being a separate section, but it requires you to ‘orchestrate’ TI runs outside of PA.

Here’s the sample code that you can use to reproduce it yourself (you’d need about 1GB of spare memory if you want to write 100m cells, as I did, to see a ‘long commit’).

Process 1. Create test objects

Prolog

# Let's create a cube with 4 dimensions with 1000 elements each to write data to 
j = 1;
while (j<=4);
  sDim = 'CommitTest Dimension ' | NumberToString( j );
  DimensionCreate( sDim );
  i = 1;
  while (i<=1000);
    sElement = 'Element ' | NumberToString( i );
    DimensionElementInsert( sDim, '', sElement, 'N' );
    i = i + 1;
  end;
  j = j + 1;
end;

CubeCreate('CommitTest','CommitTest Dimension 1','CommitTest Dimension 2','CommitTest Dimension 3','CommitTest Dimension 4');
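# note: the resulting cube has 1000^4 = 10^12 potential cells, so the 100m cells
# written by the next process fill only a tiny fraction of it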

Process 2. Write data

BTW, this TI writes 100m cells in 615s on PAoC – about 160k cells/s :)

Prolog

#let's loop through dimensions and populate cells until we reach the target number
#we have 4 dims 
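# switch transaction logging off for the test cube while bulk writing (turned back on at the end of this Prolog)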
CellPutS('NO','}CubeProperties','CommitTest','Logging');

CubeClearData( 'CommitTest' );
nDimSize1 = Dimsiz('CommitTest Dimension 1');
nDimSize2 = Dimsiz('CommitTest Dimension 2');
nDimSize3 = Dimsiz('CommitTest Dimension 3');
nDimSize4 = Dimsiz('CommitTest Dimension 4');

i = 1;
while (i<= pNumberOfCellsToWrite);
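  # decompose the counter i into 4 coordinates (one 'digit' per dimension), so consecutive values of i land on distinct cells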
  nCoord1 = Mod(int(i/(nDimSize4*nDimSize3*nDimSize2)), nDimSize1) + 1; 
  nCoord2 = Mod(int(i/(nDimSize4*nDimSize3)), nDimSize2) + 1;
  nCoord3 = Mod(int(i/nDimSize4), nDimSize3) + 1;
  nCoord4 = Mod(i, nDimSize4) + 1;
   CellPutN(1000 * Rand(),
            'CommitTest',
            'Element ' | NumberToString(nCoord1), 
            'Element ' | NumberToString(nCoord2),
            'Element ' | NumberToString(nCoord3),
            'Element ' | NumberToString(nCoord4));

  i = i + 1;
end;

CellPutS('Yes','}CubeProperties','CommitTest', 'Logging');

Epilog

LogOutput( 'info', 'Epilog of write process finished, wrote ' | NumberToString(i - 1) | ' cells' );
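# note: the cells written in the Prolog are only committed after the process completes,
# and that commit is the 'long commit' discussed above (roughly 10 seconds with 100m cells)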

Process 3. Destroy test objects

Prolog


CubeDestroy( 'CommitTest' );
DimensionDestroy( 'CommitTest Dimension 1' );
DimensionDestroy( 'CommitTest Dimension 2' );
DimensionDestroy( 'CommitTest Dimension 3' );
DimensionDestroy( 'CommitTest Dimension 4' );
