Applied Dimensionality

Loading data into ASO cube

Posted at — Apr 2, 2009
Loading data into ASO cube

Hm, I guess it’s the time for a first big essbase-related post. Today we’ll talk about data loading in ASO cube.

Those who’ve listened to our recent Oracle events presentations (russian TechForum 2008, BI Forum 2009), know that we’ve made a nice PoC project with SportMaster, consisting of loading their year’s worth of daily stock and sales data (around billion fact cells with over a million elements in products dimension).  All in all, I’ve decided to post a write-up about how to achieve acceptable load speed on such volumes and tell a scary-story, with a detective twist and a happy ending.

The basics:

Just a remark: Essbase Integration Services does a lousy job on either building dimensions (we’re adding 14 attributes for products and loading a file again for _every_ attribute is a disaster) or loading data (no visible ability to run multiple buffer load). No comments on 11 Essbase Studio, haven’t tried it.

So, we’ve extracted data from DWH to text files and are loading these files into text buffers. While loading data from file into buffer, with a simple rule file I could achive data read speed of around 6 Mb\s (on a 80 Mb\s disk system, we’ve got two such disk systems attached on PoC server). Loading two files in parallel, gave 12 Mb\s. So to fully load up disk system we need to chunk files into 6 parts. That actually doesn’t present a problem, but seems a bit awkward. While testing, I’ve accidentially forgot using server rules_file in MaxL script and was astonished by load speed of 25-40 Mb\s! That’s how I’ve found out about ‘free-form data load’ )

Essbase has a built in ability to load data without rule file (see DBAG). And, moreover, this is almost 10 times faster than loading with rule file. Easiest way to find more about free-form load is to export an ASO cube to text file (right-click on database in Essbase Administration console). Moreover, free-form allows to load data in cross-tab (pivot accounts in columns), therefore greatly suppressing number of rows. Using this technique requires additional data formating: spaces as separator, double quotes enclosure, writing Missing or MI for null values. But load speed is worth it )

So the bottom line: free-form data load, additional data massage: 70 Gbs of data in less than 30 minutes, almost reaching half of disk system speed.

That’s it for fast data loading, if you’re on 9.3.1 i’d recommend trying to massage data into free-form and load it this way.

Now for the mystery part of the story

So we’ve reached ultra-high loading speed and loaded over-half a billion rows in less than half an hour (+ another 40 minutes for aggregation, to be sincere). And then left it out for a couple of months. Fast forward some server migration, another PoC and all. Then once we’ve need those PoC results once more – they dissapeared. But that’s not a mystery, this usually happens on dev servers. Backups were gone as well, but that’s common for dev as well.

But then I’ve tried to reload data — it just didn’t. Said — ‘unknown member’ on a member that was in outline and stopped loading. Okay, let’s try to focus on finding error rows. So it’s chunking the 25 g file into smaller ones and trying again. 1mln rows — doesn’t work. 100k rows — no go. 10k — no go. 1k — no go. 500 — YEP. Excellent, so it’s about those other 500 rows. Adding another 100 — NO. 50 — YEP. 75 — YEP. 100 — NO. 90 — NO. Hm, strange. Another 10 rows — YEP.  What? I’ve already loaded 100 and got NO. 100 again — NO. Again — YEP. That’s when I blew my head off. I’ve got a data sample that I could load successfully one time out of 4. That’s crazy.

Rule file load worked. But ten times as slow.

Looking back it simple ) — this depends on whether Essbase can find selected member name in outline cache or no. By default all member names are preloaded into memory on application start, but there’s an essbase.cfg setting PRELOADMEMBERNAMESPACE which allows to turn off this loading (which was done by someone between PoCs), If preloading is turned off member names are loaded the same way as the whole outline: by chunks of OUTLINECACHESIZE kbs (by the way, if somebody read to this part — there’s a free beer for you). Off: Anybody knows why outline paging cache size setting officially dissapeared from documentation after 9.2.1??. And free-form loading is so fast, that outline cannot be loaded on same speed to get member names (they can be in different parts of otl file), so it eventually fails with ‘unknown member’. Some sequential tries, however, lead to exactly needed piece of otl lying there in outlinecache, so the whole file loads successfully.

The answer was to turn member names prealoading back on: it started to work again. But tracking the error was really messy.

Lessons learned: Allways check essbase.cfg. And backup everything )

comments powered by Disqus