Monday, May 2, 2016

Cognos Framework Manager Design Best Practices - 1

This proven design is based on using a data warehouse as the data source. Four layers are implemented as namespaces in Framework Manager, as follows:

Database Layer:
A replica of the source database. Use the Metadata Import wizard to get tables into this namespace.
Do not meddle with the data source queries created by the import, as doing so can adversely affect query performance for end users.
The other tasks that need to be done in this layer are:
a. Set usage correctly – “identifier” for surrogate keys, “fact” for summable numbers, otherwise “attribute”.
b. Create a 1:N relationship from each dimension to the fact. I prefer to do this manually rather than allow the Metadata Import wizard to do it incorrectly. Only inner joins, never outer joins!
c. Add determinants on dimensions only, never on facts. Use a top-down approach – “group by” determinants from the top level down to a unique determinant on the surrogate key at the bottom. A rough SQL sketch of what determinants do at query time follows this list.
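
To make the determinant idea concrete, here is a rough sketch (in plain SQL, with made-up table and column names) of the kind of query Cognos can generate when a month-grain fact is joined to a day-grain date dimension that has Year -> Month -> Day determinants defined. The month determinant lets Cognos reduce the dimension to month grain before the inner join, so the fact rows are not double counted:

    -- Hypothetical names. dim_date has determinants Year -> Month -> Day (Day unique).
    -- fact_forecast is stored at month grain, so joining it straight to a day-grain
    -- date dimension would multiply every forecast row by ~30 days. With the month
    -- determinant in place, the dimension is reduced to month grain first:
    SELECT d.year_number,
           d.month_number,
           SUM(f.forecast_amount) AS forecast_amount
    FROM   fact_forecast f
    INNER JOIN (
               SELECT DISTINCT year_number, month_number, month_key
               FROM   dim_date
           ) d
           ON f.month_key = d.month_key      -- 1:N inner join, no outer joins
    GROUP BY d.year_number,
             d.month_number;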

Modelling Layer:
Here we create query subjects that reference the data source query subjects in the Database layer. At a minimum, create an identical query subject in the Modelling layer for each data source query subject in the Database layer. You can also rename query subjects and their items to your heart’s content, although if you have a well-designed data warehouse as the data source you should not have to rename many items. The one thing I do is take the words “Dim” and “Fact” out of the query subject names to make them more user friendly.
Relationships are duplicated/recreated in this layer from the Database layer. If sub-dimension or sub-fact query subjects are created, then additional relationships that do not exist in the Database layer need to be added.
Other tasks you can do in this layer:
a. business calculations, e.g. gross margin and gross margin % (see the SQL sketch after this list)
b. amounts in standard currencies
c. folders inside query subjects to organise query items, rather than just seeing one big long list of items
d. hide items, e.g. surrogate keys
e. set default formats for numbers and dates
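
As an example of item (a), here is roughly what the gross margin calculations boil down to in SQL, assuming hypothetical revenue and cost query items on a Sales query subject (in the model you would enter these as calculated query items rather than hand-written SQL):

    -- Hypothetical query items; the expressions would be defined as
    -- calculations in the Modelling layer.
    SELECT s.revenue_amount - s.cost_amount            AS gross_margin,
           CASE WHEN s.revenue_amount = 0 THEN NULL
                ELSE (s.revenue_amount - s.cost_amount) / s.revenue_amount
           END                                         AS gross_margin_pct
    FROM   sales s;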

DMR Layer (Optional):
This is where we create multi-dimensional ROLAP cubes containing dimension hierarchies and measures.

Presentation Layer:
Contains star schema groups (right-click each fact in the Modelling layer and each measure dimension in the DMR layer to create a star schema group, then cut and paste it into this layer).
Optionally, create namespaces or folders to separate relational objects from dimensional objects. I like to create two namespaces, “Query” and “Analysis”. Query is for the SQL fact-based star schema groups and Analysis is for the dimensional (DMR) star schema groups.

Governor Settings in Cognos

Governor settings are used to reduce system resource requirements and improve performance. You set governors before you create packages to ensure the metadata in the package contains the specified limits. All packages that are subsequently published will use the new settings. There are a number of governors that you can set. Governor settings in the model can be overridden by governor settings in Report Studio.

For example:

Report Table Limits - You can control the number of tables that a user can retrieve in a query or report

Data Retrieval Limits - You can control the number of rows that are returned in a query or report

Query Execution Time Limits - You can limit the time that a query can take 

Large Text Items Limit - You can control the character length of BLOBS (binary large objects) that a user can retrieve in a query or report

Allow Enhanced Model Portability at Run Time - You can use a separate but structurally similar data source at run time. The database schema must be the same between the two data sources.

Allow Usage of Local Cache - Select this governor to specify that all reports based on this model should use cached data.

Outer Joins - You can control whether outer joins can be used in your query or report.

Cross-Product Joins - You can control whether cross-product joins can be used in your query or report.

Use With Clause When Generating SQL - You can choose to use the With clause with Cognos SQL if your data source supports the With clause.
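
For illustration only, with this governor on (and a database that supports common table expressions) the generated SQL can take a shape similar to the sketch below; with it off, the same logic typically ends up in nested derived tables. Table and column names are invented:

    -- Illustrative WITH-clause form of a generated query
    WITH monthly_sales AS (
        SELECT month_key,
               SUM(sales_amount) AS sales_amount
        FROM   fact_sales
        GROUP BY month_key
    )
    SELECT month_key,
           sales_amount
    FROM   monthly_sales;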

Suppress Null Values for Dimensional Data Sources - This governor is supported for SAP BW data sources only. It has no effect on relational or other OLAP data sources.

Thursday, March 10, 2016

Challenges of Data Integration

At first glance, the biggest challenge is the technical implementation of integrating data from disparate, often incompatible sources. However, a much bigger challenge lies in the end-to-end scope of a data integration initiative, which has to include the following phases:

Design

  • The data integration initiative within a company must be an initiative of the business, not of IT. There should be a champion who understands the data assets of the enterprise and is able to lead the discussion about the long-term data integration initiative in order to make it consistent, successful and beneficial.
  • Analysis of the requirements (BRS), i.e. why is the data integration being done, what are the objectives and deliverables. From what systems will the data be sourced? Is all the data available to fulfill the requirements? What are the business rules? What is the support model and SLA?
  • Analysis of the source systems, i.e. what are the options of extracting the data from the systems (update notification, incremental extracts, full extracts), what is the required/available frequency of the extracts? What is the quality of the data? Are the required data fields populated properly and consistently? Is the documentation available? What are the data volumes being processed? Who is the system owner?
  • Any other non-functional requirements such as data processing window, system response time, estimated number of (concurrent) users, data security policy, backup policy.
  • What is the support model for the new system? What are the SLA requirements?
  • And last but not least, who will be the owner of the system and what is the funding of the maintenance and upgrade expenses?
  • The results of the above steps need to be documented in the form of an SRS document, confirmed and signed off by all parties that will participate in the data integration project.

Implementation

Based on the BRS and SRS, a feasibility study should be performed to select the tools used to implement the data integration system. Small companies and enterprises that are just starting with data warehousing are faced with deciding on the set of tools they will need to implement the solution. Larger enterprises, or enterprises that have already run other data integration projects, are in an easier position, as they can extend the existing system and exploit existing knowledge to implement the new system more effectively. There are cases, however, when a new, better-suited platform or technology makes a system more effective than staying with existing company standards: for example, a tool that scales better for future growth, a solution that lowers implementation and support costs, lower licence costs, or migrating the system to a modern platform.

Testing

Along with the implementation, proper testing is a must to ensure that the unified data is correct, complete and up to date.
Both technical IT and the business need to participate in the testing to ensure that the results are as expected/required. Therefore, the testing should incorporate at least Performance Stress Testing (PST), Technical Acceptance Testing (TAT) and User Acceptance Testing (UAT).

Data Integration Techniques

There are several organizational levels on which the integration can be performed. As we go down the list, the level of automation increases.
Manual Integration or Common User Interface - users access all the relevant information by going to each source system directly or through a common web page interface. No unified view of the data exists.
Application Based Integration - requires the particular applications to implement all the integration efforts. This approach is manageable only in the case of a very limited number of applications.
Middleware Data Integration - transfers the integration logic from particular applications to a new middleware layer. Although the integration logic is not implemented in the applications anymore, there is still a need for the applications to partially participate in the data integration.
Uniform Data Access or Virtual Integration - leaves data in the source systems and defines a set of views that provide a unified view to consumers across the whole enterprise. For example, when a user accesses customer information, the particular details of the customer are transparently acquired from the respective source system. The main benefits of virtual integration are nearly zero latency in propagating data updates from the source systems to the consolidated view and no need for a separate store for the consolidated data. The drawbacks include limited possibilities for history and version management, the limitation that the method only really works with 'similar' data sources (e.g. the same type of database), and the fact that access to the unified data generates extra load on the source systems, which may not have been designed to accommodate it. A rough SQL sketch of such a view follows this list.
Common Data Storage or Physical Data Integration - usually means creating a new system which keeps a copy of the data from the source systems and stores and manages it independently of the original systems. The most well-known example of this approach is the Data Warehouse (DW). The benefits include version management of the data and the ability to combine data from very different sources (mainframes, databases, flat files, etc.). Physical integration, however, requires a separate system to handle the vast volumes of data.
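
As promised above, here is a rough sketch of the virtual integration approach (system, table and column names are invented for illustration): the unified view can simply be a database view defined over the individual source systems, with no data copied anywhere.

    -- Hypothetical unified customer view over two source systems.
    -- No data is copied; every query hits the source systems directly.
    CREATE VIEW unified_customer AS
    SELECT customer_id,
           customer_name,
           'CRM' AS source_system
    FROM   crm.customers
    UNION ALL
    SELECT cust_no,
           cust_name,
           'BILLING'
    FROM   billing.customer_master;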

Courtesy: http://www.dataintegration.info/data-migration

Thursday, February 25, 2016

The 10 Essential Rules of Dimensional Modelling


Rule #1: Load detailed atomic data into dimensional structures.
Dimensional models should be populated with bedrock atomic details to support the unpredictable filtering and grouping required by business user queries. Users typically don’t need to see a single record at a time, but you can’t predict the somewhat arbitrary ways they’ll want to screen and roll up the details. If only summarized data is available, then you’ve already made assumptions about data usage patterns that will cause users to run into a brick wall when they want to dig deeper into the details. Of course, atomic details can be complemented by summary dimensional models that provide performance advantages for common queries of aggregated data, but business users cannot live on summary data alone; they need the gory details to answer their ever-changing questions.
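
For instance (a hypothetical sketch, not from the original article), keeping the fact table at the atomic grain of one row per order line still supports the common summary questions, which can be answered on the fly or materialised later as aggregate tables:

    -- Atomic grain: one row per order line. Summaries are derived from the
    -- detail, not stored in place of it.
    SELECT d.year_number,
           p.product_category,
           SUM(f.sales_amount) AS sales_amount
    FROM   fact_order_line f
    JOIN   dim_date    d ON f.order_date_key = d.date_key
    JOIN   dim_product p ON f.product_key    = p.product_key
    GROUP BY d.year_number, p.product_category;
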
Rule #2: Structure dimensional models around business processes.
Business processes are the activities performed by your organization; they represent measurement events, like taking an order or billing a customer. Business processes typically capture or generate unique performance metrics associated with each event. These metrics translate into facts, with each business process represented by a single atomic fact table. In addition to single process fact tables, consolidated fact tables are sometimes created that combine metrics from multiple processes into one fact table at a common level of detail. Again, consolidated fact tables are a complement to the detailed single-process fact tables, not a substitute for them.
Rule #3: Ensure that every fact table has an associated date dimension table.
The measurement events described in Rule #2 always have a date stamp of some variety associated with them, whether it’s a monthly balance snapshot or a monetary transfer captured to the hundredth of a second. Every fact table should have at least one foreign key to an associated date dimension table, whose grain is a single day, with calendar attributes and nonstandard characteristics about the measurement event date, such as the fiscal month and corporate holiday indicator. Sometimes multiple date foreign keys are represented in a fact table.
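
A minimal sketch of the pattern, with invented table and column names:

    -- Day-grain date dimension with calendar and non-standard attributes
    CREATE TABLE dim_date (
        date_key          INTEGER     NOT NULL PRIMARY KEY,  -- e.g. 20160502
        full_date         DATE        NOT NULL,
        fiscal_month      VARCHAR(20) NOT NULL,
        corporate_holiday CHAR(1)     NOT NULL               -- 'Y' / 'N'
    );

    -- Fact table with (possibly several) date foreign keys
    CREATE TABLE fact_orders (
        order_date_key INTEGER       NOT NULL REFERENCES dim_date (date_key),
        ship_date_key  INTEGER       NOT NULL REFERENCES dim_date (date_key),
        order_amount   DECIMAL(18,2) NOT NULL
    );
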
Rule #4: Ensure that all facts in a single fact table are at the same grain or level of detail.
There are three fundamental grains to categorize all fact tables: transactional, periodic snapshot, or accumulating snapshot. Regardless of its grain type, every measurement within a fact table must be at the exact same level of detail. When you mix facts representing multiple levels of granularity in the same fact table, you are setting yourself up for business user confusion and making the BI applications vulnerable to overstated or otherwise erroneous results.
Rule #5: Resolve many-to-many relationships in fact tables.
Since a fact table stores the results of a business process event, there’s inherently a many-to-many (M:M) relationship between its foreign keys, such as multiple products being sold in multiple stores on multiple days. These foreign key fields should never be null. Sometimes dimensions can take on multiple values for a single measurement event, such as the multiple diagnoses associated with a health care encounter or multiple customers with a bank account. In these cases, it’s unreasonable to resolve the many-valued dimensions directly in the fact table, as this would violate the natural grain of the measurement event. Thus, we use a many-to-many, dual-keyed bridge table in conjunction with the fact table.
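
Here is a hedged sketch of the bridge table pattern for the multiple-diagnoses example, with invented names:

    -- The fact keeps its natural grain: one row per health care encounter,
    -- carrying a single diagnosis-group key.
    CREATE TABLE fact_encounter (
        encounter_date_key  INTEGER       NOT NULL,
        patient_key         INTEGER       NOT NULL,
        diagnosis_group_key INTEGER       NOT NULL,
        charge_amount       DECIMAL(18,2) NOT NULL
    );

    -- Dual-keyed bridge resolving the many-to-many relationship to diagnoses
    CREATE TABLE bridge_diagnosis_group (
        diagnosis_group_key INTEGER      NOT NULL,
        diagnosis_key       INTEGER      NOT NULL,
        weighting_factor    DECIMAL(5,4) NOT NULL,  -- optional allocation weight
        PRIMARY KEY (diagnosis_group_key, diagnosis_key)
    );
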
Rule #6: Resolve many-to-one relationships in dimension tables.
Hierarchical, fixed-depth many-to-one (M:1) relationships between attributes are typically de-normalized or collapsed into a flattened dimension table. If you’ve spent most of your career designing entity-relationship models for transaction processing systems, you’ll need to resist your instinctive tendency to normalize or snowflake an M:1 relationship into smaller sub dimensions; dimension de-normalization is the name of the game in dimensional modelling.
It is relatively common to have multiple M:1 relationships represented in a single dimension table. One-to-one relationships, like a unique product description associated with a product code, are also handled in a dimension table. Occasionally many-to-one relationships are resolved in the fact table, such as the case when the detailed dimension table has millions of rows and its roll-up attributes are frequently changing. However, using the fact table to resolve M:1 relationships should be done sparingly.
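
For example (column names are illustrative), a flattened product dimension collapses the product -> brand -> category hierarchy into a single table instead of snowflaking it into sub dimensions:

    -- Denormalized product dimension: the M:1 product -> brand -> category
    -- hierarchy is flattened into one row per product.
    CREATE TABLE dim_product (
        product_key         INTEGER      NOT NULL PRIMARY KEY,
        product_code        VARCHAR(20)  NOT NULL,
        product_description VARCHAR(100) NOT NULL,  -- 1:1 with product_code
        brand_name          VARCHAR(50)  NOT NULL,  -- M:1
        category_name       VARCHAR(50)  NOT NULL   -- M:1
    );
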
Rule #7: Store report labels and filter domain values in dimension tables.
The codes and, more importantly, associated decodes and descriptors used for labelling and query filtering should be captured in dimension tables. Avoid storing cryptic code fields or bulky descriptive fields in the fact table itself; likewise, don’t just store the code in the dimension table and assume that users don’t need descriptive decodes or that they’ll be handled in the BI application. If it’s a row/column label or pull-down menu filter, then it should be handled as a dimension attribute.
Though we stated in Rule #5 that fact table foreign keys should never be null, it’s also advisable to avoid nulls in the dimension tables’ attribute fields by replacing the null value with “NA” (not applicable) or another default value, determined by the data steward, to reduce user confusion if possible.
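
A small illustrative query covering both points, with invented names; the COALESCE simply shows the “NA” substitution that would normally be applied once in the ETL rather than in every report:

    -- Codes and their descriptive decodes live in the dimension, not the fact.
    SELECT o.order_status_code,                              -- e.g. 'C'
           o.order_status_description,                       -- e.g. 'Closed'
           COALESCE(o.cancel_reason, 'NA') AS cancel_reason  -- default for nulls
    FROM   dim_order_profile o;
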
Rule #8: Make certain that dimension tables use a surrogate key.
Meaningless, sequentially assigned surrogate keys (except for the date dimension, where chronologically assigned and even more meaningful keys are acceptable) deliver a number of operational benefits, including smaller keys which mean smaller fact tables, smaller indexes, and improved performance. Surrogate keys are absolutely required if you’re tracking dimension attribute changes with a new dimension record for each profile change. Even if your business users don’t initially visualize the value of tracking attribute changes, using surrogates will make a downstream policy change less onerous. The surrogates also allow you to map multiple operational keys to a common profile, plus buffer you from unexpected operational activities, like the recycling of an obsolete product number or acquisition of another company with its own coding schemes.
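
A sketch of a surrogate-keyed dimension with type 2 change-tracking housekeeping columns (names are hypothetical):

    CREATE TABLE dim_customer (
        customer_key       INTEGER      NOT NULL PRIMARY KEY, -- meaningless surrogate
        customer_id        VARCHAR(20)  NOT NULL,             -- operational/natural key
        customer_name      VARCHAR(100) NOT NULL,
        customer_segment   VARCHAR(50)  NOT NULL,
        row_effective_date DATE         NOT NULL,             -- type 2 change tracking
        row_expiry_date    DATE         NOT NULL,
        current_row_flag   CHAR(1)      NOT NULL              -- 'Y' for the current profile
    );
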
Rule #9: Create conformed dimensions to integrate data across the enterprise.
Conformed dimensions (otherwise known as common, master, standard or reference dimensions) are essential for enterprise data warehousing. Managed once in the ETL system and then reused across multiple fact tables, conformed dimensions deliver consistent descriptive attributes across dimensional models and support the ability to drill across and integrate data from multiple business processes. The Enterprise Data Warehouse Bus Matrix is the key architecture blueprint for representing the organization’s core business processes and associated dimensionality. Reusing conformed dimensions ultimately shortens the time-to-market by eliminating redundant design and development efforts; however, conformed dimensions require a commitment and investment in data stewardship and governance, even if you don’t need everyone to agree on every dimension attribute to leverage conformity.
Rule #10: Continuously balance requirements and realities to deliver a DW/BI solution that’s accepted by business users and that supports their decision-making.

Dimensional modellers must constantly straddle business user requirements along with the underlying realities of the associated source data to deliver a design that can be implemented and that, more importantly, stands a reasonable chance of business adoption. The requirements-versus-realities balancing act is a fact of life for DW/BI practitioners, whether you’re focused on the dimensional model, project strategy, technical/ETL/BI architectures or deployment/maintenance plan.