Monthly Archives: June 2017

Oracle BI Cloud Service (BICS) Access Options: Data Sync Overview and Configuration

June 28, 2017

Author: Doug Ross, Performance Architects 

Introduction

As more organizations move their business intelligence (BI) environments to the cloud, loading and accessing enterprise data will become as important as the front-end visualizations.  Oracle’s BI Cloud Service (BICS) offers several options for those data requirements that go beyond simple data upload. Each has a specific purpose, features, benefits, and limitations. One of the more powerful options for true enterprise-level data transfer to the cloud is Oracle’s Data Sync tool.

Data Sync Overview

Data Sync provides a full-featured data transfer tool with a client interface that allows for scheduling load jobs that efficiently move data from flat files, database tables, and other cloud data sources into the BICS Database Schema Service or Oracle Database Cloud Service.  It can also directly load data as a data set source for the Visual Analyzer projects that are available in BICS.

It includes many of the features found in other data loading tools: logging of load job execution steps, restarting after failures, incremental loading of new or modified data, and configuring the sequence of load operations.  A Data Sync solution lends itself more to new development of data load processes, or to a more agile analytics environment that allows processes and data models to change more rapidly than would be possible with an on-premise database.

Data Sync Configuration Steps

Data Sync’s primary function is to upload data into a BICS environment.  Data can be loaded from flat files (CSV or XLSX), relational database sources (either tables, views, or SQL statements that are executed dynamically), Oracle Transactional BI (OTBI), JDBC data sources (e.g., MongoDB, Impala, SalesForce, RedShift), or the Oracle Service Cloud.  Uploaded data can be stored in cloud-based tables or data sets accessible to the data visualization components.

Data Sync can:

  • Load data sources other than Oracle in addition to data files or Oracle tables
  • Execute incremental data loads or rolling deletes and insert / append strategies
  • Merge data from multiple sources
  • Schedule data loads

Data Sync is installed on a local computer running either Windows or Linux.  Prior to installing Data Sync, ensure that Java Development Kit (JDK) 1.7 or later is installed on the local computer; it must be a JDK, not just a JRE.  It is also necessary to validate that the user account that will be used to access the BICS database schema has the proper permissions.  Work with your cloud administrator to grant permission to upload data to Oracle BICS by assigning the BI Data Load Author application role to the account.  To upload data to a data set instead of a table, assign the BI Advanced Content Author application role instead.

Installation Steps:

  1. Download the Data Sync software from the Oracle Technology Network (currently located under “Data Sync Download” on OTN)
  2. Unzip BICSDataSync.Zip to a local directory (no spaces in directory name)
  3. Set the JAVA_HOME variable in config.bat or config.sh to point to the JDK location
  4. Copy database-specific JDBC drivers to Data Sync’s \lib directory

Data Sync comprises both a server component and a client GUI.  To start Data Sync and its server component, run datasync.bat (Windows) or datasync.sh (Linux/UNIX) from the Data Sync installation directory. The Data Sync icon appears in the system tray to indicate that the server is up and running.

To access the client interface, click on the icon and choose Start UI to open the Data Sync client.

Click on the icon and choose Exit to stop the Data Sync server.

Data Sync Updates

Before each load job executes, Data Sync automatically checks against the cloud environment to see whether a new version is available, since a version of Data Sync may have been deployed in the cloud that is incompatible with the local installation.  If the versions match, the data load continues unaffected.  If the minor version of the tool has changed in the cloud, a patch is available; this triggers a one-time alert and an optional email if the Data Sync server is configured for email notifications.  If the change is a major version release, an alert is created and an email sent, and the data load job is stopped and will not run until the new version is installed.
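The minor/major policy described above can be sketched as a simple version comparison. The function below is an illustration of the documented behavior, not Data Sync's actual implementation, and the version format is assumed to be "major.minor".

```python
# Sketch of the version-check policy (illustrative, not Data Sync internals):
# a minor-version difference raises a one-time alert but the load continues;
# a major-version difference blocks the load until the tool is upgraded.

def check_version(local, cloud):
    """Compare "major.minor" version strings and return the action to take."""
    local_major, local_minor = (int(p) for p in local.split(".")[:2])
    cloud_major, cloud_minor = (int(p) for p in cloud.split(".")[:2])
    if cloud_major != local_major:
        return "block"     # major release: alert, email, stop the load job
    if cloud_minor != local_minor:
        return "alert"     # patch available: one-time alert, load continues
    return "continue"      # versions match: load runs unaffected

print(check_version("2.3", "2.3"))  # continue
print(check_version("2.3", "2.4"))  # alert
print(check_version("2.3", "3.0"))  # block
```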

New versions (minor or major) are simple to install: follow the standard installation process into a new home directory, then copy the existing environment configuration into the new installation.

Data Sync Terminology

  • Connection: Defines the data sources and target databases.
  • Project: A container that describes and organizes the information related to a data load. There can be any number of projects within a Data Sync instance.
  • Job: The mechanism for uploading all the data sources defined in a project to BICS.

Load Strategies

When moving data to the cloud, a load strategy defines how that data is loaded from the data source into the target. Incremental data loading is available as long as there is a column on the source table that uniquely identifies each row (a primary key) as well as another column with a “Date/Timestamp” data type that can be used to identify which rows have been added or modified since the previous load.  During the initial load job, the full source table is transmitted to the cloud.  In subsequent loads, each row's last update date is compared to a stored timestamp for the previous run, and only rows added or modified since that run are transmitted.

Load strategy options include:

  • Replace data in table: Truncates existing data and always reloads the table. Any indexes are dropped prior to the data load and are recreated after the load
  • Append data to table: New data is added to the table without checking for any prior existence of data.  Indexes are not dropped before the load.  Any new indexes defined on the source are created after the data load
  • Update table (add new records only): Requires a primary key column. If a row with the same key does not exist, it is inserted; otherwise the record is ignored
  • Update table (update existing records): Requires a primary key column.  If a row with the same key exists, it is updated; otherwise it is ignored
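As a rough illustration of the two key-based strategies, the sketch below models a target table as a Python dict keyed on the primary key. The function names are made up; the semantics follow the descriptions above.

```python
# Illustrative sketch of the two key-based load strategies (assumed semantics,
# modeled on the load strategy descriptions; not Data Sync internals).
# The "table" is a dict mapping primary key -> row value.

def add_new_records_only(table, rows):
    """Update table (add new records only): insert unknown keys, ignore the rest."""
    for key, value in rows:
        if key not in table:
            table[key] = value

def update_existing_records(table, rows):
    """Update table (update existing records): update known keys, ignore the rest."""
    for key, value in rows:
        if key in table:
            table[key] = value

target = {1: "old"}
add_new_records_only(target, [(1, "new"), (2, "new")])
# key 1 is kept as "old"; key 2 is inserted
update_existing_records(target, [(1, "updated"), (3, "ignored")])
# key 1 is updated; key 3 has no match and is ignored
print(target)  # {1: 'updated', 2: 'new'}
```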

Loading Data Sets

Data sets are separate data objects used primarily by the Visual Analyzer component of BICS.  They can be thought of as separate data files stored in the cloud.   Data Sync can load those data sets with the following guidelines:

  • If a data set with the same name does not exist, one is created automatically with default settings
  • All string and timestamp-based source columns are set to the attribute type, and numeric datatype columns are set as measures in the target data set
  • The maximum data set size is 50MB; data uploads fail if the data set exceeds the 50MB limit
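The defaults above can be sketched as follows; the type names and the size check are illustrative assumptions, not Data Sync's internal logic.

```python
# Hypothetical sketch of the data set defaults: string and timestamp columns
# become attributes, numeric columns become measures, and an upload over the
# 50MB data set limit is rejected.

MAX_DATA_SET_BYTES = 50 * 1024 * 1024  # 50MB data set size limit

def classify_columns(columns):
    """Map source column datatypes to their default roles in the data set."""
    roles = {}
    for name, dtype in columns.items():
        if dtype in ("string", "timestamp"):
            roles[name] = "attribute"
        elif dtype == "numeric":
            roles[name] = "measure"
        else:
            raise ValueError("unhandled source datatype: " + dtype)
    return roles

def validate_upload(size_bytes):
    """Data uploads fail if the data set exceeds the 50MB limit."""
    if size_bytes > MAX_DATA_SET_BYTES:
        raise ValueError("data set exceeds the 50MB limit; upload fails")

print(classify_columns({"REGION": "string", "LOAD_TS": "timestamp", "SALES": "numeric"}))
# {'REGION': 'attribute', 'LOAD_TS': 'attribute', 'SALES': 'measure'}
```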

Loading Data with Jobs

Data Sync uses the concept of a job to organize, schedule, and execute load processes. A run is an instance of a data loading job. For example, if you run a job twice, then you’ll see two run records on the History tab.  As is common with most scheduling applications, Data Sync allows for job settings to recur on a scheduled basis to meet whatever load frequency is required.

Parameterized Jobs

Parameters are available in Data Sync to customize data loads at run time by dynamically changing the conditions on the data selection step.  Rather than changing the SQL statements used to query the data sources, a parameter can be changed at the project or job level instead. A job-level parameter overrides a parameter with the same name at the project level.
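The precedence rule can be sketched as a simple dictionary merge; the parameter names below are hypothetical.

```python
# Sketch of parameter precedence: job-level parameters override project-level
# parameters with the same name (illustrative, not Data Sync internals).

def effective_parameters(project_params, job_params):
    """Job-level values win over project-level values with the same name."""
    merged = dict(project_params)   # start with project-level defaults
    merged.update(job_params)       # job-level overrides take precedence
    return merged

project = {"REGION": "EMEA", "LOAD_DATE": "2017-06-01"}
job = {"REGION": "AMER"}  # override just the region for this run
print(effective_parameters(project, job))
# {'REGION': 'AMER', 'LOAD_DATE': '2017-06-01'}
```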

Chaining Data Sync Load Jobs

There may be occasions where it would be beneficial to the load process to chain load jobs in a specific order. Possible reasons might be the need to load multiple separate data source tables into a common target or the need to load aggregate tables after base level data tables have completed.   The first step would be to separate the load jobs into distinct projects.

When a job starts, a signal file named “job_name”_StartSignal.txt is created in the Data Sync log\jobSignal directory for each run of the job. A file with the naming pattern “job_name”_CompletedSignal.txt is created when the job completes successfully, or “job_name”_FailedSignal.txt when the job fails.  Data Sync has a polling mechanism to look for the existence of these files before executing another load job.

By editing the on_demand_job.xml file located in the conf-shared directory, you can specify the name of a file that will trigger a specific load job.

For example:

<TriggerFile job="Load_HR_Data" file="d:\Load_The_HR_Data_Now.txt"/>

In the example above, Data Sync polls for the presence of the Load_The_HR_Data_Now.txt file, and when it is found, it triggers the execution of the Load_HR_Data job.  When the job is started, the triggering file is deleted.  A time window for polling can also be specified in the XML file.

<TriggerFile job="Load_HR_Data" file="d:\Load_The_HR_Data_Now.txt">
<TimeWindow startTime="00:00" endTime="05:00"/>
<TimeWindow startTime="21:00" endTime="23:00"/>
</TriggerFile>

Two other options exist for triggering Data Sync load jobs, either from the command line using the datasyncCmdLine.bat/.sh script file or else with polling tables stored in a database.  Both of these methods are described in detail in the Data Sync documentation.
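Putting the signal and trigger files together, a chaining step might be scripted externally along these lines. This is a sketch under assumptions, not a supported Data Sync feature: the paths, job names, and polling interval are all illustrative.

```python
# Hypothetical external chaining script: wait for job A's CompletedSignal file
# in the log\jobSignal directory, then create the trigger file that (per the
# on_demand_job.xml entry) starts job B.
import os
import time

def chain_jobs(signal_dir, first_job, trigger_file, poll_seconds=1.0, timeout=60.0):
    """Poll for <first_job>_CompletedSignal.txt, then drop the trigger file."""
    completed = os.path.join(signal_dir, first_job + "_CompletedSignal.txt")
    failed = os.path.join(signal_dir, first_job + "_FailedSignal.txt")
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(failed):
            return False                      # first job failed: do not chain
        if os.path.exists(completed):
            open(trigger_file, "w").close()   # Data Sync polls for this file
            return True
        time.sleep(poll_seconds)
    return False                              # timed out waiting for job A
```

For example, `chain_jobs(r"C:\DataSync\log\jobSignal", "Load_Base_Tables", r"d:\Load_The_Aggregates_Now.txt")` would start a hypothetical aggregate-load job only after the base-table job signals completion.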

Monitoring and Troubleshooting Data Loads

Data load jobs can be monitored while in progress using the Current Jobs tab or History tab in Data Sync.  The top panel shows the job summary information. The Tasks tab underneath that panel shows one record per user-defined data flow. The tasks and their details show important metrics, including start and end timestamps, the number of rows read and written, and throughput.

The run log files are stored in the Data Sync log directory. One directory per run is created with a naming convention of CR_<Table/File Name>_<From Connection>_<To Connection>.<Timestamp>.log.

There are options available within the Connections configuration screens to reset a data target to do a full reload if necessary.  The reload can be for all connections or individual tables within a connection.

Data Sync Configuration Migration

As with most development environments, it may be necessary to maintain multiple instances of Data Sync for DEV, TEST, and PROD.  This can be managed within the Data Sync interface by exporting the metadata from one environment and importing it into the next.  The export and import options are available under the Tools menu.

When exporting, there are options to export the following metadata types:

  • Logical: Exports all information contained in the Project view
  • System: Exports all information contained in the Connections view, except passwords for servers and database connections
  • Run Time: Exports information about jobs and schedules contained in the Jobs view
  • User Data: Exports users, roles, and passwords

© Performance Architects, Inc. and Performance Architects Blog, 2006 - present. Unauthorized use and/or duplication of this material without express and written permission from this blog's author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Performance Architects, Inc. and Performance Architects Blog with appropriate and specific direction to the original content.

Oracle BI Cloud Service (BICS) Access Options: Remote Data Connector (RDC) Overview and Configuration

June 21, 2017

Author: Doug Ross, Performance Architects

As more organizations move their business intelligence (BI) environments to the cloud, loading and accessing enterprise data will become as important as the front-end visualizations.  Oracle’s BI Cloud Service (BICS) offers several options for those data requirements that go beyond simple data upload. Each has a specific purpose, features, benefits, and limitations. Only one option allows data to remain on-premise for querying by BICS: Remote Data Connector (RDC).

Rather than moving data to the cloud, RDC enables a secure connection to on-premise data sources for analysis and visualization. RDC utilizes the BI Server Data Gateway running in the BICS environment to provide secure access to on-premise data using private/public key pairs and SSL communication.  The primary benefit of RDC is that it preserves the investment in the technology used to house and load on-premise data warehouses.  It offers a hybrid approach to transitioning to a cloud-based analytics environment without also having to migrate the entire data environment.

RDC enables analyses in BICS to connect directly to an on-premise Oracle database following proper configuration of the on-premise firewall, security, and WebLogic installation.  When an analysis is executed in BICS, a SQL request is generated and transmitted to the on-premise WebLogic server.  WebLogic passes that SQL on to the associated database, compresses the resulting dataset, and then returns that result to BICS, where it is presented in a visualization view.

This provides organizations with very large on-premise data warehouses the ability to use BICS as a front end without having to duplicate the same data in the cloud.

A Remote Data Connector setup requires the following mandatory components:

  • The Oracle Business Intelligence Enterprise Edition (OBIEE) BI Administration client tool used to create the RPD must be version 12.2.1.0.0, because RDC requires the JDBC (JNDI) Data Source option for the connection to work. The configured RPD is then “lifted-and-shifted” to the BICS environment while maintaining RDC connections at the physical layer
  • The on-premise database can be Oracle, Teradata, SQL Server, or DB2
  • The on-premise environment must have either a configured WebLogic server or an Apache Tomcat server. While prior versions of WebLogic should work, the latest version is preferred
  • The on-premise WebLogic server must be accessible externally via the necessary networking, security, and firewall configuration. In the RPD, the port defined in the physical connection must accurately route to the WebLogic server port

One important item to understand is that once a “lift-and-shift” of the RPD is performed in the BICS environment, any previous connections to the co-located database ‘Schema Service’ will not be accessible.  The on-premise RPD data model will replace the Schema Service repository and will not be able to connect to Schema Service database objects.

The following diagram produced by Oracle illustrates how the RDC environment works:

Consider the following before implementing an RDC solution:

Network performance will become a much greater factor in the execution time of analyses and visualizations.   Large query result sets transferred over the network will likely introduce latency challenges.

The hybrid approach to BICS data access typically requires significant assistance and support from a customer’s network support team.  There may be resistance to RDC based on corporate policies related to opening up access to internal databases from external sources.



Oracle BI Cloud Service (BICS) Access Options: When to Use Data Sync Versus Remote Data Connector (RDC)

June 14, 2017

Author: Doug Ross, Performance Architects

Introduction

As more organizations move their business intelligence (BI) environments to the cloud, loading and accessing enterprise data will become as important as the front-end visualizations.  Oracle’s BI Cloud Service (BICS) offers two options for these data requirements that go beyond simple data uploads: Data Sync and Remote Data Connector (RDC).

Other options range from simple manual data loads of spreadsheets using the BICS front end to advanced programmatic options based on REST APIs. Each has a specific purpose, features, benefits, and limitations. As the migration to the cloud and the tools used to support that transition are still in their early stages, this blog post discusses the current state of Data Sync and RDC with the expectation that Oracle will continue to enhance the capabilities of each over time.

Overview

The full list of available BICS data load options ranges from manual uploads of data files through the BICS front end, to programmatic loads built on the REST APIs, to Data Sync, to Remote Data Connector (RDC).

RDC is different from all of the other options in that data is not moved to the cloud; it remains on-premise and is available both to cloud applications like BICS and to existing reporting tools in the on-premise environment.

At the most basic level, BI Data Sync and RDC represent two ends of the spectrum in providing access to data in the cloud.  Data Sync is used to push data from on-premise sources to a cloud database, while RDC is used to pull data from an on-premise source database into BICS visualizations.

Data Sync provides a full-featured data transfer tool with a client interface that allows for scheduling load jobs that efficiently move data from flat files, database tables, and other cloud data sources into the BICS Database Schema Service or Oracle Database Cloud Service.  It can also directly load data as a data set source for the Visual Analyzer projects that are available in BICS.  It includes many of the features found in other data loading tools: logging of load job execution steps, restarting after failures, incremental loading of new or modified data, and configuring the sequence of load operations.

Rather than moving data to the cloud, RDC enables a secure connection to on-premise data sources for analysis and visualization.  BICS RDC utilizes the BI Server Data Gateway running in the BI Cloud Service environment to provide secure access to on-premise data using private/public key pairs and SSL communication.  The primary benefit of RDC is that it preserves the investment in the technology used to house and load on-premise data warehouses.  It offers a hybrid approach to transitioning to a cloud-based analytics environment without also having to migrate the entire data environment.

The decision of whether to use Data Sync versus RDC would be based on a number of factors:

  • Concerns over data security in the cloud
  • Data volumes in the local data warehouse tables that might be difficult to transfer to the cloud in a timely manner
  • Synchronization of data transmissions to the cloud with current load processes
  • Ongoing investment in ETL tools, processes, and employees that would not be ready to transition completely to the cloud

A Data Sync solution lends itself more to new development of data load processes, or to a more agile analytics environment that allows processes and data models to change more rapidly than would be possible with an on-premise database.

Conclusion

Regardless of where a BI environment is located, it truly is all about the data. And with the push to migrate more of the analytics functions into the cloud, it is necessary to determine the optimal strategy for using data visualization tools to access that data.  Oracle provides many options to do this, whether it is the relatively simple process of configuring access to existing on-premise databases by using RDC or implementing a fully formed data loading strategy into the cloud using BI Data Sync.   The capabilities and tradeoffs for each method should be reviewed thoroughly before proceeding with a cloud-based BI solution.



Your Guide to Oracle Developer Tools User Group (ODTUG) Kscope17

June 7, 2017

Author: Melanie Mathews, Performance Architects

Oracle Developer Tools User Group (ODTUG)’s annual Kscope17 conference is coming up soon, and you’re probably starting to try to build your schedule! Since there are a bunch of networking opportunities and special events at Kscope, we thought we should share some of the ‘must attend’ happenings!

Keep in mind that Performance Architects’ focus is the business analytics arena, which we define as the intersection of business intelligence (BI), data discovery, and enterprise performance management (EPM)…and in Oracle product speak, that means Oracle Business Intelligence, Endeca Information Discovery (OEID), and Hyperion, as well as supporting/enabling technologies such as data integration, data warehousing, etc…so if you’re focused on other areas, this blog post may not provide a comprehensive guide for you!

This year, Kscope17 is hosted in sunny San Antonio, and if you are lucky enough to get in a couple of days early, take advantage of the absolutely gorgeous resort hosting this year's event. Perhaps some cocktails while you float in the JW Marriott’s lazy river?! We also recommend kicking off your conference experience with the Community Service event on Saturday.  This year, ODTUG has partnered with Haven for Hope in its mission of radical compassion, driven by hope, to move people from a feeling of worthlessness to the pursuit of excellence.  Haven for Hope provides drug treatment, job training, education, medical services, and so much more, all on one campus. They strive to provide participants with the opportunity to better their own lives by moving from homelessness to full employment and stability. You will have several options to help at Haven for Hope: activities such as meal prep, campus beautification, helping in the free store, serving coffee, and playing bingo with residents.

Sunday is where you can really start to dig in and to get the benefit of the conference experience.  The first “must do” of the conference is the BI and/or EPM Symposiums. These are all-day, substantive learning opportunities that provide a forum to interact with Oracle product management and other experts in these solution areas that you won’t experience during other times at the conference.

In terms of the conference sessions that kick off Monday, there are simply too many to list in this post.  For those of you who don’t get to attend more than one Oracle conference in a year, we strongly recommend that you try to fit in Oracle’s product and technology roadmap sessions in whatever product areas interest you.  Below is a list of our sessions:

  • Monday, Jun 26, 2017, 10:30 am – 11:30 am (Grand Oaks I): Cloudy with a Chance of PBCS: Tips and Tricks for Migrating On-Premise Oracle Hyperion Planning to Oracle Planning and Budgeting Cloud Service (PBCS)
  • Monday, Jun 26, 2017, 2:00 pm – 3:00 pm (Wisteria/Sunflower): It’s All About The Data: Oracle BI Cloud Service (BICS) Data Sync and Remote Data Connector
  • Tuesday, Jun 27, 2017, 8:30 am – 9:30 am (Magnolia): Advanced Data Visualization Techniques Using Oracle BI Cloud Service (BICS): A Cal State Pilot Case Study
  • Wednesday, Jun 28, 2017, 8:30 am – 9:30 am (Grand Oaks I): Moving The University of Pennsylvania Hyperion Planning On-Prem to the PBCS Cloud: A Journey of Consolidation, Migration, and Enhancements
  • Wednesday, Jun 28, 2017, 4:15 pm – 5:15 pm (Grand Oaks A/B): Diagnosis Close: How Vertex Pharma Uses Planning, Essbase, and OBIEE to Close the Books

We also want to invite you to play a fun game with us this year: KScope Session Quest! Simply stop by our Booth (#217) on Sunday or our first session on Monday and get your Performance Architects Session Quest card.  If you get stamped at each one of our five sessions at Kscope (or as many as you possibly can given your travel schedule), we will enter you into our drawing to win a $200 Amazon gift card! Simply email us a pic of your completed card to enter the drawing. We encourage you to take time to walk the exhibitor hall during breakfast/happy hours. This conference does a particularly good job of making sure most of the top consultants and software vendors are available to talk about the latest and greatest in this field in the exhibit area.

Monday night events are also a “must do” because they provide another good opportunity to mix and mingle with the folks in your functional/technical focus area in a fun, low-pressure environment. You will likely see us at the BI Texas-Style Trivia in Grand Oaks H and at the EPM Cloud Lounge in Grand Oaks G.

Lastly, Tuesday’s Lunch and Learn sessions are information-packed Q&A panels from 12:45 – 1:45 PM led by Oracle ACE Directors, ACEs, and ACE Associates.

Finally, Performance Architects is hosting a very special, invite-only friends and family dinner during ODTUG Kscope17. We are also happy to schedule some 1:1 time with you at the show if you have any particular questions.  For either item, please send an email to communications@performancearchitects.com and we’ll send you details.

Enjoy the show! We hope to see you there.

 

