| copyright | lastupdated | keywords | subcollection |
|---|---|---|---|
| | 2020-10-23 | getting started tutorial, Master Data Management, MDM, Cloud Pak for Data as a Service, IBM Cloud | mdm-oc |
{:shortdesc: .shortdesc} {:new_window: target="_blank"} {:codeblock: .codeblock} {:pre: .pre} {:screen: .screen} {:tip: .tip} {:note: .note} {:external: target="_blank" .external}
{: #getting-started}
{{site.data.keyword.mdm-oc_full}} (Beta) on IBM Cloud Pak for Data as a Service enables you to establish a single, trusted, 360-degree view of your customers — a digital twin. This getting started tutorial walks you through the steps of setting up and using {{site.data.keyword.mdm-oc_full}} to onboard, match, and explore your master data. {: shortdesc}
The {{site.data.keyword.mdm-oc_full}} Beta lite plan allows you to create one service instance per account and process up to one million records. Beta lite plan services are active for 60 days or until the Beta period is over, whichever comes first. Beta lite plan services will be deleted after 30 days of inactivity. {: note}
For more information, see About Master Data Management.
For detailed instructions and information about using the {{site.data.keyword.mdm-oc_full}} service, see Managing master data in the Cloud Pak for Data as a Service documentation application.
{: #prereqs}
- Go to dataplatform.cloud.ibm.com.
- Log in with your personal {{site.data.keyword.Bluemix_notm}} credentials or create an account.
- Create a {{site.data.keyword.mdm-oc_full}} service instance. For details, see Creating services in the Cloud Pak for Data as a Service documentation.
- Go to cloud.ibm.com.
- Log in with your personal {{site.data.keyword.Bluemix_notm}} credentials or create an account.
- From your dashboard, click Create resource, then choose Services from the left navigation pane.
- Select the Master Data Management tile to create a {{site.data.keyword.mdm-oc_full}} service instance.
The first step in setting up {{site.data.keyword.mdm-oc_full}} is to create your master data configuration asset. The configuration asset is where you will onboard data sources, map your data into the system, customize your data model, and set up and tune the matching algorithm.
- Go to dataplatform.cloud.ibm.com.
- Under My services, click View all to open the service instances associated with your account.
- Click your {{site.data.keyword.mdm-oc_full}} instance to open the Launch page.
- Click Launch to open the master data home page.
- Click Set up master data to create your configuration asset.
  You must have the correct administrator privileges to create and configure a configuration asset. {:note}
- Review the service instance name. Optionally, rename it to be more descriptive. Click Next.
- Select an existing Cloud Pak for Data as a Service project to use with this {{site.data.keyword.mdm-oc_full}} service instance or create a new one by clicking +. Click Next.
- Optionally, you can associate your {{site.data.keyword.mdm-oc_full}} instance with a catalog. Choose a catalog from your associated Watson Knowledge Catalog instance, or create a new one by clicking +. If there is no associated Watson Knowledge Catalog service, you can create one.
- Click Finish.
You've now created your master data configuration asset. Let's get it set up and match some data!
In this step, we'll add a flat data file in CSV or TSV format. If you already have a data file containing customer records, you can use that.
If you don't have a data file ready to go but want to get started using {{site.data.keyword.mdm-oc_full}}, you can skip this step and load the provided sample data and model instead. From the master data home page, go to the Master data tile, then click Publish sample model. After the model loads, click Publish sample data. {: note}
- From the master data home page, click Configuration to open the Data setup screen. Click Start with data assets.
- Click Add data or the Find and add data icon in the action bar at the top of the screen.
- From the Data panel that opens, choose whether to add data by upload, from the project, or from the catalog. For this tutorial, choose Load to upload a data file.
- On your local machine, select a flat data file in CSV or TSV format and drag it into the Data panel. When the file finishes uploading, it is added to your assets summary list.
- Review the details of your newly added asset. If your asset does not have any information in the Asset record type column, you must define the record type.
- Select your asset in the assets summary list.
- Click Assign record type and select the correct record type from the list. If the appropriate record type is not in the list, then you might have to customize your data model.
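
Before uploading, it can help to confirm locally that the file really is a well-formed CSV or TSV file. The following is a minimal sketch, not part of the service: the function name `check_flat_file` and the report keys are hypothetical, the delimiter guess (a tab in the header row means TSV) is an assumption, and the one million record limit is the Beta lite plan limit stated earlier in this tutorial.

```python
import csv
import io

MAX_RECORDS = 1_000_000  # Beta lite plan limit stated in this tutorial


def check_flat_file(text):
    """Report basic facts about CSV or TSV text before it is uploaded."""
    lines = text.splitlines()
    if not lines:
        raise ValueError("file is empty")
    # Assumption: a tab in the header row means TSV, otherwise CSV.
    delimiter = "\t" if "\t" in lines[0] else ","
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    header = next(reader)
    record_count = 0
    ragged_records = []  # 1-based positions of records with a wrong column count
    for row in reader:
        if not row:  # skip blank lines
            continue
        record_count += 1
        if len(row) != len(header):
            ragged_records.append(record_count)
    return {
        "delimiter": delimiter,
        "columns": header,
        "record_count": record_count,
        "ragged_records": ragged_records,
        "within_plan_limit": record_count <= MAX_RECORDS,
    }


# Example: the second record is missing its "city" value.
sample = "name,city\nAda Lovelace,London\nAlan Turing\n"
report = check_flat_file(sample)
print(report)
```

To check a real file, read it with `open(path, newline="", encoding="utf-8").read()` and pass the text to the function. A non-empty `ragged_records` list usually points to a quoting or delimiter problem that is easier to fix before the upload than after mapping.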
When you onboard your first data asset, {{site.data.keyword.mdm-oc_full}} automatically generates the data model using a combination of industry-standard model attributes and embedded Watson technology. When you upload additional data, the model adjusts itself to accommodate newly populated attributes and fields. You can always customize the model to match your organization's requirements by adding new record types, attributes, and fields.
- On the Data setup screen, click the Modeling tab.
- Review the current model's record types and attribute types.
- From here, you can:
- View or edit existing record types or create new ones. By default, the data model includes definitions for Person and Organization record types.
- View or edit existing attribute types or create new ones. You can add or remove fields in each attribute type to reflect your organization's data model requirements.
- When you are done, click the publish model icon in the action bar at the top of the screen.
Each data source or asset must be mapped and loaded into the data model before it can be used in MDM functions such as matching. {{site.data.keyword.mdm-oc_full}} includes a powerful automapping capability that removes the need for data engineers to manually map each column of data into the model. The automapping feature detects, analyzes, and categorizes each column of data to the corresponding attributes or fields in the data model. Before you can run automapping, you must profile your data.
- On the Data setup screen, click the Mapping tab.
- From the Asset list in the left panel, select the data source that you want to map into the system. The data from the file is displayed in tabular format. Each column represents an attribute that must be mapped to a corresponding attribute type in the data model. When you first open a data source or asset, each column is marked with a Not Mapped tag.
  You can manually map each column if you choose, but you can greatly speed up the mapping process by taking advantage of the automapping feature. {: tip}
- To enable automapping for this asset, you must first profile the data. Click Profile. Profiling analyzes and classifies your data so that automapping can take place. Profiling can take some time to complete, so it runs in the background to allow you to continue working. You might want to start reviewing and manually mapping some columns.
  Automapping will never overwrite any manual mapping that you have done. {: note}
- When profiling completes, click Auto map. {{site.data.keyword.mdm-oc_full}} analyzes your data and automatically maps as many columns as possible into the data model. Even if it cannot map a given column, the automap function can suggest some of the most likely mapping selections.
- Review the automapping. If any of the mappings are incorrect, or if a column remains unmapped, then manually map it correctly. Alternatively, if a given column is not required, you can exclude it from your MDM data load.
- To manually map a column, select it, then use the Mapping targets panel on the right to search for and select the appropriate attribute or field from the data model. Click Map and save to data model.
  Scroll right and left through the columns to ensure that every column in your data source is mapped. {: tip}
- When you've finished mapping the data source, you're ready to publish the data into the system.
- Return to the data setup Overview page by clicking the Data setup page title and selecting Overview from the list.
- On the Overview page, confirm that you have at least one data source or asset added and mapped.
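
The mapping rules described in the steps above (manual mappings are never overwritten by automapping, excluded columns are left out of the load, and anything else still needs attention) can be illustrated with a small sketch. This is only a conceptual model, not the service's implementation: the function `merge_mappings`, the column names, and the attribute names such as `legal_name.full_name` are all hypothetical.

```python
def merge_mappings(columns, manual, suggested, excluded=frozenset()):
    """Combine manual and automapped targets; manual entries always win.

    Returns the final column-to-attribute mapping and the list of columns
    that are neither mapped nor excluded.
    """
    final = {}
    unmapped = []
    for column in columns:
        if column in excluded:
            continue  # excluded columns are not part of the data load
        if column in manual:
            final[column] = manual[column]  # never overwritten by automapping
        elif column in suggested:
            final[column] = suggested[column]
        else:
            unmapped.append(column)
    return final, unmapped


# Hypothetical columns and data model attributes, for illustration only.
columns = ["full_name", "dob", "phone", "internal_id"]
manual = {"full_name": "legal_name.full_name"}
suggested = {"full_name": "legal_name.last_name", "dob": "birth_date.value"}
excluded = {"internal_id"}

final, unmapped = merge_mappings(columns, manual, suggested, excluded)
print(final)
print(unmapped)
```

In this example, `full_name` keeps its manual target even though automapping suggested a different one, and `phone` is reported as unmapped, which corresponds to a column that still shows the Not Mapped tag.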
When your data is mapped and published into the {{site.data.keyword.mdm-oc_full}} service, you can run the matching process on the data. The matching process analyzes your data to identify duplicate records. Suspected duplicate records are merged into master data entities to establish a single, trusted, 360-degree view of your customers. Each entity contains one or more records.
Before running matching, ensure that you have published your data model and sources to the {{site.data.keyword.mdm-oc_full}} system. {: tip}
- On the Data setup Overview screen, click the Overview page title and select Matching setup.
- Go to the Match settings tab to select the attributes to use in matching data. The first time you navigate to this tab, {{site.data.keyword.mdm-oc_full}} will automatically generate some suggested attributes from your data model to use in matching.
- Review the list of matching attributes. These attributes will be used as the basis of comparison to match records and create master data entities. To add or remove attributes from the list, click Select attributes then select or deselect attributes as needed.
- When you are satisfied with your matching attributes, click the run matching icon in the action bar. The matching process will take a while to complete. It will run in the background so that you can continue working. You'll be notified when it's complete.
- When matching is complete, go to the Match results tab to see a dashboard of statistics and visualizations to provide insight about your master data.
You can adjust your matching algorithm at any time by editing your matching attributes. {: tip}
As you add more data sources and assets to your {{site.data.keyword.mdm-oc_full}} system and rerun matching, the new data will be matched both within itself and against the existing data in the system. In this way, you can build a unified, single, 360-degree view of your customers across your entire enterprise.
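
To make the idea of records, matching attributes, and entities concrete, here is a deliberately simplified sketch. It is not the matching algorithm used by the service, which this tutorial does not describe. It assumes exact matching on normalized attribute values, and the function names, record IDs, and attribute names are hypothetical.

```python
from collections import defaultdict


def normalize(value):
    """Lowercase a value and collapse repeated whitespace."""
    return " ".join(str(value).lower().split())


def build_entities(records, matching_attributes):
    """Group record IDs into entities that share the same normalized values.

    Toy stand-in for matching: two records land in the same entity only if
    every matching attribute is identical after normalization.
    """
    entities = defaultdict(list)
    for record in records:
        key = tuple(normalize(record.get(attr, "")) for attr in matching_attributes)
        entities[key].append(record["record_id"])
    return list(entities.values())


existing = [
    {"record_id": "A1", "name": "Ada Lovelace", "email": "ada@example.com"},
    {"record_id": "B7", "name": "ada  lovelace", "email": "ADA@example.com"},
    {"record_id": "C3", "name": "Alan Turing", "email": "alan@example.com"},
]
entities = build_entities(existing, ["name", "email"])
print(entities)

# Adding a new source and rerunning matches new records against existing ones.
new_source = [
    {"record_id": "D9", "name": "ALAN TURING", "email": "alan@example.com"},
]
entities_after = build_entities(existing + new_source, ["name", "email"])
print(entities_after)
```

Changing the list of matching attributes changes which records are grouped together, which mirrors the effect of editing your matching attributes and rerunning matching.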
After a data engineer has configured the {{site.data.keyword.mdm-oc_full}} service, loaded and mapped data, and run matching, a business analyst or data steward user can explore the master data to search, view, and analyze it.
- From the master data home page, click Search master data to open the master data explorer.
- Search within your data to find data to explore. You can choose whether to search for entities or records, and you can either run a simple text search or an advanced search using rules.
- From your search results list, you can:
- Click a row to see details of the entity or record.
- Use the row's three-dot menu or Explore icon to select an entity or record for further exploration in the Explore tab. When you send an entity or record to the Explore tab, you can more closely review its details and compare it to any other entities or records in the Explore tab.
- Choose the Explore tab to review and compare the details of any entities or records that you selected for exploration.
- Select any of the entities or records in the Entity explorer panel to view their detailed attributes.
From the master data explorer, you can also:
- Export data.
- Add individual records.
- Edit individual records.
For detailed instructions and information about setting up and using the {{site.data.keyword.mdm-oc_full}} service, see Managing master data in the Cloud Pak for Data as a Service documentation application.
For information about the {{site.data.keyword.mdm-oc_full}} API, see the {{site.data.keyword.mdm-oc_full}} API reference documentation.
For information about Cloud Pak for Data as a Service, see the Cloud Pak for Data as a Service documentation.