28 May Data and Modeling
Data modeling is essential in the early stages of any information system design. By moving toward a data-centered model, we can make our data, and our system, smarter. There are many data-modeling techniques, but we will focus on two for now:
- Entity Relationship Diagramming (ERD), and
- Object Role Modeling or object relational modeling (ORM).
Entity relationship diagramming is used for flat and relational database models (see Section 9 for the distinction). The example at left shows three tables: Customers, Orders and Details. Each table has a relationship with one other table based on the information in one of the fields. The field that connects customers to orders is Cust_ID, which is an identification number that uniquely identifies each customer. The field that connects orders to details is Order_ID, which is an identification number applied to each order to uniquely identify it. By establishing these relationships, we can describe how things happen in the real world to a computer database system in a way that will help maintain data integrity.
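The three-table relationship described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: Cust_ID and Order_ID come from the text, while the remaining column names (Name, Order_Date, Detail_ID, Item, Quantity) and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Cust_ID links Customers to Orders; Order_ID links Orders to Details.
# All other column names are assumptions made for this sketch.
conn.executescript("""
CREATE TABLE Customers (
    Cust_ID    INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL
);
CREATE TABLE Orders (
    Order_ID   INTEGER PRIMARY KEY,
    Cust_ID    INTEGER NOT NULL REFERENCES Customers(Cust_ID),
    Order_Date TEXT
);
CREATE TABLE Details (
    Detail_ID  INTEGER PRIMARY KEY,
    Order_ID   INTEGER NOT NULL REFERENCES Orders(Order_ID),
    Item       TEXT,
    Quantity   INTEGER
);
""")

conn.execute("INSERT INTO Customers VALUES (1, 'Pat Example')")
conn.execute("INSERT INTO Orders VALUES (100, 1, '2012-05-28')")
conn.execute("INSERT INTO Details VALUES (1000, 100, 'Widget', 3)")


def insert_order(order_id, cust_id, order_date):
    """Try to add an order; return False if the database rejects it."""
    try:
        conn.execute(
            "INSERT INTO Orders VALUES (?, ?, ?)",
            (order_id, cust_id, order_date),
        )
        return True
    except sqlite3.IntegrityError:
        return False


# An order for a customer that does not exist is refused by the key relationship.
orphan_accepted = insert_order(101, 999, "2012-05-29")

# Following the keys reassembles the real-world picture: who ordered what.
rows = conn.execute("""
    SELECT c.Name, o.Order_ID, d.Item, d.Quantity
    FROM Customers c
    JOIN Orders  o ON o.Cust_ID  = c.Cust_ID
    JOIN Details d ON d.Order_ID = o.Order_ID
""").fetchall()
```

The rejected insert is the data-integrity point in miniature: the database, not the application, refuses an order that points at no customer.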
|Understanding Context Cross-Reference|
|modeling information system||Hoberman 2008 Holowczak|
|knowledge words context||Fiammante 2010|
|ERD ORM normalization||Roshen 2009 Terry Halpin|
Data “normalization” is the process of reducing redundant storage and making updates cleaner by ensuring that repeating groups are stored in separate tables and that the same data is not stored in many different places. In the illustration above, even though invoices are associated with customers, you don’t store them in the “Customer” table. If you did, the customer table would need to reserve space for an arbitrarily large number of invoices. Likewise, the customer data is not stored in the invoices table. If it were, you would have to duplicate the customer’s name, address, e-mail, etc., with every invoice. Then, when the customer changes her e-mail, which is a real possibility, you would have to change it on every invoice that still carries the old address. Normalization avoids this by storing each fact in one place and linking to it by key (see Holowczak’s tutorial).
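The e-mail update problem can be made concrete with a small Python sketch. The record layouts, field names and sample values here are hypothetical, chosen only to contrast the two designs.

```python
# Denormalized design: the customer's e-mail is copied onto every invoice.
denormalized_invoices = [
    {"invoice_id": 1, "cust_id": 7, "email": "old@example.com", "total": 40.0},
    {"invoice_id": 2, "cust_id": 7, "email": "old@example.com", "total": 15.5},
    {"invoice_id": 3, "cust_id": 7, "email": "old@example.com", "total": 99.0},
]


def change_email_denormalized(invoices, cust_id, new_email):
    """Update every copy of the e-mail; return how many records were touched."""
    touched = 0
    for invoice in invoices:
        if invoice["cust_id"] == cust_id:
            invoice["email"] = new_email
            touched += 1
    return touched


# Normalized design: the e-mail lives once, in the customer record,
# and each invoice only carries the customer's key.
customers = {7: {"name": "Pat Example", "email": "old@example.com"}}
invoices = [
    {"invoice_id": 1, "cust_id": 7, "total": 40.0},
    {"invoice_id": 2, "cust_id": 7, "total": 15.5},
    {"invoice_id": 3, "cust_id": 7, "total": 99.0},
]


def change_email_normalized(customer_table, cust_id, new_email):
    """Update the single stored e-mail; exactly one record changes."""
    customer_table[cust_id]["email"] = new_email
    return 1


def invoice_email(invoice):
    """Look up the e-mail for an invoice through its customer key."""
    return customers[invoice["cust_id"]]["email"]


touched_denormalized = change_email_denormalized(
    denormalized_invoices, 7, "new@example.com"
)
touched_normalized = change_email_normalized(customers, 7, "new@example.com")
```

In the denormalized layout the number of writes grows with the number of invoices, and any copy that is missed becomes an inconsistency. In the normalized layout there is one write, and every invoice sees the new address through its key.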
ORM goes a step beyond ERD: it explicitly describes how each element relates to its immediate neighbors to fit into both the information scheme and the process model. ERD has a seldom-used notation for the nature of the relationship, but it is not as rich as ORM. While ERD is useful for databases, it does not provide enough information about the data to be useful in knowledge-based system design. ORM is a more flexible tool for data modeling and knowledge representation.
Here is a more complete example ERD for orders:
You may notice that this data model says nothing explicitly about the customer. Customer information may be in a set of tables associated with “Order Relationships” or tied to the Order through the “Profile_ID”. In any case, once you adjust for the underscores and abbreviations, you have a pretty good idea about what content is likely to be in each of these tables. The ERD does not tell you how the tables are interrelated – just that they are interrelated with specific keyed relationships that define the hierarchy.
Here is a small segment of an example ORM for orders:
- Customer (ID) places orders
- Customer has name
- Customer lives at address
- Order (ID) consists of Details
- Order is placed on Order Date
- Detail (ID) has one Item
- Item has a Quantity of 1 or more
- Item price is Unit Cost
- Detail Line Price is the cost times the quantity
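One way to see what these named relationships add is to write the facts above down as data. The following Python sketch is an informal illustration, not formal ORM notation: the Fact structure, the identifiers (C1, O1, D1, I1) and the sample values are all assumptions made for the example.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """One elementary fact: a subject plays a named role with a value."""
    subject: str
    role: str
    value: object


# Each line of the ORM segment becomes one fact with hypothetical sample data.
facts = [
    Fact("Customer C1", "places", "Order O1"),
    Fact("Customer C1", "has name", "Pat Example"),
    Fact("Customer C1", "lives at", "1 Sample Street"),
    Fact("Order O1", "consists of", "Detail D1"),
    Fact("Order O1", "is placed on", "2012-05-28"),
    Fact("Detail D1", "has item", "Item I1"),
    Fact("Item I1", "has quantity", 3),
    Fact("Item I1", "has unit cost", 2.50),
]


def lookup(subject, role):
    """Return every value that plays `role` for `subject`."""
    return [f.value for f in facts if f.subject == subject and f.role == role]


def line_price(detail):
    """Derived fact: Detail Line Price is the cost times the quantity."""
    (item,) = lookup(detail, "has item")        # a Detail has one Item
    (quantity,) = lookup(item, "has quantity")
    (unit_cost,) = lookup(item, "has unit cost")
    if quantity < 1:
        # Constraint from the model: Item has a Quantity of 1 or more.
        raise ValueError("quantity must be 1 or more")
    return unit_cost * quantity


price_d1 = line_price("Detail D1")  # 2.50 * 3
```

Because each relationship carries its own name, a question such as "what does Customer C1 place?" can be answered directly from the facts, and the derived line price and the quantity constraint sit alongside the data they govern.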
Both ERD and ORM are important processes in developing a canonical model. ERD helps you understand the taxonomy of objects in which parent, child and sibling relationships are defined through primary and foreign keys that express arity. Arity in a parent-child relationship is normally one-to-many (or one-to-one in some cases). Many-to-many relationships normally indicate siblings. ERDs can show this fairly well when used in combination with the table names, provided the names are expressive of the content. ORM requires expressiveness both in naming the data elements and in naming their relationships. This additional expressiveness increases the analyst’s ability to show how the data can become actionable knowledge in the real world.
Canonical Model Development
Building canonical models is a critical first step in software solution development. Understanding the data and modeling it correctly can be the difference between success and failure in a project. A canonical model is, from a business architecture perspective, an agreed-upon organization of business concepts that define the meaning of the information processed in the core business systems. The organization may be represented in a simple framework such as ERD or ORM, or a more advanced framework such as RDF or Ontology.
With a canonical model at the center of an information architecture, data exchanges between disparate systems can be made clean and reliable, even when one or more of the systems undergoes change, such as a manufacturer’s upgrade. Steve Hoberman suggests that:
“The canonical data model is the definition of a standard organization view of a particular subject, plus the mapping back to each application view of this same subject. The standard organization view is built traditionally using simple yet useful structures. Employee and Contractor, for example, might be represented as Person Role; Order and Credit as Event; Warehouse and Distribution Point as Site. The canonical data model is frequently implemented as an XML hierarchy. Specific uses include delivering enterprise-wide business intelligence (BI), defining a common view within a service-oriented architecture (SOA) and streamlining software interfaces” (Information Management, July 23, 2008).
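The idea of a standard view plus a mapping back to each application view can be sketched in Python. Only the Employee, Contractor and Person Role names come from the quotation; the two application record layouts, their field names and the ID prefixes below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonRole:
    """Canonical view shared by all systems (field names are assumptions)."""
    person_id: str
    full_name: str
    role: str


def from_hr_employee(record: dict) -> PersonRole:
    """Map a hypothetical HR-system employee record to the canonical view."""
    return PersonRole(
        person_id=f"hr-{record['emp_no']}",
        full_name=f"{record['first']} {record['last']}",
        role="Employee",
    )


def from_vendor_contractor(record: dict) -> PersonRole:
    """Map a hypothetical vendor-system contractor record to the canonical view."""
    return PersonRole(
        person_id=f"vendor-{record['contractor_id']}",
        full_name=record["name"],
        role="Contractor",
    )


def to_vendor_contractor(person: PersonRole) -> dict:
    """Map the canonical view back to the vendor system's own layout."""
    return {
        "contractor_id": person.person_id.removeprefix("vendor-"),
        "name": person.full_name,
    }


employee = from_hr_employee({"emp_no": "1042", "first": "Pat", "last": "Example"})
contractor = from_vendor_contractor({"contractor_id": "77", "name": "Lee Sample"})
round_trip = to_vendor_contractor(contractor)
```

Each application needs only its own pair of mappings to and from the canonical form, so a change in one system's layout touches one mapping rather than every interface that exchanges data with it.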
With a strong canonical data model, not only is data exchange between systems much more straightforward and clean, but it is also easier to use automated tools, such as business process modeling (BPM) programs, to automate tasks that involve data from multiple sources and processes. In fact, a complete and robust service-oriented architecture can be built around medium to small grained services, BPM and a canonical data model. Though some may argue this point with me, the same logic used to perform database normalization is useful in developing canonical models.
In a knowledge-based world where the words you use drive the apps, data and processes, a contextually aware canonical model of everything you care about will eliminate the current legacy. Today’s systems are based on a principle of “garbage-in, garbage-out”, in which users are required to read the minds of the system designers and act in lock-step with their ideas of the domain. This principle needs to be replaced with “This device will do the best it can with whatever you’re asking for, based on whatever words you use.” The age of knowledge will make this possible, and contextual canonical data models will be at the center of the knowledge revolution.