Ah, that’s how the Canonical Data Model fits in!
Brian LeGros | July 18th, 2007 | programming
So over the last few months, a few other developers at work and I have been brainstorming approaches to implementing an SOA. Before I say anything more, realize that I haven’t read any books on best practices for building an SOA from scratch; I’m just getting into the area and still have mixed feelings about the approach. In any case, I was listening to good old Software Engineering Radio (SER) on the way home today (SOA Part 2a), and they answered a question that had been in the back of my mind for a while.
When the challenge of coordinating services came up, I was able to understand the purpose of an ESB and how helpful it could be as a tool. When the issue of communication and protocols came up, JMS seemed like a very natural and loosely coupled solution (I definitely liked the idea of messaging over RPC). With Max working on a PoC with Mule and ActiveMQ, I felt the need to explore the issue of a canonical data model and the challenges that came with using one.
From the SER episode I listened to today, I felt reassured that this portion of the SOA was definitely a difficult one to incorporate. What I was trying to get my head around was a way in which common data types could be exchanged between business processes and how the definition for those types could be shared. SER pointed out that there are a few aspects of the shared data model you have to consider:
- Business processes may require semantically different data attributes for similar data types (e.g., a Customer data type may require different information in a Data Indexing service than in an Order Processing service).
- Data types may require different views of their data attributes based on external factors such as quality of service (e.g., a call center service may only need name, phone number, and the last 4 digits of your SSN for performance reasons, but a Billing service would require a more complete view of the Customer regardless of performance).
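To make the "different views" point concrete, here is a rough Python sketch. It is only an illustration under my own assumptions, not something from the SER episode: the `Customer` fields, the two view tuples, and the `project` helper are all hypothetical names.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Customer:
    """Hypothetical canonical Customer; the field names are assumptions."""
    customer_id: int
    name: str
    phone: str
    ssn: str
    billing_address: str


# Each service declares which attributes it needs from the canonical type.
CALL_CENTER_VIEW = ("name", "phone", "ssn_last4")
BILLING_VIEW = ("customer_id", "name", "phone", "ssn", "billing_address")


def project(customer: Customer, view: tuple) -> dict:
    """Return only the attributes named in the view."""
    data = asdict(customer)
    data["ssn_last4"] = customer.ssn[-4:]  # derived attribute for restricted views
    return {field_name: data[field_name] for field_name in view}
```

The call center gets a small, cheap payload while billing gets the fuller record, yet both are projections of one shared definition.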
SER recommended approaching this issue from a modeling perspective and then utilizing versioning and something as ubiquitous as XML (via XSD, or maybe even OWL) to represent an additional data layer to be used by the SOA. Services can retain their own domain classes, which could be based on data structures defined in XML. SER suggested that the lack of a rigidly typed Java interface, for example, offers more resilience when it comes to versioning the canonical data model. They went as far as to suggest that versions of the data model could be marked as semantically similar or different. If a version of a data type was deemed semantically similar, then it may be possible for its consuming APIs to upgrade to the latest version without the need to refactor. I did worry that versioning could get out of hand and that in the end all we’d be doing is producing data types tightly coupled to business processes, but from what I’m hearing that may not be the case.
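One way to picture the "semantically similar or different" marking is a small version history that consumers can query before upgrading. This is a minimal sketch under my own assumptions; `SchemaVersion` and `can_upgrade_without_refactor` are hypothetical names, and a real setup would presumably keep this metadata alongside the XSD files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaVersion:
    """Hypothetical record describing one version of a canonical data type."""
    type_name: str
    version: int
    # True when this version is marked semantically similar to the previous one.
    similar_to_previous: bool


def can_upgrade_without_refactor(history, type_name, current, target):
    """True if every version after `current` up to `target` is marked similar."""
    steps = [
        v for v in history
        if v.type_name == type_name and current < v.version <= target
    ]
    # Require a complete chain of versions, all flagged as semantically similar.
    return len(steps) == target - current and all(
        v.similar_to_previous for v in steps
    )
```

A consumer on version 1 could move to version 2 untouched if version 2 is flagged similar, but a later version flagged as different would force a closer look.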
Additionally, they suggested that a canonical data model be evaluated for use within selected domains in the enterprise. In doing so, it may be easier for business processes that depend on a similar data type to share the same data type definition without needing an act of God to change that definition later down the road. This approach seems like a patchwork solution where a few groups agree on the same data type definitions rather than the entire enterprise, but I can’t think of a better approach from my limited exposure to SOA.
I realize I have lots to learn about this particular style of architecture, but SER is always a great listen. They definitely give me new perspectives on the topics I’m interested in. Keep up the great work, guys.
Tags: design-patterns, oop, soa
Discussion
July 20th, 2007 at 10:25 pm
Yep, that’s basically how I have come to understand it. Reading this makes me feel better, since I’m still getting my hands around this stuff too (obviously).
I like the idea of using Canonical Data Model with Content Enricher (quoting patterns from the Enterprise Integration Patterns book) in an event-driven system. The way I see this working, a system announces an event, and interested systems that have registered to listen for that event receive it. Before the “meat” of a listening system receives the event, a Content Enricher pulls additional data as necessary and populates it in the event message. After the additional data has been retrieved, the Content Enricher hands the message off to the underlying system.
So, for example, a Customer Management system announces an event: “customer #123 just updated their address.” A message is broadcast containing nothing except for the customer ID, and identifying the type of event (Address Update).
The Order Management system receives this event on a Message Channel, and a Content Enricher specific to that system unpacks the message and retrieves the customer ID. It then interacts with other systems (via synchronous or asynchronous means) to gather the data required to enrich the message before handing it off to the Order Management system’s service layer. So, maybe it reaches back to the Customer Management system and says “hey, I need the complete new address so I can update the shipping address on any pending orders for this customer.” When the data is returned (in canonical format), the Content Enricher fills in the blanks, hands off the enriched message to the Order Management system, and the customer’s address is updated on all pending orders before they ship.
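The flow just described can be sketched in a few lines of Python. This is a toy, in-process illustration under stated assumptions rather than how Mule or ActiveMQ would wire it up: the `Event`, `CustomerManagement`, `OrderManagement`, and `ContentEnricher` classes and the `"AddressUpdate"` event name are all hypothetical, and a direct method call stands in for the Message Channel.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Event:
    """Thin event message: only the event type and the customer ID."""
    event_type: str
    customer_id: int
    payload: dict = field(default_factory=dict)


class CustomerManagement:
    """Stand-in for the system of record; returns data in canonical form."""

    def __init__(self, addresses):
        self._addresses = addresses

    def get_address(self, customer_id):
        return self._addresses[customer_id]


class OrderManagement:
    """Service layer that only ever sees enriched messages."""

    def __init__(self, pending_orders):
        self.pending_orders = pending_orders

    def handle(self, event):
        if event.event_type != "AddressUpdate":
            return
        for order in self.pending_orders:
            if order["customer_id"] == event.customer_id:
                order["ship_to"] = event.payload["address"]


class ContentEnricher:
    """Fills in the data the target system needs before handing off."""

    def __init__(self, customers, target):
        self._customers = customers
        self._target = target

    def on_message(self, event):
        if event.event_type == "AddressUpdate":
            address = self._customers.get_address(event.customer_id)
            # Build a new, enriched message instead of mutating the original.
            event = replace(event, payload={**event.payload, "address": address})
        self._target.handle(event)
```

Feeding `Event("AddressUpdate", 123)` through the enricher updates the pending orders for customer 123 only, while the broadcast message itself stays as small as it started.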
I like this sequence of events, since it keeps the event messages small and the couplings loose. For example, if we agree that Customer events are only required to pass the ID in messages, then we can keep the messages small and not send extraneous data. Although additional round-trips would be required to fetch the rest of the data needed by each system receiving the Customer messages, each system would be in control of the data it was interested in. We’d obviously need a Canonical representation of the Customer object in order for the additional data to traverse the wire and be processed, but we could allow the requesting system to specify a message format version, or perhaps request data from a channel that has a sequence of transformations in place to format the data appropriately. Also, as a system scales and we see common use cases requiring similar data, we can refactor the canonical model to include more or less data as we see fit. Personally, I like the approach of starting with very little data, and incrementally adding to it over time as performance/convenience needs expose themselves.
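Letting the requesting system pick a message format version might look something like the sketch below. It is an assumption-laden illustration: the formatter functions, the `FORMATTERS` table, and `fetch_customer` are hypothetical names, and plain dicts stand in for whatever canonical XML would really cross the wire.

```python
def to_v1(customer: dict) -> dict:
    """Version 1: start with very little data, just the ID."""
    return {"id": customer["id"]}


def to_v2(customer: dict) -> dict:
    """Version 2: adds fields that common use cases turned out to need."""
    return {**to_v1(customer), "name": customer["name"], "address": customer["address"]}


# Hypothetical registry mapping a format version to its transformation.
FORMATTERS = {1: to_v1, 2: to_v2}


def fetch_customer(store: dict, customer_id: int, format_version: int) -> dict:
    """Return the customer in the message format the requester asked for."""
    try:
        formatter = FORMATTERS[format_version]
    except KeyError:
        raise ValueError(f"unsupported message format version: {format_version}")
    return formatter(store[customer_id])
```

Older consumers keep asking for version 1 and never see the extra fields, so the model can grow without forcing everyone to change at once.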
Ultimately, however you slice it, the couplings stay loose and you can make your app as chatty or chunky as you see fit. Plus, you can do this on a per-app basis with message format versioning. It’s a neat approach to system integration and I’m really interested in seeing it put into practice in some of our systems.
- max
July 21st, 2007 at 12:43 pm
@Max - Yeah I’m definitely excited too. I guess the issue I had trouble resolving in my head was how the data model would exist in the overall architecture. I couldn’t figure how we’d define our data model independent from the other systems but still have it consumed by them. SER was able to help me bridge the gap by explaining how XSD/OWL came into the picture to create the shared model and then use versions for semantic differences and views for “enriching” and “restricting”.
I think the use of Content Enrichers, Restrictors, and Normalizers will definitely help with the “view of the data” part of what I was talking about. I’m thinking UML is the easiest first attempt at defining the model in a way that will translate to XSD/OWL. If I can find time, I really hope I can find a use for OWL, since it’s so much more semantically meaningful and could eventually help us make more assumptions when it comes to code generation. I’m sure there is probably some work I could do to learn more about meta-modeling too, so lots of fun stuff to learn.
July 23rd, 2007 at 3:36 pm
With all your study on the semantic web, I’d like to see an OWL post soon if you can rustle one up. I’m completely unfamiliar, and too lazy to Google it, so demanding blog content from you is obviously the next step.
- max