Semantics is Back
Fred Mikkelsen's Blog
July 26, 2007 9:11 PM
I recently read Bill Roth's JDJ story on semantics. It is not a new idea, but it is big again. I feel that semantics has been under-utilized in computer programming, primarily because semantics has little meaning without the context of intent. Intent is often hard to capture and often non-algorithmic, so a fully automated, semantically driven operation will probably fail at some point: the unstructured brings the unexpected.
Data transformation is a task that has slowed down systems integration for as long as there has been systems integration. If you're transforming from a record that has a field labeled first_name to a record with a field named firstName, the transformation system could map that for you automatically. As of a few years ago, the common tools I used would not do that, and wouldn't even map "firstName" to "firstName". By whittling away the obvious choices, then finding near matches and possible matches, it seemed to me that a fair mapping of master-detail to master-detail could be done relatively quickly, especially if sample data were available in addition to the record schemas and descriptors. The basic premise of a transformation system is that the contents of the records are close and can be mapped. For a development tool, or a plug-in connector for a web component in a rapid development environment, that seems quite valuable.
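To make the idea concrete, here is a minimal sketch of the "obvious choices first, near matches second" approach. It is not taken from any tool I used; the class name FieldMatcher and its methods are made up for illustration, and the only "near match" rule it knows is ignoring case and punctuation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not part of any real integration tool.
class FieldMatcher {

    // Reduce a field name to a comparison key: lowercase, letters and digits only.
    // "first_name", "firstName" and "First-Name" all become "firstname".
    static String normalize(String fieldName) {
        return fieldName.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
    }

    // Suggest a source-to-target field mapping. Identical names win first,
    // then names whose normalized keys agree. Source fields with no match
    // are left out of the result so a person can review them.
    static Map<String, String> suggest(List<String> source, List<String> target) {
        Map<String, String> targetByKey = new LinkedHashMap<>();
        for (String t : target) {
            targetByKey.putIfAbsent(normalize(t), t);
        }
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String s : source) {
            if (target.contains(s)) {
                mapping.put(s, s);                       // the obvious choice
            } else if (targetByKey.containsKey(normalize(s))) {
                mapping.put(s, targetByKey.get(normalize(s))); // a near match
            }
        }
        return mapping;
    }
}
```

A real tool would add more passes after these two (synonyms, sample-data comparison), but even this much would have handled the first_name to firstName case.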
From a semantic standpoint, our culture already recognizes (123)234-3343 as a phone number and 987-65-4321 as an SSN. 123.234.3343 may be a phone number, though 123.234.33.43 is probably an IP address. Real estate addresses are quite uniform and are routinely parsed by mapping programs. The rules for the routine analysis of business and network data are quite good.
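Those rules are easy to write down. The sketch below classifies a single value using the four examples above; the class name ValueClassifier and the exact patterns are my own assumptions, and they cover only US-style formats.

```java
import java.util.regex.Pattern;

// Hypothetical classifier for illustration; patterns assume US-style formats.
class ValueClassifier {

    enum Kind { PHONE, SSN, IPV4, UNKNOWN }

    // (123)234-3343, or three-three-four digits separated by dots or dashes
    private static final Pattern PHONE =
        Pattern.compile("\\(\\d{3}\\) ?\\d{3}-\\d{4}|\\d{3}[.-]\\d{3}[.-]\\d{4}");
    // 987-65-4321
    private static final Pattern SSN = Pattern.compile("\\d{3}-\\d{2}-\\d{4}");
    // Four dot-separated groups of one to three digits
    private static final Pattern IPV4 = Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

    static Kind classify(String text) {
        String s = text.trim();
        if (SSN.matcher(s).matches()) return Kind.SSN;
        if (PHONE.matcher(s).matches()) return Kind.PHONE;
        if (IPV4.matcher(s).matches() && octetsInRange(s)) return Kind.IPV4;
        return Kind.UNKNOWN;
    }

    // An IPv4 address needs every group to be 255 or less.
    private static boolean octetsInRange(String s) {
        for (String part : s.split("\\.")) {
            if (Integer.parseInt(part) > 255) return false;
        }
        return true;
    }
}
```

The answer is only a best guess about meaning, which is the point: 123.234.3343 comes back as a phone number because of its shape, not because anyone said so.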
My Semantic Experience
I had an opportunity to employ semantic analysis as a fifth-level support engineer for a wireless telecom provider. My job was to figure out how to solve tickets. The "long pole" in solving problems turned out to be just figuring out what was broken. Problems arrived as trouble tickets, emails, spreadsheets, or XML files with hexadecimal-encoded values, sometimes supplemented with notes on things already tried. Getting down to brass tacks, though, the phone, the billing account, and the provisioning system were usually all that was needed. It was either right or wrong in each of those systems, and no amount of trouble ticket commentary would change that.
One failed order may have 100 phones on it. A report from an enterprise customer may have a spreadsheet with 200 phone numbers on it. With a simple Java application I wrote, I would copy and paste the text of whatever had been presented to me into one window. Without my pressing a button, the code would gather all the phone numbers, trouble ticket numbers, process IDs, GSM IDs, and other information that could be lexically interpreted. The output was a query to the ticketing system to retrieve those problems that were still outstanding. If the phone number, trouble ticket ID, GSM ID, or one of a couple of other values led to a ticket that was assigned to me, I'd fix it.
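The gathering step looked roughly like the sketch below. This is a reconstruction for illustration, not the original code: the class name IdentifierExtractor is made up, and the "TT-" ticket format is an assumption standing in for whatever the real ticket numbers looked like. Building the query is left out because it depended on the ticketing system.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reconstruction for illustration; not the original tool.
class IdentifierExtractor {

    // Ten-digit phone numbers written as (123)234-3343, 123-234-3343,
    // 123.234.3343 or 1232343343, not embedded in a longer run of digits.
    private static final Pattern PHONE = Pattern.compile(
        "(?<!\\d)\\(?(\\d{3})\\)?[ .-]?(\\d{3})[ .-]?(\\d{4})(?!\\d)");

    // ASSUMPTION: ticket IDs look like "TT-" followed by digits.
    private static final Pattern TICKET = Pattern.compile("\\bTT-\\d+\\b");

    // Return each distinct phone number once, normalized to ten bare digits,
    // in the order it first appears in the pasted text.
    static Set<String> phones(String pasted) {
        Set<String> found = new LinkedHashSet<>();
        Matcher m = PHONE.matcher(pasted);
        while (m.find()) {
            found.add(m.group(1) + m.group(2) + m.group(3));
        }
        return found;
    }

    // Return each distinct ticket ID once, in order of first appearance.
    static Set<String> tickets(String pasted) {
        Set<String> found = new LinkedHashSet<>();
        Matcher m = TICKET.matcher(pasted);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }
}
```

Normalizing before storing is what made the duplicates collapse: the same phone pasted from an email and from a spreadsheet ends up as one entry.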
It turned out that you could identify large numbers of problems that could be fixed just by retrying the operation that failed. I extended the tool to pick those out and produce the retry script. The off-shore team eagerly grabbed this tool because they could solve 100 tickets in the time it would take to solve 4. Their support numbers greatly improved overnight.
Web 2.0 and Semantics
Very exciting in the Web 2.0 world is the ability to inject semantics. We've all seen smart tags, and sites like Digg and del.icio.us have added tagging. A tag keyword is, essentially, a semantic label for what the page means to someone.
Tools like the one above could be built onto an existing system, and the architecture committee could regulate, through governance, which APIs they access. The back-end system integration can be pure and written with the goal of flawless operation. Engineering itself cannot take the time to identify all possible errors and sort the trivially solved ones from the difficult ones. They'd never get anything done.
The collaborative effort of solving and fixing problems can do that better on the Web 2.0 side. The problems to solve do change with each update of the system. The legacy of workarounds for every trivial production problem ever encountered does not belong in the master system design.