
Death from a thousand cuts


Hussein Badakhchani's Blog | July 28, 2006   7:36 AM | Comments (11)


Your log files contain thousands of errors and warnings, but when you confront the application teams they insist you ignore the messages because everything "works". The next day your application deployments hang and servers fail to start. After countless retries your domain splutters into life. If this sounds familiar, make no mistake: your domain is suffering a death from a thousand cuts.

It's a common experience for BEA professional services staff to arrive onsite for an audit, a health check or perhaps to assist with a production issue, and to find thousands of errors and warnings as soon as they gain access to the log files.

After a quick tour of the client's bug tracker they won't be surprised to find only one or two of the errors they saw in the logs. These errors are in a fixed state and the comment attached instructs the reader to "Ignore this error".

So what are these errors and warnings that you can "ignore"? Well, the ones I normally spot come from malformed or poorly written deployment descriptors. Often you find old versions of the WLS DTDs being used, along with undefined role mappings and transaction attributes. The most common application errors I see are NotSerializableExceptions.

Deployment descriptor warnings can multiply like a virus. In my experience many developers do not consider DDs genuine development artefacts. They simply want to be able to deploy their miserable applications from Eclipse and get on with writing shoddy code. The DDs are shared amongst themselves and reused in new projects. Outsourced code seems to be particularly infectious, probably because quality controls focus on functional requirements rather than non-functional or operational requirements, and the people responsible for QA don't know anything about DDs anyway. The upshot of all this is that within a few months multiple WLS domains will become infected. Application errors tend to be less infectious but more virulent. An infected server can have thousands of repeating errors in its logs, which can render the logs useless.

All these warnings and errors have a detrimental effect on the performance of the domain. Poorly written deployment descriptors can cause servers to hang during deployment or start-up, open security holes and add to the start-up time of your servers. Application exceptions that fill log files make the task of debugging a lot harder, generate needless IO and consume heap memory, possibly causing leaks. The sum of all these errors can have a crippling effect on a WLS domain.

Many of these problems can be dealt with before they leave the developer's machine. The DD warnings and errors often slip past quality gates as a result of using the deprecated ejbc and jspc tools in your build process. The appc tool is far stricter in its validation criteria and will help enforce your coding standards, assuming, of course, you have them.
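As a rough illustration of what such a quality gate could look like, here is a minimal Python sketch that runs appc against an archive and reports failure to the build. The weblogic.appc class name is the tool's real entry point; everything else (the function names, the archive and weblogic.jar paths, and the assumption that appc signals a validation failure through a non-zero exit status) is hypothetical and should be checked against your own WLS version.

```python
import subprocess


def appc_command(archive, weblogic_jar):
    """Build the command line that runs appc against one archive.

    archive      -- path to the EAR/WAR/JAR to validate (hypothetical path)
    weblogic_jar -- path to weblogic.jar in your WLS install (assumption)
    """
    return ["java", "-cp", weblogic_jar, "weblogic.appc", archive]


def validate(archive, weblogic_jar, run=subprocess.run):
    """Run appc and return (ok, combined_output).

    Assumes a non-zero exit status means appc rejected the archive.
    The output is returned as well so the build can print or scan it,
    because warnings may not change the exit status.
    """
    result = run(appc_command(archive, weblogic_jar),
                 capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr
```

A build script would call validate() for each archive it produces and stop the build when ok is False, printing the captured output so the offending descriptor can be fixed on the developer's machine.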

Application errors should never be considered benign simply because under very low load (one developer on his Windows machine) they are perceived to have no impact on functionality. You can use lots of different measures to assess the ability of your development team; my suggestion is to take a look at the server log file. Make sure the log level hasn't been turned down first, as this "fix" is more common than you might expect!
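For a quick first look at how noisy a server log is, something along these lines may help. It is only a sketch: it assumes each log line carries a severity word followed later by a BEA-nnnnnn message code, as in the BEA-002919 warning quoted in the comments below, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape: <timestamp> <Severity> <Subsystem> <BEA-nnnnnn> <text>.
# The angle brackets are optional, so stripped lines match as well.
# Only severities of Warning and above are counted.
LINE_RE = re.compile(
    r"\b(?P<severity>Warning|Error|Critical|Alert|Emergency)\b"
    r".*?\b(?P<code>BEA-\d{6})\b"
)


def tally(lines):
    """Count log lines per (severity, message code) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[(match.group("severity"), match.group("code"))] += 1
    return counts


def report(counts, top=10):
    """Format the most frequent pairs, noisiest first."""
    return [f"{n:6d}  {severity:<9} {code}"
            for (severity, code), n in counts.most_common(top)]
```

Passing an open log file to tally() and printing the lines from report() gives a ranked list of the messages that repeat most often, which is usually a better starting point than the bug tracker.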

If you are restarting clusters or servers in your domain more often than you should be to meet your 24/7 SLA, the explanation is probably in your log files. Don't ignore them!


Comments


  • I couldn't agree more with what you are saying. I recently came across an application which was deployed with malformed DD warnings. I think it is also common to see log entries from an obsolete MDB trying to connect to a non-existent JMS destination for months and months!!!

    Posted by: patel_jayesh_j on July 28, 2006 at 2:36 PM

  • I have been installing WLS 9.2 recently and I think your observation regarding the MDB and missing JMS Destination is an "out of the box feature" of running the sample wl_server domain:

    Jul 28, 2006 11:52:45 PM BST Warning WorkManager BEA-002919 Unable to find a WorkManager with name weblogic.wsee.mdb.DispatchPolicy. Dispatch policy weblogic.wsee.mdb.DispatchPolicy will map to the default WorkManager for the application bea_wls9_async_response

    How can we expect people to write error/warning free code if the sample applications we provide for them are littered with errors? Surely someone at BEA must have thought twice about shipping out this domain or was it part of the troubleshooting course :)

    Posted by: hoos on July 28, 2006 at 4:01 PM

  • I argue here that most folk simply don't want to trawl through log files. There's too much junk, and they're too big. So how about simply defining a nice set of rules in WLDF. Perhaps we should have a standard set of WLDF expressions that catch all the common errors.

    Posted by: jonmountjoy on July 29, 2006 at 8:01 AM

  • WLDF sounds good to me, it beats having to write shell scripts that grep for patterns (normally BEA message codes). I will certainly be looking into using it in the WLS 9.2 domains I am currently deploying.

    Creating a set of rules for WLDF would make a good codeshare project, especially if it included what action to take when the error/warning is detected. However I think the central problem is all that junk in the log files in the first place.

    Often what a developer considers junk is actually a valid warning and symptom of a non-functional defect that could, for example, be causing deployment failure. The situation is compounded by the fact that application logging and error handling are two areas where software architects/designers tend to fail badly.

    Posted by: hoos on July 29, 2006 at 1:44 PM

  • Sorry about the cross post on the BEA-002919 warning. The sample applications and tutorials should be easy to get started with, without any issues. (I am using Unix.)

    Posted by: ramjean on August 8, 2006 at 8:01 AM

  • Hello ramjean. Just to clarify, I didn't have any problem with the sample applications; they seemed to work. I was just pointing out that these applications, which will be used by beginners, have errors and warnings in the log files. This can only send the signal that it is OK to have log files full of errors as long as the application seems to meet functional requirements.

    My point is that these errors and warnings indicate poor design and a general slackness/ignorance on behalf of the people who worked on the application. Perhaps more importantly, such errors will eventually destabilise the domain and interfere with its operational performance.

    Posted by: hoos on August 8, 2006 at 11:19 AM

  • Hi: I downloaded WebLogic Server 9.2, installed it and started the server. I ran into the aforementioned problem: Aug 14, 2006 12:04:45 PM PDT Warning WorkManager BEA-002919 Unable to find a WorkManager with name weblogic.wsee.mdb.DispatchPolicy. Dispatch policy weblogic.wsee.mdb.DispatchPolicy will map to the default WorkManager for the application bea_wls9_async_response. This is not the portal product. Could anyone post how to fix this? Thanks. PS: When I download a server and start it for the very first time, I expect it to start up without any warnings or error messages. Well, looks like it is only a dream.

    Posted by: vawind on August 14, 2006 at 12:46 PM

  • Hi: I downloaded weblogic server9.0 for Intel Solaris 10, installed and started the server. Below sentences is coming: I ran sample weblogic server only, Could anyone post on how to fix this???

    Posted by: krithikvijay@atmail.com on December 28, 2006 at 7:19 AM

  • Hi: I downloaded weblogic server9.0 for Intel Solaris 10, installed and started the server. Below sentences is coming: DEC 28, 2006 12:04:45 PM IST Warning WorkManager BEA-002919 Unable to find a WorkManager with name weblogic.wsee.mdb.DispatchPolicy. Dispatch policy weblogic.wsee.mdb.DispatchPolicy will map to the default WorkManager for the application bea_wls9_async_response I ran sample weblogic server only, Could anyone post on how to fix this???

    Posted by: krithikvijay@atmail.com on December 28, 2006 at 7:21 AM

  • Hi,

    There are 2 ways to get rid of this harmless but annoying error message.

    1. Create a work manager with the name "weblogic.wsee.mdb.DispatchPolicy" and target it to the servers showing the message.

    2. In your setDomainEnv.cmd file, add a line before the JAVA_PROPERTIES is set:

    set EXTRA_JAVA_PROPERTIES=-Dweblogic.wsee.bind.suppressDeployErrorMessage=true -Dweblogic.wsee.skip.async.response=true

    Rgds,

    Tom

    Posted by: tcoppock on February 20, 2007 at 3:36 AM

  • Thanks Tom, I'll try this out. What's the impact of creating this work manager? I assume very little, but I don't know much about work managers or the point of them.

    Posted by: hoos on February 20, 2007 at 11:08 AM


