Introduction
Lucene is an open source search engine framework available from Apache.org under the Jakarta family of open source projects. Lucene provides libraries for searching and indexing of different types of files and data providers.
Lucene is a search framework rather than a packaged product, so it requires some work to get up and running. Do not expect to find an installer or GUI tools to configure and run the search engine. However, Lucene offers a very straightforward manual installation with simple configuration, and ultimately provides a very powerful set of search APIs.
The downloadable package provides a few modules to allow indexing of text files and HTML content stored locally. Additional custom modules can be created or downloaded from the internet. A good example is the LARM plug-in, which integrates crawler functionality with Lucene.
Lucene comes with two main services available: indexing and searching. The indexing tasks are done independently from the search tasks. Both the index and search services are available so that developers can extend them to meet their needs. Lucene is written in 100% Java with emphasis on performance.
Text indexing is the area of Lucene focused on building a searchable index. The index works as a repository created for high performance content queries. Lucene exposes a rich API to interact with the information stored in the index. You can manage the index to be as basic as listing the document name and its abstract or as rich as storing the full document and additional related metadata about that document. For example, the additional metadata could be ranking information, so that certain documents would show higher than others in the search results.
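To make the idea of an index as a query-optimized repository more concrete, here is a minimal sketch in plain Java. It is not Lucene's implementation or API: the class ToyIndex, its methods, and the sample file names are hypothetical, and it assumes a very simple tokenizer that lowercases the text and splits on non-alphanumeric characters.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index (illustration only, not Lucene code). */
final class ToyIndex {
    /** Stored fields kept alongside the index, here a name and an abstract. */
    private final List<Map<String, String>> storedFields = new ArrayList<>();
    /** term -> sorted ids of the documents whose text contains that term */
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Adds a document and returns its id. Only the text is tokenized; the metadata is stored as-is. */
    int add(String name, String summary, String text) {
        int id = storedFields.size();
        Map<String, String> fields = new HashMap<>();
        fields.put("name", name);
        fields.put("abstract", summary);
        storedFields.add(fields);
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    /** Returns the names of the documents containing the term, in the order they were added. */
    List<String> search(String term) {
        List<String> names = new ArrayList<>();
        for (int id : postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet())) {
            names.add(storedFields.get(id).get("name"));
        }
        return names;
    }
}
```

The point of the sketch is the trade-off described above: the lookup touches only the entry for the requested term, while the stored fields decide how much of each document can be shown in the results.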
Text searching creates a query that contains a collection of terms that the user is looking for in the index. The index repository is built for high-speed lookups, and the results can be returned with relevancy ranking. Lucene supports several types of searches that are common in the industry. Some of the main search types are listed below:
- Wildcard: Lucene supports single-character (?) and multiple-character (*) wildcard searches.
- Fuzzy: Fuzzy searches are based on the Levenshtein Distance, or Edit Distance algorithm.
- Proximity: Lucene supports finding words that are within a specific distance of each other.
- Ranging: Range queries allow one to match documents whose field values are between the lower and upper bound specified by the range query.
- Boosting a term: Lucene ranks matching documents by relevance based on the terms found. Boosting a term with the caret (^) and a boost factor makes matches on that term count for more.
- Boolean operators: Boolean operators allow terms to be combined through logic operators. Lucene supports AND, "+", OR, NOT and "-" as Boolean operators.
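Because fuzzy matching rests on edit distance, a small sketch of that computation may help. This is the textbook dynamic-programming form of the Levenshtein distance, written here for illustration only. It is not taken from Lucene's source, and the class and method names are hypothetical.

```java
/** Levenshtein (edit) distance between two strings, computed with two rolling rows. */
final class EditDistance {
    /** Returns the minimum number of single-character insertions, deletions, or substitutions turning a into b. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Turning the empty prefix of a into the first j characters of b takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

For example, "roam" and "foam" are one substitution apart, which is why a fuzzy query such as roam~ can match documents containing foam.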
Getting Started
To start taking advantage of Lucene you will need to first download the Lucene jar files and example search modules. They are available from:
http://jakarta.apache.org/lucene/docs/index.html
Or
http://cvs.apache.org/dist/jakarta/lucene/v1.3-final/
At the time this article was written the version used was Lucene 1.3. Newer versions of the libraries may be available for download. It is also a good idea to read the file "changes.txt" to learn what has changed since previous releases and which fixes were added to the 1.3 release.
Adding Lucene to the Sampleportal in WebLogic Portal 8.1 sp2
Start by downloading the sample code zip file associated with this article.
Then follow the instructions below to install Lucene and the Lucene sample.
1. Set JAVA_HOME to point to your Java installation so that you can run the index engine against the sample content.
2. You can use the Lucene libraries in the sample package or you can get the latest package from the Lucene site. Once you have the Lucene jar files (lucene-1.3-final.jar, lucene-demos-1.3-final.jar), drop them into:
<install_drive>\bea\weblogic81\samples\portal\portalApp\sampleportal\WEB-INF\lib
3. To run the indexer, add both jar files from step 2 to your Java CLASSPATH.
4. Drop the "LuceneSearch" directory into the root level of Sampleportal. You should end up with a path like the one below:
<install_drive>\bea\weblogic81\samples\portal\portalApp\sampleportal\LuceneSearch
Inside "LuceneSearch" you will have a directory called "Content" with some sample content text files:
<install_drive>\bea\weblogic81\samples\portal\portalApp\sampleportal\LuceneSearch\Content
A prebuilt index is included as well, in case you don't want to take the time to run the indexer in the next step.
5. Run the indexer on your content directory by executing the following from a command line inside the Sampleportal webapp directory:
<path to your webapp> java org.apache.lucene.demo.IndexFiles <path to your content>
It should look something like:
<install_drive>\bea\weblogic81\samples\portal\portalApp\sampleportal java org.apache.lucene.demo.IndexFiles <install_drive>\bea\weblogic81\samples\portal\portalApp\sampleportal\LuceneSearch\Content
For this example the sample package includes a set of simple text files with content related to the BEA line of products.
6. After the indexing task is completed, the result should be an index directory created at the root of your content directory:
<install_drive>\bea\weblogic81\samples\portal\portalApp\sampleportal\LuceneSearch\Content\index
7. Once the index has been created, go back and configure a portlet to talk to the index. The sample package has a simple Java Page Flow example that talks to the Lucene engine and assumes the index is inside the Sampleportal directory. Launch WebLogic Workshop, select File >> Open >> Application from the menu bar, and pick the PortalApp application, which contains the Sampleportal.
8. You may have to adjust the code inside the Java Page Flow and the JSP pages to map to your environment. The result page is based on the results.jsp page that is available on the Lucene site. After any adjustments have been made, the next step is to add the Lucene Search portlet to the Sampleportal. This can be done from WebLogic Workshop by dragging the portlet to the Portal Designer, or by right-clicking the Portal Designer and selecting from the list of portlets.
Next, from the Portal menu, select >> View this Portal.
9. After the portal comes up, you can try the search portlet and see the results.
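Steps 7 and 8 note that the sample assumes the index sits inside the Sampleportal directory and that the code may need adjusting for your environment. One way to keep that adjustment in a single place is sketched below. The class IndexLocation and the system property name lucene.index.dir are hypothetical and are not part of the sample package or of Lucene. The default simply mirrors the LuceneSearch\Content\index layout from step 6.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Resolves where the search code should look for the Lucene index (illustrative helper). */
final class IndexLocation {
    /** Hypothetical system property that overrides the default location. */
    static final String OVERRIDE_PROPERTY = "lucene.index.dir";

    /**
     * Returns the override when one is given. Otherwise returns the layout
     * used in the steps above: (webapp root)/LuceneSearch/Content/index.
     */
    static Path resolve(Path webappRoot, String override) {
        if (override != null && !override.trim().isEmpty()) {
            return Paths.get(override.trim()).toAbsolutePath().normalize();
        }
        return webappRoot.resolve("LuceneSearch").resolve("Content").resolve("index").normalize();
    }

    /** Convenience overload that reads the override from the system property. */
    static Path resolve(Path webappRoot) {
        return resolve(webappRoot, System.getProperty(OVERRIDE_PROPERTY));
    }
}
```

In a web application the root would typically come from ServletContext.getRealPath("/"). That call can return null when the application runs from a packed archive, in which case an explicit override is the safer choice.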

How does BEA use Lucene on WebLogic Portal 8.1?
WebLogic Portal 8.1 uses standard Lucene features in the Administration Portal help system. The help content consists of HTML-based documents, and the index is located in the Administration Portal web application. Because the Administration Portal is deployed as a compressed, packaged .war file, additional steps were taken to allow Lucene to work with the index files.
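The article does not say what those additional steps were. One common approach when an index ships inside an archive, assumed here purely for illustration, is to copy the index files out to a regular directory on disk before opening them. The sketch below uses only the standard Java zip classes. The class IndexExtractor and the "index/" entry prefix are hypothetical and do not describe BEA's actual implementation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Copies index files out of a packed archive so they can be opened from the file system. */
final class IndexExtractor {
    /**
     * Copies every file stored under prefix (which should end with "/") from the
     * archive into targetDir and returns the number of files copied.
     */
    static int extract(Path archive, String prefix, Path targetDir) throws IOException {
        int copied = 0;
        Path base = targetDir.toAbsolutePath().normalize();
        try (ZipFile zip = new ZipFile(archive.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory() || !entry.getName().startsWith(prefix)) {
                    continue; // not part of the index
                }
                Path out = base.resolve(entry.getName().substring(prefix.length())).normalize();
                // Refuse entries whose names would land outside the target directory.
                if (!out.startsWith(base)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                Files.createDirectories(out.getParent());
                try (InputStream in = zip.getInputStream(entry)) {
                    Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
                }
                copied++;
            }
        }
        return copied;
    }
}
```

The extracted directory can then be handed to the search code like any other index path.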
The contextual help system can be accessed at any time from the Administration Portal via a help icon in the upper right corner. You can see in the figure below that the help system returns a set of search results with a pagination system. In addition, there is a collection of search tips to assist in improving your search results.
Conclusion
The Lucene search engine is a 100% Java-based search framework that can be easily integrated with your web applications. Additional plug-ins are available from the internet to expand the capabilities of the search libraries and to fetch data from different sources.
You can get the Lucene libraries and additional information from the links below:
Lucene Home: http://jakarta.apache.org/lucene/docs/index.html
Performance Benchmarks: http://jakarta.apache.org/lucene/docs/benchmarks.html
FAQ: http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi
Sites running Lucene: http://wiki.apache.org/jakarta-lucene/PoweredBy





