Generate a Lucene.NET index

NHibernate.Search is an awesome search API, but it's poorly documented. Besides a few blog posts, I have not had much luck getting answers to my NHibernate.Search questions. I hope this article can help you avoid some headaches.

This article assumes you have a working knowledge of NHibernate Search and Lucene.NET. If you’re looking for basics on NHibernate Search and Lucene.NET, just Google "NHibernate.Search" and/or "Lucene.NET" and you'll find what you're looking for.

To work with this code, you will need the NHibernate.Search and Lucene.Net DLLs. I had a hell of a time finding the NHibernate.Search source code, so it's your lucky day: I've added some support files here. They probably aren’t the latest versions, but they work.

NHibernate Search: NHibernateSearch.zip (2.60 MB)

The following console application generates a full Lucene index from an indexable NHibernate data abstraction layer. I wrote this utility to regenerate a Lucene.NET index when my index and database get out of sync. It can be extremely helpful during initial development, when a lot of backdoor editing is happening in the database and the application layer isn't indexing the data as it changes.

Here’s an example of a class from an NHibernate data abstraction layer which represents a glossary term. It’s important to note that this class is decorated with the “Indexed” attribute, which indicates that entities of this type are to be indexed into the Lucene index.

Imports NHibernate.Search.Attributes

<Serializable()> _ 
<Indexed()> _ 
Public Partial Class GLOSSARYEntity

    Private _Id As System.Int32
    Private _Topic As System.String
    
    <DocumentId()> _ 
    Public Overridable Property Id() As System.Int32
        Get
            Return _Id
        End Get
        Set(ByVal value As System.Int32)
            _Id = value
        End Set
    End Property
    
    <Field(Index.Tokenized, Store := Store.Yes)> _ 
    <Analyzer(GetType(Lucene.Net.Analysis.Standard.StandardAnalyzer))> _ 
    Public Overridable Property Topic() As System.String
        Get
            Return _Topic
        End Get
        Set(ByVal value As System.String)
            _Topic = value
        End Set
    End Property
    
End Class
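
To see how a tool can tell which classes carry this attribute, here is a minimal sketch that uses reflection to list the decorated types in an assembly. Note the assumptions: IndexedAttribute below is a stand-in declared locally so the sketch compiles without the NHibernate.Search DLL, and GlossaryTerm, AuditRecord and FindIndexedTypeNames are hypothetical names invented for illustration.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Reflection

' Stand-in for NHibernate.Search.Attributes.IndexedAttribute (an assumption:
' declared here only so the sketch runs without the NHibernate.Search DLL).
<AttributeUsage(AttributeTargets.Class)> _
Public Class IndexedAttribute
    Inherits Attribute
End Class

' Hypothetical entity that should be indexed.
<Indexed()> _
Public Class GlossaryTerm
End Class

' Hypothetical entity that should be skipped.
Public Class AuditRecord
End Class

Public Module IndexedTypeFinder

    ' Returns the names of all exported types in the assembly that carry <Indexed()>.
    Public Function FindIndexedTypeNames(ByVal oAssembly As Assembly) As List(Of String)
        Dim oNames As New List(Of String)
        For Each oType As Type In oAssembly.GetExportedTypes()
            If oType.GetCustomAttributes(GetType(IndexedAttribute), False).Length > 0 Then
                oNames.Add(oType.Name)
            End If
        Next
        Return oNames
    End Function

    Public Sub Main()
        For Each sName As String In FindIndexedTypeNames(Assembly.GetExecutingAssembly())
            Console.WriteLine(sName)
        Next
    End Sub

End Module
```

Run as its own console project, this should list GlossaryTerm and leave out AuditRecord. With the real library you would check for NHibernate.Search.Attributes.IndexedAttribute instead of the stand-in.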

Here’s the code for the console application. First off, the program looks for an old index and deletes the directory if it exists. The program then loads the assembly housing the data abstraction layer, and creates an NHibernate Search session.

Once everything is initialized, the application loops through all the types in the assembly’s exported types collection. If the current type is decorated as "Indexed", all of the entities for that particular data entity class are added to the Lucene index.

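Imports System.Collections
Imports System.Diagnostics
Imports System.IO
Imports System.Reflection
Imports NHibernate
Imports NHibernate.Cfg
Imports NHibernate.Event.ListenerType
Imports NHibernate.Search
Imports NHibernate.Search.Attributes
Imports NHibernate.Search.Environment
Imports NHibernate.Search.Event
Imports Lucene.Net.Analysis.Standard
' Aliases avoid a clash between System.IO.Directory and Lucene.Net.Store.Directory.
Imports FSDirectory = Lucene.Net.Store.FSDirectory
Imports IndexWriter = Lucene.Net.Index.IndexWriter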

Public Class LuceneIndexer

    ' Main must be Shared to act as the entry point when declared inside a class.
    Public Shared Sub Main()

        Console.WriteLine("Indexing Data...")

        Dim sIndexDirectory As String = "C:\inetpub\wwwroot\MyWebSite\index"
        If (Directory.Exists(sIndexDirectory)) Then
            Dim oDirectoryInfo As DirectoryInfo = New DirectoryInfo(sIndexDirectory)
            oDirectoryInfo.Delete(True)
        End If

        Directory.CreateDirectory(sIndexDirectory)

        Dim sConnectionString As String = "user id=;password=;data source=;initial catalog="
        Dim sAssemblyName As String = "MyAssembly"
        Dim bIndexable As Boolean = False
        Dim oIndexObjects As IList
        Dim oFSDirectory As FSDirectory = Nothing
        Dim oIndexWriter As IndexWriter = Nothing
        Dim sIndexPath As String = ""
        Dim oDALAssembly As Assembly = Assembly.Load(sAssemblyName)

        ' Configure NHibernate and register the NHibernate.Search listeners.
        Dim oConfig As New Configuration()
        oConfig.SetProperty("connection.connection_string", sConnectionString)
        oConfig.AddAssembly(sAssemblyName)
        oConfig.SetProperty(AnalyzerClass, GetType(StandardAnalyzer).AssemblyQualifiedName)
        oConfig.SetListener(PostUpdate, New FullTextIndexEventListener())
        oConfig.SetListener(PostInsert, New FullTextIndexEventListener())
        oConfig.SetListener(PostDelete, New FullTextIndexEventListener())

        ' BuildSessionFactory returns an ISessionFactory; wrap a session from it
        ' to get the full-text API.
        Dim oSessionFactory As ISessionFactory = oConfig.BuildSessionFactory()
        Dim oSession As IFullTextSession = NHibernate.Search.Search.CreateFullTextSession(oSessionFactory.OpenSession())

        For Each oType As System.Type In oDALAssembly.GetExportedTypes()

            ' Only types decorated with <Indexed()> are written to the index.
            bIndexable = oType.GetCustomAttributes(GetType(IndexedAttribute), False).Length > 0

            If (bIndexable) Then

                ' Each indexed type gets its own index directory.
                sIndexPath = Path.Combine(sIndexDirectory, oType.Name)
                Directory.CreateDirectory(sIndexPath)

                oFSDirectory = FSDirectory.GetDirectory(sIndexPath, True)
                oIndexWriter = New IndexWriter(oFSDirectory, New StandardAnalyzer(), True)
                oIndexObjects = oSession.CreateCriteria(oType).List()

                For Each oIndexMe As Object In oIndexObjects
                    ' Log a failure and carry on with the next entity.
                    Try
                        oSession.Index(oIndexMe)
                    Catch ex As Exception
                        Debug.WriteLine(ex.Message)
                        Debug.WriteLine("")
                    End Try
                Next

                oIndexWriter.Optimize()
                oIndexWriter.Close()
                oFSDirectory.Close()

                If (File.Exists(sIndexPath + "\segments.new")) Then
                    File.Move(sIndexPath + "\segments.new", sIndexPath + "\segments")
                End If

            End If

        Next

        oSession.Close()
        oSessionFactory.Close()

        Console.WriteLine("All Done. Press a key to exit.")
        Console.ReadLine()

    End Sub

End Class
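
The directory handling at the top of Main can also be tried on its own, without NHibernate or Lucene.NET. The sketch below wipes an index root and recreates one subfolder per indexed type, using only System.IO. ResetIndexRoot, the "lucene-index-sketch" folder name and the use of the temp directory are assumptions made for illustration, not part of the original utility.

```vb
Imports System
Imports System.IO

Public Module IndexDirectoryReset

    ' Deletes sIndexRoot if it exists, recreates it, and adds one subfolder
    ' per type name. Returns the full paths of the subfolders it created.
    Public Function ResetIndexRoot(ByVal sIndexRoot As String, ByVal oTypeNames As String()) As String()
        If Directory.Exists(sIndexRoot) Then
            Directory.Delete(sIndexRoot, True)
        End If
        Directory.CreateDirectory(sIndexRoot)

        Dim oPaths(oTypeNames.Length - 1) As String
        For i As Integer = 0 To oTypeNames.Length - 1
            oPaths(i) = Path.Combine(sIndexRoot, oTypeNames(i))
            Directory.CreateDirectory(oPaths(i))
        Next
        Return oPaths
    End Function

    Public Sub Main()
        ' Hypothetical location: a scratch folder under the temp directory.
        Dim sRoot As String = Path.Combine(Path.GetTempPath(), "lucene-index-sketch")
        For Each sPath As String In ResetIndexRoot(sRoot, New String() {"GLOSSARYEntity"})
            Console.WriteLine(sPath)
        Next
    End Sub

End Module
```

Working against a scratch folder first is a cheap way to confirm the delete-and-recreate step behaves as expected before pointing it at a real index path.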

This utility is great for design time and initial development. You know your project is moving along when you end up using this application less and less…

Categories: NHibernate Search
