Leveraging Apache Lucene for Powerful Search in Your ASP.NET Application

Contributors

Mahmudul Hasan

The search function is an indispensable feature for any website, be it an e-commerce platform, a blog, or a documentation site. Its necessity stems from the ever-increasing volume of content available on the internet. With websites growing larger and more complex, users need a convenient and efficient way to locate specific information within them. A well-implemented search functionality enhances user experience by saving time and effort. It allows visitors to quickly find products, articles, tutorials, or any other desired content by simply entering relevant keywords. By providing relevant search results, a website can help users discover new content and navigate through extensive archives. In this article, we will see how to implement an extensive search engine within an existing ASP.NET application by leveraging the power of Apache Lucene. But first, let’s talk about Apache Lucene itself.

What is Apache Lucene

Apache Lucene is a widely used open-source information retrieval library which provides powerful indexing and search capabilities for applications dealing with large volumes of text-based data.

At its core, Lucene offers an inverted index, which is an optimized data structure that maps terms to the documents containing them. This index allows for fast and efficient full-text searching, enabling users to retrieve relevant documents based on their queries. Lucene supports various search features, including Boolean queries, phrase matching, wildcard searches, and relevance ranking. It can also be extended and customized to meet specific requirements through its pluggable architecture, which enables the incorporation of additional analyzers, tokenizers, filters, and scoring algorithms.
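To make the idea of an inverted index concrete, here is a minimal sketch in plain C#. It is only an illustration: the ToyInvertedIndex class and its tokenizer are made up for this example and are not part of Lucene, whose real index is far more compact and also tracks term positions and statistics for ranking.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Toy inverted index (hypothetical, not a Lucene type):
// maps each lowercased term to the ids of the documents that contain it.
public sealed class ToyInvertedIndex
{
    private readonly Dictionary<string, SortedSet<int>> _postings =
        new Dictionary<string, SortedSet<int>>();

    // Splits the text into terms and records the document id under each term.
    public void Add(int docId, string text)
    {
        foreach (string term in Tokenize(text))
        {
            if (!_postings.TryGetValue(term, out SortedSet<int>? docs))
            {
                docs = new SortedSet<int>();
                _postings[term] = docs;
            }
            docs.Add(docId);
        }
    }

    // Returns the ids of documents that contain every term of the query (Boolean AND).
    public IReadOnlyList<int> Search(string query)
    {
        IEnumerable<int>? result = null;
        foreach (string term in Tokenize(query))
        {
            if (!_postings.TryGetValue(term, out SortedSet<int>? docs))
            {
                return Array.Empty<int>(); // one missing term means no document matches
            }
            result = result is null ? docs : result.Intersect(docs);
        }
        return result?.OrderBy(id => id).ToList() ?? new List<int>();
    }

    // Very naive tokenizer: lowercase, then split on spaces and basic punctuation.
    private static IEnumerable<string> Tokenize(string text) =>
        text.ToLowerInvariant()
            .Split(new[] { ' ', ',', '.', ';', ':', '-', '!', '?' },
                   StringSplitOptions.RemoveEmptyEntries);
}
```

If we add "Clean Architecture" as document 1, "System Architecture Patterns" as document 2, and "Clean Code" as document 3, then Search("architecture") returns documents 1 and 2, while Search("clean architecture") returns only document 1. A lookup only touches the entries for the query terms, which is why this structure scales better than scanning every row with Contains.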

Lucene has gained popularity and widespread adoption due to its performance, scalability, and robustness. It has influenced the development of numerous search-related projects and frameworks, including Apache Solr and Elasticsearch, which build upon Lucene to provide more comprehensive search solutions.

Now, let’s explore how we can integrate Lucene into our existing ASP.NET application. Lucene itself is written in Java; on .NET we use its port, Lucene.NET. Please note that we’ll keep the walkthrough as simple as possible and set aside more elaborate design patterns.

Integrate Apache Lucene for Powerful Search in Your ASP.NET Application

Assuming we have an e-commerce website where customers can purchase books, we have implemented a basic search bar. Users can enter either the full or partial name of a book as the query. If the query matches one or multiple books, a list of books is displayed in a result box.

The logic behind the search is as follows –

				
[HttpGet]
public IActionResult SearchBook(string query)
{
    var books = _context.Books.Where(x => x.Title.Contains(query) || x.Author.Contains(query)).ToList();
    return Ok(books);
}

This approach works well if the visitor knows precisely what they are looking for and the correct spelling. However, that is not always the case. Customers may only know a portion of the book name, misspell it, or simply want to browse books related to a specific topic. Let’s see how our current search function performs in these scenarios:

As we can observe, our search function fails to produce satisfactory results. But can we improve it? The answer is yes, we can. By harnessing the power of Apache Lucene, we can make our search function more robust and powerful. To accomplish this, let’s first examine the Book object.

				
public class Book
{
    public Book()
    {
        Id = Guid.NewGuid();
        PublishedOn = DateTime.UtcNow;
    }
    public Guid Id { get; set; }
    public string Title { get; set; } 
    public string Author { get; set; } 
    public string? Description { get; set; }
    public decimal Price { get; set; }
    public string? ImageUrl { get; set; }
    public string? Isbn { get; set; }
    public DateTime PublishedOn { get; set; }
    public string Publisher { get; set; }
    public string SeoText { get; set; } = string.Empty;
}

Within our search space, we have identified six fields that can be searched: Title, Author, Description, Isbn, Publisher, and SeoText. By leveraging these fields, we can easily find books and support broader queries, such as “system architecture”. Additionally, Id can be used to construct the URL of the book detail page, while Price and ImageUrl make the result cards more informative.

When considering the storage of field values within the search space, we find that Id, Title, Author, Price, and ImageUrl need to be stored as they are to properly construct the result card. Other fields do not require storage in their original form. By selectively storing only the necessary information, we can significantly reduce storage consumption.

To illustrate the characteristics of each field –

Field Name	Analyzed for Search	Stored As-Is
Id	No	Yes
Title	Yes	Yes
Author	Yes	Yes
Description	Yes	No
Price	No	Yes
ImageUrl	No	Yes
Isbn	Yes	No
PublishedOn	No	No
Publisher	Yes	No
SeoText	Yes	No

Our objective is to enable customers or app users to search for a query within the fields of Title, Author, Description, Isbn, Publisher, and SeoText. If any book matches the search query, we want to retrieve and display the book’s Id, Title, Author, Price, and ImageUrl on the frontend. This allows users to see an image of the book along with its title, price, and author name. If users click on a search result, they will be directed to the book details page for more information.

With a better understanding of our needs, let’s proceed to the implementation. Begin by installing the necessary NuGet packages: add the following lines within the <ItemGroup></ItemGroup> tags of the project file (.csproj).

				
<ItemGroup>
    <PackageReference Include="Lucene.Net" Version="4.8.0-beta00016"/>
    <PackageReference Include="Lucene.Net.Analysis.Common" Version="4.8.0-beta00016"/>
    <PackageReference Include="Lucene.Net.QueryParser" Version="4.8.0-beta00016"/>
…
</ItemGroup>

After adding the packages to the project, create a folder named Feature at the project root if it does not already exist. Inside it, create a folder named Search, and inside that, a file named SearchableBook.cs.

Paste the following content inside the SearchableBook.cs file –

				
using System.Globalization;
using Lucene.Net.Documents;
using Lucene.Net.Index;

public class SearchableBook
{
    public Guid Id { get; set; }
    public string Author { get; set; }
    public string Description { get; set; }
    public string? ImageUrl { get; set; }
    public string Title { get; set; }
    public string Isbn { get; set; }
    public string SeoText { get; set; }
    public decimal Price { get; set; }
    public string Publisher { get; set; }

    public SearchableBook(Book book)
    {
        Id = book.Id;
        Author = book.Author;
        Description = book.Description ?? string.Empty;
        ImageUrl = book.ImageUrl;
        Title = book.Title;
        Isbn = book.Isbn ?? string.Empty;
        SeoText = book.SeoText ?? string.Empty;
        Price = book.Price;
        Publisher = book.Publisher;
    }

    public IEnumerable<IIndexableField> GetFields()
    {
        return new Field[]
        {
            new TextField(nameof(Title), Title, Field.Store.YES),
            new TextField(nameof(SeoText), SeoText, Field.Store.NO),
            new TextField(nameof(Author), Author, Field.Store.YES),
            new TextField(nameof(Publisher), Publisher, Field.Store.NO),
            new TextField(nameof(Description), Description, Field.Store.NO),
            new TextField(nameof(Isbn), Isbn, Field.Store.NO),

            new StringField(nameof(Id), Id.ToString(), Field.Store.YES),
            new StringField(nameof(ImageUrl), ImageUrl ?? string.Empty, Field.Store.YES),
            new StringField(nameof(Price), Price.ToString(CultureInfo.InvariantCulture), Field.Store.YES),
        };
    }
}

In this class, we take a Book object via the constructor and copy the fields we need. The interesting part is the public GetFields method. A TextField is passed through the analyzer and split into searchable terms, whereas a StringField is indexed verbatim as a single term, without analysis. We have chosen the field types based on the table above. Additionally, we indicate which values must be retrievable from the index as-is by passing Field.Store.YES or Field.Store.NO.
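To see what “analysis” means in practice, the following sketch prints the terms that StandardAnalyzer produces for a piece of text. The AnalyzerDemo class is a hypothetical helper written for this illustration; it assumes the three Lucene.NET packages listed earlier are installed.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical helper for illustration; not part of the article's project.
public static class AnalyzerDemo
{
    // Returns the terms StandardAnalyzer would index for a TextField value.
    public static List<string> Tokenize(string text)
    {
        var terms = new List<string>();
        using var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);

        // The field name is only a label here; StandardAnalyzer treats every field the same way.
        using TokenStream stream = analyzer.GetTokenStream("Title", text);
        ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();

        stream.Reset();                  // required before the first IncrementToken call
        while (stream.IncrementToken())
        {
            terms.Add(term.ToString());  // the attribute holds the current token's text
        }
        stream.End();
        return terms;
    }
}
```

For the input "The Pragmatic Programmer", the analyzer lowercases the text and drops the English stop word "the", leaving the terms "pragmatic" and "programmer". A StringField skips this step entirely, which is why it suits exact values such as Id.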

Now create a class to represent a search result. Name the file SearchResult.cs and add the following content –

using Lucene.Net.Documents;

public class SearchResult
{
    private readonly Document _doc;
    public SearchResult(Document doc)
    {
        _doc = doc;
    }
    public Guid Id { get; set; }
    public string ImageUrl { get; set; }
    public string Title { get; set; }
    public string Author { get; set; }
    public decimal Price { get; set; }

    public void Parse(Action<Document> action)
    {
        action(_doc);
    }
}

Now create an interface named ISearchManager with the following content –

public interface ISearchManager
{
    void AddToIndex(List<SearchableBook> searchables);
    void AddToIndex(SearchableBook searchable);
    void DeleteFromIndex(List<SearchableBook> searchables);
    void DeleteFromIndex(SearchableBook searchable);
    void Clear();
    IEnumerable<SearchResult> Search(string searchQuery, int max = 10);
}

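Then create a class named SearchManager that implements the ISearchManager interface, with the following content –

using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

public class SearchManager : ISearchManager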
{
    private const LuceneVersion AppLuceneVersion = LuceneVersion.LUCENE_48;
    private static FSDirectory? directory;
    private readonly string luceneDir;



    public SearchManager(IHostEnvironment environment)
    {
        luceneDir = Path.Combine(environment.ContentRootPath, "Lucene_Index");
    }

    private FSDirectory Directory
    {
        get
        {
            if (directory is not null)
            {
                return directory;
            }

            DirectoryInfo info = System.IO.Directory.CreateDirectory(luceneDir);
            return directory = FSDirectory.Open(info);
        }
    }

    public void AddToIndex(List<SearchableBook> searchables)
    {
        DeleteFromIndex(searchables);
        UseWriter(x =>
        {
            foreach (SearchableBook searchable in searchables)
            {
                var doc = new Document();
                foreach (IIndexableField field in searchable.GetFields())
                {
                    doc.Add(field);
                }
                x.AddDocument(doc);
            }
        });
    }

    public void AddToIndex(SearchableBook searchable)
    {
        DeleteFromIndex(searchable);
        UseWriter(x =>
        {
            var doc = new Document();
            foreach (IIndexableField field in searchable.GetFields())
            {
                doc.Add(field);
            }
            x.AddDocument(doc);
        });
    }

    public void DeleteFromIndex(SearchableBook searchable)
    {
        UseWriter(x =>
        {
            x.DeleteDocuments(new Term(nameof(SearchableBook.Id), searchable.Id.ToString()));
        });
    }

    public void Clear()
    {
        UseWriter(x => x.DeleteAll());
    }

    public IEnumerable<SearchResult> Search(string searchQuery, int max)
    {
        if (string.IsNullOrWhiteSpace(searchQuery))
            return new List<SearchResult>();
        using var analyzer = new StandardAnalyzer(AppLuceneVersion);
        using DirectoryReader? reader = DirectoryReader.Open(Directory);
        var searcher = new IndexSearcher(reader);
        var parser = new MultiFieldQueryParser(AppLuceneVersion, new[]
        {
            nameof(SearchableBook.Title),
            nameof(SearchableBook.Author),
            nameof(SearchableBook.Description),
            nameof(SearchableBook.Publisher),
            nameof(SearchableBook.SeoText),
            nameof(SearchableBook.Isbn)
        }, analyzer);
        Query? query = parser.Parse(QueryParserBase.Escape(searchQuery.Trim()));
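        // Search already returns at most `max` hits, so no further trimming is needed.
        ScoreDoc[] hits = searcher.Search(query, null, max, Sort.RELEVANCE).ScoreDocs;

        // Materialize the results before the reader is disposed.
        return hits.Select(x => new SearchResult(searcher.Doc(x.Doc))).ToList();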
    }


    public void DeleteFromIndex(List<SearchableBook> searchables)
    {
        UseWriter(x =>
        {
            foreach (SearchableBook searchable in searchables)
            {
                x.DeleteDocuments(new Term(nameof(SearchableBook.Id), searchable.Id.ToString()));
            }
        });
    }

    private void UseWriter(Action<IndexWriter> action)
    {
        using var analyzer = new StandardAnalyzer(AppLuceneVersion);
        using var writer = new IndexWriter(Directory, new IndexWriterConfig(AppLuceneVersion, analyzer));
        action(writer);
        writer.Commit();
    }
}

Here, UseWriter is a private helper that performs every write to the search space. Lucene stores the index on disk. In the constructor, we build the path to a folder dedicated to Lucene and keep it in luceneDir. The Directory property creates that folder if it does not exist, opens it as an FSDirectory, and caches the result in a static field. In the UseWriter method, we first create an analyzer for the Lucene compatibility version we target (LUCENE_48), then open a writer on the index directory. After that, we invoke the action that was passed in, and finally we commit the changes. Keep in mind that Lucene allows only one open IndexWriter per index directory at a time, so concurrent writes may need to be serialized.

Now register the interface and the class as transient in the Program class to make it injectable.

				
builder.Services.AddTransient<ISearchManager, SearchManager>();

Now let’s rewrite the API endpoint as follows –

[HttpGet]
public IActionResult SearchBook(string query)
{
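    // Guard against a missing query string; Search returns an empty list for blank input.
    List<SearchResult> searchResults = _searchManager.Search((query ?? string.Empty).ToLowerInvariant()).ToList();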
    foreach (SearchResult book in searchResults)
    {
        book.Parse(x =>
        {
            book.Author = x.Get(nameof(SearchableBook.Author));
            book.Title = x.Get(nameof(SearchableBook.Title));
            book.Id = new Guid(x.Get(nameof(SearchableBook.Id)));
            book.ImageUrl = x.Get(nameof(SearchableBook.ImageUrl));
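            // Parse with the invariant culture, matching how the price was written to the index.
            book.Price = decimal.Parse(x.Get(nameof(SearchableBook.Price)), System.Globalization.CultureInfo.InvariantCulture);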
        });
    }
    return Ok(searchResults.Select(sr => new
    {
        sr.Title, sr.Author, sr.Price, sr.Id, sr.ImageUrl
    }));
}

Don’t forget to inject the search manager through the controller’s constructor –

private readonly ISearchManager _searchManager;

public BookController(…, ISearchManager searchManager)
{
    …
    _searchManager = searchManager;
}

Now if we try to search again, let’s see what happens –

Nothing happens. Hmm, something went wrong! Let’s examine the console, as we may find a clue –

By examining the response, we can see that Lucene cannot find an index to search (most likely an IndexNotFoundException thrown by DirectoryReader.Open). That is expected, because we have not indexed anything yet. Let’s modify our book creation endpoint to index each new book and save it to the search space. Let’s also add another endpoint that indexes the books already in our database.

Note that a book needs to be added to the search space only once, as long as its data does not change. If we modify a book, we can simply re-add it, because AddToIndex removes the old entry before adding the new one.

				
[HttpPost]
public IActionResult CreateBook(Book book)
{
    _context.Books.Add(book);
    _context.SaveChanges();
    _searchManager.AddToIndex(new SearchableBook(book));
    return Ok();
}

[HttpPost]
public IActionResult CreateBooks(List<Book> books)
{
    _context.Books.AddRange(books);
    _context.SaveChanges();
    _searchManager.AddToIndex(books.Select(x => new SearchableBook(x)).ToList());
    return Ok();
}

[HttpPost]
public IActionResult IndexAll()
{
    _searchManager.Clear();
    _searchManager.AddToIndex(_context.Books.Select(x => new SearchableBook(x)).ToList());
    return Ok();
}
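Regarding the note on re-adding a modified book: AddToIndex does this by deleting the old document and then adding the new one. As an alternative sketch, Lucene’s IndexWriter.UpdateDocument performs the delete and the add in a single call. The ReindexDemo class below is hypothetical and uses an in-memory RAMDirectory purely for demonstration; it is not the article’s SearchManager.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;

// Hypothetical demo class; names and sample titles are made up for illustration.
public static class ReindexDemo
{
    private const LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;

    // Indexes the same book id twice and returns how many live documents remain.
    public static int IndexTwiceAndCount()
    {
        using var directory = new RAMDirectory(); // in-memory index, for demonstration only
        using var analyzer = new StandardAnalyzer(MatchVersion);

        using (var writer = new IndexWriter(directory, new IndexWriterConfig(MatchVersion, analyzer)))
        {
            string id = Guid.NewGuid().ToString();
            var idTerm = new Term("Id", id);

            // UpdateDocument deletes any document matching the term, then adds the new one.
            writer.UpdateDocument(idTerm, BuildDoc(id, "Sample Book"));
            writer.UpdateDocument(idTerm, BuildDoc(id, "Sample Book, Second Edition"));
            writer.Commit();
        }

        using DirectoryReader reader = DirectoryReader.Open(directory);
        return reader.NumDocs; // counts documents that are not deleted
    }

    private static Document BuildDoc(string id, string title)
    {
        return new Document
        {
            new StringField("Id", id, Field.Store.YES),   // exact-match key, not analyzed
            new TextField("Title", title, Field.Store.YES)
        };
    }
}
```

Calling ReindexDemo.IndexTwiceAndCount() returns 1: the second call replaces the first document instead of creating a duplicate. This works because Id is a StringField, so the Term matches the stored identifier exactly.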

Now let’s add books from our existing database to the search space by invoking a POST request to the IndexAll endpoint.

If we look at the index directory, we can see that the search index has been created and takes up only 28.0 KB of space.

Finally, it works! As a result of the improved search implementation, we can observe that two books have appeared in the result section. The titles of the books indicate that the search results are indeed satisfactory. Now, let’s explore some more queries to further evaluate the effectiveness of our search functionality.
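To experiment with queries without touching the application’s index, a self-contained sketch like the one below can help. It mirrors the Search method’s use of MultiFieldQueryParser, but everything else is an assumption made for the example: the InMemorySearchDemo class, the in-memory RAMDirectory, and the three sample books are all made up.

```csharp
using System.Collections.Generic;
using System.Linq;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

// Hypothetical demo class with made-up sample data.
public static class InMemorySearchDemo
{
    private const LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;

    // Indexes three sample books in memory and returns the titles matching the query.
    // The query must not be blank; the parser rejects empty input.
    public static List<string> Search(string userQuery)
    {
        using var directory = new RAMDirectory();
        using var analyzer = new StandardAnalyzer(MatchVersion);

        using (var writer = new IndexWriter(directory, new IndexWriterConfig(MatchVersion, analyzer)))
        {
            writer.AddDocument(Book("System Architecture Basics", "An introduction to designing software systems"));
            writer.AddDocument(Book("Scaling Web Applications", "Covers caching, queues, and system architecture"));
            writer.AddDocument(Book("Cooking for Beginners", "Simple recipes for every day"));
            writer.Commit();
        }

        using DirectoryReader reader = DirectoryReader.Open(directory);
        var searcher = new IndexSearcher(reader);

        // The same query is tried against both fields; a hit in either one counts.
        var parser = new MultiFieldQueryParser(MatchVersion, new[] { "Title", "Description" }, analyzer);
        Query query = parser.Parse(QueryParserBase.Escape(userQuery.Trim()));

        return searcher.Search(query, 10).ScoreDocs
            .Select(hit => searcher.Doc(hit.Doc).Get("Title"))
            .ToList(); // materialize before the reader is disposed
    }

    private static Document Book(string title, string description)
    {
        return new Document
        {
            new TextField("Title", title, Field.Store.YES),            // searchable and retrievable
            new TextField("Description", description, Field.Store.NO)  // searchable only
        };
    }
}
```

Here InMemorySearchDemo.Search("architecture") returns two titles: the first book matches on its title, and the second matches only through its description. Because Description is indexed with Field.Store.NO, calling Get("Description") on a result would return null, which is the storage trade-off described earlier.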

Indeed, the updated search implementation shows significant improvements compared to the previous version. However, it’s worth noting that the current setup has a limitation: the search index remains in a specific local directory. This can pose challenges when scaling the application.

To address this limitation and enhance scalability, one can consider utilizing search servers like Apache Solr or Elasticsearch. These search servers offer distributed and highly scalable search functionalities, allowing for efficient handling of large datasets and accommodating increased traffic and user demands.

By adopting a search server, you can leverage their advanced features and capabilities, such as distributed indexing, fault tolerance, and high availability. These servers also provide additional functionalities like faceted search, advanced querying options, and result relevance ranking.

While incorporating a search server is beyond the scope of this article, it is a recommended approach for addressing scalability concerns. However, for smaller-scale applications or projects, the current implementation may suffice, providing an effective and efficient search solution.

Final Thoughts

We have explored the integration of Apache Lucene to enhance the search functionality of an ASP.NET application. By leveraging Lucene’s capabilities, we achieved more accurate search results for our e-commerce website.
