To use the Azure Library for Lucene.NET and Lucene.NET from a Visual Studio project, you must add a reference to both the AzureDirectory project or assembly and the Lucene.NET project or assembly. You must also add the following using statements to your project:
using Lucene.Net;
using Lucene.Net.Store;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Documents;
using Lucene.Net.Util;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Store.Azure;
Lucene.NET is a .NET implementation of Lucene (http://lucene.apache.org/) that provides full-text indexing and search of documents. Documents are composed of multiple fields and do not have a predefined schema. When you query the index, you can search across multiple fields within a document. Lucene.NET does not integrate directly with SQL Database; instead, you must query the database and construct a Document from the results, which Lucene.NET then adds to its catalog. For more information on Lucene.NET, see http://lucenenet.apache.org/.
The Azure Library for Lucene.NET allows you to expose blob storage as a Lucene.Net.Store.Directory object, which Lucene.NET uses as persistent storage for its catalog. More information on the Azure Library for Lucene.NET, as well as the latest version, can be found on the project homepage at https://azuredirectory.codeplex.com/.
The current version of the Azure Library (as of 22 May 2013) may require modification before using it in your solution. Specifically:
The following code creates an AzureDirectory object and uses it as a parameter when creating the IndexWriter:
AzureDirectory azureDirectory = new AzureDirectory(
    CloudStorageAccount.FromConfigurationSetting(
        "Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString"),
    "TestCatalog");
IndexWriter indexWriter = new IndexWriter(azureDirectory, new StandardAnalyzer(), true);
As mentioned previously, Lucene.NET is not integrated directly with SQL Database and is based on indexing 'documents' that contain multiple fields. To index data from SQL Database, you must query the database and create a new Document object for each row. Individual columns can then be added to the Document. The following code illustrates querying a SQL Database that contains information on individual bloggers, and then adding the Id and Bio column information to the Lucene index using an IndexWriter and Document:
// Create the AzureDirectory against blob storage and create a catalog named 'Catalog'
AzureDirectory azureDirectory = new AzureDirectory(
    CloudStorageAccount.FromConfigurationSetting(
        "Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString"),
    "Catalog");
IndexWriter indexWriter = new IndexWriter(azureDirectory, new StandardAnalyzer(), true);
indexWriter.SetRAMBufferSizeMB(10.0);
indexWriter.SetUseCompoundFile(false);
indexWriter.SetMaxMergeDocs(10000);
indexWriter.SetMergeFactor(100);
// Create a DataSet and fill it from SQL Database
DataSet ds = new DataSet();
using (SqlConnection sqlCon = new SqlConnection(sqlConnString))
{
    sqlCon.Open();
    SqlCommand sqlCmd = new SqlCommand();
    sqlCmd.Connection = sqlCon;
    sqlCmd.CommandType = CommandType.Text;
    // Only get the minimum fields we need; Bio to index, Id so search results
    // can look up the record in SQL Database
    sqlCmd.CommandText = "select Id, Bio from bloggers";
    SqlDataAdapter sqlAdap = new SqlDataAdapter(sqlCmd);
    sqlAdap.Fill(ds);
}
// Check that the query produced a table before reading it
if (ds.Tables.Count > 0)
{
    DataTable dt = ds.Tables[0];
    if (dt.Rows.Count > 0)
    {
        foreach (DataRow dr in dt.Rows)
        {
            // Create the Document object
            Document doc = new Document();
            foreach (DataColumn dc in dt.Columns)
            {
                // Populate the document with the column name and value from our query
                doc.Add(new Field(
                    dc.ColumnName,
                    dr[dc.ColumnName].ToString(),
                    Field.Store.YES,
                    Field.Index.TOKENIZED));
            }
            // Write the Document to the catalog
            indexWriter.AddDocument(doc);
        }
    }
}
// Close the writer
indexWriter.Close();
Note: The above sample returns all rows and adds them to the catalog. In a production application you will most likely only want to add new or updated rows.
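One possible way to add only new or updated rows is sketched below. It is a minimal sketch, not the library's prescribed approach, and it assumes the same Lucene.NET 2.x-style API used in the samples on this page. The class and method names (IncrementalIndexer, UpsertBlogger) are hypothetical. The sketch also assumes the Id field is indexed untokenized, so that the Term passed to UpdateDocument matches it exactly; the sample above indexes every column as TOKENIZED.

```csharp
using Lucene.Net.Documents;
using Lucene.Net.Index;

public static class IncrementalIndexer
{
    // Hypothetical helper: replaces any existing document that has the same Id,
    // or adds a new document when no match exists.
    public static void UpsertBlogger(IndexWriter writer, string id, string bio)
    {
        Document doc = new Document();
        // Assumption: Id is stored as a single untokenized term so the
        // Term lookup below matches it exactly.
        doc.Add(new Field("Id", id, Field.Store.YES, Field.Index.UN_TOKENIZED));
        doc.Add(new Field("Bio", bio, Field.Store.YES, Field.Index.TOKENIZED));
        // UpdateDocument deletes documents containing the term, then adds doc
        writer.UpdateDocument(new Term("Id", id), doc);
    }
}
```

When updating an existing catalog, the IndexWriter would be created with false as its third argument so that the catalog is appended to rather than recreated. Selecting only the changed rows from SQL Database is left out here because it depends on your schema (for example, a last-modified column, which the bloggers table is not known to have).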
After you have added documents to the catalog, you can search them using the IndexSearcher. The following example illustrates how to perform a search against the catalog for a term contained in the 'Bio' field and return the Id of each result:
// Create the AzureDirectory for blob storage
AzureDirectory azureDirectory = new AzureDirectory(
    CloudStorageAccount.FromConfigurationSetting(
        "Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString"),
    "Catalog");
// Create the IndexSearcher
IndexSearcher indexSearcher = new IndexSearcher(azureDirectory);
// Create the QueryParser, setting the default search field to 'Bio'
QueryParser parser = new QueryParser("Bio", new StandardAnalyzer());
// Create a query from the Parser
Query query = parser.Parse(searchString);
// Retrieve matching hits
Hits hits = indexSearcher.Search(query);
// Loop through the matching hits, retrieving the document
for (int i = 0; i < hits.Length(); i++)
{
    // Retrieve the string value of the 'Id' field from the
    // hits.Doc(i) document.
    TextBox_Results.Text += "Id: " + hits.Doc(i).GetField("Id").StringValue() + "\n";
}
Based on the Id, you can perform a query against SQL Database to return additional fields from the matching record.
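The follow-up lookup can be sketched as below. This is a minimal illustration rather than part of the original sample: the BloggerLookup class and BuildLookupCommand method are hypothetical names, and the code assumes the Id column of the bloggers table is an integer (change the SqlDbType if it is not).

```csharp
using System.Data;
using System.Data.SqlClient;

public static class BloggerLookup
{
    // Hypothetical helper: builds a parameterized command that selects the
    // row whose Id matches a value returned from the Lucene search.
    public static SqlCommand BuildLookupCommand(SqlConnection sqlCon, string id)
    {
        SqlCommand sqlCmd = new SqlCommand(
            "select * from bloggers where Id = @id", sqlCon);
        sqlCmd.CommandType = CommandType.Text;
        // Assumption: Id is an integer column. Passing it as a parameter
        // avoids concatenating search output into the SQL text.
        sqlCmd.Parameters.Add("@id", SqlDbType.Int).Value = int.Parse(id);
        return sqlCmd;
    }
}
```

With an open SqlConnection, the returned command can be run with ExecuteReader, or passed to a SqlDataAdapter in the same way as in the indexing sample.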