Integrating Sitecore's Azure Search Provider

Posted by Patricia Murphy on October 15, 2018

Having previously worked with the Lucene and Solr search providers, I decided to test how easy it is to switch from the Lucene search provider to the Azure Search provider. I set about changing the default configuration on a Habitat installation, Habitat 8.2 Update 5, which comes pre-configured for Lucene. I ran into a couple of interesting issues and learned some things along the way.

Sitecore have a helpful guide for setting up Azure Search and connecting to it. Once you have the search service set up in Azure and you have configured your Sitecore instance to connect to that service, you can get back to reconfiguring the Sitecore instance and Habitat solution.

Configuration Changes

Enable the following files in App_Config/Include:

ContentTesting/Sitecore.ContentTesting.Azure.IndexConfiguration.config
FXM/Sitecore.FXM.Azure.DomainsSearch.DefaultIndexConfiguration.config
FXM/Sitecore.FXM.Azure.DomainsSearch.Index.Master.config
FXM/Sitecore.FXM.Azure.DomainsSearch.Index.Web.config
ListManagement/Sitecore.ListManagement.Azure.Index.List.config
ListManagement/Sitecore.ListManagement.Azure.IndexConfiguration.config
Social/Sitecore.Social.Azure.Index.Master.config
Social/Sitecore.Social.Azure.Index.Web.config
Social/Sitecore.Social.Azure.IndexConfiguration.config
Sitecore.ContentSearch.Azure.DefaultIndexConfiguration.config
Sitecore.ContentSearch.Azure.Index.Analytics.config
Sitecore.ContentSearch.Azure.Index.Core.config
Sitecore.ContentSearch.Azure.Index.Master.config
Sitecore.ContentSearch.Azure.Index.Web.config
Sitecore.Marketing.Azure.Index.Master.config
Sitecore.Marketing.Azure.Index.Web.config
Sitecore.Marketing.Azure.IndexConfiguration.config
Sitecore.Marketing.Definitions.MarketingAssets.Repositories.Azure.Index.Master.config
Sitecore.Marketing.Definitions.MarketingAssets.Repositories.Azure.Index.Web.config
Sitecore.Marketing.Definitions.MarketingAssets.Repositories.Azure.IndexConfiguration.config

Disable the following files in App_Config/Include:

ContentTesting/Sitecore.ContentTesting.Lucene.IndexConfiguration.config
FXM/Sitecore.FXM.Lucene.DomainsSearch.DefaultIndexConfiguration.config
FXM/Sitecore.FXM.Lucene.DomainsSearch.Index.Master.config
FXM/Sitecore.FXM.Lucene.DomainsSearch.Index.Web.config
ListManagement/Sitecore.ListManagement.Lucene.Index.List.config
ListManagement/Sitecore.ListManagement.Lucene.IndexConfiguration.config
Social/Sitecore.Social.Lucene.Index.Analytics.Facebook.config
Social/Sitecore.Social.Lucene.Index.Master.config
Social/Sitecore.Social.Lucene.Index.Web.config
Social/Sitecore.Social.Lucene.IndexConfiguration.config
Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config
Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.Xdb.config
Sitecore.ContentSearch.Lucene.Index.Analytics.config
Sitecore.ContentSearch.Lucene.Index.Core.config
Sitecore.ContentSearch.Lucene.Index.Master.config
Sitecore.ContentSearch.Lucene.Index.Web.config
Sitecore.Marketing.Definitions.MarketingAssets.Repositories.Lucene.Index.Master.config
Sitecore.Marketing.Definitions.MarketingAssets.Repositories.Lucene.Index.Web.config
Sitecore.Marketing.Definitions.MarketingAssets.Repositories.Lucene.IndexConfiguration.config
Sitecore.Marketing.Lucene.Index.Master.config
Sitecore.Marketing.Lucene.Index.Web.config
Sitecore.Marketing.Lucene.IndexConfiguration.config
Sitecore.Speak.ContentSearch.Lucene.config
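
With this many files to switch, renaming them by hand is error-prone. The sketch below shows one way to toggle a file, assuming the common Sitecore convention that an inactive config file carries an extra suffix such as ".disabled" (some files ship with ".example" instead, so check your installation). The class name, method names and the file name used in Main are hypothetical and exist only for illustration.

```csharp
using System;
using System.IO;

public static class ConfigToggle
{
    // Assumption: inactive files end in this suffix, e.g. "X.config.disabled".
    public const string DefaultSuffix = ".disabled";

    // Removes the suffix so the file is picked up; returns the resulting path.
    public static string Enable(string path, string suffix = DefaultSuffix)
    {
        if (!path.EndsWith(suffix, StringComparison.OrdinalIgnoreCase))
            return path; // already enabled
        var target = path.Substring(0, path.Length - suffix.Length);
        File.Move(path, target);
        return target;
    }

    // Appends the suffix so the file is ignored; returns the resulting path.
    public static string Disable(string path, string suffix = DefaultSuffix)
    {
        if (path.EndsWith(suffix, StringComparison.OrdinalIgnoreCase))
            return path; // already disabled
        var target = path + suffix;
        File.Move(path, target);
        return target;
    }

    public static void Main()
    {
        // Work in a throwaway folder; the file name is a made-up example.
        var dir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(dir);
        var file = Path.Combine(dir, "Example.Azure.Index.config.disabled");
        File.WriteAllText(file, "<configuration />");

        var enabled = Enable(file);
        Console.WriteLine(Path.GetFileName(enabled));
        var disabled = Disable(enabled);
        Console.WriteLine(Path.GetFileName(disabled));

        Directory.Delete(dir, true);
    }
}
```

Calling Enable for each path in the Azure list and Disable for each path in the Lucene list would apply the whole switch in one pass. Take a backup of App_Config/Include first.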


Code Changes

The web.config contains an <appSettings> value for setting the search provider. The supported providers are Lucene, Solr and Azure. Set this to Azure:

<add key="search:define" value="Azure" />

The next steps are to update the Habitat code to work with Azure. There are 2 configuration files that are set up to add additional fields to the Lucene index. These files need to be updated to conform to the Azure configuration requirements for adding fields to the Azure Cloud Index.

Open the configuration file located in "App_Config\Include\Foundation\" for Foundation.LocalDatasource.config and replace the <contentSearch> section with the following:

<contentSearch>
    <indexConfigurations>
        <defaultCloudIndexConfiguration type="Sitecore.ContentSearch.Azure.CloudIndexConfiguration, Sitecore.ContentSearch.Azure">
            <documentOptions>
                <fields hint="raw:AddComputedIndexField">
                    <field fieldName="local_datasource_content" storageType="NO" indexType="TOKENIZED">Sitecore.Foundation.LocalDatasource.Infrastructure.Indexing.LocalDatasourceContentField, Sitecore.Foundation.LocalDatasource</field>
                </fields>
            </documentOptions>
        </defaultCloudIndexConfiguration>
    </indexConfigurations>
</contentSearch>

Open the configuration file located in "App_Config\Include\Foundation\" for Foundation.Indexing.config and replace the <contentSearch> section with the following:

<contentSearch>
    <indexConfigurations>
        <defaultCloudIndexConfiguration type="Sitecore.ContentSearch.Azure.CloudIndexConfiguration, Sitecore.ContentSearch.Azure">
            <fieldMap type="Sitecore.ContentSearch.Azure.FieldMaps.CloudFieldMap, Sitecore.ContentSearch.Azure">
                <fieldNames hint="raw:AddFieldByFieldName">
                    <field fieldName="all_templates" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.Collections.Generic.List`1[[System.String, mscorlib]]" settingType="Sitecore.ContentSearch.Azure.CloudSearchFieldConfiguration, Sitecore.ContentSearch.Azure">
                        <Analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
                    </field>
                    <field fieldName="has_presentation" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.Boolean" settingType="Sitecore.ContentSearch.Azure.CloudSearchFieldConfiguration, Sitecore.ContentSearch.Azure" />
                    <field fieldName="has_search_result_formatter" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.Boolean" settingType="Sitecore.ContentSearch.Azure.CloudSearchFieldConfiguration, Sitecore.ContentSearch.Azure"  />
                    <field fieldName="search_result_formatter" storageType="YES" indexType="UNTOKENIZED" vectorType="NO" type="System.String" settingType="Sitecore.ContentSearch.Azure.CloudSearchFieldConfiguration, Sitecore.ContentSearch.Azure" />
                </fieldNames>
            </fieldMap>
            <virtualFields type="Sitecore.ContentSearch.VirtualFieldProcessorMap, Sitecore.ContentSearch">
                <processors hint="raw:AddFromConfiguration">
                    <add fieldName="content_type" type="Sitecore.Foundation.Indexing.Infrastructure.Fields.SearchResultFormatterComputedField,Sitecore.Foundation.Indexing"/>
                </processors>
            </virtualFields>
            <documentOptions type="Sitecore.ContentSearch.Azure.CloudSearchDocumentBuilderOptions,Sitecore.ContentSearch.Azure" >
                <fields hint="raw:AddComputedIndexField">
                    <field fieldName="has_presentation" storageType="no" indexType="untokenized">Sitecore.Foundation.Indexing.Infrastructure.Fields.HasPresentationComputedField, Sitecore.Foundation.Indexing</field>
                    <field fieldName="all_templates" storageType="no" indexType="untokenized">Sitecore.Foundation.Indexing.Infrastructure.Fields.AllTemplatesComputedField, Sitecore.Foundation.Indexing</field>
                    <field fieldName="has_search_result_formatter" storageType="no" indexType="untokenized">Sitecore.Foundation.Indexing.Infrastructure.Fields.HasSearchResultFormatterComputedField, Sitecore.Foundation.Indexing</field>
                    <field fieldName="search_result_formatter" storageType="no" indexType="untokenized">Sitecore.Foundation.Indexing.Infrastructure.Fields.SearchResultFormatterComputedField, Sitecore.Foundation.Indexing</field>
                </fields>
            </documentOptions>
        </defaultCloudIndexConfiguration>
    </indexConfigurations>
</contentSearch>

You also need to add additional lines to the Sitecore <settings> section to set the default search index to the Azure Search provider. The <settings> should be set as follows:

<settings>
    <setting name="ContentSearch.ParallelIndexing.Enabled" value="true" />
    <setting name="ContentSearch.DefaultIndexType">
        <patch:attribute name="value">Sitecore.ContentSearch.Azure.CloudSearchProviderIndex, Sitecore.ContentSearch.Azure</patch:attribute>
    </setting>
    <setting name="ContentSearch.DefaultIndexConfigurationPath">
        <patch:attribute name="value">contentSearch/indexConfigurations/defaultCloudIndexConfiguration</patch:attribute>
    </setting>
</settings>

After making these configuration changes, run the Indexing Manager to create the search indexes, including the newly mapped field names defined above.

Running into Issues

The Sitecore Azure Search provider comes with a set of limitations in comparison to Solr and Lucene. This means that you will get different search results when using Azure compared to Solr/Lucene.

When debugging the searchService.cs code, I found that it was erroring out at the following line:

rootPredicates = rootPredicates.Or(item => item.Path.StartsWith(provider.Root.Paths.FullPath));

Due to the limitations of Sitecore Search with Azure, using "StartsWith" will match terms that are located in any part of the field value. This is not the desired result in this instance, so this line needed to be amended.

I tested updating this to use "Contains" and passed in the Root.ID rather than the FullPath string value:

rootPredicates = rootPredicates.Or(item => item.Paths.Contains(provider.Root.ID));

This was still causing an error, and after investigating the data stored in the sitecore_master_index I could see that the field "path_1" did not keep the item ID in its original format.

For example, the Sitecore Root.ID of

{11111111-1111-1111-1111-111111111111}

was stored in the index as

11111111111111111111111111111111

I decided to test updating the predicate to reference the "path_1" field directly. First, I needed to strip the Root.ID value to match the format saved in the index and then use this stripped value in the predicate.

var stripID = provider.Root.ID.ToString().Replace("{", "").Replace("-", "").Replace("}", "");
rootPredicates = rootPredicates.Or(item => item["path_1"].Contains(stripID));

This allowed the searchService to execute without error.
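
The chained Replace calls can also be expressed with the standard .NET Guid type. The sketch below is plain .NET with no Sitecore dependency. The class and method names are hypothetical, and it assumes the index stores the ID as 32 hexadecimal digits without braces or hyphens, as in the example above.

```csharp
using System;

public static class IndexIdFormat
{
    // Converts "{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}" to 32 hex digits.
    // Guid.Parse accepts the braced form; the "N" format specifier emits
    // lowercase digits with no braces or hyphens.
    public static string ToIndexFormat(string sitecoreId)
    {
        return Guid.Parse(sitecoreId).ToString("N");
    }

    public static void Main()
    {
        // Prints 11111111111111111111111111111111
        Console.WriteLine(ToIndexFormat("{11111111-1111-1111-1111-111111111111}"));
    }
}
```

Note that "N" produces lowercase digits, whereas the Replace approach keeps whatever casing the original string had. The post does not say which casing the index uses, so inspect a stored document and add ToUpperInvariant() if needed. Sitecore's ID type also exposes a ToShortID() method that may serve the same purpose, but I have not verified its output against the index here.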

Testing

I then set about testing the search. The main drawback I found was with phrase searches. If you stick to searching for single words, the search works as expected. When searching for multiple words, each word is searched separately and the results are then combined. The results are not restricted to items that match the full phrase.

If you search for the term "contact us", you will get results that contain "contact" and results that contain "us". An About Us page could be returned in the results because it contains the word "us", even if it does not contain the word "contact". This limitation should be taken into account when considering Azure Search with Sitecore.
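
The difference between the two behaviours can be shown with a toy in-memory model. This is only an illustration of any-term matching versus phrase matching. It is not how Azure Search or the Sitecore provider implements queries, and the class, method names and sample page titles are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PhraseMatchDemo
{
    // Lowercases the text and splits it into words on spaces.
    private static string[] Tokens(string text) =>
        text.ToLowerInvariant().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

    // Any-term: a document matches if it contains at least one query word.
    public static List<string> MatchAny(IEnumerable<string> docs, string query)
    {
        var terms = Tokens(query);
        return docs.Where(d => terms.Any(t => Tokens(d).Contains(t))).ToList();
    }

    // Phrase: a document matches only if the words appear together, in order.
    public static List<string> MatchPhrase(IEnumerable<string> docs, string query)
    {
        // Pad with spaces so that only whole words can match.
        var phrase = " " + string.Join(" ", Tokens(query)) + " ";
        return docs
            .Where(d => (" " + string.Join(" ", Tokens(d)) + " ").Contains(phrase))
            .ToList();
    }

    public static void Main()
    {
        var docs = new[] { "Contact us today", "About us", "Our services" };

        // Prints: Contact us today, About us
        Console.WriteLine(string.Join(", ", MatchAny(docs, "contact us")));

        // Prints: Contact us today
        Console.WriteLine(string.Join(", ", MatchPhrase(docs, "contact us")));
    }
}
```

The first call mirrors what I observed with the Azure provider: "About us" is returned purely because it contains "us". The second call is the result a user searching for a phrase would normally expect.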

About the Author

Patricia Murphy

Patricia is a Senior Developer at Arekibo. She is a certified developer with numerous technologies including Sitecore, Sitefinity and Kentico.