Thursday, September 15, 2011

Windows Search and Xml files

When you use Windows Search to search the contents of XML files, you will find that not all of the text in an XML file is indexed.

For example, if you have the following xml file:
<myelement attrib="myattrib">
myelementtext
</myelement>
You will be able to find it by searching for “myelementtext”, but searching for any of the terms below will fail:
  • <myelement
  • "<myelement"
  • \<myelement
  • "\<myelement"
  • %myelement
  • ?myelement
  • [<]myelement
  • xpath:\\myelement
Apparently, Windows Search only supports prefix matching on a term (e.g. "test*"). On top of that, the XML filter only outputs the text terms within the XML document (in our example, only the terms myattrib and myelementtext are indexed). The rest, such as element and attribute names, is markup that is not typically user facing.
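
To make the difference concrete, here is a rough Python illustration of which terms each kind of filter would hand to the indexer for the sample file. This is only a sketch under assumptions: the function names are hypothetical, the `\w+` tokenizer is a crude stand-in for the real word breaker, and it assumes the XML filter emits element text and attribute values only, as the search results above suggest. It is not how Windows Search is actually implemented.

```python
import re
import xml.etree.ElementTree as ET

SAMPLE = '<myelement attrib="myattrib">\nmyelementtext\n</myelement>'


def words(text):
    """Crude stand-in for a word breaker: lowercase alphanumeric runs."""
    return set(re.findall(r"\w+", text.lower()))


def xml_filter_terms(xml_text):
    """Assumed XML filter behavior: element text and attribute values only."""
    terms = set()
    for element in ET.fromstring(xml_text).iter():
        terms |= words(element.text or "")
        terms |= words(element.tail or "")
        for value in element.attrib.values():
            terms |= words(value)
    return terms


def plain_text_terms(xml_text):
    """Assumed plain text filter behavior: every word in the raw file."""
    return words(xml_text)


if __name__ == "__main__":
    print(sorted(xml_filter_terms(SAMPLE)))
    print(sorted(plain_text_terms(SAMPLE)))
```

Under these assumptions, the element name "myelement" only shows up in the second set, which is why it cannot be found while the XML filter is in use.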

To have all text in an XML file indexed, we need to configure the .xml extension to use the plain text filter.
This is (currently) not possible through the UI, so you will need to modify the registry.

Navigate to HKEY_CLASSES_ROOT\.xml\PersistentHandler and change the default value to {5e941d80-bf96-11cd-b579-08002b30bfeb}

You will have to rebuild the index (Control Panel / Indexing Options / Advanced / Rebuild) to re-index the XML files.

Alternatively, copy the lines below into a file with a .reg extension and double-click it to update the registry.
Windows Registry Editor Version 5.00

[HKEY_CLASSES_ROOT\.xml\PersistentHandler]
@="{5e941d80-bf96-11cd-b579-08002b30bfeb}"
To verify that the plain text filter is being used, go to Control Panel / Indexing Options, select Advanced and, on the 'File Types' tab, check that 'Plain Text Filter' is listed for the 'xml' extension.
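
If you prefer to check the registry value itself rather than the UI, a minimal Python sketch could look like the one below. It uses the standard `winreg` module (Windows only); the helper names `persistent_handler` and `uses_plain_text_filter` are made up for this example. It only confirms what is stored under PersistentHandler, not what the indexer has actually picked up, so the check in Indexing Options remains the authoritative one.

```python
import sys

# GUID from this post: the handler that maps to the plain text filter.
PLAIN_TEXT_FILTER = "{5e941d80-bf96-11cd-b579-08002b30bfeb}"


def persistent_handler(extension):
    """Return the default value of HKCR\\<extension>\\PersistentHandler, or None."""
    import winreg  # imported lazily because it only exists on Windows

    path = extension + "\\PersistentHandler"
    try:
        with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, path) as key:
            value, _value_type = winreg.QueryValueEx(key, "")  # "" = default value
            return value
    except FileNotFoundError:
        return None


def uses_plain_text_filter(value):
    """Compare a registry value against the plain text GUID, ignoring case."""
    return value is not None and value.strip().lower() == PLAIN_TEXT_FILTER


if __name__ == "__main__" and sys.platform == "win32":
    current = persistent_handler(".xml")
    print(".xml PersistentHandler:", current)
    print("Plain text filter configured:", uses_plain_text_filter(current))
```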


Note that you can also have other file types (xsl, xslt, xsd, ...) use the plain text filter by modifying the associated registry key.

e.g. for xslt
Windows Registry Editor Version 5.00

[HKEY_CLASSES_ROOT\.xslt\PersistentHandler]
@="{5e941d80-bf96-11cd-b579-08002b30bfeb}"
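
If you want to switch several extensions at once, the .reg content can be generated instead of typed by hand. The Python sketch below is one way to do that; `build_reg_file` is a hypothetical helper name, and the extension list is just an example. It only builds the text, so saving it to a .reg file and importing it is still up to you.

```python
# GUID from this post: the handler that maps to the plain text filter.
PLAIN_TEXT_FILTER = "{5e941d80-bf96-11cd-b579-08002b30bfeb}"


def build_reg_file(extensions):
    """Build .reg text that points each extension at the plain text filter."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for extension in extensions:
        extension = "." + extension.lstrip(".")  # accept "xsl" as well as ".xsl"
        lines.append("[HKEY_CLASSES_ROOT\\{}\\PersistentHandler]".format(extension))
        lines.append('@="{}"'.format(PLAIN_TEXT_FILTER))
        lines.append("")  # blank line between keys
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_reg_file(["xml", "xsl", "xslt", "xsd"]))
```

As with the single-extension version, the index has to be rebuilt after importing the file.
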
For info on how to enable Windows Search in Windows Server 2008 (R2), see: http://blog.sdbonline.com/2012/02/update-windows-search-and-xml-files-on.html

2 comments:

  1. Windows 7? Windows Server 2008? R2? Datacenter, Webserver, core...? I've tried it on two different 2k8 R2 64bit boxes, and still cannot get all the contents of xml files to index. ARGH, thanks tho!

    1. Shaun,

      I've reinstalled my laptop recently (Windows Server 2008 R2), so I was able to quickly test this for you.

      This post refers to the 'Windows Search Service'. It's included in Windows 7 by default, but must be added to Windows Server 2008 R2 by adding the 'File Server Role' and selecting the 'Windows Search Service' under 'Role Services'.
      Also, make sure to rebuild the index after modifying the registry, and wait until indexing is complete so that all your files are indexed. I put my test file in the root folder and apparently it got indexed last, as it didn't show up in the search results until indexing had completely finished.

      I added an update post to specify this part in detail: http://blog.sdbonline.com/2012/02/update-windows-search-and-xml-files-on.html
