Windows Search
Windows Search is an indexed desktop search platform[1] released by Microsoft for the Windows operating system. It is available as part of the Windows Vista and Windows Server 2008 operating systems, where it is known as Windows Search (also referred to as Instant Search).[2] For Windows XP and Windows Server 2003, the indexed search technologies are available as Windows Desktop Search (WDS). While older versions of WDS were also available for Windows 2000, the latest release (version 3) is available only for Windows XP and Windows Server 2003.[3] Windows Search collectively refers to both platforms, which not only share a common architecture and indexing technology[1] but are also API-compatible with one another.
While Windows 2000 through Windows Server 2003 included the Indexing Service, WDS and Windows Search are separate from it. Indexing Service, a remnant of the planned Object File System feature of the ill-fated Cairo project, lacked its own UI and had to be used from either the Windows Explorer search UI or an MMC snap-in. WDS uses a different architecture and a new indexer. In fact, Windows Search replaces Indexing Service in newer operating systems.
Windows Desktop Search
Developer(s) | Microsoft |
---|---|
Stable release | 3.01 / February 20, 2007 |
Operating system | Windows XP/Server 2003 |
License | Proprietary EULA |
Website | Windows Desktop Search Website |
WDS was initially released as MSN Desktop Search, as part of the MSN Toolbar suite. It was renamed Windows Desktop Search with version 2, while still being distributed with the MSN Toolbar suite. Version 2.6 came in two editions, one for home users and the other for enterprise use. The only difference between the two was that the latter could be configured via Group Policy. The home edition was bundled with MSN Toolbar, while the enterprise edition was available as a standalone application. Later, when MSN Toolbar was discontinued in favor of Windows Live Toolbar, the home edition of WDS was discontinued as well. With version 3, the indexer was implemented as a Windows service rather than as a per-user application, so that a single instance of the service and a single index can be shared across all users, thereby improving performance.
Overview
Upon installation, Windows Desktop Search builds an index of the files on a user's hard drive. The initial creation of this index can take up to several hours, but this is a one-time event. Once the index is complete, Windows Desktop Search can use it to return results more rapidly than it could by searching through all the files on the computer. Searches are performed not only on file names, but also on the contents of the file (provided a proper handler for the file type is installed) as well as the keywords, comments and metadata the file might be tagged with. For example, searching the computer for The Beatles would return a list of the Beatles music on the computer, as well as any e-mails and documents that include the phrase "The Beatles" in their titles or contents. WDS also features word-wheeled search (or search-as-you-type): it begins searching as soon as characters are entered in the search box, and keeps refining and filtering the results as more characters are typed. As a result, the required files are often found before the full search text has been entered.
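As a rough illustration of why a prebuilt index answers queries faster than rescanning files, and of how search-as-you-type narrows the results, the following Python sketch builds a toy inverted index. It is not how WDS is implemented; the class `TinyIndex`, its methods and the sample item names are all hypothetical.

```python
from collections import defaultdict
import re


class TinyIndex:
    """Toy inverted index: maps each lowercase word to the items containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, item, text):
        # Tokenize once at indexing time so queries never rescan the text.
        for word in re.findall(r"\w+", text.lower()):
            self._postings[word].add(item)

    def search_prefix(self, typed):
        """Return items that contain, for every typed term, a word starting with it."""
        result = None
        for term in typed.lower().split():
            hits = set()
            for word, items in self._postings.items():
                if word.startswith(term):
                    hits |= items
            result = hits if result is None else result & hits
        return sorted(result or [])


index = TinyIndex()
index.add("abbey_road.mp3", "The Beatles Abbey Road")
index.add("mail_042.eml", "Tickets for the Beatles tribute night")
index.add("budget.xls", "Quarterly budget")

# Simulate search-as-you-type: results narrow as more characters arrive.
for typed in ("b", "be", "beatles a"):
    print(typed, "->", index.search_prefix(typed))
```

A real indexer would use a sorted term dictionary or a trie so that prefix lookups do not have to visit every word; the linear scan here only keeps the sketch short.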
WDS by default includes handlers for most common file types, including Word documents, Excel spreadsheets, PowerPoint presentations, HTML documents, text files, MP3 and WMA music files, WMV, ASF and AVI videos, and JPEG, BMP and PNG images, among others.[4] WDS also supports IFilters, a defined interface that can be implemented by vendors of unsupported formats. Once a file format has an associated IFilter, WDS uses the IFilter to extract the text of the document in order to index it.[5] It uses property handlers to handle metadata from file formats. A property handler needs a property description and a schema for the property for WDS to index the metadata.[6] Protocol handlers are used for indexing specific data stores. For example, files are accessed using the File System Protocol Handler, Outlook data stores using the Outlook Protocol Handler, and the IE cache using the IE History/Cache Protocol Handler.[7] Network shares can be added to the index by installing the appropriate protocol handler.[8]
WDS provides an API to let other applications leverage its searching capabilities. Using this API, other applications can query the index based on particular parameters (including restricting the search to only a few file types or data stores), on either the entire index or a subset of it; the results are returned to the application, which can then either present them to the user or process them further. WDS 3.0 also adds the ability for applications to push items to it for indexing, rather than waiting for the indexer to crawl them. Microsoft Office Outlook 2007 and Microsoft Office OneNote 2007 use this ability to index the items they manage, and use the WDS search capabilities to provide their in-application searching features. As such, they require WDS 3.0 (or Windows Search in Windows Vista) to be installed and running in order to use the enhanced search capabilities within the applications.
Advanced Query Syntax
Windows Desktop Search supports advanced queries as well, expressed via the Advanced Query Syntax (AQS).[9] AQS defines certain keywords which can be used to refine the search query, such as specifying boolean operations on searched terms (AND, OR, NOT) as well as to specify further filters based on file metadata or file type. It can also be used to limit results from specific information stores like regular files, offline files cache, or email stores. File type specific operators are available as well.[10] WDS also supports wildcard searches.[11] It also includes several SQL-like operators like GROUP BY.
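To make the shape of such queries concrete, the following Python sketch composes AQS strings from keyword filters and boolean operators. The helper names (`aqs_filter`, `aqs_and`, `aqs_or`) are hypothetical, and the keywords used (`kind`, `author`, `ext`) are given only as examples; the AQS reference should be consulted for the keywords a particular version supports.

```python
def aqs_filter(keyword, value):
    """Format one keyword:value filter, quoting values that contain whitespace."""
    if any(ch.isspace() for ch in value):
        value = '"' + value + '"'
    return keyword + ":" + value


def aqs_and(*parts):
    """Require every sub-query to match."""
    return " AND ".join(parts)


def aqs_or(*parts):
    """Require at least one sub-query to match; parenthesized for safe nesting."""
    return "(" + " OR ".join(parts) + ")"


# Items mentioning "beatles" that are either music files or e-mails.
by_kind = aqs_and(
    "beatles",
    aqs_or(aqs_filter("kind", "music"), aqs_filter("kind", "email")),
)

# Documents with a given author and file extension.
by_author = aqs_and(aqs_filter("author", "John Smith"), aqs_filter("ext", ".doc"))

print(by_kind)
print(by_author)
```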
UI
The WDS functionality is exposed via a taskbar-mounted deskbar. It provides a text field in which to type the query, and the results are presented in a flyout pane. WDS also provides a Windows Explorer window for searching. On selecting a file in the Explorer window, a preview of the file is shown on the right-hand side of the window, without opening the application that created the file. Web searches can be initiated from both interfaces, but doing so opens the browser to search for the terms using the default search engine.
The WDS deskbar can also create application aliases, which are short strings that can be set to open different applications. This functionality is accessed by prefixing the predefined string with the ! character. For example, "!calc" opens the Windows Calculator. This feature can also be used to create shortcuts for URLs which, when entered, open the specified URL in the browser. An alias can also pass parameters in the URL, which makes it possible to create search aliases. For example, "w text" can be configured to search for "text" in Wikipedia.
Windows Search
Developer(s) | Microsoft |
---|---|
Operating system | Windows Vista/Server 2008 |
License | Proprietary EULA |
Website | Windows Vista Features: Instant Search |
Windows Search is the indexed search platform that debuted with Windows Vista, and offers a superset of the features provided by WDS. It is also API-compatible with version 3.0 of WDS, so that applications that leverage the search capabilities of one will work, without any modification, with the other. Like WDS, the indexer runs as a service, known as Windows Search. Unlike WDS, though, the indexer performs its I/O operations with low priority, and the process itself also runs with low priority. As a result, whenever other processes require I/O bandwidth or processor time, they are able to pre-empt the indexer, thereby significantly reducing the performance hit associated with the indexer running in the background. Windows Search also includes a number of features that are not available with WDS.
Architecture
Windows Search is implemented as a Windows Service which implements the Windows Search runtime and APIs, as well as acting as host for the index stores and controlling the components. The most important component of Windows Search is the Indexer, which crawls the file system and creates and maintains the index of the data. It achieves this using three processes:[12]
- SearchIndexer.exe, which hosts the indexes and the list of URIs that require indexing, as well as exposes the external APIs that other applications use to leverage the Windows Search features.
- SearchProtocolHost.exe, which hosts the protocol handlers. It runs with the least permission required for the protocol handler. For example, when accessing the file system, it runs with the credentials of the system account, but when accessing network shares, it runs with the credentials of the user.
- SearchFilterHost.exe, which hosts the IFilter and property handlers to extract metadata and textual content. It is a low integrity process, which means that it does not have any permission to change the system settings. So, even if it encounters files with malicious content, and by any chance if they manage to take over the process, they will not be able to change any system settings.
The Indexer consists of two components, the Gatherer and the Merger.[13] The Gatherer retrieves the list of URIs that need to be crawled and invokes the proper protocol handler to access the store that hosts each URI, and then the proper property handler (to extract metadata) and IFilter (to extract the document text). Different indices are created during different runs; it is the job of the Merger to periodically merge them.[13] While indexing, the indices are generally maintained in memory and then flushed to disc after a merge, to reduce disc I/O. The metadata is stored in the property store, which is a database maintained by the ESE database engine.[13] The text is stored in a custom database built using inverted indices.[13] Apart from the indices and property store, another persistent data structure is maintained: the Gather Queue.[13] The Gather Queue holds, in FIFO order, the list of URIs that need indexing; the Gatherer reads URIs from this queue. The Indexer also includes another component, the Resource Monitor, which monitors the available resources and controls the indexer. It has three states:[13]
- Running: In this state, the indexer runs without any restrictions. The indexer runs in this state only when there is no contention for resources.
- Throttled: In this state, the crawling of URIs and extraction of text and metadata is deliberately throttled, so that the number of operations per minute is kept under tight control. The indexer is in this state when there is contention for resources, for example, when other applications are running. Throttling the operations ensures that other applications are not starved of the resources they might need.
- Backed off: In this state, no indexing is done. Only the Gather Queues are kept active so that items do not go unindexed. This state is activated on extreme resource shortage (less than 5 MB of RAM or 200 MB of disc space), when the computer is on battery power and indexing on battery is disabled, or when the indexer is manually paused.
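The three states can be sketched as a small decision function. This is only an illustration of the rules listed above, not the actual Resource Monitor logic: the thresholds come from the text, while the function name, parameter names and the order in which the conditions are checked are assumptions.

```python
from enum import Enum


class IndexerState(Enum):
    RUNNING = "running"
    THROTTLED = "throttled"
    BACKED_OFF = "backed off"


# Thresholds quoted in the text for "extreme resource shortage".
MIN_FREE_RAM_MB = 5
MIN_FREE_DISK_MB = 200


def choose_state(free_ram_mb, free_disk_mb, contention=False,
                 on_battery=False, index_on_battery=True, paused=False):
    """Pick an indexer state from a snapshot of (hypothetical) resource readings."""
    # Backed off: manual pause, extreme shortage, or battery with indexing disabled.
    if paused:
        return IndexerState.BACKED_OFF
    if free_ram_mb < MIN_FREE_RAM_MB or free_disk_mb < MIN_FREE_DISK_MB:
        return IndexerState.BACKED_OFF
    if on_battery and not index_on_battery:
        return IndexerState.BACKED_OFF
    # Throttled: other work is competing for the same resources.
    if contention:
        return IndexerState.THROTTLED
    # Running: nothing else needs the resources.
    return IndexerState.RUNNING


print(choose_state(free_ram_mb=512, free_disk_mb=10_000))
print(choose_state(free_ram_mb=512, free_disk_mb=10_000, contention=True))
print(choose_state(free_ram_mb=3, free_disk_mb=10_000))
```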
The Windows Search service is notified of items to index by means of the Notifications API component.[13] Applications use the component to supply the URIs of the items that need to be indexed, and the URIs are written to the Gather Queue, from which they are read by the indexer. Windows Search includes the USN Journal Notifier component, which monitors the Change Journal of an NTFS volume (maintained by the OS itself)[14] to find files that have changed recently. If a changed file does not have the FANCI (File Attribute Not Content Indexed) attribute set,[13] the Windows Search service is notified of its path via the Notifications API. Other applications, such as Microsoft Office Outlook and Microsoft Office OneNote, use the same APIs to notify the service of any emails or notes that have changed and need indexing. The Configuration APIs are used to specify configuration settings, such as the roots of the URIs that need to be monitored, and to view status information such as the number of items indexed, the length of the gather queue, or the reason for throttling the indexer.[13] Windows Search also exposes APIs to register protocol handlers (via the ISearchProtocol interface), property handlers (via the IPropertyStore interface) or IFilter implementations (via the IFilter interface). IFilter implementations allow only extraction of text, whereas IPropertyStore allows reading as well as modifying properties, both in the file and in the property store database.[13]
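The notification flow described above (changed files are filtered on the FANCI attribute and then appended to a FIFO gather queue) can be sketched in Python as follows. The `GatherQueue` class and the `notify_changed_files` function are hypothetical stand-ins rather than the Windows Search APIs, and the change list is supplied by hand instead of being read from a real change journal. The attribute constants come from Python's standard `stat` module.

```python
import stat
from collections import deque


class GatherQueue:
    """Hypothetical FIFO queue of items waiting to be indexed."""

    def __init__(self):
        self._items = deque()

    def notify(self, uri):
        # New work is always appended at the tail.
        self._items.append(uri)

    def next_uri(self):
        # The gatherer consumes from the head, oldest first.
        return self._items.popleft() if self._items else None

    def __len__(self):
        return len(self._items)


def notify_changed_files(changed, queue):
    """Queue every changed file unless it is flagged as not content indexed.

    `changed` is an iterable of (path, attribute_bits) pairs, standing in for
    records read from a change journal.
    """
    for path, attributes in changed:
        if attributes & stat.FILE_ATTRIBUTE_NOT_CONTENT_INDEXED:
            continue  # FANCI is set: the owner opted this file out of indexing
        queue.notify(path)


queue = GatherQueue()
notify_changed_files(
    [
        (r"C:\Docs\report.doc", stat.FILE_ATTRIBUTE_ARCHIVE),
        (r"C:\Temp\scratch.tmp",
         stat.FILE_ATTRIBUTE_ARCHIVE | stat.FILE_ATTRIBUTE_NOT_CONTENT_INDEXED),
        (r"C:\Docs\notes.txt", stat.FILE_ATTRIBUTE_ARCHIVE),
    ],
    queue,
)
print(len(queue), "items queued")
```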
The OLE DB/SQL API implements the functionality for searching and querying across the indices and property stores. Queries are expressed in a variant of SQL (regular SQL with certain restrictions), and results are returned as OLE DB Rowsets.[13] Whenever a query is executed, the parts of the index it used are temporarily cached, so that further searches filtering the result set need not access the disc, which improves performance.
Searching
Like WDS, Windows Search supports word-wheeling as well as Advanced Query Syntax. However, unlike WDS, Windows Search can seamlessly mix indexed and non-indexed files. If the location being searched is indexed, the search is performed against the index. If it is not indexed, the files are processed on the fly with the same IFilters and property handlers that would be used if they were indexed. This allows for more consistent results, though at the cost of search speed. Windows Search also indexes offline caches of network shares.[1] Windows Search can also index removable drives; the index is kept on the drive itself, so that it is accessible whenever the drive is connected, and subsequent sessions reuse the stored index.
Windows Search can also natively search the feed store of the Windows RSS Platform, and is thus able to search downloaded web feeds. It also natively supports the file types used by the built-in applications in Windows Vista. Natural language search is also supported, so the user can search for things like "photo taken last week" or "email sent from Dave". However, this is disabled by default.[15] Natural language search expresses queries in Natural Query Syntax (NQS), which is the natural language equivalent of AQS.
The Windows Search index can be accessed programmatically using both managed and native code.[16] Native code connects to the index catalog by using a Data Source Object retrieved from the Windows Vista shell's Indexing Service OLE DB provider. Managed code uses the MSIDXS ADO.NET provider with the index catalog name. A catalog on a remote machine can also be specified using a UNC path. The criteria for the search are specified using a SQL-like syntax.
The default catalog is called SystemIndex, and it stores all the properties of indexed items with a predefined naming pattern. For example, the name and location of documents in the system are exposed as a table with the column names System.ItemName and System.ItemURL respectively.[17] An SQL query can directly refer to these tables and index catalogs and use the MSIDXS provider to run queries against them. The search index can also be used via OLE DB, using the CollatorDSO provider.[18] However, the OLE DB provider is read-only, supporting only SELECT and GROUP ON SQL statements.
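As an illustration of what such a query might look like, the following Python sketch assembles a SELECT statement against the SystemIndex catalog using the two column names mentioned above. The helper `build_query` is hypothetical, and both the `CONTAINS` phrase predicate and the connection string shown are assumptions about the provider's SQL dialect that should be checked against the Windows Search SQL reference. The sketch only builds the text; it does not connect to the index.

```python
# Assumed OLE DB connection string for the CollatorDSO provider; shown for
# context only and not used by the code below.
CONNECTION_STRING = "Provider=Search.CollatorDSO;Extended Properties='Application=Windows';"


def build_query(phrase,
                columns=("System.ItemName", "System.ItemURL"),
                catalog="SystemIndex"):
    """Build a read-only SELECT that matches items containing `phrase`."""
    if '"' in phrase:
        # Embedded double quotes would break the phrase delimiter; a real
        # implementation would need proper escaping rules.
        raise ValueError("phrase must not contain double quotes")
    escaped = phrase.replace("'", "''")  # standard SQL single-quote escaping
    return "SELECT {} FROM {} WHERE CONTAINS('\"{}\"')".format(
        ", ".join(columns), catalog, escaped
    )


sql = build_query("The Beatles")
print(sql)
```

On a Windows machine the resulting text could be passed to an OLE DB or ADO connection opened with the provider; that step is omitted here because it depends on the platform and on the client library used.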
Windows Search, and Windows Desktop Search in downlevel versions, also registers a search-ms application protocol, which can be used to represent searches as URIs.[19] The search parameters and filters are encoded in the URI using AQS, or its natural language counterpart, NQS. When the URI is invoked, Windows Search (which is registered as the handler for the protocol) is invoked with the search query. Windows Search is currently the default handler for the protocol, but with Windows Vista SP1, third-party handlers will be able to register themselves as the protocol handler, so that searches can be performed using any search engine that the user has set as default, and not just Windows Search.
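A search-ms URI can be assembled with ordinary percent-encoding, as in the Python sketch below. The function `search_ms_uri` is hypothetical; the `query` and `crumb=location:` arguments follow the form used in the protocol documentation as best understood here, and should be verified against it before being relied upon.

```python
from urllib.parse import quote


def search_ms_uri(query, location=None):
    """Encode a search (and optionally a folder scope) as a search-ms URI."""
    # Percent-encode everything, including ':' and '\', so the query text
    # cannot be confused with the URI's own separators.
    uri = "search-ms:query=" + quote(query, safe="")
    if location is not None:
        uri += "&crumb=location:" + quote(location, safe="")
    return uri


print(search_ms_uri("the beatles"))
print(search_ms_uri("the beatles", location=r"C:\Music"))
```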
Windows Search also includes a feature known as Federated Search. If the file server hosting a network file share is running either Windows Vista or Windows Server 2008, any searches against the share are federated to the server. The server performs the search against its own index and presents the results to the client system, filtering out the files the user does not have access to. This procedure is totally transparent to the user.[1]
UI
The search functionality is exposed through the search bars in the Start menu and in the upper right-hand corner of Windows Explorer windows, as well as in Open/Save dialog boxes. When searching from the Start menu, the results are shown in the Start menu itself, replacing the list of recently used programs. Searching from the search bar in an Explorer window replaces the content of the current folder with the search results. When searching from the Start menu, it is also possible to launch an application by searching for its executable image name or display name. In the Control Panel, the search bar in the window can also search for Control Panel options. However, unlike WDS, Windows Search does not support creating aliases. There is also a Search Explorer, which is a special Windows Explorer window used for searches. It presents a GUI for specifying the search parameters, including the locations and file types that should be searched, and certain operators, without crafting AQS queries by hand. With Windows Vista SP1, third-party applications will be able to override Search Explorer as the default search interface. Once that is done, the third-party application is launched instead of Search Explorer when search is selected from the Start menu or when the shortcuts that used to invoke Search Explorer are used.[20] However, the Windows Search indexer is not disabled by this, and the search bars continue to use the Windows Search index.
It is also possible to save a search query as a Virtual Folder, called a Saved Search or Search Folder,[1] which, when accessed, runs the search with the saved query and returns the results as a folder listing. Physically, a search folder is just an XML file (with a .search-ms extension) that stores the search query, including the search operators. Windows Vista also supports query composition, where a saved search (called a scope) can be nested within the query string of another search.[21] These virtual folders are also distributable via RSS. Search folders can also be shared as a SearchMelt, which is a search folder that is accessible over a network.[22] Accessing a SearchMelt over a network, as with a regular search folder, makes the results of the search available as a virtual shared folder. The search is performed on the machine that shares the SearchMelt, and returns only the results accessible from the network. However, by default, search folders are scoped for local use only; before sharing, they must be configured for remote access. Microsoft makes a SearchMelt Creator tool available for this purpose.[23]
- See also: Search functionality in Windows Vista
Windows Search 4
Windows Search 4 is the planned successor to the Windows Search platform. It was supposed to be a Windows Live-branded program, codenamed Casino or OneView, but it was later revealed that it will not be part of Microsoft's Windows Live services; instead, it will be merged with the desktop search platform in Windows and released as its successor.[24] It will be able to aggregate searches from various local as well as remote indexes, including the Windows Search index (both local and those of networked systems), the Windows RSS Platform common feed store, and Microsoft Exchange and Microsoft SharePoint indexes, among others.[25] It will also be able to perform searches against web services[26] that use the OpenSearch specification to make search results available as a web feed, and to present the results in a unified interface.[24]
References
- ^ a b c d e "Windows Search Technologies for Business Customers". Retrieved 2007-07-14.
- ^ "Windows Vista: Features Explained: Instant Search". Retrieved 2007-03-16.
- ^ "Windows Desktop Search". Retrieved 2007-03-16.
- ^ "List of searchable file types". Retrieved 2007-06-23.
- ^ "IFilter". Retrieved 2007-06-23.
- ^ "Developing Property Handlers for Windows Search". Retrieved 2007-06-23.
- ^ Brandon Paddock. "FAQ: How does indexing work? What are IFilters and Protocol Handlers?". Retrieved 2007-06-23.
- ^ "Windows Desktop Search: Add-in for Files on Microsoft Networks". Retrieved 2007-07-14.
- ^ "Advanced Query Syntax". MSDN TechNet. Retrieved 2007-06-23.
- ^ Nick White. "Advanced search techniques". Retrieved 2007-06-23.
- ^ "Seek and Ye Shall Find". Retrieved 2007-07-05.
- ^ Brandon Paddock. "FAQ: Why does WDS / Windows Vista use so many processes?". Retrieved 2007-06-23.
- ^ a b c d e f g h i j k "Good Citizenship When Developing Background Services That Run on Windows Vista". Retrieved 2007-07-14.
- ^ "Change Journals (Windows)". Retrieved 2007-07-14.
- ^ "Natural Language Search in Windows Vista". Retrieved 2007-06-22.
- ^ "Searching data". Retrieved 2007-03-17.
- ^ Catherine Heller. "Windows Vista Search: Syntax Update". Retrieved 2007-06-23.
- ^ "Querying the Index Programmatically". Retrieved 2007-06-23.
- ^ "Using the search-ms Protocol". Retrieved 2007-09-24.
- ^ "Overview of the Windows Vista desktop search changes in Windows Vista Service Pack 1". Retrieved 2007-07-14.
- ^ "Query Composition: Building a search upon another search". Retrieved 2007-06-22.
- ^ Nick White. "Searching, part III: Do you know what a SearchMelt is?". Retrieved 2007-06-23.
- ^ "SearchMelt Creator tool". Retrieved 2007-07-14.
- ^ a b Brandon Paddock. "The fate of codename "Casino"". Retrieved 2007-06-14.
- ^ Brandon Paddock. "Where is YOUR stuff?". Retrieved 2007-06-14.
- ^ Brandon LeBlanc. "Open Search". Retrieved 2007-06-14.