
CognatrixSearch Reference Manual
1.0

Contents

  1. Introduction
  2. CognatrixSearch
  3. Minimum requirements
  4. Assumptions
  5. How CognatrixSearch works
  6. Queries
  7. A word about paths
  8. Installing CognatrixSearch
  9. Creating a search-base
  10. Creating a set of templates
  11. Configuring CognatrixSearch
  12. Constructing the HTML for a search
  13. Substitutions in results pages
  14. Configuring CognatrixSearch - the details
  15. Understanding debugging information
  16. About logging
  17. Checking the CGI’s version
  18. Using CognatrixSearch with PHP

Introduction

You have created a thesaurus using Cognatrix. You have used the Generate HTML... command to produce web pages for all your terms. You have published those pages using Mac OS X’s built-in web server. Your users are able to use the buttons and links on those pages to navigate around your thesaurus.

But you feel that something is missing: you would like your users to be able to search for terms in your thesaurus. You could wait for global search engines like Google™ to index your pages, but that approach brings a time-lag, and your work may well be lost in the clutter of a million hits. Ideally, you would prefer a search engine that:

  • Is specific to your thesaurus;
  • Supports simple searching for novice users; and
  • Has advanced features like wild-cards and Boolean expressions for experienced searchers.

Enter CognatrixSearch.

CognatrixSearch

CognatrixSearch is a search engine that helps your users quickly find the terms published in your Cognatrix thesaurus. CognatrixSearch supports advanced features such as Boolean expressions and wild-cards, and is designed to handle multiple thesauri.

Search engines normally consist of two parts:

  • An indexing engine that scans the material to be searched (the “corpus”) and creates an index; and

  • A retrieval engine that accepts queries, checks the index, and returns hits pointing to the original material.

CognatrixSearch uses Apple’s SearchKit technology for both indexing and retrieval. SearchKit is built into every copy of Mac OS X and underpins many system services such as Spotlight.

The indexing engine for CognatrixSearch is built into Cognatrix itself. Whenever you choose the Generate HTML… command, Cognatrix rebuilds the CognatrixSearch index. This keeps the index in sync with the HTML files and means you never have to remember to run a separate indexing process.

The retrieval engine is a Common Gateway Interface (CGI) program. This is web-speak for a program that a web server launches to handle a request. Here, the web server is Apache, which is built into every copy of Mac OS X.

You provide a page where your users can enter a query. This can be as simple as a small text field on your main home page or a dedicated page giving examples of advanced usage. Apache hands the queries to CognatrixSearch. CognatrixSearch processes each query and returns a page of hits which point to the matching terms in your thesaurus. You can customise the appearance of both the initial search page and the results pages.

Minimum requirements

CognatrixSearch runs on Mac OS X 10.4 (Tiger) or later. Although Cognatrix is supported on Mac OS X 10.3.2 (Panther) or later, CognatrixSearch depends on parts of SearchKit that are only available in Tiger. Cognatrix will only create indexes for CognatrixSearch if it is running on Tiger.

The minimum hardware requirements for CognatrixSearch are the same as those for Mac OS X 10.4. In other words, if Tiger will run then so will CognatrixSearch. Naturally, more memory is always better than Tiger’s recommended minimum of 256MB.

CognatrixSearch is released as a Universal Binary. This means that it runs on both PowerPC- and Intel-based Macintosh systems.

Assumptions

  • The Macintosh system on which you plan to install CognatrixSearch is running Mac OS X 10.4 or later (10.4.4 or later recommended).

  • The installation of Apache and its configuration files are as supplied by Apple on the standard Tiger installation discs. If you are working on a tailored system, you may need to obtain assistance from your system administrator.

  • You have administrative privileges. The files and folders installed as part of CognatrixSearch assume that you are a member of the admin group. If you attempt to work through these instructions whilst logged-in as a non-admin user, you may encounter permission problems.

  • You have an HTML-capable text editor installed and you are familiar with its use. We use and recommend BBEdit from BareBones Software for all HTML editing.

    Note: By default, the standard TextEdit application that ships with Mac OS X attempts to interpret HTML files. In other words, it behaves like Safari and shows you the rendered result rather than the raw HTML. If you wish to use TextEdit to edit HTML, you will need to turn on the “Ignore rich text commands in HTML files” option in its “Open and Save” preferences panel.

  • You have sufficient familiarity with HTML to be able to tailor the supplied examples and templates to your own requirements. You are not expected to be an HTML guru but you will need to be able to distinguish between tags and text.

  • You have sufficient familiarity with Unix to be able to recognise and understand path names. Again, you are not expected to be a guru. You are, however, expected to know that “/Library/WebServer/” means the folder named “WebServer” within the folder named “Library” which is at the top level of your startup disk.

  • Familiarity with the Apache web server and general CGI concepts is not essential but will help you understand how CognatrixSearch works.

How CognatrixSearch works

Every search begins with a web page containing a form similar to Figure 1. A “form” is HTML’s way of obtaining information from users via standard user-interface widgets such as text-input boxes, popup menus and radio buttons. Information obtained via a form can be transmitted easily to a web server. The form for a CognatrixSearch query contains both the query itself (eg “health AND wealth”) plus the name of the thesaurus to search and the number of hits to return. Form construction is discussed below under Constructing the HTML for a Search.

Figure 1

Your user types a query into the form and presses return, and Safari sends the form to Apache. Apache, in turn, passes the form to CognatrixSearch.

CognatrixSearch uses the name of the thesaurus contained in the form to find the following:

  • Configuration information specific to your thesaurus;
  • A set of templates to be used to construct results pages; and
  • The SearchKit index for your thesaurus, created by the Generate HTML… command.

CognatrixSearch searches the SearchKit index for terms that match the query and uses the hits plus the templates to build a results page. Each result links to the page for the corresponding term in your thesaurus.

CognatrixSearch returns the completed results page to Apache which sends it back to your user’s browser.
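
The first step of that round trip, decoding the submitted form into named fields, can be sketched in a few lines of Python. This is only an illustration of how URL-encoded form data is unpacked, not the CGI's actual code; the function name `parse_search_form` is hypothetical, while the field names are the ones the form uses.

```python
from urllib.parse import parse_qs

def parse_search_form(body: str) -> dict:
    """Decode a URL-encoded form body into a flat dict of field values."""
    fields = parse_qs(body, keep_blank_values=True)
    # parse_qs returns a list per key; a search form sends one value per field.
    return {name: values[0] for name, values in fields.items()}

form = parse_search_form(
    "QIDSearchQuery=health+AND+wealth"
    "&QIDSearchThesaurus=mythes"
    "&QIDSearchHitLimit=10"
)
print(form["QIDSearchThesaurus"])  # names the configuration, templates and index to use
print(form["QIDSearchQuery"])      # the query handed to the index search
```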

Queries

CognatrixSearch relies on SearchKit to interpret queries, to find matching terms, and to calculate relevance. SearchKit interprets queries like this:

  • Searching for a single word will find terms where the name contains exactly that word. There is no implicit wild-card or stemming. For example, a search for ‘health’ will match ‘health’, ‘health matters’ and ‘good health’ but will not match ‘healthy living’.

  • Asterisks can be used as wildcards, providing that they are at the beginning and/or end of a word:

    • Searching for ‘hea*’ will match ‘health’ and ‘heart’.
    • Searching for ‘*lth’ will match ‘health’ and ‘wealth’.
    • Searching for ‘*eal*’ will match both ‘health’ and ‘wealth’.

    Note that an asterisk cannot be embedded within a word. Searching for ‘he*lth’ is interpreted as being a search for ‘he*’.

  • Spaces between words in a query imply OR. A search for ‘health wealth’ is the same as a search for ‘health OR wealth’ and will match both ‘health matters’ and ‘wealth report’.

  • Enclosing words in double-quotes implies AND. A search for “health wealth” is the same as a search for ‘health AND wealth’ and will only return terms containing both words.

  • You can use parentheses to construct more complex queries. For example:

    (life AND health) OR (wealth AND happiness)

  • To exclude a word, use AND NOT as in ‘wealth AND NOT happiness’.

  • Searches are case-insensitive. A search for ‘health’ will match ‘Health’ and ‘HEALTH’. However, the Boolean operators AND, OR and NOT are case sensitive and must be typed in upper case.

Keep in mind that the index built by the Generate HTML… command only includes the name fields from the published terms in your thesaurus. In other words, CognatrixSearch only searches for terms by name and not by the content of user-defined fields and so on.

SearchKit calculates a relevance ranking for each hit and CognatrixSearch returns hits in order of decreasing relevance. In general, you will obtain the most predictable results using the AND form of multi-word queries.
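
The single-word and wildcard rules above can be emulated with a short Python sketch. It is not SearchKit's implementation and ignores Boolean operators and relevance ranking; the function names are hypothetical and the code only mirrors the matching behaviour described in this section.

```python
import re

def normalise_pattern(word: str) -> str:
    """Keep a leading or trailing asterisk; an embedded one truncates the word."""
    leading = "*" if word.startswith("*") else ""
    core = word.lstrip("*")
    if "*" in core:
        # 'he*lth' is treated as 'he*': cut at the first asterisk after the start.
        core = core[: core.index("*")]
        return leading + core + "*"
    return leading + core

def word_matches(pattern: str, term_name: str) -> bool:
    """True if any whole word of term_name matches the pattern, ignoring case."""
    regex = re.escape(normalise_pattern(pattern)).replace(r"\*", ".*")
    return any(
        re.fullmatch(regex, word, flags=re.IGNORECASE)
        for word in term_name.split()
    )

print(word_matches("health", "good health"))     # whole-word match
print(word_matches("health", "healthy living"))  # no implicit wildcard or stemming
print(word_matches("*eal*", "wealth report"))    # wildcards at both ends
```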

A word about paths

Note: If you already understand Unix paths, feel free to skip this section...

Consider the following Unix path:

/Library/WebServer/

If you are not overly familiar with Unix paths, the leading “/” means that this is an “absolute path”. Absolute paths always begin with your startup disk so this is the same as saying:

  1. Double-click on the icon of your startup disk.
  2. In the folder that opens, double-click on the Library folder.
  3. In the folder that opens, double-click on the WebServer folder.

A path that does not begin with a “/” is called a “relative path”. As the name suggests, relative paths are relative to some other path. For the Apache web server, this is most often a special path known as the document root. For standard installations of Mac OS X, document root is the absolute path:

/Library/WebServer/Documents/

When Apache receives a URL like:

http://hostname.domain/products/index.html

it breaks the URL into its component parts:

Protocol: http
Host: hostname.domain
Path: products/index.html

The path is a relative path. To find the file “index.html” in the “products” folder, Apache prepends the document root, like this:

/Library/WebServer/Documents/products/index.html

Apache does this for security reasons. Apache will only serve files that have been placed inside the document root folder structure.

With one exception, CognatrixSearch works with absolute paths. The exception is the server “prefix” which must always be a path relative to document root.
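
As a rough illustration of the mapping described above, the following Python sketch turns a URL into a file path by prepending the document root. It deliberately ignores everything else Apache does (aliases, access rules, path normalisation), and the function name `url_to_file_path` is hypothetical.

```python
import posixpath
from urllib.parse import urlsplit

# Document root for a standard installation of Mac OS X, as given above.
DOCUMENT_ROOT = "/Library/WebServer/Documents/"

def url_to_file_path(url: str, document_root: str = DOCUMENT_ROOT) -> str:
    """Prepend the document root to the relative path taken from a URL."""
    relative_path = urlsplit(url).path.lstrip("/")
    return posixpath.join(document_root, relative_path)

print(url_to_file_path("http://hostname.domain/products/index.html"))
# prints /Library/WebServer/Documents/products/index.html
```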

Installing CognatrixSearch

To install CognatrixSearch, double-click on the installer package icon and follow the prompts. The package installs the following folders and files at the top level of your startup disk:

/
  Library/
    Application Support/
      LGOSystems/
        CognatrixSearch/
          example_search.html   (1)
          Templates/
            example/   (2)
              dtd.html
              headprefix.html
              metadata.html
              stylesheet.html
              headsuffix.html
              bodyprefix.html
              bodysuffix.html
              noresults.html
          licenceHelper   (3)
    PreferencePanes/
      CognatrixSearch.prefPane   (4)
    WebServer/
      CGI-Executables/
        CognatrixSearch   (5)

Notes:
  1. An example search page showing how to create a web form to accept a query and send it to CognatrixSearch for processing.
  2. A folder containing example templates. All files with the .html extension are templates from which results pages are built. You can tailor the content of these files to your own look and feel. You normally make a copy of this folder for each of your own thesauri.
  3. A helper tool that gives the CognatrixSearch Preference Pane the ability to send configuration settings to the CognatrixSearch CGI (this tool is called licenceHelper for historical reasons; a better name would have been preferenceHelper).
  4. The CognatrixSearch Preference Pane. This becomes available in your System Preferences application and is how you configure the CognatrixSearch CGI.
  5. CognatrixSearch itself (the CGI).

Installing CognatrixSearch for the first time?

If you have just installed CognatrixSearch for the first time, it is worthwhile working through some additional steps so that you can be sure that everything is configured correctly:

  1. Confirm that Apache is running. The simplest way to do that is to open the Sharing panel in System Preferences and see whether Personal Web Sharing is enabled. If not, turn it on.

    While you have the sharing panel open, make a note of the names by which your computer is known:

    • At the top of the panel under the “Computer Name:” field is text saying something like:

      Other computers on your local subnet can access your computer at hostname.local.

      The value “hostname.local” (or whatever it says for your computer) is the Bonjour name for your computer.

    • Click once on the Personal Web Sharing service to select it. The area below the list of services changes to say something like:

      View this computer’s website at http://hostname.domain/ or your personal website at http://hostname.domain/~yourname/

      The first URL “http://hostname.domain/” (or whatever it says for your computer) is the domain name for your computer.

    Always use either the Bonjour name or the domain name of your computer when testing CognatrixSearch. Avoid using shortcuts like “localhost” or the loopback IP address 127.0.0.1 because you will probably get unexpected results. We recommend using the Bonjour name during testing. You should only use a domain name if you are sure that it is a permanent name allocated to your computer. Check with your network administrator to find out whether that is the case.

  2. Test that you can communicate with Apache. Launch Safari and type a URL of the following form:

    http://host/

    where “host” is either the Bonjour name or the domain name of your computer. For example, if your Bonjour name is “MyHappyMac.local” the URL would be:

    http://MyHappyMac.local/

    If your computer’s domain name is “myhappymac.mycompany.com” the URL would be:

    http://myhappymac.mycompany.com/

    When you press the return key after entering the URL, you should either see the default page supplied by Apple or whatever you (or your administrator) replaced that page with. Apple’s default page begins with the words:

    “If you can see this, it means that the installation of the Apache web server software on this system was successful. You may now add content to this directory and replace this page.”

    If you see either Apple’s default page or your own page, it is safe to proceed to the next step.

    Conversely, if you see a message from Safari saying “Safari can’t connect to the server” it either means that Apache is not running or you made a mistake when you typed the URL. Go back and check your work.

    If you see a message from Safari saying “Not Found” or if Safari presents you with a directory listing, it means that Apache is working but that your system has probably been tailored to some extent. In either case, it is safe to proceed to the next step.
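
If you prefer to script the check in step 2 rather than use Safari, something along the following lines would do. It is a sketch only: the function name `describe_server` is hypothetical, and the host in the usage comment is the example Bonjour name used above, which you would replace with your own.

```python
import urllib.error
import urllib.request

def describe_server(url, opener=urllib.request.urlopen):
    """Fetch a URL and summarise the outcome in the terms used in step 2."""
    try:
        with opener(url, timeout=5) as response:
            return f"OK: the web server answered with status {response.status}"
    except urllib.error.HTTPError as error:
        # For example 404 Not Found: Apache is running but the site is tailored.
        return f"Apache is running but returned status {error.code}"
    except (urllib.error.URLError, OSError) as error:
        # Apache is not running, or the host name in the URL is wrong.
        return f"Cannot connect: {error}"

# Usage (replace the host with your own Bonjour or domain name):
#   print(describe_server("http://MyHappyMac.local/"))
```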

Creating a search-base

CognatrixSearch needs a search-base before it can do its work. A search-base is nothing more than a folder containing the output from launching Cognatrix with your thesaurus and choosing the Generate HTML... command.

Every search-base needs a name. In principle, a name can be anything you can type on your keyboard. However, you will probably want to choose a name which is appropriate for your thesaurus. At this stage, we recommend keeping things simple: choose a name which is all lower case, begins with a letter, and which does not contain any spaces or other punctuation.

For the remainder of these instructions, we are going to assume the name “mythes”. You should substitute the name you have chosen wherever you see “mythes”.

You create a search-base for each of your thesauri by proceeding as follows:

  1. Launch Cognatrix 1.3 (or later) and do the following:

    • Open your thesaurus.

    • Choose the Generate HTML… command.

    • Press Command-D to move to the Desktop.

    • Click the New Folder button. Type “mythes” (or whatever name you chose) into the “Name of new folder” field and click the Create button.

    • Accept the default file-name for your primary index page of “index.html” and click the Generate button. Cognatrix will create your web pages and the index for CognatrixSearch.

    • Quit Cognatrix.

    Open the folder named “mythes” on your desktop. You should expect to see the following files and folders:

    mythes/
      index.html   (1)
      index-A.html   (2)
      …
      index-Z.html
      QID.uc1szkkb.1/   (3)
      …
      QID.uc1szkkb.n/
      index.qidx   (4)

    Notes:
    1. The primary index page for your thesaurus containing your top terms.
    2. One secondary index page for each letter of the English alphabet.
    3. The folders containing the pages for the terms in your thesaurus.
    4. The index for CognatrixSearch.

    Close the “mythes” folder and leave it on your desktop.

  2. Open the following folder (hint: it is an absolute path):

    /Library/WebServer/Documents/

    Drag the “mythes” folder from your desktop and drop it into the “Documents” folder you just opened. The structure you should expect to see is:

    /
      Library/
        WebServer/
          Documents/
            index.html   (1)
            …   (2)
            mythes/   (3)

    Notes:
    1. The normal home page for your web site.
    2. Other files and folders needed for your web site.
    3. The folder containing the information created by Cognatrix when you executed the Generate HTML… command.

  3. Test your work. Launch Safari and type a URL of the following form. For the remainder of these instructions we are going to use the Bonjour-name form of URL but you can substitute the domain-name form if you wish:

    http://MyHappyMac.local/mythes/

    You should expect to see the primary index page for your thesaurus showing all your top terms. If you do not see that page, go back and check your work.
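
If the page does not appear, a quick check that the two key files ended up inside the document root can narrow things down. The Python sketch below is one way to do that; the function name `missing_search_base_items` is hypothetical, and it only looks for the primary index page and the CognatrixSearch index, not the term folders.

```python
from pathlib import Path

# Files every search-base is expected to contain, per the listing in step 1.
REQUIRED_ITEMS = ["index.html", "index.qidx"]

def missing_search_base_items(search_base: Path) -> list:
    """Return the names of required items that are absent from the folder."""
    return [name for name in REQUIRED_ITEMS if not (search_base / name).exists()]

# Usage, assuming the search-base name "mythes" used in these instructions:
#   print(missing_search_base_items(Path("/Library/WebServer/Documents/mythes")))
# An empty list means both files are in place.
```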

Creating a set of templates

CognatrixSearch builds pages of results using templates. Templates give you the ability to tailor what your users see.

Think about what happens when you use most search engines on the web:

  • You start with a page (often the home page of the site) containing a field where you can type in your first query. That page may have other controls such as the maximum number of hits and may also contain hints about how to construct complex queries. Alternatively, the first page may only contain a simple search field plus a link to an “advanced search” page.

  • After you enter a query and press return, the search engine returns a page of hits (or a page telling you nothing was found). Usually, such pages also contain another search field so that you can easily run a new query without having to go back to the first page.

CognatrixSearch can support any or all of the above. At this point, however, we are going to create an initial dedicated search page plus the templates that CognatrixSearch will use to return results and prepare for subsequent queries:

  1. To create the initial search page, start by opening both of the following folders:

    /Library/Application Support/LGOSystems/CognatrixSearch/

    /Library/WebServer/Documents/

    Make a copy of the file called “example_search.html” in the “CognatrixSearch” folder (hint: select the file and press ⌘D), rename the copy to “mythes_search.html”, and then move the renamed file into the “Documents” folder.

    Open “mythes_search.html” in a text editor and find the line containing:

    <input type="hidden" value="example" name="QIDSearchThesaurus">

    The word “example” is the name of the thesaurus to search. Change it to “mythes”, as in:

    <input type="hidden" value="mythes" name="QIDSearchThesaurus">

    and save your work.

  2. CognatrixSearch uses the contents of the templates folder to generate search results. Pages are built by concatenating files in the following order:

    dtd.html
    headprefix.html
    metadata.html
    stylesheet.html
    headsuffix.html
    bodyprefix.html

    bodysuffix.html

    CognatrixSearch creates the HTML for the hits between bodyprefix and bodysuffix. If no hits are found, the contents of the file noresults.html are placed between bodyprefix and bodysuffix.

    You can tailor any or all of these files to your own requirements. By manipulating the style sheet, you can also affect the appearance of the hits. Naturally, you are responsible for ensuring that your changes only use valid HTML.

    To create a set of templates, open the following folder:

    /Library/Application Support/LGOSystems/CognatrixSearch/Templates/

    Make a copy of the “example” folder (⌘D) and rename the copy to “mythes”.

    You have just created the templates CognatrixSearch will need. You do not need to do anything more at this stage. Just remember that if you want to tailor the look-and-feel of your results pages, this is where you should start.
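
The way a results page is assembled from the templates in step 2 can be sketched as follows. This is an illustration of the concatenation order only, not the CGI's code: the function name `build_results_page` is hypothetical, reading the files as UTF-8 is an assumption, and keyword substitution is left out.

```python
from pathlib import Path

# Template files in the order given in step 2; the hits go between the two lists.
BEFORE_HITS = [
    "dtd.html", "headprefix.html", "metadata.html",
    "stylesheet.html", "headsuffix.html", "bodyprefix.html",
]
AFTER_HITS = ["bodysuffix.html"]

def build_results_page(templates: Path, hits_html: str) -> str:
    """Concatenate the templates around the hits (or noresults.html if none)."""
    def read(name: str) -> str:
        # UTF-8 is an assumption; the manual does not state the encoding.
        return (templates / name).read_text(encoding="utf-8")

    middle = hits_html if hits_html else read("noresults.html")
    parts = [read(name) for name in BEFORE_HITS]
    parts.append(middle)
    parts.extend(read(name) for name in AFTER_HITS)
    return "".join(parts)
```

Because the hits are dropped in between bodyprefix.html and bodysuffix.html, any markup you open in the first must be closed in the second.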

Configuring CognatrixSearch

CognatrixSearch needs to be told how to find all of the components it needs to perform a search. You configure CognatrixSearch using System Preferences:

  1. Launch the System Preferences application (usually found on your Dock). Click on CognatrixSearch (usually in the bottom row). You should expect to see a display similar to Figure 2.

  2. Click on the + button to add a new entry. The thesaurus name is automatically selected and prepared for editing. Type “mythes” and press return. Notice how the fields below the list of thesauri have dimmed text indicating default values.

  3. If you have followed all of the instructions to this point, the default values will be correct so simply click the Apply button.

  4. It is now time to test your work. Launch Safari and type a URL of the following form:

    http://MyHappyMac.local/mythes_search.html

    When you press the return key after entering the URL, you should see a page containing search and maximum hit fields plus usage hints. If Safari says “Not Found” you either made a mistake with the URL or did not do the previous step correctly. Go back and check your work.

  5. Assuming the search page appears, type a word into the search field that you know appears in your thesaurus, and then press return.

    You should see a page containing your search results. If so, congratulations. If not, go back and check your work.

Constructing the HTML for a search

Information can be passed to CGIs in one of two ways:

  • The “get” method, where all of the information is passed via the URL; or
  • The “post” method, where the information is passed in the body of the HTTP request rather than in the URL.

The main advantages of the “get” method are:

  • Users can bookmark their queries; and
  • All of the search parameters wind up in Apache’s log where they are available for subsequent analysis.

Its main disadvantages are:

  • URLs may become long and unwieldy for complex queries (especially where Unicode™ characters are involved); and
  • Users can see all the gruesome details of how queries are constructed (which you may not want in certain situations, such as if your primary audience is not “tech savvy”).

Although the “post” method solves both of the “get” method’s disadvantages, it comes at the price of hiding all the details, from both your users and Apache’s log.

CognatrixSearch supports both “get” and “post” so you are free to choose the method that suits your needs. You specify the method as an attribute on the <form> tag. The example below uses the “post” method.

Regardless of the method you choose, a query for the word “health” in the thesaurus named “mythes” would look something like this:

QIDSearchQuery=health&QIDSearchThesaurus=mythes&QIDSearchDebug=no ...

As you can see, queries follow a “keyword=value” syntax with ampersand separators between the pairs. This is the standard URL-encoded form syntax (application/x-www-form-urlencoded) defined by HTML. CognatrixSearch understands the following keywords:

QIDSearchThesaurus

The name of the thesaurus to search.

QIDSearchQuery

The search to perform. This can be anything from a single word to a complex Boolean expression.

QIDSearchHitLimit

The maximum number of hits to be returned to the user. The default value and allowable range is controlled by the per-thesaurus search configuration (discussed later).

QIDSearchDebug

If the value of this key is “YES”, debugging will be enabled if this is permitted by the CognatrixSearch preferences. As its name suggests, debugging mode returns information that can be used to diagnose problems.

QIDSearchAdvanced

If the value of this key is “NO” (which is the default), CognatrixSearch will examine the search string to see if it appears to be a simple search. A simple search is one that does not contain any wildcards, parentheses or Boolean operators. If the search string passes this test, CognatrixSearch wraps each word in wildcard symbols. For example, ‘heal me’ would become ‘*heal* *me*’, and will subsequently be interpreted by SearchKit as ‘(*heal* OR *me*)’.
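
The simple-search rewriting just described can be sketched in Python as follows. The function names are hypothetical and this is not the CGI's code; it applies only the three tests named above (wildcards, parentheses, Boolean operators), and how double-quoted queries are treated is not covered.

```python
# Operators must be upper case to count, as noted in the Queries section.
BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}

def is_simple_search(query: str) -> bool:
    """A simple search has no wildcards, parentheses or Boolean operators."""
    if any(symbol in query for symbol in "*()"):
        return False
    return not any(word in BOOLEAN_OPERATORS for word in query.split())

def wrap_simple_search(query: str) -> str:
    """Wrap each word of a simple search in wildcards; leave other queries alone."""
    if not is_simple_search(query):
        return query
    return " ".join(f"*{word}*" for word in query.split())

print(wrap_simple_search("heal me"))            # prints *heal* *me*
print(wrap_simple_search("health AND wealth"))  # prints health AND wealth
```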

Searches are constructed using HTML’s “form” syntax. The following example (which produces Figure 1) shows all keywords in use. QIDSearchQuery and QIDSearchHitLimit are obtained from the user via text-input boxes, whereas QIDSearchThesaurus, QIDSearchDebug and QIDSearchAdvanced have fixed values:

<form method="post" enctype="application/x-www-form-urlencoded" accept-charset="utf-8" action="/cgi-bin/CognatrixSearch">
<dl>
<dt>
Search for:
</dt>

<dd>
<input type="text" value="" name="QIDSearchQuery" size="50">
</dd>

<dt>
Maximum hits:
</dt>

<dd>
<input type="text" value="10" name="QIDSearchHitLimit" size="4">
</dd>
</dl>

<input type="hidden" value="mythes" name="QIDSearchThesaurus">
<input type="hidden" value="YES" name="QIDSearchDebug">
<input type="hidden" value="NO" name="QIDSearchAdvanced">

<input type="submit" value="Search">
</form>

To express the above in words (considering only the <form> and <input> tags):

  • When the <form> is triggered, all of its inputs are sent to CognatrixSearch.

  • The user is shown two text <input> boxes. Here we are using a definition list (the <dl>, <dt> and <dd> tags) to associate labels (such as “Search for:”) with the corresponding text <input> boxes:

    1. The first has no default value (ie, is empty) and is where the user types the query. Whatever the user types is associated with the QIDSearchQuery keyword when the form is triggered.

    2. The second has the default value of 10 and is where the user types the maximum number of hits that he or she wants to receive. The content of this field is associated with the QIDSearchHitLimit keyword.

  • There are three hidden <input> parameters:

    1. The QIDSearchThesaurus keyword is given the value “mythes”, which is the name of the thesaurus to search.

    2. The QIDSearchDebug keyword is given the value “YES”.

    3. The QIDSearchAdvanced keyword is given the value “NO”.

  • A submission button labelled “Search” allows the user to trigger the form.

By varying which fields are obtained from the user and which have preset values, you can obtain a variety of effects:

  • A simple search field, such as you might expect to find on a corporate home page, needs a small text input field for QIDSearchQuery with QIDSearchThesaurus and QIDSearchHitLimit passed as hidden parameters. The default for QIDSearchAdvanced is NO, which is almost certainly appropriate for this situation, so it can be omitted.

  • If you have multiple thesauri, the QIDSearchThesaurus keyword could be associated with a popup menu.

  • During development and testing, the QIDSearchDebug keyword might be associated with a checkbox.

  • For an “advanced search” page, it would probably be appropriate to set QIDSearchAdvanced to YES via a hidden parameter.
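
Because the “get” method carries everything in the URL, you can also build a bookmarkable search link by hand. The Python sketch below shows one way to encode the keywords; the function name `build_search_url` is hypothetical, and the host is the example Bonjour name used elsewhere in this manual.

```python
from urllib.parse import urlencode

def build_search_url(host, query, thesaurus, hit_limit=10, advanced=False):
    """Encode the search keywords into a URL for the "get" method."""
    params = {
        "QIDSearchQuery": query,
        "QIDSearchThesaurus": thesaurus,
        "QIDSearchHitLimit": str(hit_limit),
        "QIDSearchAdvanced": "YES" if advanced else "NO",
    }
    # The path matches the action attribute of the example form.
    return f"http://{host}/cgi-bin/CognatrixSearch?{urlencode(params)}"

print(build_search_url("MyHappyMac.local", "health AND wealth", "mythes"))
```

Spaces in the query are encoded as plus signs, which is why even a short Boolean query makes for a noticeably longer URL.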

Substitutions in results pages

You should be thoroughly familiar with the information in Chapter 8 of the Cognatrix User Guide about how Cognatrix generates HTML using a scheme of templates and substitution keys because CognatrixSearch uses the same approach. That information will not be repeated here.

Keywords in templates are delimited by “‹” and “›” characters which can be produced with the keyboard combinations Shift+Option+3 and Shift+Option+4, respectively. These are what the Cognatrix User Guide calls General Substitution Delimiters.

The following keywords are available:

QIDSearchQuery *

The query as passed to the CGI. This is not the same as the Query String. It only contains the actual query (eg “health AND wealth”).

QIDSearchHitLimit *

The hit limit as passed to the CGI or determined from the CognatrixSearch preferences pane.

QIDSearchThesaurus *

The name of the thesaurus to be searched as passed to the CGI or determined from the default thesaurus in the CognatrixSearch preferences pane.

QIDSearchDebug *

The value of the debug flag as passed to the CGI or modified by the setting of the debug popup in the CognatrixSearch preferences pane.

QIDSearchAdvanced *

The value of the advanced flag as passed to the CGI.

QIDSearchResultsCount #

The number of hits returned on the current page.

QIDSearchResultSetCount #

The number of hits found by CognatrixSearch in the corpus. This will always be greater than or equal to QIDSearchResultsCount.

*  These keywords are, of course, the same as those described in Constructing the HTML for a search. They are echoed as substitution keywords so that you can seed a search form on the results page.

#  Typically used to display information to the user such as “n of m hits”
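
To make the substitution mechanism concrete, here is a minimal Python sketch of replacing delimited keywords in a template. It is not how CognatrixSearch is implemented: the function name `substitute` is hypothetical, and leaving unknown keywords untouched is an assumption.

```python
import re

# Keywords are delimited by the General Substitution Delimiters.
KEY_PATTERN = re.compile("‹([^‹›]+)›")

def substitute(template: str, values: dict) -> str:
    """Replace each delimited keyword with its value from the dict."""
    def replace(match):
        # Assumption: a keyword with no known value is left as it stands.
        return values.get(match.group(1), match.group(0))
    return KEY_PATTERN.sub(replace, template)

template = "<p>‹QIDSearchResultsCount› of ‹QIDSearchResultSetCount› hits for ‹QIDSearchQuery›</p>"
values = {
    "QIDSearchResultsCount": "10",
    "QIDSearchResultSetCount": "42",
    "QIDSearchQuery": "health",
}
print(substitute(template, values))  # prints <p>10 of 42 hits for health</p>
```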

Configuring CognatrixSearch - the details

You configure CognatrixSearch through its preference panel in System Preferences (see Figure 2).

The values in the text fields in the lower part of the display (“B”) are associated with whichever thesaurus is selected in the thesaurus list (“A”). To change values in area “A”, double-click the field you want to edit. To change values in area “B”, simply type in the field.

To reset any text field to its default value, clear its contents.

The version number for the CognatrixSearch preferences panel is shown at “C”. Note that this version number is separate from that of the CognatrixSearch CGI (see Checking the CGI’s version).

Figure 2

The fields and buttons in the preference pane have the following meanings:

plus

The plus button adds a new thesaurus entry and prepares the thesaurus name field for editing. There is no upper limit on the number of thesauri that may be added.

minus

The minus button removes the selected thesaurus. You must always leave at least one thesaurus defined.

Thesaurus Name:

The name of the thesaurus. Note that this name only has meaning to CognatrixSearch and does not necessarily have to be the same as the name you gave to the thesaurus document in Cognatrix. Unless you have good reasons for doing otherwise, it should be the name of the folder you created when you used Cognatrix’s Generate HTML... command (“mythes” in this example).

Hits:

If the “QIDSearchHitLimit” parameter is not passed to CognatrixSearch as part of a query, this value is used as the default for the maximum number of hits that can be returned on a single page. Must be a value greater than zero and less than or equal to the value of the “Max Hits” column.

Max Hits:

Specifies the upper limit for the “QIDSearchHitLimit” parameter. Must be in the range 1 to 5,000. This parameter also tells SearchKit when to stop searching, so do not set it too low: a low limit may give a false impression of the actual number of matches in the corpus.

Max Time:

A recommendation to SearchKit as to the maximum amount of time it should spend searching before returning an answer. Must be in the range 1.0 to 30.0 seconds. The default value of 5.0 seconds should be more than sufficient for all but the largest thesaurus.

Default:

If the “QIDSearchThesaurus” parameter is not passed to CognatrixSearch as part of a query, this value is used as the default for the name of the thesaurus to search. Only one thesaurus can be made the default. If you do not want any thesaurus to be the default, do not turn on any checkbox.

Index:

The absolute path to the SearchKit index. By default, this is assumed to be the file named “index.qidx” in a folder of the same name as the thesaurus which is located directly inside Apache’s document root. It can, however, be anywhere on your computer. Either type an absolute path into the field or use the Set button to choose an index directly.

Templates:

The absolute path to the folder containing the templates to be used for constructing replies. By default, this is assumed to be a folder of the same name as the thesaurus which is located in CognatrixSearch’s Templates folder inside the global Application Support folder. Note that multiple thesauri can share a single set of templates. Either type an absolute path into the field or use the Set button to choose a folder directly.

Server and Port:

The name (and, optionally, the port) of the server that CognatrixSearch should use in the URLs it creates for hits. By default, the server is the host and port on which the search is running. Specifying a different host allows you to separate the SearchKit index from the remainder of the search base. Type either a Bonjour name or a domain name for the target host. If the web server on the target host is listening to a port other than 80, type that port number as well.

URL Prefix:

The path, relative to Apache’s document root, where the search base is to be found. By default, this is assumed to be a folder of the same name as the thesaurus which is located directly below the document root. Either type a relative path into the field or use the Set button to choose a folder directly. Note that the Set button will only accept folders that are within Apache’s document root folder on the host on which you are configuring CognatrixSearch. If you want CognatrixSearch to redirect hits to another host, you will need to type the relative path on that server into this field.
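The default locations described for the Index, Templates and URL Prefix fields all derive from the thesaurus name. The sketch below shows that derivation, assuming the standard Mac OS X document root and the Templates folder shown in the debugging example later in this manual. The function name default_locations is hypothetical and this is not CognatrixSearch's own code.

```python
import posixpath

# Assumed locations for a standard installation (see the debugging example).
DOCUMENT_ROOT = "/Library/WebServer/Documents"
TEMPLATES_ROOT = "/Library/Application Support/LGOSystems/CognatrixSearch/Templates"


def default_locations(thesaurus_name,
                      document_root=DOCUMENT_ROOT,
                      templates_root=TEMPLATES_ROOT):
    """Derive the documented defaults for fields that are left empty."""
    return {
        # "index.qidx" in a folder named after the thesaurus, inside the document root
        "index": posixpath.join(document_root, thesaurus_name, "index.qidx"),
        # a folder named after the thesaurus, inside the Templates folder
        "templates": posixpath.join(templates_root, thesaurus_name),
        # a folder named after the thesaurus, directly below the document root
        "url_prefix": thesaurus_name,
    }


for field, value in default_locations("mythes").items():
    print(field, "=", value)
```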

CGI Debugging:

This pop-up menu supports three values:

Disabled
No debugging information will be returned. In other words, any “QIDSearchDebug=YES” parameter will be ignored.
Allowed
Debugging information will be returned if “QIDSearchDebug=YES” is passed on the query.
Enabled
Debugging information will always be returned.
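The interaction between the three menu values and the “QIDSearchDebug” parameter can be expressed as a short decision function. This is a sketch of the documented behaviour only; the function name and the dictionary used to hold the query parameters are assumptions.

```python
def debug_output_wanted(cgi_debugging, query_params):
    """Decide whether debugging information should be returned.

    cgi_debugging is the pop-up menu value ("Disabled", "Allowed" or
    "Enabled"); query_params is a dict of the parameters in the query.
    """
    requested = query_params.get("QIDSearchDebug") == "YES"
    if cgi_debugging == "Enabled":
        return True          # always returned, whatever the query says
    if cgi_debugging == "Allowed":
        return requested     # returned only when the query asks for it
    return False             # "Disabled": the parameter is ignored


print(debug_output_wanted("Allowed", {"QIDSearchDebug": "YES"}))
```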

revert

The Revert button can be used to restore all settings to the values they had when the panel was last opened or when the Apply button was clicked.

apply

The Apply button commits all changes. Switching to another preference pane or closing the System Preferences window has the same effect as clicking the Apply button.

Understanding debugging information

When debug mode is enabled, CognatrixSearch returns information that can help you diagnose problems. The information is divided into several sections, each described below. The debugging examples are the result of searching for “health” in the thesaurus named “mythes”.

The CGI Environment section contains information that is passed to CognatrixSearch by Apache. Note that the “⅃” symbol is used below to indicate the end of each logical line.

CGI Environment:

__CF_USER_TEXT_ENCODING = 0x46:0:0 ⅃
CONTENT_LENGTH = 106 ⅃
CONTENT_TYPE = application/x-www-form-urlencoded ⅃
DOCUMENT_ROOT = /Library/WebServer/Documents ⅃
GATEWAY_INTERFACE = CGI/1.1 ⅃
HTTP_ACCEPT = */* ⅃
HTTP_ACCEPT_ENCODING = gzip, deflate ⅃
HTTP_ACCEPT_LANGUAGE = en ⅃
HTTP_CONNECTION = keep-alive ⅃
HTTP_HOST = MyHappyMac.local ⅃
HTTP_REFERER = http://MyHappyMac.local/ ⅃
HTTP_USER_AGENT = Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/419.2.1 (KHTML, like Gecko) Safari/419.3 ⅃
PATH = /bin:/sbin:/usr/bin:/usr/sbin:/usr/libexec:/System/Library/CoreServices ⅃
QUERY_STRING =  ⅃
REMOTE_ADDR = 10.0.1.75 ⅃
REMOTE_PORT = 51818 ⅃
REQUEST_METHOD = POST ⅃
REQUEST_URI = /cgi-bin/CognatrixSearch ⅃
SCRIPT_FILENAME = /Library/WebServer/CGI-Executables/CognatrixSearch ⅃
SCRIPT_NAME = /cgi-bin/CognatrixSearch ⅃
SCRIPT_URI = http://myhappymac.mycompany.com/cgi-bin/CognatrixSearch ⅃
SCRIPT_URL = /cgi-bin/CognatrixSearch ⅃
SERVER_ADDR = 10.0.1.75 ⅃
SERVER_ADMIN = [no address given] ⅃
SERVER_NAME = myhappymac.mycompany.com ⅃
SERVER_PORT = 80 ⅃
SERVER_PROTOCOL = HTTP/1.1 ⅃
SERVER_SIGNATURE = <ADDRESS>Apache/1.3.33 Server at myhappymac.mycompany.com Port 80</ADDRESS> ⅃
SERVER_SOFTWARE = Apache/1.3.33 (Darwin) PHP/4.4.4 ⅃
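If you save the debugging output to a file for later inspection, the “⅃” marker makes it straightforward to split the text back into name and value pairs, even where a long value has wrapped. The following is a minimal sketch; the function name parse_debug_section is hypothetical and the sample text is an excerpt of the listing above.

```python
END_OF_LINE = "⅃"  # marks the end of each logical line in the debug output


def parse_debug_section(text):
    """Split a debug section into a dict of name -> value strings."""
    values = {}
    for logical_line in text.split(END_OF_LINE):
        # A logical line may have wrapped; join the pieces with spaces.
        logical_line = logical_line.replace("\n", " ").strip()
        name, separator, value = logical_line.partition("=")
        if separator:  # skip anything that is not "name = value"
            values[name.strip()] = value.strip()
    return values


sample = (
    "CONTENT_LENGTH = 106 ⅃\n"
    "QUERY_STRING =  ⅃\n"
    "REQUEST_METHOD = POST ⅃\n"
)
print(parse_debug_section(sample))
```

Splitting at the first “=” only means that values which themselves contain “=” signs are kept intact.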

The Preferences Environment section contains the per-thesaurus configuration information and is derived from the preference pane:

Preferences Environment:

{name = mythes; defaultHits = 10; maximumHits = 100; targetResponseTime = 5; pathToIndex = (null); pathToTemplates = (null); server = (null); serverPort = (null); prefix = (null);} ⅃

Null values indicate where CognatrixSearch will calculate a default value dynamically.

The Search Environment section contains configuration information that is calculated from both the CGI Environment and the preference pane:

Search Environment:

SearchDir = /Library/Application Support/LGOSystems/CognatrixSearch/Templates/mythes; ⅃
searchKitIndexPath = /Library/WebServer/Documents/mythes/index.qidx; ⅃
TemplatesPath = /Library/Application Support/LGOSystems/CognatrixSearch/Templates; ⅃
ThesaurusName = mythes; ⅃
WorkingPrefix = mythes; ⅃

The Raw Query section shows the query, exactly as received by CognatrixSearch. It is displayed here over multiple lines but the text is normally run together.

Raw Query:

QUERY_STRING = QIDSearchThesaurus=hic&
QIDSearchQuery=health&
QIDSearchHitLimit=10&
QIDSearchDebug=YES&
QIDSearchAdvanced=YES ⅃

The Parsed Query section shows how CognatrixSearch has parsed the query. Inconsistencies here may indicate a mal-formed query.

Parsed Query:

QIDSearchAdvanced = YES ⅃
QIDSearchDebug = YES ⅃
QIDSearchHitLimit = 10 ⅃
QIDSearchQuery = health ⅃
QIDSearchThesaurus = mythes ⅃
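To see what a well-formed query should look like once parsed, you can decode a raw query string yourself and compare the result with the Parsed Query section. The sketch below uses Python's standard urllib.parse module; the function name parse_query and the sample query string are illustrative, and this is not how the CGI itself is implemented.

```python
from urllib.parse import parse_qsl


def parse_query(raw_query):
    """Decode a URL-encoded query string into a dict sorted by parameter name."""
    pairs = parse_qsl(raw_query, keep_blank_values=True)
    return dict(sorted(pairs))


raw = ("QIDSearchThesaurus=mythes&QIDSearchQuery=health&"
       "QIDSearchHitLimit=10&QIDSearchDebug=YES&QIDSearchAdvanced=YES")
for name, value in parse_query(raw).items():
    print(name, "=", value)
```

A parameter that is missing from your own decoding, or whose value is split unexpectedly, usually points to an unescaped “&” or “=” in the form data.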

The optional Query Simplification section shows how CognatrixSearch has rewritten the query if QIDSearchAdvanced is NO and the query string does not appear to contain any advanced-search features.

Query Simplification:

input = health ⅃
simplified = *health* ⅃
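The rewrite shown above (wrapping a plain term in wild-cards) can be illustrated with a rough sketch. Only the single-term case shown in the example is covered. The test for “advanced-search features” used here (wild-card characters, quotes, parentheses and the words AND, OR and NOT) is an assumption made for illustration and may not match what CognatrixSearch actually checks.

```python
import re

# Assumption: these characters and keywords count as advanced-search features.
ADVANCED_PATTERN = re.compile(r'[*?"()]|\b(?:AND|OR|NOT)\b')


def simplify(query, advanced):
    """Wrap a plain query in wild-cards, as in the example above.

    advanced corresponds to QIDSearchAdvanced being YES.
    """
    if advanced or ADVANCED_PATTERN.search(query):
        return query  # leave advanced queries untouched
    return "*" + query.strip() + "*"


print(simplify("health", advanced=False))
```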

The Substitutions section (which appears after any results) shows the values that CognatrixSearch has used to perform keyword substitutions in the resulting HTML.

Substitutions:

QIDSearchAdvanced = YES ⅃
QIDSearchDebug = YES ⅃
QIDSearchHitLimit = 10 ⅃
QIDSearchQuery = health ⅃
QIDSearchResultsCount = 10 ⅃
QIDSearchResultSetCount = 119 ⅃
QIDSearchThesaurus = mythes ⅃

About logging

CognatrixSearch does not maintain a separate log of queries. If you use the “get” method, queries are passed via URL so all the information you need to perform any analysis is contained in Apache’s log. On standard installations of Mac OS X, you will find the logs at the path:

/var/log/httpd/

If you use the “post” method, no additional logging information is available.
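As an example of the kind of analysis that “get” queries make possible, the sketch below counts how often each search term appears in Apache's access log. It assumes the log uses Apache's Common Log Format and that the access log in the folder above is named “access_log”; both are assumptions about your configuration. The function name count_queries and the sample log lines are illustrative.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Matches the request part of a Common Log Format entry, e.g. "GET /path?query HTTP/1.1"
REQUEST = re.compile(r'"GET (\S+) HTTP/[\d.]+"')


def count_queries(log_lines):
    """Count QIDSearchQuery values in requests made to the CognatrixSearch CGI."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST.search(line)
        if not match:
            continue
        url = urlsplit(match.group(1))
        if not url.path.endswith("/CognatrixSearch"):
            continue
        for term in parse_qs(url.query).get("QIDSearchQuery", []):
            counts[term] += 1
    return counts


# Illustrative log lines; in practice, pass open("/var/log/httpd/access_log").
sample_log = [
    '10.0.1.75 - - [01/Jan/2007:10:00:00 +1000] "GET /cgi-bin/CognatrixSearch?QIDSearchThesaurus=mythes&QIDSearchQuery=health HTTP/1.1" 200 5120',
    '10.0.1.75 - - [01/Jan/2007:10:01:00 +1000] "GET /cgi-bin/CognatrixSearch?QIDSearchThesaurus=mythes&QIDSearchQuery=heart+disease HTTP/1.1" 200 4096',
    '10.0.1.75 - - [01/Jan/2007:10:02:00 +1000] "GET /mythes/index.html HTTP/1.1" 200 2048',
]
print(count_queries(sample_log).most_common())
```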

Checking the CGI’s version

To check the version number of the CGI, execute the following command:

/Library/WebServer/CGI-Executables/CognatrixSearch -ShowVersion YES

Note that the CGI’s version number is separate from that of the CognatrixSearch preference pane (see Configuring CognatrixSearch - the details).

Using CognatrixSearch with PHP

Although the subject of PHP/CGI integration is well beyond the scope of this document, it is possible to call CognatrixSearch from PHP. This section is not intended to be a step-by-step guide but should be treated as a series of hints pointing the experienced administrator in the right direction.

In order to enable PHP services on Mac OS X, you will need root privileges to edit Apache’s configuration file at the path:

/private/etc/httpd/httpd.conf

You need to remove the leading “#” from two lines:

  • The line beginning “#LoadModule php4_module”
  • The line beginning “#AddModule mod_php4.c”

After you have made the changes and saved the file, you should restart Apache (hint: use the Sharing panel in System Preferences).

To use PHP with CognatrixSearch, you must use a text editor to create a small redirector file containing the following lines (hint: use copy & paste):

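<?php
// Build the URL of the CognatrixSearch CGI on this host.
$host = $_SERVER['HTTP_HOST'];
$port = $_SERVER['SERVER_PORT'];
$cgi = "cgi-bin/CognatrixSearch";
// Pass the original query string through unchanged.
$query = $_SERVER['QUERY_STRING'];
// Fetch the CGI's output over HTTP and let PHP interpret it.
// Requires PHP's allow_url_fopen setting to be on.
include("http://$host:$port/$cgi?$query");
?>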

Save the file with the name “runquery.php”. The file can have any name so long as it ends with “.php”. If you choose a different name, you will need to substitute as appropriate within these instructions.

Move the redirector file into your published thesaurus folder so that it can be seen by Apache. For example, if your thesaurus was named “mythes”, the appropriate folder would be:

/Library/WebServer/Documents/mythes/

The instructions in the redirector file tell the PHP processor to pass queries on to the CognatrixSearch CGI on the same host.

In the previous discussion about Constructing the HTML for a search, the action associated with the “get” or “post” method pointed directly to the CognatrixSearch CGI. To use PHP, you need to change the action so that it executes the redirector you just created:

<form method="get" action="runquery.php">
... rest of form ...
</form>
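When a form like this is submitted with the “get” method, the browser requests a URL with the query parameters appended to the redirector's address. The sketch below builds such a URL so you can test the redirector without a form. It assumes the redirector was saved as “runquery.php” in a published folder named after the thesaurus, as described above; the function name redirector_url is hypothetical.

```python
from urllib.parse import urlencode


def redirector_url(host, thesaurus, query, hit_limit=None, debug=False):
    """Build the URL a "get" form would request from the PHP redirector."""
    params = [("QIDSearchThesaurus", thesaurus), ("QIDSearchQuery", query)]
    if hit_limit is not None:
        params.append(("QIDSearchHitLimit", str(hit_limit)))
    if debug:
        params.append(("QIDSearchDebug", "YES"))
    # Assumption: runquery.php sits in the published folder named after the thesaurus.
    return "http://%s/%s/runquery.php?%s" % (host, thesaurus, urlencode(params))


print(redirector_url("MyHappyMac.local", "mythes", "health", hit_limit=10))
```

Pasting the resulting URL into a browser is a quick way to confirm that PHP is enabled and that the redirector reaches the CGI.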

When you submit a form with this action, Apache calls PHP to process the runquery.php file, which does nothing more than call the CognatrixSearch CGI with the original query. However, because PHP sees the HTML created by CognatrixSearch, it will interpret any PHP commands it finds before returning the page to the user. This allows you to embed PHP commands into the templates (which you should leave with the “.html” extension rather than rename to “.php”).

Please note that CognatrixSearch still creates the full HTML page structure. This implies that runquery.php should not generate any page content (if there is sufficient demand, this arrangement may be reviewed for future releases of CognatrixSearch).