 DtSearchQuery(library call)                     DtSearchQuery(library call)




 NAME
      DtSearchQuery - Perform a DtSearch database search for a specified
      query

 SYNOPSIS
      #include <Dt/Search.h>
      int DtSearchQuery(
      void *qry,
      char *dbname,
      int search_type,
      char *date1,
      char *date2,
      DtSrResult **results,
      long *resultscount,
      char *stems,
      int *stemcount);

 DESCRIPTION
      DtSearchQuery is the DtSearch API search function.

      DtSearchQuery is passed a query string and some search options,
      performs the requested search, and if successful returns a linked list
      of DtSrResult structures representing the documents satisfying the
      search.

      The results list contains information about the documents that can be
      used for subsequent retrievals, as well as information suitable for
      display to an end user.

    Search Types
      DtSearchQuery supports three types of searches: P, W, and S.

    Type P Search Query Strings
      Query strings for search type P have the simplest syntax, namely a
      sequence of words separated by ASCII whitespace. Punctuation and
      invalid words are silently discarded by the search engine. The only
      possible syntax error is that all query words happen to be invalid in
      the language of the database.

      Search type P is often used to implement a limited Query-by-Example
      (QBE) search paradigm. In this scenario, users typically paste
      document text from whatever source into a query string text field.
      Their expectation is that the search engine will return the documents
      in the database that are "most similar" to the text of the query
      string, and the statistical sort of the results list usually satisfies
      that expectation.

      Note that although search type P does not use boolean syntax, it is
      actually implemented as a stemmed search (type S search) with implied
      boolean ORs between words.
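
      The equivalence between a type P word list and a type S query with
      ORs can be sketched as a small string rewrite. The helper
      words_to_or_query below is hypothetical and not part of the DtSearch
      API; it assumes the input contains only words and whitespace (no
      boolean metacharacters), and it does no stemming or validation.

```c
#include <ctype.h>
#include <stddef.h>

/* Rewrite a whitespace-separated word list as an explicit OR query.
 * Hypothetical helper for illustration; not part of the DtSearch API.
 * Returns 0 on success, -1 if `out` is too small. */
int words_to_or_query(const char *words, char *out, size_t outsize)
{
    size_t n = 0;
    int first = 1;
    const char *p = words;

    if (outsize == 0)
        return -1;
    while (*p != '\0') {
        while (*p != '\0' && isspace((unsigned char)*p))
            p++;                        /* skip separators */
        if (*p == '\0')
            break;
        if (!first) {                   /* join words with " | " */
            if (n + 3 >= outsize)
                return -1;
            out[n++] = ' ';
            out[n++] = '|';
            out[n++] = ' ';
        }
        while (*p != '\0' && !isspace((unsigned char)*p)) {
            if (n + 1 >= outsize)
                return -1;
            out[n++] = *p++;            /* copy one word */
        }
        first = 0;
    }
    out[n] = '\0';
    return 0;
}
```

      Under these assumptions, the word list "ice cream cone" is rewritten
      as "ice | cream | cone", which could then be submitted as a type S
      query.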




                                    - 1 -       Formatted:  January 24, 2005

    Types S and W Boolean Query Strings
      Query strings for search types S (stemmed boolean) and W (exact word
      boolean) must be syntactically valid boolean expressions as described
      below. Any string that does not match a valid expression rule is
      invalid and will fail with an error message.

      Query words for all search types may be entered in any codeset for a
      supported DtSearch language, including multibyte languages. The
      language module of the database may identify a word as invalid for a
      number of reasons, for example because the word would not have been
      indexed: it is too short, too long, on the stop list, and so on. With
      one exception, linguistically invalid words result in a syntax error.
      The exception is an "all ANDs" query, where invalid words and valid
      words that happen not to be in the database are silently erased from
      the query string.

      The boolean query operators are the ASCII metacharacters: '&' for AND,
      '|' for OR, '~' for NOT, '(' and ')' for open and close parentheses
      respectively, and '@nnn' for collocation expressions.

      All expression tokens are separated by ASCII whitespace. Typically
      this is one or more space or tab characters. Omitting whitespace
      separators is legal if it can be done unambiguously. For example
      "word1&word2" is a legal expression but "word1word2" would be
      interpreted as a single word token.
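
      The separator rule can be illustrated with a minimal tokenizer
      sketch. It is not the DtSearch query parser: split_query, MAX_TOKENS
      and MAX_TOKLEN are hypothetical names with arbitrary limits, and the
      sketch assumes that a collocation token such as "@5" is set off by
      whitespace.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKENS 32   /* arbitrary limit for this sketch */
#define MAX_TOKLEN 64   /* arbitrary limit for this sketch */

/* Split a boolean query into tokens. The operators & | ~ ( ) are always
 * single-character tokens, so they also act as separators; any other
 * run of non-whitespace characters (including "@5") is one token.
 * Returns the token count, or -1 if a limit is exceeded. */
int split_query(const char *q, char tokens[MAX_TOKENS][MAX_TOKLEN])
{
    int count = 0;

    while (*q != '\0') {
        size_t len = 0;

        if (isspace((unsigned char)*q)) {
            q++;                         /* whitespace only separates */
            continue;
        }
        if (count == MAX_TOKENS)
            return -1;
        if (strchr("&|~()", *q) != NULL) {
            tokens[count][len++] = *q++; /* operator token */
        } else {
            while (*q != '\0' && !isspace((unsigned char)*q)
                   && strchr("&|~()", *q) == NULL) {
                if (len + 1 >= MAX_TOKLEN)
                    return -1;
                tokens[count][len++] = *q++;
            }
        }
        tokens[count][len] = '\0';
        count++;
    }
    return count;
}
```

      With this sketch "word1&word2" yields the three tokens "word1", "&"
      and "word2", while "word1word2" yields a single token.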

      The ASCII "at" sign (@) marks a special boolean collocation operator.
      The collocation operator has the syntax "@n...", the ASCII "at" sign
      followed by one or more ASCII numeric digits, representing an integer
      with value greater than zero. Collocation is a variation of the AND
      search where a user can specify the maximum distance in bytes between
      any two words. In most languages a byte is equivalent to a character
      position. For example to find "ice" and "cream" separated by no more
      than five characters, the search query "ice @5 cream" may be used.
      Unlike other boolean operators, the collocation operator can apply
      only to naked word tokens, not other expressions.  Searches including
      collocation operators are slower than searches without them, and can
      be much slower for common words.
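
      A front end that wants to reject malformed collocation tokens before
      submitting a query might check them as sketched below. parse_colloc
      is a hypothetical helper, not a DtSearch function; it enforces only
      the rule stated above (an "at" sign followed by ASCII digits forming
      an integer greater than zero) and assumes no other upper limit than
      the range of a long.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a collocation token of the form "@n...".
 * Returns the distance (> 0) on success, or 0 if the token is not a
 * valid collocation operator. Hypothetical helper for illustration. */
long parse_colloc(const char *tok)
{
    const char *p;
    char *end;
    long n;

    if (tok == NULL || tok[0] != '@' || tok[1] == '\0')
        return 0;
    for (p = tok + 1; *p != '\0'; p++)
        if (!isdigit((unsigned char)*p))
            return 0;            /* only ASCII digits may follow '@' */
    errno = 0;
    n = strtol(tok + 1, &end, 10);
    if (errno == ERANGE || *end != '\0' || n <= 0)
        return 0;                /* overflow, or a distance of zero */
    return n;
}
```

      For the query "ice @5 cream" the middle token parses to a distance
      of 5; tokens such as "@", "@0" or "@5x" are rejected.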

      A query may contain at most 8 distinct word tokens. Collocation
      operators count toward the 8. There is no limit to the number of
      other operators, as long as they match the syntax rules.

           Note:

           Collocation operators are only supported for "Austext flavor"
           databases.  The default flavor of database created by dtsrcreate
           is "Dtinfo flavor," which does not support collocation.

    Boolean Query Syntax Rules
      There are only 6 syntax rules and the rules are recursive. Ambiguity
      is resolved by precedence and associativity rules.

         1. valid_expression := word_token

                A valid expression can be just a valid naked word token.
                Semantically, the expression returns all documents
                containing the specified word. The word_token must be a
                valid word in the language of the database being searched.

         2. valid_expression := valid_expression '&' valid_expression

                The ASCII ampersand character is the AND character.
                Semantically, it returns all documents satisfying both the
                first and second expressions (boolean intersection). AND is
                also the "implied" boolean operator in the following sense:
                the query parser will insert an ampersand between words or
                expressions that otherwise would be separated only by
                whitespace. For example "word1 word2" becomes "word1 &
                word2".

         3. valid_expression := valid_expression '|' valid_expression

                The ASCII vertical bar character is the OR character. It
                returns all documents satisfying either the first or the
                second expression (boolean union).

         4. valid_expression := '(' valid_expression ')'

                Valid expressions may be recursively nested in ASCII open
                and close parentheses characters. The query parser
                "forgives" two common human errors.  It will automatically
                discard excessive close parentheses characters, and it will
                automatically generate close parentheses characters if
                necessary at the end of a query. For example, "aaa | (bbb &
                ccc)))))) ddd" becomes "aaa | ( bbb & ccc ) & ddd", and "aaa
                ((bbb" becomes "aaa ( ( bbb ) )".

         5. valid_expression := '~' valid_expression

                The ASCII tilde character is the unary NOT operator. It
                returns every document in the database that is not in the
                set satisfying the expression.

         6. valid_expression := word_token collocation_operator word_token

                Collocation operators are permitted only between words, not
                expressions.  Each of the word tokens and the collocation
                operator itself occupy slots in the table of 8 maximum word
                tokens.
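
      The parenthesis forgiveness described under rule 4 can be sketched
      as a single pass with a depth counter. repair_parens is a
      hypothetical helper and not the DtSearch parser: it only balances
      parentheses and does not insert implied '&' operators or normalize
      spacing.

```c
#include <stddef.h>

/* Copy `in` to `out`, dropping any ')' that has no matching '(' and
 * appending one ')' for every '(' still open at the end.
 * Returns 0 on success, -1 if `out` is too small. */
int repair_parens(const char *in, char *out, size_t outsize)
{
    size_t n = 0;
    size_t depth = 0;

    for (; *in != '\0'; in++) {
        if (*in == ')') {
            if (depth == 0)
                continue;           /* excess close paren: discard */
            depth--;
        } else if (*in == '(') {
            depth++;
        }
        if (n + 1 >= outsize)
            return -1;
        out[n++] = *in;
    }
    while (depth > 0) {             /* close whatever is still open */
        if (n + 1 >= outsize)
            return -1;
        out[n++] = ')';
        depth--;
    }
    if (n >= outsize)
        return -1;
    out[n] = '\0';
    return 0;
}
```

      Applied to "aaa | (bbb & ccc)))))) ddd" the sketch produces
      "aaa | (bbb & ccc) ddd", and "aaa ((bbb" becomes "aaa ((bbb))".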




    Boolean Associativity and Precedence Table
      In order from highest precedence to lowest:

      Associativity   Operator     Example
      (none)          COLLOC
      right           NOT          "aaa~bbb" resolved as "aaa & (~bbb)"
      left            AND          "aaa bbb ccc" resolved as "(aaa & bbb) & ccc"
      left            OR           "aaa|bbb|ccc" resolved as "(aaa | bbb) | ccc"
      (none)          naked word

    Example Boolean Queries
      aaa bbb ccc

      Returns all records that contain at least one occurrence of all three
      words.

      aaa | (bbb ~ccc)

      Retrieves all records containing "aaa", together with all records
      that contain "bbb" but not "ccc".

      aaa ~(aaa @1 bbb)

      Returns all records containing "aaa" but omits those where "aaa" is
      no more than one character away from "bbb".

      It is possible to formulate a query that requires retrieving all
      records in the database that contain none of the query words (for
      example, ~aaa). Users should be warned that in a large database such
      a search can take a very long time.

      Using the implied associativity and precedence rules, the ambiguous
      query string aaa ~bbb | ccc ~ddd @10 eee is disambiguated as (aaa &
      (~bbb)) | (ccc & (~(ddd @10 eee))).
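
      The precedence and associativity rules can be made concrete with a
      small recursive-descent sketch that prints the fully parenthesized
      form of a query. It is an illustration only, not the DtSearch
      parser: disambiguate and BUFSZ are hypothetical names, input is
      assumed to be well formed, long results are silently truncated to
      the buffer size, and no word validation or token limit is applied.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#define BUFSZ 256            /* arbitrary buffer size for this sketch */

static const char *cur;      /* current parse position (not reentrant) */
static void parse_or(char *out);
static void skip_ws(void) { while (isspace((unsigned char)*cur)) cur++; }

/* Read one run of characters that are neither whitespace nor operators. */
static void read_word(char *tok)
{
    size_t n = 0;
    skip_ws();
    while (*cur != '\0' && !isspace((unsigned char)*cur)
           && strchr("&|~()", *cur) == NULL) {
        if (n + 1 < BUFSZ)
            tok[n++] = *cur;
        cur++;
    }
    tok[n] = '\0';
}

/* primary := '(' or_expr ')' | word [ '@n' word ]; COLLOC binds tightest. */
static void parse_primary(char *out)
{
    char w1[BUFSZ], op[BUFSZ], w2[BUFSZ];
    if (*cur == '(') {
        cur++;
        parse_or(out);
        skip_ws();
        if (*cur == ')')
            cur++;
        return;
    }
    read_word(w1);
    skip_ws();
    if (*cur == '@') {
        read_word(op);
        read_word(w2);
        snprintf(out, BUFSZ, "(%s %s %s)", w1, op, w2);
    } else {
        snprintf(out, BUFSZ, "%s", w1);
    }
}

/* not_expr := '~' not_expr | primary; right associative. */
static void parse_not(char *out)
{
    char inner[BUFSZ];
    skip_ws();
    if (*cur == '~') {
        cur++;
        parse_not(inner);
        snprintf(out, BUFSZ, "(~%s)", inner);
    } else {
        parse_primary(out);
    }
}

/* and_expr := not_expr { ['&'] not_expr }; left associative, '&' optional. */
static void parse_and(char *out)
{
    char right[BUFSZ], tmp[BUFSZ];
    parse_not(out);
    for (skip_ws(); *cur != '\0' && *cur != '|' && *cur != ')'; skip_ws()) {
        if (*cur == '&')
            cur++;
        parse_not(right);
        snprintf(tmp, BUFSZ, "(%s & %s)", out, right);
        strcpy(out, tmp);
    }
}

/* or_expr := and_expr { '|' and_expr }; left associative, lowest precedence. */
static void parse_or(char *out)
{
    char right[BUFSZ], tmp[BUFSZ];
    parse_and(out);
    for (skip_ws(); *cur == '|'; skip_ws()) {
        cur++;
        parse_and(right);
        snprintf(tmp, BUFSZ, "(%s | %s)", out, right);
        strcpy(out, tmp);
    }
}

/* Write the fully parenthesized form of `query` into `out`. */
void disambiguate(const char *query, char *out, size_t outsize)
{
    char buf[BUFSZ];
    size_t len;
    cur = query;
    parse_or(buf);
    len = strlen(buf);
    if (len >= 2 && buf[0] == '(' && buf[len - 1] == ')')
        snprintf(out, outsize, "%.*s", (int)(len - 2), buf + 1); /* drop outer pair */
    else
        snprintf(out, outsize, "%s", buf);
}
```

      Each grammar level calls the next tighter one, so COLLOC groups
      first, then NOT, then AND (explicit or implied), then OR. Calling
      disambiguate on "aaa ~bbb | ccc ~ddd @10 eee" yields
      "(aaa & (~bbb)) | (ccc & (~(ddd @10 eee)))".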

 ARGUMENTS
      search_type
                Specifies the type of search to perform. Valid values are P,
                W, and S.

                Search type P indicates that the query string is a sequence
                of words separated by ASCII whitespace.  It requests that
                the words be stemmed prior to searching, that all documents
                containing any of the words be returned, that the results
                list be statistically sorted, and that no more than the top
                MaxResults list items be returned, where MaxResults is the
                current value returned from DtSearchGetMaxResults. Note that
                a type P search is identical to a type S boolean search with
                an implied boolean OR between words.

                Search types W and S are boolean query searches. They
                indicate that the query string is a sequence of words and
                boolean operators matching the syntax described under "Types
                S and W Boolean Query Strings" above.

                Type S requests that words be stemmed prior to searching.
                Type W requests that words be left unstemmed. Both types
                request that all documents containing the combinations of
                query words specified by the boolean operations be returned,
                that the results list be statistically sorted if possible,
                and that no more than the top MaxResults list items be
                returned, where MaxResults is the current value returned
                from DtSearchGetMaxResults.

      dbname    Specifies which database is to be searched. It is any one of
                the database name strings returned from DtSearchInit or
                DtSearchReinit. If dbname is NULL, the first database name
                string is used.

                Within the specified database, searches will be restricted
                to those documents whose DtSrKeytype.is_selected field is
                nonzero.

      date1 and date2
                Specify a range of document dates to use for the search.
                Only documents within the specified range will be returned
                on the results list.

                date1 is the older end of the range and if not NULL,
                requests DtSearch to return only those records younger than
                (that is, after) the specified date.

                date2 is the younger end of the range and if not NULL,
                requests DtSearch to return only those records older than
                (that is, before) the specified date.

                It is valid to specify just one of the arguments.

                Undated documents always qualify for a results list
                regardless of search date strings. The format of a valid
                date string is described in DtSearchValidDateString(3).

      stems and stemcount
                Specify a character buffer to hold parsed and stemmed
                words and a variable to receive the number of stored
                words.  stems and stemcount are optional; they can be NULL.
                However, if either is specified, both must be specified.

                If specified, stems must point to a character buffer large
                enough to hold DtSrMAX_STEMCOUNT by DtSrMAXWIDTH_HWORD
                bytes. An array of parsed and stemmed query words will be
                stored here by the API for use by a later call to
                DtSearchHighlight.

                The number of words in the array will be stored in
                stemcount.

      results and resultscount
                Specify where a pointer to the results list will be stored
                and a variable to receive the number of items on the list.

                Results lists can be manipulated with several utility
                functions.

                In DtSearch, frequency of occurrence information is
                maintained for words across the whole database and within
                documents. For most queries, results lists are sorted by
                this statistical information and presented to the user as a
                "proximity" number for each document on the list. Proximity
                is meant to appear to a user as a distance, or a measure of
                the nearness of the query to the document. Conceptually, the
                smaller the proximity the "closer" the document is to the
                query and the more likely it will be valuable to the user.

                DtSearch searches only one database at a time and returns
                only results lists for that single database. However,
                browsers often provide the illusion of simultaneous searches
                in multiple databases, merging the results lists by
                proximity when completed. Since the domain of knowledge and
                density of words and records may vary from database to
                database, the value of proximity numbers may similarly vary,
                and some databases may be underrepresented on merged results
                lists.
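
      The merge step a browser might perform on two per-database results
      lists can be sketched as follows. struct hit is a hypothetical
      flattened record, not the DtSrResult structure declared in
      <Dt/Search.h>, and the sketch assumes each input array is already
      sorted by ascending proximity.

```c
#include <stddef.h>

/* Hypothetical flattened hit record for illustration; the real
 * DtSrResult structure is not reproduced here. */
struct hit {
    const char *dbname;   /* database the hit came from */
    long proximity;       /* smaller means closer to the query */
};

/* Merge two arrays already sorted by ascending proximity into `out`,
 * which must have room for na + nb entries. Ties keep `a` first.
 * Returns the number of entries written. */
size_t merge_by_proximity(const struct hit *a, size_t na,
                          const struct hit *b, size_t nb,
                          struct hit *out)
{
    size_t i = 0, j = 0, k = 0;

    while (i < na && j < nb)
        out[k++] = (b[j].proximity < a[i].proximity) ? b[j++] : a[i++];
    while (i < na)
        out[k++] = a[i++];          /* remainder of the first list */
    while (j < nb)
        out[k++] = b[j++];          /* remainder of the second list */
    return k;
}
```

      Because the merge compares raw proximity numbers, a database whose
      proximities run systematically larger sinks toward the end of the
      merged list, which is the underrepresentation effect described
      above.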

 RETURN VALUE
      This function has three common return codes.

      DtSrOK is returned, as well as a results list and stems array, when
      the search was completely successful.

      DtSrNOTAVAIL is returned when the query was valid but the search was
      unsuccessful (that is, no set of documents matched the query). There
      are usually no messages with DtSrNOTAVAIL.

      DtSrFAIL is returned when the search was unsuccessful, usually because
      of an invalid query, and user messages on the MessageList explain why.

      Any API function can also return DtSrREINIT and the return codes for
      fatal engine errors at any time.

 SEE ALSO
      DtSrAPI(3), DtSearchReinit(3), DtSearchGetMaxResults(3),
      DtSearchSetMaxResults(3), DtSearchGetKeytypes(3),
      DtSearchValidDateString(3), DtSearchSortResults(3),
      DtSearchFreeResults(3), DtSearchHighlight(3)