MNEMOTRIX SYSTEMS, INC.
Technical Discussion & Frequently Asked Questions
   
   

Introduction

Information and documents, books and magazines, consist of ideas and images that are linked together by ideas as a person goes through the media. These ideas are communicated in blocks, or clusters, made up of paragraphs, which in turn are composed of sentences drawn from a broad vocabulary. As language has evolved through the centuries, we have learned to use words in context with other words in an ordered stream to communicate complex ideas to each other.

From Day One, the basic design mandate of our original search capabilities included the need to find very specific yet broad sets of items, intersecting with other very specific yet broad sets of items, within variable yet controllable units of text: sentences, paragraphs, whole documents, individual lines, or even just a specified number of characters. To this day, our ability to find things other systems cannot rests largely on those original search engine requirements, which allowed many complex types of search items or sets of items to be found in very precise association with any number of other search items or sets of items, within a dynamically adjustable unit of text. The ability to fine-tune these elements dictates relevance. And the ability to quickly obtain not just any text and images, but specifically the RELEVANT ones, greatly aids individual problem solving and cognitive acceleration in one's own field.
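As a rough illustration of searching for intersecting sets of items within an adjustable unit of text, here is a minimal Python sketch. It is not the Texis implementation; the delimiter patterns, the function name `find_units`, and the way term sets are passed in are all assumptions made for this example.

```python
import re

# Hypothetical delimiters for each unit of text; the product's own
# definitions of a sentence, paragraph, or line are not shown in this page.
UNIT_SPLITTERS = {
    "line": r"\n",
    "sentence": r"(?<=[.!?])\s+",
    "paragraph": r"\n\s*\n",
}

def find_units(text, term_sets, unit="sentence"):
    """Return the units in which every term set has at least one member."""
    units = [u.strip() for u in re.split(UNIT_SPLITTERS[unit], text) if u.strip()]
    hits = []
    for u in units:
        words = set(re.findall(r"[a-z0-9']+", u.lower()))
        # Each term set is a group of alternatives; all groups must intersect.
        if all(words & {t.lower() for t in terms} for terms in term_sets):
            hits.append(u)
    return hits
```

Calling `find_units(text, [{"bank"}, {"merger", "acquisition"}], unit="sentence")` returns only sentences mentioning both ideas, while `unit="paragraph"` widens the window and usually returns more hits.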

A basic requirement for doing this, and something which distinguishes this technology, is an advanced package of very rapid pattern matching algorithms and a customizable language/meaning matrix of more than 250,000 word connections in a dense web of associations, the result of decades of private research and development. With our package of development tools, the time needed to build a completely customized Competitive Intelligence System, an online library or information catalog, or an integrated text and image database is considerably shorter than with other approaches. The result is also more operationally effective, locating relevant responses that less robust or less mature retrieval methodologies would miss.

Over the years, the technology we developed was fleshed out and stabilized in a commercially available relational database management system called "Texis", with its own web scripting language called "Vortex". Using these tools, we write our own proprietary web scripts that draw on all the basic features for which the software was originally designed, including the ability to manipulate the complex "Equivalence Matrix", or "Thesaurus", which provides information correlation capability. In this way we can link the dexterity of a database containing our Intelligent Search Engine with the online forms support found in web browsers. This combination allows us to rapidly create globally distributable information services via the Internet, harnessing the power of the software in completely customized applications specific to each new user group.

We maintain our own Unix server on which the base software resides, and from which we host multiple firewall-secured, ID/password-protected applications with custom databases we house and custom interfaces we write. Using best-of-class software, and with 24-7 network backup and technical support from the company we founded in 1981, we are able to provide economical, precise, and secure information mining applications for special case scenarios.

Our Information Mining Technology

At the core of all its applications, Mnemotrix Systems, Inc. uses the information mining technology contained in Texis, an intelligent Relational Database Management System which is embedded in a huge number of broad applications across the world. Herein we discuss some of the features of this core search technology, as a foundation to the applications which Mnemotrix creates and supports.

These are some of the features which comprise the flexibility of the set of tools we have available, and which we use to create applications which are uniquely customized for each of our clients' special needs.

Frequently Asked Questions

How is Mnemotrix's information mining technology different from other engines?

Our information mining technology uses the only search engine in the world with the structure of a SQL relational database management system (RDBMS). SQL as used here means Structured Query Language, not the Microsoft product that carries the name! SQL is an industry standard defined by the American National Standards Institute (ANSI) and its international counterpart, the International Organization for Standardization (ISO). All major database vendors use SQL as their query language.

SQL provides many advantages for addressing complicated search requirements. It also provides you with the confidence of a reliable, well-defined path for implementing unanticipated new search functionality in the future. SQL is a rich, mature, open standard used by hundreds of thousands of database application developers around the world.

All other information mining engines provide a much narrower range of capabilities based on proprietary interfaces. No other engine provides the versatility of using SQL as its application development model.

How is our technology different from other relational databases?

Our technology is the only relational database that can store and search text documents of unlimited size within standard database tables. All other solutions that purport to accomplish this employ, either explicitly or "under the covers," a loosely coupled external text index, and store documents in a binary large object (blob) field. That approach causes major bottlenecks.

What's so hard about integrating text-search with a relational database?

Text searching and relational database management, along with the information mining applications built on them, are radically different paradigms for organizing and retrieving information. They were developed over decades as completely separate technologies and do not marry easily. More than 10 years were devoted to solving this problem; it is our "core competency," and it distinguishes Mnemotrix Systems, Inc. across many hundreds of important client applications.

Which database (RDBMS = Relational Database Management System) does our technology use?

Our technology does not "use" another database; it is a complete database itself. However, it can be used as an information mining engine for content residing in any other database.

What platforms do our applications run on?

As a general rule, we host our own secure and private applications across the Internet on a password-protected basis. Individual users connect from wherever they normally connect to the Internet and use their applications on our own Unix server, which is hosted in a completely secure, firewalled network environment on a T3. So, generally speaking, platforms are not really an issue. Even when an application needs to live in a client's own environment for some special reason, platform is still not really an issue, since Texis and all the major software components of our applications run on the major Unix systems and on Windows NT/2000/XP. Supported Unix flavors include Solaris 2.5+, Solaris x86, Linux, Compaq Tru64 (DEC Alpha), FreeBSD, Irix, BSDI, HP-UX, AIX, SCO and Unixware. The compatible platforms are those that are resilient enough to support the requirements of a secure, sensible, and robust hosting environment.

What language was our software written in?

All the code that comprises our information mining software is written in ANSI-compliant C. Programmer's API source code, which has been compiled and tested on at least 22 different Unix (and other) platforms, is available where needed for reference and modification, when collaborative applications are being supported, or when an application requires embedded technology.

Are documents stored within our database, or as separate files?

Either! It depends on the circumstances. Web-searching is a typical example of indexing external documents: we can extract information about the different web pages and build an index (database) based on that; search results consist of links to those pages. On the other hand, in, say, an auction application, the original information typically exists entirely within the database: users input their listings directly into the database; and search results consist of links to records within the database.

Can our technology handle BLOBs (binary large objects)?

Yes. Our technology has a blob-type field useful for storing graphics or other binary data. Note, however, that in our search technology textual content of any size is usually put in a variable-size (varchar) field, which provides superior text indexing and searching compared with storing text in blobs. If you do have binary content, our search technology can manage the storage of files much more efficiently than an operating system's file system! That is because our technology keeps track of each record's location on disk and can fetch it with a single disk seek-and-read operation, whereas file systems are not indexed this way, so fetching a file typically takes four or more seek-and-reads to search through the directory structure.

In plainer terms, we have the architecture to robustly store and manipulate images of great size, especially where the database combines text descriptions and images. Examples include complex medical research applications containing data, MRIs, X-rays, and photos; GPR and GPS data and imagery; and any number of other needs in architectural design, city planning, science, and defense, such as strategic studies, manuals, and other complex libraries of data. Alternatively, we can create a database of pointers to those images rather than storing the images themselves. This allows an extremely flexible database design in which the images can be located in a variety of places while the text database is efficiently searched and managed somewhere else.

How well does our technology scale up? What are the benchmarks?

Our search technology is by far the highest-performance product in the marketplace providing full-text search and data mining within a relational database framework. It powers some of the largest search sites on the Internet. When Texis was the search engine at eBay, it served more than 20 million searches a day. eBay has attributed its various outages to unrelated problems; they have never had a crash caused by our search engine! Our own application servers can currently support hundreds of simultaneously running applications under a secure rubric.

How many documents or records can we search?

There is no inherent limit. Our search technology is routinely used on the most heavily trafficked web sites for searching databases of tens of millions of large records. It has been used with hundreds of millions of records with no significant complications.

How quickly are our text indexes updated?

Instantly! Our search technology performs standard database record locking, unlocking, and management of contention. It keeps the data consistent and available for all users while records are being inserted, updated, or deleted. No other search engine performs these database-type functions.

Does our technology do incremental indexing?

Yes. Items added to the database are searchable instantly. Our search technology takes care of all index updating in the background.

Can we search data in languages other than English?

Yes, our technology is used with many Latin-based languages.

How does our technology handle the 8-bit "accented" characters of Spanish (or French or German or whatever)?

A simple configuration setting tells our search technology which character set you are using. Accent characters and any other non-English characters will be preserved in the data and become fully searchable, if desired.

Can our technology index languages using multi-byte characters (e.g., Chinese, Japanese, Hebrew)?

Yes, our technology has been used in these languages. However, the issues are somewhat more complicated than for the single-byte alphabets. For example, a specific character in Chinese may sometimes be a word on its own, and other times part of a different word. Chinese readers discern the difference from the context, but there is no indication in the text as to which it is. Any such language application would have to be discussed as a special case application.

Does our technology do "stemming"? How about in other languages?

Stemming refers to the process of stripping a word down to its root by removing suffixes or prefixes (such as the "s" on the end of English plurals), and then searching for valid variations of the root (known as morphemes). Our search technology provides very sophisticated morpheme processing, with default rules that apply to English. Various aspects of morpheme processing may be turned on or off, and the rules customized. A set of morpheme processing rules may be specified for any language. A user organization will typically wish to customize these rules not only for its language, but also for a particular type of data or search style.
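To make the idea of customizable suffix rules concrete, here is a minimal Python sketch of suffix stripping. It is only an illustration: the suffix list, the `min_root` setting, and the function names are assumptions for this example and are far simpler than the engine's actual morpheme processing.

```python
# Illustrative suffix rules only; the engine's real morpheme rules are
# not reproduced on this page.
ENGLISH_SUFFIXES = ["ing", "ed", "es", "s"]

def stem(word, suffixes=ENGLISH_SUFFIXES, min_root=3):
    """Strip the longest matching suffix, keeping at least min_root letters."""
    word = word.lower()
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_root:
            return word[: -len(suffix)]
    return word

def same_root(query_word, text_word, **rules):
    """Two words match when they reduce to the same root under the rules."""
    return stem(query_word, **rules) == stem(text_word, **rules)
```

Passing a different `suffixes` list is the hook for another language or data type in this sketch; for instance, `stem("casas", suffixes=["s"])` applies a single hypothetical plural rule. The `min_root` guard stops short words such as "bus" from being over-stripped.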

Does our technology have a thesaurus capability?

Yes, a very extensive one. It was originally created as, and is still also referred to as, the "Equivalence Matrix". Our Thesaurus was designed from the start to be fully editable and customizable for any special subject or group, and this is one of the features Mnemotrix uses most when customizing an application for a special group or purpose. Our main Thesaurus consists of over 250,000 root English language words with all of their synonyms and concept correlatives. It is drawn upon automatically, along with the add-on thesaurus customized for each new user group, for any query where concept searching is enabled. This ability of the program to build complex sets of synonyms, modeling alternative concept structures, gives the researcher an automatic means of locating correlated and conceptually linked data, which significantly enhances the mining of information relevant to a query.

Much of the power of this basic feature has fallen by the wayside as attention has concentrated on mass market applications. Mnemotrix is the only company in the world that has mastered the enormous potential of this "User Thesaurus" facility, allowing us to build advanced mining applications based on custom user profiles. These custom applications take information mining into zones of capability simply not possible by other means, and account for a large measure of the functionality of the custom systems provided to our clients. We have learned through years of hard experience that this customization cannot and should not be completely automated. This is where our personal expertise has been most necessary, and it is applied on a consulting basis toward the creation of unique company and application profiles.
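The way a main thesaurus and a per-group add-on thesaurus can combine at query time might be sketched as follows. This is a toy Python model under our own assumptions: the dictionary layout, the sample entries, and the name `expand_query` are hypothetical and do not describe the Equivalence Matrix's real format.

```python
# Toy data; the real thesaurus is far larger and its storage format is
# not described on this page.
BASE_THESAURUS = {
    "car": {"automobile", "vehicle"},
    "buy": {"purchase", "acquire"},
}

def expand_query(words, user_thesaurus=None, base=BASE_THESAURUS):
    """Turn each query word into a set of equivalents from both thesauri."""
    user_thesaurus = user_thesaurus or {}
    expanded = []
    for w in words:
        key = w.lower()
        equivalents = {key} | base.get(key, set()) | user_thesaurus.get(key, set())
        expanded.append(equivalents)
    return expanded
```

Each returned set can then be treated as one search item: a record matches the concept if it contains any member of the set. A user group with its own vocabulary would supply entries such as `{"car": {"sedan"}}` through `user_thesaurus` without touching the base data.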

Can we mine data according to geographical locations, such as zip or country code?

Yes. Our search technology is unique in its ability to store text records containing geographical locations and their associated image records, and efficiently perform a text search restricted to some distance from a particular point ("swimming pool repair within 10 miles of Columbus, Ohio"). This is accomplished by converting the locations into longitude and latitude. Such applications can be set up with a visual geographic overlay for ease of user interface.
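A text search restricted by distance can be sketched in Python with the standard haversine great-circle formula. This is an illustration only, not the product's method; the record layout (`text`, `lat`, `lon` keys) and the function names are assumptions made for the example.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def search_near(records, keyword, center, radius_miles):
    """Keep records whose text contains the keyword and lie within the radius."""
    lat0, lon0 = center
    return [
        r for r in records
        if keyword.lower() in r["text"].lower()
        and haversine_miles(lat0, lon0, r["lat"], r["lon"]) <= radius_miles
    ]
```

In a real application the zip or country code would first be converted to a latitude and longitude, as described above, and the distance test would be combined with the full text query rather than a simple substring check.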

Does our technology handle natural language queries?

Yes. Users may enter any natural language question. By default, matching records or documents may be presented in relevance rank order. There are many settings for "tuning" the rankings.

Does our technology handle phrases? Wildcards?

Yes, both. A typical search form will consider text within quote marks as a phrase, and the asterisk character as a wildcard. If desired, our search technology will accept wildcards within or at the beginning of a word, as well as at the end. These features are under the control of the application developer, who may turn them on or off, or change their behavior in various ways.
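One way a search form could interpret quoted phrases and asterisk wildcards is sketched below in Python using regular expressions. This is a stand-in technique for illustration, not the engine's own pattern matcher; the function names are assumptions.

```python
import re
import shlex

def term_to_regex(term):
    """Compile one term; '*' stands for any run of word characters."""
    # Escape everything, then turn the escaped asterisk back into a wildcard.
    pattern = re.escape(term).replace(r"\*", r"\w*")
    return re.compile(r"\b" + pattern + r"\b", re.IGNORECASE)

def parse_query(query):
    """Split a query into terms, keeping double-quoted text as one phrase."""
    # shlex.split raises ValueError on unbalanced quotes; a real form
    # would need to handle that case.
    return [term_to_regex(t) for t in shlex.split(query)]

def matches_all(text, query):
    """True when every term or phrase in the query occurs in the text."""
    return all(rx.search(text) for rx in parse_query(query))
```

Because the wildcard is handled wherever it appears in the term, the same code accepts `swim*`, `*ing`, and `*comput*`. An application developer wanting to forbid leading wildcards could simply reject terms that start with an asterisk before compiling them.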

Does our technology support Boolean logic?

Yes. Full Boolean logic is standard within the SQL language. Our search technology also understands the + and - operators popularized by web search engines. And our search technology understands set logic, which can be used to express a command of the style "Find records containing n or more words of my query." Absent explicit operators, the default logic is specified by the application developer.
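The + and - operators and the "n or more words" style of set logic can be modeled in a few lines of Python. The sketch below is our own simplified interpretation, not the engine's query syntax; the `min_optional` parameter and the function names are hypothetical.

```python
import re

def parse(query):
    """Sort query terms into required (+), excluded (-), and optional."""
    required, excluded, optional = [], [], []
    for term in query.lower().split():
        if term.startswith("+"):
            required.append(term[1:])
        elif term.startswith("-"):
            excluded.append(term[1:])
        else:
            optional.append(term)
    return required, excluded, optional

def set_logic_match(text, query, min_optional=1):
    """Apply +/- operators, then require at least min_optional other words."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    required, excluded, optional = parse(query)
    if not all(w in words for w in required):
        return False
    if any(w in words for w in excluded):
        return False
    found = sum(w in words for w in optional)
    # Never demand more optional words than the query actually contains.
    return found >= min(min_optional, len(optional))
```

Here `min_optional` plays the role of n in "find records containing n or more words of my query", and its default value stands in for the default logic an application developer would choose.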

Does our technology contain fuzzy logic?

Yes. The facility we use to accomplish this is called approximate pattern matching. This generates a similarity measure between any two words or patterns, expressed as a percentage of closeness. The user or application developer may control the degree of closeness. This capability most commonly is desired to accommodate spelling mistakes in either the queries or the data. It can be useful in searching scanned documents, which tend to have errors resulting from the imperfect OCR process. Developers should use this feature with caution, however. Fuzzy logic, by its nature, brings back some records unrelated to either the user's query words or the intended meaning. This tends to confuse and annoy users not expecting this style of response.
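A similarity measure expressed as a percentage of closeness can be demonstrated in Python with the standard library's `difflib.SequenceMatcher`. This is a substitute technique chosen for illustration; it is not the approximate pattern matching algorithm used in our search technology, and the threshold value is an assumption.

```python
from difflib import SequenceMatcher

def closeness(a, b):
    """Similarity of two words as a whole-number percentage (0 to 100)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())

def fuzzy_find(query, words, threshold=80):
    """Return the words at least `threshold` percent close to the query."""
    return [w for w in words if closeness(query, w) >= threshold]
```

Lowering `threshold` catches more misspellings and OCR errors but, as cautioned above, also brings back more unrelated words, so the setting is best left under the control of the application developer.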

Do we handle numeric quantities in any special way?

Yes; in fact, this feature is exclusive to our text search engine. It allows you to find quantities in textual information however they are represented. For example, our query language lets you enter a query for, say, some numeric quantity greater than a million, and find a reference to "1.6 billion dollars" buried within the text.
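The idea of recognizing a quantity such as "1.6 billion dollars" and comparing it with a threshold can be sketched in Python. The example handles only digits followed by an optional scale word; the pattern, the scale table, and the function names are assumptions, and the real feature recognizes many more representations.

```python
import re

# Scale words understood by this sketch only.
MULTIPLIERS = {"thousand": 1e3, "million": 1e6, "billion": 1e9, "trillion": 1e12}
QUANTITY = re.compile(
    r"(\d[\d,]*(?:\.\d+)?)\s*(thousand|million|billion|trillion)?",
    re.IGNORECASE,
)

def quantities(text):
    """Extract every number in the text, applying any scale word after it."""
    values = []
    for number, scale in QUANTITY.findall(text):
        value = float(number.replace(",", ""))
        values.append(value * MULTIPLIERS.get(scale.lower(), 1))
    return values

def has_quantity_greater_than(text, threshold):
    """True when the text mentions any quantity above the threshold."""
    return any(v > threshold for v in quantities(text))
```

With this sketch, `has_quantity_greater_than("a deal worth 1.6 billion dollars", 1_000_000)` is true even though the digits "1.6" alone are far below a million.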

Can we mix and match these special search items dynamically in one query?

Definitely. We usually use our own experience to design effective, precise queries which will continue to work reliably on a changing data stream, and store them in an easily understandable pull-down menu which the user can simply click on, to profile information related to their needs on an ongoing basis. The combination of specially designed queries with a completely open query capability allows maximum freedom to the researcher. We have also devised ways to help the user build complex queries easily, which can be passed out to multiple databases.

Can we index documents stored on multiple servers?

Yes, elementary! Our information mining technology may create a searchable index of documents anywhere on a network or on the Internet.

Can we sort results by date (or by price, or rating, or whatever)?

Yes. Our sorting power is one of the most popular features. You may sort the results of a text search by any field in your data. For example, if your database contains an "author" field, you can sort search results by author. This works efficiently even on large result sets, by taking advantage of the powerful sorting capability inherent within relational database technology. Our search technology can quickly sort tens of thousands of hits or more. Other search engines either bog down sorting more than a few hundred items, or else their sorting capabilities are much more limited. For example, one major search engine cannot perform relevance-ranking together with sorting; another can sort by date only, not by other fields.

Can our technology find related results ("More like this")?

Yes, that is a standard feature. Our search technology can take any document or text selection and turn it into a search for similar records. This is sometimes called "query by example."

Can our technology search document "zones" separately?

Elementary! What some people call zones are, in database lingo, fields. With our search technology you may query any field separately or in combination with other fields. And queries are not limited to text! If one field (zone) contains a postal code, for example, you could query that with a numeric range such as 90011 through 97000.
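Combining a text condition on one field with a numeric range on another is ordinary SQL. The Python sketch below uses the standard `sqlite3` module purely as a stand-in database; the table, column names, and sample rows are hypothetical, and `LIKE` is a crude substitute for our text search operators.

```python
import sqlite3

# In-memory stand-in database with a hypothetical "listings" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (title TEXT, body TEXT, postal_code INTEGER)")
conn.executemany(
    "INSERT INTO listings VALUES (?, ?, ?)",
    [
        ("Pool repair", "We repair swimming pool pumps", 90210),
        ("Pool supplies", "Swimming pool chemicals and covers", 10001),
        ("Roof repair", "Shingle and gutter work", 95014),
        ("Spa and pool care", "Weekly pool cleaning service", 97000),
    ],
)

def search(conn, word, low, high):
    """Text match on one field plus a numeric range on another, sorted by title."""
    rows = conn.execute(
        "SELECT title FROM listings "
        "WHERE body LIKE ? AND postal_code BETWEEN ? AND ? "
        "ORDER BY title",
        (f"%{word}%", low, high),
    )
    return [row[0] for row in rows]
```

Calling `search(conn, "pool", 90011, 97000)` returns only the listings whose body mentions "pool" and whose postal code falls in the range. The `ORDER BY` clause shows how the same query can sort results by any field.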

Can we get past the complexity of a heavily fielded database?

We can redesign how data is stored and searched so that the user has much better access to all of the information in proximity to other important information. We can write feedback into the results process so that the user can more rapidly ascertain the specific relevance of the result list and accept and reject data faster, getting to the crux of a research problem much more efficiently.

What is our relevance ranking algorithm? Is it tunable?

Our information mining technology contains a sophisticated automated ranking system that may be tuned in various ways. Factors it uses include: closeness of query words to the beginning of a document; order of occurrence of the query words; and proximity (closeness) of query words to each other within a record. These factors may be weighted to change the ranking behavior. As an example of how that might be useful: newspaper articles tend to have the most important material close to the beginning, so in a newspaper search application, you might give that factor more weight.
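The three ranking factors named above, and the effect of weighting them, can be illustrated with a small Python scoring function. The formulas and weight names here are our own assumptions for the sake of example; they are not the actual ranking algorithm.

```python
import re

def rank_score(text, query_words, w_position=1.0, w_order=1.0, w_proximity=1.0):
    """Score a record from 0 to 1 using three weighted factors.

    At least one weight must be non-zero.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    positions = []
    for w in query_words:
        w = w.lower()
        if w not in tokens:
            return 0.0  # this sketch requires every query word to be present
        positions.append(tokens.index(w))  # first occurrence only
    # Factor 1: closeness of the earliest query word to the start of the text.
    position = 1.0 - min(positions) / len(tokens)
    # Factor 2: do the words occur in the same order as in the query?
    order = 1.0 if positions == sorted(positions) else 0.0
    # Factor 3: how tightly the query words cluster together.
    span = max(positions) - min(positions) + 1
    proximity = len(set(positions)) / span
    total = w_position + w_order + w_proximity
    return (w_position * position + w_order * order + w_proximity * proximity) / total
```

For the newspaper case described above, raising `w_position` relative to the other weights makes a record that mentions the query words early outrank one that mentions them only near the end.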

How can we make a large result listing more meaningful?

We have techniques that pull smart abstracts from the full text and show them in the result listing, so that the user does not have to read the article first. We can also pull smart excerpts that bring the matched search terms up into the result listing, making it faster and more meaningful and requiring fewer clicks to get to the heart of a research matter. All of this is done automatically by the program, so hand-written abstracts and excerpts are unnecessary in most cases where the full text is available to the indexed database.
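A very simple form of automatic excerpt, showing the text around the first matched term, can be sketched in Python. This keyword-in-context approach is an assumption chosen for illustration and is much cruder than the smart abstracts and excerpts described above; the `width` parameter and function name are hypothetical.

```python
import re

def excerpt(text, query_words, width=40):
    """Return the text surrounding the first query word found.

    query_words must contain at least one non-empty word.
    """
    pattern = re.compile("|".join(re.escape(w) for w in query_words), re.IGNORECASE)
    m = pattern.search(text)
    if m is None:
        # No match: fall back to the opening of the text.
        return text[: 2 * width].strip()
    start = max(0, m.start() - width)
    end = min(len(text), m.end() + width)
    snippet = text[start:end].strip()
    # Mark where the excerpt was cut from a longer text.
    return ("..." if start > 0 else "") + snippet + ("..." if end < len(text) else "")
```

A result listing built this way shows each hit in context, so the user can often accept or reject a record without opening it.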

What about our Query Language?

Our Query Language allows us to put our Intelligent Search right into the middle of a completely efficient, robust, relational database management system, so that finally all the best of the world of relational databases can be married to all the best of intelligent text searching. And using our web script language, we can design a web friendly interface which is easy to use, and yet harnesses the power of all this for any type of application. For more information on this aspect, take a look at our Query Help and Tutorial for Intelligent Text Searching.

References and History

For an extensive list and full text of references and publications going back 25+ years click here.

For a list of clients Mnemotrix has supported with this technology click here.

Copyright © 1986-2015 Mnemotrix Systems, Inc.
All International Rights Reserved