Thunderstone Parametric Search Appliance Lets Users Dig Deeper

November 6, 2007

Brings More Texis SQL Advantages in an Easy-To-Implement Appliance

CLEVELAND, OH - November 6, 2007 - Thunderstone Software LLC (http://www.thunderstone.com) has added customizable “sort by” and “group by” capabilities to the Thunderstone Parametric Search Appliance, further enhancing this newest product in Thunderstone's innovation-leading line of powerful and flexible enterprise search solutions.

Parametric or attribute-enhanced search helps users narrow down their field of search in an attempt to increase the chances of finding relevant information. It provides context to results that are based on unstructured information, and it allows an administrator or user to sort results in the most meaningful way.

For example, you may be searching for information on a particular subject but are only interested in results written by a specific author and published within a certain timeframe. The parameters (defined attributes) of “author” and “publish date” help narrow the search for the precise information you want. Keyword, full-text search alone does not offer this opportunity.

Another example would be a search for a particular product-item description across an entire enterprise. Keyword, full-text search will certainly provide a good list of results from a variety of documents and content sources. But what if you were only interested in the item's occurrence within a specific document type (say, a PO), and one that was issued after a certain date and by a specific purchasing agent? Keyword, full-text search simply cannot provide this context for the information you're pursuing.
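The purchase-order example can be made concrete with a short sketch. The Python snippet below uses SQLite purely as a stand-in for a SQL engine; it is not Texis, and the `docs` table, its column names, the sample rows and the `parametric_search` helper are all hypothetical. A plain `LIKE` match stands in for full-text search.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE docs ("
    "id INTEGER PRIMARY KEY, doc_type TEXT, agent TEXT, issued TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO docs (doc_type, agent, issued, body) VALUES (?, ?, ?, ?)",
    [
        ("PO", "jsmith", "2007-09-14", "40 units of stainless widget, model W-200"),
        ("PO", "jsmith", "2007-03-02", "10 units of stainless widget, model W-100"),
        ("PO", "mlee", "2007-10-01", "stainless widget replacement order"),
        ("Invoice", "jsmith", "2007-09-20", "invoice for stainless widget shipment"),
        ("PO", "jsmith", "2007-10-05", "office chairs and desks"),
    ],
)


def parametric_search(keyword, doc_type, agent, issued_after):
    """Keyword match narrowed by three attribute filters."""
    rows = conn.execute(
        "SELECT id, issued, body FROM docs "
        "WHERE body LIKE ? AND doc_type = ? AND agent = ? AND issued > ? "
        "ORDER BY issued",
        (f"%{keyword}%", doc_type, agent, issued_after),
    )
    return rows.fetchall()


# Keyword alone matches every document that mentions the item.
keyword_only = conn.execute(
    "SELECT id FROM docs WHERE body LIKE ?", ("%widget%",)
).fetchall()

# Adding the attributes leaves only the POs from one agent after one date.
narrowed = parametric_search("widget", "PO", "jsmith", "2007-06-01")
print(len(keyword_only), "keyword hits;", len(narrowed), "after attribute filters")
```

Dates are stored as ISO-8601 strings here so that string comparison matches chronological order; a real deployment would use whatever date type its database provides.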

The Thunderstone Parametric Search Appliance can index and make searchable large quantities of full-text data with defined attributes, as well as targeted web content, file server-based content, application database content and other enterprise-based content in a variety of popular formats. Key features of the new Appliance include:

  • SQL plus Full-Text Search - Supporting full-text keyword searches combined with a user-selected “filter” on up to 50 data fields, the Thunderstone Parametric Search Appliance allows complex queries utilizing standard SQL commands while also taking advantage of capabilities such as concept-based, fuzzy logic, full-text searching and “bounded box” geographic searching.
  • Sorting and Grouping - Administrators and searchers can optimize ‘findability’ by specifying how they want the Thunderstone Parametric Search Appliance to sort and/or group its query results. Results can be sorted and grouped by any defined attributes in the data, and any of the available fields can be used to provide navigation links enabling the searcher to browse the results.
  • Convenient Data Integration - The DataLoad API allows data to be populated from any source, and a wide variety of connectors have been developed for many common enterprise data sources. In addition, the Appliance can directly access existing databases. And built-in, advanced extraction tools can extract data from web pages or files that already exist.
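As a rough illustration of the sorting and grouping feature described in the list above, the following Python sketch orders a result set by one attribute and counts results per attribute value, which is the kind of tally a navigation link could display. The sample hits and the helper names `sort_by`, `group_counts` and `drill_down` are hypothetical; the Appliance performs this work internally and its actual interface is not shown here.

```python
from collections import Counter
from operator import itemgetter

# Hypothetical search hits with two defined attributes each.
hits = [
    {"title": "Q3 widget forecast", "author": "Lee", "year": 2007},
    {"title": "Widget supplier review", "author": "Patel", "year": 2006},
    {"title": "Widget pricing memo", "author": "Lee", "year": 2006},
    {"title": "Widget recall notice", "author": "Lee", "year": 2007},
]


def sort_by(results, field, descending=False):
    """Order results by one attribute."""
    return sorted(results, key=itemgetter(field), reverse=descending)


def group_counts(results, field):
    """Count results per attribute value; each entry could label a navigation link."""
    return Counter(r[field] for r in results)


def drill_down(results, field, value):
    """Follow a navigation link: keep only the results in one group."""
    return [r for r in results if r[field] == value]


newest_first = sort_by(hits, "year", descending=True)
by_author = group_counts(hits, "author")
lee_only = drill_down(hits, "author", "Lee")

for author, count in by_author.most_common():
    print(f"{author} ({count})")
```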

The Thunderstone Parametric Search Appliance provides an easy way to create applications that combine SQL structured queries with advanced full-text retrieval without programming. It delivers more of the power of Thunderstone Texis on a quickly deployed and easily maintained Appliance platform. Now organizations can have the application they want with less time and effort than they believed possible.

About Thunderstone
Thunderstone Software LLC (http://www.thunderstone.com) pioneered simultaneous searching of both structured and unstructured data with the Texis relational database optimized for full-text search. Since 1981 Thunderstone has continued to develop its global reputation as a provider of the world's most powerful, scalable and flexible enterprise search solutions.

Sales contact: Fred Harmon

+1 216 820 2200 ext.105

Media contact: Peter Thusat

+1 216 820 2200 ext.118

How Ariba built an enterprise-wide Knowledge Management system based on Texis

October 24, 2007

A major challenge for today's organizations is finding a search solution flexible enough to search every repository in the appropriate manner, so that it can support an enterprise-wide knowledge management system. In this article we look at how Ariba, Inc. created such a system using Thunderstone's Texis as its search solution.

Ariba offers the world's leading Spend Management software and services to a wide range of customers that include people from all industries. It provides a set of both CD-based and on-demand software, along with services related to sourcing products and commodities, negotiating contracts, buying against negotiated contracts and other key components of a complete, end-to-end Spend Management solution. Ariba helps organizations analyze, understand and manage their spending in order to rapidly achieve sustainable cost savings and to improve business process efficiency.

Why knowledge management?

Derek Matthews, Ariba's Lead Knowledge Architect, was initially focused on their Professional Services organization and their implementation methodology. He wanted to be able to drive consistent best practices into every implementation engagement. During every engagement they would learn things and want to capture the assets and templates that were the best practices for each phase of a project engagement with the customer.

According to Matthews, “A key piece of knowledge management is searching. Process-wise, the first thing you do when you've got an issue in front of you is that you search our knowledge base to discover if your issue has already been addressed. Is there already a best practice out there that addresses what you're trying to do, or is there a content item out there that is immediately leverageable for what you're wanting to do? If none of those are the case, then we do what we call research. Research means having to dig really deep into content items and collaboration, discussion forums and all this sort of thing. And eventually maybe you'll find your solution on page 75, paragraph 4 of some technical document in combination with some other user guide and whatever else. Then the idea is to build a solution, put that into our knowledge base and enable other users around the globe to find it when searching. Researching is more costly than searching. So, we want to push everything forward to that less-costly searching process.”

Native Power and Flexibility of Thunderstone's Texis

Ariba initially licensed Thunderstone Texis because its content management system vendor offered integration code for it; the vendor supported Texis for its attribute storage and search. That integration code proved inefficient, and it became sluggish for Ariba's end users when pushed beyond its design. Because Thunderstone lets customers customize everything from the Texis databases to the Vortex scripts, Ariba's software engineers developed their own integration with the content management system. The result was greatly improved efficiency, with several customized interfaces that let users of the company's various portals search content items containing full text and file attachments, plus all the associated metadata needed for efficient filtering, security and results sorting.

Over time, Ariba built its current knowledge base. Each engagement went more smoothly than the last, and everyone, globally, now follows the same process. From the Professional Services implementation methodology, the knowledge base grew to cover Sales and Marketing material, then Customer Support and the entire CRM environment, then engineering and HR. Eventually every department in Ariba got on board with the Knowledge Management platform, and all are now heavy users of it. (Ariba has around 2,000 employees in North America, Europe, Asia/Pacific, Latin America and the Middle East.)

As Matthews explains, “We started rolling out enhancement wave after enhancement wave and have continued that to this day. One of the latest enhancements that we've rolled out has been federated searching capabilities, what Thunderstone calls the meta-searching. We actually have portlets that we use on various portals across our extranet that allow people to pull together incongruent content based on metasearches that they want to perform. It is possible now, within the last year, for an Ariba employee to do a search and be able to pull a service request from our CRM system, a defect from our engineering quality system, a marketing presentation from our content management system, as well as searching any number of other internal web sites that have been indexed. You can search over all of the sources through one integrated portlet that we built on top of Thunderstone Texis.

“Texis allows Ariba to index widely varying content into a platform that can be securely accessed by users from various portals for our customers, partners, suppliers, prospects and employees. We are able to use the search engine to present dynamic, context-sensitive views of content for users to browse with the ability to refine through full-text searching.”

Customized Search Engine for Knowledge Management

Thunderstone has always believed in providing tools to users, letting them add the intelligence and knowledge they have in their field to the core tools Thunderstone provides, enabling the creation of powerful applications that can give a competitive advantage. By providing a SQL-based database that is optimized for full-text searching in combination with traditional database attributes, Thunderstone enables customers to use their existing expertise in database application creation and efficiently extend it into the world of search. Another benefit Thunderstone provides is the Texis Web Script language, known as Vortex. It is designed to let developers quickly create applications on the Texis database by providing a simple but powerful syntax and library. Customers can easily modify Thunderstone's applications to suit their needs, and the same language lets Thunderstone create new applications for customers, usually in a matter of days.

Matthews continued, “From a knowledge management perspective we were excited to get involved with Thunderstone, because what they brought to us was a completely customizable interface. A lot of companies, when you talk about knowledge management, tend to believe there's one out-of-the-box solution you can install that's going to solve all your problems with no customization. But, that's not reality. What Thunderstone allowed us to do was to completely look at any and every requirement that we have. With Thunderstone Texis we get a platform that we can customize down to the most technical level.

“With many search engine solutions on the market you basically install their software, point it at a URL and say 'go index that data.' And that's fine. For some situations that's no problem. But in our case we need to be able to customize, even down to the database level. We need to be able to modify how this Texis database works. We need to make modifications to the Vortex scripts that go out and search for content so that they can make certain metadata changes based on certain conditions that might be present. Those Vortex scripts need to be able to combine text from within attached files together with the metadata describing that document in addition to the full text that might be within those files. For example, there will be categorization that's tied to those files. There will be filtering tags that are applied, as well as security tags that get applied.

“The last piece is critical. We need to be able to fully customize the interface to our search engine in a secure manner. If you're a customer accessing our search engine, we want to be able to allow you to search all of the content that we provide to every customer as well as the content that's provided to you specifically as a customer. Maybe it's a benchmark report that we want you to be able to see. But, at the same time, we don't want to allow you to search against a benchmark report for another customer. Security is very important. We're allowing you to search the content that you, depending on whatever attributes you have as a company, are able to see. If you bought a certain product, we may want you to be able to search content for that product, but we don't want you to get bogged down searching content for another product that you don't even own. It's useful both from a filtering for ease of use perspective as well as a security perspective -- both equally important aspects.

“Thunderstone's ability to customize based on those very, very specific requirements that we had made it a great choice, because at each level we are able to make it do exactly what we want it to do. With Thunderstone Texis you have the option of using it right out of the box, one size fits all, just install it and run. However, and this is what I think so many people don't realize out in the industry, Texis also enables a level of customization that lets you go down to the 'nth degree' depending on whatever your requirements are.”
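The access rules Matthews describes (content shared with every customer, content provided to one customer only, and content limited to products a customer owns) can be sketched as a filter applied before the keyword match. This is a minimal Python illustration, not Ariba's or Thunderstone's implementation; the `Item` and `Customer` classes, the `visible_to` and `search` helpers and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Item:
    title: str
    product: Optional[str] = None  # None: not tied to any one product
    audience: str = "all"          # "all", or the one customer allowed to see it


@dataclass
class Customer:
    name: str
    products: set = field(default_factory=set)


def visible_to(customer, items):
    """Apply the security and product filters before any text matching."""
    return [
        item
        for item in items
        if item.audience in ("all", customer.name)
        and (item.product is None or item.product in customer.products)
    ]


def search(customer, items, keyword):
    """Keyword match restricted to what this customer is allowed to see."""
    needle = keyword.lower()
    return [i for i in visible_to(customer, items) if needle in i.title.lower()]


items = [
    Item("Sourcing user guide", product="Sourcing"),
    Item("Contracts user guide", product="Contracts"),
    Item("Benchmark report", audience="Acme"),
    Item("Benchmark report", audience="Globex"),
    Item("Support policy overview"),
]
acme = Customer("Acme", {"Sourcing"})

print([i.title for i in visible_to(acme, items)])
print([i.audience for i in search(acme, items, "benchmark")])
```

Filtering first and matching second means the same rule serves both purposes named in the quote: it hides another customer's benchmark report (security) and it keeps unowned products out of the results (ease of use).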

“I am amazed at the whole design of the engine itself. It really is powerful. Thunderstone doesn't get enough credit, and I've never quite understood why. When it comes to customizing the database, customizing the indexing and how that works, and customizing the user interface -- those three things -- I have not seen any of Thunderstone's competitors demonstrate the ability to hit all three of those the way Thunderstone does and with the depth that Thunderstone does,” said Matthews.

Starting with a flexible and easily customized search solution from Thunderstone Software in one area of the business, Ariba has been able to deploy an enterprise-wide, search-based knowledge management system by extending it application by application, incorporating the specific knowledge from each part of the organization, all using the same core skill sets.

Webinator as a customizable way to add vertical search engines to multiple industry web portals

October 3, 2007

When implementing an optimal solution for the heavy search demands of multiple online properties, a website administrator needs a practical way to easily create and provide a high quality retrieval interface to collections of HTML documents. In this article we review how Trade Press Publishing successfully added powerful and flexible vertical search engines to its popular web portals with the help of Thunderstone's Webinator web index and retrieval system. Webinator serves as an example of the type of applications that can be built around Thunderstone's Texis RDBMS and Web Script.

Trade Press Publishing Corporation (http://www.tradepress.com) is a privately-held company based in Milwaukee and a leading provider of market intelligence to the facilities management, building service contractor, housekeeping, cleaning supplies distribution and railroad industries. In addition to publishing business-to-business magazines and eNewsletters, it produces trade shows and conferences and offers a variety of related educational and marketing opportunities.

Jesus Carrillo, Director of Information Technology, joined the company's Pre-Press Division more than 16 years ago. According to him, “I started in the Desktop Publishing Department at an entry-level position that was my first job out of college. And I've been at the same place ever since. The company grew. About ten years ago they dissolved the pre-press part of the business to focus on educational media products and business-to-business publishing. They wanted someone to lead their technology efforts, and they asked me to do that. So, I stayed around and have continued to search out technology applications in the b-to-b publishing space.”

Special Requirements to Index and Search Industry-Focused Web Content

Trade Press Publishing Corporation uses Webinator on four “vertical portal” web sites: two in the facility management space and two in the sanitary distribution/cleaning space. The main site is FacilityZone.com (http://www.facilityzone.com). Carrillo said he selected Webinator as the indexing and searching tool for Trade Press primarily because of its open-ended customizability.

According to Carrillo, “Probably the single identifying characteristic of the Webinator software, for us, was the ability to get to the source code. And that allowed us the flexibility to put it to work to do the things that we wanted to accomplish with its back-end. For example, we were indexing over six thousand web sites, which is quite a bit of data. And the first results that came up were kind of cool. We could see how, out of these millions of pages, you do a search, and there's some logic in there that says 'these are the ten best ones' out of the millions of pages I've got.

“Taking a closer look at them, we felt that to really bring it to a marketplace and have it be as meaningful as possible to our end users -- we needed to go in and play with a lot of the settings to get the search engine to produce the particular kind of results we believed that our users would typically want to find. The straight-out-of-the-box algorithm for searching didn't have an immediate correlation to precisely what we thought our users would be looking for.

“We spent some pretty significant time working with Thunderstone's tech support, doing tests and evaluations and changes and modifications, trial and error, to get things to the point where now it seems on a regular basis the terms we're punching in are getting the types of results that we know will make our users happy,” Carrillo explained.

Webinator's user features include:

  • simple navigation
  • intelligent query capabilities with natural language processes, special pattern matchers (regular expressions, numeric quantities, fuzzy patterns), document similarity searches, in-context result listings, link reference reports, proximity controls and set logic

Carrillo continued, “The setup and deployment of Webinator is extremely easy and straightforward. All the core functionality is there plus the ability to access the source code and be as creative and as customized as your capabilities will let you be. In other words, Thunderstone doesn't hold you back. Thunderstone lets you take the product to whatever level you're ready, willing and able to take it. For that reason we've stuck with it, we've used it, and it's been great in that regard. That's not something you're going to get from the Googles of the world.”

‘Locked Box’ Approach of Others Inadequate

“We took a look at the Google appliance. It was brand new at the time. And the reason we didn't go with the Google appliance was we had no control over it. No flexibility. No ability to customize. Basically it was a ‘locked box’ sitting in our office, you know? And that's really not the way we wanted to go about it. We've got technical expertise on staff. We can go in. We can study and learn the scripts. We can make our modifications.

“For instance, when you execute a search on facilityzone.com -- it executes a search first off of a SQL Server database that we've got on our end. Then it goes and executes it against the Webinator database and combines the two sets of results. So, we've got results that are built into a page that kind of fall on top of the results that come out of the search engine. There's no way that you're going to be able to do that with the Google Search Appliance. You just won't be able to.

“The access to the source code and the flexibility of Webinator were definitely both something of value to us. Basically, we could not have done what we did without it. Working with the tech support team at Thunderstone, we have access to people who will actually call you back and work with you on some crazy questions.”
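The two-source search Carrillo describes, where hits from the company's own database sit on top of the search engine's results, might look roughly like the Python sketch below. It is an illustration under stated assumptions rather than Trade Press's code: the `combined_search` function, the two stub search functions, the sample records and the rule of dropping duplicate URLs are all hypothetical, and neither SQL Server nor Webinator is actually called.

```python
def combined_search(query, search_local, search_index, limit=10):
    """Run one query against both sources; local database hits come first."""
    merged, seen = [], set()
    for source, run_search in (("local", search_local), ("index", search_index)):
        for hit in run_search(query):
            if hit["url"] in seen:
                continue  # assumption: skip index hits already listed from the local source
            seen.add(hit["url"])
            merged.append({**hit, "source": source})
    return merged[:limit]


# Stub sources standing in for the in-house database and the web index.
def search_local(query):
    rows = [{"url": "/products/floor-scrubber", "title": "Floor scrubber buyer's guide"}]
    return [r for r in rows if query.lower() in r["title"].lower()]


def search_index(query):
    pages = [
        {"url": "http://example.com/scrubbers", "title": "Scrubber maintenance tips"},
        {"url": "/products/floor-scrubber", "title": "Floor scrubber buyer's guide"},
        {"url": "http://example.com/hvac", "title": "HVAC checklist"},
    ]
    return [p for p in pages if query.lower() in p["title"].lower()]


results = combined_search("scrubber", search_local, search_index)
for hit in results:
    print(hit["source"], hit["url"])
```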

“We're hoping that Thunderstone will continue to be a leader and help pave the way for how search technology is going to evolve. We'd like to take advantage of the new applications that Thunderstone develops and apply them to our industries and to our users,” Carrillo said.

Web portal administrators looking for a web walking and indexing package to help them add vertical search engines to multiple online properties will appreciate the fact that Thunderstone's Webinator:

  • indexes multiple sites into one common index
  • offers administrators detailed verification and logging of document linkages
  • can index/update documents while the database is in use
  • permits multiple databases at a site
  • features a simple browser interface
  • is written in Texis Web Script for complete flexibility
  • provides an SQL query interface to the database for maintenance and reports
  • allows remote sites to be copied to the local file system
  • lets multiple index engines run concurrently against a common database

Change In Daylight Saving Rules

September 28, 2007

Are Thunderstone Software's products affected?

Texis and Webinator do not require any patches to accommodate the upcoming changes to daylight saving rules as they store dates in UTC, and use the configured timezone to output and import dates. Daylight saving tracking is a feature provided by the operating system and Texis will respect the OS configuration. However, your OS may need patching if it's not already properly configured to handle the new rules.

Thunderstone will be issuing patches for the Search Appliance on March 1 (possibly earlier). Use your appliance's "Maintenance->Check for updates" feature if you haven't configured it for automatic updates. Select the package called "timezone-1.0.0".

Are the dates in my database correct?

If you have used, or are using, Texis on an unpatched OS, any string dates you imported or converted whose values lie between the new and old DST changeover dates (e.g. between Sun Mar 11 02:00:00 2007 and Sun Apr 01 02:00:00 2007 local time for the US), in any year after 2006, were parsed by the OS under the prior rules. Those values will be output differently after the patch, so you will need to re-import or re-convert those dates after patching your OS.

For example the string "2007-03-20 16:30:00" (4:30pm on March 20, 2007) imported to a date field on an unpatched operating system configured for a US timezone will print as "2007-03-20 17:30:00" (5:30pm on March 20, 2007) after patching.
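The one-hour shift in this example can be reproduced with a small Python sketch. It assumes US Eastern time and hard-codes the 2007 changeover dates under both rule sets; the constants and the helpers `local_to_utc` and `utc_to_local` are hypothetical names that model the behaviour described above, and they are not Texis or operating system functions.

```python
from datetime import datetime, timedelta

STANDARD = timedelta(hours=-5)  # US Eastern standard time, UTC-5 (assumed zone)
DAYLIGHT = timedelta(hours=-4)  # US Eastern daylight time, UTC-4

# 2007 DST window as wall-clock times: (start at 02:00 standard, end at 02:00 daylight).
OLD_RULES = (datetime(2007, 4, 1, 2), datetime(2007, 10, 28, 2))  # unpatched OS
NEW_RULES = (datetime(2007, 3, 11, 2), datetime(2007, 11, 4, 2))  # patched OS


def local_to_utc(local, rules):
    """Import direction: interpret a wall-clock time under the given DST rules."""
    start, end = rules
    offset = DAYLIGHT if start <= local < end else STANDARD
    return local - offset


def utc_to_local(utc, rules):
    """Output direction: render a stored UTC value under the given DST rules."""
    start, end = rules
    start_utc = start - STANDARD  # the start is expressed in standard time
    end_utc = end - DAYLIGHT      # the end is expressed in daylight time
    offset = DAYLIGHT if start_utc <= utc < end_utc else STANDARD
    return utc + offset


entered = datetime(2007, 3, 20, 16, 30)
stored_utc = local_to_utc(entered, OLD_RULES)  # imported before patching
printed = utc_to_local(stored_utc, NEW_RULES)  # printed after patching

print("stored (UTC):", stored_utc)
print("printed after patch:", printed)
```

The stored UTC value never changes; only the rules used to interpret it do, which is why re-importing the original string on a patched OS corrects the value.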

Updating your operating system

Unix and Linux

Your best bet would be to check with your OS distribution vendor for timezone or tzdata updates. Here are links to a few popular ones.
For Solaris see Sun Alert ID 102775.
For RedHat see knowledge base article 7909.
For SUSE see document 3853518.
For do-it-yourselfers, check out tzdata.tar.gz at ftp://elsie.nci.nih.gov/pub/ for new timezone data. Be sure to link or copy your timezone file to /etc/localtime or whatever's appropriate for your system.

Microsoft Windows

Visit windowsupdate.com to download Update KB928388 in the Optional category. Note that this fix is NOT included in automatic updates. For full details from Microsoft visit http://www.microsoft.com/windows/timezone/dst2007.mspx.

Testing compliance for Texis and Webinator

Note: The following tests assume a US timezone that used and will use the conventional DST rules. Some localities and other countries follow different rules.
The lines here may wrap to fit the page. Enter each command on one long line.

Open a shell or msdos window and cd to the Texis install directory. Then run the following (on Windows, use "texis" instead of "bin/texis" in the examples below; see note 1).

bin/texis -h -d texis/testdb -s "select convert('2007-03-11 03:01:00','date')-convert('2007-03-11 01:59:00','date')"

This test compares one minute before and after the new transition time.
Unpatched you should get "3720". Patched you should get "120".

Then run

bin/texis -h -d texis/testdb -s "select convert('2007-04-01 03:01:00','date')-convert('2007-04-01 01:59:00','date')"

This test compares one minute before and after the old transition time.
Unpatched you should get "120". Patched you should get "3720".
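The expected values in both tests follow from simple arithmetic: the two wall-clock times are 62 minutes apart (3720 seconds), but when the clock skips an hour between them only 2 minutes (120 seconds) really elapse. The Python sketch below reproduces the four numbers. It assumes US Eastern offsets, and the `seconds_between` helper and the two constants are hypothetical; it models the rules rather than calling Texis.

```python
from datetime import datetime, timedelta

# 2007 spring-forward instants as local wall-clock times (02:00 standard time).
NEW_RULE_START = datetime(2007, 3, 11, 2)  # second Sunday in March
OLD_RULE_START = datetime(2007, 4, 1, 2)   # first Sunday in April


def seconds_between(earlier, later, dst_start):
    """Real elapsed seconds between two spring wall-clock times.

    Assumes US Eastern offsets (UTC-5 standard, UTC-4 daylight); any US zone
    that observes DST gives the same differences. The autumn change is ignored.
    """
    def to_utc(local):
        offset_hours = -4 if local >= dst_start else -5
        return local - timedelta(hours=offset_hours)

    return int((to_utc(later) - to_utc(earlier)).total_seconds())


march = (datetime(2007, 3, 11, 1, 59), datetime(2007, 3, 11, 3, 1))
april = (datetime(2007, 4, 1, 1, 59), datetime(2007, 4, 1, 3, 1))

for label, (earlier, later) in (("March 11", march), ("April 1", april)):
    unpatched = seconds_between(earlier, later, OLD_RULE_START)
    patched = seconds_between(earlier, later, NEW_RULE_START)
    print(f"{label}: unpatched={unpatched} patched={patched}")
```

On April 1 a patched system sees both times as already in daylight time, so the full 62 minutes count; an unpatched system still places its changeover there and reports 120.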

Note 1: Particularly old installations of Texis may not have texis.exe in the installation directory but only in the webserver's CGI directory. In that case use the full path to texis.exe to run it.

Confirming updates on the Search Appliance

After installing the timezone-1.0.0 package via "Check For Updates" confirm the installation by going to "Maintenance->Manage Logs" and clicking on "messages". You should see something similar to the following, but with your machine name and timezone (note that the lines are in reverse chronological order).

Feb 27 14:12:58 host logger: timezone finished
Feb 27 14:12:58 host logger: patch ok
Feb 27 14:12:57 host logger: Your timezone is America/New_York
Feb 27 14:12:57 host logger: Installing updated timezone info
Feb 27 14:12:57 host logger: timezone-1.0.0-1
Feb 27 14:12:57 host logger: Preparing packages for installation...

If necessary you can adjust your timezone setting via "Maintenance->Webmin Interface->Change Time Zone".

Further questions

Please contact Thunderstone Support if you have questions.

Webinator as an indexing and retrieval tool for creating vertical search portals on network hubs

September 21, 2007

Ecological Internet (EI) maintains up-to-date climate, forests and environment portals that serve more than 35,000 visitors a day. By implementing Thunderstone's Webinator, EI enables its website users to search the indexed content of five million URLs and quickly retrieve the desired information.

Why Ecological Internet?

Having earned his B.A. degree in Political Science at Marquette University, Glen R. Barry joined the Peace Corps and went to Papua New Guinea -- where he fell in love with the rainforests while witnessing the tragedy of their extensive destruction for the sake of making cardboard boxes and similar products.

According to him, “During my Peace Corps service in Papua New Guinea from about 1990 I became an early adopter of the Internet and began looking seriously at how networking technologies could be used to facilitate environmental conservation. In the early days of the Internet I was struck by the fact that communication between people anywhere in the world could be used to spread information that would lead to better resource management decisions and better conservation decisions.”

After returning from the Peace Corps he completed an M.S. degree in Conservation Biology and Sustainable Development, as well as a Ph.D. in Land Resources, both from the University of Wisconsin-Madison. His primary research revolved around the creation and maintenance of environmental web portals such as Forests.org -- which became one of the first 10,000 web sites on the Internet. Dr. Barry's Ph.D. dissertation was entitled “Global Forests and the Internet: Assessing the Reach and Usefulness of the Forest Conservation Portal.”

In 1999 he decided to add search capabilities to Forests.org, while also launching a climate site and an environmental sustainability site.

Customized Search Engine for Web Sites

Dr. Barry explained, “We wanted to be able to make our own customized search engine. We preferred an off-the-shelf solution that we could easily install to crawl, index, search and retrieve content from more than 4,000 reviewed scientific-content sites of interest to our target audience of conservation professionals. I remember searching on the Internet and finding a huge list of spidering and robot software that had about a hundred products on it. A lot of them were ‘open source,’ with little snippets of code. I was more concerned with having a fully implemented product that does what you need it to do. I wasn't interested in doing an open source sort of thing. Where do you go for technical support in those situations? Going through the list, most of them weren't fully implemented packages. Many of them were free, but the amount of time that a small organization would need to spend getting them operational would have offset any cost benefits. There were a few other options, but they were going to be much more expensive than Webinator.

“At that time our entire budget was like fifteen thousand dollars a year (even now it's only about seventy-five thousand dollars a year in mostly $25 - $100 donations). So, we're a really small organization. We chose Webinator. I think our initial license with Thunderstone was eight thousand dollars, which was a major purchase for us. It was a big deal. We were trying to do something that hadn't been done before. We had a vision that we wanted to create a specialized search engine on forests content, on climate change content and on water conservation content. The whole purchasing and installation process was straightforward. And Webinator was very, very stable. It just ran. I'm using it on a Windows platform. My operating system is Windows.

“We wanted to walk about four thousand sites we were feeding, and then we also wanted to do off-site pages. Here's where I think customized search is so good. Not only are we getting the content of the reviewed four thousand sites that I as a scientist have identified, but also each of those sites has links to other sites that are included in our index. So, you have some synergy where you find unexpected things at other good sites. Webinator is a really well thought-out product that has a lot of different tools built into it. It's a full-functioning web indexing and retrieval package. You can even include or exclude specified external links. For instance, we don't want Greenpeace's online store and merchandise in our search engine.”

Network “Hubs” to Support Environmental Professionals

Ecological Internet (EI) does not directly focus on the general audience that's looking for fluffy pictures of panda bears. There are other web sites that do that very well. EI's target audience is primarily conservation professionals who need information retrieval tools and who seek useful data to factually support their own work. These people tend to be already highly motivated on the issues, and what they get from Ecological Internet are practical tools to do their work better.

Dr. Barry had been employed in the biology department at the University of Wisconsin as their ‘bioinformatics person’ until he left several years ago to run Denmark, Wisconsin-based Ecological Internet, Inc. (http://www.ecologicalinternet.org) on a full-time basis.

“There's a whole branch of science, network science, that over the last decade has studied how diseases spread or how the Internet's organized in a ‘hub’ design comprised of nodes with disproportionately high numbers of links to them. It's like the whole Kevin Bacon ‘six degrees of separation.’ We're all networked, and there are hubs. The Internet is a good demonstration of a lot of these networks. What we tried to do with Ecological Internet was to make a network hub on climate change, a network hub on forests, etc. where all of the best content is linked, indexed and made available in support of intelligent activities to protect the environment. Part of this is awareness, but it's awareness with a purpose to actually achieving something. There is reason to be hopeful. The forces of ignorance and corruption are ominous, but we have new tools - like Webinator - that we've never had before,” said Dr. Barry.

He continued, “I went up there to Thunderstone's headquarters in Cleveland, Ohio to participate in a Webinator training program two years ago. I had already been using the product for six years. During this whole time I think that the Thunderstone Software team has always been very responsive. I don't know of any other comparable product that brings full-text customized search to non-profits at a reasonable price. We wholeheartedly support Thunderstone and would recommend the Webinator search platform highly.”

Ecological Internet (EI) now maintains up-to-date climate, forests and environment portals that serve more than 35,000 visitors a day. By implementing Webinator, EI enables its website users to search the indexed content of five million URLs and quickly retrieve the desired information.

The nonprofit's conservation portals currently include:

EcoEarth.Info (http://www.ecoearth.info)
ClimateArk.org (http://www.climateark.org)
WaterConserve.org (http://www.waterconserve.org)
Forests.org (http://www.forests.org)
OceanConserve.org (http://www.oceanconserve.org)
My.EcoEarth.Info (http://my.ecoearth.info)
