May 2009 Newsletter

May 31, 2009


HAPPENINGS

SUPERIOR TECH SUPPORT LEADS TO DEVELOPMENT OF THUNDERSTONE SEARCH APPLIANCE VERSION 7

As a direct response to ongoing practical input from customers who regularly interact with our tech support engineers, Thunderstone Software will release Version 7 of the Thunderstone Search Appliance on June 8, 2009. We've added a number of desirable new features — which also apply to our Parametric Search Appliances and to Thunderstone's entire line of Appliance products. These performance enhancements include:

 

    • Faster Category Searching
      We've provided a new setting to improve category search speed when categories are distinct/non-overlapping (i.e., when no URL belongs to more than one category.)

 

 

    • Enhanced Federated Search (Thunderstone Meta Search)
      Search users can now select which back-end profiles to actually search, from a list configured by their Appliance administrator.

 

 

    • Character Match Mode
      Non-English/Unicode character support is greatly improved with this setting, which not only allows case-insensitive searching of foreign characters but also enables ignore-accents ("e" matches "é"), ligature expansion ("oe" matches "œ") and more.

 

 

    • Language Analysis Module
      An optional feature for Appliance owners who pay to have it activated, this setting improves CJK (Chinese/Japanese/Korean) searches by adding a processing step that puts spaces between words, so a word can be found without wildcards even when it sits directly next to other words.

 

 

    • MySQL in DB Walker
      We've added support for crawling MySQL databases to our DB Walker.

 

 

    • Updated "Look and Feel"
      The administrative interface has had its text slightly reorganized, and it now uses Cascading Style Sheets (CSS) for more modern HTML.
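
To give a feel for what the Character Match Mode item above describes, here is a minimal Python sketch of case-, accent- and ligature-insensitive matching. It is only an illustration built on Python's standard `unicodedata` module, not the Appliance's actual implementation; the `fold` and `matches` helpers and the small ligature table are our own assumptions.

```python
import unicodedata

# Ligatures that Unicode normalization does not split on its own.
# Illustrative table only; a real system would cover many more characters.
LIGATURES = {"œ": "oe", "Œ": "OE", "æ": "ae", "Æ": "AE"}


def fold(text: str) -> str:
    """Return a case-, accent- and ligature-insensitive form of text."""
    # Expand ligatures first ("œ" becomes "oe").
    expanded = "".join(LIGATURES.get(ch, ch) for ch in text)
    # Split accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", expanded)
    # Drop the combining marks, keeping only the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more thorough lower() for non-English text.
    return stripped.casefold()


def matches(query: str, document: str) -> bool:
    """True if the folded query occurs anywhere in the folded document."""
    return fold(query) in fold(document)
```

With this sketch, `matches("resume", "Résumé attached")` is true because both sides are folded to the same plain form before they are compared.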

 

All Thunderstone customers with current Appliance maintenance agreements in place automatically qualify to receive the downloadable Version 7 software update. Please phone +1 216 820 2200 if you have any questions (business days, 10 a.m. - 6 p.m. Eastern Time.)


CUSTOMER QUOTE OF THE MONTH

 

“We use our Thunderstone Search Appliance 1000 not for web searching but for internal directory searches of call logs, packing lists, sales lists, invoices, load sheets, etc. People would formerly request needed documents, and we'd have to manually make photocopies and deliver them — which required lots of time and effort. Now that we have a Thunderstone Search Appliance, everybody can search and find what they want quickly and easily. It seems to work very well for what we need.”

 

Michael Lee
Director of I.T.
Carry-On Trailer Corporation
http://www.carry-ontrailer.com


UPCOMING EVENTS

Thunderstone's R & D team has several other exciting enterprise search development projects in the 2009 pipeline, as our staff continues to work on:

 

    • TEXIS Version 6

 

 

    • Webinator Version 6

 

 

    • TEXIS Catalog — our newest eCommerce search engine for online catalogs

 

Plus, we will continue adding even more features to the Search Appliance products this year. Look for details on all these scheduled releases in future issues of Thunderstone News.


TECH TIPS: USING KEEP/IGNORE TAGS ON YOUR SEARCH APPLIANCE OR WEBINATOR SOFTWARE

Just the Facts, Ma'am — Specifying the content with Keep/Ignore tags

As websites grow larger and more complex, they contain more and more "cruft" that search engines stumble upon. Things that may be small or even hidden by JavaScript and CSS are front and center to search engines. Standard headers, menus and breadcrumbs are just some of the things that may be polluting your search's data.

Keep Tags and Ignore Tags allow your page authors to indicate what should and shouldn't be used from a webpage. They allow you to trim the fat from your pages so the only thing that gets searched is the content, instead of all the other fluff.

 

    • Keep Tags specify a beginning and ending expression, and only the content between the beginning and end is kept. This is performed on the HTML source, so it's common to use comments as the begin/end tags.

 

 

    • Ignore Tags specify a beginning and ending expression, and the content between the beginning and end tags is DISCARDED. This is also performed on the HTML source, so HTML tags/comments are fair game.

 

Which to use? Both can accomplish the same goal. It's just a question of which will be logistically easier for you. If you have mostly content with just a little bit of extra info, Ignore Tags will probably be easier. If you have a small amount of content awash in a sea of cruft, then putting Keep Tags around the content may be easiest. Plus, you're not limited to using just one or the other. You can also use a combination of Keep Tags and Ignore Tags on your content as you see fit.
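
The effect of combining the two kinds of tags can be sketched in a few lines of Python. This is only a rough model of the idea, not the Appliance's or Webinator's code. The comment markers (`<!-- keep -->`, `<!-- ignore -->` and their closing forms) are hypothetical examples of begin/end expressions, and applying Keep before Ignore is an assumption made for the illustration.

```python
import re


def apply_keep(html: str, begin: str, end: str) -> str:
    """Keep only the text found between each begin/end pair."""
    pattern = re.compile(re.escape(begin) + r"(.*?)" + re.escape(end), re.DOTALL)
    return "".join(pattern.findall(html))


def apply_ignore(html: str, begin: str, end: str) -> str:
    """Discard the text found between each begin/end pair, markers included."""
    pattern = re.compile(re.escape(begin) + r".*?" + re.escape(end), re.DOTALL)
    return pattern.sub("", html)


# A made-up page: menu and footer are cruft, and so is the "share" widget
# that sits in the middle of the real content.
page = (
    "<div id='menu'>Home | Products</div>"
    "<!-- keep -->"
    "<h1>Widget manual</h1>"
    "<!-- ignore --><p>Share this page</p><!-- /ignore -->"
    "<p>Install the widget.</p>"
    "<!-- /keep -->"
    "<div id='footer'>Copyright</div>"
)

kept = apply_keep(page, "<!-- keep -->", "<!-- /keep -->")
content = apply_ignore(kept, "<!-- ignore -->", "<!-- /ignore -->")
print(content)
```

In this sketch a page with no Keep markers at all would come back empty. Whether your own software treats such a page that way is something to confirm in its documentation.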


Feedback, suggestions and questions are welcome. Send your email to

Thunderstone Offers Version 7 of the Thunderstone Search Appliance

May 1, 2009

Superior Technical Support Continues to Drive Further Product Enhancements

CLEVELAND, OH — Thunderstone Software LLC has released Version 7 of the Thunderstone Search Appliance, adding a number of desirable new features — which also apply to Thunderstone Parametric Search Appliances and to Thunderstone's entire line of Appliance products. These performance enhancements include:

 

    • Faster URL Category Searching
      Thunderstone has provided a new Appliance setting to improve URL category search speed when categories are distinct/non-overlapping (i.e., when no URL belongs to more than one category.)

 

 

    • Enhanced Federated Search (Thunderstone Meta Search)
      Search users can now select which back-end profiles to actually search, from a list configured by their Appliance administrator.

 

 

    • Character Match Mode
      Non-English/Unicode character support is greatly improved with this setting, which not only allows case-insensitive searching of foreign characters but also enables ignore-accents ("e" matches "é"), ligature expansion ("oe" matches "œ") and more.

 

 

    • Language Analysis Module
      An optional feature for Appliance owners who pay to have it activated, this setting improves CJK (Chinese/Japanese/Korean) searches by adding a processing step that puts spaces between words, so a word can be found without wildcards even when it sits directly next to other words.

 

 

    • MySQL in DB Walker
      Thunderstone has added support for crawling MySQL databases to its DB Walker — which already supported Oracle, Microsoft SQL Server, Sybase and PostgreSQL databases. Administrators can schedule the Appliance to automatically perform a database crawl just as they can for file servers, web servers, intranet servers, etc. And, as always, they have the ability to stop/pause database crawls in the same way they can with any other crawl.

 

 

    • Updated "Look and Feel"
      The administrative interface of the Appliance has had its text slightly reorganized, and it now uses Cascading Style Sheets (CSS) for more modern HTML.

 

The Thunderstone Search Appliance is a plug-and-play device combining the simplicity of a hosted service with the security and performance of a local solution. Built on Thunderstone's advanced Texis software, the Appliance can handle more than 1,000 typical queries a minute — providing excellent value without adding administrative overhead.

The Thunderstone Parametric Search Appliance combines the flexibility and power of Texis with the ease of use of an appliance. It provides an easy way to create applications that combine full-text and structured data without programming.

Whether configured as a Thunderstone Search Appliance SBE (Small Business Edition,) a Thunderstone Search Appliance (Enterprise Edition) or a Thunderstone Parametric Search Appliance, the Appliance comes with:

  • a one-time, perpetual license that saves customers 40-60 percent (or more) compared to Thunderstone's closest competitor
  • two years of included maintenance, easily extended for additional years at affordable annual rates
  • superior technical support from software engineers readily accessible to customers by phone, email and message board
  • no restrictions on indexing third-party websites for user-empowering applications and for competitive intelligence purposes
  • ability to fully search targeted repositories (file servers, web servers, intranet/portal servers, database servers, application databases, etc.) and to handle files that exceed 30 MB in size
  • an attractive Product Investment Protection Program that makes upgrading a breeze, applying 100 percent of the initial Thunderstone product's purchase price to any desired upgrade

All Thunderstone customers with current Appliance maintenance agreements in place automatically qualify to receive the downloadable Version 7 software update.

Reporters, editors, analysts, I.T. integrators/resellers and prospective customers who would like to see the Thunderstone Search Appliance “in action” may phone Thunderstone at +1 216 820 2200, Monday - Friday, 10 a.m. - 6 p.m. Eastern Time, to arrange a free and personalized product demo. After the scheduled demo, you may also request shipment of a pre-configured Thunderstone Search Appliance that you can thoroughly evaluate in your own unique environment for up to 30 days.

About Thunderstone
As a true industry pioneer — providing some of the world's most powerful, flexible and scalable search solutions since 1981 — Thunderstone Software LLC (http://www.thunderstone.com) has developed hard-to-match expertise in creating high-performance products with tremendous value for governments, NGOs, educational institutions and businesses of all sizes.

Sales contact: Fred Harmon

+1 216 820 2200 ext.105

Media contact: Peter Thusat

+1 216 820 2200 ext.118

April 2009 Newsletter

April 30, 2009


HAPPENINGS

Thunderstone Software's John Turnbull (President & CEO) presented a workshop session entitled The Next Generation in Search and Best Practices Today on Friday, April 17, 2009, during the DigitalNow 2009 Conference at Disney's Yacht and Beach Club Resorts in Lake Buena Vista, Florida.

Fred Harmon (Channel Director & CSO) and Peter Thusat (Communication Director & CMO) participated March 30 - April 2, 2009 as exhibitors at the AIIM Expo + Conference in Philadelphia, Pennsylvania.

 

UPCOMING

Keep an eye out for our new case studies coming to this newsletter and to the Thunderstone.com website, including recent success stories on:

 

    • "Using TEXIS to Enable Native American Mixed-Language Searching on a Heritage Education Site"

 

 

    • "Deploying TEXIS to Rapidly Create an eCommerce Search Engine for Online Flooring Catalogs"

 

You may click here to find a current selection of white papers and case studies.


SOME OF TODAY'S BEST PRACTICES IN ENTERPRISE SEARCH

  1. Understand your users.
  2. Don't assume "good searchers."
  3. Keep the search box simple.
  4. Present clear results, quickly.
  5. Get feedback from users, both by asking for it and by collecting statistics.
  6. Build as much knowledge in as you can.
  7. Start incrementally, test, try, and continue to improve.

If you are interested in the completed white paper on Best Practices in Search, please contact us and we will send it to you when it is available. We can also arrange a one-on-one consultation if you need advice or assistance in improving your search capabilities.


QUOTE OF THE MONTH

"A man of genius makes no mistakes. His errors are volitional and are the portals of discovery."

-- from the novel Ulysses
by James Joyce (Irish writer, novelist, poet, playwright, 1882-1941)


TECH TIPS: BACKING UP YOUR SEARCH APPLIANCE SETTINGS

No computer hardware is immune to wear and tear, and the Thunderstone Search Appliances are no exception. Even though your maintenance plan will overnight you a replacement appliance for free in the event of hardware failure, a crashed hard drive would still take all your profiles with it.

Backing up your appliance's settings allows you to quickly recover your users and profiles in the event of a catastrophe. Plus, it provides you with reference data you'll find useful later to see what settings have changed over time.

To back up your Thunderstone Search Appliance settings:

  1. After logging in, click the "Maintenance" link on the left.
  2. Click the "Save Search Appliance settings" link.
  3. Click the "Download" button to download the appliance's settings in XML format.

You can also choose "Restore Search Appliance settings" from the maintenance screen and provide an XML file to load the settings onto an appliance.

It's recommended you back up a copy of your settings in a secure location periodically, rather than just when they're initially set up, as small changes over time can result in big headaches when trying to re-create them all at once.
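
If you keep each downloaded XML file, comparing two of them is an easy way to see what has changed over time. The Python sketch below shows one way to do that with the standard `difflib` module. It is only an illustration: the XML element and attribute names are invented for the example and do not reflect the Appliance's real export format.

```python
import difflib


def settings_diff(old_xml: str, new_xml: str) -> list:
    """Return unified-diff lines between two saved settings exports."""
    return list(difflib.unified_diff(
        old_xml.splitlines(),
        new_xml.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",
    ))


# Hypothetical exports; in practice you would read two downloaded files.
old = "<settings>\n  <profile name='intranet' depth='3'/>\n</settings>"
new = "<settings>\n  <profile name='intranet' depth='5'/>\n</settings>"

for line in settings_diff(old, new):
    print(line)
```

Lines starting with `-` come from the older backup and lines starting with `+` from the newer one; two identical backups produce no output at all.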


Feedback, suggestions and questions are welcome. Send your email to

March 2009 Newsletter

March 31, 2009


CUSTOMER QUOTE OF THE MONTH

 

"We use the Thunderstone Search Appliance to crawl, index and search Word files, PDFs and other content in our law firm's internal document management system. The Appliance gives us a lot of customization options in the way it operates, with excellent control over precisely what we want to make searchable and what we don't want included. It does everything we need it to do. You can just plug it in and forget about it. It works great. After years of trouble-free performance, when we finally did have a hardware failure — Thunderstone had us quickly up and running again on the same day we received our replacement unit. Their level of customer support is almost unheard of in the I.T. industry."

 

Michael E. Salopek
I.T. Manager
Janik, Dorman & Winter, L.L.P.
http://www.janiklaw.com


TECH TIPS: CONTROLLING YOUR CRAWL WITH WEBINATOR OR THUNDERSTONE SEARCH APPLIANCES — EXCLUDE BY FIELD

Last time we discussed exclusions and requirements for managing which pages your crawler gets, but there's one setting that gets a Tech Tip all its own: Exclude by Field. It gives you extra power over how you exclude and over exactly what is excluded.

    • "Metamorph query" matching

      Rather than a prefix or substring match, Exclude by Field uses a "Metamorph query", which is the full-text matching engine used for our normal searches. You can simply type in words to match, or if you begin with a slash (/) then it is treated as a REX expression (our RegEx-like pattern matching language; see the "REX" section in the Vortex documentation on our website for more details).

    • Multiple fields for exclusion

      All previously discussed exclusion & requirement options operate only on the URL itself. Exclude by Field allows you to exclude based on a number of different other areas:

      • HTML — Matches against the raw HTML of the page. Useful if there's something in an HTML comment that you'd like to base the match on.
      • Text — The formatted text of the URL. This is the same text you'd see if you looked at the list/edit info of a page or at the "Match Info" in the search results. Useful if you want to match text but want to ignore any HTML markup that may or may not be present.
      • All Meta — The contents of all available meta fields are put together and then matched against.
      • Meta Field -> — Matches against the contents of a specific meta field, which you specify in the next column "From Meta Field".
      • Keywords, Description, & Mime Type — Matches against the text of these common meta fields.
      • URL — Matches against the URL, just like Exclusion REX. You may want to use this to get the extra Exclude options, listed below.

 

    • What to exclude

      Beyond more power in specifying what to match, Exclude by Field also gives you more control with what to do when you get a match.

      • Pages and Links — This acts like any other exclusion rule. The page and its links are kept completely out of the walk data.
      • Pages only — The content of the page is not included in the walk, but the links from the page ARE followed.
      • Links only — The page is included in the walk, but the links from the page are not followed.

 

    • A word on efficiency

      A disadvantage that Exclude by Field has when using any Field except URL is the page must be fully fetched before the rule can be applied.

      With all other exclusion rules (and Exclude by Field on URL), the URL can be thrown out before the page is fetched and processed.

      When performing Exclude by Field on the content of the page, though, the page must be downloaded and fully processed before we can know if it has HTML or a Body that matches the rules specified.

      When possible, it's better to use other exclusion rules or the URL target for Exclude by Field, as this will allow you to prune URLs before they are fetched. Still, there are many things that Exclude by Field can do that the other settings simply can't (as mentioned below).

    • Example — Excluding directories from a file crawl

      A perfect example of Exclude by Field is directories when performing a file crawl — we can't fully exclude directories because they are what link to all the files, and without them we'd have nothing. Still, we might want them not to show up in the search. We can get this with Exclude by Field.

      • Metamorph Query "//=>>=" (without the quotes) — This is a REX expression for "match anything that ends in a slash". Please see the REX section of the Vortex documentation if you'd like more details on REX syntax.
      • Field - URL
      • Exclude - Pages only — This will keep the contents of the directory "pages" out of the crawl but will still follow the links to get the actual files and use them in the search.
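
The three "what to exclude" choices and the efficiency point can be modeled in a short Python sketch. This is our own simplified illustration, not the crawler's code: the `Rule`, `Exclude`, `needs_fetch` and `decide` names are hypothetical, and an ordinary Python regular expression (`/$`, "ends in a slash") stands in for the Metamorph/REX query shown in the example above.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Exclude(Enum):
    PAGES_AND_LINKS = "pages and links"
    PAGES_ONLY = "pages only"
    LINKS_ONLY = "links only"


@dataclass
class Rule:
    pattern: str      # Python regex standing in for the Metamorph/REX query
    field: str        # "URL", "HTML" or "Text" in this sketch
    exclude: Exclude


def needs_fetch(rule: Rule) -> bool:
    """Only URL rules can be checked before the page is downloaded."""
    return rule.field != "URL"


def decide(rule: Rule, page: dict) -> tuple:
    """Return (index_page, follow_links) for one page under one rule."""
    if not re.search(rule.pattern, page.get(rule.field, "")):
        return True, True            # no match: page is crawled normally
    if rule.exclude is Exclude.PAGES_AND_LINKS:
        return False, False          # drop the page and everything it links to
    if rule.exclude is Exclude.PAGES_ONLY:
        return False, True           # drop the page, still follow its links
    return True, False               # LINKS_ONLY: keep the page, skip its links


# The directory example: anything whose URL ends in a slash is left out of
# the search, but its links (the actual files) are still followed.
directory_rule = Rule(pattern=r"/$", field="URL", exclude=Exclude.PAGES_ONLY)

print(decide(directory_rule, {"URL": "file:///docs/"}))
print(decide(directory_rule, {"URL": "file:///docs/manual.pdf"}))
```

Because `directory_rule` targets the URL field, `needs_fetch` returns false for it, which mirrors the advice to prefer URL-based rules when they can do the job.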

If you have any questions about how to use Exclude by Field, please feel free to contact Thunderstone Support — and we'll discuss it.


HAPPENINGS

The February 2009 issue of CRN, a publication of Everything Channel and ChannelWeb.com, recognized the "top Channel Chiefs in the industry based upon their record of business innovation and dedication to the partner community." This annual list, which CRN calls "Our definitive guide to the movers and shakers of I.T. channel management," included Frederick A. Harmon (Thunderstone's Channel Director & CSO.)

You can visit the CRN website (http://www.crn.com/crn/chiefs/2009cc.jhtml?chief=136) to view pertinent information about Fred Harmon in the 2009 Channel Chiefs list.

 

UPCOMING

Thunderstone's John Turnbull (President and CEO) will present a workshop session entitled The Next Generation in Search: Today's Best Practices on Friday, April 17, 2009, (2:00 p.m. - 3:30 p.m.) during the DigitalNow 2009 Conference at Disney's Yacht and Beach Club Resorts in Lake Buena Vista, Florida.

Session Description
Search has progressed from a complex tool used by librarians, through simple tools that let users perform a keyword search, to today's information access tools that can still provide users a simple interface but make use of much of an association's collective knowledge. In this workshop participants will learn what sorts of information can be behind a search engine and how to make it more valuable to users. The session includes a case study from IEEE, the world's largest technical membership association, which significantly improved its business by focusing on its customers and helping them access content in new ways.

DigitalNow (http://www.fusionproductions.com/digitalnow/) is an annual conference that brings together senior-level executives and volunteer leaders from some of the most influential professional and trade associations in America. Produced by Fusion Productions and Disney Institute, two of the foremost authorities in adult educational design, with input from registered attendees and a conference advisory board, DigitalNow addresses the critical issues facing association leaders in the digital age.


GET YOUR FREE FLOOR PASS (A $75 VALUE) TO THE MARCH 30 - APRIL 2, 2009 AIIM INTERNATIONAL EXPOSITION + CONFERENCE IN PHILADELPHIA

The AIIM International Exposition + Conference, the yearly gathering for information management professionals across industries and lines of business, will take place Monday, March 30, through Thursday, April 2, 2009, at the Pennsylvania Convention Center in Philadelphia, PA. With 19 tracks, more than 135 conference sessions featuring more than 100 real-world case studies, and an Expo floor showcasing 200+ information management technology solution providers, the event aims to provide attendees with actionable insight they can use.

REGISTER TODAY FOR YOUR FREE EXPO FLOOR PASS
and get access to all keynotes, general sessions,
Expo floor education and the co-located ON DEMAND Expo!

To receive your free pass, use Registration Code: 615M
when you register at WWW.AIIMEXPO.COM
or call +1 888 824 3004.

Your FREE pass comes to you compliments of Thunderstone Software. Please stop by and visit Fred Harmon (Channel Director & CSO) and Peter Thusat (Communication Director & CMO) at Booth 1045.


 

January 2009 Newsletter

January 31, 2009


IMPROVING EFFICIENCY IN THE NEW YEAR

Was one of your New Year's resolutions to improve the efficiency of your I.T. infrastructure? Our expert engineering staff is available to clients with a current maintenance contract, and for the next month you can schedule a free 15-minute consultation on a first-come, first-served basis to discuss your architecture and get immediate advice on improving your solution. If you need more time, that can also be arranged. There are a limited number of time slots available. So, make sure to call today at +1 216 820 2200.


TECH TIPS: THE MANY WAYS OF SPECIFYING URLS FOR THE CRAWL WITH WEBINATOR OR YOUR SEARCH APPLIANCE

There are a number of ways to specify what URLs you'd like the software to crawl, and which will be easiest to use can depend on your situation.

 

    • Base URL
      The old standby -- URLs listed in the Base URL will be crawled, and the entirety of all pages they link to will be included. If you only have one or two sites and start from the top, this is definitely the way to go.

 

 

    • URL URL
      Sometimes you may have dozens or hundreds of base URLs, maybe for doing many different folders on a site (but not all of them). If putting them all in a text box is starting to feel unwieldy, you can use the URL URL instead.

       

      Create a text file somewhere on your website that contains the URLs you want to use as Base URLs, each on its own line. Then you can specify the URL to _that_ in the URL URL setting. The URL URL is fetched by the crawler, and every URL is treated as a Base URL. This can make it easier to manage a frequently-changing list of URLs.

      This is the only benefit of URL URL. The pages will still be crawled EXACTLY as if they were all listed as Base URLs. It exists only to make the list easier to manage for you.

 

 

    • Single Page
      Sometimes you have a single page, or a handful of pages, where you just want that page crawled, but none of its links. This is exactly what Single Page is for.

       

      The URLs listed in Single Page are fetched, and their links are ignored.

 

    • Page URL
      Just as "URL URL" is a list of URLs for "Base URL", "Page URL" is a list of URLs for "Single Page". URLs listed here should point to a plain text file on your server, each URL on its own line. Every one of those pages is fetched, and their links are completely ignored.
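
How the four settings relate to each other can be sketched in Python. This is only a simplified model under our own assumptions, not the crawler's code: `parse_url_list` and `build_seeds` are hypothetical helpers, the list files are passed in as already-fetched text, and skipping blank lines is a convenience we added.

```python
def parse_url_list(text: str) -> list:
    """Split a fetched list file into URLs, one per line, skipping blanks."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_seeds(base_urls, single_pages, url_url_text="", page_url_text=""):
    """Return (url, follow_links) pairs for the start of a crawl."""
    # Base URL entries are crawled and their links are followed.
    seeds = [(u, True) for u in base_urls]
    # URL URL entries behave exactly like Base URL entries.
    seeds += [(u, True) for u in parse_url_list(url_url_text)]
    # Single Page entries are fetched, but their links are ignored.
    seeds += [(u, False) for u in single_pages]
    # Page URL entries behave exactly like Single Page entries.
    seeds += [(u, False) for u in parse_url_list(page_url_text)]
    return seeds


seeds = build_seeds(
    base_urls=["http://a.example/"],
    single_pages=["http://a.example/one.html"],
    url_url_text="http://b.example/x/\n\nhttp://b.example/y/\n",
    page_url_text="http://c.example/p.html\n",
)
for url, follow in seeds:
    print(url, "follow links" if follow else "this page only")
```

The only difference between the paired settings is where the list lives: typed into the setting itself, or kept in a text file that the crawler fetches.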

     


QUOTE OF THE MONTH

"I should take a moment to let you know how much we appreciate the Webinator product. For us, it's very fast, easy to configure and meets all our needs. Thanks for such a great product!"

David Arbuthnot
VP IT
MS Society of Canada
http://www.mssociety.ca


MORE CUSTOMER SUCCESS STORIES COMING THIS YEAR

In the coming months you'll find a number of interesting new case studies at the Thunderstone.com website. As usual, we'll also feature links to them here in this newsletter. Keep an eye out for them. You won't want to miss these case studies of TEXIS, Webinator and Thunderstone Search Appliances "in action".


Feedback, suggestions and questions are welcome. Send your email to editor@thunderstone.com.
