An idea has been tossing & turning in my head for months now – 3 months actually. Rarely does an idea stick around that long without me finding some way to dismiss it.
The idea is a hosted customizable search engine, similar in ease of use to GSA (Google Search Appliance) but more capable – more like Lucene.
After searching for hours & hours for a decent hosted search engine, guess what? I found nothing.
The closest thing I found was GAELucene (on Google Code), a Google App Engine version of Lucene. However, its index is read-only; it does not support a dynamic index. Without that, it’s useless to me.
Hosted Applications – Some Examples
Just so you are less inclined to think I have finally lost my marbles:
- IIS –> Windows Azure
- SQL Server –> SQL Azure
- Outlook –> Gmail
- Backup –> Mozy
- Bugzilla
- SVN
There is a clear trend towards traditional server/desktop applications moving over to hosted services.
Getting a Customized Search Engine … the Traditional Way:
These days, if you need a customized search solution, your options are as follows:
- Purchase, deploy, and maintain a gigantic enterprise search application (e.g. Google Search Appliance, Endeca, FAST).
- Integrate Lucene (or another free search engine) into your application (Java/ASP.NET) and develop your own management interface for it.
- Drop the customization and integrate a basic Google search box into your web app, which can only index & search your HTML pages.
Clearly, none of these options is particularly appealing to a small or medium-sized business. Why?
- Option 1 is super-expensive. Not only is the entry cost in excess of $20,000, but maintenance and operation also cost more than $10,000 per month.
- Option 2 is less expensive, although certainly not free, and very time-consuming. It could take at least 5 weeks to get a working solution, resulting in more than $5,000 in development costs. Then there’s the cost of hosting Lucene yourself. With a large index, you probably need a dedicated server – around $200 per month!
- Option 3 is super cheap, but it’s not at all what you want. It’s basically the same as giving up.
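To put rough numbers on options 1 and 2, here is a back-of-the-envelope sketch in Python. It only adds up the ballpark figures quoted above over a first year; the `first_year_cost` helper and the 12-month window are my own choices for illustration, and the inputs are lower bounds, not quotes from any vendor.

```python
def first_year_cost(upfront, monthly, months=12):
    """Upfront cost plus the recurring cost over the given number of months."""
    return upfront + monthly * months

# Inputs are the rough figures from the list above, in US dollars.
option_1 = first_year_cost(upfront=20_000, monthly=10_000)  # enterprise search product
option_2 = first_year_cost(upfront=5_000, monthly=200)      # Lucene on a dedicated server

print(f"Option 1: at least ${option_1:,} in year one")
print(f"Option 2: roughly ${option_2:,} in year one")
```

Even with these conservative inputs, option 1 lands in six figures, and option 2 is dominated by the one-off development cost rather than by hosting.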
Is there a better way?
Yes! There is one more option – option 4. But nobody’s built it yet.
Option 4 is a hosted search engine where you control what data flows in & how it comes out, but the management & maintenance is handled by someone else.
Think of it as cloud search.
Two words! Simple.
Enter Project Vmana. Thinking search? Think Vmana.
How would it work?
- You go to vmana.com and sign up for your free entry-level search account.
- Just like App Engine, Vmana is metered. Let’s say your entry-level account has 500 MB of index space and 100,000 queries per month.
- Using an easy-to-use admin interface, you configure your data sources:
  - You want some data to be pulled in from your blog, so you give it your RSS feed URL.
  - You want it to crawl your website, so you give it your home page URL.
  - You set up some exclusion lists using regular expressions to filter out unwanted URLs.
  - You have some custom objects with metadata that you will feed in with your own feeder application.
- Vmana handles all of your object types & indexes them regularly. You can check your stats using the built-in dashboard.
- Vmana provides a testing console – a simple web page where you can type in queries, see results, and build out customized result templates for use later.
- You then use the Vmana XML API to send queries from your web application. Your web app just builds the query, sends it to Vmana, and retrieves the results in XML format. Then, you apply a bit of XSL and magic happens – you’ve got your fully-customized search results page.
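The last step might look something like the Python sketch below. Everything specific in it is an assumption on my part: the endpoint URL, the parameter names (`account`, `q`, `page`), and the XML response format are made up for illustration, since the Vmana API has not been published. Plain Python also stands in for the XSL step here, because the standard library has no XSLT processor. No network call is made; a sample response string takes the place of the server's reply.

```python
import xml.etree.ElementTree as ET
from html import escape
from urllib.parse import urlencode

# Hypothetical endpoint; the real API address is not known.
API_URL = "https://api.vmana.example/search"

def build_query_url(account_id, query, page=1):
    """Build the GET URL a web app would send to the search service."""
    params = {"account": account_id, "q": query, "page": page}
    return f"{API_URL}?{urlencode(params)}"

# Made-up response format, standing in for whatever XML the service returns.
SAMPLE_RESPONSE = """<results total="2">
  <result url="http://example.com/a"><title>First hit</title></result>
  <result url="http://example.com/b"><title>Second &amp; last</title></result>
</results>"""

def render_results(xml_text):
    """Turn the XML results into an HTML list (the job XSL would do)."""
    root = ET.fromstring(xml_text)
    items = []
    for result in root.findall("result"):
        # Escape attribute and text values before placing them in HTML.
        url = escape(result.get("url", ""), quote=True)
        title = escape(result.findtext("title", default=""))
        items.append(f'<li><a href="{url}">{title}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"

print(build_query_url("demo", "hosted search"))
print(render_results(SAMPLE_RESPONSE))
```

The point is how little the web app has to do: build a URL, fetch XML, and transform it into markup. All the indexing and crawling stays on the hosted side.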
How much development effort is involved? Probably about 5 days. Under $1,000.
The value in Vmana lies mainly in the management & admin interface. Lucene does not provide it. If you did option 2, you’d have to build it yourself from scratch. Building all that crawling logic and pretty reporting UIs is not that easy, which is why I said 5 weeks, and that’s probably a conservative estimate!
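To give a flavor of even the smallest piece of that crawling logic, here is a minimal Python sketch of the regular-expression exclusion lists mentioned in the setup steps. The patterns and the `is_excluded` / `filter_urls` helpers are hypothetical examples of mine, not Vmana defaults or a real API.

```python
import re

# Illustrative exclusion patterns a site owner might configure.
EXCLUDE_PATTERNS = [
    r"/admin/",         # back-office pages
    r"\.(pdf|zip)$",    # binary downloads
    r"[?&]sessionid=",  # session-specific duplicates of the same page
]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in EXCLUDE_PATTERNS]

def is_excluded(url):
    """True if any exclusion pattern matches somewhere in the URL."""
    return any(pattern.search(url) for pattern in _COMPILED)

def filter_urls(urls):
    """Keep only the URLs the crawler is allowed to fetch."""
    return [u for u in urls if not is_excluded(u)]

candidates = [
    "http://example.com/",
    "http://example.com/admin/login",
    "http://example.com/docs/manual.PDF",
    "http://example.com/blog?sessionid=42",
    "http://example.com/blog/post-1",
]
print(filter_urls(candidates))
```

A real crawler also needs scheduling, politeness rules, duplicate detection, and reporting on top of this, which is where the weeks go.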
Why “Vmana”? I like the name – it’s short, and the domain was available.
Summary
With Vmana, your costs drop to about one-fifth of what comparable options cost, and you get better value & peace of mind!
The project has already begun. Stay tuned for additional status updates – probably in about 3 days.
Release Schedule
The first phase is a working internal prototype that we can showcase via screenshots. That is probably about 2 weeks away. Following that, a public beta – if one happens at all – would arrive around late August. The quality would be similar to the App Engine beta or the Azure CTP. The beta would continue probably for at least two months. Expect heavy promotional giveaways during the beta (i.e. high quotas).
This is all I will divulge at this time. I have nothing more anyway.
Option 4 is currently being built. nosle.com uses option 4 underneath, but option 4 has not been released to the public yet. To get in touch, use the contact page or email sysadmin .AT. nosle.com
The Vmana beta program is now accepting testers!
The beta program starts on September 1, 2010. We are only allowing up to 200 beta testers.