Running a distributed search application in the Amazon cloud

March 24, 2010 | 2 min Read

Doing live presentations is always challenging, isn’t it? Especially here at EclipseCon, I like to include some demo elements in my talks. But if you rely on external resources, such as a network, there is always a chance that the demo will not work during your presentation. And that’s exactly what happened when we gave our talk on Monday.

For those who couldn’t attend, you can find the slides below. For those who were in the room, here is the URL of our distributed demo search application, which is still running in the cloud:

cloudle.eclipse.org/search - server has been shut down

At the very beginning of the talk we asked the audience to give us the URL of a website. Then we used g-Eclipse, with a small JMX management extension that we implemented for this talk, to configure the SMILA framework running on several cloud nodes. (If you don’t know what SMILA is: it is a framework for building search solutions. In our case it was the glue between our example back end, Apache Solr, and a small RAP-based search front end.) The next step was to start the web crawler on the remote machine with g-Eclipse, giving it some time to download the web pages below the given URL and to build up an index.
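To make "the web pages below the given URL" a bit more concrete, here is a minimal sketch of the kind of scope check a crawler might apply before following a link. This is not SMILA's crawler configuration; the function name `is_below` and the rule itself (same scheme, same host, path under the seed path) are my own assumptions for illustration.

```python
from urllib.parse import urlsplit


def is_below(seed_url: str, candidate_url: str) -> bool:
    """Hypothetical scope rule: same scheme and host, path under the seed path."""
    seed, cand = urlsplit(seed_url), urlsplit(candidate_url)
    if (seed.scheme, seed.netloc.lower()) != (cand.scheme, cand.netloc.lower()):
        return False
    # Compare paths as directory prefixes so "/docs" does not match "/docs-old".
    prefix = seed.path if seed.path.endswith("/") else seed.path + "/"
    path = cand.path if cand.path.endswith("/") else cand.path + "/"
    return path.startswith(prefix)


# Example: only the first link would be crawled for the seed "http://example.org/docs".
links = [
    "http://example.org/docs/intro.html",
    "http://example.org/docs-old/intro.html",
    "http://other.example/docs/intro.html",
]
in_scope = [link for link in links if is_below("http://example.org/docs", link)]
```

A real crawler would also have to deal with redirects, query strings and robots.txt, which this sketch ignores.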

At that point I saw some network timeouts. Not a good sign, but maybe only a temporary problem that would go away after a few minutes, minutes that we used to explain what we had done. See the slides for yourself:

Our example (and simplified) architecture has one front-end node running our little RAP search UI, and several back-end nodes, each with a search index of its own and each crawling a different set of URLs. The plan for the end of the talk was to make the remote machines known to each other, which in this case means that the front end needs the addresses of the back-end nodes. Once again, we used g-Eclipse to add the back-end nodes to the front end's configuration.
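The front-end/back-end split described above can be sketched in a few lines. This is a toy model under stated assumptions, not the SMILA or Solr API: the class `FrontEnd`, the method `add_backend`, the in-memory fake back ends and the host names are all hypothetical. It only shows the idea that the front end keeps a list of back-end addresses, sends the query to each of them, and merges the hits.

```python
from typing import Callable, Dict, List, Tuple

Hit = Tuple[float, str]               # (relevance score, document URL)
Backend = Callable[[str], List[Hit]]  # stand-in for a remote search call


class FrontEnd:
    """Hypothetical front-end node that fans a query out to known back ends."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}

    def add_backend(self, address: str, search: Backend) -> None:
        # Corresponds to "making the back-end node known" to the front end.
        self._backends[address] = search

    def search(self, query: str, limit: int = 10) -> List[Hit]:
        hits: List[Hit] = []
        for address, backend in self._backends.items():
            try:
                hits.extend(backend(query))
            except OSError:
                # An unreachable node (e.g. a network timeout) is skipped,
                # so the remaining nodes can still answer.
                continue
        # Simplification: scores from separate indexes are compared directly.
        hits.sort(key=lambda hit: hit[0], reverse=True)
        return hits[:limit]


def make_fake_backend(index: Dict[str, List[Hit]]) -> Backend:
    """In-memory stand-in for a back-end node with its own index."""
    return lambda query: index.get(query, [])


def unreachable_backend(query: str) -> List[Hit]:
    raise TimeoutError("simulated network timeout")


frontend = FrontEnd()
frontend.add_backend("node-a.example", make_fake_backend(
    {"eclipse": [(0.9, "http://a.example/1"), (0.4, "http://a.example/2")]}))
frontend.add_backend("node-b.example", make_fake_backend(
    {"eclipse": [(0.7, "http://b.example/1")]}))
frontend.add_backend("node-c.example", unreachable_backend)

results = frontend.search("eclipse")
```

Here `results` contains the hits from the two reachable nodes, ordered by score, while the node that times out is simply left out of the answer.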

Just for the record: it worked well when we tested it before the talk, and it worked again immediately afterwards. Unfortunately, we ran into some weird network problems during our session.