At Hurriyet, we finally launched our new search: http://arama.hurriyet.com.tr/
It runs perfectly.
I am going to tell you about the technology behind it.
After a long period of research and evaluation, we decided to go with Lucene, which is a Java library. The problem was that we needed a solution that could be used from any technology. Then we came across a sub-project of Lucene called SOLR, which is an HTTP gateway for Lucene. This way we can support any kind of technology and use search as a service (SaaS :P).
SOLR is a highly extensible application that runs inside Tomcat. Results are returned as XML by default (it can also produce JSON and PHP). Each search query is an HTTP GET request, and each update (including deletes) is an HTTP POST request. That's it; this was exactly what we needed.
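To make the GET/POST split concrete, here is a minimal sketch in Python (used only for illustration; it is not part of our stack). The host name, port and field names are made up, and the helpers only build the URL and the XML bodies, they do not send anything.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape, quoteattr

# Hypothetical host and port; the real ones are not given in this post.
SOLR_BASE = "http://solr-slave.example:8080/solr"


def build_select_url(query, rows=10, wt=None):
    """Return the URL for a search query, to be fetched with HTTP GET."""
    params = {"q": query, "rows": rows}
    if wt is not None:
        params["wt"] = wt  # e.g. "json" instead of the default XML
    return SOLR_BASE + "/select?" + urlencode(params)


def build_add_xml(doc):
    """Return the XML body to POST to /update to add or replace one document."""
    fields = "".join(
        "<field name=%s>%s</field>" % (quoteattr(name), escape(str(value)))
        for name, value in doc.items()
    )
    return "<add><doc>%s</doc></add>" % fields


def build_delete_xml(doc_id):
    """Return the XML body to POST to /update to delete one document by id."""
    return "<delete><id>%s</id></delete>" % escape(str(doc_id))
```

Added and deleted documents only become visible to searches after a commit, which is one more POST to /update with the body `<commit/>`.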
So basically we have two Linux servers, each running almost the same configuration, with Tomcat and SOLR installed on a port other than 80. One is named "Master", the other "Slave". Master is responsible for collecting data from SQL Server and updating the search index. This is a cron job that runs every 15 minutes. Only modified records are updated on Master. Master can also fully import the data from scratch (this takes about 40 minutes). Master creates a snapshot of the archive every 15 minutes. Slave is the actual box that is queried: all select/search requests go to Slave, and it updates its own index from Master. This way we can increase the number of Slaves and load balance them.
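The "only modified records" step can be sketched roughly like this. It assumes each record carries a last-modified timestamp, which is an assumption for the example (this post does not say how changes are detected); the rows are made-up stand-ins for the SQL Server data.

```python
from datetime import datetime, timedelta


def select_modified(records, last_run):
    """Return the records changed since the previous import run."""
    return [r for r in records if r["modified"] > last_run]


# Hypothetical rows standing in for the SQL Server data.
now = datetime(2009, 7, 13, 12, 0)
records = [
    {"id": 1, "title": "old article", "modified": now - timedelta(hours=3)},
    {"id": 2, "title": "fresh article", "modified": now - timedelta(minutes=5)},
]

# The cron job runs every 15 minutes, so the previous run was 15 minutes ago.
last_run = now - timedelta(minutes=15)
changed = select_modified(records, last_run)
```

Only the records in `changed` would be posted to Master in that run; a full import simply sends everything.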
The search web server sends search requests to the Slave instead of a regular SQL Server. The web server is an ASP.NET application running on Windows IIS; it makes HTTP requests to the Slave and gets XML results back. We also created an Application Block with Enterprise Library, so that the search concept is abstracted from the web application.
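On the client side, handling the XML comes down to something like the sketch below. Our real client is the ASP.NET application block; this is a Python illustration only, and the sample response uses made-up field names (`id`, `title`) in the standard SOLR XML response layout.

```python
import xml.etree.ElementTree as ET

# Sample response in SOLR's XML format; the documents are invented.
SAMPLE = """<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">3</int></lst>
  <result name="response" numFound="2" start="0">
    <doc><str name="id">101</str><str name="title">First headline</str></doc>
    <doc><str name="id">102</str><str name="title">Second headline</str></doc>
  </result>
</response>"""


def parse_results(xml_text):
    """Return (total hit count, list of documents as dicts)."""
    root = ET.fromstring(xml_text)
    result = root.find("result")
    total = int(result.get("numFound"))
    docs = [
        {field.get("name"): field.text for field in doc}
        for doc in result.findall("doc")
    ]
    return total, docs


total, docs = parse_results(SAMPLE)
```

Keeping this parsing behind one small function is the same idea as the application block: the rest of the web application never sees XML.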
With this design we get the following benefits:
1. Amazingly fast web requests
2. Clever search engine
3. Ability to scale (both on backend and frontend)
4. Failover clustering
5. Highly customizable design
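As an illustration of points 3 and 4, here is one way a client could spread queries over several Slaves and skip one that is down. This is a hypothetical sketch, not our actual setup: the `SlavePool` class and the `send` callback are invented for the example.

```python
import itertools


class SlavePool:
    """Round-robin over Slave URLs, moving on when one fails."""

    def __init__(self, urls):
        self._urls = list(urls)
        self._cycle = itertools.cycle(self._urls)

    def query(self, send):
        """Call send(url) on each Slave in turn until one succeeds."""
        last_error = None
        for _ in range(len(self._urls)):
            url = next(self._cycle)
            try:
                return send(url)
            except OSError as exc:  # connection refused, timeout, ...
                last_error = exc
        raise RuntimeError("no Slave answered") from last_error
```

Here `send` would be whatever function performs the HTTP GET against a given Slave; the pool only decides which Slave to try next.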
Monday, July 13, 2009

