Multicore Solr on Ubuntu 10.04

UPDATE: New post on getting Multicore Solr 3.4 running on Ubuntu 10.04

Been working a lot lately with the Apache Solr project.

Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world’s largest internet sites.

Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr’s powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture when more advanced customization is required.

One of the features of Solr is called multicore. In the context of Solr, multicore means running multiple cores inside a single Solr instance in one servlet container. Each core has its own configuration and index, and all of them are still administered through one interface. The Solr wiki defines it as:

Multiple cores let you have a single Solr instance with separate configurations and indexes, with their own config and schema for very different applications, but still have the convenience of unified administration. Individual indexes are still fairly isolated, but you can manage them as a single application, create new indexes on the fly by spinning up new SolrCores, and even make one SolrCore replace another SolrCore without ever restarting your Servlet Container.

Although I’ve set up a few instances of Solr using Tomcat, I thought I’d write out just how easy it is to get Solr up and running on Ubuntu Server 10.04, and talk about some of the scripts I’ve written to make adding, removing and reloading cores easier. This post assumes you have already installed Ubuntu Server with internet access and have a basic understanding of how to use Ubuntu and Linux in general.

Installing Solr
On your Ubuntu server, become root using ‘sudo su -’ and issue the following command:
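The command itself is missing from this copy of the post. On Ubuntu 10.04 the Tomcat-backed Solr package is, as far as I know, `solr-tomcat` from the universe repository, so the step is most likely the following. Treat the package name as an assumption and confirm it with `apt-cache search solr` before installing.

```bash
# Assumed package name: solr-tomcat, which depends on solr-common and tomcat6.
apt-get update
apt-get install solr-tomcat
```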

This will install Solr from Ubuntu’s repositories as well as install and configure Tomcat. At this point, you have a fully working Solr installation that only needs to be tweaked for your environment. Solr itself lives in three spots: /usr/share/solr, /var/lib/solr and /etc/solr. These directories contain the Solr home directory, the data directory and the configuration data, respectively.

Enable Multicore
Enabling multicore is as simple as creating solr.xml in the /usr/share/solr directory and restarting Tomcat. Once you’ve done this, you only need to restart Tomcat under certain conditions; under normal operation, cores are added, reloaded and removed while Tomcat keeps running.

Using your favorite text editor, create a file called solr.xml in /usr/share/solr with the following contents:
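The file contents did not survive in this copy of the post. A minimal multicore solr.xml for Solr 1.4 looks roughly like the one below; this is an assumption about the author’s file, not a copy of it. `persistent="true"` is what allows Solr to write the file back out as cores are added or removed, and `adminPath="/admin/cores"` is the conventional path for the CoreAdmin handler. It is written as a shell heredoc so it can be pasted straight into the root shell; the XML between the `EOF` markers is the file content.

```bash
# Writes a minimal multicore solr.xml (assumed contents) into the Solr home.
# The quoted 'EOF' keeps the shell from expanding anything inside the XML.
cat > /usr/share/solr/solr.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <!-- Cores are added here by Solr's CoreAdmin handler at runtime. -->
  <cores adminPath="/admin/cores">
  </cores>
</solr>
EOF
```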

Next, you need to ensure that Tomcat is able to write out new versions of the solr.xml file, because this file is updated as cores are added or removed. The following commands give Tomcat write permissions on the needed directory and file:
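The commands are missing here as well. Assuming Tomcat runs as the `tomcat6` user (the default for Ubuntu 10.04’s tomcat6 package), handing both the Solr home directory and solr.xml to that user should be enough:

```bash
# Assumes Tomcat runs as tomcat6. Solr rewrites solr.xml as cores change,
# so Tomcat needs write access to the file and to the directory holding it.
chown tomcat6:tomcat6 /usr/share/solr /usr/share/solr/solr.xml
```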

That’s it.  You can now issue the following command to restart Tomcat and in turn Solr:
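The restart command is missing from this copy. With Ubuntu 10.04’s tomcat6 package (assumed, as above), the init script handles it:

```bash
# Restarts Tomcat, which redeploys Solr and makes it read the new solr.xml.
/etc/init.d/tomcat6 restart
```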

Managing Cores
At this point you’re ready to start creating new cores. Before you can do so, however, you need to create config files and directories and set permissions. To make this process a bit easier, I created a set of scripts that do all of this for you based on a template config directory.

Create the template config directory by issuing the following command:
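The command is missing from this copy of the post. The straightforward way to build the template is to duplicate the packaged single-core config directory; that this is what the original command did is an assumption.

```bash
# Copy the packaged config as a template for new cores.
# -a preserves permissions, ownership and any symlinks in the directory.
cp -a /etc/solr/conf /etc/solr/conftemplate
```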

Next, edit /etc/solr/conftemplate/solrconfig.xml and find the dataDir option. Change the dataDir line from:

To:

This will ensure the scripts work correctly.
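The exact before and after dataDir lines are missing from this copy of the post. As a hypothetical illustration only: a template can give dataDir a per-core path containing a placeholder that a core-creation script later replaces. The helper below makes that edit with GNU sed, whatever value dataDir currently holds. Its name, the `CORENAME` placeholder and the `/var/lib/solr/data/<core>` layout are all assumptions, not the author’s choices.

```bash
#!/bin/bash
# setTemplateDataDir (hypothetical helper): rewrite the <dataDir> element of a
# template solrconfig.xml so it points at /var/lib/solr/data/CORENAME, where
# CORENAME is a placeholder that a core-creation script is expected to replace.
# Relies on GNU sed's -i (in-place) option, as shipped with Ubuntu.
set_template_datadir() {
  local file="$1"
  # [^<]* matches the current value, e.g. /var/lib/solr/data or
  # ${solr.data.dir:./solr/data}; the first <dataDir> on each line is replaced.
  sed -i 's|<dataDir>[^<]*</dataDir>|<dataDir>/var/lib/solr/data/CORENAME</dataDir>|' "$file"
}

if [ "$#" -eq 1 ]; then
  set_template_datadir "$1"
else
  echo "usage: setTemplateDataDir /etc/solr/conftemplate/solrconfig.xml" >&2
fi
```

If the template’s solrconfig.xml has no dataDir element at all, sed leaves the file unchanged, so it is worth checking the result with `grep dataDir`.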

Creating a new Core

Below is the newCore script. Copy and paste it into a file called newCore.
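The script itself did not survive in this copy of the post, so here is a sketch of a newCore-style script, not the author’s original. It copies the template directory, fills in the core name and asks Solr’s CoreAdmin handler to CREATE the core. The instance layout follows the /etc/solr/conf/core0/conf path mentioned later in this post; the `CORENAME` placeholder in the template’s dataDir, the `/var/lib/solr/data/<name>` data location, the `tomcat6` user, `adminPath="/admin/cores"` and the helper function names are assumptions.

```bash
#!/bin/bash
# newCore (sketch): create a Solr core from /etc/solr/conftemplate.
# Assumptions, not taken from the original post:
#   - Solr answers at http://localhost:8080/solr with adminPath="/admin/cores"
#   - each core lives in /etc/solr/conf/<name>, with its config in <name>/conf
#   - the template solrconfig.xml's <dataDir> contains the placeholder CORENAME
#   - data lives under /var/lib/solr/data/<name>; Tomcat runs as tomcat6
SOLR_URL="${SOLR_URL:-http://localhost:8080/solr}"
TEMPLATE_DIR="${TEMPLATE_DIR:-/etc/solr/conftemplate}"
CORES_DIR="${CORES_DIR:-/etc/solr/conf}"
DATA_ROOT="${DATA_ROOT:-/var/lib/solr/data}"   # must match the template's dataDir

# Allow only letters, digits and underscores, so the name is safe in paths,
# in the sed replacement and in the URL.
valid_core_name() {
  case "$1" in
    ""|*[!A-Za-z0-9_]*) return 1 ;;
    *) return 0 ;;
  esac
}

# CoreAdmin CREATE request for core $1 with instance directory $2.
create_url() {
  echo "${SOLR_URL}/admin/cores?action=CREATE&name=$1&instanceDir=$2"
}

# Copy the template config and fill in the core name; refuses to overwrite.
prepare_core_dir() {
  local name="$1"
  local instance="$CORES_DIR/$name"
  if [ -e "$instance" ]; then
    echo "newCore: $instance already exists" >&2
    return 1
  fi
  mkdir -p "$instance" "$DATA_ROOT/$name" || return 1
  cp -a "$TEMPLATE_DIR" "$instance/conf" || return 1
  sed -i "s|CORENAME|$name|g" "$instance/conf/solrconfig.xml"
}

main() {
  if [ "$#" -ne 1 ] || ! valid_core_name "$1"; then
    echo "usage: newCore <corename>  (letters, digits, underscore)" >&2
    exit 1
  fi
  prepare_core_dir "$1" || exit 1
  # Tomcat must be able to write the new core's index.
  chown -R tomcat6:tomcat6 "$DATA_ROOT/$1" || exit 1
  curl -s "$(create_url "$1" "$CORES_DIR/$1")"
}

# Only act when given arguments, so the functions can also be sourced.
if [ "$#" -gt 0 ]; then
  main "$@"
else
  echo "usage: newCore <corename>" >&2
fi
```

Saved as newCore and made executable with `chmod +x newCore`, it would be run as root with `./newCore core0`; the CoreAdmin handler replies with an XML status response.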

You can now create a new core by issuing the following command:

On screen you should get something similar to this if it was successful:

If you get any other response, particularly one about permissions, go back and review this post as you’ve most likely missed something.

This script has created a new Solr core with the configuration directory set to /etc/solr/conf/core0/conf.  There you can edit the schema.xml file.  To view the default schema.xml file, you can visit http://localhost:8080/solr/core0/admin/. Replace localhost with the hostname or IP address of your Solr server if it is not localhost.

Next time I’ll talk about how to import documents into a core as well as how to reload a core, swap cores or remove/unload a core and merge the index between two or more cores.

Update:  Here are the rest of the scripts I’ve written for Solr

Reload a Core

Save to a file called reloadCore
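The script body is missing from this copy of the post. A minimal sketch of what reloadCore could look like, assuming Solr is reachable at http://localhost:8080/solr (the address used earlier in this post) and that solr.xml sets `adminPath="/admin/cores"`. It sends the CoreAdmin RELOAD action, which re-reads a core’s configuration and schema without restarting Tomcat. The `SOLR_URL` override and the `reload_url` helper are illustrative additions.

```bash
#!/bin/bash
# reloadCore (sketch): reload one core via CoreAdmin, e.g. after editing its
# schema.xml or solrconfig.xml. Assumes Solr at localhost:8080/solr with
# adminPath="/admin/cores".
SOLR_URL="${SOLR_URL:-http://localhost:8080/solr}"

# CoreAdmin RELOAD request for core $1.
reload_url() {
  echo "${SOLR_URL}/admin/cores?action=RELOAD&core=$1"
}

if [ "$#" -eq 1 ]; then
  curl -s "$(reload_url "$1")"
else
  echo "usage: reloadCore <corename>" >&2
fi
```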


Swap Cores

Save to a file called swapCores
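The script body is missing here too. A sketch under the same assumptions (Solr at http://localhost:8080/solr, `adminPath="/admin/cores"`, illustrative helper names): the CoreAdmin SWAP action atomically swaps the names used to access two existing cores, which is handy for building a new index in one core and then putting it live.

```bash
#!/bin/bash
# swapCores (sketch): swap the names of two existing cores via CoreAdmin.
# Assumes Solr at localhost:8080/solr with adminPath="/admin/cores".
SOLR_URL="${SOLR_URL:-http://localhost:8080/solr}"

# CoreAdmin SWAP request between cores $1 and $2.
swap_url() {
  echo "${SOLR_URL}/admin/cores?action=SWAP&core=$1&other=$2"
}

if [ "$#" -eq 2 ]; then
  curl -s "$(swap_url "$1" "$2")"
else
  echo "usage: swapCores <core> <othercore>" >&2
fi
```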

Unload/Delete a Core

Save to a file called unloadCore
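The script body is missing from this copy. A sketch assuming the same Solr URL and `adminPath="/admin/cores"` as above: the CoreAdmin UNLOAD action stops Solr from serving the core, and with `persistent="true"` in solr.xml the core should also disappear from that file. As far as I know, Solr 1.4’s UNLOAD leaves the config and index on disk, so actually deleting them remains a separate, manual step.

```bash
#!/bin/bash
# unloadCore (sketch): remove a core from Solr via CoreAdmin.
# Assumes Solr at localhost:8080/solr with adminPath="/admin/cores".
# Solr 1.4's UNLOAD is assumed to leave files on disk; delete the core's
# instance and data directories by hand if they are no longer needed.
SOLR_URL="${SOLR_URL:-http://localhost:8080/solr}"

# CoreAdmin UNLOAD request for core $1.
unload_url() {
  echo "${SOLR_URL}/admin/cores?action=UNLOAD&core=$1"
}

if [ "$#" -eq 1 ]; then
  curl -s "$(unload_url "$1")"
else
  echo "usage: unloadCore <corename>" >&2
fi
```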

Merge Cores

Save to a file called mergeCores
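The script body is missing here as well. A sketch, assuming Solr at http://localhost:8080/solr with `adminPath="/admin/cores"` and each core’s index stored at `/var/lib/solr/data/<name>/index` (a hypothetical layout). Solr 1.4’s CoreAdmin MERGEINDEXES action takes one `indexDir` parameter per source index; the target core then needs a commit before the merged documents become searchable. Nothing should be writing to the source cores while the merge runs.

```bash
#!/bin/bash
# mergeCores (sketch): merge the indexes of source cores into a target core
# with CoreAdmin MERGEINDEXES, then commit the target.
# Assumptions: Solr at localhost:8080/solr, adminPath="/admin/cores", and
# each core's index at $DATA_ROOT/<name>/index.
SOLR_URL="${SOLR_URL:-http://localhost:8080/solr}"
DATA_ROOT="${DATA_ROOT:-/var/lib/solr/data}"

# merge_url TARGET SRC...: one indexDir parameter per source core.
merge_url() {
  local target="$1"
  shift
  local url="${SOLR_URL}/admin/cores?action=MERGEINDEXES&core=${target}"
  local src
  for src in "$@"; do
    url="${url}&indexDir=${DATA_ROOT}/${src}/index"
  done
  echo "$url"
}

# Commit on the target core so the merged segments become visible.
commit_url() {
  echo "${SOLR_URL}/$1/update?commit=true"
}

if [ "$#" -ge 2 ]; then
  curl -s "$(merge_url "$@")" && curl -s "$(commit_url "$1")"
else
  echo "usage: mergeCores <targetcore> <sourcecore> [<sourcecore> ...]" >&2
fi
```

For example, `./mergeCores core0 core1 core2` would merge the indexes of core1 and core2 into core0; the source cores keep their own indexes.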

7 comments

  1. Hi:

    I just wanted to say thank you so much for your clarity and help. I’ve been going crazy trying to find a decent piece of documentation.

    Great work! 😀

  2. I also want to say thank you for these instructions. I was going to give up and try the multi-core approach by starting with a scratch tomcat server and installing solr after but this works out in a much better format.

  3. Happy to hear it is working well. I’m actually working today on getting solr3 running side by side with the old solr 1.4 on Ubuntu 10.04. Everything is running, just need to modify the scripts for working with cores.

  4. Excellent, thanks for this.
    On Ubuntu 11.04 I first needed:

    apt-get install sun-java6-jdk

    I then replaced solr.xml and solrconfig.xml
    with Drupal-specific files from the Apache Solr module, and your scripts work great.

    Thanks!
