My multicore Solr on Ubuntu 10.04 has proven to be one of my most popular posts yet.  Seeing the success of that post I decided it was time to show how to get the latest version of Solr up and running on Ubuntu 10.04.  As of this writing the latest version of Solr is 3.4.0.

Before we get started you should read and follow my previous post because I borrow all of the config settings from Ubuntu’s Solr 1.4 packages.  The default config settings from the Ubuntu maintainers is still a decent starting point with Solr 3.4.  Once finished you can safely remove the old Solr 1.4 package if you want to.

With a working Solr 1.4 installation in place, we can get started on getting Solr 3.4 running.  You can change some of the following paths if you want, just remember to change them in all of the appropriate places.  Everything you’re about to see should be done as the root user.

Create some required paths

Next, re-own the data dir to the proper user

Download the latest version of Solr

You can get the latest version of Solr from http://lucene.apache.org/solr/ and extract the files into root’s home directory.

Extract the war Solr war file

Extract the Solr war file into a location.  You may need to install the unzip utility with apt-get install unzip.

Install additional libs

There are a few other libs included with the Solr distribution.  You can install anything else you need, I specifically need to have the dataimporthandler add ons.

Configure Multicore

If you want to have multicore enabled you’ll need to perform the following actions.  The rest of this post assumes you have copied this file and will require you to make some changes to support multicore.  I’ve marked steps that can be skipped if you also wish to skip the multicore functionality.

Copy in the multicore config file:

You should now edit the solr.xml file at this point, doing the following:

  • Set persistent to true
  • Remove entries for core0 and core1

Next, change the ownership and permissions so that tomcat is able to modify this file when needed

Copy existing config files

This is where we’re going to borrow some files from Ubuntu’s Solr package maintainer.

Because we simply copied the config files we need to modify them to fit our new environment.  Change the following in the solr-tomcat.xml file:

  • Change docBase to /usr/local/share/solr3
  • Change Environment value to /usr/local/share/solr3

Also edit tomcat.policy file changing:

  • Modify all entries referencing solr to point to appropriate /usr/local location

Change the following in conf/solrconfig.xml:

  • Change <dataDir> to /usr/local/lib/solr3/data

If you are using multicore and you followed the Solr 1.4 multicore post you’ll have a conftemplate directory and you’ll need make changes to conftemplate/solrconfig.xml

  • Change <dataDir> to /usr/local/lib/solr3/data/CORENAME

Create symlinks

Here we’ll create some symlinks to support the way Ubuntu packages Solr.  This is necessary because we copied Ubuntu’s config files and those files reference a few locations.  Creating the symlinks also allows us to continue using the scripts created in the previous post with minimal modifications.

  • cd /usr/local/share/solr3
  • ln -s /usr/local/etc/solr3/conf
  • ln -s /usr/local/etc/solr3/ /etc/solr3
  • ln -s /usr/local/lib/solr3 /var/lib/solr3

Enable/Start the new Solr instance

We can now enable our new Solr 3.4 instance in tomcat by doing the following:

Note that the name of the symlink is important as it will define where we find this instance (/solr vs /solr3).  At this point you can create a new core.  I’ve provided the updated scripts here.

 

UPDATE: New post on getting Multicore Solr 3.4 running on Ubuntu 10.04

Been working a lot lately with the Apache Solr project.

Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world’s largest internet sites.

Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr’s powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture when more advanced customization is required.

One of the features of Solr is called multicore. Multicore in the context of Solr simply means running multiple instances of Solr using the same servlet container allowing for separate configurations and indexes per core while still allowing administration through one interface. The Solr wiki defines it as:

Multiple cores let you have a single Solr instance with separate configurations and indexes, with their own config and schema for very different applications, but still have the convenience of unified administration. Individual indexes are still fairly isolated, but you can manage them as a single application, create new indexes on the fly by spinning up new SolrCores, and even make one SolrCore replace another SolrCore without ever restarting your Servlet Container.

Although I’ve setup a few instances of Solr using tomcat, I thought I’d write out just how easy it is to get Solr up and running using Ubuntu Server 10.04 as well as talk about some of the scripts I’ve written to make the process of adding, removing and reloading cores easier. This post assumes you have already installed Ubuntu server with internet access as well having a basic understanding of how to use Ubuntu and Linux in general.

Installing Solr
On your Ubuntu server, become root using ‘sudo su -‘ and issue the following command:

This will install Solr from Ubuntu’s repositories as well as install and configure Tomcat. At this point, you have a fully working Solr installation that only needs to be tweaked for your environment. Solr itself lives in three spots, /usr/share/solr, /var/lib/solr/ and /etc/solr. These directories contain the solr home director, data directory and configuration data respectively.

Enable Multicore
Enabling multicore is as simple as creating solr.xml in the /usr/share/solr directory and restarting Tomcat. Once you’ve done this, you only need to restart under certain conditions. Under normal operations, you should never need to restart Tomcat.

Using your favorite text editor create a file called solr.xml at /usr/share/solr with the following contents:

Next, you need to ensure that Tomcat is able to write out new versions of the solr.xml file.  As cores are added or removed, this file is updated.  The following commands ensure Tomcat has write permissions to needed directory and file

That’s it.  You can now issue the following command to restart Tomcat and in turn Solr:

Managing Cores
At this point you’re ready to start creating new cores. Before you can do so however you need create config files, directories and set permissions. In order to make this process a bit easier I created a set of scripts that do all of this for you based on a template config directory.

Create the template config directory by issuing the following command:

Next, edit /etc/solr/conftemplate/solrconfig.xml and find the dataDir option. Change the dataDir line from:

To:

This will ensure the scripts work correctly.

Creating a new Core

Below is the newCore script.  Copy and paste it into a file and call it newCore

You can now create a new core by issuing the following command

On screen you should get something similar to this if it was successful:

If you get any other response, particularly one about permissions, go back and review this post as you’ve most likely missed something.

This script has created a new Solr core with the configuration directory set to /etc/solr/conf/core0/conf.  There you can edit the schema.xml file.  To view the default schema.xml file, you can visit http://localhost:8080/solr/core0/admin/. Replace localhost with the hostname or IP address of your Solr server if it is not localhost.

Next time I’ll talk about how to import documents into a core as well as how to reload a core, swap cores or remove/unload a core and merge the index between two or more cores.

Update:  Here are the rest of the scripts I’ve written for Solr

Reload a Core

Save to a file called reloadCore

 

Swap Cores

Save to a file called swapCores

Unload/Delete a Core

Save to a file called unloadCore

Merge Cores

Save to a file called mergeCores