My post on multicore Solr on Ubuntu 10.04 has proven to be one of my most popular yet.  Given the success of that post, I decided it was time to show how to get the latest version of Solr up and running on Ubuntu 10.04.  As of this writing the latest version of Solr is 3.4.0.

Before we get started you should read and follow my previous post, because I borrow all of the config settings from Ubuntu’s Solr 1.4 packages.  The default config settings from the Ubuntu maintainers are still a decent starting point with Solr 3.4.  Once finished you can safely remove the old Solr 1.4 package if you want to.

With a working Solr 1.4 installation in place, we can get started on getting Solr 3.4 running.  You can change some of the following paths if you want, just remember to change them in all of the appropriate places.  Everything you’re about to see should be done as the root user.

Create some required paths

mkdir /usr/local/share/solr3
mkdir /usr/local/etc/solr3
mkdir -p /usr/local/lib/solr3/data

Next, re-own the data dir to the proper user

chown -R tomcat6.tomcat6 /usr/local/lib/solr3/data
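Since the paths above can be changed, one way to keep them consistent is to derive all three from a single prefix.  The following is only a sketch: the function name make_solr3_dirs and its prefix argument are my own invention for illustration, and the default prefix matches the paths used in this post.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original setup): create the three
# Solr 3 directories under one prefix so the paths cannot drift apart.
make_solr3_dirs() {
  local prefix="${1:-/usr/local}"
  mkdir -p "$prefix/share/solr3" \
           "$prefix/etc/solr3" \
           "$prefix/lib/solr3/data"
}
```

Running make_solr3_dirs with no argument as root should leave you with the same directories as the mkdir commands above; the chown step still has to be run afterwards.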

Download the latest version of Solr

You can get the latest version of Solr from http://lucene.apache.org/solr/ and extract the files into root’s home directory.

wget http://mirrors.axint.net/apache//lucene/solr/<version>/apache-solr-<version>.tgz
tar zxvf apache-solr-<version>.tgz
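To avoid typing the version in several places, the file name and URL can be built from one version string.  This is a rough sketch; the function names solr_tarball and solr_url are hypothetical, and the mirror is simply the one from the wget line above, so swap in whichever mirror you actually use.

```shell
#!/bin/bash
# Hypothetical helpers: build the tarball name and download URL from a
# single version string such as 3.4.0.
solr_tarball() {
  echo "apache-solr-$1.tgz"
}

solr_url() {
  # Mirror copied from the wget line in this post; replace with your own.
  echo "http://mirrors.axint.net/apache//lucene/solr/$1/$(solr_tarball "$1")"
}
```

With these defined, the download step could look like: cd /root && wget "$(solr_url 3.4.0)" && tar zxvf "$(solr_tarball 3.4.0)"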

Extract the Solr war file

Extract the Solr war file into /usr/local/share/solr3.  You may need to install the unzip utility with apt-get install unzip.

cd /usr/local/share/solr3 
unzip /root/apache-solr-<version>/dist/apache-solr-<version>.war

Install additional libs

There are a few other libs included with the Solr distribution.  Install whichever ones you need; I specifically need the dataimporthandler add-ons.  Run the copy from /usr/local/share/solr3 so that the relative WEB-INF/lib/ path resolves.

cp /root/apache-solr-3.4.0/dist/apache-solr-dataimporthandler-* WEB-INF/lib/

Configure Multicore

If you want multicore enabled, you’ll need to perform the following steps.  The rest of this post assumes you have copied this file, and some of the later steps include changes that are only needed for multicore.  I’ve marked the steps you can skip if you don’t want the multicore functionality.

Copy in the multicore config file:

cp /root/apache-solr-3.4.0/example/multicore/solr.xml .

Now edit the solr.xml file, doing the following:

  • Set persistent to true
  • Remove entries for core0 and core1
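If you would rather script those two edits than make them by hand, something like the following should work.  It is a sketch under assumptions: the function name prepare_solr_xml is hypothetical, and it assumes the file ships with persistent="false" and with each core entry on its own line, so check your copy before relying on it.

```shell
#!/bin/bash
# Hypothetical helper: set persistent="true" and remove the sample core0 and
# core1 entries. Assumes each <core .../> entry sits on its own line.
prepare_solr_xml() {
  local file="$1"
  sed -e 's/persistent="false"/persistent="true"/' \
      -e '/<core name="core[01]"/d' \
      "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

For example, prepare_solr_xml /usr/local/share/solr3/solr.xml, run before the ownership change below since mv replaces the file.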

Next, change the ownership and permissions so that tomcat is able to modify this file when needed

chown tomcat6.tomcat6 /usr/local/share/solr3
chown tomcat6.tomcat6 /usr/local/share/solr3/solr.xml

Copy existing config files

This is where we’re going to borrow some files from Ubuntu’s Solr package maintainer.

cd /usr/local/etc/solr3
cp -av /etc/solr/* .

Because we simply copied the config files we need to modify them to fit our new environment.  Change the following in the solr-tomcat.xml file:

  • Change docBase to /usr/local/share/solr3
  • Change Environment value to /usr/local/share/solr3
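The two changes to solr-tomcat.xml can also be made with sed.  This is a minimal sketch, assuming both attributes currently read /usr/share/solr as they do in the copy I started from; the function name retarget_context is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: point docBase and the Environment value at the new
# Solr location. Assumes both currently read "/usr/share/solr".
retarget_context() {
  local file="$1" new_home="${2:-/usr/local/share/solr3}"
  sed -e "s|docBase=\"/usr/share/solr\"|docBase=\"$new_home\"|" \
      -e "s|value=\"/usr/share/solr\"|value=\"$new_home\"|" \
      "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

For example: retarget_context /usr/local/etc/solr3/solr-tomcat.xml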

Also edit the tomcat.policy file:

  • Modify all entries referencing solr to point to the appropriate /usr/local location

Change the following in conf/solrconfig.xml:

  • Change <dataDir> to /usr/local/lib/solr3/data

If you are using multicore and you followed the Solr 1.4 multicore post, you’ll have a conftemplate directory and you’ll need to make changes to conftemplate/solrconfig.xml:

  • Change <dataDir> to /usr/local/lib/solr3/data/CORENAME
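Both dataDir edits follow the same pattern, so they can be sketched as one small function.  The name set_data_dir is hypothetical, and the sketch assumes the dataDir element sits on a single line, as it does in Ubuntu’s solrconfig.xml.

```shell
#!/bin/bash
# Hypothetical helper: rewrite the <dataDir> element of a solrconfig.xml.
# Assumes the whole element is on one line.
set_data_dir() {
  local file="$1" dir="$2"
  sed "s|<dataDir>.*</dataDir>|<dataDir>$dir</dataDir>|" \
      "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

From /usr/local/etc/solr3 that would be set_data_dir conf/solrconfig.xml /usr/local/lib/solr3/data, and for multicore set_data_dir conftemplate/solrconfig.xml /usr/local/lib/solr3/data/CORENAME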

Create symlinks

Here we’ll create some symlinks to support the way Ubuntu packages Solr.  This is necessary because we copied Ubuntu’s config files and those files reference a few locations.  Creating the symlinks also allows us to continue using the scripts created in the previous post with minimal modifications.

cd /usr/local/share/solr3
ln -s /usr/local/etc/solr3/conf
ln -s /usr/local/etc/solr3/ /etc/solr3
ln -s /usr/local/lib/solr3 /var/lib/solr3
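A dangling symlink here is easy to miss, so it can be worth checking each one before moving on.  A minimal sketch, with a hypothetical function name check_link:

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the path is a symlink and its target
# exists (-e follows the link, so a dangling link fails the check).
check_link() {
  [ -L "$1" ] && [ -e "$1" ]
}
```

For example: for l in /usr/local/share/solr3/conf /etc/solr3 /var/lib/solr3; do check_link "$l" || echo "problem with $l"; done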

Enable/Start the new Solr instance

We can now enable our new Solr 3.4 instance in tomcat by doing the following:

cd /etc/tomcat6/Catalina/localhost
ln -s /usr/local/etc/solr3/solr-tomcat.xml solr3.xml

Note that the name of the symlink is important as it will define where we find this instance (/solr vs /solr3).  At this point you can create a new core.  I’ve provided the updated scripts here.
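To illustrate how the symlink name maps to the URL, here is a small sketch that builds the CoreAdmin STATUS address from the descriptor file name.  The function name status_url is hypothetical, and it assumes Tomcat is listening on localhost:8080 and that multicore is enabled with the /admin/cores admin path.

```shell
#!/bin/bash
# Hypothetical helper: derive the CoreAdmin STATUS URL from the name of the
# context descriptor symlink (solr3.xml -> /solr3).
# Assumes Tomcat on localhost:8080 and adminPath="/admin/cores".
status_url() {
  local context
  context="$(basename "$1" .xml)"
  echo "http://localhost:8080/$context/admin/cores?action=STATUS"
}
```

Once Tomcat has picked up the new context, curl "$(status_url solr3.xml)" should return an XML status response if everything is wired up.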

 

UPDATE: New post on getting Multicore Solr 3.4 running on Ubuntu 10.04

Been working a lot lately with the Apache Solr project.

Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world’s largest internet sites.

Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr’s powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture when more advanced customization is required.

One of the features of Solr is called multicore. Multicore in the context of Solr simply means running multiple instances of Solr using the same servlet container allowing for separate configurations and indexes per core while still allowing administration through one interface. The Solr wiki defines it as:

Multiple cores let you have a single Solr instance with separate configurations and indexes, with their own config and schema for very different applications, but still have the convenience of unified administration. Individual indexes are still fairly isolated, but you can manage them as a single application, create new indexes on the fly by spinning up new SolrCores, and even make one SolrCore replace another SolrCore without ever restarting your Servlet Container.

Although I’ve set up a few instances of Solr using Tomcat, I thought I’d write out just how easy it is to get Solr up and running using Ubuntu Server 10.04, as well as talk about some of the scripts I’ve written to make the process of adding, removing and reloading cores easier. This post assumes you have already installed Ubuntu Server with internet access and have a basic understanding of how to use Ubuntu and Linux in general.

Installing Solr
On your Ubuntu server, become root using ‘sudo su -’ and issue the following command:

apt-get install solr-tomcat curl -y

This will install Solr from Ubuntu’s repositories as well as install and configure Tomcat. At this point, you have a fully working Solr installation that only needs to be tweaked for your environment. Solr itself lives in three spots: /usr/share/solr, /var/lib/solr/ and /etc/solr. These directories contain the Solr home directory, data directory and configuration data respectively.

Enable Multicore
Enabling multicore is as simple as creating solr.xml in the /usr/share/solr directory and restarting Tomcat. Once you’ve done this, you only need to restart under certain conditions. Under normal operations, you should never need to restart Tomcat.

Using your favorite text editor create a file called solr.xml at /usr/share/solr with the following contents:

<solr persistent="true" sharedLib="lib">
 <cores adminPath="/admin/cores">
 </cores>
</solr>

Next, you need to ensure that Tomcat is able to write out new versions of the solr.xml file.  As cores are added or removed, this file is updated.  The following commands give Tomcat write permission on the needed directory and file:

chown tomcat6.tomcat6 /usr/share/solr/solr.xml
chown tomcat6.tomcat6 /usr/share/solr

That’s it.  You can now issue the following command to restart Tomcat and in turn Solr:

service tomcat6 restart

Managing Cores
At this point you’re ready to start creating new cores. Before you can do so, however, you need to create config files and directories and set permissions. To make this process a bit easier I created a set of scripts that do all of this for you based on a template config directory.

Create the template config directory by issuing the following command:

cp -av /etc/solr/conf /etc/solr/conftemplate

Next, edit /etc/solr/conftemplate/solrconfig.xml and find the dataDir option. Change the dataDir line from:

<dataDir>/var/lib/solr/data</dataDir>

To:

<dataDir>/var/lib/solr/data/CORENAME</dataDir>

This will ensure the scripts work correctly.
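To see what the placeholder turns into, here is a tiny sketch of the same sed substitution the newCore script performs.  The function name render_data_dir is hypothetical and exists only for this illustration.

```shell
#!/bin/bash
# Show what the template's dataDir line becomes for a given core name.
# Mirrors the CORENAME substitution done by the newCore script.
render_data_dir() {
  echo '<dataDir>/var/lib/solr/data/CORENAME</dataDir>' | sed "s/CORENAME/$1/"
}
```

Calling render_data_dir core0 prints the dataDir line with core0 in place of CORENAME.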

Creating a new Core

Below is the newCore script.  Copy and paste it into a file called newCore, then make it executable with chmod +x newCore.

#!/bin/bash

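# creates a new Solr core
if [ "$1" = "" ]; then
  echo -n "Name of core to create: "
  read name
else
  name=$1
fi

# refuse an empty name, which would otherwise touch the parent directories
if [ -z "$name" ]; then
  echo "No core name given"
  exit 1
fi

# data directory, owned by Tomcat
mkdir "/var/lib/solr/data/$name"
chown tomcat6.tomcat6 "/var/lib/solr/data/$name"

# per-core config, copied from the template with CORENAME filled in
mkdir -p "/etc/solr/conf/$name/conf"
cp -a /etc/solr/conftemplate/* "/etc/solr/conf/$name/conf/"
sed -i "s/CORENAME/$name/" "/etc/solr/conf/$name/conf/solrconfig.xml"

# register the core with the running Solr instance
curl "http://localhost:8080/solr/admin/cores?action=CREATE&name=$name&instanceDir=/etc/solr/conf/$name"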

You can now create a new core by issuing the following command

./newCore core0

On screen you should get something similar to this if it was successful:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">352</int></lst><str name="core">core0</str><str name="saved">/usr/share/solr/solr.xml</str>
</response>

If you get any other response, particularly one about permissions, go back and review this post as you’ve most likely missed something.
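If you want a script to tell success from failure rather than reading the XML by eye, the status value can be pulled out of the response.  This is a rough sketch with a hypothetical function name, response_status; it relies on the status element looking like the one in the sample response above.

```shell
#!/bin/bash
# Hypothetical helper: read a CoreAdmin XML response on stdin and print the
# value of <int name="status">. Prints nothing if the element is absent,
# for example when Tomcat returns an HTML error page instead.
response_status() {
  sed -n 's/.*<int name="status">\([0-9][0-9]*\)<\/int>.*/\1/p' | head -n 1
}
```

A status of 0 means the request succeeded, so a caller could pipe the curl output through response_status and treat anything other than 0 (including empty output) as a failure.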

This script has created a new Solr core with the configuration directory set to /etc/solr/conf/core0/conf.  There you can edit the schema.xml file.  To view the default schema.xml file, you can visit http://localhost:8080/solr/core0/admin/. Replace localhost with the hostname or IP address of your Solr server if it is not localhost.

Next time I’ll talk about how to import documents into a core as well as how to reload a core, swap cores or remove/unload a core and merge the index between two or more cores.

Update:  Here are the rest of the scripts I’ve written for Solr

Reload a Core

Save to a file called reloadCore

#!/bin/bash

# reloads a Solr core
if [ "$1" = "" ]; then
  echo -n "Name of core to reload: "
  read name
else
  name=$1
fi

# check for an empty name first; quoting keeps the test valid when it is empty
if [ -z "$name" ] || [ ! -d "/var/lib/solr/data/$name" ]; then
  echo "Core doesn't exist"
  exit 1
fi

curl "http://localhost:8080/solr/admin/cores?action=RELOAD&core=$name"

 

Swap Cores

Save to a file called swapCores

#!/bin/bash

# swaps two Solr cores
if [ "$2" = "" ]; then
  echo -n "Name of first core: "
  read name1
  echo -n "Name of second core: "
  read name2
else
  name1=$1
  name2=$2
fi

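# both cores must be named and have a data directory
for name in "$name1" "$name2"; do
  if [ -z "$name" ] || [ ! -d "/var/lib/solr/data/$name" ]; then
    echo "Core '$name' doesn't exist"
    exit 1
  fi
done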

curl "http://localhost:8080/solr/admin/cores?action=SWAP&core=$name1&other=$name2"

Unload/Delete a Core

Save to a file called unloadCore

#!/bin/bash

clear
echo "*************************************************************************"
echo "*************************************************************************"
echo
echo "            You are about to *permanently* delete a core!"
echo "                      There is no going back"
echo
echo "*************************************************************************"
echo "*************************************************************************"
echo
echo -n "Type 'delete core' to continue or control-c to bail: "
read answer

if [ "$answer" != "delete core" ]; then
 exit
fi
# removes a Solr core
if [ "$1" = "" ]; then
 echo -n "Name of core to remove: "
 read name
else
 name=$1
fi

# an empty name must never reach the rm commands below
if [ -z "$name" ] || [ ! -d "/var/lib/solr/data/$name" ]; then
 echo "Core doesn't exist"
 exit 1
fi

curl "http://localhost:8080/solr/admin/cores?action=UNLOAD&core=$name"
sleep 5
rm -rf "/var/lib/solr/data/$name"

rm -rf "/etc/solr/conf/$name"
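Because this script ends in rm -rf, it may be worth rejecting odd core names before doing anything.  The following is a sketch of one possible check; the function name valid_core_name and the allowed character set are my own assumptions, not something Solr requires.

```shell
#!/bin/bash
# Hypothetical helper: accept only names made of letters, digits, '_' and '-'.
# Rejects empty names and anything containing '/', '.' or spaces, so a name
# like "../etc" can never end up in an rm -rf path.
valid_core_name() {
  case "$1" in
    ''|*[!A-Za-z0-9_-]*) return 1 ;;
    *) return 0 ;;
  esac
}
```

In the script, a line such as valid_core_name "$name" || { echo "Bad core name"; exit 1; } placed right after the name is read would stop before any curl or rm call.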

Merge Cores

Save to a file called mergeCores

#!/bin/bash

# merges two Solr cores
if [ "$2" = "" ]; then
  echo -n "Name of first core: "
  read name1
  echo -n "Name of second core: "
  read name2
else
  name1=$1
  name2=$2
fi

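# both cores must be named and have a data directory
for name in "$name1" "$name2"; do
  if [ -z "$name" ] || [ ! -d "/var/lib/solr/data/$name" ]; then
    echo "Core '$name' doesn't exist"
    exit 1
  fi
done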

# commit both cores so their on-disk indexes are current before merging
curl "http://localhost:8080/solr/$name1/update" --data-binary '<commit/>' -H 'Content-type:text/xml; charset=utf-8'
curl "http://localhost:8080/solr/$name2/update" --data-binary '<commit/>' -H 'Content-type:text/xml; charset=utf-8'
# merge the second core's index into the first
curl "http://localhost:8080/solr/admin/cores?action=mergeindexes&core=$name1&indexDir=/var/lib/solr/data/$name2/index"
# commit again so the merged documents become visible
curl "http://localhost:8080/solr/$name1/update" --data-binary '<commit/>' -H 'Content-type:text/xml; charset=utf-8'
curl "http://localhost:8080/solr/$name2/update" --data-binary '<commit/>' -H 'Content-type:text/xml; charset=utf-8'