Full Text Indexing with Zend_Search_Lucene

To provide a robust search facility for content stored in a MySQL database, you may be thinking of using MySQL's built-in full-text indexing. Before MySQL 5.6 this was only available with the MyISAM storage engine, which can be limiting, especially if you would like to use foreign keys and the like. So a query such as the following:

 "SELECT *, MATCH (page_content, page_title) AGAINST
(:keywords IN BOOLEAN MODE) AS relevance
FROM pages WHERE MATCH (page_content, page_title) AGAINST
(:keywords2 IN BOOLEAN MODE) ORDER BY relevance DESC";

may not be an option for you, even if it is the route you would like to take.

The best approach I have found is to leave the indexing to a separate service altogether. Enter Zend_Search_Lucene, which I believe is a PHP port of the Apache Lucene project (itself written in Java).

We start by creating our index and our document class. Personally, I like to keep my own classes out of the Zend application as much as I can. That way the application stays mostly untainted and I can reuse it with multiple custom code bases. Enough on that. However you choose to organise things, you need a version of the following:

class My_SearchService {

    protected
    $indexPath,
    $pageService,
    $pageIndexPath,
    $newsIndexPath,
    $document,
    $pageIndex;

    public function setIndexPath($indexPath) {
        $this->indexPath = $indexPath;
    }

    public function __construct($indexPath = NULL) {
        if (is_null($indexPath)) {
            $indexPath = APPLICATION_PATH . '/indexes/';
        }
        $this->setIndexPath($indexPath);
        $this->pageIndexPath = $this->indexPath . 'pageindex';
        $this->newsIndexPath = $this->indexPath . 'newsindex';
        $this->pageService = new My_PageService();
    }

    public function createPageIndex() {
        $this->pageIndex = Zend_Search_Lucene::create($this->pageIndexPath);
        // this is a simple Zend_Db_Table object returning data from my database
        $pages = $this->pageService->getAllPages()->toArray();

        foreach ($pages as $page) {
            $this->pageIndex->addDocument(new My_Controller_Plugin_PageIndexer($page));
        }

        // commit index
        $this->pageIndex->commit();
    }

}

This assumes a folder structure that looks a bit like this:

/applications
    /everything else here
/library
    /My
        /PageService.php
        /Controller
            /Plugin
                /PageIndexer.php

Our PageIndexer is just our extension of Zend_Search_Lucene_Document. It defines the document we will be querying for all our page search needs.

/**
 * Description of PageIndexer
 *
 * @author kaning
 */
class My_Controller_Plugin_PageIndexer extends Zend_Search_Lucene_Document {

    /**
     * Constructor. Creates our indexable document and adds all
     * necessary fields to it using the passed in document
     */
    public function __construct($document) {
        $this->addField(Zend_Search_Lucene_Field::Keyword('page_id', $document['page_id']));
        $this->addField(Zend_Search_Lucene_Field::UnIndexed('name', $document['page_name']));
        $this->addField(Zend_Search_Lucene_Field::UnIndexed('created', $document['publishdate']));
        $this->addField(Zend_Search_Lucene_Field::UnIndexed('caption', $document['page_caption']));
        $this->addField(Zend_Search_Lucene_Field::Text('title', $document['page_title']));
        $this->addField(Zend_Search_Lucene_Field::UnStored('content', $document['page_content']));
    }

}

There is not much to explain here: I am simply adding the fields I want in my index. Bear in mind that the field data corresponds to what the Zend_Db_Table select query returned for me.
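
The field types decide what you can do with a search hit later on. Keyword and Text fields are both indexed and stored, UnIndexed fields are stored but not searchable, and UnStored fields are searchable but cannot be read back from a hit. Here is a minimal sketch of querying the page index, assuming Zend Framework 1 is autoloadable and the index has already been built by createPageIndex(). The helper name searchPages() is my own invention for illustration, not part of Zend.

```php
<?php
// Sketch only. searchPages() is a hypothetical helper, not a Zend API.
// $index is expected to be what Zend_Search_Lucene::open() returns.
function searchPages($index, $keywords)
{
    $results = array();

    // find() parses the query string and returns hits ordered by score
    foreach ($index->find($keywords) as $hit) {
        $results[] = array(
            // stored fields are read straight off the hit object
            'page_id' => $hit->page_id,
            'title'   => $hit->title,
            'caption' => $hit->caption,
            'score'   => $hit->score,
            // 'content' was added as UnStored, so it is searchable
            // but there is nothing to read back here
        );
    }

    return $results;
}
```

You would call it with something like searchPages(Zend_Search_Lucene::open($pageIndexPath), 'some keywords'), where $pageIndexPath is the same path that was passed to Zend_Search_Lucene::create().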

Bear in mind that you only need to create the index once. After that you will be updating it.
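
Zend_Search_Lucene cannot modify a document in place, so updating means deleting the old copy and adding a fresh one. The following is a rough sketch of that, assuming Zend Framework 1 autoloading and the My_Controller_Plugin_PageIndexer document class from this post. reindexPage() is a hypothetical helper name of mine.

```php
<?php
// Sketch only. reindexPage() is a hypothetical helper, not a Zend API.
// $index is expected to be what Zend_Search_Lucene::open() returns.
function reindexPage($index, array $page)
{
    // page_id was added as a Keyword field, i.e. one untokenized term,
    // so an exact term query finds the stale copy of this page (if any)
    $term  = new Zend_Search_Lucene_Index_Term((string) $page['page_id'], 'page_id');
    $query = new Zend_Search_Lucene_Search_Query_Term($term);

    $removed = 0;
    foreach ($index->find($query) as $hit) {
        // $hit->id is the internal document number, not our page_id
        $index->delete($hit->id);
        $removed++;
    }

    // add the fresh version of the page and flush the change to disk
    $index->addDocument(new My_Controller_Plugin_PageIndexer($page));
    $index->commit();

    return $removed;
}
```

I build the term query by hand rather than passing a string like 'page_id:42' to find(), because as far as I can tell the default analyzer ignores digits in query strings. Calling $index->optimize() every now and then merges the segments that pile up after many updates.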

I based this post on this tutorial. Look it up for more information.

A question on Zend_Acls

So I have been playing with Zend_Acl for a while now, and I have managed to integrate it with a site I am working on.

I am, however, asking myself a few questions. What is the best way to implement an ACL (that is an Access Control List, by the way)? On one hand I can do my “isAllowed” checks at the controller level, but do I want the user to get that far?

The other option is to implement our ACL in the bootstrap file, which always runs first, so access can be checked even before the user gets to the routing and all that.

Now my problem is that I am a big fan of the MVC concept, and because of the very dynamic nature of this project, my ACLs are provided by a separate datastore, i.e. a database.

I don’t want to start calling in database adapters and all that at bootstrap level, because that is just not nice. So I suppose the option here is to restrict access in the init function of my controller.

For example:

$this->authObject = Zend_Auth::getInstance();
// if not logged in, redirect to login form
if (!$this->authObject->hasIdentity()) {
    $returnURL = urlencode('/admin');
    $this->_redirect('/login?returnUrl=' . $returnURL);
} else {
    $this->userData = $this->authObject->getStorage();
    $this->userRole = $this->userData->read()->role;
}
// some other instantiations here
if ($this->accessControl->isAllowed($this->userRole)) {
    $adminNavigation = Zend_Registry::get('AdminNavigation');
    $this->view->sideNavigation = $adminNavigation;

    $uri = $this->_request->getPathInfo();
    $this->view->uri = $uri;
} else {
    $miscNavigation = Zend_Registry::get('MiscNavigation');
    $this->view->sideNavigation = $miscNavigation;
    $this->view->errorMessage = "This account does not have enough permissions to be here";
}

You may notice from all this that I have an Admin Controller and based on our isAllowed value, we provide either an admin navigation or a misc one.

Obviously this can be rewritten to fit the purpose but it’s an example of using ACLs at the controller level.
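
The “some other instantiations” part is where the ACL object itself has to come from. As a rough sketch of one way to do it, assuming Zend Framework 1 autoloading and assuming the permissions have already been fetched from the database as plain rows, something like this could build it. buildAcl() and the 'role' / 'resource' array keys are names I made up for illustration; your table will look different.

```php
<?php
// Sketch only. buildAcl() and the row format are hypothetical.
function buildAcl(array $rules)
{
    $acl = new Zend_Acl(); // denies everything until a rule allows it

    foreach ($rules as $rule) {
        // register each role and resource once before referring to it
        if (!$acl->hasRole($rule['role'])) {
            $acl->addRole(new Zend_Acl_Role($rule['role']));
        }
        if (!$acl->has($rule['resource'])) {
            // older 1.x releases call this method add() instead
            $acl->addResource(new Zend_Acl_Resource($rule['resource']));
        }

        $acl->allow($rule['role'], $rule['resource']);
    }

    return $acl;
}
```

In init() this would become $this->accessControl = buildAcl($rows), followed by a check such as $this->accessControl->isAllowed($this->userRole, 'admin'). Note that with rules stored per resource like this, the resource has to be passed to isAllowed(); leaving it out asks whether the role is allowed on every resource.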

Feedback welcome

A note on CAPTCHA decorators and Zend_Form_…

So if you have been struggling with setting up decorators with Zend_Form_Element_Captcha, here’s a note.

I used to get an additional textbox containing the hash that Zend generates when creating the captcha image. If you are stumped as to how to remove it from your display, try the following.

First of all, your form element. Please note that I extracted this from my extension of the Zend_Form class:

$captcha = new Zend_Form_Element_Captcha('captcha',
 array('label' => 'Type in the text you see in the image',
 'captcha' => array('captcha' => 'Image',
 'wordLen' => 6,
 'timeout' => 300,
 'height' => 60,
 'width' => 250,
 'font' => APPLICATION_PATH . '/../public/assets/fonts/arial.ttf',
 'fontSize' => 30,
 'imgDir' => APPLICATION_PATH . '/../public/assets/captcha/',
 'imgUrl' => 'http://' . $_SERVER['HTTP_HOST'] . '/assets/captcha/',
 ))
 ); // There are certainly more options, but they are not necessary for my purposes.
// You can try altering the noise levels in the captcha image, i.e. how many
// dots and lines are drawn over the image.
// Google "noise levels in Zend_Form_Element_Captcha".

This generates my captcha, and I can add it to my form with:

$this->addElements(array($email,$captcha,$submit));

Bear in mind that $email and $submit are also instances of their respective Zend_Form elements. In other words, don’t include them if you’re copying and pasting.

Now, after adding your elements, you have to define a separate set of decorators for the captcha, because it is obviously different from other form elements. So here goes:

$captcha->setDecorators(array(
 'Captcha',
 'Errors',
 array('Label', array('separator' => '<br />', 'requiredPrefix' => '* ')),
 array('HtmlTag', array('tag' => 'p', 'class' => 'form-element'))
 )
 );

Hope it’s helpful