
Installing Elasticsearch Client on PHP

For a simple demonstration of using Elasticsearch programmatically in a web app, PHP is a practical starting point for learning how to connect and display search results. The quick-start instructions on the Elastic site are a useful guideline, but they leave gaps; the steps below expand on the out-of-the-box setup to enable Elasticsearch support in PHP.

First, install PHP cURL support for Apache on Linux:

apt-get -y install php-curl

Set up PHP Composer in the doc-root folder, as outlined in the elasticsearch-php GitHub repository. Download Composer, initialize the project, and install the libraries:

curl -s https://getcomposer.org/installer | php
php composer.phar init
php composer.phar install --no-dev

During init, be sure to declare the dependency package “elasticsearch/elasticsearch” and accept the latest version as the default. Skip the development packages, as they aren’t necessary here.

Then, edit the composer.json file to include the directive:

   "require": {
            "elasticsearch/elasticsearch": "~6.0"
   }
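
For reference, a minimal complete composer.json, assuming the Elasticsearch client is the only dependency, looks like this:

{
    "require": {
        "elasticsearch/elasticsearch": "~6.0"
    }
}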

Finally, create a test page to see if it can connect to the Elasticsearch server:

<?php

require 'vendor/autoload.php';

use Elasticsearch\ClientBuilder;

$hosts = [
   'http://myelasticsearchhost:9200'
];

$client = ClientBuilder::create()
   ->setHosts($hosts)
   ->build();

$params = [
    'index' => 'myindexname',
    'body' => [
        'query' => [
            'match' => [
                'post_title' => 'elasticsearch'
            ]
        ]
    ]
];

$response = $client->search($params);

// In Elasticsearch 6.x, hits.total is a plain integer (it becomes an object in 7.x).
$totalhits = $response['hits']['total'];
echo "We have $totalhits total hits\n";

echo "<P>The hits are the following:</P>";
$result = null;
$i=0;
while ($i <= $totalhits)
{
        $result[$i] = $response['hits']['hits'][$i]['_source'];
        $i++;
}

foreach ($result as $key => $value)
{
        echo $value['post_title'], "<br>";
}

?>

Output will look something like this:

We have 2 total hits

The hits are the following:


Using Elasticsearch for JBOSS Logs
Deleting Entries in Elasticsearch Based On Timestamp
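
Since the point of the test page is to verify connectivity, it can also help to catch transport errors explicitly instead of letting the page fail. A minimal sketch using the same placeholder host; the cluster-info call is simply a cheap reachability check:

<?php

require 'vendor/autoload.php';

use Elasticsearch\ClientBuilder;

$client = ClientBuilder::create()
   ->setHosts(['http://myelasticsearchhost:9200'])
   ->build();

try {
   // info() hits the cluster root endpoint and returns basic cluster details.
   $info = $client->info();
   echo "Connected to cluster: ", $info['cluster_name'], "<br>";
} catch (\Exception $e) {
   // Covers unreachable hosts, bad credentials, and similar failures.
   echo "Connection failed: ", $e->getMessage(), "<br>";
}

?>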

Update Nov/2019: Since Elasticsearch now includes basic username/password security in its Basic license, it’s advisable to set it up. It’s a straightforward addition:

$hosts = [
   [
      'host' => 'myelasticsearchhost',
      'port' => '9200',
      'scheme' => 'http',
      'user' => 'myElasticUser',
      'pass' => 'myPassword'
   ]
];
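
If the array form feels verbose, elasticsearch-php also accepts credentials embedded inline in the host URI (same placeholder values as above):

$hosts = [
   // user:pass embedded directly in the extended host URI
   'http://myElasticUser:myPassword@myelasticsearchhost:9200'
];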

Edit November 6, 2020: After an upgrade or re-install of the OS to a newer version (such as from Ubuntu 16.x to 18.x), the cURL extension installed for PHP may no longer match the PHP version. For example, running php -v reveals:

PHP 7.2.34-8+ubuntu18.04.1+deb.sury.org+1 (cli) (built: Oct 31 2020 16:57:15) ( NTS )
Copyright (c) 1997-2018 The PHP Group
Zend Engine v3.2.0, Copyright (c) 1998-2018 Zend Technologies
with Zend OPcache v7.2.34-8+ubuntu18.04.1+deb.sury.org+1, Copyright (c) 1999-2018, by Zend Technologies

Since PHP 7.2 is the installed version, install the matching cURL extension: apt-get install php7.2-curl
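
A quick way to confirm PHP can actually see the extension after installing it (a generic PHP check, not specific to Elasticsearch):

<?php
// Verify the cURL extension that the Elasticsearch client depends on is loaded.
var_dump(extension_loaded('curl'));
?>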

Recovering Kibana After Upgrade


Elastic is doing rapid development on Elasticsearch. As of this writing, they’re on version 6.5.3, when 6.5.2 was released less than two weeks ago! Luckily, with a package install from a repo (such as RPM on CentOS/RHEL), upgrading to minor versions is less painful. However, it’s not without its pitfalls. For example, an upgrade from version 6.4.x to the latest 6.5.x can leave Kibana unable to start due to incompatible indices.

To alleviate this, shut down the Kibana service and check on Elasticsearch’s recovery of the .kibana index:

curl --user elasticuser:userpassword -s https://search.mydomain.net:9200/.kibana/_recovery?pretty

If it’s connected to a big cluster with a lot of shards, speed up the recovery process by temporarily dropping replicas:

curl --user elasticuser:userpassword -H 'Content-Type: application/json' -XPUT 'https://search.mydomain.net:9200/.kibana/_settings' -d '{ "index" : { "number_of_replicas" : 0 } }'

Give it a few minutes (depending on how much data is there) and then start up the Kibana service. If, for some reason, it still takes a long time, there may be a problem with the migration process. The kibana.log may indicate something like this:

{"type":"log","@timestamp":"2018-12-12T17:17:40Z","tags":["warning","stats-collection"],"pid":15141,"message":"Unable to fetch data from kibana_settings collector"}
{"type":"log","@timestamp":"2018-12-12T17:17:42Z","tags":["reporting","warning"],"pid":15141,"message":"Enabling the Chromium sandbox provides an additional layer of protection."}
{"type":"log","@timestamp":"2018-12-12T17:17:42Z","tags":["info","migrations"],"pid":15141,"message":"Creating index .kibana_2."}
{"type":"log","@timestamp":"2018-12-12T17:17:44Z","tags":["warning","migrations"],"pid":15141,"message":"Another Kibana instance appears to be migrating the index. Waiting for that migration to complete. If no other Kibana instance is attempting migrations, you can get past this message by deleting index .kibana_2 and restarting Kibana."}

Shut down Kibana again, and delete the .kibana_2 index:

curl --user elasticuser:userpassword -XDELETE https://search.mydomain.net:9200/.kibana_2

Start the Kibana service again and give it a few more minutes to perform housekeeping. Kibana should be up and running now.
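
Once Kibana is healthy, remember to restore the replica count that was dropped earlier. This can be done with the same curl PUT to /_settings as above, or from the PHP client covered in the earlier post; a sketch, assuming the client was built with the proper credentials:

$client->indices()->putSettings([
   'index' => '.kibana',
   'body'  => [
      'index' => ['number_of_replicas' => 1]   // restore the previous value (1 here as an example)
   ]
]);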

Deleting Entries in Elasticsearch Based On Timestamp

It’s inevitable that after ingesting lots of server logs into Elasticsearch, there will be a requirement to delete some of them, whether because they were incorrect or were loaded more than once. With millions of documents, it’s inefficient to drop the entire index and start over from the beginning. Luckily, there’s a solution using the Elasticsearch delete by query API with a range query:

POST apachelogs-2018.11.02/_delete_by_query?wait_for_completion=false
{
   "query": {
      "range": {
          "@timestamp": { 
               "gte" : "02/11/2018",
               "lte" : "02/11/2018",
               "time_zone": "-07:00",
               "format": "dd/MM/yyyy||yyyy"
          }
      }
   }
}

The ?wait_for_completion=false directive is for use in Kibana Dev Tools, since the GUI will give a gateway timeout if the task takes more than 30 seconds. With this option, the task is sent into the background and the Kibana UI doesn’t wait for it to complete.
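
Outside of Kibana, the same operation can be run from the PHP client covered earlier. A sketch using the example index and query from above; wait_for_completion is passed as a request parameter, and the asynchronous form returns a task ID that can be polled:

$response = $client->deleteByQuery([
   'index' => 'apachelogs-2018.11.02',
   'wait_for_completion' => false,
   'body' => [
      'query' => [
         'range' => [
            '@timestamp' => [
               'gte' => '02/11/2018',
               'lte' => '02/11/2018',
               'time_zone' => '-07:00',
               'format' => 'dd/MM/yyyy||yyyy'
            ]
         ]
      ]
   ]
]);

// Poll the background task for status (the task ID comes back in the response).
$task = $client->tasks()->get(['task_id' => $response['task']]);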

Another important note: by default, the logs are stored in the UTC time zone; Elastic Support and Training staff have confirmed this. Deleting without specifying a time zone will look like a partial deletion. The same problem happens when dropping just one particular day’s index (e.g., apachelogs-2018.11.12), since its entries overlap with the next day’s index. Thus, in this case, since the requirement is to delete all of the Nov 2 timestamped data, the specific Pacific Daylight Time zone “-07:00” is necessary: Nov 2 00:00-23:59 PDT corresponds to Nov 2 07:00 through Nov 3 06:59 UTC, so the day’s entries span two UTC-based daily indices.

The data will then look like this in Kibana’s Discover tool:

[Screenshot: Deleting An Entire Day Out Of Elasticsearch]