
Re-Index With Elasticsearch


When working with indices, it’s inevitable that a mapped field will eventually need to change. For example, in a firewall log index, the default dynamic mapping stored a field like “RepeatCount” as text instead of an integer. To fix this, first write an ingest pipeline (in Kibana Dev Tools) that converts the field from a string into a long:

PUT _ingest/pipeline/string-to-long
{
  "description": "convert RepeatCount field from string into long",
  "processors": [
    {
      "convert": {
        "field": "RepeatCount",
        "type": "long",
        "ignore_missing": true
      }
    }
  ]
}
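
Before reindexing anything, the pipeline can be tested with the simulate API; the sample value below is made up purely for illustration:

POST _ingest/pipeline/string-to-long/_simulate
{
  "docs": [
    {
      "_source": {
        "RepeatCount": "3"
      }
    }
  ]
}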

Next, run a _reindex POST to copy the old index into a new one, passing the documents through the pipeline for conversion:

POST _reindex 
{ 
   "source": { 
      "index": "fwlogs-2019.02.01" 
   }, 
   "dest": { 
      "index": "fwlogs-2019.02.01-v2", 
      "pipeline": "string-to-long"
   } 
}
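
Once the reindex finishes, the field mapping API is a quick way to confirm the field is now numeric in the new index (assuming no index template still forces it back to text):

GET fwlogs-2019.02.01-v2/_mapping/field/RepeatCount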

If there are multiple indices to convert, such as “fwlogs-2019.02.01”, “fwlogs-2019.02.02”, and so on, a shell script can work through each index systematically:

#!/bin/sh
# Reindex every index named in rlist.txt into a "-v2" copy, via the conversion pipeline
LIST=$(cat rlist.txt)
for index in $LIST; do
  curl -H "Content-Type: application/json" --user elastic:password \
    -XPOST "https://mysearch.domain.net:9200/_reindex?pretty" -d'{
    "source": {
      "index": "'$index'"
    },
    "dest": {
      "index": "'$index'-v2",
      "pipeline": "string-to-long"
    }
  }'
done
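
The rlist.txt file doesn’t have to be typed by hand; one way to generate it is the _cat indices API, assuming the same fwlogs naming pattern:

#!/bin/sh
# Build rlist.txt from the cluster: one source index name per line, excluding any existing -v2 copies
curl -s --user elastic:password \
  "https://mysearch.domain.net:9200/_cat/indices/fwlogs-2019.02.*?h=index" \
  | grep -v -- '-v2' | sort > rlist.txt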

Finally, clean up the old indices by deleting them. It’s tempting to use Kibana to run DELETE fwlogs-2019.02*, but beware: the new indices also match that pattern, since they only differ by the “-v2” suffix, so they would be deleted as well if the wildcard is used. Instead, use a shell script to delete only the names explicitly listed in the txt file.

#!/bin/sh
# Delete only the old index names listed in rlist.txt
LIST=$(cat rlist.txt)
for index in $LIST; do
  curl --user elastic:password -XDELETE "https://mysearch.domain.net:9200/$index"
done
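
Before running the delete script, it may be worth a sanity check that each “-v2” copy holds the same number of documents as its source; a rough sketch using the _count API:

#!/bin/sh
# Sanity check: compare document counts between each old index and its -v2 copy
LIST=$(cat rlist.txt)
for index in $LIST; do
  old=$(curl -s --user elastic:password "https://mysearch.domain.net:9200/$index/_count" | grep -o '"count":[0-9]*')
  new=$(curl -s --user elastic:password "https://mysearch.domain.net:9200/$index-v2/_count" | grep -o '"count":[0-9]*')
  echo "$index  old ${old}  new ${new}"
done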

Deleting Entries in Elasticsearch Based On Timestamp

After ingesting lots of server logs into Elasticsearch, it’s inevitable that some of them will need to be deleted, either because the data was incorrect or because it was loaded more than once.  With millions of documents, it’s simply inefficient to drop the whole index and start over from the beginning.  Luckily, the delete by query API combined with a range query solves this:

POST apachelogs-2018.11.02/_delete_by_query?wait_for_completion=false
{
   "query": {
      "range": {
          "@timestamp": { 
               "gte" : "02/11/2018",
               "lte" : "02/11/2018",
               "time_zone": "-07:00",
               "format": "dd/MM/yyyy||yyyy"
          }
      }
   }
}

The ?wait_for_completion=false parameter is there for Kibana Dev Tools, since the GUI returns a gateway timeout if a task takes more than 30 seconds.  With this option, the request returns immediately and the deletion runs in the background instead of the Kibana UI waiting for it to complete.
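
The response in that case includes a task ID, and the progress of the background deletion can be followed from Dev Tools with the tasks API (the task ID below is just a placeholder):

GET _tasks/oTUltX4IQMOUUVeiohTt8A:12345

GET _tasks?detailed=true&actions=*/delete/byquery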

Another important note: the timestamps are stored in UTC by default (Elastic Support and Training staff have confirmed this).  Deleting without specifying a time zone will therefore look like a partial deletion.  The same problem appears when dropping just one day’s index (i.e. apachelogs-2018.11.12), since its entries overlap with the next day’s index.  In this case, because the requirement is to delete all data timestamped on Nov 2 local time, an explicit time zone of “-07:00” (Pacific Daylight Time) is necessary.
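
To verify that the whole day really is gone, the same range (with the same time zone) can be run through the _count API; a count of zero confirms it:

GET apachelogs-2018.11.02/_count
{
   "query": {
      "range": {
          "@timestamp": {
               "gte" : "02/11/2018",
               "lte" : "02/11/2018",
               "time_zone": "-07:00",
               "format": "dd/MM/yyyy||yyyy"
          }
      }
   }
}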

The data will then look like this in Kibana’s Discover view:

[Screenshot: deleting an entire day out of Elasticsearch, as seen in Discover]

Converting UTF-16 to ASCII Format Text Files

For those dealing with Windows applications, sometimes the app writes its logs with UTF-16 character encoding.  This is the case with the (commercial) Titan-FTP software, which writes its logs in that format.  Searching the file with grep under Cygwin or Linux returns no results!  Examining the file with vi, the escape characters don’t show up, and the dos2unix command won’t strip them either.

Fortunately, there is an easy way out. The iconv utility converts the file and makes it searchable (read: useful) again:

iconv -c -f utf-16 -t ascii file.log > newfile.log
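
As a quick sanity check, the file command usually reports the encoding before and after the conversion, and grep becomes useful again on the converted copy (the search term is just an example):

file file.log       # typically reports something like "Little-endian UTF-16 Unicode text"
iconv -c -f utf-16 -t ascii file.log > newfile.log
file newfile.log    # should now report plain ASCII text
grep -i "error" newfile.log    # example search; matches now show up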