Friday, February 27, 2015

Java Flight Recorder Continuous Recordings

When we are trying to find performance issues, it is sometimes necessary to do continuous recordings with Java Flight Recorder.

Usually we debug issues in an environment similar to a production setup. That means we don't have a desktop environment and we cannot use Java Mission Control for flight recording.

That also means we need to record and collect dumps using the command line on the servers. We can of course use remote connection methods, but it's easier to take recordings directly on the server.

With continuous recordings, we need to figure out how to get dumps. There are a few options.
  1. Get a dump when the Java application exits. For this, we need to use the dumponexit and dumponexitpath options.
  2. Get a dump manually via the JFR.dump diagnostic command using "jcmd".
Note: The "jcmd" command is in $JAVA_HOME/bin. If you use the Oracle Java installation script for Ubuntu, you can use "jcmd" directly without adding $JAVA_HOME/bin to $PATH.

Enabling Java Flight Recorder and starting a continuous recording


To demonstrate, I will use WSO2 AS 5.2.1. First of all we need to enable Java Flight Recorder in WSO2 AS. Then we will configure it to start a default recording.

$ cd wso2as-5.2.1/bin
$ vi wso2server.sh

In the vi editor, press SHIFT+G to go to the end of the file. Add the following lines between "-Dfile.encoding=UTF8 \" and "org.wso2.carbon.bootstrap.Bootstrap $*"

    -XX:+UnlockCommercialFeatures \
    -XX:+FlightRecorder \
    -XX:FlightRecorderOptions=defaultrecording=true,disk=true,repository=./tmp,dumponexit=true,dumponexitpath=./ \

As I mentioned in my previous blog post on Java Mission Control, we use the default recording option to start a "Continuous Recording". Please look at the java command reference to see the meaning of each Flight Recorder option.

Please note that I'm using disk=true to write a continuous recording to the disk. I'm also using ./tmp directory as the repository, which is the temporary disk storage for JFR recordings.

It's also important to note that the default value of "maxage" is set to 15 minutes.

To be honest, I couldn't exactly figure out how this maxage works at first. For example, if I set it to 1m, I see events for around 20 minutes. If I use 10m, I see events for around 40 minutes to 1 hour. Then I found an answer in the Java Mission Control forum. See the thread Help with maxage / limiting default recording disk usage.

What really happens is that the maxage threshold is checked only when a new recording chunk is created. We haven't specified "maxchunksize" above, so the default value of 12MB is used. It can take a considerable time to fill a chunk and trigger the removal of old chunks.

If you need infinite recordings, you can set maxage=0 to override the default value.
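
For example, to make old data expire closer to the configured maxage you could lower the chunk size, or you could disable the age limit entirely. Either variant replaces the options line above (use one, not both; the 1M value is only an illustration):

    -XX:FlightRecorderOptions=defaultrecording=true,disk=true,repository=./tmp,maxchunksize=1M,dumponexit=true,dumponexitpath=./ \
    -XX:FlightRecorderOptions=defaultrecording=true,disk=true,repository=./tmp,maxage=0,dumponexit=true,dumponexitpath=./ \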

Getting Periodic Java Flight Recorder Dumps


Let's use "jcmd" to get a Java Flight Recorder Dump. For this, I wrote a simple script (jfrdump)


#!/bin/bash
# jfrdump: take a dump of the default JFR recording (id 0) from a running Java process.
# Usage: jfrdump <pid>
now=$(date +%Y_%m_%d_%H_%M_%S)
if ps -p "$1" > /dev/null
then
   echo "$now: The process $1 is running. Getting a JFR dump"
   # Dump the default recording to a timestamped file in the working directory
   jcmd "$1" JFR.dump recording=0 filename="recording-$now.jfr"
else
   echo "$now: The process $1 is not running. Cannot take a JFR dump"
   exit 1
fi

You can see that I have used the "JFR.dump" diagnostic command, and the script expects the Java process ID as an argument.

I have used recording id 0 because the default recording started from wso2server.sh gets recording id 0.

You can check JFR recordings via the JFR.check diagnostic command.

isuru@isurup-ThinkPad-T530:~/test/wso2as-5.2.1$ jcmd `cat wso2carbon.pid` JFR.check
21674:
Recording: recording=0 name="HotSpot default" maxage=15m (running)
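
If you're not sure which parameters JFR.dump supports on your JVM version, you can ask jcmd for its help text:

$ jcmd `cat wso2carbon.pid` help JFR.dump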


I have also included the date and time of the dump in the file name, which lets us keep multiple dump files. Note that the recordings will be saved in the CARBON_HOME directory, which is the working directory of the Java process.

Let's test the jfrdump script!

isuru@isurup-ThinkPad-T530:~/test/wso2as-5.2.1$ jfrdump `cat wso2carbon.pid`
2015_02_27_15_02_27: The process 21674 is running. Getting a JFR dump
21674:
Dumped recording 0, 2.3 MB written to:

/home/isuru/test/wso2as-5.2.1/recording-2015_02_27_15_02_27.jfr

Since we have a working script to get a dump, we can use it as a task for Cron.

Edit the crontab.

$ crontab -e

Add the following line.

*/15 * * * * (/home/isuru/programs/sh/jfrdump `cat /home/isuru/test/wso2as-5.2.1/wso2carbon.pid`) >> /tmp/jfrdump.log 2>&1

Now you should get a JFR dump every 15 minutes. I used 15 minutes because the maxage is 15 minutes, but you can adjust these values depending on your requirements.
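
You can verify that the cron job is firing by looking at the log file it writes to:

$ tail /tmp/jfrdump.log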

See also: Linux Crontab: 15 Awesome Cron Job Examples

Troubleshooting Tips

  • After you edit wso2server.sh, always run the server once in the foreground (./wso2server.sh) to check for syntax errors in the script. If the server runs successfully, you can then start it in the background.
  • If you want to get a dump at shutdown, do not kill the server forcefully. Always allow the server to shut down gracefully: use "./wso2server.sh stop"
  • You may not be able to connect to the server if you run jcmd as a different user. Unless you own the process, jcmd fails with an error like the following (a workaround is shown after the stack trace):
isuru@isurup-ThinkPad-T530:~/test/wso2as-5.2.1$ sudo jcmd `cat wso2carbon.pid` help
[sudo] password for isuru: 
21674:
java.io.IOException: well-known file is not secure
 at sun.tools.attach.LinuxVirtualMachine.checkPermissions(Native Method)
 at sun.tools.attach.LinuxVirtualMachine.<init>(LinuxVirtualMachine.java:117)
 at sun.tools.attach.LinuxAttachProvider.attachVirtualMachine(LinuxAttachProvider.java:63)
 at com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:214)
 at sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:139)
 at sun.tools.jcmd.JCmd.main(JCmd.java:128)
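
If the server was started by another user, one workaround is to run jcmd as that user, for example (assuming here that the server runs as a user named wso2):

$ sudo -u wso2 jcmd `cat wso2carbon.pid` JFR.check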





Thursday, February 26, 2015

Monitor WSO2 Carbon logs with Logstash

The ELK stack is a popular stack for searching and analyzing data. Many people use it for analyzing logs. WSO2 also has a full-fledged Big Data Analytics Platform, which can analyze logs and do many more things.

In this blog post, I'm explaining how to monitor logs with Elasticsearch, Logstash and Kibana. I will mainly explain the logstash configuration; I will not show how to set up Elasticsearch and Kibana, since those are very easy to set up and need very little configuration. You can figure them out very easily! :)

If you want to test an Elasticsearch server, you can just extract the Elasticsearch distribution and start a server. If you are using Kibana 3, you need a web server to host the Kibana application. With Kibana 4, you can use the standalone server provided in the distribution.
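
For example, a quick local Elasticsearch test can be as simple as extracting and starting it (the version here just matches the logstash version used below; adjust to whatever you have):

$ tar -xvf elasticsearch-1.4.2.tar.gz
$ ./elasticsearch-1.4.2/bin/elasticsearch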

Configuring Logstash


Logstash is a great tool for managing events and logs. See Getting Started with Logstash if you haven't used logstash.

First of all, we need to configure logstash to get the wso2carbon.log file as an input. Then we need to use a filter to parse the log messages and extract data to analyze.

The wso2carbon.log file is written using log4j and the configurations are in $CARBON_HOME/repository/conf/log4j.properties.
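
You can quickly locate the relevant pattern layout with grep:

$ grep CARBON_LOGFILE.layout.ConversionPattern /home/isuru/test/wso2as-5.2.1/repository/conf/log4j.properties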

For WSO2 log message parsing, we will use the grok filter to extract the details configured via the log4j pattern layout.

For example, following is the pattern layout configured for wso2carbon.log in WSO2 AS 5.2.1 (wso2as-5.2.1/repository/conf/log4j.properties).

log4j.appender.CARBON_LOGFILE.layout.ConversionPattern=TID: [%T] [%S] [%d] %P%5p {%c} - %x %m {%c}%n

In this pattern, the class name ("{%c}") is logged twice, so let's remove the extra class name. (I have created a JIRA to remove the extra class name from the log4j configuration. See CARBON-15065.)

Following should be the final configuration for wso2carbon.log.

log4j.appender.CARBON_LOGFILE.layout.ConversionPattern=TID: [%T] [%S] [%d] %P%5p {%c} - %x %m %n

Now when we start WSO2 AS 5.2.1, we can see that all log messages have the pattern specified in the log4j configuration.

For example:

isuru@isurup-ThinkPad-T530:~/test/wso2as-5.2.1/bin$ ./wso2server.sh start
isuru@isurup-ThinkPad-T530:~/test/wso2as-5.2.1/bin$ tail -4f ../repository/logs/wso2carbon.log 
TID: [0] [AS] [2015-02-25 18:02:00,345]  INFO {org.wso2.carbon.core.init.JMXServerManager} -  JMX Service URL  : service:jmx:rmi://localhost:11111/jndi/rmi://localhost:9999/jmxrmi
TID: [0] [AS] [2015-02-25 18:02:00,346]  INFO {org.wso2.carbon.core.internal.StartupFinalizerServiceComponent} -  Server           :  Application Server-5.2.1
TID: [0] [AS] [2015-02-25 18:02:00,347]  INFO {org.wso2.carbon.core.internal.StartupFinalizerServiceComponent} -  WSO2 Carbon started in 29 sec
TID: [0] [AS] [2015-02-25 18:02:00,701]  INFO {org.wso2.carbon.ui.internal.CarbonUIServiceComponent} -  Mgt Console URL  : https://172.17.42.1:9443/carbon/


Let's write a grok pattern to parse a single log line. Please look at the grok filter docs for the basic syntax of grok patterns. Once you are familiar with the grok syntax, it's very easy to write patterns.

There is also an online "Grok Debugger" application to test grok patterns.

Following is the grok pattern written for parsing the above log lines.

TID:%{SPACE}\[%{INT:tenant_id}\]%{SPACE}\[%{WORD:server_type}\]%{SPACE}\[%{TIMESTAMP_ISO8601:timestamp}\]%{SPACE}%{LOGLEVEL:level}%{SPACE}{%{JAVACLASS:java_class}}%{SPACE}-%{SPACE}%{GREEDYDATA:log_message}

You can test this grok pattern with the Grok Debugger. Use one of the lines in the above log output as the input.

(Grok Debugger screenshot)
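
If you'd rather test from the command line, logstash also accepts an inline config via its -e flag; in logstash 1.4.2, the input should default to stdin and the output to stdout when they are omitted. A quick sketch (run it from the logstash bin directory set up in the "Running Logstash" section below):

$ echo 'TID: [0] [AS] [2015-02-25 18:02:00,347]  INFO {org.wso2.carbon.core.internal.StartupFinalizerServiceComponent} -  WSO2 Carbon started in 29 sec' | ./logstash -e 'filter { grok { match => [ "message", "TID:%{SPACE}\[%{INT:tenant_id}\]%{SPACE}\[%{WORD:server_type}\]%{SPACE}\[%{TIMESTAMP_ISO8601:timestamp}\]%{SPACE}%{LOGLEVEL:level}%{SPACE}{%{JAVACLASS:java_class}}%{SPACE}-%{SPACE}%{GREEDYDATA:log_message}" ] } }'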

We are now parsing a single log line in logstash. Next, we need to look at how we can group exceptions or multiline log messages into one event.

For that we will use the multiline filter. As mentioned in the docs, we need a pattern to identify whether a particular log line is part of the previous line. As configured in log4j, all log messages start with "TID"; if a line doesn't, we can assume it belongs to the previous log event.
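
For example, a stack trace is logged over several lines (this is a made-up example):

TID: [0] [AS] [2015-02-26 00:35:10,123] ERROR {org.example.SomeService} -  Something went wrong
java.lang.NullPointerException
 at org.example.SomeService.doWork(SomeService.java:42)

Only the first line matches "^TID", so the multiline filter folds the other two lines into that event.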

Finally we need to configure logstash to send output to some destination. We can use "stdout" output for testing. In a production setup, you can use elasticsearch servers.

Logstash Config File


Following is the complete logstash config file. Save it as "logstash.conf".

input {
    file {
        # Add an extra field to every event from this input
        add_field => {
            instance_name => 'wso2-worker'
        }
        type => "wso2"
        path => [ '/home/isuru/test/wso2as-5.2.1/repository/logs/wso2carbon.log' ]
    }
}

filter {
    if [type] == "wso2" {
        # Parse a single log line into fields
        grok {
            match => [ "message", "TID:%{SPACE}\[%{INT:tenant_id}\]%{SPACE}\[%{WORD:server_type}\]%{SPACE}\[%{TIMESTAMP_ISO8601:timestamp}\]%{SPACE}%{LOGLEVEL:level}%{SPACE}{%{JAVACLASS:java_class}}%{SPACE}-%{SPACE}%{GREEDYDATA:log_message}" ]
        }
        # Fold lines that do not start with "TID" into the previous event
        multiline {
            pattern => "^TID"
            negate => true
            what => "previous"
        }
    }
}

output {
    # In production, send events to elasticsearch instead of stdout
    # elasticsearch { host => localhost }
    stdout { codec => rubydebug }
}

Please note that I have used "add_field" in the file input to show that you can add extra details to the log event.

Running Logstash


Now it's time to run logstash!

$ tar -xvf logstash-1.4.2.tar.gz 
$ cd logstash-1.4.2/bin

We will first test whether the configuration file is okay.

$ ./logstash --configtest -f ~/conf/logstash.conf 
Using milestone 2 input plugin 'file'. This plugin should be stable, but if you see strange behavior, please let us know! For more information on plugin milestones, see http://logstash.net/docs/1.4.2/plugin-milestones {:level=>:warn}
Configuration OK

Let's start logstash:

$ ./logstash -f ~/conf/logstash.conf

Now start the WSO2 AS 5.2.1 server, and you will see log events from logstash.

For example:

{
          "message" => "TID: [0] [AS] [2015-02-26 00:31:41,389]  INFO {org.wso2.carbon.core.internal.StartupFinalizerServiceComponent} -  WSO2 Carbon started in 17 sec",
         "@version" => "1",
       "@timestamp" => "2015-02-25T19:01:49.151Z",
             "type" => "wso2",
    "instance_name" => "wso2-worker",
             "host" => "isurup-ThinkPad-T530",
             "path" => "/home/isuru/test/wso2as-5.2.1/repository/logs/wso2carbon.log",
        "tenant_id" => "0",
      "server_type" => "AS",
        "timestamp" => "2015-02-26 00:31:41,389",
            "level" => "INFO",
       "java_class" => "org.wso2.carbon.core.internal.StartupFinalizerServiceComponent",
      "log_message" => "WSO2 Carbon started in 17 sec"
}
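
Once you switch the output from stdout to elasticsearch, you can verify that events are being indexed with a quick query (assuming Elasticsearch is running on localhost with default settings; logstash writes to daily logstash-* indices by default):

$ curl 'http://localhost:9200/logstash-*/_search?q=type:wso2&size=1&pretty'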

Troubleshooting Tips


  • You need to run the server, as the file input will read only new lines. However, if you want to test a log file from the beginning, you can use the following input configuration.


input {
    file {
        add_field => {
            instance_name => 'wso2-worker'
        }
        type => "wso2"
        # Read the file from the start instead of only new lines
        start_position => "beginning"
        # Do not remember the last read position, so every run starts over
        sincedb_path => "/dev/null"
        path => [ '/home/isuru/test/wso2as-5.2.1/repository/logs/wso2carbon.log' ]
    }
}

  • When you are using the multiline filter, the last line of the log file may not be processed. This is an issue in logstash 1.4.2; I didn't notice the same issue in logstash-1.5.0.beta1.


I hope these instructions are clear. I will try to write a blog post on using Elasticsearch & Kibana later. :)