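I have recently been setting up monitoring for a Kafka cluster, and read a lot of material online beforehand. I chose jmxtrans + InfluxDB + Grafana because the dashboards look good and can be customized freely. The drawback is that this stack cannot operate on the Kafka cluster itself, so it may need to be used alongside Kafka Manager.
Environment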
CentOS Linux release 7.6.1810 (Core)
jdk1.8.0_201
zookeeper-3.4.14
kafka_2.11-2.2.0
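Enable the Kafka JMX port
JMX (Java Management Extensions) is a framework for adding management capabilities to applications, devices, and systems. It works across heterogeneous operating systems, system architectures, and network transport protocols, which makes it possible to build seamlessly integrated system, network, and service management applications. Kafka, being a Java application, already defines a rich set of performance metrics (see the Kafka monitoring metrics documentation), and JMX makes them easy to monitor.
Edit the kafka-run-class.sh script under ${KAFKA_HOME}/bin/ and add JMX_PORT=9999 at the top of the script.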
JMX_PORT=9999
Restart Kafka
./bin/kafka-server-stop.sh
./bin/kafka-server-start.sh -daemon ./config/server.properties
After the restart, check the Kafka process and the JMX port
ps -ef | grep kafka
root 8273 1 99 02:32 pts/0 00:00:09 /opt/jdk1.8.0_201/bin/java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 …… kafka.Kafka ./config/server.properties
netstat -anop | grep 9999
tcp6 0 0 :::9999 :::* LISTEN 8273/java off (0.00/0/0)
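The same port check can be scripted, which is handy when several brokers need to be verified. Below is a minimal sketch that only tests whether something accepts TCP connections on the port; it does not prove that the listener actually speaks JMX. The function name `port_is_open` and the timeout value are my own choices for illustration.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when nothing is listening or the host cannot be reached.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the broker above, `port_is_open("127.0.0.1", 9999)` should return True once Kafka has restarted with JMX_PORT=9999 (assuming the check runs on the broker host).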
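Install InfluxDB
InfluxDB is a time series database built to handle high write and query loads. It is intended as backing storage for any use case involving large amounts of timestamped data, including DevOps monitoring, application metrics, IoT sensor data, and real-time analytics.
Download the InfluxDB RPM package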
wget https://dl.influxdata.com/influxdb/releases/influxdb-1.7.5.x86_64.rpm
--2019-04-10 02:52:30-- https://dl.influxdata.com/influxdb/releases/influxdb-1.7.5.x86_64.rpm
Resolving dl.influxdata.com (dl.influxdata.com)... 54.192.151.21, 54.192.151.81, 54.192.151.87, ...
Connecting to dl.influxdata.com (dl.influxdata.com)|54.192.151.21|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 46536692 (44M) [application/octet-stream]
Saving to: ‘influxdb-1.7.5.x86_64.rpm’
100%[================================================================================================================================================================================>] 46,536,692 440KB/s in 60s
2019-04-10 02:53:37 (756 KB/s) - ‘influxdb-1.7.5.x86_64.rpm’ saved [46536692/46536692]
Install the RPM package
rpm -ivh influxdb-1.7.5.x86_64.rpm
Preparing... ################################# [100%]
Updating / installing…
1:influxdb-1.7.5-1 ################################# [100%]
Created symlink from /etc/systemd/system/influxd.service to /usr/lib/systemd/system/influxdb.service.
Created symlink from /etc/systemd/system/multi-user.target.wants/influxdb.service to /usr/lib/systemd/system/influxdb.service.
Start InfluxDB
service influxdb start
Redirecting to /bin/systemctl start influxdb.service
Check the InfluxDB status
ps -ef | grep influxdb
influxdb 8475 1 2 03:01 ? 00:00:00 /usr/bin/influxd -config /etc/influxdb/influxdb.conf
root 8486 7007 0 03:02 pts/0 00:00:00 grep --color=auto influxdb
service influxdb status
Redirecting to /bin/systemctl status influxdb.service
● influxdb.service - InfluxDB is an open-source, distributed, time series database
Loaded: loaded (/usr/lib/systemd/system/influxdb.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2019-04-10 03:01:48 EDT; 22s ago
Docs: https://docs.influxdata.com/influxdb/
Main PID: 8475 (influxd)
CGroup: /system.slice/influxdb.service
└─8475 /usr/bin/influxd -config /etc/influxdb/influxdb.conf
Apr 10 03:01:48 node1 influxd[8475]: ts=2019-04-10T07:01:48.375804Z lvl=info msg="Starting precreation service" log_id=0EiWgWRl000 service=shard-precreation check_interval=10m advance_period=30m
Apr 10 03:01:48 node1 influxd[8475]: ts=2019-04-10T07:01:48.375810Z lvl=info msg="Starting snapshot service" log_id=0EiWgWRl000 service=snapshot
Apr 10 03:01:48 node1 influxd[8475]: ts=2019-04-10T07:01:48.375816Z lvl=info msg="Starting continuous query service" log_id=0EiWgWRl000 service=continuous_querier
Apr 10 03:01:48 node1 influxd[8475]: ts=2019-04-10T07:01:48.375826Z lvl=info msg="Starting HTTP service" log_id=0EiWgWRl000 service=httpd authentication=false
Apr 10 03:01:48 node1 influxd[8475]: ts=2019-04-10T07:01:48.375830Z lvl=info msg="opened HTTP access log" log_id=0EiWgWRl000 service=httpd path=stderr
Apr 10 03:01:48 node1 influxd[8475]: ts=2019-04-10T07:01:48.375936Z lvl=info msg="Listening on HTTP" log_id=0EiWgWRl000 service=httpd addr=[::]:8086 https=false
Apr 10 03:01:48 node1 influxd[8475]: ts=2019-04-10T07:01:48.375949Z lvl=info msg="Starting retention policy enforcement service" log_id=0EiWgWRl000 service=retention check_interval=30m
Apr 10 03:01:48 node1 influxd[8475]: ts=2019-04-10T07:01:48.376138Z lvl=info msg="Listening for signals" log_id=0EiWgWRl000
Apr 10 03:01:48 node1 influxd[8475]: ts=2019-04-10T07:01:48.376389Z lvl=info msg="Storing statistics" log_id=0EiWgWRl000 service=monitor db_instance=_internal db_rp=monitor interval=10s
Apr 10 03:01:48 node1 influxd[8475]: ts=2019-04-10T07:01:48.376534Z lvl=info msg="Sending usage statistics to usage.influxdata.com" log_id=0EiWgWRl000
Use the InfluxDB client
influx
Connected to http://localhost:8086 version 1.7.5
InfluxDB shell version: 1.7.5
Enter an InfluxQL query
>
Create a user and a database
> CREATE USER "admin" WITH PASSWORD 'admin' WITH ALL PRIVILEGES
> create database "jmxDB"
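The statements typed at the influx prompt can also be sent to InfluxDB 1.x over HTTP through its /query endpoint, with the statement passed in the `q` parameter. As a rough sketch, the helper below (the name `influx_query_url` is made up for this post) only builds the URL and sends nothing; note that InfluxDB 1.x expects statements that change state, such as CREATE DATABASE, to be submitted with POST.

```python
from urllib.parse import urlencode


def influx_query_url(base_url: str, statement: str) -> str:
    """Build an InfluxDB 1.x /query URL carrying one InfluxQL statement."""
    # urlencode percent-encodes the quotes and turns spaces into '+'.
    return base_url.rstrip("/") + "/query?" + urlencode({"q": statement})


url = influx_query_url("http://localhost:8086", 'CREATE DATABASE "jmxDB"')
print(url)
```

The resulting URL can then be passed to any HTTP client, for example curl with `-XPOST`.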
The user and database created above are enough for now. A few other basic operations that will come in handy later:
# Create a database
create database "db_name"
# List all databases
show databases
# Drop a database
drop database "db_name"
# Switch to a database
use db_name
# List all measurements (tables) in the current database
show measurements
# Create a measurement: it is created implicitly, just name it when inserting data
insert test,host=127.0.0.1,monitor_name=test count=1
# Drop a measurement
drop measurement "measurement_name"
# Quit
quit
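The text after `insert` is InfluxDB line protocol: the measurement name, comma-separated tags, a space, then the fields. A small sketch of how such a line could be assembled is shown below. It is deliberately simplified: the helper names are mine, every field value is written as a float, and the optional timestamp is omitted.

```python
def _escape(text: str, chars: str) -> str:
    """Backslash-escape each character of `chars` that occurs in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def to_line_protocol(measurement: str, tags: dict, fields: dict) -> str:
    """Format one point (no timestamp) in InfluxDB 1.x line protocol.

    Simplification: all field values are converted to floats.
    """
    # Commas and spaces must be escaped in the measurement name.
    head = _escape(measurement, ", ")
    # Tags are sorted by key; commas, equals signs and spaces are escaped.
    for key in sorted(tags):
        head += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + repr(float(value))
        for key, value in fields.items()
    )
    return head + " " + body


line = to_line_protocol(
    "test", {"host": "127.0.0.1", "monitor_name": "test"}, {"count": 1}
)
print(line)
```

Apart from `count` being printed as `1.0`, this produces the same point as the insert statement above.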
Install jmxtrans
jmxtrans automatically collects JMX data from a JVM and, as configured in JSON files, writes it to another application (InfluxDB in this example).
Download the jmxtrans RPM package
wget http://central.maven.org/maven2/org/jmxtrans/jmxtrans/270/jmxtrans-270.rpm
--2019-04-10 03:18:14-- http://central.maven.org/maven2/org/jmxtrans/jmxtrans/270/jmxtrans-270.rpm
Resolving central.maven.org (central.maven.org)... 151.101.40.209
Connecting to central.maven.org (central.maven.org)|151.101.40.209|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 18750744 (18M) [application/x-rpm]
Saving to: ‘jmxtrans-270.rpm’
100%[================================================================================================================================================================================>] 18,750,744 342KB/s in 43s
2019-04-10 03:18:59 (422 KB/s) - ‘jmxtrans-270.rpm’ saved [18750744/18750744]
Install the RPM package
rpm -ivh jmxtrans-270.rpm
Preparing... ################################# [100%]
Updating / installing…
1:jmxtrans-270-1 ################################# [100%]
jmxtrans paths
jmxtrans installation directory: /usr/share/jmxtrans
Default directory for JSON files: /var/lib/jmxtrans/
Log file: /var/log/jmxtrans/jmxtrans.log
Configure the JSON file. The jmxtrans GitHub repository provides a sample configuration:
{
  "servers" : [ {
    "port" : "1099",
    "host" : "w2",
    "queries" : [ {
      "obj" : "java.lang:type=Memory",
      "attr" : [ "HeapMemoryUsage", "NonHeapMemoryUsage" ],
      "resultAlias" : "jvmMemory",
      "outputWriters" : [ {
        "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url" : "http://127.0.0.1:8086/",
        "username" : "admin",
        "password" : "admin",
        "database" : "jmxDB",
        "tags" : {"application" : "kafka"}
      } ]
    } ]
  } ]
}
host: the server to monitor
port: the JMX port
obj: the JMX ObjectName, that is, the metric to monitor
attr: the attributes of that ObjectName, which can be thought of as the values of the metric
resultAlias: the metric name, which becomes the measurement name in InfluxDB
tags: maps to InfluxDB tags, which distinguish different metrics stored in the same measurement. They are used when drawing graphs in Grafana, so it is a good idea to tag every metric
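With more than a handful of metrics, writing these files by hand gets tedious, so generating them is an option. Here is a minimal sketch that builds the same structure as the sample above with Python's json module. The function name is hypothetical, the host and port match the broker configured earlier, and the Kafka MBean `kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec` with its `Count` attribute is only my example of a metric, not something taken from the sample.

```python
import json


def build_jmxtrans_config(host, port, obj, attrs, alias, tags):
    """Return a jmxtrans config dict with one server, one query, one writer."""
    writer = {
        "@class": "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url": "http://127.0.0.1:8086/",
        "username": "admin",
        "password": "admin",
        "database": "jmxDB",
        "tags": tags,
    }
    query = {
        "obj": obj,
        "attr": list(attrs),
        "resultAlias": alias,
        "outputWriters": [writer],
    }
    # The sample config writes the port as a string, so do the same here.
    return {"servers": [{"port": str(port), "host": host, "queries": [query]}]}


config = build_jmxtrans_config(
    host="127.0.0.1",
    port=9999,
    obj="kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec",  # assumed metric
    attrs=["Count"],
    alias="MessagesInPerSec",
    tags={"application": "MessagesInPerSec"},
)
print(json.dumps(config, indent=2))
```

The printed JSON can be saved under /var/lib/jmxtrans/ so that jmxtrans picks it up on start.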
Start jmxtrans
service jmxtrans start
Starting JmxTrans…
Check the log; if it shows no errors, the startup succeeded
tail /var/log/jmxtrans/jmxtrans.log
INFO | jvm 1 | 2019/04/10 04:44:31 | Using thread pool 'org.quartz.simpl.SimpleThreadPool' - with 10 threads.
INFO | jvm 1 | 2019/04/10 04:44:31 | Using job-store 'org.quartz.simpl.RAMJobStore' - which does not support persistence. and is not clustered.
INFO | jvm 1 | 2019/04/10 04:44:31 |
INFO | jvm 1 | 2019/04/10 04:44:31 | 2019-04-10 04:44:31 [WrapperSimpleAppMain] INFO org.quartz.impl.StdSchedulerFactory - Quartz scheduler 'ServerScheduler' initialized from an externally opened InputStream.
INFO | jvm 1 | 2019/04/10 04:44:31 | 2019-04-10 04:44:31 [WrapperSimpleAppMain] INFO org.quartz.impl.StdSchedulerFactory - Quartz scheduler version: 1.8.6
INFO | jvm 1 | 2019/04/10 04:44:31 | 2019-04-10 04:44:31 [WrapperSimpleAppMain] INFO org.quartz.core.QuartzScheduler - JobFactory set to: com.googlecode.jmxtrans.guice.GuiceJobFactory@23822296
2019-04-10 04:44:31 [WrapperSimpleAppMain] level com.googlecode.jmxtrans.JmxTransformer [JmxTransformer.java:177] - Starting Jmxtrans on : /var/lib/jmxtrans
2019-04-10 04:44:31 [WrapperSimpleAppMain] level org.quartz.core.QuartzScheduler [QuartzScheduler.java:519] - Scheduler ServerScheduler_$_node11554885871753 started.
INFO | jvm 1 | 2019/04/10 04:44:31 | 2019-04-10 04:44:31 [WrapperSimpleAppMain] INFO c.googlecode.jmxtrans.JmxTransformer - Starting Jmxtrans on : /var/lib/jmxtrans
INFO | jvm 1 | 2019/04/10 04:44:31 | 2019-04-10 04:44:31 [WrapperSimpleAppMain] INFO org.quartz.core.QuartzScheduler - Scheduler ServerScheduler_$_node11554885871753 started.
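Reading the log by eye works for one host; for repeated checks, a small filter can flag suspicious lines. The sketch below assumes that problems show up as lines containing `ERROR` or `Exception`, which is my guess at how jmxtrans reports failures and not something its documentation guarantees.

```python
def find_error_lines(lines):
    """Return the log lines that contain an assumed error marker."""
    markers = ("ERROR", "Exception")  # assumed markers, adjust as needed
    return [line for line in lines if any(m in line for m in markers)]


sample = [
    "INFO | jvm 1 | 2019/04/10 04:44:31 | Quartz scheduler version: 1.8.6",
    "INFO | jvm 1 | 2019/04/10 04:44:32 | java.net.ConnectException: Connection refused",
]
for line in find_error_lines(sample):
    print(line)
```

In practice the input would come from reading /var/log/jmxtrans/jmxtrans.log line by line; the second sample line is invented to show a match.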
Two general-purpose JSON files are attached:
base_127.0.0.1.json
topicA_1.json
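Install Grafana
Grafana is a cross-platform, open-source tool for metrics analysis and visualization. It queries the collected data, displays it visually, and can send timely alerts.
Download the Grafana RPM package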
wget https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-6.0.2-1.x86_64.rpm
--2019-04-10 04:53:15-- https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-6.0.2-1.x86_64.rpm
Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 52.218.144.92
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.144.92|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 56002012 (53M) [application/x-RedHat-package-manager]
Saving to: ‘grafana-6.0.2-1.x86_64.rpm’
100%[================================================================================================================================================================================>] 56,002,012 177KB/s in 2m 52s
2019-04-10 04:56:08 (318 KB/s) - ‘grafana-6.0.2-1.x86_64.rpm’ saved [56002012/56002012]
Install the RPM package
rpm -ivh grafana-6.0.2-1.x86_64.rpm
warning: grafana-6.0.2-1.x86_64.rpm: Header V4 RSA/SHA1 Signature, key ID 24098cb6: NOKEY
error: Failed dependencies:
fontconfig is needed by grafana-6.0.2-1.x86_64
urw-fonts is needed by grafana-6.0.2-1.x86_64
Dependencies are missing, so download and install them first
yum install --downloadonly --downloaddir=./ fontconfig
yum localinstall fontconfig-2.13.0-4.3.el7.x86_64.rpm
yum install --downloadonly --downloaddir=./ urw-fonts
yum localinstall urw-fonts-2.4-16.el7.noarch.rpm
rpm -ivh grafana-6.0.2-1.x86_64.rpm
warning: grafana-6.0.2-1.x86_64.rpm: Header V4 RSA/SHA1 Signature, key ID 24098cb6: NOKEY
Preparing... ################################# [100%]
Updating / installing…
1:grafana-6.0.2-1 ################################# [100%]
### NOT starting on installation, please execute the following statements to configure grafana to start automatically using systemd
sudo /bin/systemctl daemon-reload
sudo /bin/systemctl enable grafana-server.service
### You can start grafana-server by executing
sudo /bin/systemctl start grafana-server.service
POSTTRANS: Running script
Start Grafana
service grafana-server start
Starting grafana-server (via systemctl): [ OK ]
Open a browser
http://127.0.0.1:3000
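Log in with the default username and password, admin/admin
Set a new password
Click Add data source
Select InfluxDB
Enter the connection details, then click Save & Test
Once the test passes, click Back to return
The + button on the left creates or imports a dashboard
Queries are written much like SQL statements; select the metric you want to display
To get an average per-second value, subtract the value from one minute earlier from the current value and divide by 60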
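The arithmetic behind that per-second average is simple enough to check outside Grafana. A minimal sketch, assuming the metric is a monotonically increasing counter sampled twice, 60 seconds apart (the function name is mine):

```python
def per_second_rate(current: float, previous: float, interval_s: float = 60.0) -> float:
    """Average per-second increase of a counter between two samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (current - previous) / interval_s


# A counter that went from 1200 to 1800 within one minute.
rate = per_second_rate(1800, 1200)
print(rate)  # 10.0
```

If the counter resets, for instance after a broker restart, the difference turns negative, so such samples are usually discarded.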
How the dashboards end up looking depends on your own taste, so I will not post screenshots here. With that, JMX metric monitoring for Kafka is complete.