AWS - ELK

Posted Jul 18, 2020

By Grace L

16 min read

The ELK stack
AWS Elasticsearch Service

The ELK stack

The ELK stack is an acronym used to describe a stack that comprises of 3 open-source projects: Elasticsearch, Logstash, and Kibana.

Often referred to as Elasticsearch
the ELK stack can:
- aggregate logs from all systems and applications,
- analyze these logs,
- create visualizations for application and infrastructure monitoring,
- faster troubleshooting,
- security analytics, and more.

ELK stack fulfills a need in the log analytics space.

a log management and analytics solution to monitor this infrastructure and process any server logs, application logs, and clickstreams.
a simple yet robust log analysis solution for developers and DevOps engineers to gain valuable insights on failure diagnosis, application performance, and infrastructure monitoring
can choose to deploy and manage the ELK stack yourself.
- scaling up and down to meet business requirements or achieving security and compliance is a challenge with the self-managed option.
Or choose Amazon Elasticsearch Service
- fully managed service
- deploy, secure, and operate Elasticsearch at scale.
- offers support for Elasticsearch APIs, built-in Kibana, and integration with Logstash
- can continue to use existing tools and code
- Amazon Elasticsearch Service also integrates with other AWS services such as Amazon Kinesis Data Firehose, Amazon CloudWatch Logs, and AWS IoT giving you the flexibility to select the data ingestion tool that meets use case requirements.

ELK

Logstash
- Data collection and transportation pipeline. We will use Logstash to read in our syslog files and store them in an Elasticsearch index.
Elasticsearch
- A distributed search and analytics engine designed for scalability. This is what indexes our data and allows us to create usability visualizations with Kibana.
Kibana
- a data visualization platform that is easy to use and nice on the eyes.

The data lifecycle for ELK goes a little something like this:

Syslog Server feeds Logstash
Logstash filters and parses logs and stores them within Elasticsearch
Elasticsearch indexes and makes sense out of all the data
Kibana makes millions of data points consumable by us mere mortals

Elasticsearch `log analytics and search use cases`

release in 2010
most popular search engine
- ideal choice for log analytics, full-text search, security intelligence, business analytics, and operational intelligence use cases.
How does Elasticsearch work?
- send data in JSON to Elasticsearch by the API or ingestion tools
  - such as Logstash and Amazon Kinesis Firehose
- Elasticsearch automatically stores the original document
- and adds a searchable reference to the document in the cluster’s index
- then search and retrieve the document using the Elasticsearch API
- can also use Kibana with Elasticsearch to visualize data and build interactive dashboards.
run Elasticsearch
- on-premises or on Amazon EC2: responsible for installing Elasticsearch and other necessary software, provisioning infrastructure, and managing the cluster.
- or on Amazon Elasticsearch Service: fully managed service, no worry about time-consuming cluster management tasks such as hardware provisioning, software patching, failure recovery, backups, and monitoring.

functions:

fast time-to-value
- free, open source software,
- restful based api, distributed search and analytics engine built on apache lucene.
- a simple http interface,
- uses schema-free json documents
- easy to get started and quickly build applications for a variety of use-cases.
high performance
- process large volumes of data in parallel,
- quickly finding the best matches for queries.
near real-time operations
- elasticsearch operations such as reading or writing data usually take less than a second to complete.
- use elasticsearch for near real-time use cases such as application monitoring and anomaly detection.
easy application development
- elasticsearch provides support for various languages
  - including java, python, php, javascript, node.js, ruby, and many more.
complimentary tooling and plugins
- elasticsearch comes integrated with kibana, visualization and reporting tool.
- also offers integration with beats and logstash, easily transform source data and load it into elasticsearch cluster.
- a number of open-source elasticsearch plugins such as language analyzers and suggesters to add rich functionality to your applications.

Logstash `collect data`

light-weight, open-source data ingestion tool
- 数据采集的，类似于flume。
most often used as a data pipeline for Elasticsearch, analytics and search engine.
powerful log processing capabilities,
- Logstash is a popular choice for loading data into Elasticsearch.

Logstash架构：

Batcher: 负责批量的从queue中取数据;
Queue 分类：
- In Memory:
  - 无法处理进程Crash、机器宕机等情况
  - 会导致数据丢失
- Persistent Queue In Disk：
  - 可处理进程Crash等情况，
  - 保证数据不丢失，保证数据至少消费一次，
  - 充当缓冲区，可以替代kafka等消息队列的作用。

functions:

easily load unstructured data
- 输入：采集各种样式、大小和来源的数据
- easily ingest unstructured data from a variety of data sources
  - including system logs, website logs, and application server logs.
- server-side data processing pipeline
  - collect data from a variety of sources,
  - transform it on the fly,
  - and send it to desired destination.
- 支持各种输入选择，可以在同一时间从众多常用来源捕捉事件。
  - 能够以连续的流式传输方式，轻松地从日志、指标、Web 应用、数据存储以及各种 AWS 服务采集数据。
offers pre-built filters
- 过滤器：实时解析和转换数据
- pre-built filters, readily transform common data types, index them in Elasticsearch, and start querying without having to build custom data transformation pipelines.
- 数据从源传输到存储库的过程中
  - Logstash 过滤器能够解析各个事件，
  - 识别已命名的字段以构建结构，并将它们转换成通用格式，以便更轻松、更快速地分析和实现商业价值
- Logstash 能够动态地转换和解析数据，不受格式或复杂度的影响：
  - 利用 Grok 从非结构化数据中派生出结构
  - 从 IP 地址破译出地理坐标
  - 将 PII(个人验证信息) 数据匿名化，完全排除敏感字段
  - 整体处理不受数据源、格式或架构的影响
flexible plugin architecture
- support over 200 plugins already available on Github
- the plugin for customize the data pipeline, to easily index the data,
- or easily create one yourself.
输出：选择你的存储，导出你的数据
- 尽管 Elasticsearch 是首选输出方向，能够为我们的搜索和分析带来无限可能，但它并非唯一选择。
- Logstash 提供众多输出选择，
  - 可以将数据发送到您要指定的地方，并且能够灵活地解锁众多下游用例。

Kibana `visualization and reporting tool`

free, open-source, data visualization and exploration tool for reviewing logs and events.
easy-to-use, interactive charts, pre-built aggregations and filters, and geospatial support and making it the preferred choice for visualizing data stored in Elasticsearch.
设计出来用于和Elasticsearch一起使用的。
- 你可以用kibana搜索、查看存放在Elasticsearch中的数据。
- Kibana与Elasticsearch的交互方式是各种不同的图表、表格、地图等，直观的展示数据，从而达到高级的数据分析与可视化的目的。
run Kibana
- on-premises, on Amazon EC2,
- or on Amazon Elasticsearch Service: Kibana is deployed automatically with your domain as a fully managed service, automatically taking care of all the heavy-lifting to manage the cluster. simply load your data into an Amazon Elasticsearch Service domain and analyze it using the provided Kibana end-point.

functions:

interactive charts
- kibana offers intuitive charts and reports to interactively navigate through large amounts of log data.
- dynamically drag time windows, zoom in and out of specific data subsets, and drill down on reports to extract actionable insights from data.
- powerful and easy-to-use features
  - such as histograms, line graphs, pie charts, heat maps, and built-in geospatial support.
mapping support
- kibana comes with powerful geospatial capabilities
- seamlessly layer in geographical information on top of data and visualize results on maps.
pre-built aggregations and filters
- can run a variety of analytics like histograms, top-n queries, and trends with just a few clicks.
easily accessible dashboards
- easily set up dashboards and reports and share them with others.
- all you need is a browser to view and explore the data.

AWS Elasticsearch Service

Fully managed, scalable, and secure Elasticsearch service, easy to deploy, secure, and run Elasticsearch cost effectively at scale.
build, monitor, and troubleshoot applications using the tools love, at the scale need.
provides support for
- open source Elasticsearch APIs,
- managed Kibana,
- integration with Logstash and other AWS services,
- and built-in alerting and SQL querying.
pay what use – no upfront costs or usage requirements.

Benefits

Easy to deploy and manage
- deploy Elasticsearch cluster in minutes.
- The service simplifies management tasks such as hardware provisioning, software installation and patching, failure recovery, backups, and monitoring.
- To monitor clusters, Elasticsearch includes built-in event monitoring and alerting so get notified on changes to data to proactively address any issues.
Highly scalable and available
- Elasticsearch can store up to 3 PB of data in a single cluster,
- enable to run large log analytics workloads via a single Kibana interface.
- can easily scale cluster up or down via a single API call or a few clicks in the AWS console.
- highly available using multi-AZ deployments, can replicate data between 3 Availability Zones in the same region.
Highly secure
- For data in Elasticsearch Service, can achieve network isolation with Amazon VPC, encrypt data at-rest and in-transit using keys create and control through AWS KMS, and manage authentication and access control with Amazon Cognito and AWS IAM policies.
- Elasticsearch is also HIPAA eligible, and compliant with PCI DSS, SOC, ISO, and FedRamp standards to help meet industry-specific or regulatory requirements.
Cost-effective
- pay only for the resources consume. on-demand pricing with no upfront costs or long-term commitments, or achieve significant cost savings via Reserved Instance pricing.
- As a fully managed service, Elasticsearch further lowers total cost of operations by eliminating the need for a dedicated team of Elasticsearch experts to monitor and manage clusters.

Use cases

Application monitoring

Store, analyze, and correlate application and infrastructure log data to find and fix issues faster and improve application performance.
receive automated alerts if application is underperforming, enabling to proactively address any issues.

for example An online travel company can use Elasticsearch to analyze logs from its applications to identify and resolve performance bottlenecks or availability issues, ensuring streamlined booking experience.

Security information and event management (SIEM)

Centralize and analyze logs from disparate applications and systems across network for real-time threat detection and incident management.

for example A telecom company can use Elasticsearch with Kibana to quickly index, search, and visualize logs from its routers, applications, and other devices to find and prevent security threats such as data breaches, unauthorized login attempts, DoS attacks, and fraud.

Search

Provide a fast, personalized search experience for applications, websites, and data lake catalogs, allowing users to quickly find relevant data.

For example a real estate business can use Elasticsearch to help its consumers find homes in their desired location, in a certain price range from among millions of real-estate properties. get access to all of Elasticsearch’s search APIs, supporting natural language search, auto-completion, faceted search, and location-aware search.

Infrastructure monitoring

Collect logs and metrics from servers, routers, switches, and virtualized machines
to get a comprehensive visibility into infrastructure,
reducing mean time to detect (MTTD) and resolve (MTTR) issues and lowering system downtime.

for example, A gaming company can use Elasticsearch to monitor and analyze server logs to identify any server performance issues that could lead to application downtime.

features

Easy to deploy and manage

Setup and configuration:
- Getting started with Elasticsearch is easy.
- setup and configure Elasticsearch cluster using the AWS Management Console or a single API call through the AWS CLI.
- can specify the number of instances, instance types, storage options, and modify/delete existing clusters at any time.
In-place upgrades:
- to easily upgrade Elasticsearch clusters to newer versions without any downtime, using in-place version upgrades.
- With in-place upgrades, you no longer need to go through the hassle of taking a manual snapshot, restoring it to a new cluster running the newer version of Elasticsearch, and updating all of endpoint references.
Event monitoring and alerting
- Elasticsearch provides built-in event monitoring and alerting, monitor the data stored in cluster and automatically send notifications based on pre-configured thresholds.
- Built using the Open Distro for Elasticsearch alerting plugin, allows to configure and manage alerts using Kibana interface and the REST API and receive notifications via custom webhooks, Slack, Amazon Simple Notification Service (SNS), and Amazon Chime.
- can view cluster health metrics including number of instances, cluster health, searchable documents, CPU, memory, and disk utilization for data and master nodes through Amazon CloudWatch, at no additional charge.
SQL querying:
- Elasticsearch supports querying of Elasticsearch cluster using the SQL syntax.
- Built using the Open Distro for Elasticsearch SQL plugin
  - provides more than 40 SQL functions, data types, and commands, including direct export to CSV and query translation from SQL to Elasticsearch JSON.
- can also connect to existing SQL-based business intelligence and ETL tools via a JDBC driver.
Integration with open source tools:
- Elasticsearch offers built-in Kibana and integration with Logstash, to ingest and visualize the data using the open source tools.
- can continue to use existing code with direct access to Elasticsearch APIs and plugins such as Kuromoji, Phonetic Analysis, Ingest Processor Attachment, Ingest User Agent Processor, and Mapper Murmur3.
Highly scalable and available
- Scalability: Elastisearch store up to 3 PB data in a single Elasticsearch cluster and scale up/down as needs change.
- You can monitor the state of cluster through Amazon CloudWatch metrics and add or remove instances via a simple API call or a few clicks in the AWS console.
- can also modify SSD-powered Amazon Elastic Block Store (EBS) volumes to accommodate workload requirements.
Availability:
- supports 3 Availability Zones (AZ) deployments,
- to deploy instances across multiple AZs for better availability and failure tolerance.
- can enable 3 AZ deployments for both existing and new clusters at no extra cost using the AWS console, CLI, or SDKs.
- If you enable replicas for indexes, the primary and replica shards will automatically be distributed across nodes providing cross-zone replication.
Durability:
- build data durability for Elasticsearch cluster through automated and manual snapshots.
- use snapshots to recover cluster or to create a new cluster with preloaded data.
- By default, the Elasticsearch will automatically create hourly snapshots of each domain and retain them for 14 days at no extra charge.
- These snapshots are stored in Amazon S3, 99.999999999% (11 9’s) durability.
Highly secure
- securely connect applications to managed Elasticsearch environment from VPC or via the public Internet, configuring network access using VPC security groups or IP-based access policies.
- can also securely authenticate users and control access using Amazon Cognito, AWS Identity and Access Management (IAM), or basic authentication using username and password.
- Elasticsearch leverages the Open Distro for Elasticsearch security plugin to define granular permissions for indices, documents, or fields and to extend Kibana with read-only views and secure multi-tenant support.
- Elasticsearch supports built-in encryption for data at-rest and in-transit so you can protect data both when it is stored in domain or in automated snapshots, and when it is transferred between nodes in domain.
- Elasticsearch is HIPAA eligible and compliant with PCI DSS, SOC, ISO, and FedRamp standards, making it easy for you to build applications that meet compliance requirements.
Cost-effective
- Pay only for what you use
- no upfront fee or usage requirement.
- can reserve instances for for a one- or three-year term to get significant cost savings on usage as compared to on-demand instances.
UltraWarm
- a warm storage tier that complements the existing Elasticsearch hot storage tier by providing less expensive storage for older and less-frequently accessed data while still providing an interactive experience.
- stores data in Amazon S3 while using custom, highly-optimized nodes, purpose-built on the AWS Nitro System, to cache, pre-fetch, and query that data.
- This allows you to:
  - Retain up to 3 PB of data in a single Elasticsearch cluster while reducing cost per GB by nearly 90% compared to existing Elasticsearch storage tiers.
  - Run fast, interactive analytics on both recent (weeks) and historical (months or years) log data without needing to spend hours or days restoring it from the archives.
  - Easily query and visualize across both recent and historical log data via Kibana interface, enabling you to quickly identify and troubleshoot performance issues.
  - When searching and analyzing data, you don’t need to worry about which tier of storage that data is currently in as that is handled automatically.
  - To start using UltraWarm, sign in to the AWS console, create an Elasticsearch cluster, and when selecting nodes, enable UltraWarm.
  - You can select UltraWarm1.medium.elasticsearch or UltraWarm1.large.elasticsearch instances.

ref

01AWS, ELK

AWS

This post is licensed under CC BY 4.0 by the author.

The ELK stack

Elasticsearch log analytics and search use cases

Logstash collect data

Kibana visualization and reporting tool

AWS Elasticsearch Service

Benefits

Use cases

features

Trending Tags

Elasticsearch `log analytics and search use cases`

Logstash `collect data`

Kibana `visualization and reporting tool`