December 2019

Thursday, December 26, 2019

Logstash Tutorial: A complete guide for beginners on how to index data from Logstash into Elasticsearch and Kibana



Logstash is an open source, server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it and then sends it to your favorite "stash".




1. Overview of Logstash


Developed by - Elastic NV 


2. What is Logstash ?

Logstash is a light-weight, open-source, server-side data processing pipeline that allows you to collect data from a variety of sources, transform it on the fly and send it to your desired destination. It is most often used as a data pipeline for Elasticsearch, an open-source analytics and search engine. Because of its tight integration with Elasticsearch, its powerful log processing capabilities and its more than 200 pre-built open-source plugins that help you index your data easily, Logstash is a popular choice for loading data into Elasticsearch.

3. Installation of Logstash

For installation of Logstash please visit the below link - 


4. Download Dataset to import via Logstash into Elasticsearch and Kibana

Please download the sample dataset from the below link - 



5. How to run Logstash?

When you start Logstash, you do so with the intention of loading some data. A dataset can have any number of columns and data types, so if you want to configure them separately, you need to specify the file path inside the input block.


Config Settings file

input {
  stdin { }
}

filter { }

output {
  elasticsearch { hosts => ["localhost:9200"] }
  stdout { codec => rubydebug }
}

Run Logstash

After downloading Logstash, open a command prompt, go to the Logstash folder and run the command below, where simple.conf is the name of your config file (on Windows the launcher is bin\logstash.bat):

bin/logstash -f simple.conf

Official Website Link for the Configuration 



List all the column names inside the filter block, and in the output block say where the data will be indexed. Save the file with a .conf extension.

6. Logstash Simple.conf?


input {
file {
path => "/Users/Atique/Desktop/Projects/myblog/Youtube/youtube_logstash/employee.csv"
start_position => "beginning"
# "NUL" is the Windows null device; on Linux/macOS use "/dev/null" instead
sincedb_path => "NUL"
}
}
filter {
csv {
separator => ","
columns => [ "Age","Attrition","BusinessTravel","DailyRate","Department",
"DistanceFromHome","Education","EducationField","EmployeeCount",
"EmployeeNumber","EnvironmentSatisfaction","Gender","HourlyRate",
"JobInvolvement","JobLevel","JobRole","JobSatisfaction","MaritalStatus",
"MonthlyIncome","MonthlyRate","NumCompaniesWorked","Over18",
"OverTime","PercentSalaryHike","PerformanceRating",
"RelationshipSatisfaction","StandardHours","StockOptionLevel",
"TotalWorkingYears","TrainingTimesLastYear","WorkLifeBalance",
"YearsAtCompany","YearsInCurrentRole","YearsSinceLastPromotion","YearsWithCurrManager" ]
}
mutate {
  # Convert every numeric column from string to integer in a single mutate block
  convert => {
    "Age" => "integer"
    "DailyRate" => "integer"
    "DistanceFromHome" => "integer"
    "Education" => "integer"
    "EmployeeCount" => "integer"
    "EmployeeNumber" => "integer"
    "EnvironmentSatisfaction" => "integer"
    "HourlyRate" => "integer"
    "JobInvolvement" => "integer"
    "JobLevel" => "integer"
    "JobSatisfaction" => "integer"
    "MonthlyIncome" => "integer"
    "MonthlyRate" => "integer"
    "NumCompaniesWorked" => "integer"
    "PercentSalaryHike" => "integer"
    "PerformanceRating" => "integer"
    "RelationshipSatisfaction" => "integer"
    "StandardHours" => "integer"
    "StockOptionLevel" => "integer"
    "TotalWorkingYears" => "integer"
    "TrainingTimesLastYear" => "integer"
    "WorkLifeBalance" => "integer"
    "YearsAtCompany" => "integer"
    "YearsInCurrentRole" => "integer"
    "YearsSinceLastPromotion" => "integer"
    "YearsWithCurrManager" => "integer"
  }
}
}
output {
elasticsearch {
hosts => "localhost:9200"
index => "employee"
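# document_type is deprecated in recent Logstash versions because Elasticsearch 7 is phasing out mapping types
document_type => "employee_details"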
}
stdout {}
}
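
To make the filter block above a little more concrete, here is a rough Python sketch of what the csv and mutate steps do to one line of the file. It is not how Logstash is implemented. The column list is shortened, and the sample row is made up for illustration and is not taken from the real dataset.

```python
import csv
import io

# Shortened column list for illustration; the real config names every column.
COLUMNS = ["Age", "Attrition", "BusinessTravel", "DailyRate", "Department"]
# Columns that the mutate/convert step turns into integers.
INTEGER_COLUMNS = {"Age", "DailyRate"}


def parse_line(line):
    """Split one CSV line into named fields, then convert the integer columns."""
    values = next(csv.reader(io.StringIO(line), delimiter=","))
    event = dict(zip(COLUMNS, values))  # csv filter: give each column its name
    for name in INTEGER_COLUMNS:        # mutate filter: string -> integer
        event[name] = int(event[name])
    return event


# Hypothetical sample row, not taken from the real dataset.
event = parse_line("41,Yes,Travel_Rarely,1102,Sales")
print(event)
```

Without the convert step every value would stay a string, and numeric aggregations in Elasticsearch and Kibana would not behave as expected on those fields.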

Project Link -
git clone https://atique1224@bitbucket.org/atique1224/youtube_logstash_tutorial.git

For a more detailed explanation and a hands-on walkthrough, please watch the video below - 



Kibana Tutorial: A complete guide for beginners



Kibana is an open source data visualisation dashboard for Elasticsearch. It provides visualisation capabilities on top of the content indexed in an Elasticsearch cluster. Users can create bar, line and scatter plots, pie charts and maps on top of large volumes of data.


1. Overview of Kibana


Kibana also provides a presentation tool, referred to as Canvas, that allows users to create slide decks that pull live data directly from Elasticsearch.

The combination of Elasticsearch, Logstash and Kibana is known as the ELK Stack.

Developed by - Elastic NV 


2. What is Kibana ?

Kibana is a data visualization and management tool for Elasticsearch that provides real-time histograms, line graphs, pie charts and maps.

3. Installation of Kibana ?

For installation of Kibana please visit the below link - 


For more details and a practical walkthrough, please watch the video below - 



4. Indexing data into Kibana ?

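Kibana does not store data itself; it reads whatever is indexed in Elasticsearch. So while indexing data into Elasticsearch, keep Kibana running as well, so that the data can be explored from Kibana as soon as it is indexed.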

Software Link -
Download Node JS - https://nodejs.org/en/download/ for Mac/Windows/Linux/Ubuntu
Download Code Editor - Visual Studio - https://visualstudio.microsoft.com/

Project Link - 



For more details and an explanation of the code, please watch the video below


                                    


5. How to perform GET, POST, PUT, DELETE Methods from Kibana ?


Software Link -
Download Node JS - https://nodejs.org/en/download/ for Mac/Windows/Linux/Ubuntu
Download Code Editor - Visual Studio - https://visualstudio.microsoft.com/


Projects Link - 

git clone https://atique1224@bitbucket.org/atique1224/youtube_kibana_tutorial.git


For more details and a practical example, please watch the video below -





6. Elasticsearch Aggregation and Projection from Kibana ?


Elasticsearch Aggregation - 


# Aggregation with a sum query
GET /students/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "Casual_Leaves": { "sum": { "field": "leaves.CL" } }
  }
}

# Terms aggregation: count how many documents share each dept value
GET /students/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "count": {
      "terms": {
        "field": "dept.keyword",
        "size": 100,
        "order": {
          "_key": "desc"
        }
      }
    }
  }
}


# Get the total number of records
GET /students/_count
{}
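
To see what these three requests compute, the following Python sketch does the same arithmetic on a small in-memory list. The three documents are hypothetical and only shaped like the students index used above. This is plain Python, not the Elasticsearch client.

```python
from collections import Counter

# Hypothetical documents shaped like the students index used above.
students = [
    {"student_id": 1, "dept": "CSE", "leaves": {"CL": 2}},
    {"student_id": 2, "dept": "ECE", "leaves": {"CL": 1}},
    {"student_id": 3, "dept": "CSE", "leaves": {"CL": 4}},
]

# "sum" aggregation on leaves.CL: add up one numeric field across all documents.
casual_leaves = sum(doc["leaves"]["CL"] for doc in students)

# "terms" aggregation on dept: one bucket per distinct value,
# ordered by key descending like "_key": "desc" in the query.
counts = Counter(doc["dept"] for doc in students)
buckets = [{"key": key, "doc_count": counts[key]} for key in sorted(counts, reverse=True)]

# _count: the total number of documents.
total = len(students)

print(casual_leaves, buckets, total)
```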


Elasticsearch Projection -




# Projection: return only the listed fields of each document via _source
GET /students/_search
{
  "_source": ["student_id", "skills"],
  "query": {
    "match_all": {}
  }
}

For more details and explanation, please watch the video below -



7. Elasticsearch Pagination and Scroll Query from Kibana ?


Elasticsearch pagination - 

   Say, for example, you have a huge number of records in Elasticsearch, perhaps more than 10 million. Displaying all of them on the front end in a single shot is not possible. To handle this, Elasticsearch supports two solutions. The first one is pagination.

Elasticsearch Scroll - 

   The second solution is the Elasticsearch scroll. Each scroll request returns a scroll ID, and you can set how long that particular scroll ID stays valid before it expires.
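
As a rough illustration of the two approaches, the Python sketch below builds a paginated search body and simulates scrolling over an in-memory list. Pagination in the search API is driven by the from and size parameters. A real scroll is started by adding a scroll timeout such as scroll=1m to the search request and then sending the returned scroll ID to _search/scroll. The helper names and the list of 25 records are assumptions made for this example, and nothing here talks to a real cluster.

```python
def page_body(page, size):
    """Build a from/size search body for a zero-based page number."""
    return {"from": page * size, "size": size, "query": {"match_all": {}}}


def scroll_batches(records, size):
    """Simulate scrolling: hand out the records batch by batch until none are left."""
    position = 0
    while position < len(records):
        yield records[position:position + size]
        position += size


records = list(range(25))  # stand-in for the documents in an index
third_page = page_body(page=2, size=10)
batches = list(scroll_batches(records, size=10))

print(third_page)
print([len(batch) for batch in batches])
```

Pagination suits a user clicking through the first few pages, since by default from plus size cannot go beyond 10,000 results. Scroll is the better fit when you need to walk through the whole result set.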


For more details and a practical example, please watch the video below - 



8. How to visualize the Elasticsearch data in Kibana ?

To visualize Elasticsearch data, we need to make sure the data is available to Kibana. Before visualizing the data, we need to create an index pattern for the particular indices.
We also have to make sure the data meets the 4C quality criteria. What is 4C quality ?

Correctness - Validate data accuracy through comparison to external reference.

Currency - Deliver new and updated content in a timely manner.

Completeness - Provide the right data attributes and analysis to ensure customers have all of the necessary information to make critical decisions.

Consistency - Standardize identifiers and content across databases and products to be sure customers receive consistent information regardless of product platform.

For more details and a practical example, please watch the video below - 














Elasticsearch Tutorial: A complete guide for beginners




Elasticsearch is a highly scalable, open-source full-text search and analytics engine. It allows you to store, search and analyse large volumes of data quickly and in near real time. It is generally used as the underlying engine that powers applications with complex search features and requirements. Elasticsearch provides a distributed system on top of Lucene, uses the Lucene Standard Analyzer for indexing by default, guesses field types automatically, and exposes Lucene features through a JSON-based REST API.

1. Overview of Elasticsearch


Developed by - Elastic NV; first released on 8 Feb 2010.

Features - 

  1. Data Storage
  2. Flexibility, Data types, Full Text Search
  3. Unstructured, Document Store
  4. Field and Document level API
  5. Cluster Indices, Data snapshots, Rollup Indices
  6. Elasticsearch SQL & Role based access control etc.

Latest version at the time of writing - 7.5, released in December 2019


2. SQL vs NOSQL

Before discussing Elasticsearch in more depth, we need to know what SQL and NoSQL are, because I feel this background is important.

SQL - SQL databases typically scale vertically, which means we need to increase the capacity of a single server (CPU, RAM) to scale the database.

NoSQL - NoSQL databases scale horizontally, which means we can add more servers to power up the database.

3. Relational vs Non Relational Database

To put it more conveniently, the diagram below shows an example of scale out and scale up. In scale up (the relational side) we need primary keys, secondary keys and foreign keys. We also have joins such as left outer join, right outer join, full join, etc. If the data size increases on a single server, we need to increase its RAM and CPU capacity.


Figure: Relational vs. nonrelational databases. Traditional SQL (a primary and a secondary DB) scales up; NoSQL (many DB nodes) scales out.
In a scale-out database we don't have any concept of joins. Instead of increasing the power of the CPU and RAM, we can add more servers to power up the database.
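
A scale-out store needs a rule for deciding which server holds each record. The Python sketch below shows one common idea: hash the record ID and take the remainder by the number of servers. Elasticsearch routes documents to shards in a similar spirit, but it uses its own hash function. The crc32 call, the function name and the document IDs here are stand-ins chosen for illustration, so the resulting shard numbers are not what a real cluster would produce.

```python
import zlib


def pick_shard(doc_id, number_of_shards):
    """Map a document ID to a shard number with a stable hash (crc32 as a stand-in)."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


# Hypothetical document IDs spread over three shards.
doc_ids = ["emp-1", "emp-2", "emp-3", "emp-4", "emp-5", "emp-6"]
placement = {doc_id: pick_shard(doc_id, 3) for doc_id in doc_ids}

print(placement)
```

Because the same ID always hashes to the same shard, any server can work out where a record lives without consulting a central lookup table.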

4. Scale up vs Scale out Database

This is a real-time example that I thought I would bring up here. You might be wondering why I am discussing all of this when the blog is all about Elasticsearch, right ?

Elasticsearch is a NoSQL, scale-out database, so we need to know these few basic things before discussing Elasticsearch itself.

SQL vs. NoSQL wedding cake

5. What is Elasticsearch ?

Elasticsearch is an open source, RESTful, distributed search and analytics engine built on Apache Lucene. Since its release in 2010, Elasticsearch has quickly become the most popular search engine, and it is commonly used for log analytics, full text search, security intelligence, business analytics and operational intelligence use cases.

6. How does Elasticsearch works ?

Raw data flows into Elasticsearch from a variety of sources, including logs, system metrics and web applications. Data ingestion is the process by which this raw data is parsed, normalised and enriched before it is indexed in Elasticsearch. Once the data is indexed, users can run complex queries against it and use aggregations to retrieve complex summaries of their data. From Kibana, users can create powerful visualisations of their data, share dashboards and manage the Elastic Stack.
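
The parse, normalise and enrich steps can be pictured with a small Python sketch. The log line format ("LEVEL message"), the function names and the is_error field are assumptions made for this example. In practice this work is usually done by Logstash filters or an ingest pipeline rather than hand-written code.

```python
def parse(raw_line):
    """Parse: split a raw 'LEVEL message' log line into fields."""
    level, _, message = raw_line.partition(" ")
    return {"level": level, "message": message}


def normalise(event):
    """Normalise: make the level upper case and trim stray whitespace."""
    return {"level": event["level"].strip().upper(), "message": event["message"].strip()}


def enrich(event):
    """Enrich: add a derived field that queries and dashboards can filter on."""
    return {**event, "is_error": event["level"] == "ERROR"}


# Hypothetical raw log line; the result is the document that would be indexed.
document = enrich(normalise(parse("error  disk full on node-1 ")))
print(document)
```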

7. Real Time Example  - Case 1 ?


Let me explain what is happening in the diagram below. This is the basic architecture of a web application that deals with a huge amount of data. The frontend here is nothing but the web browser. When a user searches for something from the browser and a huge amount of data sits in your database, it is very difficult to find the right data and return an immediate result to the user. That is where Elasticsearch comes into the picture.

8. Real Time Example  - Case 2 ?


In the second scenario, if you have a huge amount of data to deal with and bring to your frontend, you can bring Elasticsearch into the picture. Not only that, once your data is in Elasticsearch you can visualise it in Kibana as a pie chart, bar chart, table, etc.



I have explained only two real-time examples here. There are any number of reasons and challenges that we face while dealing with big data, and based on those we need to decide when and where to bring Elasticsearch and Kibana into the picture.


9. Which popular companies are using it ?

Many popular companies use Elasticsearch, Kibana, Logstash and Filebeat, including - 

  1. CISCO
  2. SAP
  3. IBM
  4. CITRIX
  5. FACEBOOK
  6. LINKEDIN
  7. GOOGLE
  8. TWITTER
  9. MICROSOFT
  10. REDHAT
  11. ADOBE
  12. EA SPORTS
  13. BOSCH
  14. HIKE
  15. EBAY
  16. HTC
  17. FLIPKART
  18. AMAZON
  19. ASIANETNEWS
  20. SNAPDEAL
So one question comes to mind: is it free or not ?

Yes, it is free within certain limits: some of the services are free and open source, while others are not.


For more details, please watch the videos below - 


10. Installation of Elasticsearch ?

For installation of Elasticsearch please visit the below link - 


For more details, please watch the video below - 



11. Indexing the Bulk data from Mongo DB to Elasticsearch ?

Software Link -
Download Node JS - https://nodejs.org/en/download/ for Mac/Windows/Linux/Ubuntu
Download Code Editor - Visual Studio - https://visualstudio.microsoft.com/

Project Link - 

git clone https://atique1224@bitbucket.org/atique1224/youtube_elasticsearch_indexing_tutorial.git


For more details and an explanation of the code, please watch the video below


12. Elasticsearch with Node JS, Elasticsearch Aggregation, Elasticsearch GET, POST, PUT, DELETE Methods ?

Software Link -
Download Node JS - https://nodejs.org/en/download/ for Mac/Windows/Linux/Ubuntu
Download Code Editor - Visual Studio - https://visualstudio.microsoft.com/

Project Link - 

git clone https://atique1224@bitbucket.org/atique1224/youtube_elasticsearch_with_node_js_tutorial.git

For more details and a practical example, please watch the video below