Timbakto

September 16, 2012

Complex Event Processing Tools

Hi... Here I am talking about some of the most useful event processing tools, basically log processing tools for distributed or cloud environments. I have collected them from various sources and am listing them here with short descriptions. I hope it will be useful for us in finding the best tools for log collection and data collection from multiple sources.

1. Flume:

It is an Apache project, basically used for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple architecture based on streaming data flows. It collects data from various sources and delivers it to Hadoop's HDFS.
There are three basic components of Flume:
a) Agent- lives on the source machine from which we need to collect data or logs.
b) Collector- agents send their data to a collector, which finally writes it to HDFS.
c) Master- keeps the configuration of all agents and collectors and manages them.
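
To make the three roles concrete, here is a toy in-memory sketch of the flow. It does not use Flume's real API or configuration language: the class names, the method names (ship, receive, configure_agent) and the host names are all made up for illustration, and a plain Python list stands in for HDFS.

```python
from dataclasses import dataclass, field


@dataclass
class Collector:
    """Receives events from agents and writes them to a store.

    The store is a plain list standing in for HDFS (an assumption).
    """
    store: list = field(default_factory=list)

    def receive(self, event: dict) -> None:
        self.store.append(event)


@dataclass
class Agent:
    """Lives on the source machine and forwards each log line to its collector."""
    host: str
    collector: Collector

    def ship(self, lines) -> None:
        for line in lines:
            self.collector.receive({"host": self.host, "body": line.rstrip("\n")})


class Master:
    """Keeps the agent-to-collector configuration in one place."""

    def __init__(self):
        self.collectors = {}
        self.routes = {}

    def add_collector(self, name: str) -> Collector:
        self.collectors[name] = Collector()
        return self.collectors[name]

    def configure_agent(self, host: str, collector_name: str) -> Agent:
        # Record the route, then hand back an agent wired to that collector.
        self.routes[host] = collector_name
        return Agent(host, self.collectors[collector_name])


def demo():
    master = Master()
    master.add_collector("collector-1")
    web = master.configure_agent("web-01", "collector-1")
    db = master.configure_agent("db-01", "collector-1")
    web.ship(["GET /index.html 200\n", "GET /missing 404\n"])
    db.ship(["slow query: 2.3s\n"])
    return master.collectors["collector-1"].store
```

Calling demo() returns the events from both machines as they arrived at the single collector, each tagged with the host it came from.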

Please visit wikipedia and http://archive.cloudera.com/cdh/3/flume/UserGuide/ for more information.

2. Scribe:

Scribe is an open source project from Facebook, used as a log aggregation framework. It has a simple API and is simple to use. A Scribe server runs on every node in the system, configured to aggregate messages and send them to a central Scribe server (or servers) in larger groups.
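
The node-to-central pattern described above can be sketched in a few lines. This is only a toy model and not Scribe's actual Thrift interface: NodeServer, CentralServer, log and flush are hypothetical names, and tagging each message with a category is an assumption made for the example.

```python
from collections import defaultdict


class CentralServer:
    """Central aggregation point: groups incoming messages by category."""

    def __init__(self):
        self.by_category = defaultdict(list)

    def receive(self, node, entries):
        for category, message in entries:
            self.by_category[category].append((node, message))


class NodeServer:
    """Runs on every node: buffers local messages, then forwards them in a batch."""

    def __init__(self, node, central):
        self.node = node
        self.central = central
        self.pending = []

    def log(self, category, message):
        self.pending.append((category, message))

    def flush(self):
        # Send the whole batch to the central server and clear the buffer.
        sent = len(self.pending)
        self.central.receive(self.node, self.pending)
        self.pending = []
        return sent


def demo_aggregate():
    central = CentralServer()
    a = NodeServer("node-a", central)
    b = NodeServer("node-b", central)
    a.log("access", "GET /")
    b.log("access", "GET /about")
    b.log("error", "disk full")
    a.flush()
    b.flush()
    return dict(central.by_category)
```

After both nodes flush, the central server holds the "access" messages from both nodes together, which is the whole point of aggregating in larger groups.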

We can get more knowledge about it from here:
https://github.com/facebook/scribe/wiki

3. Kafka:

It is developed at LinkedIn and basically used for log collection.

 


Structured, Unstructured and Semi-Structured Data

Structured Data: Data that resides in fixed fields within a record or file. It is identifiable because it is organized in a structure, and it can be searched by data type within the content.

Unstructured Data: Data that doesn't have any fixed fields for its records. The record size can change at any moment.
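
A small sketch may help show the difference in practice. The sample data, field names and helper functions below are invented for illustration. The structured text has fixed fields, so we can query by field name; the free text has none, so all we can do is search the raw content.

```python
import csv
import io

# Structured: every record has the same fixed fields (id, name, age).
STRUCTURED = "id,name,age\n1,Asha,31\n2,Ravi,27\n"

# Unstructured: free text with no fields and no fixed record size.
UNSTRUCTURED = "Ravi called at noon; said the server was slow again, maybe since Tuesday."


def rows_where(csv_text, field_name, value):
    """Return the records whose given field equals value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row[field_name] == value]


def contains(text, phrase):
    """Plain substring search, the only generic query free text supports."""
    return phrase.lower() in text.lower()
```

For example, rows_where(STRUCTURED, "name", "Ravi") picks out one record by field, while the free text can only be checked with something like contains(UNSTRUCTURED, "server was slow").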

Big Data Technology

The main goal of this post is to list some of the tools and technologies used in big data. First I will discuss what big data is and where it is useful.

What is Big Data? - Data that is very complex, very large in volume, and cannot be easily managed by traditional data systems can be considered big data.