
AUGMENTED DATA MANAGEMENT (ADM) Gets a Redesign

by admin

We are all familiar with the term “Big Data”. We have heard it so often that many of us take it for granted. Though there is no standard definition, most agree that Big Data is data with special characteristics that require new or expanded techniques, beyond those used in modern database management systems, to enable discovery and understanding. The paper summarized in this post discusses why enterprises are transitioning from traditional file-based workflows built on ad hoc tools to modernized, web-based ones anchored on the principles of self-service and automation. This transformation opens an opportunity to create an open and flexible platform that scales and stays manageable as needs grow, and that can be used by a variety of tools and user workflows.

Augmented Data Management (ADM):

There are many data platforms on the market that provide Big Data capabilities, such as data lakes and cloud databases, but none of them provides Augmented Data Management (ADM). ADM is a part of the data platform stack in which all the tools used for data preparation are integrated under a single web-based umbrella. The basic premise of augmented data management is to handle the end-to-end process, including ingestion, storage, search indexing, visualization and distribution of datasets, while treating the metadata associated with each dataset as being as important as the data itself. Another vital factor is access control and authentication, which must enforce least-privilege security for data ingestion, processing and distribution.
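The end-to-end flow and least-privilege access control described above can be illustrated with a minimal Python sketch. It is not the ADM implementation, which this post does not show: the stage names, the `Dataset` and `User` classes and the `run_stage` function are hypothetical, and each stage only records an entry in the dataset's metadata instead of doing real work.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical stage names based on the paragraph above.
STAGES = ("ingest", "store", "index", "visualize", "distribute")


@dataclass
class Dataset:
    name: str
    rows: list
    # Metadata travels with the dataset as a first-class part of it.
    metadata: dict = field(default_factory=lambda: {"history": []})


@dataclass
class User:
    name: str
    # Least privilege: a user may run only the stages explicitly granted.
    allowed_stages: frozenset = frozenset()


def run_stage(user: User, stage: str, dataset: Dataset) -> Dataset:
    """Run one pipeline stage after checking the user's permissions."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    if stage not in user.allowed_stages:
        raise PermissionError(f"{user.name} may not run '{stage}'")
    # A real platform would do the stage's work here; this sketch only
    # records who ran which stage, and when, in the dataset's metadata.
    dataset.metadata["history"].append(
        {"stage": stage, "user": user.name,
         "at": datetime.now(timezone.utc).isoformat()}
    )
    return dataset


if __name__ == "__main__":
    engineer = User("alice", frozenset({"ingest", "store", "index"}))
    analyst = User("bob", frozenset({"visualize"}))
    ds = Dataset("sales", rows=[{"region": "north", "amount": 120}])
    for stage in ("ingest", "store", "index"):
        run_stage(engineer, stage, ds)
    run_stage(analyst, "visualize", ds)
    try:
        run_stage(analyst, "distribute", ds)
    except PermissionError as err:
        print(err)  # bob may not run 'distribute'
    print([h["stage"] for h in ds.metadata["history"]])
```

Granting each user only the stages their role needs keeps a compromised or mistaken account from reaching data outside its part of the workflow, and the history list gives a simple audit trail.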

The platform should also support a range of users, such as data scientists, who not only prepare data but are also an integral part of the decision-making process. They need a tool that can visualize the metadata of every dataset along with its source(s), and that lets them create their own datasets without expert knowledge of how the existing datasets were generated.
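A self-service catalog of this kind can be sketched as follows, assuming that each dataset simply records the names of the datasets it was derived from. `Catalog`, `Entry`, `derive` and `lineage` are hypothetical names chosen for illustration; the point is that the user supplies only a transform over existing datasets, while the catalog keeps the metadata and source links needed to trace lineage.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Entry:
    name: str
    rows: list
    description: str
    sources: tuple = ()  # names of the datasets this one was derived from


class Catalog:
    """Hypothetical self-service catalog: browse metadata, derive datasets."""

    def __init__(self) -> None:
        self._entries: dict[str, Entry] = {}

    def register(self, name: str, rows: list, description: str) -> None:
        self._entries[name] = Entry(name, rows, description)

    def describe(self) -> list[tuple[str, str, tuple]]:
        # What a data scientist would browse: name, description, sources.
        return [(e.name, e.description, e.sources)
                for e in self._entries.values()]

    def derive(self, name: str, sources: list[str],
               transform: Callable[..., list], description: str) -> Entry:
        # The user supplies only a transform over the source rows; how
        # the sources themselves were produced stays behind the catalog.
        inputs = [self._entries[s].rows for s in sources]
        entry = Entry(name, transform(*inputs), description, tuple(sources))
        self._entries[name] = entry
        return entry

    def lineage(self, name: str) -> list[str]:
        # Walk the source links back to the original (raw) datasets.
        seen: list[str] = []
        stack = list(self._entries[name].sources)
        while stack:
            src = stack.pop()
            if src not in seen:
                seen.append(src)
                stack.extend(self._entries[src].sources)
        return seen


if __name__ == "__main__":
    cat = Catalog()
    cat.register("orders", [{"cust": "a", "amt": 30}], "raw orders")
    cat.register("customers", [{"cust": "a", "region": "north"}],
                 "customer master")

    def join_regions(orders, customers):
        region = {c["cust"]: c["region"] for c in customers}
        return [{**o, "region": region[o["cust"]]} for o in orders]

    cat.derive("orders_by_region", ["orders", "customers"], join_regions,
               "orders enriched with customer region")
    print(cat.lineage("orders_by_region"))  # ['customers', 'orders']
```

Because every derived entry stores its sources, the lineage of any dataset can be shown to the user on demand, which is the "dataset metadata along with their source(s)" view described above.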

The following paper briefly describes ADM:

A Redesign of Augmented Data Management (ADM) – A Platform for Cloud-Based End-to-End Big Data Ingestion, Processing and Distribution Using Multi-Tenancy

By Praveen Kumar Garcia and Ramesh Govindan, Software Engineering Division, Computing and Communication Foundation, Indian Institute of Science

IEEE TRANSACTIONS ON CLOUD COMPUTING

Abstract: Big Data is an umbrella term used to describe data with special characteristics that require new or expanded techniques beyond those used in modern database management systems to enable discovery and understanding. The data analysis process has evolved over the past few years with the advent of new big data technologies that allow enterprises to capture, store and analyze large-scale, complex datasets. Ingestion, processing and distribution of such datasets are key challenges for organizations because of their increasing size, velocity, variety and complexity. Many enterprises have transitioned from traditional file-based workflows with ad hoc tools towards modernized, web-based ones anchored on a set of principles: self-service capabilities that enable business users; an integrated platform for end-to-end workflow support across ETL, analytics and visualization; support for diverse data sources (structured, unstructured and relational databases); and integration with existing IT investments. Ad hoc tools are too limited in scope to address the needs of the modern data analysis process. Cloud-based Big Data platforms let organizations take advantage of large amounts of low-cost storage, along with other benefits such as efficient resource utilization. However, designing such systems remains an open problem because of the intricacies involved, such as scalability, multi-tenancy and fault tolerance. This paper discusses how traditional file-based workflows can be transformed into modernized web-based ones using the ADM platform, which provides end-to-end functionality covering ingestion, processing and distribution of datasets (hosted anywhere in the cloud, such as on Amazon EC2) with a focus on data quality. It also describes how ADM can be integrated into existing analytics applications to provide enhanced capabilities.
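Multi-tenancy, one of the design intricacies named in the abstract, can be illustrated with a minimal sketch in which all tenants share one store but every operation is scoped to a tenant namespace. `TenantStore`, its methods and the per-tenant row quota are assumptions made for this example, not details taken from the paper.

```python
from collections import defaultdict


class TenantStore:
    """Hypothetical multi-tenant dataset store with per-tenant namespaces.

    Tenants share one store, but every read and write is scoped by tenant
    ID, so one tenant cannot address another tenant's datasets.
    """

    def __init__(self, quota_rows: int) -> None:
        self._data: dict[str, dict[str, list]] = defaultdict(dict)
        self._quota_rows = quota_rows  # assumed per-tenant row limit

    def _used(self, tenant: str) -> int:
        return sum(len(rows) for rows in self._data[tenant].values())

    def put(self, tenant: str, name: str, rows: list) -> None:
        # Replacing a dataset frees its old rows before the quota check.
        current = len(self._data[tenant].get(name, []))
        if self._used(tenant) - current + len(rows) > self._quota_rows:
            raise RuntimeError(f"tenant {tenant!r} would exceed its quota")
        self._data[tenant][name] = list(rows)

    def get(self, tenant: str, name: str) -> list:
        # Lookup is confined to the caller's namespace; another tenant's
        # dataset, even with the same name, is never visible here.
        try:
            return list(self._data[tenant][name])
        except KeyError:
            raise KeyError(f"{name!r} not found for tenant {tenant!r}") from None

    def list_datasets(self, tenant: str) -> list[str]:
        return sorted(self._data[tenant])
```

Two tenants can both store a dataset called `sales` without collision, and the quota keeps one tenant from consuming the shared capacity. Scoping every call by tenant is the simplest form of isolation; a production platform would likely also need to enforce it in the storage and authentication layers.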

Conclusion:

In this paper we have discussed a novel framework for cloud-based Big Data ingestion, processing and distribution of datasets using multi-tenancy. The framework is designed to provide end-to-end functionality across the data analysis process, including ingestion, processing and distribution, which makes it well suited to the modern data-driven world. The novelty of this work also lies in its support for diverse data sources (structured, unstructured and relational databases) and for existing analytics applications, which enables organizations using traditional file-based workflows to transition easily to web-based ones with multi-tenancy support.
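Support for diverse data sources is often handled with per-source adapters that emit a common record format. The sketch below assumes plain Python dicts as that format and uses only the standard library (`csv`, `sqlite3`); the adapter functions are hypothetical and are not part of the described framework.

```python
import csv
import io
import sqlite3
from typing import Iterator


# Hypothetical adapters: each turns one kind of source into plain dict
# records, so downstream processing does not depend on where data came from.

def from_csv(text: str) -> Iterator[dict]:
    """Structured source: CSV text with a header row (values stay strings)."""
    yield from csv.DictReader(io.StringIO(text))


def from_text(text: str) -> Iterator[dict]:
    """Unstructured source: one record per non-empty line."""
    for n, line in enumerate(text.splitlines(), start=1):
        if line.strip():
            yield {"line": n, "text": line.strip()}


def from_sqlite(conn: sqlite3.Connection, query: str) -> Iterator[dict]:
    """Relational source: rows of a SQL query, keyed by column name."""
    cur = conn.execute(query)
    cols = [d[0] for d in cur.description]
    for row in cur:
        yield dict(zip(cols, row))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO users VALUES (1, 'ana')")
    records = [
        *from_csv("id,amount\n1,30\n"),
        *from_text("first note\n\nsecond note\n"),
        *from_sqlite(conn, "SELECT id, name FROM users"),
    ]
    print(records)
```

Adding a new source type then means writing one more adapter, while ingestion, processing and distribution code stays unchanged.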

 
