Engineering Technology

DATA-DRIVEN SERVERLESS FUNCTIONS FOR OBJECT STORAGE

By

K. Narendra Kumar, Associate Professor in CSE, Chalapathi Institute of Engineering and Technology, Lam, Guntur, AP, narendrakumar.k@gmail.com.

DESIGN OF ZION

Zion is designed for scalable, data-driven execution of small functions in object stores. Its design assumes that the underlying object store follows the “classical” architecture: load balancers distribute the workload evenly across gateways (or proxies), which are backed by a large pool of storage nodes. Zion places a disaggregated computing layer between the gateway and storage nodes to execute the functions.

Zion also integrates a metadata service and interception software running in the storage gateways, which inspects incoming requests and reroutes them to the compute tier when necessary.

INTERCEPTION SOFTWARE AND METADATA SERVICE

The interception software is integrated in the storage gateway. Its aim is to manage the deployment of functions, the association of triggers with these functions, and their execution when a request matches a trigger. A trigger is a combination of a URL, with prefix and suffix filters (similar to AWS Lambda for Amazon S3), and an HTTP method (GET, PUT, POST, or DELETE).

This interception mechanism is enough for many use cases. By specifying the suffix .txt as a filter, for instance, Zion can run a compression function on all GET requests for text objects. The triggers are the following:

  • onPut, onPost, and onDelete cause the associated function to run whenever a PUT, POST, or DELETE request is received, respectively. As an example, the onPut trigger can be used to process an object before it is written to the object store, or even to discard it, because processing is part of the write path and not asynchronous as in AWS Lambda.
  • onBeforeGet runs the associated function when a user performs a GET request to the storage service. The function is executed before the request is forwarded to the storage node, and hence it cannot process the targeted data object. However, this trigger is useful for tasks such as HTTP header processing, URL rewriting, and temporary redirects.
  • onAfterGet also runs the associated function on an incoming GET request to the storage service, but the function intercepts the storage node’s response, and therefore it can dynamically manipulate the object’s content.

Metadata for triggers is pre-processed and indexed in the metadata service to guarantee a small, O(1) request-matching overhead, on the order of microseconds. If there is a match, this layer is also responsible for redirecting the input flow, as the object is read, to an available worker. Non-intercepted data flows follow the default storage path and bypass the serverless compute layer as usual.
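The trigger model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Trigger class, the find_triggers helper, and the mapping from trigger names to HTTP methods are hypothetical names based on our reading of the text, not Zion's actual data model. The sketch also uses a plain linear scan for readability, whereas Zion is described as indexing trigger metadata for O(1) matching.

```python
from dataclasses import dataclass
from typing import List

# Assumed mapping from the trigger names in the text to HTTP methods.
TRIGGER_METHODS = {
    "onPut": "PUT",
    "onPost": "POST",
    "onDelete": "DELETE",
    "onBeforeGet": "GET",
    "onAfterGet": "GET",
}


@dataclass(frozen=True)
class Trigger:
    name: str          # one of the keys of TRIGGER_METHODS
    function: str      # hypothetical name of the function to run
    prefix: str = ""   # URL prefix filter; empty matches every URL
    suffix: str = ""   # URL suffix filter; empty matches every URL

    def matches(self, method: str, url: str) -> bool:
        return (
            TRIGGER_METHODS[self.name] == method.upper()
            and url.startswith(self.prefix)
            and url.endswith(self.suffix)
        )


def find_triggers(triggers: List[Trigger], method: str, url: str) -> List[Trigger]:
    """Return every trigger whose method and URL filters match the request."""
    return [t for t in triggers if t.matches(method, url)]


if __name__ == "__main__":
    registered = [
        Trigger("onAfterGet", "compress", prefix="/v1/docs/", suffix=".txt"),
        Trigger("onPut", "scan"),
    ]
    for t in find_triggers(registered, "GET", "/v1/docs/report.txt"):
        print(t.name, "->", t.function)
```

A request that matches no trigger returns an empty list, which corresponds to the non-intercepted flow that follows the default storage path.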

The ability of the serverless execution model to quickly spawn new workers is what makes it possible to intercept and process data flows “on-the-fly” without collocation.

COMPUTATION LAYER

The computation layer is a pool of containers that executes the functions. A function is the unit of computation code that processes the data. Functions are data-driven, i.e., they are designed to intercept the data flow and process the data inline, as the object comes into or out of the storage cluster. Because of this, in our model, the response time of a function (time to first byte) must be short so as not to affect the user experience.
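To make the idea of inline processing concrete, here is a minimal sketch of a data-driven function written as a Python generator. It is not Zion's function API; the compress_stream name and the chunk-iterator interface are assumptions chosen for illustration. The function compresses an object chunk by chunk as it flows through, so output can start before the whole object has been read.

```python
import zlib
from typing import Iterable, Iterator


def compress_stream(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Compress an object chunk by chunk as it flows through the function."""
    compressor = zlib.compressobj()
    for chunk in chunks:
        out = compressor.compress(chunk)
        if out:  # zlib may buffer small inputs and emit nothing yet
            yield out
    yield compressor.flush()  # emit whatever is still buffered


if __name__ == "__main__":
    body = [b"hello " * 1000, b"world " * 1000]
    compressed = b"".join(compress_stream(body))
    print(len(b"".join(body)), "bytes in,", len(compressed), "bytes out")
```

Note that zlib buffers internally, so the first compressed bytes may appear only after some input has accumulated; a real function with a strict time-to-first-byte budget would have to account for that.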

In addition to processing the data stream, functions can store information persistently (e.g., an access counter, the timestamp of the last access, or the name of the user accessing the object). Concretely, a function can take one or more of the following actions after intercepting a lifecycle request of an object:

  • It can update the object metadata (e.g., an access counter);
  • It can generate new requests to the object storage service. This includes GET an object (for example, a dependency needed to process the main stream), PUT an object (for example, a subset of the main object as it is processed), DELETE an object, and POST metadata to another object;
  • It can generate new requests to other services (e.g., Rabbit, MongoDB), for example, to store relevant information extracted from the main stream;
  • It can update the request/response headers; and
  • It can cancel the request or rewire it to another object.

Functions may use third-party library dependencies to achieve a specific behaviour. Once developed, a function should be packed together with its dependencies in a TAR file. Consequently, functions and their dependencies must be lightweight, so as to minimize the time needed to transfer the function package from the storage system to the compute node. Once packed, a function is uploaded as a regular object to the object store. An interesting feature of Zion is that it allows the user to set the CPU and memory requirements, and a timeout value, for every function.
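The packaging step can be illustrated with Python's standard tarfile module. This is a sketch under assumptions: the pack_function helper, the file names, and the keys of the function_config dictionary are hypothetical and do not reflect Zion's real package layout or configuration format.

```python
import io
import tarfile
from typing import Dict


def pack_function(files: Dict[str, bytes]) -> bytes:
    """Pack in-memory files (function code plus dependencies) into a TAR archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)  # size must be set before adding the member
            tar.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


# Hypothetical per-function settings; the key names are illustrative only.
function_config = {"cpu_cores": 1, "memory_mb": 256, "timeout_s": 5}

if __name__ == "__main__":
    archive = pack_function({
        "resize.py": b"def run(stream):\n    return stream\n",
        "lib/util.py": b"SCALE = 2\n",
    })
    print("package size:", len(archive), "bytes")
```

The resulting bytes could then be uploaded like any other object; keeping the archive small matters because it has to travel from the storage system to the compute node before the function can start.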

The timeout is the amount of time the system waits to receive the first byte of the function’s output. If the function times out, the request is automatically cancelled. These settings allow functions to be configured differently in order to better manage certain jobs. They are not mandatory, and Zion has the last word, assigning default values when necessary.

Zion’s functions accept parameters, which can be explicit or implicit. Explicit parameters are provided as headers in the request. Implicit parameters are defaults that a user specifies ahead of time, when associating a trigger with a function. Explicit parameters take precedence over implicit ones in case of a collision. As an example, consider a function that resizes an image and takes the image resolution as a parameter. If no argument is passed in the request, the implicit image resolution is taken from the function’s metadata; if none is defined there either, an error is thrown. The same function can have different parameter sets for different triggers.
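The precedence rule can be expressed as a simple dictionary merge. The sketch below is an assumption about how such a resolution step might look; resolve_parameters and the required argument are hypothetical names, and the text does not say how Zion actually encodes parameters in headers or metadata.

```python
from typing import Dict, Iterable, Mapping


def resolve_parameters(
    explicit: Mapping[str, str],
    implicit: Mapping[str, str],
    required: Iterable[str] = (),
) -> Dict[str, str]:
    """Merge parameters so that explicit (request) values override implicit ones."""
    params = {**implicit, **explicit}  # later entries win on key collisions
    missing = [name for name in required if name not in params]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(missing))
    return params


if __name__ == "__main__":
    trigger_defaults = {"resolution": "1024x768"}  # implicit, set with the trigger
    request_headers = {"resolution": "640x480"}    # explicit, sent with the request
    print(resolve_parameters(request_headers, trigger_defaults, required=["resolution"]))
```

For the image-resize example, a request without a resolution header falls back to the trigger's default, and the call raises an error only when neither source supplies the value.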

A final important remark is that two different functions cannot intercept the same data flow in Zion unless they do so in a pipeline fashion, one after another, which raises no consistency issues.

In a compute node, functions run inside isolated environments, or containers. Each function has its own Linux container so that it does not interfere with other cloud functions. A container with a function running inside is called a worker.

A function may have zero, one, or more workers running at the same time, depending on the workload. In the traditional function model, starting a new worker takes around 6 to 7 seconds. One requirement of our model is that functions start running as soon as possible; for this reason, we leverage a ready-to-use pool of containers. In any case, our experiments verify that starting a new container takes around 0.9 seconds, which is practical enough for many synchronous and near-real-time applications. After a certain period of time, idle workers are stopped and the corresponding containers are recycled.
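The worker lifecycle described above (a ready-to-use pool, on-demand workers, and recycling after an idle period) can be sketched as a toy pool. Everything here is an assumption for illustration: the WorkerPool class, its method names, and the use of string IDs in place of real containers are hypothetical, and no container runtime is involved.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class WorkerPool:
    """Toy pool of pre-started workers; a string ID stands in for a container."""

    def __init__(self, size: int, idle_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._idle_timeout = idle_timeout
        self._next_id = 0
        # Each idle entry is (worker_id, time at which it became idle).
        self._idle: Deque[Tuple[str, float]] = deque()
        for _ in range(size):
            self._idle.append((self._start(), clock()))

    def _start(self) -> str:
        # A real system would start a container here (the slow step).
        self._next_id += 1
        return f"worker-{self._next_id}"

    def acquire(self) -> str:
        """Reuse an idle worker if one exists, otherwise start a new one."""
        if self._idle:
            worker, _ = self._idle.pop()  # most recently idle worker first
            return worker
        return self._start()

    def release(self, worker: str) -> None:
        """Return a worker to the pool once its request has finished."""
        self._idle.append((worker, self._clock()))

    def reap_idle(self) -> int:
        """Stop workers idle for longer than the timeout; return how many."""
        now = self._clock()
        reaped = 0
        while self._idle and now - self._idle[0][1] > self._idle_timeout:
            self._idle.popleft()  # oldest idle entries sit at the left end
            reaped += 1
        return reaped


if __name__ == "__main__":
    pool = WorkerPool(size=2, idle_timeout=30.0)
    worker = pool.acquire()
    pool.release(worker)
    print("reaped:", pool.reap_idle())
```

Taking the most recently idle worker first lets the older ones age out, so the pool shrinks when the workload drops; the clock is injectable so that the idle timeout can be exercised without waiting.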

 
