Nuclio on Kubernetes (Part 1)


Nuclio is an open source project developed by Iguazio, an Israeli company whose platform provides continuous, real-time, high-speed data processing. Its team believes that current serverless services can provide the most fundamental form of FaaS: using functions to integrate with other services, in order to deliver services or applications built on that integration.

A serverless function is designed to be single-threaded and event-driven. It runs only when an event invokes it.

This means every function can be triggered by a different event and connected to the services it requires, so that it effectively delivers an integrated service or application. Besides Function as a Service (FaaS), serverless architecture is also called Back-end as a Service (BaaS) in many articles and blog posts. Yaron Haviv, CTO of Iguazio, believes that current serverless solutions cannot completely meet Iguazio’s requirements, because serverless still has the following problems to solve:

Low performance

Serverless architecture depends heavily on communication between functions and services, so it suffers from the I/O overhead of crossing machines or services. Two factors in particular increase the round-trip delay time (RTT):

  1. Each function is single-threaded. Although this makes debugging easier and avoids the pitfalls of multi-threading, it also means that a function cannot use concurrency to handle data access. In other words, if the data a function needs is not yet available, the function blocks and cannot keep running in real time.
  2. Each function is stateless. Any function that needs state must therefore connect to other services to obtain it. Invoking functions frequently increases the cost of communication, data replication, and context switching.

To solve the first problem, Iguazio uses the Go language to implement a non-blocking data access method, which lets a function obtain the data it needs in real time. To solve the second problem, Iguazio uses shared memory to achieve zero-copy data movement, which reduces the number of context switches. In addition, to cope with heavy request traffic, Nuclio can scale out function containers and redirect incoming data to each of them. With these methods, a single Nuclio process can serve 400,000 function requests per second, which is 10 to 100 times faster than major serverless solutions.
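The benefit of non-blocking data access can be illustrated with a minimal sketch. This is not Nuclio’s Go implementation; it is a Python `asyncio` example under stated assumptions, where `fetch_state` is a hypothetical stand-in for a remote data service and the 50 ms delay is an invented latency. It shows that a single thread can keep many invocations in flight while each one waits for its data.

```python
import asyncio
import time
from typing import Dict, List, Tuple


async def fetch_state(key: str, delay: float = 0.05) -> str:
    """Hypothetical remote data lookup; the sleep simulates network latency."""
    await asyncio.sleep(delay)  # yields control instead of blocking the thread
    return f"state-for-{key}"


async def handler(event: Dict[str, str]) -> Dict[str, str]:
    """Function body: waits for its data without stalling other invocations."""
    state = await fetch_state(event["key"])
    return {"key": event["key"], "state": state}


async def process_all(events: List[Dict[str, str]]) -> List[Dict[str, str]]:
    # gather runs every handler concurrently on one thread and keeps input order
    return list(await asyncio.gather(*(handler(e) for e in events)))


def run_demo(n: int = 20) -> Tuple[List[Dict[str, str]], float]:
    """Process n events and return the results with the elapsed wall time."""
    events = [{"key": f"k{i}"} for i in range(n)]
    start = time.perf_counter()
    results = asyncio.run(process_all(events))
    return results, time.perf_counter() - start
```

With blocking access, 20 events at 50 ms each would take about one second in a single thread; here the waits overlap, so the total stays close to the latency of one lookup.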

Limited platform, event, and data sources

There are currently many serverless services, but each FaaS offering supports a different set of event and data sources, such as HTTP gateway, Kinesis, and SQS. Each event source generates a different event structure, as shown in Fig. 1. If event triggers are not universal across platforms, developers are locked in to one particular platform. For example, AWS Lambda functions can only be triggered by AWS’s own services, such as Kinesis.

To address this, Nuclio proposes a Common Event Structure, as shown in Fig. 2. It classifies and integrates all kinds of protocols and event sources, and it provides multiple options for serialization. Nuclio’s Common Event Source Approach was proposed as a specification at CNCF, in the hope that other serverless platforms will follow it. The goal is to reduce the complexity and platform lock-in caused by the variety of event sources, so that functions can be quickly deployed and connected to different event sources without modification, which greatly improves function portability. As the first proposer of this idea, Nuclio lets the same function code switch event sources in a plugin-like way, simply by changing the Event Consumer API configuration. Supported sources include HTTP, Kinesis, Kafka, RabbitMQ, MQTT, NATS, Iguazio’s V3IO, File Content, and an Event emulator.
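The idea of a common event structure can be sketched as follows. The class and field names here (`CommonEvent`, `from_http`, `from_stream_record`, `sequence_number`, `partition_key`) are assumptions chosen for illustration and do not reproduce Nuclio’s actual event schema. The point is that adapters normalize source-specific payloads, so the handler itself never changes when the event source does.

```python
import base64
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class CommonEvent:
    """Hypothetical source-independent event seen by every function."""
    source: str
    event_id: str
    body: bytes
    headers: Dict[str, Any] = field(default_factory=dict)


def from_http(request: Dict[str, Any]) -> CommonEvent:
    """Adapter for an HTTP-style request (illustrative shape only)."""
    headers = dict(request["headers"])
    return CommonEvent(
        source="http",
        event_id=headers.get("X-Request-Id", ""),
        body=request["body"].encode("utf-8"),
        headers=headers,
    )


def from_stream_record(record: Dict[str, Any]) -> CommonEvent:
    """Adapter for a stream-style record whose payload is base64-encoded."""
    return CommonEvent(
        source="stream",
        event_id=record["sequence_number"],
        body=base64.b64decode(record["data"]),
        headers={"partition_key": record["partition_key"]},
    )


def handler(event: CommonEvent) -> str:
    """Function code: identical regardless of which source produced the event."""
    payload = json.loads(event.body)
    return f"hello {payload['name']}"
```

Switching from HTTP to a stream source then only means choosing a different adapter outside the function, which is the portability the common structure is meant to provide.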

Complex function app state maintenance, code dependencies, and service dependencies

As mentioned in the second reason under “Low performance” above, each function is stateless, so acquiring state or content, saving results, and sending messages all depend on external data services. To use a data service such as MySQL, developers must write or specify a fixed database connection, authentication code, or environment parameters inside the function. These data sources have the same problem as the event sources discussed above: when the data type or repository behind a function changes, the function loses its portability, because the same code no longer works.

To reduce the complexity of data sources and to resolve the code and service dependencies, Nuclio’s team adopts the data-binding rules promoted by Azure. It provides a pluggable data-binding method, exposes data access through universal API calls, and moves the details of how to bind to and authenticate against the underlying data source outside the scope of the function. In other words, responsibility for data binding shifts from developers to operators, which lets developers focus on writing functions and keeps those functions portable. This method provides the following features:

  1. Simplicity: With a simple API, code does not need to deal with data-binding details or SDKs.
  2. Security: Private information, such as the credentials used by a data binding, no longer appears in variables or within the scope of the function.
  3. Portability: Abstracting the data binding lets a function change its data source with ease.
  4. Reusability: The same code, linked to different data sources, can serve different applications.
  5. Performance: By linking Nuclio components, data binding can provide non-blocking I/O and zero-copy access.
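A minimal sketch of the data-binding idea is shown below. All names (`InMemoryBinding`, `SqliteBinding`, `Context`, the `"counters"` binding key) are hypothetical and are not Nuclio’s API. The handler only calls a generic `get`/`put` interface, while the choice of backend and its connection details are supplied from outside the function, as an operator would do through configuration.

```python
import sqlite3
from typing import Dict, Optional


class InMemoryBinding:
    """Hypothetical binding backed by a plain dictionary."""

    def __init__(self) -> None:
        self._items: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._items.get(key)

    def put(self, key: str, value: str) -> None:
        self._items[key] = value


class SqliteBinding:
    """Hypothetical binding backed by SQLite; the path is operator-supplied."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)"
        )

    def get(self, key: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT v FROM kv WHERE k = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    def put(self, key: str, value: str) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
        )
        self._conn.commit()


class Context:
    """Hypothetical runtime context that carries the configured bindings."""

    def __init__(self, data_bindings: Dict[str, object]) -> None:
        self.data_bindings = data_bindings


def handler(context: Context, event: Dict[str, str]) -> int:
    """Counts visits per user; contains no connection or credential code."""
    counters = context.data_bindings["counters"]
    current = int(counters.get(event["user"]) or 0)
    counters.put(event["user"], str(current + 1))
    return current + 1
```

Because the handler depends only on the binding interface, replacing `InMemoryBinding` with `SqliteBinding` requires no change to the function code.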

Functions cannot be developed, debugged, tested, and deployed in hybrid or multi-cloud environments

This problem can be divided into two parts. The first concerns how a single serverless provider supports developing, debugging, testing, and deploying functions. The second concerns how to let functions run with different providers or cloud environments.

Firstly, serverless functions are not easy to debug. AWS Lambda, for example, does not support debugging with breakpoints, so developers must rely on log output. Moreover, almost all serverless functions are packaged as containers that wait for an event trigger, so running a function essentially means deploying a container. Because every function deployment is a container deployment, if regression testing of a function cannot be automated, each test of modified code for the same function version may require deploying new containers and deleting the old ones. Log parsing is also an important issue, since the CI/CD process generates a large amount of log information. Nuclio offers several solutions:

  1. Provides many non-intrusive structured and unstructured logging outputs, such as screen, HTTP, file, and log stream, with a dynamically controlled verbosity level. Developers can generate logs from different log or debug points and change the verbosity level directly to parse logs and obtain more meaningful information. This lets developers analyze bugs and failures more effectively without modifying functions, and it also supports better automated function testing and analysis.
  2. Provides a variety of runtime SDKs, such as Go, Java, and .NET, and may support more programming languages in the near future. Developers simply import the SDK within the scope of a function, and they can then set breakpoints and use auto-completion. The runtime SDK gives developers an environment similar to a local IDE. It simulates the function being triggered by events and connected to data sources, so that developers can obtain the input and output of a simulated test and display logs on screen or save them to files.
  3. Provides event simulations, such as files, HTTP, and streams, for regression testing. Developers can observe the behavior of a function without modifying its code.
  4. Plans to add more debugging and instrumentation support in the future, including integration with statistics debugging and OpenTracing.
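The combination of adjustable log verbosity and event simulation described above can be sketched with Python’s standard `logging` module. This is an illustration under stated assumptions, not Nuclio’s SDK: `make_logger`, `replay`, and the event shape with a `values` list are hypothetical. Recorded events are replayed against an unmodified handler, and the verbosity is raised at runtime only when more detail is needed.

```python
import io
import json
import logging
from typing import Any, Dict, List


def make_logger(stream: io.StringIO, level: int = logging.INFO) -> logging.Logger:
    """Build a logger that writes to the given stream at the given level."""
    logger = logging.getLogger("function-under-test")
    logger.handlers.clear()
    logger.propagate = False
    stream_handler = logging.StreamHandler(stream)
    stream_handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(stream_handler)
    logger.setLevel(level)
    return logger


def handler(logger: logging.Logger, event: Dict[str, Any]) -> int:
    """Function under test: sums the values carried by the event."""
    logger.debug("raw event: %s", json.dumps(event, sort_keys=True))
    total = sum(event["values"])
    logger.info("computed total=%d", total)
    return total


def replay(
    events: List[Dict[str, Any]],
    expected: List[int],
    logger: logging.Logger,
) -> List[int]:
    """Replay recorded events and return the indices whose output mismatched."""
    failures = []
    for index, (event, want) in enumerate(zip(events, expected)):
        if handler(logger, event) != want:
            failures.append(index)
    return failures
```

A regression run keeps the level at `INFO`; when a replay reports failures, calling `logger.setLevel(logging.DEBUG)` and replaying again exposes the raw events without touching the function code.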

Secondly, developers may need to run functions in the cloud. Besides cost and platform limits, similar functions may also need to run on different devices or edge servers for IoT or edge computing. Overcoming platform limits may depend on cooperation among serverless providers, as discussed in the second and third problems above, so that functions become portable across devices. For IoT or edge computing on a single platform, Nuclio gives functions version numbers and supports distributed, automated deployment, which makes rolling or canary deployments possible.
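To make the link between function versions and canary deployment concrete, here is a minimal sketch of weighted routing between two versions. It does not describe how Nuclio itself routes traffic; `pick_version`, the version labels, and the percentage weights are assumptions for illustration. Hashing the request ID keeps the choice deterministic, so the same request always reaches the same version.

```python
import hashlib
from typing import Dict


def pick_version(request_id: str, weights: Dict[str, int]) -> str:
    """Choose a function version for a request.

    weights maps a version label to its traffic share in percent;
    the shares must add up to 100.
    """
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    threshold = 0
    for version in sorted(weights):
        threshold += weights[version]
        if bucket < threshold:
            return version
    raise AssertionError("unreachable when weights sum to 100")
```

A canary rollout would start with something like `{"v1": 90, "v2": 10}` and shift the weights toward `v2` as confidence grows; a rolling deployment ends at `{"v1": 0, "v2": 100}`.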

To be continued

by 呂威廷, Engineer at inwinSTACK (迎棧科技)

