Friday, 15 April 2016

Releasing Applications as Native OS Packages

Agility when building, testing, packaging and deploying software is certainly a key quality to pursue. There is no specific recipe to get there, but among the several things that need to be done, avoiding manual steps and not reinventing the wheel by relying on well-established solutions from the community are two of them. Going in this direction, there is a very interesting project maintained by Netflix called Nebula.

The Nebula project is a series of individual Gradle plugins, each focused on providing a very specific piece of functionality for the usual development pipeline tasks. Today I'm gonna talk about one of those, nebula-os-package.

The main idea

The main idea behind it is to package a JVM-based application and its metadata as a native OS package. The plugin is able to generate Deb and RPM packages, which are the most popular package formats in the Linux world.

Using


First of all, add the plugin to your build.gradle file.


Then, specify the plugin dependency.

Now you need to say how the application is gonna be laid out on the host after the package is installed.

A couple of important things are happening here:
  • Specifying the package name and version (lines 2 and 3).
  • The directory under which the package is gonna be placed once installed on the target host (line 5).
  • All jars produced by Gradle during the build (the application jar and its dependencies) are gonna be placed under /opt/packageName/lib on the target host (line 8).
  • The same goes for configuration files under the resources folder (line 18).
  • The scripts generated by Gradle when building a Java application are gonna be used to start it up on the target host (line 11).
With everything properly set, just execute the build command accompanied by the package task specified in the build file. The Debian package is gonna be placed at projectName/build/distributions.

So What?


Someone could argue:
  • Why should I use this?
  • Is it better than building a fat jar with all its dependencies inside?
  • The Gradle application plugin takes care of the whole application start-up for me, generating useful scripts.
Yes, these are all valid points. Actually, this is the way we've been doing it so far when releasing applications outside of the J2EE world. But doing it like this, tasks like deploying, starting/stopping, updating and removing applications are all on you. Scripts will need to be created to manage all of these, so that's one more thing that Ops and Dev teams will need to care about.
When deploying applications as native OS packages, you can leverage a whole set of tools that are already there, and none of the scripts mentioned before would be needed. This is a valid point, since it affects agility when releasing and maintaining software.

Here I have a working example in case you wanna try it out.

Cheers,



Friday, 8 April 2016

Google Pub/Sub. An Alternative Tool for Data Ingestion.

Less than one year ago, Google officially launched a new cloud messaging service, Google Pub/Sub. I have to confess I only looked at it more carefully after reading a very nice blog post by the Spotify engineering team describing their experience testing this service.
As the name suggests, the processing model is based on the publish/subscribe pattern, implemented by most of the best brokers available on the market.
Message consumption works in two modes:
  • Polling: You can configure your client to poll the topics from time to time.
  • Push: You register a URL that the Google Pub/Sub service is gonna call when messages arrive at the topic (webhook-like).
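
To make the difference between the two modes concrete, here is a minimal in-memory sketch. It is not the Google Pub/Sub client API: `ToyTopic`, `register_push_handler` and `pull` are hypothetical names made up for illustration, and a plain Python callback stands in for the registered URL.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class ToyTopic:
    """In-memory stand-in for a topic; NOT the real Pub/Sub API."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()
        self._push_handler: Optional[Callable[[str], None]] = None

    def register_push_handler(self, handler: Callable[[str], None]) -> None:
        # Push mode: the handler plays the role of the registered URL.
        self._push_handler = handler

    def publish(self, message: str) -> None:
        if self._push_handler is not None:
            # Push mode: the service calls the consumer as soon as a message arrives.
            self._push_handler(message)
        else:
            # Polling mode: the message waits until the consumer asks for it.
            self._pending.append(message)

    def pull(self, max_messages: int = 10) -> List[str]:
        # Polling mode: the consumer decides when to fetch and how many.
        batch: List[str] = []
        while self._pending and len(batch) < max_messages:
            batch.append(self._pending.popleft())
        return batch


# Polling: nothing happens until the client pulls.
polled = ToyTopic()
polled.publish("a")
polled.publish("b")
batch = polled.pull()

# Push: the handler is invoked on every publish.
received: List[str] = []
pushed = ToyTopic()
pushed.register_push_handler(received.append)
pushed.publish("c")
```

With polling the consumer controls its own pace, which helps when back pressure matters; with push the consumer must be reachable and fast enough to keep up.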

Besides that, here are some characteristics that, in my opinion, make this tool an interesting solution:

Durability

Messages sent to the Google Pub/Sub messaging system won't be lost. Google guarantees message retention for up to 7 days, which is pretty reasonable. This is a must in scenarios where back pressure needs to be implemented and other consistency requirements need to be met.


Reliability 

The Google Pub/Sub messaging service is available in almost all regions where their cloud service operates (North America, Europe and Asia). The advantage here is multi-region availability and replication. It is entirely Google's responsibility to keep everything in sync and available for you. The operational costs of making systems like this work on premises are sky high, so relying on a third-party service is a very relevant point to consider.


Low Latency and High Throughput

The results shown by the Spotify engineering team and the Google documentation itself are very encouraging (1M messages/second). That makes it an option worth considering for near-real-time processing systems. We should never trust benchmarks 100%, and ideally we should try it ourselves, but similar numbers are in the Google documentation as well, so it's relevant.


APIs

Choose your flavor

  • Java
  • Python
  • Objective-C
  • Go
  • PHP
  • JavaScript
  • REST
Once you pick one, it's reasonable to check the performance to be sure it meets your needs. The Spotify team had some issues with the Java API, so they moved to REST.

Billing model

It looks pretty fair ($0.40 per million messages for the first 250M). Just be aware that the real cost is also gonna consider:
  • Number of API calls: Sending, consuming and acking a message are considered 3 separate API calls. The message size is also taken into account. Google charges in blocks of 64 KB, so if the message contains 128 KB it's gonna count as 2 calls.
  • Storage
  • Network
So, the final price may be tricky to calculate. Be aware that it may cost more than you expect.
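
As a rough back-of-the-envelope sketch, the API-call part of the bill can be estimated like this. It only uses the numbers quoted above and rests on a few assumptions of mine: the 64 KB block rule applies to each of the 3 calls, the $0.40 rate applies per million billable calls, and an empty message still counts as one call. Storage and network are left out, and since I only have the price for the first 250M, the function refuses to guess beyond that.

```python
BLOCK_BYTES = 64 * 1024           # Google charges in blocks of 64 KB
CALLS_PER_MESSAGE = 3             # send + consume + ack
PRICE_PER_MILLION_USD = 0.40      # rate quoted for the first 250M
FIRST_TIER_LIMIT = 250_000_000    # pricing beyond this is not covered here


def billable_units(message_bytes: int) -> int:
    """Number of 64 KB blocks a message occupies (assumed minimum of 1)."""
    if message_bytes < 0:
        raise ValueError("message_bytes must be non-negative")
    # Integer ceiling division, so 64 KB + 1 byte already counts as 2 blocks.
    return max(1, -(-message_bytes // BLOCK_BYTES))


def estimate_call_cost_usd(messages: int, message_bytes: int) -> float:
    """Estimate the API-call cost only (no storage, no network)."""
    # Assumption: every one of the 3 calls is multiplied by the block count.
    calls = messages * CALLS_PER_MESSAGE * billable_units(message_bytes)
    if calls > FIRST_TIER_LIMIT:
        raise ValueError("beyond the first 250M; tier pricing not modelled")
    return calls * PRICE_PER_MILLION_USD / 1_000_000


# 10M messages of 128 KB each: 10M * 3 calls * 2 blocks = 60M billable calls.
cost = estimate_call_cost_usd(10_000_000, 128 * 1024)
```

Under these assumptions, 10M messages of 128 KB come out at 60M billable calls, six times what the headline per-message price suggests.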

When talking about data ingestion, this is definitely a tool to keep on the radar. The comparison with Apache Kafka is inevitable at this point; in my opinion they are similar in some aspects but differ in others:

  • They are both production ready for real-time log analytics. There are several benchmarks on the Web using both and showing they are robust alternatives. That said, Apache Kafka has been on the market longer, which gives it some advantage.
  • They differ in the processing model. Unlike Kafka, Google Pub/Sub doesn't let consumers rewind on the topic. On this point, Google Pub/Sub works more like a traditional messaging system (without the overhead). Before being released as a product, Google Pub/Sub was a tool used internally by Google teams, and they haven't had this requirement so far. I won't be surprised to see this feature in upcoming releases.
In my opinion, the ability to reprocess messages in a topic makes Apache Kafka a more relevant option when implementing a Kappa Architecture. Still, it's always interesting to see another option for data ingestion on the market. More use cases would definitely make it more popular.
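
The difference in processing model can be illustrated with two toy classes. Neither is the Kafka nor the Google Pub/Sub API: `ReplayableLog` and `AckQueue` are hypothetical names, and the sketch only shows why a consumer can reprocess in one model and not in the other.

```python
from typing import Dict, List, Tuple


class ReplayableLog:
    """Kafka-like model: records stay in the log and are read by offset."""

    def __init__(self) -> None:
        self._records: List[str] = []

    def append(self, record: str) -> None:
        self._records.append(record)

    def read_from(self, offset: int) -> List[str]:
        # Reading does not remove anything, so any offset can be re-read.
        return list(self._records[offset:])


class AckQueue:
    """Traditional messaging model: an acked message is gone for good."""

    def __init__(self) -> None:
        self._unacked: Dict[int, str] = {}
        self._next_id = 0

    def publish(self, message: str) -> None:
        self._unacked[self._next_id] = message
        self._next_id += 1

    def pull(self) -> List[Tuple[int, str]]:
        # Returns every message that has not been acknowledged yet.
        return list(self._unacked.items())

    def ack(self, message_id: int) -> None:
        # Once acked, the consumer has no way to get the message back.
        self._unacked.pop(message_id, None)


log = ReplayableLog()
queue = AckQueue()
for event in ("e1", "e2"):
    log.append(event)
    queue.publish(event)

first_pass = log.read_from(0)
replay = log.read_from(0)      # same records again: reprocessing is possible

for message_id, _ in queue.pull():
    queue.ack(message_id)
leftover = queue.pull()        # empty: nothing left to reprocess
```

That replay step is what a Kappa Architecture leans on: when the processing code changes, the same log is simply read again from an earlier offset.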

There are some producer examples here. Google lets you use this service for free for 60 days, which is awesome for POCs in case you wanna try it.

Cheers,