Thursday, April 28, 2011

Trying DotCloud for Django/Python deployment

About DotCloud


The first time I heard about DotCloud was about two months ago, at the beginning of 2011. I was very tempted to try it as soon as possible, but it seems there was a very lengthy queue of people like me, so I had to wait for quite some time. Finally, I got my account activated a few days ago.

I was so happy about it that initially I didn't even know what application to try to develop on it, as I wanted to cover as many of the services DotCloud offers as possible. In the end, I decided to go with a frontend Django application and several workers communicating with each other via messaging, as that seems to be a good basis for a lot of things I have to do these days.

Originally I wanted to use a RabbitMQ server, but unfortunately it wasn't working for some reason, which made me take a look at Celery with a Redis backend. I had never used either Celery or Redis, so it was even more interesting to learn these things as well.

So I came up with a scheme like this.

Sample Application





* Django Web Application provides a user interface for issuing tasks and obtaining their results
* MySQL is used to keep the history of issued tasks
* Celery handles all task-related operations. It passes tasks to workers for execution using Redis as the transport layer; results are also stored in Redis

As for the task, I chose a really simple one: just sending 60 ICMP ECHO_REQUEST packets to determine whether a host specified by the user is up or down. The use case is simple: the user opens a page, specifies a host to ping, and the result appears after some time.

Workers Implementation


I started with the worker implementation. I followed the tutorial on the DotCloud web site to create my first Celery worker, and it worked alright, but I had to deviate from it several times for the following reasons:

RabbitMQ service problems

When I created a rabbitmq service, it for some reason had an empty ports list, like this:


$ dotcloud info foo.rabbitmq
cluster: wolverine
config:
    password: password
    rabbitmq_management: true
    user: root
created_at: 1303573502.963711
name: foo.rabbitmq
namespace: foo
ports: []
state: running
type: rabbitmq
$

I thought it was some glitch, especially considering that this happened not long after the EC2 outage (and DotCloud runs on top of EC2). So I tried to drop this service and deploy a new one. Alas, everything was the same. I joined the IRC channel -- #dotcloud on freenode -- to ask what was wrong, and it turned out to be a known problem that is going to be fixed soon.

It's not a big deal though, because Celery supports a lot of brokers in addition to RabbitMQ, so I decided to go with Redis and use it to store results as well. So I just changed celeryconfig.py to point to Redis instead of RabbitMQ:
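For reference, here is a minimal sketch of what such a Redis-pointing celeryconfig.py looks like for the Celery 2.x series (the host, port and password below are placeholders for the credentials of your Redis service, not real values):

# celeryconfig.py -- a sketch; replace host/port/password with the values
# of your Redis service.
BROKER_BACKEND = "redis"
BROKER_HOST = "redis.example.dotcloud.com"
BROKER_PORT = 6379
BROKER_PASSWORD = "secret"

# Keep task results in the same Redis instance.
CELERY_RESULT_BACKEND = "redis"
CELERY_REDIS_HOST = BROKER_HOST
CELERY_REDIS_PORT = BROKER_PORT
CELERY_REDIS_PASSWORD = BROKER_PASSWORD

# Module(s) the worker imports to find task definitions.
CELERY_IMPORTS = ("tasks",)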


Then I implemented the actual task, which pings a given host:
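A sketch of what that task looks like (the exact code is in the repository linked below; the function name here is illustrative):

# tasks.py -- a sketch of the ping task.
import subprocess

from celery.decorators import task

@task
def ping_host(hostname):
    # Send 60 ICMP ECHO_REQUEST packets; ping exits with 0 if at least one
    # reply was received, so the return value tells whether the host is up.
    devnull = open("/dev/null", "w")
    try:
        rc = subprocess.call(["ping", "-c", "60", hostname],
                             stdout=devnull, stderr=devnull)
    finally:
        devnull.close()
    return rc == 0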


As you can see, it's pretty trivial.

However, I spotted a problem when I tried to push the code: it said that 'celeryd' could not be started.

So I logged in to the VM running Celery using the command 'dotcloud ssh foo.celery' (where foo.celery is the name of the deployment) and tried executing 'celeryd' by hand; it failed with a 'Permission denied' error. I asked on the IRC channel again -- this turned out to be a known problem as well, and it was quickly fixed for my deployment.

After that my worker was fully functional; you can check its sources and Celery configuration here: https://github.com/novel/dotcloud-sample-app/tree/master/worker.

Then I moved on to the web application.

Web Application


As mentioned above, I wanted to use the Django framework for the web application. Luckily, there's a tutorial on Django deployment on DotCloud as well. I followed it step by step and the sample Django application deployed smoothly.

Then I needed to add Celery integration. Since the worker code is deployed separately, I didn't want to import the actual task code but call tasks by name. Also, I wanted the task status to be fetched only when the user actually requests it, so I had to look tasks up by id.

The first step is to configure the Django application to use Celery. This is done by adding the django-celery and redis dependencies to requirements.txt and defining the Celery settings in settings.py:

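Roughly, the additions to settings.py look like this (a sketch; the broker credentials are placeholders and the application list is shortened):

# settings.py (excerpt) -- a sketch; fill in your own Redis credentials.
import djcelery
djcelery.setup_loader()

BROKER_BACKEND = "redis"
BROKER_HOST = "redis.example.dotcloud.com"
BROKER_PORT = 6379
BROKER_PASSWORD = "secret"

CELERY_RESULT_BACKEND = "redis"
CELERY_REDIS_HOST = BROKER_HOST
CELERY_REDIS_PORT = BROKER_PORT
CELERY_REDIS_PASSWORD = BROKER_PASSWORD

INSTALLED_APPS = (
    # ... the usual django.contrib.* applications ...
    "djcelery",
    "frontend.ping",  # full module path, see "Things to keep in mind" below
)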

And finally, a set of simple views to schedule tasks and report their status:
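Here is a sketch of those views (view and template names are illustrative; the real code is in the sample repository linked below):

# views.py -- a sketch; tasks are called by name with send_task() because
# the worker code is not importable from the web application.
from celery.execute import send_task
from celery.result import AsyncResult
from celery.task.control import inspect

from django.shortcuts import render_to_response


def schedule(request):
    host = request.POST["host"]
    result = send_task("tasks.ping_host", args=[host])
    return render_to_response("scheduled.html", {"task_id": result.task_id})


def status(request, task_id):
    # The task is looked up by id only when the user asks for its status.
    result = AsyncResult(task_id)
    context = {"state": result.state,
               "result": result.result if result.ready() else None}
    return render_to_response("status.html", context)


def workers(request):
    # List the currently active workers.
    stats = inspect().stats() or {}
    return render_to_response("workers.html", {"workers": stats.keys()})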


There is also a view which provides a list of active workers. Now, if you deploy your web application and the worker, everything should work fine. Additionally, you can add extra workers seamlessly: if you named your worker 'foo.celery', you can add more workers in exactly the same way, just use e.g. foo.celery2 for the dotcloud deploy and dotcloud push commands, as shown below.
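Assuming the second worker uses the same service type and code directory as the first one (the placeholders below are just that, placeholders), this boils down to something like:

$ dotcloud deploy -t <same-type-as-foo.celery> foo.celery2
$ dotcloud push foo.celery2 <path-to-worker-code>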

Things to keep in mind


* PYTHONPATH contains only the top-level project directory by default, so you either have to add the Django application to PYTHONPATH or use the full path for modules. Note, for example, that my 'ping' application is listed in settings.py as 'frontend.ping'; otherwise it won't work
* You don't have to add database drivers like MySQL-python to requirements.txt since they're available by default

DotCloud first impressions


* The documentation is good and fairly complete; I haven't seen any outdated or misleading information
* The DotCloud guys are very helpful; all questions get addressed very quickly on the IRC channel
* Great job done with the services. For example, I had never used Redis before and had no idea how to set it up and configure it. All I had to do was run 'dotcloud deploy -t redis foo.redis' and it was there, very good. The same goes for the WSGI deployment -- it's a shame, but I had never done that type of deployment myself, I just use 'manage.py runserver' for local testing... it's good to have a deployment engineer on the team! :) So it also went seamlessly; I don't even have to know where the logs are located, I just run 'dotcloud logs foo.www'.
* Some services are unstable or not working, like RabbitMQ (which is unusable) and Celery (which needs some manual intervention from the support team to work fine). That doesn't seem like a serious problem to me though, considering that DotCloud is currently in beta, and moreover they must have been very busy the last few days recovering from the EC2 outage.

Things I'm curious about


* Currently, there's no way to install a system-level package (like, say, ImageMagick). It was mentioned on IRC that popular packages like that will eventually be included in the base images based on user requests. I'm curious how that will be done without letting the base images sprawl.
* I need to research how scaling is implemented and what choosing a scaling strategy looks like
* I wonder what the upgrade strategy looks like. The FAQ says that upgrades are "prudent and thoroughly tested", but everyone knows that things can always break in totally unexpected ways, so what are the procedures for rolling back, sticking to a specific version of the components and so on?
* I wasn't able to find out whether any monitoring facilities are provided. Also, it's not clear what happens if some critical component or VM goes down.

Conclusion


It was fun to experiment with DotCloud and I'm going to spend some more time on it. It differs from the services I've worked with before; my impression is that its abstraction level is somewhere higher than IaaS but lower than a typical PaaS. On the one hand, you don't have to worry about the things you usually have to worry about on IaaS, like the installation and configuration of services such as MySQL. On the other hand, you still have to do a fair amount of deployment-related work, like creating databases, configuring users, manually running syncdb and other things you typically don't do on a PaaS. Also, you can view almost everything you might need, since you have SSH access to every VM, but you can change almost nothing, since it's not root access.

So I'm definitely interested to see how DotCloud evolves and hope to spend some more time using and learning it.

References


Sample Application

The sample application used in this post can be found on GitHub: https://github.com/novel/dotcloud-sample-app

Clone it:

git clone git://github.com/novel/dotcloud-sample-app.git

All you need to do to run it is fill in the credentials for your MySQL and Celery services and deploy it on DotCloud. Have fun!

Further Reading

* DotCloud FAQ
* DotCloud Django Tutorial
* DotCloud Celery Tutorial
* Celery Documentation
* Django-celery Documentation
* Sample App

Monday, April 25, 2011

A tiny client for lingvo.yandex.ru

I've been using the lingvo.yandex.ru translation service for Russian<->English translation for several years now, and have been quite happy with it. In my opinion, it's one of the best online translation services and provides better quality translations (especially for phrases) than e.g. Google Translate. The only thing I've been missing is an API, so that I, as a console geek, could have a CLI tool for quick translations. Not so long ago they added a complete-as-you-type feature. So... Firebug, 10 minutes, and the script was ready:



I wouldn't typically post a blog entry about a 50-line Python script, but I've realized it has become one of my most used CLI tools (not counting basic ones like cd or ls, obviously).

To try it just do:

# easy_install yaslov

and start using it! Like:

$ yaslov gnome

Thanks to Yandex for a wonderful service and especially for the exposed suggestion script.

P.S. If you liked that, you might also want to check out my similar script for Urban Dictionary: py-urbandict.

Thursday, April 21, 2011

Overview of GoGrid and Rackspace Load Balancing Services

Overview

Load balancing is a technique for spreading workload across several computers. These days one of its most popular applications is load balancing for web sites, i.e. spreading HTTP/HTTPS traffic across several web servers (such as Apache httpd).

A lot of web deployments are moving to the cloud, and load balancers are moving with them. I'll give an overview of the load balancing services provided by Rackspace and GoGrid.

GoGrid

The Load Balancer service is part of GoGrid's cloud offering and has been there since version 1.0 of the API, i.e. since around 2008.

GoGrid uses F5 hardware Load Balancers, as stated in the documentation.

Rackspace

Rackspace's Load Balancer service is a separate standalone service, as opposed to GoGrid's.

The service is relatively young: the private beta was announced in November 2010 and the final release happened in April 2011, just a few days before the time of writing. I have been using it since the first private beta, and it had already become quite stable by the beginning of 2011.

The Rackspace offering is based on Zeus software.

API

Just as a theatre begins with the cloakroom, a service begins with its API. Let's look at what the API allows us to do with load balancers.

GoGrid

As mentioned above, the Load Balancer API is just a subset of the GoGrid API, with all its pros and cons. It provides CRUD (create, read, update, delete) operations for load balancers.

Actually, it doesn't follow the REST concept very closely, because:

* It doesn't use any HTTP method but GET, so e.g. to add a new balancer you make a GET request to a URL like 'loadbalancer/add' instead of a POST to 'loadbalancer', and to delete it you call GET on 'loadbalancer/delete' instead of DELETE, etc. (see the illustration below)
* It doesn't have the concept of element URIs, only collections. So, to get details on a balancer you request 'loadbalancers/get?id=bal_id' instead of 'loadbalancers/bal_id/'
* It uses GET arguments for passing options instead of a serialised object in the request body
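To illustrate the difference in flavour using the URLs above (paths abbreviated):

GET    .../loadbalancer/add?name=...&...    (GoGrid: create via GET plus query arguments)
GET    .../loadbalancer/delete?id=...       (GoGrid: delete via GET)

POST   .../loadbalancers                    (REST style: create, object serialised in the request body)
DELETE .../loadbalancers/bal_id             (REST style: delete an element URI)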

It's neither good nor bad that it doesn't follow REST closely -- REST is not a standard, after all, and the fact that you don't need to bother with HTTP methods other than GET, or with serialisation, might even be convenient for some. I will provide some analysis of API usability from a programmer's point of view later in the post.

So, what is it all about? You create a balancer and specify the IPs and ports of the nodes that will share the load. An important thing to know is that you have to use IP addresses assigned to your GoGrid account; otherwise the call will fail with a not very descriptive internal error.

Besides specifying the IP list, you can tweak some load balancer options, such as type and persistence. Currently GoGrid supports two types of load balancing:

* Round Robin (default)
* Least Connect

And for persistence, the options are:

* No persistence (default)
* SSL Sticky
* By source address

Once a balancer has been created, you can change only the list of IPs and ports it balances load for. One caveat is that you have to pass the complete list of IPs rather than incrementally adding or removing them one by one, so be careful not to run into a race condition (see the sketch below).
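In pseudo-Python, with a hypothetical client object (not the actual GoGrid bindings), adding a single node therefore turns into a read-modify-write cycle:

def add_node(client, balancer_id, ip, port):
    # Hypothetical client object -- illustrates that GoGrid requires the
    # FULL list to be sent back, not just the new entry.
    balancer = client.get_balancer(balancer_id)
    ips = list(balancer.ip_list)                 # current ip:port pairs
    ips.append({"ip": ip, "port": port})         # add the new node locally
    client.edit_balancer(balancer_id, ip_list=ips)   # replaces the whole list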

With GoGrid you can have up to 6 load balancers per account and up to 3 load balancers per data center.

Rackspace

The Rackspace Load Balancer API is, obviously, centered around CRUD operations for load balancer objects as well, though it supports far more than that.

It seems to follow REST quite closely: it uses collection and element URIs, correct HTTP semantics and correct content types.

When creating a balancer, you can specify not only IPs that belong to your Rackspace account, but basically any IP you want. I think it makes a lot of sense to provide such flexibility if you're running a hybrid setup and, say, have part of your nodes in your own data center and part of them running in the cloud.

Additionally, you can tweak quite a lot of options. Let's look at the most important ones: balancing type and persistence. As for the balancing type, it provides a wide range of options:

* Round Robin
* Least Connect
* Random
* Weighted Least Connect
* Weighted Round Robin

For the last two options, the node weight (which you can assign to each node) is taken into account. The type has to be specified at creation time, so there is no default value.

As for session persistence, only HTTP cookie persistence is supported.

Programming Specifics

There is one specific thing about building something on top of a load balancing API: unlike cloud servers, where you mostly work with atomic objects (like the server itself), with load balancers you operate on a collection of nodes you want to balance between. And here two race conditions are possible.

The first one is caused by the fact that the load balancer needs to be reconfigured after you add or remove a node, and if you try to modify the node list during re-configuration, you'll get an error. Moreover, GoGrid doesn't have a special state which says that the balancer is being re-configured at the moment. Rackspace has such a state, but a race condition is still possible; imagine the following scenario:

* Balancer B is in the 're-configuring' state
* Apps A1 and A2 want to add a new node to B
* Apps A1 and A2 see that B is immutable and start polling its status until it becomes 'Ready'
* A1 gets there first
* A2 fails

This is not a very serious problem, but it makes coding a little more complicated.
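Here is a sketch of the retry loop this forces on the client (the client object and the exception type are hypothetical stand-ins, not the real GoGrid or Rackspace bindings); note that even this only narrows the window for the race described above:

import time


class BalancerImmutableError(Exception):
    """Hypothetical: raised while the balancer is being re-configured."""


def add_node_with_retry(client, balancer_id, ip, port,
                        timeout=120, poll_interval=5):
    # Keep retrying while the balancer reports itself as immutable.  Two
    # applications doing this concurrently can still collide: both may see
    # the balancer as mutable, and only one of the requests will win.
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            return client.add_node(balancer_id, ip, port)
        except BalancerImmutableError:
            time.sleep(poll_interval)
    raise RuntimeError("balancer %s stayed immutable for %d seconds"
                       % (balancer_id, timeout))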

The second possible race condition is specific to the GoGrid implementation, because of its model of keeping the IP list as a whole, without support for adding/removing individual IPs. Imagine a slightly modified version of the previous scenario:

* Balancer B is in the 're-configuring' state
* App A1 wants to add IP I1 to B, so the IP list would become [B.ips] + [I1]
* App A2 wants to add IP I2 to B, so the IP list would become [B.ips] + [I2]
* Apps A1 and A2 keep retrying their requests until they succeed
* App A1 gets there first, and B.ips becomes [B.old_ips] + [I1]
* App A2 still fails because B turns 're-configuring' again
* App A2 finally succeeds with its request and updates the list of IPs to [B.old_ips] + [I2]
* As a result, I1 that should have been added is actually missing

Again, it's not that this is an unsolvable problem, but it takes quite an effort to solve.

And finally, some numbers. I took the Rackspace and GoGrid load balancer drivers from libcloud trunk, which implement the same common interface, and gathered these metrics:

* Driver's lines of code (loc)
* Unit-tests' lines of code (test loc)
* Lines of fixtures (i.e. content of responses)



You know that there are lies, damned lies, and statistics, so it's up to you to analyse these numbers. :-)

Tuesday, April 12, 2011

libcloud load balancers feature status

Background

If you follow the libcloud mailing list, you are probably aware that I've started working on adding load balancer support. If not, please give it a quick look.

Current status

At this point I have implemented almost all the features I had planned, namely:


  • Defined a common interface
  • Implemented drivers for Rackspace and GoGrid
  • Covered basic functionality with unit tests
  • Implemented load balancer support in lc-tools and run-time tested it

Currently the interface is pretty basic and supports only the main operations: CREATE/READ/DELETE for balancers and ADD/LIST/REMOVE for their nodes. Obviously, there are far more operations on load balancers that could be useful, such as balancing algorithm and session handling options, but I need some time to play around with these things to understand how to plug them in better.

One more thing I'm considering is adding support for blocking operations. It's quite common to find a balancer in an immutable state: for example, when you add a new node to a balancer, it needs some time to perform the initial configuration, and during that period you cannot perform operations on it, such as removing or adding more nodes. My current implementation just throws an exception in that case, so the user can figure out that the object is immutable and retry after some time.

However, it might not be very convenient for the user to wrap every call just to catch such situations, so I'm thinking about adding support for a blocking mode, where a call blocks until the operation succeeds or a timeout is reached; blocking/non-blocking mode could be specified at driver init time. Most likely that's the way things will be done.
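To illustrate the idea, here is a rough sketch of how the blocking mode could look inside a driver (class and method names are illustrative only, not the actual libcloud code):

import time


class ImmutableStateError(Exception):
    """Illustrative: raised when the balancer cannot be modified yet."""


class SketchDriver(object):
    # Illustrative sketch: blocking behaviour and timeout are chosen once,
    # at driver init time.
    def __init__(self, blocking=False, timeout=120, poll_interval=5):
        self.blocking = blocking
        self.timeout = timeout
        self.poll_interval = poll_interval

    def _call(self, func, *args):
        if not self.blocking:
            return func(*args)            # non-blocking: may raise immediately
        deadline = time.time() + self.timeout
        while True:
            try:
                return func(*args)
            except ImmutableStateError:
                if time.time() >= deadline:
                    raise                 # give up after the timeout
                time.sleep(self.poll_interval)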

Examples

Meanwhile, take a look at this example to understand how things look at the moment:

example_lb.py

Also, as I mentioned above, I've added basic support for balancer manipulation to lc-tools. I will cover this topic a bit later when things settle down a little, but here's a quick overview of how it feels:

Giving it a try

To give it a try, check out the balancers branch of my libcloud fork on github:


git clone git://github.com/novel/libcloud.git --branch balancers

and then install it like you usually do. The example file mentioned above is useful to get started. Additionally, it might be useful to look at the lb-* scripts in the lc-tools lb branch.

Further reading