
Using Ruby FFI

This week, I had to work with a core C library, and there was no reason for me to port it to Ruby. I came across the wonderful FFI gem. With FFI, calling external libraries is very easy. I think one of the real strengths of FFI is that it lets you leverage native C libraries, but it works just as well for wrapping custom libraries or projects. Writing the wrappers takes a little work, which can be slightly tedious depending on the size of the library and the number of functions you want to use, but it is still worth the time. There are a few good documentation resources on the wiki page; the ffi examples were especially useful for getting started quickly.

These are the steps -

1. Create a shared library object for the C code that you want to use. On a Unix-based system the following commands will help. You might have to link additional libraries depending on the code's dependencies.

gcc -fPIC -c mylibrary.c
gcc -shared -o mylibrary.so mylibrary.o

2. Create a Ruby wrapper based on your .h file.

3. Start using your function calls. 

I have come up with a few use cases that cover some of the more common function call types. Let us look at some test functions. For lack of a better name, mylibrary.h is defined as -

I have created four test functions, which show different argument flavors. Using the above commands, I have compiled my shared library object into say, a file called mylibrary.so. 

The ruby wrapper follows the structure outlined below -

require 'ffi'

module MyLibrary
  extend FFI::Library
  ffi_lib './mylibrary.so'  # the shared object built in step 1
  attach_function ...
end

In simple terms, the attach_function call declares a C function so that it can be called from Ruby land. For the above code, my wrapper file looks as below -

attach_function follows the pattern of name, input argument types, return type. It is straightforward: (char *) maps to :string, (double *) to :pointer, (int) to :int, and so on. You can find the list of types in the FFI git repo (https://github.com/ffi/ffi/blob/master/lib/ffi/types.rb).

Let us look at the sample code for calling these functions in both C and Ruby.

In C, using these functions is straightforward. In Ruby, most of them are straightforward too, except the last function, test_function_4. This function takes an array and assigns values to its first few elements. In Ruby, this array has to be declared as an FFI::MemoryPointer, which takes three arguments: the element type, the number of elements, and a flag that zero-fills the memory -

dd = FFI::MemoryPointer.new(:double, 4, true)

I can pass this object to the function that expects the array. One thing that stumped me: I was trying to read the values back with the get_double method (e.g. dd.get_double(0), dd.get_double(1), ...), and the results were not right. The argument to get_double is an offset in bytes, not an element index, so dd.get_double(1) reads from the middle of the first double. I found out that the right way to read the array is the get_array_of_double method.

The entire code can be downloaded from this git repo.

Binary data over HTTP in Ruby

Today, I had to read a binary file and transmit it over an HTTP POST. Here is what I did, which worked perfectly on my localhost but behaved strangely on Heroku/production.

contents = open("#{Rails.root}/tmp/#{fname}.data", "rb") {|io| io.read }
HTTParty.post("#{params["url"]}", {:body => contents, :headers => {'Content-Type' => 'avro/binary',  'Authorization' => params["token"]}})

On the other end, I was dumping this binary data into a file for post-processing. In development, this worked fine. In production, however, I was getting a weird error when I tried to read the data back from the binary file. The error was something along the lines of -

ArgumentError ("negative length" -

After much digging, I traced it down to an encoding issue: the data was getting modified somewhere during the POST operation. The right way to do this is to encode the data using Base64 before sending it and to decode it on the receiving end.

contents = open("#{Rails.root}/tmp/#{fname}.data", "rb") {|io| Base64.encode64(io.read) }
HTTParty.post("#{params["url"]}", {:body => contents, :headers => {'Content-Type' => 'avro/binary',  'Authorization' => params["token"]}})

This preserves the data, which makes sense: Base64 output is plain ASCII text. I am still not sure why the original version worked on my localhost.
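The receiving side is not shown here, so the following is only a sketch of the round trip, using a made-up payload in place of the real file. It uses Base64.strict_encode64, which, unlike encode64, does not insert line breaks into the output; Base64.decode64 accepts either form.

```ruby
require 'base64'

# Hypothetical payload: raw bytes that are not valid text.
original = [0, 255, 128, 10, 13, 0, 200].pack('C*')

# Sender: turn the bytes into ASCII-safe text.
encoded = Base64.strict_encode64(original)

# Receiver: recover the exact bytes before writing them to a file.
decoded = Base64.decode64(encoded)

puts encoded
puts decoded == original   # prints true
```

Without the decode step the receiver would be writing Base64 text to disk rather than the original binary data.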

 

Dabbling with Avro

Avro is a data serialization system that is impressive in terms of the data structures it provides. Avro relies on schemas. The schema is stored along with the data when an Avro data file is written, so the same schema is always available when reading or de-serializing that data. This is a really cool feature: the serialized data is completely self-describing. In addition, serialization can be very fast. There are a few good examples and discussions on using Avro with Ruby. Check them out.

X.commerce uses Avro for defining message contracts, which makes it possible to describe and validate messages easily. However, the message contracts are defined as an Avro protocol rather than directly as a schema. Avro supports RPC, where client and server exchange schemas through a handshake protocol. But we don't want to do that. We want to parse the X.commerce contracts as Avro schemas.

I was dabbling with the Avro ruby gem (https://github.com/apache/avro/tree/trunk/lang/ruby) to understand how it operates and how I can directly use that gem to serialize/de-serialize messages. I started with a sample schema -

SCHEMA = <<-JSON
[{ "type": "record",
  "name": "Product",
  "fields" : [
    {"name": "id", "type": "string"},
    {"name": "product_url", "type": "string"},
    {"name": "product_purchased", "type": "boolean", "default": false}
  ]},
  { "type": "record",
    "name": "Review",
    "fields" : [
      {"name": "id", "type": "string"},
      {"name": "review_url", "type": "string"},
      {"name": "review_verified", "type": "boolean", "default": false}
    ]}
]
JSON

To serialize data against this schema, we can do the following -

require 'avro'

file = File.open('data.avr', 'wb')
schema = Avro::Schema.parse(SCHEMA)
writer = Avro::IO::DatumWriter.new(schema)

# The data file writer stores the schema in the file header and
# appends one datum per <<.
dw = Avro::DataFile::Writer.new(file, writer, schema)
dw << {"id" => "product123", "product_url" => "http://ebay.com/some_product", "product_purchased" => true}
dw << {"id" => "review123", "review_url" => "http://yelp.com/some_review", "review_verified" => false}
dw.close

If you look at the schema, I have intentionally used "id" as a field name in both records (Product and Review). This is to illustrate what I think is a bad practice. While the fields are scoped to their particular record, it might be better to use a more specific identifier, since that makes it easier to correlate the data during de-serialization, which we might do the following way.

file = File.open('data.avr', 'rb')  # binary mode, matching the 'wb' used for writing
reader = Avro::IO::DatumReader.new(nil, Avro::Schema.parse(SCHEMA))
dr = Avro::DataFile::Reader.new(file, reader)
dr.each { |record| p record }
dr.close

The output is -

{"id"=>"product123", "product_url"=>"http://ebay.com/some_product", "product_purchased"=>true}
{"id"=>"review123", "review_url"=>"http://yelp.com/some_review", "review_verified"=>false}

If you try to serialize a datum that does not match the schema, it gets flagged immediately. This is the error dump. A better approach is to catch the exception and report a cleaner error. Again, this is more for illustrative purposes -

Avro::IO::AvroTypeError (The datum {"id"=>"foo", "value"=>"1001010"} is not an example of schema'

With X.commerce contracts, read them as schemas and not as RPC protocols. I will post a working Rails example of a message console that sends and receives an Avro message using the above approach.