Archive for the ‘post production’ Category

Nagios w/ Incinerator

Monday, March 23rd, 2009
Lustre Frameserver from Nagios


We have been looking at setting up an open source monitoring solution at the office for quite some time (I remember having a discussion about Nagios on my first day at work), but looking at the Nagios docs made me think that setting up all those .cfg files was going to really suck, so I looked at alternatives.

Over the Christmas holidays I installed Zenoss (http://www.zenoss.com/), mainly because it promised to crawl through our networks and be a breeze to set up using the web GUI. It did do the crawling as promised, but configuring anything beyond the basic settings was really, _really_ painful. Sure, it was nice with the couple of Windows machines it recognised immediately, but otherwise it was pretty useless.

It took me several months to finally bite the bullet, but last week I got around to installing Nagios at the office. After setting up the basic checks for the localhost (a RHEL server we had reserved for this use) in a couple of hours, I was quickly feeling pretty proficient with all the different .cfg files, and started venturing into the unknown…

RAID-Chassis

Getting information from our ten-odd RAID subsystems would be quite important. I first thought I had struck gold when I found check_promise_vtrak (http://www.consol.com/apple/nagios-plugins/check-promise-vtrak/). Setup was a breeze, and I quickly got it working from the CLI. But the plugin refuses to work from Nagios: it only returns (null), which ends up as a critical error and an email in my inbox. A small fix is probably needed in the plugin itself, because it is plainly not returning anything Nagios can read.
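The Nagios plugin convention is simple: a plugin prints a one-line status to standard output and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). A plugin that prints (null) breaks that contract. The sketch below does not fix check_promise_vtrak itself; it is a minimal, hypothetical Python wrapper (wrap_plugin.py is an illustrative name) that turns empty or (null) output into an UNKNOWN result, so a broken plugin shows up as a plugin problem rather than a critical RAID alarm.

```python
import subprocess
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def normalize(output, returncode):
    """Turn raw plugin output and exit code into a (status line, exit code) pair."""
    lines = output.strip().splitlines()
    first_line = lines[0].strip() if lines else ""
    # Empty or "(null)" output means the plugin did not report anything usable.
    if not first_line or first_line == "(null)":
        return "UNKNOWN - plugin produced no readable output", UNKNOWN
    # Anything outside 0..3 is not a valid Nagios state.
    if returncode not in (OK, WARNING, CRITICAL, UNKNOWN):
        return "UNKNOWN - unexpected exit code %d: %s" % (returncode, first_line), UNKNOWN
    return first_line, returncode


def main():
    # Hypothetical usage: wrap_plugin.py /path/to/check_promise_vtrak [args...]
    if len(sys.argv) < 2:
        print("UNKNOWN - usage: wrap_plugin.py PLUGIN [ARGS...]")
        sys.exit(UNKNOWN)
    try:
        result = subprocess.run(sys.argv[1:], capture_output=True, text=True)
    except OSError as exc:
        print("UNKNOWN - could not run plugin: %s" % exc)
        sys.exit(UNKNOWN)
    text, code = normalize(result.stdout, result.returncode)
    print(text)
    sys.exit(code)


if __name__ == "__main__":
    main()
```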

Mac OS X Servers

Using the basic plugins and nrpe (http://nagios.sourceforge.net/docs/1_0/addons.html) I was able to check all the basic data on our servers. I would also like to monitor the services themselves (usually AFP and SMB, with DNS and OD on some machines). As above, I thought check_osx_services (http://www.nagiosexchange.org/cgi-bin/page.cgi?g=Detailed%2F1497.html;d=1) was going to fix all my problems. Again I was foiled at the start: this plugin wouldn’t work properly even from the CLI.
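Until a working check_osx_services turns up, a crude alternative is to check that each service at least answers on its TCP port, which is what the standard check_tcp plugin does. Below is a hypothetical Python sketch of the same idea; the port numbers are the usual defaults (AFP 548, SMB 445, DNS 53, LDAP 389 for Open Directory) and are assumptions about how the servers are set up. An open port only shows that something is listening, not that the service is healthy.

```python
import socket
import sys

# Assumed default TCP ports for the services mentioned above.
SERVICE_PORTS = {"AFP": 548, "SMB": 445, "DNS": 53, "OD": 389}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


def check_services(host, ports):
    """Return (status line, Nagios exit code) for a {name: port} mapping."""
    down = [name for name, port in ports.items() if not port_open(host, port)]
    if down:
        return "CRITICAL - not answering on %s: %s" % (host, ", ".join(down)), 2
    return "OK - %s answering on %s" % (", ".join(ports), host), 0


if __name__ == "__main__":
    # Hypothetical usage: check_mac_services.py HOST AFP SMB
    host, names = sys.argv[1], sys.argv[2:]
    text, code = check_services(host, {n: SERVICE_PORTS[n] for n in names})
    print(text)
    sys.exit(code)
```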

Autodesk Lustre w/ Incinerator

Since this is the system that is keeping me the busiest in normal times, I wanted Nagios to help me out here.

Lustre w/ Incinerator is a complex system, consisting of a workstation, a frameserver, 8 rendering nodes, an Ethernet network for commands and an Infiniband network for moving the frames around. With so many moving parts, there are far too many points of failure. Lately we have had a lot of issues with the renderd service (which handles the rendering and the contact with the server) crashing on the nodes by itself. I have simple scripts that let me restart the service on all the nodes at once, and they take just seconds to run. The difficult part is finding out when the nodes have crashed. Installing the normal checking tools on the nodes was not an option, for a couple of reasons. Firstly: I don’t want too many extra things installed on the nodes, and secondly: the nodes are not on any network other than the Incinerator network, so there is no access to them from the Nagios server.

I have installed the Nagios plugins and nrpe on the frameserver. These check the normal things on the server (root disk space, CPU load, processes etc.). I also created a specific check for Browsed, the process that serves the frames to the workstation and the nodes. After some searching I discovered a plugin called check_process_by_ssh (http://www.nagiosexchange.org/cgi-bin/page.cgi?g=Detailed%2F2013.html;d=1), which allowed me to formulate a suitable nrpe command to execute (from nrpe.cfg on the frameserver):

command[check_node1]=/usr/local/nagios/libexec/check_process_by_ssh -H node1 renderd

I then added the check to my Linux definitions:

define service{
        use generic-service
        host_name frameserver
        service_description Incinerator Node 1
        check_command check_nrpe!check_node1
        }

This worked fine from the CLI, but Nagios couldn’t get through and reported the status as critical. After some thought I realized the problem: Nagios executes all scripts as the user nagios, and I was doing my testing as root. The nagios user didn’t have the needed SSH authentication settings, so I copied the needed file (id_rsa) to a suitable folder and modified the command on the frameserver:

command[check_node1]=/usr/local/nagios/libexec/check_process_by_ssh -H node1 -k /usr/local/nagios/keys/id_rsa -u root renderd
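With eight rendering nodes, the same two snippets have to be repeated for every node. A short, hypothetical Python generator like the one below can print both the nrpe.cfg lines and the service definitions. It assumes the nodes are named node1 to node8 (only node1 appears above) and reuses the plugin path, key path and renderd process name from the command above.

```python
NODE_COUNT = 8  # the Incinerator setup described above has 8 rendering nodes
PLUGIN = "/usr/local/nagios/libexec/check_process_by_ssh"
KEY = "/usr/local/nagios/keys/id_rsa"


def nrpe_lines(count=NODE_COUNT):
    """One nrpe.cfg command per node (assumed hostnames node1..nodeN)."""
    return [
        "command[check_node%d]=%s -H node%d -k %s -u root renderd" % (n, PLUGIN, n, KEY)
        for n in range(1, count + 1)
    ]


def service_blocks(count=NODE_COUNT):
    """One Nagios service definition per node, all attached to the frameserver."""
    blocks = []
    for n in range(1, count + 1):
        blocks.append(
            "define service{\n"
            "        use generic-service\n"
            "        host_name frameserver\n"
            "        service_description Incinerator Node %d\n"
            "        check_command check_nrpe!check_node%d\n"
            "        }" % (n, n)
        )
    return blocks


if __name__ == "__main__":
    print("# nrpe.cfg on the frameserver")
    print("\n".join(nrpe_lines()))
    print()
    print("# Linux service definitions on the Nagios server")
    print("\n\n".join(service_blocks()))
```

The command lines go into nrpe.cfg on the frameserver and the service definitions into the Linux definitions on the Nagios server; both daemons need to reload their configuration afterwards.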

Now the checks work without a hitch, and I get an email about the nodes being down before an operator has spent an hour wondering what is wrong…
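The internals of check_process_by_ssh are not covered here, so the following is only a sketch of the general idea behind such a check, not the plugin's actual code: log in to the node over SSH, list the running processes and return CRITICAL if renderd is missing. The names (check_remote_process, count_processes, evaluate) are hypothetical; it assumes an OpenSSH client on the frameserver and a Linux-style `ps -eo comm` on the nodes.

```python
import subprocess
import sys

OK, CRITICAL = 0, 2


def count_processes(ps_output, name):
    """Count lines of `ps -eo comm` output whose command name is exactly `name`."""
    return sum(1 for line in ps_output.splitlines() if line.strip() == name)


def evaluate(count, host, name):
    """Map a process count to a (status line, Nagios exit code) pair."""
    if count > 0:
        return "OK - %d %s process(es) on %s" % (count, name, host), OK
    return "CRITICAL - %s not running on %s" % (name, host), CRITICAL


def check_remote_process(host, name, user="root", key=None, timeout=10):
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # which is what happens when the nagios user has no usable key.
    cmd = ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=%d" % timeout]
    if key:
        cmd += ["-i", key]
    cmd += ["%s@%s" % (user, host), "ps", "-eo", "comm"]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout + 5)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "CRITICAL - cannot reach %s: %s" % (host, exc), CRITICAL
    if result.returncode != 0:
        # An unreachable node is treated as critical too.
        return "CRITICAL - ssh to %s failed: %s" % (host, result.stderr.strip()), CRITICAL
    return evaluate(count_processes(result.stdout, name), host, name)


if __name__ == "__main__":
    # Hypothetical usage: check_renderd.py node1 renderd /usr/local/nagios/keys/id_rsa
    key_path = sys.argv[3] if len(sys.argv) > 3 else None
    text, code = check_remote_process(sys.argv[1], sys.argv[2], key=key_path)
    print(text)
    sys.exit(code)
```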

What I Learned…

Tuesday, January 20th, 2009

…during the last week of last year.

Due to illness among our support staff, I ended up doing a big upgrade job short-handed. This is what I learned:

  • When Autodesk support says it has shortened its support hours, it also means it is quite understaffed, which shows in both the reply times and even the quality of some answers
  • The Lustre upgrade scripts couldn’t handle a direct upgrade from 2007 to 2009SP2. Several things broke along the way, like the Incinerator Manager website, which could no longer start or stop the nodes.
  • The Lustre/Incinerator licensing scheme is quite difficult to understand at times, and you have to make sure your temporary license contains licenses for the nodes as well.
  • It is hard to test systems you have no idea how to use
  • Working 17-hour shifts is not fun
  • A good remote desktop & SSH remote connection to the office is very nice at times
  • Always upgrade Autodesk-workstations from the local screen, else you will miss out on some essential settings, and get no warning
  • Writing bash scripts would be a nice skill to have (thanks, Filipp ;)
  • Lustre 2009 is a lot pickier about settings. We used to use our SAN with the frames mounted directly through the workstation’s SAN mount, but in 2009 we need to use the frames over Infiniband from the frameserver in order to get the nodes to work

Film to Digital, now in essay format

Tuesday, December 2nd, 2008

Film to Digital – essay

New Red cameras – resolution glut?

Tuesday, November 25th, 2008

The new cameras from RED were announced last week. There has been a lot of anticipation, especially towards the cheaper camera, named Scarlet. The announcement does bring up some questions, though.

http://www.red.com/epic_scarlet/

New cameras

I guess most people were happy with the announcement: 8 different cameras (“brains”) and a whole slew of attachments, accessories and modules. The “brains” come in two different flavors: Scarlet is the smaller, “low-end” (called professional) unit, and Epic is the “master professional”, high-end unit. Both come in several models, with different lens mounts and sensor sizes.

If you have not acquainted yourself with RED technology, look at the presentation I made earlier this month (http://www.thingamagic.net/jussi/?p=7). Here is a recap: the RED One is a digital cinema camera that captures images at up to 4K resolution (4096×2580) using a proprietary “raw” format called Redcode, or r3d. The camera is very cheap compared to other professional digital motion picture cameras, with prices starting at 17,500 USD.

At the same time I found an interesting article, Redfacts. It is especially interesting because it is the first piece I have read that really heavily criticizes the RED camera. Not being too familiar with either of the systems discussed in the paper (Red One or Sony F23), I will not go into its details too deeply. It did, however, raise some valid points that I would like to ponder.


Film to Digital

Tuesday, November 11th, 2008

filmwork_2

RED – Camera

I was planning on concentrating on the RED part of the presentation, but decided I had to go through the basics first, or there wouldn’t really be a point to the whole exercise.

The actual presentation will be on Thursday, the 13th of November.

/jussi