David A. Black

Yet another cello update

At last report, I was renting a cello because I wanted to get back into playing but my really really nice cello had the problem that, even after playing it for forty years, it always felt like the string length was too short and that I had to pull my left hand back toward the nut all the time.

Well — some changes were made. Sam Finlay, of Finlay & Gage in Maplewood, NJ, moved the bridge, thus lengthening the strings. He also made a new soundpost and did all the necessary tweaks. Then we put Warchal Amber strings on the cello. They’re kind of a non-gut gut string replacement; they’re wound over a non-gut, synthetic core, but specifically crafted to have a gut-like sensitivity and palette.

The result is that the Hill has a much bigger sound — which I consider absolutely fine (though it might affect the welcome I receive in ensembles with recorders and baroque flutes and such) — and feels comfortable under the left hand. The sound, though bigger, is still very Hill-like, and the strings do indeed have quite a range of color.

So I’m back to playing my cello! And working on a program of unaccompanied cello music. More as that progresses….

Some thoughts on irony

It’s a pity about irony. It’s really interesting, but almost universally misunderstood. It can be funny — but it is not the same as humor. Ironic language can be delivered in a sarcastic tone — but irony is not the same thing as sarcasm. And, most importantly, it simply isn’t about things like dying on your birthday.

The best starting point for understanding irony is paradox. Like paradox, irony involves a juxtaposition of conditions which both do and do not appear to coexist or be capable of coexisting. In the case of irony, one of the conditions generally adheres to strict logic, while the other represents a kind of reductio ad absurdum of the logical principle at hand.

Frequently, the ironic juxtaposition is humorous as well as ironic, and that’s the reason that irony so often gets confused with humor. Here’s an example, in the form of an exchange between two professors (an exchange I overheard many years ago):

Professor 1: I told a student he was stupid today. 
Professor 2: Really?
Professor 1: Actually I got him to tell me how stupid he was.
Professor 2: Oh, the Socratic method!

(Professor 1, for the record, has a curmudgeonly streak but also a heart of gold.) The irony here resides in the juxtaposition of two propositions: that getting a student to work his own way to understanding an idea is the basis of Socratic method, and that a teacher would employ Socratic method to expose a weakness on the part of the student.

The irony, moreover, is a quality of the final statement itself. Irony is first and foremost a rhetorical device. That’s one way it differs from paradox: a paradox really is a set of internally contradictory conditions that can’t be resolved, whereas irony is, so to speak, the pronouncement of paradox in situations that don’t actually fail to resolve. Professor 2 spoke ironically. Humorously, too; but ironically.

Here’s another example — this time a mealtime conversation between students (of whom I was one) and their teacher. The teacher had, as he generally did at this weekly restaurant meal, ordered tongue.

A student: You really do like eating tongue, don't you?
Teacher: I agree with Mozart about the importance of silence. That's why I eat them so assiduously.

If you like silence, you’ll eat tongues. That makes (a kind of) sense. But the idea that eating tongues will actually lead to more silence is absurd. The statement itself is ironic.

Actions can be ironic too. This example is a little more elaborate, but so good that I can’t resist it. My poker group plays a version of poker where your cards are lined up in front of you in a row, some face up and some face down. Over the course of the game, cards that were not wild can become wild. (Real poker players: please don’t wince.) When a card becomes wild, we typically push it forward about an inch so that there are two rows: the wild cards, and the non-wild and face-down cards.

One time I had only one card, and it became wild. I only had one row of cards — since only one card. Nonetheless, I reached down and pushed my single card forward an inch.

This action was ironic. Strictly speaking, it made sense: we push wild cards forward. But it was also absurd, since I only had one card; I wasn’t separating wild cards from non-wild cards. There’s no actual paradox involved because the situation was under my control, and indeed of my own making. The irony resided in the juxtaposition of the inexorable logic and the irreducible absurdity.

There is of course a ton more to say, but enough said for now. Irony is all around us, but often not recognized. I think it deserves better than it generally gets.

Cello update

I am a professionally trained cellist, though I’ve never made my living at it (never tried to) and haven’t played much at all in recent years.

I’m playing more now. But there are some circumstances surrounding this that might interest at least some of my friends and family.

In 1977 — at a time, I might add, when the British economy was incredibly favorable to middle-class Americans — my parents bought me a beautiful cello, a Joseph Hill instrument from the 1770s. (The last digit of the date on the label is illegible, but the first three are 177.) It was a stroke of great good fortune for me.

But there was a snag.

The very first time I played the Hill, I learned that to play it, I had to pull my left hand back toward the nut (the pegs, basically) what felt like almost a half-tone. So playing a D felt like playing a C#, and so forth.

I have played the Hill for over forty years, and never has my left hand felt comfortable. I always feel like I have to adjust, and have to think about adjusting — and have very little confidence that I’m going to find the notes, especially when shifting to the higher registers.

Not a bad problem to have, I know. But here’s the thing: in getting back to playing, I feel I need a cello that my left hand can deal with. Call me inflexible, cowardly, whatever. It doesn’t matter. I’m not young any more, and if I want to play, I need to accommodate my own preferences.

I am renting a not particularly great cello, and having a wonderful time. Actually I’ve now got a different cello out on consignment, it not being out of the question that I’ll purchase a lesser cello that I’m more comfortable playing than the Hill. Kind of strange position to be in, but there it is.

And another thing…. These cellos I’m connecting with now are strung with steel-core strings. I was trained to play on gut strings (and no, not just baroque music either). But I’m not going to walk into a string instrument shop and find them renting cellos with gut strings. If I do buy a second cello I will investigate converting it to gut. But I absolutely refuse, at this point in my life, to let the composition of the string blockade me from playing. Playing, you see: that’s the important thing.

Schrödinger’s Gecko: thoughts on “could” in advertising

I’m struck by the use of the word “could” in advertising. Fifteen minutes “could” save you fifteen percent or more on car insurance, says the abominable GEICO.

What does that mean?

I have a car insurance policy, and if I were to apply for a GEICO policy (which I never will, because of their commercials) they would presumably figure out what to charge me, and the amount they come up with either will or will not be fifteen percent — or more — less than what I’m paying now. It’s deterministic: there’s no uncertainty as to the outcome.

So what’s with “could”?

“Could” means: we haven’t run the numbers on you yet, so we don’t know. That’s the thing: they don’t know. But their not knowing doesn’t mean there isn’t an answer. There’s no time-bound factor here; I’m not going to get fifteen percent off based on how well I eat for a month, or how fast I can walk. Everything is in place already.

“Could” is the advertiser’s way of saying, “From our perspective, it’s uncertain whether or not you would save money because we don’t know anything about you. In fact, no uncertainty is involved, but we can’t admit that. We have to make it sound like what we don’t know is indeterminate.”

It’s a subtle yet rather clumsy way of treating potential customers as an undifferentiated mass, while pretending to have their interests at heart.

O’Connor sings “Ode”; or, getting used to life without Facebook

It came to my attention recently that Sinéad O’Connor recorded “Ode to Billy Joe” in 1995. My first reaction was to think of all my friends on Facebook who would find this fact very interesting and cool. But I haven’t been on Facebook since about ten days after the November 8, 2016 disaster.

I’d been very active on Facebook for about eight years. In many ways, I love Facebook. Being in touch with all those friends and relatives is amazing. I cannot name a stratum of my life–from kindergarten to grade school to summer camp to teaching to… anywhere–from which I did not have friends, sometimes many friends, on Facebook.

I’ve defended Facebook against its detractors many times, especially against the criticism that it’s a waste of time. It’s time spent communicating with friends. What could be wrong with that? How better to spend one’s time?

And if it were only that, then that argument would hold. The problems for me arose acutely during the 2016 campaign, though I can see in retrospect that they weren’t entirely new. The best way to sum up the difficulty I faced with Facebook is to say that what I needed in 2016 was an echo chamber and, contrary to popular myth, Facebook isn’t one.

I have no interest in “discussing” and/or “debating” so-called “issues” like capital punishment, women’s health, LGBTQ civil rights, and so forth. There’s nothing to discuss.

Facebook is problematic for me in this connection for at least two reasons.

First, there’s the endless, inescapable stream of claptrap from the right. Some of it comes from my friends; a lot of it comes from their friends, who see that they have said something on my page and come along to argue with it.

Second, I am not nearly as adept as I wish I were at ignoring the bait. To my sorrow, I really do find myself wanting to tell the haters and bigots and enemies of the Constitution what I think of them–even though I know full well that it’s a waste of electrons.

After the election I decided that I actually could not stand Facebook any more. Oh, and about that time thing: it had become a serious drain on my time, and it wasn’t all a bunch of joyful communing with friends. I found myself spending, say, ninety minutes in the morning, and more time later, mostly swatting at right-wing gnats. That is not an appropriate use of my time. Moreover, it made me miserable during the campaign (during which I was already pretty miserable, because I saw clearly all along that Fuckface was going to win), and after the election my already small reserves of tolerance for Facebook-style “discussion” were essentially depleted.

I’ve done a good job of avoiding Facebook. I logged in to say thanks for everyone’s birthday wishes last week, and that’s pretty much it.

But it’s a hard adjustment. I’m used to telling lots of people things all the time. When I learn that Sinéad O’Connor recorded “Ode to Billie Joe” and have to keep it more or less to myself, it feels weird.

That’s OK, though. I’m working on repurposing the Facebook time and mental energy to other things. I don’t yet know how my role in the resistance will pan out, but I want to give it a chance to do so. Facebook can of course be a rallying point for people of good will (that is, Fuckface’s enemies). But, to use a favorite phrase of my late father’s, it exacts a price.

Some thoughts on code review

Github has added some new features to pull requests that make for a richer code review cycle. To commemorate this, I thought I’d share some words about code review–not the Github stuff specifically (I’m still exploring!) but the process in general.

Code review is a cornerstone of good team-based software development. It isn’t the same as pair-programming, and it isn’t a replacement for it. By the same token, pair programming is not a replacement for code review. The two practices work in tandem–and, as I have suggested elsewhere, code review can also make solo coding a viable part of the development process. I consider it absolutely essential.

In this article I’m going to go over three guidelines for successful code reviewing:

Do your code reviewing alone
Be fair and flexible about style
Treat unclarity as a red flag

I’ll say something about each of these guidelines in turn.

Do your code reviewing alone

If you are pairing on code review, your pair is either (a) one of the people who wrote the code, or (b) someone else who didn’t write the code. Both options are problematic.

Reviewing code with someone who originally worked on the code inevitably means that you’re seeing the code through their eyes. You’re really asking for a guided tour, not doing a review. In review, code should speak for itself. You don’t want someone there explaining it to you. That’s not representative of what it would be like to work on it yourself at a later date, or for someone new to the team to pick up a story that required them to work on this code.

Moreover, you might–without meaning to–put your pair on the defensive. “Why did you do this?” you might ask, pointing to some line or lines of code. The person who wrote or co-wrote those lines then has to mount a defense of them, and that can be frustrating as well as unproductive. Lines of code in isolation sometimes don’t explain themselves; you, as the reviewer, might have to look at the whole class or file to see the rationale for the code’s structure or naming conventions. But quizzing the coders on why they did something is not the same, cognitively, as looking through the code yourself and letting the code speak (or fail to speak) to you itself.

Doing code review in a pair where neither of you worked on the original code is also not advisable. Each of you will pick up on different things, and pointing those things out–not to mention scrolling up and down and flipping among files out of sync with each other–is simply not the same process or experience as examining the code yourself. Your different paths through both seeing and understanding the code will cause interference patterns and almost certainly make the task less efficient and less effective.

Now this doesn’t mean that you should never sit down with someone else and look at code! I’m talking specifically about code review: someone has just committed some code, and it needs that next pair of eyes. You may have occasion to discuss code you didn’t write with someone who did write it, and you may have occasion to discuss code with someone else when neither of you wrote it. (Think of a pair picking up a ticket on some code by people who have left the team.) What I’m saying is not that it’s bad for these things to happen. What I’m saying is that they are not code review.

Be fair and flexible about style

Your team should have a working agreement about code style. It may be a blanket agreement (“When we write Python, it has to pass Pylint”). Or it may be a very specific agreement (“No empty parentheses in Ruby method definitions”). How you determine your team or “house” style is a matter for the team and the organization.

When you do code review, you should not hesitate to point out violations of team style agreements. However, if the code you’re reviewing does not violate team style–but happens not to be in the style you yourself would have used–leave it alone. Code review isn’t about making everything look as if you had written it.

If you spot something stylistic that you think no one on the team should be doing, make a note of it and bring it up later as a possible addendum to your team’s style agreement. But don’t flag it in code review if it isn’t already part of that agreement. Code review is not the place for stylistic ping-pong matches.

Treat unclarity as a red flag

Perhaps the main thing you should be looking for during code review is code that is not clear. Unclarity can take many forms; but whatever form it takes, it should be treated as a show-stopper. Unclear code means someone is going to get confused at some point down the road–possibly including the people who wrote the code in the first place. (We’ve all had that happen!) You should immediately write a comment when you come across something unclear. In your comment you should include some analysis as to why you think it’s unclear: a method that’s doing too much and is hard to follow, confusing variable naming, and so forth.

This guideline is actually not only important in its own right, but also figures as another reason not to do code review alone. If you do code review in pairs, it’s too easy to explain the code to each other as you go. You don’t get enough of a chance to determine whether or not the code per se is clear.

Don’t be afraid of insulting someone’s code by saying it’s unclear. By the same token, if someone reviews your code and says they didn’t understand or couldn’t follow something–whether because of logic or naming or anything else–don’t defend or explain your code. It’s likely that you could explain the unclarity away. But you shouldn’t have to, and in the future when other people look at your code, you won’t have the opportunity to. Consider the finding of unclarity to be a reality check, and a very compelling indication that you need to change the code.

Happy reviewing!

Prostate Cancer Awareness begins with prostate awareness

This is not medical advice. Please consult your doctor(s) about medical questions and issues.

September is Prostate Cancer Awareness Month. I’m pleased to say that it’s also the month when my three-year post-surgery blood test returned good results. My PSA (more about PSA below) is still “undetectable,” as it has been since two months after my July, 2013 radical prostatectomy.

Prostate cancer awareness is very, very important. But what’s struck me forcefully over my three years as a prostate cancer patient and survivor is the number of men who have no idea what their prostate does. I’ve started asking them point-blank. Nine times out of ten, I get shrugs. Many men think the prostate does nothing, like the appendix. Those who know what the prostate does tend to have a medical background, or to have had prostate cancer themselves or in the family.

The prostate secretes semen. The testicles do not secrete semen, despite vulgarisms that suggest otherwise (like “bust a nut”). The testicles produce sperm cells. Those are the things that swim upstream looking for eggs to fertilize. Semen is the delivery mechanism for sperm cells. Semen comes from the prostate.

It follows, then, that after your prostate is removed, you can no longer ejaculate. When I mention this fact to other men, I often get asked “Why not?” or “Are you sure?” The answers to these questions are, respectively, “Because semen comes from the prostate” and “Yes, I am fucking sure.”

Here are some generalities, based on what I have learned as member of the Reluctant Brotherhood.

The nerves that enable erection run along the prostate. If those nerves are implicated in the cancer, they have to be removed. If they’re not cancerous, they still have to be peeled away from the prostate, which traumatizes them. In those cases they take a while to recover, possibly measured in years. In short, erectile function after a prostatectomy covers a wide spectrum. And medicine and/or devices play a role for many men.

While ejaculation is not possible without a prostate, orgasm is–even in the absence of erection. The strength and quality of orgasm will vary from man to man, and may improve over time.

Urinary incontinence after a prostatectomy also varies on a case-by-case basis (that’s a pattern with this disease and its treatments). At its worst, you’re looking at a lifetime of Depends and other ameliorative measures. But if you’re lucky, you don’t have any serious problems with incontinence. Maybe a little bit when you sneeze.

The main way you detect prostate cancer, or at least become suspicious of its presence, is through a PSA test. PSA stands for Prostate Specific Antigen. It shows up in the blood in higher concentrations when there’s cancer. If the doctor feels it’s warranted, they (yes, I know they is really plural but I hate “he or she”) will order a biopsy. A digital exam of the rectum is also used in diagnosing cancer and other prostate disorders.

Please do not fear any of these diagnostic techniques. Do them if your doctor suggests them. In fact, if you’re old enough–I think the guidelines say forty, but again, I’m not dispensing medical advice so please research it–and/or have a relevant family history, you should ask about them.

As regards treatment, the big three are surgery, radiation, and “active surveillance” (sometimes called “watchful waiting”). Active surveillance means not intervening, but continuing to measure and monitor via PSA and possibly follow-up biopsies, and then intervening with surgery or radiation if it becomes necessary. In my case, active surveillance was not an option because while the cancer was not terribly aggressive, there was quite a lot of it. I talked to two radiologists and two surgeons, and decided that surgery was right for me.

Prostate cancer is a grim disease. It has a reputation as a “good” cancer, because survival rates when it’s caught early are pretty spectacular and because some manifestations of it are slow-growing enough not to warrant intervention. But the permanent changes from treatment are significant in every case, and extremely significant in some cases. And around 30,000 men die from it every year in the United States. It’s not to be taken lightly.

So be prostate (cancer) aware, this month and every month.

Again: This is not medical advice. Please consult your doctor(s) about medical questions and issues.

Why I am not doctrinaire about pair programming

Pair programming is a wonderful practice. But it isn’t a cure-all for code-quality issues, and it isn’t the only game in town.

My reasons for not being a fan of a 100% pair programming approach fall generally under two headings:

pair programming has some (sometimes serious) drawbacks
other practices often produce good results

Let’s look at these in turn.

Pair programming has some (sometimes serious) drawbacks

I’ve heard it said many times that the chief strength of pair programming lies in its incorporation of instantaneous code review. But this is a dangerous half-truth. I am an unshakable believer in peer review of all code. Pair programming, I’m here to tell you, isn’t what I mean by peer review.

Yes, the members of the pair do catch each other doing things they shouldn’t. The locus classicus in my case is the time Brian Olore talked me off the ledge when I, fed up with undefined method 'blah' for nil errors, threatened to write something along the lines of expect(nil).to_receive(:blah).... Would I really have done it had Brian not been there? Was it just a cry for help? We’ll never know. But yes: pair programming can save developers from themselves, and from each other.

But it can also become a folie à deux. The pair becomes a team-within-a-team, and when both members of the pair find a code problem equally exasperating, it’s not uncommon for them to agree, perhaps against their better judgement, on a suboptimal solution. (I’m talking mostly about design and style, not sweeping non-working code under the carpet.) Getting a third person on the team to look at the paired code is peer review. Pair programming isn’t.

Pairing can be an unpleasant experience. I’ve been on teams with a 100% pairing approach, and I’ve had some very bad days. Not everyone is born to pair with everyone else. You can, if you like, say that developers who don’t like working with their pair should just suck it up and do it anyway. I don’t buy that. I want my software development practice to be enjoyable and rewarding, not only in the long run but as it happens. I believe in programmer fun. I do not want to feel uncomfortable or bored or in conflict very much of the time at all. I dislike the Procrustean approach to pair programming, which says (or implies) that pair programming is Always Right and everything else, like whether or not you like your job, has to fall into place around that.

Another serious problem with a doctrinaire approach to pair programming is that it relegates other, potentially very fruitful approaches to the scrap heap. That’s no good either.

Other practices often produce good results

I write some of my best code when I work alone for a while, and then come up for air and get a code review. The working alone takes me to depths of concentration that I cannot achieve when I pair. My colleagues are there to catch me if I fall (and I them).

I’ve been soloing on code, in one form or another, since 19[REDACTED]. It’s one of the great pleasures of my life, and it works to the benefit of teams I’m on because it results in good code. A doctrinaire approach to pair programming, however, rules this kind of work out of bounds.

This makes no sense to me. If you have a path that leads to good code, why would you abandon it? By what standard of measure is that better than not abandoning it?

Res ipsa loquitor, I would have thought. Nor do I just toss in mention of code review to make my pronouncements more credible. It’s not uncommon for me to come up with either new ideas or solutions to old problems that are right on target but, in the doing, a bit over-engineered: too many new classes, too much indirection, and so on. My colleagues can spot both the goodness in what I’ve done and, in many cases, a simpler way to do it. And I reciprocate. It’s a symbiosis that differs from that embodied in pair programming, but which produces excellent results.

And here’s another thing: Github. If there was an equivalent of Github pull requests fifteen or twenty years ago when (as I remember it) pair programming started to become popular, I don’t know what it was. Today it’s possible, and easy, for teammates to go line-by-line through each other’s code, compare it to what came before, comment on it, and suggest changes to it. The landscape is not the same with these tools as without them. That’s a major reason why I find the notion of 100% pair programming constrictive. It strikes me as outdated.

I have whimsically referred to soloing-plus-pull-request-review as “asynchronous pairing.” There’s something to that–though mind you, I strongly advocate pull request review for all code, whether soloed or paired. And, to come full circle, pair programming has a lot going for it. I’ve been on teams where no one does it, and I prefer teams where they do. At its best, it’s highly satisfying. But it isn’t the only way to produce code of high quality, and we cannot afford to jettison other practices that allow us to do just that.

Permalink shenanigans

Yesterday I published a blog post–the first non-hello-world one on my new blog. Before doing so I had tried to get pretty permalinks working (it’s a WordPress site). They didn’t seem to work, probably because I put an nginx rule in the wrong config file… but anyway, they didn’t work.

So yesterday’s post went out to the world with an ugly permalink:davidablack.net/blog?p=33. I was very happy with the response to the post, but not with the ugly permalink. I didn’t like the /blog part (the host should be blog.davidablack.net) and I didn’t like the uninformative query string.

Today I fixed it, and with little or no down time.

It was easy to change the root address and type of permalink in WordPress. Trickier was grandfathering in poor ol’ ?p=33.

I did it by putting a rewrite rule in the nginx config file for davidablack.net (which is also a WordPress site). The rule is in the /blog location block, and looks like this:

if ($query_string ~ "p=33") {
      rewrite ^(.*)$ http://blog.davidablack.net/2016/08/17/a-log-reflector-for-aws-lambda/ last;
    }

(Except it’s probably word-wrapping on your screen: everything from “rewrite” to “last;” should be on one line.)

So yesterday’s link still works, and so does today’s nicer one. Cool.

A “Log Reflector” for AWS Lambda

If your organization has lots of AWS Lambda functions deployed, it’s also got lots of AWS CloudWatch log groups. Keeping track of all the logs across projects and teams can be unwieldy. But with a little effort, you can forward the log entries from every group and stream to one centralizing Lambda function. And from there you can do whatever you want with them–including sending them all to a log aggregator like SumoLogic.

That’s just what we in the Post Enrollment Tech (PET) team at 2U, Inc. have done. Our team has a large handful of functions running on AWS Lambda, and other teams are starting to follow suit. Each such Lambda function has a CloudWatch log group associated with it; each such log group contains a number of timestamped log streams. We wanted to solve the problem of forwarding our CloudWatch logging events to SumoLogic, and we wanted to solve it not only for our team’s projects but for all projects organization-wide.

We called the project that tackled this problem the Log Reflector. Our primary goal for the project was to automate the forwarding of all Lambda logging events to SumoLogic. A secondary (though important) goal was to ensure that any newly-created CloudWatch log group automatically got included in that forwarding process. We achieved both goals, and learned a lot along the way.

The main thing we learned at the outset was that a given CloudWatch log group can subscribe to a given Lambda function, causing that function to be invoked every time a message gets added to any stream in that log group. On invocation the function receives the contents of the log message, plus some metadata–the whole logging event, basically. The function, in turn, can do whatever it wants with the data sent to it, including passing it along to SumoLogic.

Thus the log-reflector Lambda function was born. Its purpose is for log groups to subscribe to it, and to listen to the log streams of those log groups and forward the log messages to SumoLogic.

In the service of our secondary goal–automatically subscribing new log groups to the log-reflector–we created the log-subscriber function, which listens for new log groups being created and subscribes them to the log reflector. There’s some icing on the cake (for example, a deploy-time script that plays catch-up by subscribing all existing log groups to the reflector); but between them, log-reflector and log-subscriber make up the heart and soul of the project.

In this article we’ll walk through a good cross-section of what the PET team has done on the Log Reflector project. We’ll start by looking at the handler of log-reflector. I don’t want to dwell on that handler too much, partly because it was very quick and easy (you’ll see why) and partly because you may or may not be sending your logs to SumoLogic like we are. But you’ll see what it consists of. After that we’ll take a look at how to subscribe a log group to a Lambda function. We’ll use a CLI approach, which isn’t exactly automatic but does give you a good feel for what’s actually going on.

Then we’ll turn to log-subscriber and look at how that function is configured, and at some of the code. I won’t go over everything, but I’ll show you enough and give you enough pointers that you’ll be able to pick up the threads if you wish to.

Two notes before we get going:

First, you’ll notice that the processes described here are rather heavy on permissions-related tasks. That comes with the territory (AWS): every entity needs permission to do virtually anything with, or to, any other entity. Getting permissions right has been a large part of the project, and worth trying to get a handle on from the start.

Second, keep in mind that sending logging events to SumoLogic is just what we do. It doesn’t have to be what you do with your centralized logging events. The main thing here is to get all your CloudWatch logs talking to a single listener–the log reflector–and to automate the process of hooking new log groups up to that listener. Your log reflector function can connect to SumoLogic or it can do something else. That part is up to you.

Now let’s dig in.

The handler for `log-reflector`

If you’re not planning to stream your logs to SumoLogic, you can skim or skip this section. Just remember, conceptually, that the log-reflector function is listening for logging events from the log groups that are subscribed to it. Each such event gets passed along to log-reflector in a well-defined JSON data structure. Whatever else your handler does, it needs to know how to pull this structure apart and do whatever it needs to do–SumoLogic or otherwise–with what it finds.

The handler for log-reflector in our project comes essentially verbatim from SumoLogic’s sumologic-aws-lambda Github repository–specifically, cloudwatchlogs_lambda.js. Put this code in a Lambda function, and you’ve got something that knows how to parse log messages that get forwarded to it, and how to forward them to SumoLogic in turn.

The handler uses the node4.3 Lambda runtime. Out of the box it encourages you to hard-code your SumoLogic endpoint in the code. We prefer to inject the endpoint into the handler via the SUMO_ENDPOINT environment variable. However you do it, the handler code correctly parses the incoming logging event from whatever log stream is talking to it, and it knows how to send it to Sumo.

Once you have your log-reflector function in place, doing what you want it to do (or perhaps for now just logging, to its own CloudWatch log, the fact that it has received an event), you’ll want to know how exactly to subscribe a log group to a Lambda function. Let’s look at that next.

Subscribing a log group to a Lambda function

Say you’ve got a Lambda function called update-student. Every time someone hits that function, you want the CloudWatch logging event to make its way to log-reflector.

In other words, you want to subscribe update-student‘s log group–which is probably called something like /aws/lambda/update-student–to your log-reflector function.

You can wire up the subscription of a log group to log-reflector in a number of ways:

in the Lambda console UI (hint: create a “trigger”)
in the CloudWatch logs console UI (hint: choose a log group and then “Stream to AWS Lambda” from the “Actions” dropdown)
via the AWS CLI (the aws utility)

We’ll look at the third of these options (but feel encouraged to look at the first two also). But first, you’ll need to set the right permissions on the log-reflector function. You need to tell the function that CloudWatch log groups are allowed to invoke it.

Make sure you’ve got your AWS credentials and defaults in place, and then issue this command (adjusting the argument values appropriately, of course):

aws lambda add-permission \
          --function-name 'log-reflector' \
          --statement-id 'something_unique' \
          --action 'lambda:InvokeFunction' \
          --principal 'logs.us-east-1.amazonaws.com' \
          --source-arn 'arn:aws:logs:us-east-1:123123123123:log-group:/aws/lambda*:*' \
          --source-account 123123123123

You’ll get back some JSON with details of what you just did. Now log-reflector can be invoked by your log groups. So it’s time to subscribe the /aws/lambda/update-student log group to the log reflector.

To create the actual subscription, you specific the log group you want listened to and the function you want to do the listening:

aws logs put-subscription-filter \
  --log-group-name '/aws/lambda/update-student' \
  --destination-arn 'arn:aws:lambda:us-east-1:123456789012:function:log-reflector' \
  --filter-name 'update-student-filter' \
  --filter-pattern ''

(Filter patterns offer you the opportunity to get fancy with what’s passed through. We’re not going to delve into them here.)

If all goes well there will be no screen output. But if you wait a minute or two and go to (or reload) your CloudWatch logs screen, you should see that /aws/lambda/update-student now has a subscription listed. And you can examine your handiwork with another CLI call:

aws logs describe-subscription-filters \
  --log-group-name '/aws/lambda/update-student'

which will give you back something like this:

   {
    "subscriptionFilters": [
        {
            "filterName": "update-student-filter", 
            "destinationArn": "arn:aws:lambda:us-west-2:123456789012:function:log-reflector", 
            "creationTime": 1470599015946,
            "filterPattern": "", 
            "logGroupName": "/aws/lambda/update-student"
        }
    ]
}

(Note that we get an array of filters. At present, no log group can have more than one subscription, so it will always be an array of one element or an empty array.)

You’ve created the subscription; now try it out! Using your favorite client (cURL, postman, etc.), hit the endpoint for your update-student function. Be patient… and in not too long you’ll see your request logged in a stream in the function’s log group. You should also see that your log reflector function got invoked; check log-reflector‘s log group for a recent entry.

So now you have a function in place that will receive logging events from subscribed log groups, and you’ve got a technique for subscribing log groups to that function. But you don’t want to go on doing this manually. So let’s look at how to automate the creating of subscriptions. Our goal will be to assign yet another Lambda function the task of listening for the creation of new log groups, and subscribing each such log group to the log reflector.

Automating subscriptions with `log-subscriber`

This time, we’ll look at the permissions-ish chores first, and then turn to the handler code.

The IAM role that’s in effect when log-subscriber is executed needs permission to create subscription filters on CloudWatch log groups. You can create or edit a policy containing this permission in the console UI (in IAM go to “Roles” and choose the relevant role), or you can do it through the CLI. We’ll do the latter here.

The process involves two steps: (1) creating the policy, and (2) attaching it to a role. Here’s how to create it:

aws iam create-policy \
  --policy-name 'let-me-do-something' \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "logs:DescribeSubscriptionFilters",
                "logs:PutSubscriptionFilter",
                "logs:DeleteSubscriptionFilter"
            ],
            "Resource": "arn:aws:logs:us-east-1:123123123123:log-group:/aws/lambda*:*",
            "Effect": "Allow"
        }
    ]
}'

(You may not need the describe and delete operations yet, but adding them in makes it easier to add functionality to your code later.)

This command will come back at you with, among other things, the ARN of the new policy. You’ll need it for the next command: attaching the policy to the role. You’ll also need the role name, which you can get from the IAM console. Cut and paste as needed to issue the following command:

aws iam attach-role-policy \
  --policy-arn 'arn:aws:iam::123123123123:policy/let-me-do-something' \
  --role-name 'role-string-r-IamRoleLambda-A1B2C3D4E5F6'

Now that the lambda’s role is allowed to create subscription filters, it’s time to flesh out the log-subscriber function itself. We’re not going to go into all the possible detail of the process here, but the general idea is this: the handler for log-subscriber will be invoked whenever a new log group is created, and will go through the permission and subscription-creation steps we went through above–but automatically.

If you’re using Python you’ll probably want to use the boto3 library, which wraps the AWS API. You’ll need to research that further on your own, but here’s an example showing what our earlier aws add-permission command would look like translated into boto3:

import boto3

lamb = boto3.client('lambda')

lamb.add_permission(
           FunctionName='log-reflector',
           StatementId='something-unique',
           Action='lambda:InvokeFunction',
           Principal='logs.us-east-1.amazonaws.com',
           SourceArn='arn:aws:logs:us-east-1:123123123123:log-group:/aws/lambda*:*',
           SourceAccount='123123123123')

When the subscriber function gets invoked, it will be sent information about the log group whose creation triggered the invocation. In order to create the subscription, you’ll want to dig out the name of the log group and then do a boto3-style put_subscription_filter call:

def handler(event, context):
    log_group_name = event['detail']['requestParameters']['logGroupName']
    log_client = boto3.client('logs')
    log_client.put_subscription_filter(
         logGroupName=log_group_name,
         filterName='nice-name-for-filter',
         filterPattern='',
         destinationArn='arn:aws:lambda:us-west-2:123456789012:function:log-reflector'

(This is of course a “happy path” example. You’ll get an exception if the event your subscriber function receives doesn’t conform to the data structure it’s looking for–but you can always add error-handling of whatever kind you need.)

But how do you get a function like log-subscriber to be invoked upon log-group creation? To start with, by defining a rule. You can do so from the command line:

aws events put-rule \
  --name 'New_log_group_subscriber' \
  --region us-east-1 \
  --event-pattern '{
  "detail-type": [
    "AWS API Call via CloudTrail"
  ],
  "detail": {
    "eventSource": [
      "logs.amazonaws.com"
    ],
    "eventName": [
      "CreateLogGroup"
    ]
  }
}'

So now there’s a rule in force that says that the creation of a log group should invoke a function. But we haven’t told it which function! To do that we have to create a target for the rule–actually an array of targets, but we only need one. We’ll put the JSON describing the target in a shell variable and then use it inside a JSON array:

target='{
  "Arn": "arn:aws:lambda:us-west-2:123456789012:function:log-subscriber",
  "Id": "LogSubscriberTarget"
}'

aws events put-targets \
  --rule "New_log_group_subscriber" \
  --region us-east-1 \
  --targets "[$target]"

(Note that the rule parameter has to match the name we gave the rule in the call to put-rule, above.)

Let’s try out log-subscriber. Easy: we’ll create a brand-new log group, and then check whether or not it has been subscribed to log- reflector:

aws logs create-log-group \
  --log-group-name '/aws/lambda/testing-the-subscriber'

Now count to 200…. Not really, but you have to be patient: the creation of the new subscription may not be instantaneous. But in not too long you should, upon reloading, see that the new log group has appeared on your CloudWatch console, and has a subscription to your log reflector function. You can also use the technique we used to check on the success of the subscription creation we did earlier:

aws logs describe-subscription-filters \
  --log-group-name '/aws/lambda/testing-the-subscriber'

You should see something like this:

   {
    "subscriptionFilters": [
        {
            "filterName": "aws-lambda-testing-the-subscriber", 
            "destinationArn": "arn:aws:lambda:us-west-2:123456789012:function:log-reflector", 
            "creationTime": 1470615778768, 
            "filterPattern": "", 
            "logGroupName": "/aws/lambda/testing-the-subscriber"
        }
    ]
}

(Note that the filter name was generated automatically.)

This is the familiar array-of-one description of the subscriptions (really, the single subscription) of the log group to the log-reflector function.

So now we’re in a good place. Every time a new log group is created, that log group will be subscribed to the log reflector. And the log reflector can do whatever it needs to do with the stream of logging events coming at it from all the log groups that it’s listening to.

Conclusion(s) and thoughts

Development on the Log Reflector project has gone very smoothly, though there’s been a bit of a learning curve. AWS roles and permissions are famously complex, and it’s not always easy to get the hang of what’s going on. For example: you don’t grant a log group permission to execute a function; you grant the function permission to be executed by the log group. That kind of thing.

And there are a few glitches that haven’t sorted themselves out. One that I’m still puzzled by is the fact that if you use the CloudWatch console to tell a log group to stream to a Lambda function, the subscription of the log group to the function shows up in the Lambda console when you look at the function’s triggers. But if you do it programmatically, it doesn’t; there appear to be no triggers. Yet everything else works identically in the two cases, and I have not yet tracked down the reason for the anomaly.

AWS regions can be a pitfall. I have a slightly dinkier version of the Log Reflector on my personal Github and AWS accounts, as a learning tool. One morning, all my test scripts (delete and create subscriptions, add permissions, etc.) stopped working. I’m not entirely sure but I’m reasonably certain that it was because I’d had the AWS_DEFAULT_REGION environment variable set in the terminal that I’d used previously, but not in the one I was using that day. Once I set that variable, things largely fell into place and started working again. Old hat to seasoned AWS developers, no doubt, but something to keep an eye on.

My final thought is this: working this intensely on an AWS project has taught me, above all, that there is more to AWS than one is likely to be able to imagine. It’s gargantuan. And it’s correspondingly complex–but the complexity, from what I’ve seen, is packaged about as well as it could be. The documentation is very good, though I tend to add the word stackoverflow to my Google searches because otherwise all you get, typically, is the official docs and not the down-in-the-trenches problems and solutions.

So enjoy AWS, and happy reflecting!

Thanks to Nate Snyder for pairing on this project and for feedback on a draft of this post.