Some thoughts on code review

Github has added some new features to pull requests that make for a richer code review cycle. To commemorate this, I thought I’d share some words about code review–not the Github stuff specifically (I’m still exploring!) but the process in general.

Code review is a cornerstone of good team-based software development. It isn’t the same as pair-programming, and it isn’t a replacement for it. By the same token, pair programming is not a replacement for code review. The two practices work in tandem–and, as I have suggested elsewhere, code review can also make solo coding a viable part of the development process. I consider it absolutely essential.

In this article I’m going to go over three guidelines for successful code reviewing:

  1. Do your code reviewing alone
  2. Be fair and flexible about style
  3. Treat unclarity as a red flag

I’ll say something about each of these guidelines in turn.

Do your code reviewing alone

If you are pairing on code review, your pair is either (a) one of the people who wrote the code, or (b) someone else who didn’t write the code. Both options are problematic.

Reviewing code with someone who originally worked on the code inevitably means that you’re seeing the code through their eyes. You’re really asking for a guided tour, not doing a review. In review, code should speak for itself. You don’t want someone there explaining it to you. That’s not representative of what it would be like to work on it yourself at a later date, or for someone new to the team to pick up a story that required them to work on this code.

Moreover, you might–without meaning to–put your pair on the defensive. “Why did you do this?” you might ask, pointing to some line or lines of code. The person who wrote or co-wrote those lines then has to mount a defense of them, and that can be frustrating as well as unproductive. Lines of code in isolation sometimes don’t explain themselves; you, as the reviewer, might have to look at the whole class or file to see the rationale for the code’s structure or naming conventions. But quizzing the coders on why they did something is not the same, cognitively, as looking through the code yourself and letting the code speak (or fail to speak) to you itself.

Doing code review in a pair where neither of you worked on the original code is also not advisable. Each of you will pick up on different things, and pointing those things out–not to mention scrolling up and down and flipping among files out of sync with each other–is simply not the same process or experience as examining the code yourself. Your different paths through both seeing and understanding the code will cause interference patterns and almost certainly make the task less efficient and less effective.

Now this doesn’t mean that you should never sit down with someone else and look at code! I’m talking specifically about code review: someone has just committed some code, and it needs that next pair of eyes. You may have occasion to discuss code you didn’t write with someone who did write it, and you may have occasion to discuss code with someone else when neither of you wrote it. (Think of a pair picking up a ticket on some code by people who have left the team.) What I’m saying is not that it’s bad for these things to happen. What I’m saying is that they are not code review.

Be fair and flexible about style

Your team should have a working agreement about code style. It may be a blanket agreement (“When we write Python, it has to pass Pylint”). Or it may be a very specific agreement (“No empty parentheses in Ruby method definitions”). How you determine your team or “house” style is a matter for the team and the organization.

When you do code review, you should not hesitate to point out violations of team style agreements. However, if the code you’re reviewing does not violate team style–but happens not to be in the style you yourself would have used–leave it alone. Code review isn’t about making everything look as if you had written it.

If you spot something stylistic that you think no one on the team should be doing, make a note of it and bring it up later as a possible addendum to your team’s style agreement. But don’t flag it in code review if it isn’t already part of that agreement. Code review is not the place for stylistic ping-pong matches.

Treat unclarity as a red flag

Perhaps the main thing you should be looking for during code review is code that is not clear. Unclarity can take many forms; but whatever form it takes, it should be treated as a show-stopper. Unclear code means someone is going to get confused at some point down the road–possibly including the people who wrote the code in the first place. (We’ve all had that happen!) You should immediately write a comment when you come across something unclear. In your comment you should include some analysis as to why you think it’s unclear: a method that’s doing too much and is hard to follow, confusing variable naming, and so forth.

This guideline is actually not only important in its own right, but also figures as another reason not to do code review alone. If you do code review in pairs, it’s too easy to explain the code to each other as you go. You don’t get enough of a chance to determine whether or not the code per se is clear.

Don’t be afraid of insulting someone’s code by saying it’s unclear. By the same token, if someone reviews your code and says they didn’t understand or couldn’t follow something–whether because of logic or naming or anything else–don’t defend or explain your code. It’s likely that you could explain the unclarity away. But you shouldn’t have to, and in the future when other people look at your code, you won’t have the opportunity to. Consider the finding of unclarity to be a reality check, and a very compelling indication that you need to change the code.

Happy reviewing!

Why I am not doctrinaire about pair programming

Pair programming is a wonderful practice. But it isn’t a cure-all for code-quality issues, and it isn’t the only game in town.

My reasons for not being a fan of a 100% pair programming approach fall generally under two headings:

  • pair programming has some (sometimes serious) drawbacks
  • other practices often produce good results

Let’s look at these in turn.

Pair programming has some (sometimes serious) drawbacks

I’ve heard it said many times that the chief strength of pair programming lies in its incorporation of instantaneous code review. But this is a dangerous half-truth. I am an unshakable believer in peer review of all code. Pair programming, I’m here to tell you, isn’t what I mean by peer review.

Yes, the members of the pair do catch each other doing things they shouldn’t. The locus classicus in my case is the time Brian Olore talked me off the ledge when I, fed up with undefined method 'blah' for nil errors, threatened to write something along the lines of expect(nil).to_receive(:blah).... Would I really have done it had Brian not been there? Was it just a cry for help? We’ll never know. But yes: pair programming can save developers from themselves, and from each other.

But it can also become a folie à deux. The pair becomes a team-within-a-team, and when both members of the pair find a code problem equally exasperating, it’s not uncommon for them to agree, perhaps against their better judgement, on a suboptimal solution. (I’m talking mostly about design and style, not sweeping non-working code under the carpet.) Getting a third person on the team to look at the paired code is peer review. Pair programming isn’t.

Pairing can be an unpleasant experience. I’ve been on teams with a 100% pairing approach, and I’ve had some very bad days. Not everyone is born to pair with everyone else. You can, if you like, say that developers who don’t like working with their pair should just suck it up and do it anyway. I don’t buy that. I want my software development practice to be enjoyable and rewarding, not only in the long run but as it happens. I believe in programmer fun. I do not want to feel uncomfortable or bored or in conflict very much of the time at all. I dislike the Procrustean approach to pair programming, which says (or implies) that pair programming is Always Right and everything else, like whether or not you like your job, has to fall into place around that.

Another serious problem with a doctrinaire approach to pair programming is that it relegates other, potentially very fruitful approaches to the scrap heap. That’s no good either.

Other practices often produce good results

I write some of my best code when I work alone for a while, and then come up for air and get a code review. The working alone takes me to depths of concentration that I cannot achieve when I pair. My colleagues are there to catch me if I fall (and I them).

I’ve been soloing on code, in one form or another, since 19[REDACTED]. It’s one of the great pleasures of my life, and it works to the benefit of teams I’m on because it results in good code. A doctrinaire approach to pair programming, however, rules this kind of work out of bounds.

This makes no sense to me. If you have a path that leads to good code, why would you abandon it? By what standard of measure is that better than not abandoning it?

Res ipsa loquitor, I would have thought. Nor do I just toss in mention of code review to make my pronouncements more credible. It’s not uncommon for me to come up with either new ideas or solutions to old problems that are right on target but, in the doing, a bit over-engineered: too many new classes, too much indirection, and so on. My colleagues can spot both the goodness in what I’ve done and, in many cases, a simpler way to do it. And I reciprocate. It’s a symbiosis that differs from that embodied in pair programming, but which produces excellent results.

And here’s another thing: Github. If there was an equivalent of Github pull requests fifteen or twenty years ago when (as I remember it) pair programming started to become popular, I don’t know what it was. Today it’s possible, and easy, for teammates to go line-by-line through each other’s code, compare it to what came before, comment on it, and suggest changes to it. The landscape is not the same with these tools as without them. That’s a major reason why I find the notion of 100% pair programming constrictive. It strikes me as outdated.

I have whimsically referred to soloing-plus-pull-request-review as “asynchronous pairing.” There’s something to that–though mind you, I strongly advocate pull request review for all code, whether soloed or paired. And, to come full circle, pair programming has a lot going for it. I’ve been on teams where no one does it, and I prefer teams where they do. At its best, it’s highly satisfying. But it isn’t the only way to produce code of high quality, and we cannot afford to jettison other practices that allow us to do just that.

A “Log Reflector” for AWS Lambda

If your organization has lots of AWS Lambda functions deployed, it’s also got lots of AWS CloudWatch log groups. Keeping track of all the logs across projects and teams can be unwieldy. But with a little effort, you can forward the log entries from every group and stream to one centralizing Lambda function. And from there you can do whatever you want with them–including sending them all to a log aggregator like SumoLogic.

That’s just what we in the Post Enrollment Tech (PET) team at 2U, Inc. have done. Our team has a large handful of functions running on AWS Lambda, and other teams are starting to follow suit. Each such Lambda function has a CloudWatch log group associated with it; each such log group contains a number of timestamped log streams. We wanted to solve the problem of forwarding our CloudWatch logging events to SumoLogic, and we wanted to solve it not only for our team’s projects but for all projects organization-wide.

We called the project that tackled this problem the Log Reflector. Our primary goal for the project was to automate the forwarding of all Lambda logging events to SumoLogic. A secondary (though important) goal was to ensure that any newly-created CloudWatch log group automatically got included in that forwarding process. We achieved both goals, and learned a lot along the way.

The main thing we learned at the outset was that a given CloudWatch log group can subscribe to a given Lambda function, causing that function to be invoked every time a message gets added to any stream in that log group. On invocation the function receives the contents of the log message, plus some metadata–the whole logging event, basically. The function, in turn, can do whatever it wants with the data sent to it, including passing it along to SumoLogic.

Thus the log-reflector Lambda function was born. Its purpose is for log groups to subscribe to it, and to listen to the log streams of those log groups and forward the log messages to SumoLogic.

In the service of our secondary goal–automatically subscribing new log groups to the log-reflector–we created the log-subscriber function, which listens for new log groups being created and subscribes them to the log reflector. There’s some icing on the cake (for example, a deploy-time script that plays catch-up by subscribing all existing log groups to the reflector); but between them, log-reflector and log-subscriber make up the heart and soul of the project.

In this article we’ll walk through a good cross-section of what the PET team has done on the Log Reflector project. We’ll start by looking at the handler of log-reflector. I don’t want to dwell on that handler too much, partly because it was very quick and easy (you’ll see why) and partly because you may or may not be sending your logs to SumoLogic like we are. But you’ll see what it consists of. After that we’ll take a look at how to subscribe a log group to a Lambda function. We’ll use a CLI approach, which isn’t exactly automatic but does give you a good feel for what’s actually going on.

Then we’ll turn to log-subscriber and look at how that function is configured, and at some of the code. I won’t go over everything, but I’ll show you enough and give you enough pointers that you’ll be able to pick up the threads if you wish to.

Two notes before we get going:

First, you’ll notice that the processes described here are rather heavy on permissions-related tasks. That comes with the territory (AWS): every entity needs permission to do virtually anything with, or to, any other entity. Getting permissions right has been a large part of the project, and worth trying to get a handle on from the start.

Second, keep in mind that sending logging events to SumoLogic is just what we do. It doesn’t have to be what you do with your centralized logging events. The main thing here is to get all your CloudWatch logs talking to a single listener–the log reflector–and to automate the process of hooking new log groups up to that listener. Your log reflector function can connect to SumoLogic or it can do something else. That part is up to you.

Now let’s dig in.

The handler for log-reflector

If you’re not planning to stream your logs to SumoLogic, you can skim or skip this section. Just remember, conceptually, that the log-reflector function is listening for logging events from the log groups that are subscribed to it. Each such event gets passed along to log-reflector in a well-defined JSON data structure. Whatever else your handler does, it needs to know how to pull this structure apart and do whatever it needs to do–SumoLogic or otherwise–with what it finds.

The handler for log-reflector in our project comes essentially verbatim from SumoLogic’s sumologic-aws-lambda Github repository–specifically, cloudwatchlogs_lambda.js. Put this code in a Lambda function, and you’ve got something that knows how to parse log messages that get forwarded to it, and how to forward them to SumoLogic in turn.

The handler uses the node4.3 Lambda runtime. Out of the box it encourages you to hard-code your SumoLogic endpoint in the code. We prefer to inject the endpoint into the handler via the SUMO_ENDPOINT environment variable. However you do it, the handler code correctly parses the incoming logging event from whatever log stream is talking to it, and it knows how to send it to Sumo.

Once you have your log-reflector function in place, doing what you want it to do (or perhaps for now just logging, to its own CloudWatch log, the fact that it has received an event), you’ll want to know how exactly to subscribe a log group to a Lambda function. Let’s look at that next.

Subscribing a log group to a Lambda function

Say you’ve got a Lambda function called update-student. Every time someone hits that function, you want the CloudWatch logging event to make its way to log-reflector.

In other words, you want to subscribe update-student‘s log group–which is probably called something like /aws/lambda/update-student–to your log-reflector function.

You can wire up the subscription of a log group to log-reflector in a number of ways:

  • in the Lambda console UI (hint: create a “trigger”)
  • in the CloudWatch logs console UI (hint: choose a log group and then “Stream to AWS Lambda” from the “Actions” dropdown)
  • via the AWS CLI (the aws utility)

We’ll look at the third of these options (but feel encouraged to look at the first two also). But first, you’ll need to set the right permissions on the log-reflector function. You need to tell the function that CloudWatch log groups are allowed to invoke it.

Make sure you’ve got your AWS credentials and defaults in place, and then issue this command (adjusting the argument values appropriately, of course):

aws lambda add-permission \
          --function-name 'log-reflector' \
          --statement-id 'something_unique' \
          --action 'lambda:InvokeFunction' \
          --principal 'logs.us-east-1.amazonaws.com' \
          --source-arn 'arn:aws:logs:us-east-1:123123123123:log-group:/aws/lambda*:*' \
          --source-account 123123123123

You’ll get back some JSON with details of what you just did. Now log-reflector can be invoked by your log groups. So it’s time to subscribe the /aws/lambda/update-student log group to the log reflector.

To create the actual subscription, you specific the log group you want listened to and the function you want to do the listening:

aws logs put-subscription-filter \
  --log-group-name '/aws/lambda/update-student' \
  --destination-arn 'arn:aws:lambda:us-east-1:123456789012:function:log-reflector' \
  --filter-name 'update-student-filter' \
  --filter-pattern ''

(Filter patterns offer you the opportunity to get fancy with what’s passed through. We’re not going to delve into them here.)

If all goes well there will be no screen output. But if you wait a minute or two and go to (or reload) your CloudWatch logs screen, you should see that /aws/lambda/update-student now has a subscription listed. And you can examine your handiwork with another CLI call:

aws logs describe-subscription-filters \
  --log-group-name '/aws/lambda/update-student'

which will give you back something like this:

   {
    "subscriptionFilters": [
        {
            "filterName": "update-student-filter", 
            "destinationArn": "arn:aws:lambda:us-west-2:123456789012:function:log-reflector", 
            "creationTime": 1470599015946,
            "filterPattern": "", 
            "logGroupName": "/aws/lambda/update-student"
        }
    ]
}

(Note that we get an array of filters. At present, no log group can have more than one subscription, so it will always be an array of one element or an empty array.)

You’ve created the subscription; now try it out! Using your favorite client (cURL, postman, etc.), hit the endpoint for your update-student function. Be patient… and in not too long you’ll see your request logged in a stream in the function’s log group. You should also see that your log reflector function got invoked; check log-reflector‘s log group for a recent entry.

So now you have a function in place that will receive logging events from subscribed log groups, and you’ve got a technique for subscribing log groups to that function. But you don’t want to go on doing this manually. So let’s look at how to automate the creating of subscriptions. Our goal will be to assign yet another Lambda function the task of listening for the creation of new log groups, and subscribing each such log group to the log reflector.

Automating subscriptions with log-subscriber

This time, we’ll look at the permissions-ish chores first, and then turn to the handler code.

The IAM role that’s in effect when log-subscriber is executed needs permission to create subscription filters on CloudWatch log groups. You can create or edit a policy containing this permission in the console UI (in IAM go to “Roles” and choose the relevant role), or you can do it through the CLI. We’ll do the latter here.

The process involves two steps: (1) creating the policy, and (2) attaching it to a role. Here’s how to create it:

aws iam create-policy \
  --policy-name 'let-me-do-something' \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "logs:DescribeSubscriptionFilters",
                "logs:PutSubscriptionFilter",
                "logs:DeleteSubscriptionFilter"
            ],
            "Resource": "arn:aws:logs:us-east-1:123123123123:log-group:/aws/lambda*:*",
            "Effect": "Allow"
        }
    ]
}'

(You may not need the describe and delete operations yet, but adding them in makes it easier to add functionality to your code later.)

This command will come back at you with, among other things, the ARN of the new policy. You’ll need it for the next command: attaching the policy to the role. You’ll also need the role name, which you can get from the IAM console. Cut and paste as needed to issue the following command:

aws iam attach-role-policy \
  --policy-arn 'arn:aws:iam::123123123123:policy/let-me-do-something' \
  --role-name 'role-string-r-IamRoleLambda-A1B2C3D4E5F6'

Now that the lambda’s role is allowed to create subscription filters, it’s time to flesh out the log-subscriber function itself. We’re not going to go into all the possible detail of the process here, but the general idea is this: the handler for log-subscriber will be invoked whenever a new log group is created, and will go through the permission and subscription-creation steps we went through above–but automatically.

If you’re using Python you’ll probably want to use the boto3 library, which wraps the AWS API. You’ll need to research that further on your own, but here’s an example showing what our earlier aws add-permission command would look like translated into boto3:

import boto3

lamb = boto3.client('lambda')

lamb.add_permission(
           FunctionName='log-reflector',
           StatementId='something-unique',
           Action='lambda:InvokeFunction',
           Principal='logs.us-east-1.amazonaws.com',
           SourceArn='arn:aws:logs:us-east-1:123123123123:log-group:/aws/lambda*:*',
           SourceAccount='123123123123')

When the subscriber function gets invoked, it will be sent information about the log group whose creation triggered the invocation. In order to create the subscription, you’ll want to dig out the name of the log group and then do a boto3-style put_subscription_filter call:

def handler(event, context):
    log_group_name = event['detail']['requestParameters']['logGroupName']
    log_client = boto3.client('logs')
    log_client.put_subscription_filter(
         logGroupName=log_group_name,
         filterName='nice-name-for-filter',
         filterPattern='',
         destinationArn='arn:aws:lambda:us-west-2:123456789012:function:log-reflector'

(This is of course a “happy path” example. You’ll get an exception if the event your subscriber function receives doesn’t conform to the data structure it’s looking for–but you can always add error-handling of whatever kind you need.)

But how do you get a function like log-subscriber to be invoked upon log-group creation? To start with, by defining a rule. You can do so from the command line:

aws events put-rule \
  --name 'New_log_group_subscriber' \
  --region us-east-1 \
  --event-pattern '{
  "detail-type": [
    "AWS API Call via CloudTrail"
  ],
  "detail": {
    "eventSource": [
      "logs.amazonaws.com"
    ],
    "eventName": [
      "CreateLogGroup"
    ]
  }
}'

So now there’s a rule in force that says that the creation of a log group should invoke a function. But we haven’t told it which function! To do that we have to create a target for the rule–actually an array of targets, but we only need one. We’ll put the JSON describing the target in a shell variable and then use it inside a JSON array:

target='{
  "Arn": "arn:aws:lambda:us-west-2:123456789012:function:log-subscriber",
  "Id": "LogSubscriberTarget"
}'

aws events put-targets \
  --rule "New_log_group_subscriber" \
  --region us-east-1 \
  --targets "[$target]"

(Note that the rule parameter has to match the name we gave the rule in the call to put-rule, above.)

Let’s try out log-subscriber. Easy: we’ll create a brand-new log group, and then check whether or not it has been subscribed to log- reflector:

aws logs create-log-group \
  --log-group-name '/aws/lambda/testing-the-subscriber'

Now count to 200…. Not really, but you have to be patient: the creation of the new subscription may not be instantaneous. But in not too long you should, upon reloading, see that the new log group has appeared on your CloudWatch console, and has a subscription to your log reflector function. You can also use the technique we used to check on the success of the subscription creation we did earlier:

aws logs describe-subscription-filters \
  --log-group-name '/aws/lambda/testing-the-subscriber'

You should see something like this:

   {
    "subscriptionFilters": [
        {
            "filterName": "aws-lambda-testing-the-subscriber", 
            "destinationArn": "arn:aws:lambda:us-west-2:123456789012:function:log-reflector", 
            "creationTime": 1470615778768, 
            "filterPattern": "", 
            "logGroupName": "/aws/lambda/testing-the-subscriber"
        }
    ]
}

(Note that the filter name was generated automatically.)

This is the familiar array-of-one description of the subscriptions (really, the single subscription) of the log group to the log-reflector function.

So now we’re in a good place. Every time a new log group is created, that log group will be subscribed to the log reflector. And the log reflector can do whatever it needs to do with the stream of logging events coming at it from all the log groups that it’s listening to.

Conclusion(s) and thoughts

Development on the Log Reflector project has gone very smoothly, though there’s been a bit of a learning curve. AWS roles and permissions are famously complex, and it’s not always easy to get the hang of what’s going on. For example: you don’t grant a log group permission to execute a function; you grant the function permission to be executed by the log group. That kind of thing.

And there are a few glitches that haven’t sorted themselves out. One that I’m still puzzled by is the fact that if you use the CloudWatch console to tell a log group to stream to a Lambda function, the subscription of the log group to the function shows up in the Lambda console when you look at the function’s triggers. But if you do it programmatically, it doesn’t; there appear to be no triggers. Yet everything else works identically in the two cases, and I have not yet tracked down the reason for the anomaly.

AWS regions can be a pitfall. I have a slightly dinkier version of the Log Reflector on my personal Github and AWS accounts, as a learning tool. One morning, all my test scripts (delete and create subscriptions, add permissions, etc.) stopped working. I’m not entirely sure but I’m reasonably certain that it was because I’d had the AWS_DEFAULT_REGION environment variable set in the terminal that I’d used previously, but not in the one I was using that day. Once I set that variable, things largely fell into place and started working again. Old hat to seasoned AWS developers, no doubt, but something to keep an eye on.

My final thought is this: working this intensely on an AWS project has taught me, above all, that there is more to AWS than one is likely to be able to imagine. It’s gargantuan. And it’s correspondingly complex–but the complexity, from what I’ve seen, is packaged about as well as it could be. The documentation is very good, though I tend to add the word stackoverflow to my Google searches because otherwise all you get, typically, is the official docs and not the down-in-the-trenches problems and solutions.

So enjoy AWS, and happy reflecting!

Thanks to Nate Snyder for pairing on this project and for feedback on a draft of this post.