Chad Fowler

the passionate programmer, author, speaker, musician, technologist, CTO

My Best Conference Talks on Video

I’ve given many presentations over the years at conferences all over the world. These are links to video recordings of some of the best.

Michael Moritz on Being a World-Class Entrepreneur

Semil Shah did a nice job of pulling highlights from a recent interview with Sir Michael Moritz in Foreign Affairs. This one struck me as a particularly useful checklist for would-be entrepreneurs.

What makes a world-class entrepreneur?

Michael Moritz says:

  • Clarity of thought.
  • The ability to communicate clearly.
  • A great sense of mission.
  • A massive willingness to persevere.
  • A willingness to make painful decisions.
  • Extraordinary energy.

(I made this into a bulleted list.)

This list rings true to me. I have seen surpluses and deficits in each of these points in myself and others, and I’ve seen the impact of someone who matches this list well (or doesn’t but thinks they do).

Do You Give Energy or Take Energy?

As you interact with other people in the world, you either generate energy or you deplete it. In a team environment, there are people who always bring the team’s energy level up. When they are absent, you miss it. They somehow direct the flow of conversation and events from dead ends to forward motion. When you discuss a problem with them, you feel better afterward.

And, of course, there are those who have the opposite effect. Any debate becomes an exhausting fight. Negative events are followed by even more negative responses. Hard problems become drudgery. Optimism turns into cynicism. When you part ways, you feel tired.

Most of us play both of these roles, depending on our moods. Some of us fall almost constantly into one of the two buckets.

There is no reason we can’t all fall into the first category.

Which one are you? Do you give or take energy from the people around you?

The Curse of a Name: How to Kill a Good Idea

What do the following have in common?

  • Agile Software Development
  • Six Sigma
  • Behavior Driven Development
  • Software Craftsmanship
  • DevOps

Each of these represents a good idea that a group of well-meaning people tried (and succeeded) to spread into the world.

Each is generally poorly defined and poorly understood.

Each term has now lost its meaning.

Each term, with new watered-down, wrong-headed interpretations, is being used constantly to create a false sense of security and to justify bad practices.

In each case, adoption of a good idea is being accidentally replaced by adoption of a name which represents that good idea. The term becomes a placeholder for good intention.

I can “do BDD” in the same way that I can sign up for a membership at a gym. I feel the sense of accomplishment without the need to actually get healthy.

Must-Read Software Development Books

I made this list for some developers at Wunderlist before I started working there, and I’ve been keeping it updated over time.

Leave a comment and tell me what you’d add or change.

On Being 40

Today I am 40 years old.

40 is a confusing age. Old enough to not be a kid anymore by any reasonable definition. Still young, inexperienced, and stupid enough to feel like one anyway. Maybe that last bit never goes away. Maybe it only intensifies.

One feature of my version of me at 40 is that I have gradually, through the years, weeded my habits of many of the meaningless societal norms I had programmed into me as a child. For example, Kelly and I don’t celebrate Valentine’s Day or religious holidays for faiths we don’t hold (so, really, any of them). And, birthdays are just arbitrary landmarks in an ever-accruing collection of minutes and seconds between birth and death. Christmas, Easter, Halloween, etc. are just days like any other, marked primarily by different music, food, and color schemes. “Bah, humbug” says this crusty, aging ruiner-of-annual-spirit!

For some reason people pay special attention to the birthdays and anniversaries that are evenly divisible by 10. Why 10? What’s so special about 10? Sure, it has some interesting mathematical properties, but we don’t tend to make much use of them when we measure time.

I can’t remember my 10th, 20th, or 30th birthdays. I don’t recall having done anything spectacular while I was of those ages. Nothing especially remarkable happened to me when I was 10, 20, or 30.

But, here I am at 40, somehow unable to shake the made-up significance of this particular instance of my age modulo 10 equaling zero.

Here are some facts about becoming 40:

  • The vast majority of people from my part of the world live to be at least 40. It’s only one day more remarkable than turning 39 + 364 days. And in fact one day less remarkable than turning 39 + 366 days (depending on leap years I guess).
  • At 40, I’m statistically probably over halfway through with my life. While this might sound heavy, it’s actually already been true for a few years. I just didn’t write about it then, because my age wasn’t evenly divisible by 10.
  • My turning 40 is slightly more remarkable than my birth and my eventual death. Everybody is born and everybody dies. Not everybody turns 40.
  • At this age, I’m becoming aware of just how long and short 40 years is. Literally everything that has ever happened to me can fit into 40 years. If I do another 40, twice as much can happen in total, but…
  • Relative to the age of the earth or many other arbitrarily choosable lengths of time, 40 years is so insignificant I might as well not even exist.
  • 40 years ago today, my amazing, kind, wise mother must have been going through one of the most terrifying experiences of her life. She was 19, and I was a human being coming into the world that she was going to have to somehow take care of. I try to imagine myself doing this at 19. What an imaginary disaster! I hope I’ve been worth it. She sure did a great job, especially during those times when I was an unreasonable mess (specifically, when I was… 19!!!). Happy Mother’s Day, mom!

Now that I’m old, I’m supposed to be able to give young people advice. I don’t really have any, so instead I’ll give young readers a glimpse into what 40 is like (for me, anyway). Here goes.

Being 40 feels very much like being 30. Except your body hurts more. I say that even as someone who is in significantly better shape at 40 than I was at 30. It still literally hurts. Everyone likes to joke about this, so you think it’s just a joke like those idiotic jokes everyone likes to tell about being chained to their spouses and such. But it’s real. Sorry about that. Also, I can’t stay awake as long as I used to without getting seriously grumpy. I’m pretty sure I’m generally less fun than I used to be. That’s saying something, because I’ve probably never been the person people considered to be “fun” anyway.

But that’s all superficial. At 40 I feel a much greater sense of appreciation for all I have and all I have been able to do in my life. I won at marriage and had the gift of doing it at a very early age, giving me the ability to experience a large portion of my life with Kelly. I’ve made a remarkable career for myself. I’ve had the privilege to travel around the world, working, meeting people, speaking at conferences, and generally exceeding any expectation I’ve ever had for how things would turn out.

I feel a much stronger sense of urgency to remember to appreciate every minute of every day. I’m professionally much more accomplished than I was at 20 or 30. But I don’t feel the magnitude of that every day. Those things even themselves out on the hedonic treadmill.

I do, however, feel the magnitude of my increasing, deepening love for my wife and my family. I understand more acutely the rarity of true friendship and the scarcity of empathy. And I have learned to differentiate between proximity and truth when it comes to human relationships. A relationship doesn’t have to be constantly tended to to remain precious, but it must be revered.

Like turning 40, none of these realizations is remarkable. But, sometimes just feeling and expressing is enough. As I get older I realize not everything I do has to be an attempt to be remarkable.

I hope you have a good day today, even if–especially if–it’s not your birthday. Thanks for reading this.

The Magic of Strace

Early in my career, a co-worker and I were flown from Memphis to Orlando to try to help end a multi-day outage of our company’s corporate Lotus Domino server. The team in Orlando had been fire-fighting for days and had gotten nowhere. I’m not sure why they thought my co-worker and I could help. We didn’t know anything at all about Lotus Domino. But it ran on UNIX and we were pretty good with UNIX. I guess they were desperate.

Lotus Domino was a closed-source black box of a “groupware” server. I can’t remember the exact problem, but it had something to do with files not being properly accessible in its database, and apparently even the escalated support from Lotus was unable to solve the problem.

This was one of the most profoundly educational experiences of my career. It’s when I learned what may be for me the most important UNIX tool to date: strace.*

Nowadays I probably use strace or some equivalent almost every day I work as a programmer and/or systems engineer. In this post I’ll explain why and how and hopefully show you some tips on how to get the most out of this powerful little tool.

What is strace?

strace is a command line tool which lets you trace the system calls and signals received by a given process and its children. You can either use it to start a program or you can attach to an existing program to trace the calls the program makes to the system. Let’s look at a quick example. Say we have a simple C program like this:

It doesn’t do much. It just prints “hi” to the screen. After compiling the program, we can run it with strace to see all of the system calls the program makes:

To start the program, we invoke strace and pass the program name (and any parameters we want to pass to the program). In this case we also passed the “-s” flag to strace telling it the maximum string size we want it to print. This is helpful for showing expanded function arguments. Here we pass 2000, which is arbitrarily “enough” to see everything we need to see in this program. The default is 32, which in my experience means we’ll almost definitely miss information we care about in the trace. We also use the “-f” flag, which tells strace to follow any children of the program we execute. In this case there are no children, but when using strace regularly it’s probably good to get into a habit of following a process’s children so you can see everything that happens when you execute the program.

After the invocation, we see a rather verbose log of every system call. To some (most?), this might look like gibberish. But, even if you have almost no idea what you’re looking at, you can quickly spot some useful pieces of information.

The first line of the trace shows a call to execve(). Unsurprisingly, execve()’s job is to run a program. It accepts the path to the program, any arguments as an array, and a list of environment variables to set for the program (which are omitted from the output since they’d be so noisy).

The last two lines contain another recognizable sequence. First you see a call to write() with our C program’s string “hi\n”. The first argument to write() is the file descriptor to write to. In this case it’s “1”, which is the process’s standard output stream. After the write call (which looks garbled because the actual write to standard out showed up along with the strace output), the program calls exit_group(). This system call acts like _exit() but terminates all threads in the process, not just the calling one.

What are all these calls between execve() and write()? They take care of connecting all of the standard C libraries before the main program starts. They basically set up the runtime. If you look at them carefully you’ll see that they walk through a series of possible files, check to see if they are accessible, open them, map them into a memory location, and close them.

An important hint: every one of these functions is documented and available to read about using man pages. If you don’t know what mmap() does, just type “man mmap” and find out.

Try running through every function in this strace output and learn what each line does. It’s easier than it looks!

Tracing a running, real-world process

Most of the time when I need a tool like strace, it’s because there’s a process that’s already running but not acting right. Under normal circumstances these processes can be extremely hard to troubleshoot since they tend to be started by the init system, so you can only use logs and other external indicators to inspect them.

strace makes it easy to look at the inner workings of an already running process. Let’s say we have a problem with Ruby unicorn processes crashing in production and we’re unable to see anything useful in the process’s logs. We can connect to the process using strace’s “-p” flag. Since a real production web service is likely to generate a lot of activity, we’ll also use the “-o” flag to have strace send its output to a log file:

After only a few seconds, this log file has about 9000 lines. Here’s a portion of it that contains some potentially interesting snippets:

We won’t go through every line, but let’s look at the first one. It’s a call to select(). Programs use select() to monitor file descriptors on a system for activity. According to the select() man page, its second argument is a list of file descriptors to watch for read activity. select() will block for a configurable period of time waiting for activity on the supplied file descriptors, after which it returns with the number of descriptors on which activity took place.

Ignoring the other parameters for now, we can see that this call to select() watches file descriptors 14 and 15 for read activity and (on the far right of the trace output) returns with the value “1”. strace adds extra info to some system calls, so we can see not only the raw result here but also which file descriptor had activity (14). Sometimes we just want the raw return value of a system call. To get this, you can pass an extra flag to strace: “-e raw=select”. This tells strace to show raw data for calls to select().

So, what are file descriptors 14 and 15? Without that info this isn’t very useful. Let’s use the lsof command to find out:

In the 4th column of lsof’s output, labeled “FD”, we can see the file descriptor numbers for the program. Aha! 14 and 15 are the UNIX and TCP sockets that Unicorn is listening on. It makes sense, then, that the process would be using select() to wait for activity. And in this case we got some traffic on the UNIX socket (file descriptor 14).

Next, we see a series of calls which try to accept the incoming connection on the UNIX socket but return with EAGAIN. This is normal behavior in a multi-processing network server, where another worker process may have already accepted the connection. The process goes back, waits for more incoming data, and tries again.

Finally, a call to accept4() returns a file descriptor (13) with no error. It’s time to process a request! First the process checks info on the file descriptor using fstat(). The second argument to fstat() is a “stat” struct which the function fills with data. Here you can see its mode (S_IFSOCK) and the size (which shows 0 since this isn’t a regular file). After presumably seeing that all is as expected with the socket’s file descriptor, the process receives data from the socket using recvfrom().

Here’s where things get interesting

Like fstat(), recvfrom()’s first parameter is the file descriptor to receive data from and its second is a buffer to fill with that data. Here’s where things get really interesting when you’re trying to debug a problem: You can see the full HTTP request that has been sent to this web server process! Here it is, expanded for readability:

This can be extremely helpful when you’re trying to troubleshoot a process you don’t have much control over. The return value of the recvfrom() call indicates the number of bytes received by the call (167). Now it’s time to respond.

The process first uses ppoll() to ask the system to tell it when it would be OK to write to this socket. ppoll() takes a list of file descriptors and events to poll for. In this case the process has asked to be notified when writing to the socket would not block (POLLOUT). After being notified, it writes the beginning of the HTTP response header using write().

Next we can see the Unicorn process’s internal routing behavior at work. It uses stat() to see if there is a physical file on the file system which it could serve at the requested address. stat() returns ENOENT, indicating no file at that path, so the process continues executing code. This is the typical behavior for static file-based caching on Rails systems. First look for a static file that would satisfy the request, then move on to executing code.

As a final glimpse into the workings of this particular process, we see a SQL query being sent to file descriptor 16. Reviewing our lsof output above, we can see that file descriptor 16 is a TCP connection to another host on that host’s postgresql port (this number-to-name mapping is configured in /etc/services in case you’re curious). The process uses sendto() to send the formatted SQL query to the postgresql server. The third argument is the message’s size and the fourth is a flag (MSG_NOSIGNAL in this case) which tells the operating system not to interrupt the process with a SIGPIPE signal if the remote end of the connection has gone away. The call fails with EPIPE instead.

The process then uses the poll() function to wait for either read or error activity on the socket and, on receiving read activity, receives the postgresql server’s response using recvfrom().

We skipped over a few details here, but hopefully you can see how with a combination of strace, lsof, and system man pages it’s possible to understand the details of what a program is doing under the covers.

What’s “normal”?

Sometimes we just want to get an overview of what a process is doing. I first remember having this need when troubleshooting a broken off-the-shelf supply chain management application written in “Enterprise Java” in the late 90s. It worked for some time and then suddenly under some unknown circumstance at a specific time of day it would start refusing connections. We had no source code, and we suffered from the typical level of quality you get from an enterprise support contract (i.e. we solved all problems ourselves). So I decided to try to compare “normal” behavior with the behavior when things went wrong.

I did this by regularly sampling system call activity for the processes and then comparing those normal samples with the activity when the process was in trouble. I don’t remember the exact outcome that time, but it’s a trick I’ve used ever since. Until recently I would always write scripts to run strace, capture the output, and parse it into an aggregate. Then I discovered that strace can do this for me.

Let’s take a look at a unicorn process again:

Here we use the “-c” flag to tell strace to count the system calls it observes. When running strace with the “-c” flag, we have to let it run for the desired amount of time and then interrupt the process (ctrl-c) to see the accumulated data. The output is pretty self-explanatory.

I’m currently writing at 7am about a system that is used during working hours, so the unicorn process I’m tracing is mostly inactive. If you read through the previous trace, you shouldn’t be surprised by the data. Unicorn spent most of its time in select(). We know now that it uses select() to wait for incoming connections. So, this process spent almost all of its time waiting for someone to ask it to do something. Makes sense.

We can also see that accept4() returned a relatively high number of errors. As we saw in the above example, accept4() will routinely receive the EAGAIN error and go back into the select() call to wait for another connection.

The resulting function list also makes a nice to-do list you could use to brush up on your C system calls. Go through it and read about one per day until you understand each of them. If you do this, the next time you’re tracing a Unicorn process under duress in production you’ll be much more fluent in its language of system calls.

Finding out what’s slow

We’ll wrap up by looking at how strace can help us determine the cause of performance problems in our systems. We recently used this at work to uncover one of those really gremlin-like problems where random, seemingly unrelated components of our distributed system degraded in performance and we had no idea why. One of our databases slowed down. Some of our web requests slowed down. And finally, using sudo got slow.

That was what finally gave us a clue. We used strace to trace the execution of sudo and to time each system call sudo made. We finally discovered that the slowness was in a log statement! We had apparently misconfigured a syslog server without realizing it and all of the processes which were talking to that server had mysteriously slowed down.

Let’s take a look at a simple example of using strace to track down performance problems in processes. Imagine we had a program with the following C source and we wanted to figure out why it takes over 2 seconds to execute:

Glancing through the code, there’s nothing obvious that sticks out (heh), so we’ll turn to strace for the answer. We use strace’s “-T” flag to ask it to time the execution of each system call during tracing. Here’s the output:

As you can see, strace has included a final column with the execution time (in seconds) for each traced call. Following the trace line by line, we see that almost every call was incredibly fast until we finally reach a call that took more than 2 seconds! Mystery solved. Apparently something in our program is calling nanosleep() with an argument of 2 seconds.

In a real-world trace with a lot more output we could save the data to a file, sort through it on the last column, and track down the offending function calls.

There’s more!

strace is an amazingly rich tool. I’ve only covered a few of the options that I use most frequently, but it can do a lot more. Check out the strace man page to browse some of the other ways you can manipulate its behavior. Especially be sure to look at the uses of “-e”, which accepts expressions allowing you to filter traces or change how traces are done.

If you read through this far and you didn’t know what all these function calls meant, don’t be alarmed. Neither did I at some point. I learned most of this stuff by tracing broken programs and reading man pages. If you program in a UNIX environment, I encourage you to pick a program you care about and strace its execution. Learn what’s happening at the system level. Read the man pages. Explore. It’s fun and you’ll understand your own work even better than before.

UPDATE: And the fate of Lotus Domino server?

Apparently this article read like a dramatic setup for the conclusion of the story of the Domino server. People on other sites, on Twitter, and in person all asked “OMG What happened to the Domino server?!?!?!” So I’ll wrap up by telling the rest of the story.

Somehow, by tracing system calls, we fixed the Domino server. I’m afraid I don’t remember exactly how, but you can rest assured it was extremely dramatic and a HUGE surprise to everyone involved.

Then my co-worker and I went out into the parking lot and got into our rented, convertible, red Ford Mustang, cranked up the volume on the only CD we brought with us to listen to (Guns N’ Roses’ Appetite for Destruction), drove safely to the airport, and forgot to take the CD out of the car. Only after returning otherwise uneventfully to Memphis (via Atlanta) did we realize we had left it in the car.

I hope whoever rented that car next enjoyed the hair metal as much as we did.

* Actually I’m simplifying a bit. We were on Solaris, and the equivalent tool there was truss. It’s basically the same. You get the point.

Killing the Crunch Mode Antipattern

In the software industry, especially the startup world, Crunch Mode is a ubiquitous, unhealthy antipattern. Crunch Mode refers to periods of overtime work brought on by the need to meet a project deadline. Developers stereotypically glorify the ability and propensity to stay up all night grinding through a difficult problem. It’s part of our folklore. It’s part of how we’re measured. It’s something companies and leaders take advantage of in order to accomplish more with less.

And it’s stupid.

If you want a “knowledge worker” to be as ineffective and produce the lowest level of quality possible, deprive them of their sleep and hold them to an unrealistic deadline. In other words, activate Crunch Mode.

Why Not to Crunch?

  • It makes us stupid. The more I work, the less relevant my years of experience become. I constantly make rookie mistakes. I break things in production. I leave messes behind. I waste hours going down the wrong train of thought.
  • It burns people out, sometimes permanently. They burn through passion that takes downtime to replenish. Unless the non-Crunch work is sufficiently energizing (and frequent), enough crunching can cause your best people to leave.
  • It makes people lazy and less productive. This may seem ironic, but when someone puts in heroic levels of effort, they start to place less value on each minute. I know that if I work all night, then an hour brain-break mid-day sounds very reasonable. The problem is that these breaks become a habit that can persist between Crunch times.
  • It’s a risky way to make your commitments. Crunch Mode means you are using your team beyond capacity. That’s like trying to drive 50km on 40km of gas. It might be OK, but if you do it all the time you’re going to end up broken down on the side of the road waiting for help at some point. Maybe more often than not.
  • Accountability is lost. When someone is working all hours, they can’t be blamed for mistakes. They can’t be blamed for coming in late, forgetting an email, introducing bugs, not writing tests, cutting technical corners, and doing all sorts of things that don’t describe how you want people on your team behaving.
  • It puts the credibility of management in question every time. Because, managers, believe it or not, every single time it happens, the entire team asks themselves, “But why?”
  • It shows a team that the leader cares about meeting a business goal more than he or she cares about their health. This may sound harsh but it is literally true.

The more you have to use your brain, the less effective and healthy Crunch Mode is. In fields that require less creativity and thought, it might even really work as a (ruthless) management technique. In software development, it just doesn’t.

Why do we do it?

The number one reason teams go into Crunch Mode is that their leaders have failed to understand and/or set realistic expectations for the time it takes to complete a project. In worst cases, the deadlines are arbitrarily set by management and not tied to any specific business need. In other cases, the deadlines are inflexible, but the scope can and should be adjusted to a realistic level. Sure, it may be that the team committed to those incorrect deadlines, but it’s up to the ones deciding on the deadlines to verify that they’re realistic before making a commitment.

Fear and the resulting breakdown of communication also drive us into Crunch Mode. “Can you get this done by ?” “Uh…yes?” Developers fear saying “no”. Managers fear looking bad by committing to what seem like far off dates. Managers fear setting far off deadlines, because developers miss dates more often than not. “If we pad the estimates are we going to miss those by 20% too?”

Another reason we go into Crunch Mode is that we are perpetuating a culture of cowboy heroism which many of us unwittingly get caught up in. The feeling of finishing tons of work in a short period and depriving oneself of quality personal time can be addicting, especially when it results in “saving the day” for a project. Rolling up your sleeves and cranking to the end of a deadline makes you feel valuable in a very concrete way. Without your overtime, the project doesn’t get done on time. With it, the project is saved. It’s hard to find such black and white ways to add value in daily “normal” work.

Maybe the most addictive feature of Crunch Mode is that it’s the easiest way to see a team really click. At the beginning of Crunch Mode, people get intensely focused. Communication is streamlined. The big important stuff gets tackled quickly and finished. A team can initially raise its skill level a notch with the focus alone. It feels great as both a manager and a team member to work that efficiently and effectively. Unfortunately it’s difficult (not impossible) to work this way all the time, so we’re tempted to activate Crunch Mode on occasion just to feel this way again.

Alternatives to Crunch Mode

  • Miss the deadline. Ya, that’s right. Let your customers down this time. Make less money. Incur opportunity cost. Just fail. You already failed to manage your team and your time. Maybe you should let that have more visible consequences?
  • Set smaller goals. When you set a massive goal, way off in the future, it’s impossible to estimate whether it’s actually realistic. However, if you set a goal for this afternoon, you’re probably going to be pretty accurate with your estimates.
  • Measure progress concretely and in small steps. Never trust a status report, even from yourself. In software, the only deliverable that matters is one that you can execute.
  • Set more realistic goals for the team and problems you face. If you’re continually having to slip into Crunch Mode, you clearly don’t understand your capabilities. Admit that you’re going to go slower than you expected and adjust for it.

As unhealthy, counterproductive, and just plain stupid as Crunch Mode is, sometimes you just have to do it. We all accept that. Crunch Mode is the nuclear option. A leader needs to have it available as a tool, but each time he or she wields this tool, he or she pays in long term credibility and trust.

Can we stop it?

It’s time to finally stop this insanity. Think of the time, money, energy, and potential happiness wasted on poor planning, communication, and leadership.

Managers, hold yourself accountable for Crunch Mode when it happens. See it as a personal failure.

Everyone else, hold yourself accountable for every non-crunch minute you work. Make them count. Overcommunicate. Focus.

On Having Something to Say

The first time I was invited to give a keynote speech, I thought “Why would they want me? I have nothing to say.” The second time was a few months later. Same thing. What should I talk about? What’s worth listening to?

Previously I had been asked by my favorite publisher to write a book. But, what should it be about? I don’t really have much interesting to say. I’m just learning, after all.

Now it’s about ten years later. I’ve written and contributed to several books. I don’t know how many conferences I’ve spoken at anymore or even how many countries I’ve been invited to. Still, when I receive a request to speak or am asked my thoughts on a topic by a stranger, I think to myself, “But, why? What do I have to say?” I’m still learning, after all.

I started writing my second book, Rails Recipes, roughly a month after I started actually using Rails full time at work. Compared to a lot of other people I knew, I was unqualified to write it. I was going to have to learn a lot in order to write anything at all. I was even going to have to learn what I needed to learn!

The only thing I had to say about Rails at the time was “I don’t know what to say about Rails”.

By the time I finished the book I had written many small Rails applications and worked on one large one extensively. I had also read almost every line of code in the book and made some fairly decent contributions to the framework. By the end of the book project, I was an expert. I had a lot to say about Rails.

But I think one of the reasons the book resonated so well with new Rails developers is that it was written from the fresh perspective of someone still learning.

I saw Martin Fowler give a lecture in Bangalore 11 years ago. He said something that has stuck with me ever since: “Whenever I want to really learn about something, I write a book about it.” To most people, like me, this is counter-intuitive, but my own history has shown me that it makes a lot of sense.

So, these days, while casting about for something worth saying, I just ask myself, “What do I wish I had something to say about?” and I explore that topic aloud. Sometimes it falls flat, and sometimes it results in content that helps me and other people get through the world a little easier.

Don’t be afraid of not knowing enough or not being experienced enough to help people with your words. The worst case is you learn something in the process. The best case is you create something remarkable.

Rule One of Management: First, Do No Harm

Primum non nocere, or “first, do no harm” is a universal principle among healthcare professionals worldwide. It essentially means this: given a (bad) situation, your first priority is to not make it worse through your actions. Doctors hold a position of power over their patients. Most patients are unqualified to diagnose and treat themselves, so they must trust the education, experience, and skill of the doctor. A doctor can significantly (even terminally) affect a patient’s life.

I propose “first, do no harm” as a principle to apply in every situation where the do-er has a position of power over those his or her actions affect. Like managers.

Managers hold a position of power in an organization. A manager having a bad day can make a decision that changes the course of years of multiple people’s lives. A managerial decision can destroy morale, burn people out, or even ruin another person’s career.

Couple this power with the typical human need to do something, and we have a problem on our hands. As people, we grow anxious when we’re in a position in which we don’t feel we’re actively doing anything. Especially for new managers, this can drive a person crazy. You’re used to being the producer who gets things done, and suddenly you’re in meetings and you aren’t the one producing anymore. How do you measure your productivity?

In the worst case, you base it on something like the number of decisions you make or the number of processes you change.

Try to remember the last time you were in a job and things got screwed up because a manager was out of the office. Now try to remember the last time things were screwed up because of an offhand comment, a sarcastic remark, a poorly timed email, or a decision some manager made.

At a tactical level, the risks of managerial action far outweigh the risks of inaction.

Therefore, managers (especially you first-time managers), please don’t feel pressured to just do something. Be reasoned and careful when you encounter a new problem.

Rule number one is: First, do no harm.