A few years ago I was drowning in administrative work.
To deliver high-quality online courses we were patching together several different tools to create a good student experience. We used Teachable as our course platform, Slack for our community, Zoom for our live sessions, Google Calendar to send out course invites, Miro for collaboration, and Mailchimp to send out course emails.
This mostly worked, but it required a lot of administrative work to keep our systems in sync.
When a student wanted to change their email address, for example, we had to update their email address in Teachable, update the calendar event, unsubscribe their old email and subscribe their new email to the course emails in Mailchimp, revoke their Slack access through their old email address, and re-invite their new email address.
An email address change was just one use case. There were dozens more that required that we make changes in all of our systems.
I had an admin helping me with this. We used detailed checklists on Trello cards to manage each use case. When a student requested a change, my admin copied the relevant card template and followed the checklist.
It worked for the most part. But there were a lot of mistakes.
For years, I harangued my admin for these mistakes. But then, one day, I found myself reading Ask Your Developer by Jeff Lawson and I realized I was the one making the mistake.
Don’t Ask Humans to Do Tasks That Should Be Automated
If you aren’t familiar with Ask Your Developer, it’s a great read. It’s written to help non-engineers understand the value of software engineering. It’s chock-full of stories about how the author solved business problems by writing code.
This book was exactly what I needed. It helped me see my mistake as clear as day. I was having humans (my admin) complete tasks that a computer was much better suited to do.
Even though it had been ten years since I had written a line of code, I decided to dive back in and I started automating my business logic. It wasn’t easy.
At the heart of all of my administrative troubles was the need to move data between disparate systems and keep all of those systems in sync. To make this work, I had to do two things: 1) establish a primary system of record, my single source of truth for student data, and 2) find the APIs for each of the tools that I was using and learn how to use them.
Platform Teams and API Teams Need to Do Discovery
As you’ll see, I ran into several challenges along the way. These challenges made it abundantly clear to me that API teams need to be doing far more discovery than most of them are doing today.
I ran into usability issues on a daily basis. Some of these challenges were due to me being a beginner. I had to learn the conventions of a REST API. I was new to OAuth. I had to learn to wrap my API calls in try-catch blocks so that I could see the errors being returned. I was new to rate limits. But everyone is a beginner at some point.
API teams need to think about onboarding just like everyone else. Some companies had getting started guides that explained these basics. Others assumed their users were all experienced developers.
Some of the challenges that I encountered had nothing to do with my experience level and were simply inexcusable. I ran into documentation that was incorrect. Error codes were incomprehensible. Critical endpoints were missing. I found countless examples where PUT behaved like PATCH and vice versa. To this day, three years in, there are still challenges that I haven’t figured out how to overcome. Searching Stack Overflow shows that I’m not the only one.
This got me curious. Are there API teams who are doing good discovery? Do they think about onboarding? Do they pair program with their customers to understand where there are gaps in their documentation or their endpoint coverage? Do they usability test their API?
So I posted on LinkedIn and got some great responses. That led to this five-part series. If you missed the first two parts you can find them here:
- Understanding Web APIs: What They Are and How They Work
- Common Usability Issues with Web APIs: And How Discovery Can Help
Today, I’ll be sharing my experiences with a variety of APIs. My goal is to showcase the need for better discovery. Next month, we’ll share two more stories from API teams who are putting the discovery habits into practice.
My goal with this series is to encourage more API teams to embrace discovery. If you work on an API team and are already putting the discovery habits into practice, please reach out. I’d love to share your story. If you know of an API team who could be doing more discovery, please share this article with them.
Okay, let’s get back to my story.
Establishing a Single Source of Truth: Good Data Management
The primary problem with my admin making these updates manually was that mistakes crept in. Systems would get out of date. We’d discover that we had two different email addresses for a student (one in Teachable and one in Slack), and we wouldn’t know which one was current.
I knew we needed a single source of truth. At first, I thought I would use Teachable as our system of record. After all, this was where students purchased and enrolled in courses and where they accessed their course content. It was our primary tool.
So I started by searching for a Teachable API and found some documentation. Great. Since I was new to REST APIs, I spent several hours reading through the documentation. I explored the different endpoints and started to piece together my plan for how I was going to get data in and out of Teachable.
There was one problem. I couldn’t figure out how to access the API. There were no details about access keys, OAuth, or any other form of authentication.
I reached out to their VP of Product (who I was connected to on Twitter) and was told there was no API. That documentation was created by an intern who was looking into whether Teachable should offer an API. Ugh, that was a day wasted.
Even though Teachable was our primary tool, with no API, it couldn’t act as our system of record. I also knew that if I was going to solve my administrative problem, I would eventually have to move to a course platform that had a robust API.
So I decided to use Airtable as my system of record. Why Airtable? Because they had a robust API that was well documented.
I exported all of my student data out of Teachable and imported it into Airtable. I did this manually. But it was pretty straightforward. Teachable allowed me to export to a CSV file. I rearranged the columns to match my data model in Airtable and then simply imported the CSV file.
Next, I had to learn how to programmatically access data in each of my tools via their respective APIs.
Getting Started with Different APIs
Learning from my Teachable experience, I realized that when exploring a new API, the first thing I should do is learn how to access the API. This sounds obvious, but to start I was more concerned with what the API could do for me.
But now that I had a little experience under my belt, I developed the following process for learning about a new API:
- Figure out how to authenticate
- Understand what endpoints are available and if they meet my needs
- Dig into the details of specific requests and responses
- Understand error codes and the best way to do error handling
- Understand rate limits and what I would need to do to stick to them
Some APIs made it easy to figure out each of these steps. Others, not so much.
Authentication: How Do I Access This API?
The first API I started to play with was Airtable’s. This was fortuitous because their authentication mechanism is extremely simple. They support both personal access tokens and OAuth access tokens.
Since I only needed to access my own data, I generated a personal access token and passed it in the headers with each of my API requests. Easy peasy—even for a beginner.
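Here is a minimal Python sketch of what passing a personal access token in the headers looks like, using only the standard library. It builds, but does not send, a request that lists the records in one table using Airtable’s bearer-token scheme. The token, base ID, and table name are placeholders, not real values.

```python
import urllib.parse
import urllib.request

AIRTABLE_API = "https://api.airtable.com/v0"


def build_list_request(token: str, base_id: str, table: str) -> urllib.request.Request:
    """Build (but don't send) a GET request that lists records in one Airtable table."""
    # Table names can contain spaces, so percent-encode them for the URL path.
    url = f"{AIRTABLE_API}/{base_id}/{urllib.parse.quote(table)}"
    return urllib.request.Request(
        url,
        # The personal access token travels in the Authorization header on every request.
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


req = build_list_request("patXXXXXXXX", "appXXXXXXXX", "Students")  # placeholder values
```

Calling `urllib.request.urlopen(req)` would actually send it. In practice the token would come from an environment variable or a secrets store rather than being hard-coded.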
Next, I started investigating the Slack API. This was a little more challenging. The Slack API assumes you are using the API to build an app that will run inside Slack. So the way to get authentication credentials is by creating a Slack app.
This is all well and good if you are, in fact, creating a Slack app. But that wasn’t my goal. I just wanted to programmatically invite people to my Slack community and deactivate them. No app was needed.
But the only way that I could find to create credentials to access the API was to create an app. So I did end up creating a dummy app that does nothing, just so that I could get access credentials.
This made the initial setup confusing, but once I got my API credentials, the Slack API worked just like Airtable. I sent my credentials in each API request via headers.
Accessing the Google Calendar API stymied me (and still does to this day). Their documentation is 100% focused on creating an app for Google Workspace and only discusses OAuth as the authentication mechanism. But that doesn’t work for my purposes.
After some digging, I found that Google does support server-to-server access via a Google Cloud project. But I wasn’t able to successfully set this up. I don’t remember exactly what went wrong. But I do remember wasting two days on this and making no progress.
I did, however, find a workaround. Google Calendar has a Zapier integration and I realized my code can trigger a Zap through their webhook trigger. So I was able to programmatically create events and add and remove guests by triggering Zaps. That’s what I still use today (although you’ll see later this strategy also comes with limitations).
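A sketch of triggering a Zap from code, assuming a Zap that starts with Zapier’s “Catch Hook” webhook trigger. The hook URL and the payload fields (action, event_id, email) are hypothetical; the Zap itself decides what those fields mean when it maps them onto Google Calendar actions.

```python
import json
import urllib.request


def build_zap_trigger(hook_url: str, payload: dict) -> urllib.request.Request:
    """Build (but don't send) a POST that hands a JSON payload to a Zapier webhook trigger."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        hook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical hook URL and payload; the Zap's later steps map these fields to calendar actions.
req = build_zap_trigger(
    "https://hooks.zapier.com/hooks/catch/000000/xxxxxx/",
    {"action": "add_guest", "event_id": "EVENT_ID", "email": "student@example.com"},
)
```

Sending it with `urllib.request.urlopen(req)` starts the Zap. The trade-off is that the code can only do what the Zapier integration exposes, which matters later in this story.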
In August 2023, I moved from Teachable to LearnWorlds as my new course platform. No API access was a dealbreaker for me. Accessing the LearnWorlds API was reasonably easy. Like Airtable, it was token-based. Unlike Airtable, however, the tokens expired.
I didn’t know how to manage this. Should I store the token in my database? Remember, I use Airtable as my system of record. Was this secure? How do I keep track of when to refresh it? I didn’t want to deal with all of these issues, so I started by simply requesting a new API token at the start of any task that required LearnWorlds data.
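One common answer to these questions is to keep the token in memory rather than in a database and to refresh it a little before it expires. The sketch below assumes a hypothetical `fetch_token` function that calls the platform’s token endpoint and returns the token plus its lifetime in seconds; it is not LearnWorlds’ documented client. In a Lambda function the in-memory cache only lives as long as a warm execution environment, so this reduces token requests rather than eliminating them.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Keep an expiring access token in memory and refresh it shortly before it expires."""

    def __init__(
        self,
        fetch_token: Callable[[], Tuple[str, float]],  # hypothetical: returns (token, lifetime_seconds)
        clock: Callable[[], float] = time.monotonic,
        margin_seconds: float = 60.0,  # refresh this long before the real expiry
    ) -> None:
        self._fetch_token = fetch_token
        self._clock = clock
        self._margin = margin_seconds
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        # Fetch a new token on first use, or when the current one is about to expire.
        if self._token is None or now >= self._expires_at - self._margin:
            token, lifetime = self._fetch_token()
            self._token = token
            self._expires_at = now + lifetime
        return self._token
```

Every task then calls `cache.get()` instead of requesting a fresh token, and nothing sensitive is written to the system of record.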
I was surprised by how many different authentication mechanisms I encountered. While OAuth is a standard for third-party authentication, there doesn’t seem to be a standard for server-to-server authentication.
This created a lot of friction in the onboarding process for me. You can’t do anything with an API before you authenticate. This was often much harder than it needed to be and is an area that could use much more discovery.
Understanding Endpoints: Does This API Do What I Need It to Do?
Once I finally got access to each API, I had to assess whether or not they offered the functionality that I needed.
Just like no product can serve every need for every customer, no API can serve every need for every developer. But I was surprised by some of the gaps that I found.
Let’s start with the most egregious example.
Teachable: The Only Way to Access a User Is Through Their Proprietary ID
In 2022, Teachable finally released an API. I was ecstatic—right up until I started exploring the documentation.
Let’s start with the Users endpoint. They support a GET request that returns all users across a series of paginated requests (a maximum of 20 results per request). They also support requesting a specific user via Teachable’s user ID. But they don’t support looking up a user ID by email or any other means.
At the time that Teachable released their API, I had 8,000 students. Up until this point, I had never been exposed to nor needed access to Teachable’s user IDs for my students. But now, in order to fetch any details about a user via their API, I needed to know the student’s user ID.
When I raised this concern with their product team, I was told that I could fetch all of my users (via 400 API requests) and store each of my students’ Teachable user IDs in my own database, so that in the future, when I needed to look up a student via the Teachable API, I could first retrieve the Teachable user ID from my own database.
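The suggested workaround boils down to building a lookup table once. At 20 users per page, 8,000 students means 8,000 / 20 = 400 page requests before the first lookup can happen. A sketch, assuming a hypothetical `fetch_page(page_number)` wrapper that returns a list of user dicts with `email` and `id` keys (the real field names may differ):

```python
from typing import Callable, Dict, List


def build_email_index(
    fetch_page: Callable[[int], List[dict]],  # hypothetical: returns one page of user dicts
    page_size: int = 20,
) -> Dict[str, int]:
    """Page through every user once and map lowercase email -> platform user ID."""
    index: Dict[str, int] = {}
    page = 1
    while True:
        users = fetch_page(page)
        for user in users:
            index[user["email"].strip().lower()] = user["id"]
        # A short (or empty) page means there is nothing left to fetch.
        if len(users) < page_size:
            return index
        page += 1
```

Because it stops at the first short page, a total that is an exact multiple of the page size costs one extra, empty request. If the API reports a page count, using that would be cleaner.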
On top of this, I also needed a way to fetch students’ Teachable user IDs when new students signed up for my school. I considered setting up a Zap that triggered on each new school sign-up and then updated Airtable with the new student’s Teachable user ID.
This was a lot of work that all could have been avoided if they simply provided the option to look up a user by email address.
Thankfully, I didn’t bother implementing any of this, because I found other shortcomings that made it clear this API wasn’t well thought out.
Teachable: Can’t Update Basic Student Details
Remember, one of my primary goals was to keep student data in sync across my systems. The primary pieces of data that I would need to update based on changes in other systems were things like student name, student email, and student enrollments.
The Users endpoint did support a PATCH request (as long as I had the student’s user ID), but it only allowed me to update the student’s name and src. At launch, the only documentation for the src field was its type: string. I had no idea what this field was. But it was clear this endpoint wasn’t adequate: there was no way to update a student’s email address.
Again, I emailed the team at Teachable. I suggested it might be nice to update a student’s email address via the API. I also asked what the src field was. I got a generic response where they thanked me for my feedback. They did not tell me what the src field was.
Three years later, it looks like they did finally update their documentation. The description on the src field is now “The signup source of the user, which is displayed on the Information tab of the user profile.” Even though I used Teachable for years, I have no idea why anyone would want to programmatically update this field, given that it’s a field that Teachable populates based on the referrer when someone signs up from your school.
The Teachable API is an example of how Teachable builds features. They release the absolute minimum required to say they offer specific functionality. It doesn’t actually work and they never iterate on it. I am glad I moved off Teachable and have never looked back.
LearnWorlds: A Great Example of Endpoint Coverage
I migrated my course business from Teachable to LearnWorlds. One of my primary evaluation criteria for the new course platform was a fully featured API and LearnWorlds offers that. Their API supports almost everything that I could reasonably expect.
It’s also clear that their API wasn’t a one-off feature. Every time they release new functionality in the main product, they quickly support that same functionality through the API. When it comes to endpoint coverage, I have no complaints about the LearnWorlds API.
Google Calendar’s Zap Integration: Why Can’t I Remove a Guest?
There’s one last example I want to share before we move to the next section. You might recall that I was never able to successfully access the Google Calendar API and so I used a series of Zaps to work around it.
This works, but with one major caveat. The Google Calendar Zapier integration is incomplete. I can create new events. I can add guests to new events. But I can’t remove guests from events. This functionality is simply missing.
When I searched the web trying to find an alternative, I ran across this question and answer in the Zapier community forums where an engineer describes a convoluted hack to get around the problem. This is a heinous solution to a critical gap in API coverage. It is, however, what I use today when I have to remove a student from a course event.
Just as we have to think through whom we might serve with our regular products and which opportunities we’ll address, API teams need to do the same. When they don’t, they leave critical functionality out of their APIs, and those gaps prevent their customers from getting value out of their products.
Sending Requests and Receiving Responses: How Does This Actually Work?
It’s one thing to determine that an API has the functionality that you need. It’s a whole other thing to get to the point where it actually works.
Even with the best-designed and best-documented APIs, I still ran into issues sending and receiving data via API calls. Let’s look at some examples.
There are two components in this section: 1) sending data that the API can comprehend and 2) getting back a meaningful response.
Slack: Some Endpoints Accept JSON Bodies. Others Require Form-URL-Encoded Data
Let’s start with Slack. Most of Slack’s API endpoints take a JSON body in the request. This is pretty standard today, but it hasn’t always been the case, and some of their endpoints still only accept query parameters.
In Slack’s defense, this is well-documented on each of their endpoints. But it still tripped me up. And when I sent a poorly formatted request, I didn’t always get a comprehensible error message to help me identify the problem.
Slack, unlike Teachable, does support looking up a user by email. But this endpoint only accepts form-URL-encoded data via the query string. I definitely remember getting tripped up by special characters in email addresses and how to send this request properly. But I eventually figured it out. This was a minor annoyance, and overall Slack’s API is extremely well documented.
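The special-character problem usually comes down to URL encoding. A plus sign left raw in a query string is decoded as a space on the server, so jane+course@example.com would arrive as a different address. This sketch uses Python’s standard library to encode the email before building a `users.lookupByEmail` request; the token and email are placeholders.

```python
import urllib.parse
import urllib.request

SLACK_API = "https://slack.com/api"


def build_lookup_by_email(token: str, email: str) -> urllib.request.Request:
    """Build (but don't send) a users.lookupByEmail request with a safely encoded email."""
    # urlencode percent-encodes reserved characters: '+' -> %2B, '@' -> %40.
    query = urllib.parse.urlencode({"email": email})
    return urllib.request.Request(
        f"{SLACK_API}/users.lookupByEmail?{query}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


req = build_lookup_by_email("xoxb-PLACEHOLDER", "jane+course@example.com")
```

Letting the library do the encoding, instead of concatenating the raw email into the URL, handles every reserved character rather than only the ones you have already been bitten by.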
Slack: Restricting Access to Specific Endpoints Based on Plan Tier
My primary gripe with the Slack API is the result of their business logic. Slack has decided to put user invites and deactivations behind the paywall. So there’s no way to programmatically invite or deactivate a Slack user unless you are on their Business+ plan.
Now don’t get me wrong. I’m all for paying for products. But this is simply not feasible for large communities. I would have to pay $12.50 per user per month just to get access to this API endpoint. And that’s not per developer that uses the API—that’s per community member. We have over a thousand active community members each month.
As a result, we still have to manually invite and deactivate all of our Slack accounts.
I’ll give Slack a pass on this one, simply because I know that large, open communities are not Slack’s target customer segment. But as a member of this customer segment, it is quite disappointing.
Airtable: Updating Linked Records Is Inconsistent and Confusing
Airtable is one of my favorite APIs to work with. But that doesn’t mean it’s perfect. One of the biggest areas of confusion for me when it comes to sending requests is around updating linked record fields. A linked record field allows a record in one table to link to another record in another table in the same Base.
Let’s get more specific. In my Airtable Student Base, I have a Student table and a Cohorts table. A student record has a Cohort field that links to all of the cohorts that student is enrolled in. Those cohorts live in the Cohorts table. The primary field on my Cohorts table is a Cohort ID. So in the Airtable graphical interface, the Cohort field on the Student record looks like an array of Cohort IDs.
But behind the scenes, that’s not right. It’s actually an array of Airtable record IDs. Now when I first started using the Airtable API, I had no idea what an Airtable record ID was. This isn’t exposed in the graphical interface at all. But it’s a unique ID that Airtable assigns any record in any Base.
So when I want to update a student’s cohort list, it looks to me like I can simply send an array of cohort IDs. But that’s not right. I have to send an array of record IDs.
If that was the whole problem, it would be fine. I’d learn it once and remember to always send record IDs. But there are instances where if I’m just adding one element to a linked record field, the record ID won’t work. I actually need to use my primary key (in this case the cohort ID).
Even three years into using the Airtable API, I’m not really clear on the rules as to when to use the record ID vs. the primary key when accessing or updating a linked record field.
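Here is a sketch of appending one cohort to a student’s Cohort field using record IDs. It assumes a hypothetical `record_id_by_cohort` mapping from my Cohort IDs (the primary field) to Airtable record IDs, which you would build by listing the Cohorts table first. It sends the full list back because writing a linked record field replaces its whole value rather than appending to it.

```python
from typing import Dict, List


def add_cohort_payload(
    current_record_ids: List[str],  # the student's existing Cohort value (record IDs)
    cohort_id: str,  # my primary-field value, e.g. a Cohort ID
    record_id_by_cohort: Dict[str, str],  # hypothetical: Cohort ID -> Airtable record ID
) -> dict:
    """Build a PATCH body that adds one cohort to a student's linked Cohort field."""
    record_id = record_id_by_cohort[cohort_id]
    linked = list(current_record_ids)  # copy so the caller's list isn't mutated
    if record_id not in linked:
        linked.append(record_id)
    # The whole array is sent: the new value replaces the field's old value.
    return {"fields": {"Cohort": linked}}
```

The returned dict is the JSON body for a PATCH to the student’s record.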
LearnWorlds: The Course Endpoint Has Hidden Logic That Was Undocumented
While I’m a big fan of the LearnWorlds API, their Course endpoint created a lot of havoc when I was first learning their API. A course in LearnWorlds has the following attributes: title, description, price, dripFeed, access, categories, label. Each of these attributes has an adequate description in the documentation and was familiar to me from my experience using their main product.
What wasn’t documented were the business rules that determined the dependencies between these attributes. LearnWorlds offers a PUT action, but not a PATCH action, for their course endpoint. So my expectation was that whatever data I sent would be destructive: it would overwrite all of the existing fields.
That means if I sent the following:
{
"access": "enrollment_closed"
}
I would expect the course title, description, price, dripFeed, categories, and label to be wiped out. But that’s not what happened.
Now I know a lot of APIs confuse PUT and PATCH, so if all that happened was the access field was updated and everything else stayed the same, I would have been annoyed, but I could have lived with it.
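To make the distinction concrete, here is a toy in-memory model of the conventional semantics: a PUT body is the complete new representation, while a PATCH body only touches the fields it names. The merge below is a simplification (JSON Merge Patch, RFC 7396, also treats null as “delete this field”), and neither function describes LearnWorlds’ actual backend.

```python
def apply_put(current: dict, body: dict) -> dict:
    """PUT: the body replaces the resource; fields left out of the body are gone."""
    return dict(body)


def apply_patch(current: dict, body: dict) -> dict:
    """Simplified merge-style PATCH: only the fields in the body change."""
    return {**current, **body}


course = {"title": "Example Course", "price": 799, "access": "paid"}
update = {"access": "enrollment_closed"}
replaced = apply_put(course, update)  # only "access" survives
patched = apply_patch(course, update)  # title and price are kept
```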
But what happened instead simply confused me. Sending:
{
"access": "enrollment_closed"
}
Had the following impact:
- Title, description, dripFeed, and label kept their original values.
- Access was updated as expected.
- Price changed from 799 to 0.
- Categories were wiped out.
Okay, this is just weird. What’s going on here? I emailed support and they told me that the PUT request on the course endpoint doesn’t require all of the fields. You only need to send the fields that you want to update. In other words, it’s really a PATCH request. Okay, got it.
But that doesn’t explain why price and categories were wiped out. So I sent a series of curl requests with their corresponding responses to the support team illustrating what I was seeing.
In the process of generating these examples, I tried to solve my own problem by simply sending the original price and categories along with access in my request, so that price and categories wouldn’t be wiped out:
{
"access": "enrollment_closed",
"price": 799,
"categories": "Public Cohort"
}
But this triggered the following error, “Price cannot be greater than 0 for non-paid courses.” So I was stuck. I couldn’t find a way to update the access field without wiping out the price field.
There was another undocumented consequence of this. When a course price is zero, the API no longer allowed me to unenroll students from the course. I asked the support team about this and they shared that students can’t be unenrolled from free courses. Since the price on the course was now 0, students could no longer be unenrolled.
Okay, so LearnWorlds has some business rules:
- Price cannot be greater than 0 for non-paid courses.
- Students can’t be unenrolled from free courses.
Each of these rules makes intuitive sense. If a course isn’t paid, why should it have a price? If a course is free, why should a student ever be unenrolled from it?
The problem was how these rules interacted with each other. The course access field was overloaded. It has the following potential values: free, paid, enrollment_closed, private.
In my case, I was offering a paid course with limited enrollment. So whenever I sold a ticket, I needed to check to see if I had hit my enrollment cap. If I had, then I wanted to update the course access to enrollment_closed. But by doing this, I could no longer unenroll students from the course.
By the way, neither of these business rules was documented in their API documentation. I uncovered them through trial and error and through conversations with their support team.
To LearnWorlds’ credit, after a lot of back and forth with their support team, they acknowledged and fixed a few bugs:
- The categories getting wiped out was a bug and they fixed this.
- They recognized the conflict between their two business rules and updated the endpoint so that changing the access from paid to enrollment_closed no longer wiped out the price.
- As a result of that change, I could now unenroll students from a paid course after enrollment closed.
So my needs were met. But not without a lot of back and forth and heartache. And to this day, these business logic rules are still not included in their API documentation.
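One defensive habit that would have surfaced these side effects sooner is to read the resource before and after an update and flag any field that changed without being part of the request. A sketch in Python; the course values in the usage mirror the story, but the exact shape of the categories field is an assumption.

```python
from typing import Any, Dict, Iterable, Tuple


def unexpected_changes(
    before: Dict[str, Any],
    after: Dict[str, Any],
    intended: Iterable[str],
) -> Dict[str, Tuple[Any, Any]]:
    """Return {field: (old, new)} for every field that changed but wasn't part of the update."""
    skip = set(intended)
    changes: Dict[str, Tuple[Any, Any]] = {}
    for field in set(before) | set(after):
        if field in skip:
            continue
        old, new = before.get(field), after.get(field)
        if old != new:
            changes[field] = (old, new)
    return changes


before = {"title": "Example Course", "access": "paid", "price": 799, "categories": ["Public Cohort"]}
after = {"title": "Example Course", "access": "enrollment_closed", "price": 0, "categories": []}
surprises = unexpected_changes(before, after, intended=["access"])
# surprises == {"price": (799, 0), "categories": (["Public Cohort"], [])}
```

Logging or alerting on a non-empty result turns a silent side effect into something you notice on the first run instead of weeks later.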
You’ll see in our subsequent stories about other APIs that under-documented endpoints are a very common usability issue with APIs. It can be hard to anticipate everything a customer might need to know about your backend logic. But this is exactly where the discovery habits can help. Pair programming with customers, story mapping how different endpoints might be used, and of course, assumption testing could have helped to uncover all of these issues long before a customer ever experienced them.
Error Handling: What Do I Need to Do When Something Goes Wrong?
As we’ve previously discussed in our introduction to Web APIs, error messages help developers figure out what to do when things go wrong. I definitely have encountered my fair share of error messages. Let’s look at some examples.
I use AWS Step Functions to orchestrate Lambda functions. If you aren’t familiar with the AWS world, that’s okay. These are simply the tools that I use to write code.
Step Functions have built-in support for error handling and automatic retries. But that assumes that I can determine the best retry path based on the error that I get back. That’s not always the case.
Airtable: Missing Fields When a Field is Blank
One of the most common errors I encounter is from trying to access an Airtable field when that field is blank. For example, if I’m trying to access a student’s cohorts (as described above), if the student has no cohort enrollments, this field will be blank on the student’s record.
That means after retrieving a student resource, when I try to access the Cohorts field, the Step Function generates an error. The field isn’t in the data. That’s because when you fetch a resource from Airtable, it only returns fields that have data.
I understand why Airtable does this. I suspect their goal is to reduce the volume of data in the response. But it means that my code has to always first check to see if the field exists before trying to access it. This is incredibly annoying in a Step Function.
Thankfully, AWS Step Functions generate clear errors. But this isn’t an error that my code can handle with a retry. It’s actually a bug that I have to address when testing my code. I don’t always catch these errors ahead of time. This is probably my most common production bug because it’s hard to predict ahead of time all the different situations in which data might be missing.
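In Python, the fix is to treat every field as optional and supply a default. This sketch assumes the standard Airtable record shape, where field values live under a `fields` key; Cohort is the linked record field from earlier, and the helper names are hypothetical.

```python
from typing import Any, List


def get_field(record: dict, name: str, default: Any = None) -> Any:
    """Airtable omits empty fields from a record's "fields" object, so fall back to a default."""
    return record.get("fields", {}).get(name, default)


def cohort_record_ids(student: dict) -> List[str]:
    # A student with no enrollments has no "Cohort" key at all; treat that as an empty list.
    return get_field(student, "Cohort", [])


student = {"id": "recXXXXXXXX", "createdTime": "2024-01-01T00:00:00.000Z", "fields": {"Name": "Jane"}}
cohorts = cohort_record_ids(student)  # [] rather than a KeyError
```

Within the state machine itself, the Amazon States Language Choice state also offers an `IsPresent` comparison that can branch on whether a field exists before a later state reads it.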
Anthropic: Frequent Overloaded Errors
Lately, I’ve been encountering Anthropic’s 529 Overloaded error. (Note: This is different from their 429 Rate limit error. It’s an error that they use when their overall system is overloaded.) My code automatically delays and retries on any 5XX error, so these are pretty easy to handle. However, over the last week or two, I’ve been seeing this error frequently enough that tasks are starting to queue up and it’s affecting other downstream work. You’ll also see in the rate limit section later that I don’t really have a good mechanism for dealing with backed up tasks.
This is the first time that I’ve encountered frequent 5XX errors from an API and I’m still not quite sure what the best way to handle it is.
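For transient errors like these, a common pattern is capped exponential backoff with “full jitter” (a random delay between zero and an exponentially growing ceiling), honoring a Retry-After header when the server sends one. The sketch assumes a hypothetical `send()` that performs one HTTP call and returns (status, lowercase headers, body). Capping attempts and letting the task fail, rather than retrying forever, gives Step Functions a clear error to catch instead of a growing pile of stalled tasks.

```python
import random
import time
from typing import Any, Callable, Dict, Tuple

# Treated as transient: rate limited (429), common server errors, and Anthropic's 529 "overloaded".
RETRYABLE_STATUSES = {429, 500, 502, 503, 504, 529}

Response = Tuple[int, Dict[str, str], Any]  # (status, lowercase headers, body)


def call_with_backoff(
    send: Callable[[], Response],  # hypothetical: makes one HTTP request
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    attempt = 0
    while True:
        response = send()
        attempt += 1
        status, headers, _ = response
        if status not in RETRYABLE_STATUSES or attempt >= max_attempts:
            return response  # success, a non-retryable error, or out of attempts
        # Full jitter: random delay up to base * 2^(attempt - 1), capped at max_delay.
        delay = random.uniform(0, min(max_delay, base_delay * 2 ** (attempt - 1)))
        retry_after = headers.get("retry-after")
        if retry_after is not None:
            try:
                delay = float(retry_after)  # assumes the delay-in-seconds form
            except ValueError:
                pass  # e.g. an HTTP-date; keep the jittered delay
        sleep(delay)
```

The caller still has to check the returned status: after the final attempt a 529 comes back as-is, which is the point where the task should fail loudly.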
LearnWorlds: Missing Error Messages or Silent Fails
One of the most frustrating experiences I’ve had is when I make an API request that fails, but it doesn’t return an error message at all.
For example, the LearnWorlds API allows me to enroll or unenroll a student from a course. My expectation is that if the enrollment or unenrollment doesn’t happen, that I would get an error message. But that hasn’t always been the case. Sometimes the request fails quietly.
What’s even worse is that while the enrollment request responds with the student’s enrollments, the unenrollment request does not. So not only does it fail quietly, the response doesn’t give me the information I need to verify the action.
So for any enrollment/unenrollment request, I always have to follow up with a request to get the student’s enrollments and verify that the expected action did, in fact, take place.
I am learning this is good defensive programming. But it leads to a lot of extra code. Instead, I would prefer that an error is returned when the action failed and that the response body always included the student’s enrollments so that I could also verify myself that the action happened.
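The verify-after-write pattern can be wrapped in one helper so it doesn’t spread across every workflow. Both callables are hypothetical wrappers around the platform’s enroll/unenroll and get-enrollments endpoints. Raising an exception when the end state is wrong makes the Lambda invocation fail, which Step Functions can then retry or route to an error handler.

```python
from typing import Callable, Set


class EnrollmentNotApplied(Exception):
    """Raised when the API call returned but the student's enrollments don't reflect it."""


def change_enrollment_and_verify(
    change: Callable[[str, str], None],  # hypothetical wrapper around enroll or unenroll
    get_enrolled_course_ids: Callable[[str], Set[str]],  # hypothetical "get enrollments" wrapper
    user_id: str,
    course_id: str,
    should_be_enrolled: bool,
) -> None:
    change(user_id, course_id)
    # Don't trust the response: re-read the student's enrollments and confirm the end state.
    is_enrolled = course_id in get_enrolled_course_ids(user_id)
    if is_enrolled != should_be_enrolled:
        expected = "enrolled in" if should_be_enrolled else "unenrolled from"
        raise EnrollmentNotApplied(f"{user_id} was not {expected} {course_id}")
```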
Error messages are how APIs communicate during times when things go wrong. It’s important that this communication is clear, actionable, and accurate. Sadly, this is often not the case. API teams should be testing their error messaging with customers so that they can find and resolve these issues.
Rate Limits: How Much Can I Send and How Often?
Most APIs define rate limits for how often you can send requests. Thankfully, for my purposes, I have very rarely found myself in situations where the rate limits are constraining.
For example, the Airtable API—the API that I use the most often—has a rate limit of five requests per second. Most of the time, my code is orchestrating only a handful of requests to accomplish a task. So as long as I’m reasonably thoughtful about the timing of these tasks, I don’t hit the rate limits.
However, I do execute code in response to webhooks (or Zap triggers, if you are more familiar with that model). These can be unpredictable and can lead to concurrent tasks running at the same time. Sometimes when this happens I do exceed the rate limit. But these are rare. So for now, I simply handle these errors with retries after a delay.
I know the proper way to handle these situations is to queue all API requests and to throttle them so that they go out under the rate limit. But these errors happen so rarely, I haven’t prioritized this work.
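A minimal version of that throttle is sketched below. It spaces calls at least 1 / rate_per_second apart, which keeps them at roughly five per second when the rate is set to Airtable’s limit. It only coordinates callers inside a single process: separate concurrent Lambda invocations don’t share memory, so keeping those under the limit would need a shared queue (for instance, an SQS queue feeding a function with capped concurrency).

```python
import threading
import time
from typing import Callable


class Throttle:
    """Space out calls so consecutive starts are at least 1 / rate_per_second apart."""

    def __init__(
        self,
        rate_per_second: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._interval = 1.0 / rate_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0
        self._lock = threading.Lock()  # serializes threads within this process

    def wait(self) -> None:
        """Block until the next request may start, then reserve the following slot."""
        with self._lock:
            now = self._clock()
            if now < self._next_allowed:
                self._sleep(self._next_allowed - now)
                now = self._next_allowed
            self._next_allowed = now + self._interval


throttle = Throttle(rate_per_second=5)
# Call throttle.wait() immediately before each API request.
```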
A Final Word on My Experience
I will be the first to admit that I am not an expert on using APIs. Nor am I a very experienced engineer. But I hope that by sharing some of my experiences across a variety of APIs, it helps to shed light on some of the usability issues that come up when working with APIs.
Over the next two weeks, we’ll share two more stories from product managers who work on API teams. In the first, a product manager will share the challenges his customers faced when trying to move to a new version of his product’s API and how he started to use the discovery habits to overcome those challenges. In the second, we’ll look at how a PM evaluated the business case for a new API and we’ll see how several of the discovery habits helped get buy-in.
If you want to follow along with the rest of the series, be sure to subscribe below.