Devin - First Impressions
My first impressions of Devin: The user experience, how good it is at completing tasks, and how it compares against a human engineer.
Devin just became publicly available: anyone can sign up and pay for access without going through a sales call.
Here are my first impressions after trying out Devin for a day.
This post consists of 3 parts: the user experience, the performance, and the pricing. It will take about 15 minutes to read.
User Experience - Superb
This is hands-down the best user experience I’ve had for an AI coding solution.
Onboarding
The first obvious thing you notice is that it is extremely well designed for software engineers as users. This is very apparent from the onboarding step:
The Devin interface guides you through a series of steps (with a video guide as well) to set up the repository: how to install dependencies, how to run lint and tests, and what to take note of in the repository.
There are some minor UX issues (no default value was provided for how to install dependencies, so I had to type it), but overall it is very clearly thought out. As a developer, you understand why it is done this way and how the configurations you set up during onboarding relate to VM provisioning, snapshots, etc.
Repo notes
During repo setup, Devin automatically generates a repo note as part of its knowledge base on what it should know about the repository.
You can edit it manually, and when working on tasks and interacting with users, Devin will continuously give you suggestions to update the knowledge base so that your instructions and rules are followed across sessions. It is like a memory component for Devin.
Separately from the repo note in the knowledge base, Devin also has an automatically generated index of the repository, which gives it a high-level understanding of the code organization, the structure, and what each component does.
I suspect this is useful when doing Retrieval-augmented generation (RAG) on the codebase to figure out which are the files that are relevant to the task at hand.
New Session Interface
To start a new task, you are greeted with a huge chat window with suggestions at the bottom. This is really good when you want to describe the requirements in detail.
I find this user experience superior to other AI coding apps that are based on VS Code, which put code at the center of the UI. When using Devin, I can focus more on describing what I want, instead of worrying about the code.
You can also give Devin tasks directly on Slack, and chat with Devin there. But I will focus on the experience of using the Devin web app in this post, as it is more interesting.
Main Interface
Once Devin starts working on the task, you get to the main interface.
The main interface for Devin is very well designed from an engineer’s perspective.
On the left you have a chat session, and on the right you have Devin’s Workspace. You can collapse either one of them to make the other one full screen.
Two-way Realtime Chat
On the left panel, you can interact with Devin via two-way communication:
You can tell Devin at any time to change something (for example, to use a different approach while it is working on the task), or update your requirements.
Devin will proactively seek out clarifications (when something is not right) and ask for your permission to do something (storing sensitive information).
The cool thing is that Devin will respond in realtime to your messages and update its current plan to accommodate the new information you have provided. It’s like you can interrupt it at any time (hopefully Devin won’t get annoyed by my micro-management).
Devin’s Workspace
On the right panel, you have Devin’s Workspace.
On the first tab, you can “follow Devin’s actions” by looking at what Devin is doing.
The rest of the tabs are very developer-centric, and allow you to “monitor” Devin’s actions.
Broadly speaking, there are a few tools that Devin can utilize and take actions on:
Shell - Executing a shell command
I am glad that Devin supports multiple shells. This allowed it to run the server while executing curl against it to test the logic.
Devin is proficient with various git commands, gh commands (to interact with GitHub PRs and CI), and bash commands.
Browser - Visiting a web page and seeing its content
Devin has access to a Chrome browser that is controlled by an automated testing software. It is likely something similar to Selenium or Playwright.
Devin can both see the webpage inside the browser, as well as “operate” on the webpage (clicking a button for example).
Devin can take screenshots of the web pages in the browser.
I am not sure what the exact set of capabilities available to Devin is; it’s probably something like the WebDriver protocol.
Editor - Carrying out file operations and making file edits
The code editor appears to be a fork of VS Code.
It is interesting to note that Devin will first decide which files are relevant for the task, and only load those files into the editor, instead of loading all the files in the repository. I think this helps to improve the efficiency of RAG afterwards.
Devin can also use the editor to take notes.
In one of the sessions, Devin created a TODO.md in /home/ubuntu (which is outside the code repo) to track its current progress for a task involving editing many files. This is quite similar to my personal workflow a few years ago when I was working in large tech companies.
In another session, Devin created a notes.txt file to analyse an issue where the images had the wrong aspect ratio in one component, but not the other.
I consider the note-taking ability to be the sign of a great software engineer.
Planner - Where Devin makes a plan for the steps to accomplish the task
Planner is the interesting bit. It looks like Devin is using a domain specific language (DSL) / pseudocode to plan out the tasks it needs to execute.
This DSL / pseudocode supports basic control flows and conditionals (if, goto, for loop).
I am not sure if Devin has an internal logic that actually executes these DSLs, or it is just pseudocode to be consumed by LLMs, but it looks cool for an engineer.
If it is actually running these dynamically generated tasks as functions or modules, this is a big step in the “agentic” direction that many companies are going after.
Secrets Management
Devin has a dedicated way to handle secrets. You enter the secrets (API keys) into the Devin interface via the “Add secrets” button, and they become available to Devin as environment variables in the shell.
Funnily enough, Devin does not seem to be aware of this in some sessions, and insists on creating a .env file to get the secrets.
You can also just set them up in Devin’s start commands, similar to how .bashrc works.
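As a minimal sketch of what consuming such a secret could look like from a script running in Devin’s shell, the Python snippet below reads an API key from an environment variable and fails with a clear message when it is missing, so no .env file is needed. The variable name EXAMPLE_API_KEY and the helper require_secret are hypothetical names used for illustration; they are not part of Devin.

```python
import os


def require_secret(name: str, env=None) -> str:
    """Return the secret stored in the environment variable `name`.

    `env` defaults to os.environ; it is a parameter only so the function
    can be exercised with a plain dict.
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Secret {name!r} is not set in the environment")
    return value


if __name__ == "__main__":
    # EXAMPLE_API_KEY is a hypothetical name, not something Devin defines.
    try:
        key = require_secret("EXAMPLE_API_KEY")
        # Print only the length so the secret itself never reaches the logs.
        print(f"Loaded a secret of length {len(key)}")
    except RuntimeError as err:
        print(err)
```

Failing fast on a missing variable makes it obvious when a secret was never added through the interface, instead of silently falling back to a file on disk.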
Performance and Results - Passable
After using Devin for more than 10 simple to medium-level tasks, I assess the actual performance of the current version of Devin (v1.1.0, as of 15 January 2025) as passable at accomplishing engineering tasks.
Here are two examples of simple tasks that I gave to Devin.
Update Blog Post Screenshots
For the first task, I gave it a task on the website of 16x Prompt. I asked it to update the old screenshots in my blog posts with the latest version on the landing page.
This is a simple but tedious task that involves going through all the blog posts, finding instances where the screenshot was using an older version, and updating the screenshot to the latest version.
Devin correctly identified 3 files that needed updates, but missed a lot of others (low recall).
And for the 3 files that it correctly identified, it missed updating one of them when doing file edits. I am not sure why it happened.
I discovered these two issues while looking at the Vercel preview for the PR. After some back-and-forth, Devin managed to identify a lot more files that required updates. I didn’t check if it covered all files, but it’s good enough for me.
Overall the final PR was satisfactory, and I am happy with the outcome, but the time it took and the amount of back-and-forth communication were disappointing.
Page Content Update
Another simple task that I gave to Devin was to update the page where I keep a list of interesting CLI tools for AI coding, to incorporate a new tool that I had discovered.
For this task, Devin needs to figure out the format of existing tools in the list, find the one-sentence description of the tool from its GitHub repo, and add the new tool to the existing list with the right formatting.
Overall this task went pretty well. I got the first PR within 5 minutes. But I noticed a strange issue, where Devin doesn’t seem to know the current date.
It thinks that December 2024 is a future date. Even after using a system command (date) in the CLI to get the current date (15 January 2025), it still thinks that the date is wrong.
After some back-and-forth, Devin finally convinced itself that the system date is correct, and proceeded to update the date in the page.
Overall Performance Evaluation
I also gave some more complex tasks:
Integrating 3rd party SDKs
Writing scripts to generate content using Anthropic API
Generating API endpoints in Express based on OpenAPI specs
Making a web page to simulate calling a series of API calls and display the results
Most of these tasks took more time and back-and-forth, but eventually I got Devin to accomplish all of them without me writing any code.
So here is what I think is good, and bad about it overall.
The good things:
Devin can actually complete the tasks I gave it without needing me to write any code myself.
For all the tasks that I gave Devin, which I believe are of simple to medium difficulty for a typical software engineer, Devin managed to accomplish them, with some help from me along the way or during the review.
I only had to write in English the entire time, not a single line of code was written by me.
Devin can automatically write tests (without being prompted) and execute them either locally or via CI to verify that everything works.
For more complex tasks, Devin writes very detailed descriptions in PR to outline the changes made and the testing done to ensure that the code works. This is helpful for reviewing the PR.
Devin can work autonomously, in parallel, on multiple tasks, speeding up the development process significantly.
Devin is really “smart” in some cases.
For example, when I instructed it to use curl to run an Insomnia collection exported as JSON, it actually used jq to retrieve and chain variables between the different requests.
Later, after repeatedly doing the same thing, it started writing bash scripts to automate this (it also knows about running chmod +x).
Devin is really fast for certain tasks involving building a demo page or writing an automation script. For example, a demo page with 3 API calls that would take me at least 15 minutes to build, only took Devin 43 seconds.
Devin can “visually see” web pages in his browser by taking screenshots of them (presumably via browser automation APIs). This means you can ask it to attach screenshots to the PR to show how the page actually looks after a frontend change.
This doesn’t currently work on the GitHub website (the linked images reside internally within Devin’s system and are not publicly accessible).
But the screenshots can be viewed both via Devin’s web interface and within Slack.
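The request chaining mentioned above (pulling a value out of one JSON response and feeding it into the next request) can be sketched in Python as follows. This is not what Devin ran: it used curl and jq in the shell. The response body, the field names, and the helpers extract and render are all hypothetical, and no network call is made.

```python
import json
from string import Template


def extract(response_body: str, key_path: list[str]):
    """Walk `key_path` through a JSON document, similar to a jq path like .a.b."""
    node = json.loads(response_body)
    for key in key_path:
        node = node[key]
    return node


def render(request_template: str, variables: dict) -> str:
    """Fill $name placeholders in a request template with captured values."""
    return Template(request_template).substitute(variables)


# Hypothetical response from a first "login" request.
login_response = '{"data": {"token": "abc123", "user_id": 42}}'

# Capture the values that later requests depend on.
variables = {
    "token": extract(login_response, ["data", "token"]),
    "user_id": extract(login_response, ["data", "user_id"]),
}

# Hypothetical second request built from the captured values.
next_request = render(
    "GET /users/$user_id HTTP/1.1\nAuthorization: Bearer $token",
    variables,
)
print(next_request)
```

Keeping the captured values in one dictionary means each later request only needs a template, which is roughly the role the bash scripts ended up playing.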
The good/bad thing:
Devin is very cautious about committing sensitive information into the repository, due to its built-in “security best practices” in the “system prompt”. Maybe this is due to the earlier security lapse that was reported in December, or because Devin had reportedly committed sensitive data by accident before.
For example, I asked it to commit Stripe test cards (which are publicly available online) to the repo, but Devin came back to confirm with me 3 times that I wanted to do it, basically refusing to do it. Eventually Devin did commit those in a bash script file (perhaps unknowingly).
This also sometimes sends Devin into an infinite editing loop: when the user instruction conflicts with the “security best practices”, Devin keeps editing the file back and forth.
The bad things:
Devin is constrained by the limited number of tools available to it (the “senses”).
Devin can’t visit web pages that require human-triggered authentication.
For example, Devin can’t visit GitHub website and see the PR that it created, or look at GitHub Actions pages to troubleshoot CI issues.
To debug an error that happened during CI, it has to either “look” (RAG) through all the logs from GitHub Actions, or use grep to search for keywords. Human developers, on the other hand, can go to the GitHub website, visually locate the relevant logs, and quickly narrow down the issue to the specific lines where the error happened.
I don’t think the human / visual troubleshooting approach is necessarily the best or most efficient, but it does outperform what Devin is doing now.
I assume Devin can’t access action logs on the GitHub website like a developer normally would, because it is an app on GitHub and GitHub apps typically only have API access.
This also means Devin can’t fix some UI issues, because it is not able to visually inspect how they look the way human eyes can (for example, a complex layout on a web page, or the aspect ratio of an image).
Update on 17 January 2025: I discovered that Devin can in fact “see” the web pages by taking screenshots in his browser. You just need to ask him to do it.
Some other examples of actions that are trivial for a developer, but (presumably) not possible for Devin:
Copy-pasting a few lines from one file to another
Visually seeing the lint errors via the IDE’s built-in lint system
Using the IDE’s built-in features to perform refactoring
I believe Devin currently relies primarily on LLMs (with some custom formats or additional logic on top) to “edit files”, which is inefficient and prone to errors. This makes Devin less efficient in handling some types of tasks like refactoring or troubleshooting issues.
Devin is really slow in completing certain tasks, much slower than software engineers. Sometimes it can appear to be idle for a few minutes. For example, a task that would take me 30 minutes to complete end-to-end took Devin more than 2 hours.
I suspect this has to do with either the underlying LLMs’ latency, or the endless loop due to conflicts between the “system prompt” and the user instructions.
Devin's performance degrades in long sessions. A warning appears when the session has been going on for a long time (at around 2.5 hours or 10 ACUs).
I suspect this has to do with the effective context window of the underlying LLMs. When the session goes on for too long, the history cannot fit into the context window, and the cost for each subsequent LLM call keeps going up.
Claude gives a similar warning in its UI as well.
Devin generates two plans for each task, one in plain text form inside the chat, and one using DSL / pseudocode inside the planner. These two do not always align on the steps. The planner might miss some important steps that were mentioned in the chat.
Devin sometimes does not pick up the relevant knowledge from the knowledge base (recall problem in RAG). And in cases where it does pick up the relevant knowledge, there is a chance that Devin will not use that knowledge anyway (likely a limitation of the reasoning capabilities of the underlying LLM, and Devin’s internal planning logic).
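To illustrate the suspicion above about long sessions, here is a toy model of why the cost per LLM call would keep rising. It assumes, purely for illustration, that every call re-sends the full session history and that each turn adds a fixed number of tokens. Neither assumption is confirmed for Devin, and the numbers are made up.

```python
def tokens_sent_per_call(turns: int, tokens_per_turn: int) -> list[int]:
    """Input tokens for each call, assuming every call re-sends the full history.

    Both the re-sending behaviour and the fixed turn size are assumptions
    of this sketch, not known facts about Devin.
    """
    return [tokens_per_turn * (i + 1) for i in range(turns)]


# Hypothetical session: 5 turns of 1,000 tokens each.
per_call = tokens_sent_per_call(turns=5, tokens_per_turn=1_000)
total = sum(per_call)

print(per_call)  # the input size grows linearly with every call
print(total)     # so the cumulative input grows roughly quadratically
```

Under these assumptions the tenth call costs ten times as much input as the first, which would be consistent with a session getting both slower and more expensive the longer it runs.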
Overall Devin feels like a smart developer being asked to code while being handcuffed.
Pricing - Fair
All prices below are in USD.
Devin is currently priced at $500 a month for 250 Agent Compute Units (ACUs).
Worth noting that on the plan page, $500 is listed as the “discount price”, whereas the original price is $1250.
I got 50 additional ACUs after onboarding (300 in total), which is supposed to be a gift for scheduling a call, even though I didn’t actually schedule one due to very limited slots around 3am in Singapore.
For additional usage beyond the included 250 ACUs, it is $2 / ACU. So the additional usage charge is exactly the same as the base subscription price, at $500 per 250 ACUs.
This means Devin, as of now, is essentially using a volume-based pricing of $500 per 250 ACUs.
I think this pricing makes sense because the cost of operating Devin does scale linearly with the amount of compute and the number of tasks.
Agent Compute Units (ACUs)
According to Devin’s official documentation:
Currently, each ACU is approximately equivalent to 15 minutes of active Devin work. As reference, simple frontend fixes usually take between 5-15 minutes.
In my first round of testing, I ran 4 simple to medium tasks across 2 repos, in 7 Devin sessions in total (2 extra sessions for “Verify Repo Access Tasks” / onboarding, and one extra session because Devin crashed once). Completing these 4 tasks successfully cost me 46.47 ACUs (about $93) in total.
So each task cost me about 11.5 ACUs, which translates to $23, or approximately 172.5 minutes / 2.9 hours.
I think I would be able to finish these tasks faster myself (maybe an hour to 1.5 hours, depending on the complexity). But I am happy to pay Devin $23 to do a task that would have cost me an hour (in a frictionless way), so that I can be free to do other more meaningful things.
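The per-task figures above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the inputs (46.47 ACUs, 4 tasks, $2 per ACU, roughly 15 minutes per ACU) come from the text, while the variable names are made up for illustration. The unrounded results come out slightly higher than the rounded figures quoted above.

```python
ACU_PRICE_USD = 2.0    # price per additional ACU
MINUTES_PER_ACU = 15   # approximate, per Devin's documentation

total_acus = 46.47     # ACUs consumed across all sessions
num_tasks = 4          # tasks completed successfully

total_cost = total_acus * ACU_PRICE_USD
acus_per_task = total_acus / num_tasks
cost_per_task = acus_per_task * ACU_PRICE_USD
minutes_per_task = acus_per_task * MINUTES_PER_ACU
hours_per_task = minutes_per_task / 60

print(f"Total cost:    ${total_cost:.2f}")
print(f"ACUs per task: {acus_per_task:.2f}")
print(f"Cost per task: ${cost_per_task:.2f}")
print(f"Time per task: {minutes_per_task:.0f} min ({hours_per_task:.1f} h)")
```

Note that the time figure is Devin's active working time as measured in ACUs, not necessarily the wall-clock time I spent waiting.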
Cost of Devin vs Junior Engineer
In terms of the cost, Devin costs $2 per 1 ACU (15 minutes of Devin active work) at a flat rate. The hourly rate would be $8 per hour.
Say we hire Devin full-time for a month (22 working days, 8 hours each day). That would translate to $8 * 8 hours * 22 days = $1,408 a month.
However, Devin is slow in completing some tasks (compared to a good junior engineer). So I would multiply the cost by a factor of 2 to 4 to achieve the same productivity. That puts the cost at roughly $2,800 to $5,600 a month for Devin to achieve the productivity of a full-time junior engineer. This is similar to the salary range of a junior engineer in Singapore.
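The same estimate as a short Python sketch, so the assumptions are easy to change. The $2 per ACU price, the 15 minutes per ACU, the 8-hour day, the 22 working days, and the 2x to 4x slowdown range are taken from the text; the names are made up for illustration.

```python
ACU_PRICE_USD = 2            # flat price per ACU
ACUS_PER_HOUR = 4            # one ACU is roughly 15 minutes of active work
HOURS_PER_DAY = 8
WORKING_DAYS_PER_MONTH = 22

hourly_rate = ACU_PRICE_USD * ACUS_PER_HOUR
monthly_cost = hourly_rate * HOURS_PER_DAY * WORKING_DAYS_PER_MONTH


def adjusted_monthly_cost(slowdown_factor: float) -> float:
    """Monthly cost scaled by how much slower Devin is than a junior engineer."""
    return monthly_cost * slowdown_factor


print(f"Hourly rate:  ${hourly_rate}")
print(f"Monthly cost: ${monthly_cost:,}")
for factor in (2, 4):
    print(f"At {factor}x slowdown: ${adjusted_monthly_cost(factor):,.0f}")
```

The slowdown factor is the most subjective input here, so it is worth plugging in your own estimate.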
A key benefit of Devin, however, is that you can just hire him by entering your credit card. And you can hire (scale up) as many Devins as you want to work on things in parallel. You can’t hire human engineers that quickly, and humans come with the overhead cost of managing them.
Conclusion
Despite being publicly available, I would say Devin is still at a very early stage in terms of maturity. There are some key constraints on what Devin can do. And it has some quirks that seem bizarre for an AI software engineer.
However, I must say that the user experience (UX) is by far the best I’ve seen in recent years, surpassing ChatGPT and Claude.
I am hopeful that if Cognition can fix these quirks and find novel solutions to overcome the constraints, Devin can really change the entire software engineering landscape.
Worth the $500 for me.
I will continue to use Devin daily, give it a greater variety of tasks, and post more in-depth analysis in the future. Subscribe to receive them.
Thanks for writing this! I have been curious about Devin but didn't really have a reason to drop $500 on it.