Self-teaching with goals

April 29, 2021

I started programming at a young age and have friends who did the same. It’s very fun! And it can set you up for success later in your career, as you’ll already know what you’re doing. Or not: I know people whose self-taught journey did nothing for them career-wise, and I think I know why. The key difference is that some of us built larger project(s), while others did not. Everyone builds smaller projects, but focusing on one or two large projects has a lot of positive effects:

It requires you to live in a codebase, rather than just create one and forget about it. If you build a small (read: typical) project as a self-taught developer, you won’t discover many of the architectural mistakes you make. Many organizational and architectural decisions seem great and work fine in small projects, but in a larger project you’d quickly see (and feel) their drawbacks.
It requires you to plan in far greater detail. You can’t just write code; you have to consciously think about its repercussions and figure out the best way to do things. You’ll be living with whatever decision you make for a while, so it’s important to plan.
It requires you to keep production working well. For example, I once migrated all the data in one database to another database with minimal downtime. In a smaller, less serious project, I could have just said “who cares about migrating the data” and would not have learned anywhere near as much about the perils of data migrations.
It requires you to care about security and privacy. For example, I needed geographic redundancy as part of a contract that my dad’s and my business had with a client. The service I was using for spinning up nodes didn’t offer VPNs at the time, so I had to use SSH tunneling to connect the nodes securely. Even this year I had to use SSH tunneling several times at work to forward ports from one machine to another, something I wouldn’t have known was possible if I had only worked on smaller projects, where security and privacy are often as simple as “don’t save any user input on a server” or “eh, I’m the only one using it anyway”.
It will later show people reading your resume that you didn’t just work on little toy applications that could be built without much skill, but on something far more complex and demanding.

If you build a large project that you treat as a real product (it doesn’t even need to be a real product; you just have to treat it as seriously as if it were), it will improve your skills more than 100 small projects would. Sure, 100 small projects can teach you a wide variety of things, but that one project can teach you immensely valuable lessons that you can later apply to projects of any size.

Here’s a concrete example. Imagine you build a simple “todo list” that stores the items in an array of strings and only lets you add strings to it. Now imagine you want to add the ability to put a checkmark by todos instead of just deleting them. Many self-taught programmers create a parallel array of booleans to handle this, and it works for them. If you were to treat this as a real product, you’d add features like deleting todos, reordering todos, persisting todos to a database, recording todo creation times, maybe even todo authors and sharing, etc. As you built these features, you’d notice some things:

Persisting the todos to a database will typically involve serializing the parallel arrays into a single array in which each element combines a todo’s text and its checked state, and deserializing that back into parallel arrays for the UI.
As you add more things to store about each todo (creation date, the user who created it, etc.), you wind up creating more parallel arrays, each of which must also be added to serialization and deserialization.
Many features require manipulating more than one of the parallel arrays. For example, if you add deletion, you need to go into each of the parallel arrays and delete the right element. And if you implement reordering, you also need to remove and reinsert the element in each of the parallel arrays. This compounds with the previous point: you can easily end up with tons of repeated code, and with subtle failures if you miss (or forget to update) even one line.
When you write code for working with a todo in another UI view, you may start using todos the way they’re stored in the database, so now you have two different ways of representing a todo even within your UI, which makes it hard to reuse code between those two views. For example, one needs serialization/deserialization while the other does not. Or, as another example, one may operate on an entire array with an index, while the other operates on a single object.
All of the above doesn’t just slow you down; it also feels increasingly “unhygienic”, making you want to rewrite the parallel arrays (which also takes time and effort).
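To make the list above concrete, here is a minimal Python sketch of the parallel-array approach. All names (`texts`, `done`, `created_at`, `add_todo`, `delete_todo`, `move_todo`, `serialize`, `deserialize`) are hypothetical illustrations, not code from a real project; the point is that every operation has to touch every array.

```python
# Hypothetical parallel-array todo list: each field lives in its own list,
# and index i in every list refers to the same todo.
texts: list[str] = []
done: list[bool] = []
created_at: list[float] = []


def add_todo(text: str, timestamp: float) -> None:
    # Every new field means another append here.
    texts.append(text)
    done.append(False)
    created_at.append(timestamp)


def delete_todo(i: int) -> None:
    # Forgetting any one of these lines silently misaligns the arrays.
    del texts[i]
    del done[i]
    del created_at[i]


def move_todo(src: int, dst: int) -> None:
    # Reordering means removing and reinserting in *every* array.
    for arr in (texts, done, created_at):
        arr.insert(dst, arr.pop(src))


def serialize() -> list[dict]:
    # Zip the parallel arrays into one record per todo for storage.
    return [
        {"text": t, "done": d, "created_at": c}
        for t, d, c in zip(texts, done, created_at)
    ]


def deserialize(records: list[dict]) -> None:
    # ...and split the records back apart for the UI.
    texts[:] = [r["text"] for r in records]
    done[:] = [r["done"] for r in records]
    created_at[:] = [r["created_at"] for r in records]
```

One extra hazard shows up in `serialize`: `zip` stops at the shortest input, so if a missed line ever leaves the arrays with different lengths, todos are silently dropped on save instead of raising an error.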

Working on a larger project, you’d learn that parallel arrays aren’t a good fit for this use case. But in a smaller project that you didn’t really treat as a product (like the original, simple todo list), you may not realize any of the above issues could occur. Even in a larger-than-average project, you may still not encounter some of them, or may underestimate their severity.
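For comparison, here is a minimal sketch of the usual alternative, assuming Python’s standard `dataclasses` module: store one record per todo, so each operation touches a single list. The `Todo` class and the function names are hypothetical illustrations, not the only reasonable design.

```python
from dataclasses import asdict, dataclass


@dataclass
class Todo:
    # Hypothetical record type: all fields of one todo live together.
    text: str
    done: bool = False
    created_at: float = 0.0


todos: list[Todo] = []


def delete_todo(i: int) -> None:
    # One operation, no matter how many fields a Todo has.
    del todos[i]


def move_todo(src: int, dst: int) -> None:
    # Reordering moves the whole record at once.
    todos.insert(dst, todos.pop(src))


def serialize() -> list[dict]:
    # asdict() includes every dataclass field, including ones added later.
    return [asdict(t) for t in todos]


def deserialize(records: list[dict]) -> None:
    # Rebuild Todo objects directly; the UI and storage share one shape.
    todos[:] = [Todo(**r) for r in records]
```

With this shape, adding a field such as an author (with a default value) is a one-line change to `Todo`; `asdict` and `Todo(**r)` pick it up without touching deletion, reordering, or serialization code.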

Even this tiny design decision can have a very negative effect on a large codebase. Just imagine how many other decisions there are to learn from that you’d never encounter in projects that are smaller or that you don’t take as seriously.