I Started Learning AI Engineering Two Days Ago. Here Is My Honest Take.

Two days ago I decided to start learning AI engineering properly. Not “using AI tools in my workflow,” which I have been doing for a while. Not reading about large language models in a surface-level way. Actually learning to build things with AI at the layer below the chat interface.

I want to write about this now, at the very beginning, before I know too much and forget what the starting point felt like. The “I have been doing this for three years” articles are valuable, but they have a blind spot. The person writing them has forgotten what they did not know at the start. They have also forgotten which things looked confusing from the outside but turned out to be simple, and which things looked simple but turned out to matter a lot.

Two days in, I still have that perspective. Here is what I found.


Why I Am Making This Move

I have been writing software professionally for years. Web development, full-stack work, some startup building, the usual. I have watched the AI space from a distance while integrating AI tools into my workflow at the surface level.

The thing that finally made me take this seriously is not the hype. I have been ignoring the hype for two years. What shifted my thinking is what I started seeing in the job market and in the products that are actually getting built.

The job postings for AI engineers have gone from rare to everywhere, and the salary numbers are not subtle. But more importantly, the actual products I find interesting are almost all built at the AI layer now. The things I want to build involve AI in a real way, not just as a feature. If I want to build them properly, I need to understand what is actually happening underneath.

That is the honest reason. Not career optimization, not following a trend. The things I want to make require these skills, and I do not have them yet.


What AI Engineering Actually Is

This took me a while to get clear on, because the term is used to describe a range of things that are genuinely different from each other.

There are people who build and train AI models from scratch. That is machine learning research and engineering. It requires deep math, large compute budgets, and years of specialized knowledge. This is not what most “AI engineer” job postings are asking for, and it is not what most AI engineering work actually involves.

There is also a lot of marketing content that calls itself AI engineering but is really just “how to write a prompt.” That is also not what we are talking about.

The middle layer, which is what the actual demand is for, is building production applications that use AI models as components. You are not training models, and you are not just writing prompts into a chat interface. You are building systems where AI models do work inside a larger application: retrieving and processing information, making decisions based on context, using tools, running autonomously to complete tasks, and doing all of this in a way that is reliable enough to ship to real users.

That skill set sits on top of software engineering. You still need to know how to build backend services, design APIs, handle authentication, manage databases, deploy things reliably. AI engineering is not a replacement for those skills. It is an additional layer on top of them.

The way I think about it: a software engineer knows how to build the car. An AI engineer knows how to build the car and also how to install and operate the increasingly capable co-pilot system that is changing how the car works.


The Actual Demand Numbers

I usually distrust statistics in this kind of article because they are often cherry-picked or out of date. But the numbers here are striking enough to include.

AI engineer job postings grew nearly 200 times between 2021 and 2025. The role is growing about three times faster than traditional software engineering. Over half of AI engineer job listings offer six-figure salaries, and around a third are in the 160K to 200K range. Engineers who add AI skills to their existing software engineering background see salary uplifts of around 50% compared to peers without those skills.

The World Economic Forum projects that demand for AI and data roles will exceed supply by 30 to 40 percent by 2027. If that holds, anyone who has these skills in the next year or two will be in the rare position of being genuinely hard to find.

I do not usually lean on salary numbers as motivation. But the underlying signal here is not “AI is fashionable so salaries are temporarily high.” It is “the supply of people who can build real production AI systems is far behind the demand for those systems.” That gap is real. I have seen it firsthand in the hiring conversations that come my way.


The Roadmap I Found

There is a reasonable consensus on what the learning path looks like, and after two days of digging through resources, here is my working map of it:

Layer 1: LLM APIs and prompt engineering

This is the entry point. You learn to call the OpenAI, Anthropic, or similar APIs, understand how to structure messages and system prompts, and build your first basic applications. This is also where you learn the fundamentals: tokens, context windows, temperature, how inference pricing works, and how to think about model capabilities and limitations.

Most developers can get through this layer in a few weeks of focused work. If you can already build a REST API, calling an LLM API is not technically intimidating. The new skill is learning to think about prompts as a design problem.
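To make that concrete, the message structure that OpenAI-style chat APIs expect can be sketched with plain dicts. The `build_messages` helper and the cooking-app prompt below are my own illustrations, not part of any SDK:

```python
# Minimal sketch of the message list an OpenAI-style chat endpoint
# consumes. The helper and example content are illustrative only.

def build_messages(system_prompt: str, history: list[dict], user_input: str) -> list[dict]:
    """Assemble the message list for a chat completion request."""
    return (
        [{"role": "system", "content": system_prompt}]
        + history
        + [{"role": "user", "content": user_input}]
    )

messages = build_messages(
    system_prompt="You are a concise assistant for a cooking app.",
    history=[
        {"role": "user", "content": "How do I store fresh basil?"},
        {"role": "assistant", "content": "Stems in water, loosely covered, at room temperature."},
    ],
    user_input="And for how long?",
)
```

In real code this list is what you pass to the provider's client library; the design problem is what goes into the system prompt and how much history you carry, not the plumbing. Note that some providers (Anthropic among them) take the system prompt as a separate parameter rather than a message with a `system` role.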

Layer 2: RAG pipelines

RAG stands for Retrieval-Augmented Generation. The core idea is that AI models have a knowledge cutoff and cannot know about your specific data. RAG lets you give the model relevant information at query time by retrieving it from a database or document store and including it in the context.

This is the single most cited skill in AI engineering job postings right now. Building a RAG pipeline means learning about vector databases, embeddings (how text gets turned into numbers that encode meaning), retrieval strategies, and how to evaluate whether your pipeline is actually surfacing the right information. Tools in this space include Pinecone, Weaviate, Chroma, and pgvector for those who prefer staying in Postgres.

This layer is where a lot of real enterprise AI work lives. Most companies have valuable proprietary data and want AI that can reason over it. RAG is the standard architecture for that.
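The retrieval step at the heart of RAG fits in a few lines. The tiny hand-written three-dimensional vectors below are stand-ins for what an embedding model would produce, and the documents are invented; a real pipeline would use a vector database rather than a dict:

```python
import math

# Toy retrieval step of a RAG pipeline. Real systems get these vectors
# from an embedding model and store them in a vector database; these
# hand-written vectors only illustrate the mechanics.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

documents = {
    "Refund policy: refunds within 30 days of purchase.": [0.9, 0.1, 0.0],
    "Shipping: orders ship within 2 business days.": [0.1, 0.9, 0.1],
    "Careers: we are hiring engineers.": [0.0, 0.1, 0.9],
}

def retrieve(query_vec, k=1):
    """Return the k documents whose embeddings are closest to the query."""
    ranked = sorted(documents.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# A refund-shaped query vector should surface the refund document,
# which then gets pasted into the model's context ahead of the question.
context = retrieve([0.85, 0.15, 0.05])
prompt = f"Answer using only this context:\n{context[0]}\n\nQuestion: Can I get my money back?"
```

Everything hard about RAG in production lives around this core: chunking documents well, choosing retrieval strategies, and evaluating whether the right passages actually come back.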

Layer 3: Agents and tool use

This is the layer I find most interesting and it connects directly to what I have been reading and writing about agentic coding. Instead of a model that just answers questions, you build a model that can take actions: search the web, query a database, call an API, write and execute code, read files, interact with external services.

The skill here is designing the agent architecture: what tools does the agent have access to, how does it decide which tool to use, how do you handle errors, how do you keep the agent from going off the rails on long tasks? Model Context Protocol, which I already wrote about, is becoming the standard for how agents connect to tools safely.
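The shape of that loop can be sketched without any framework. The scripted stub below stands in for the model, and the tool names are mine; a real agent would send the transcript to an LLM API at each step:

```python
# Skeleton of an agent loop: the model picks a tool, the harness runs
# it, and the result goes back into the transcript until the model
# answers directly. The "model" is a scripted stub so this runs offline.

def get_time(_args):
    return "14:02"

def add(args):
    return str(int(args["a"]) + int(args["b"]))

TOOLS = {"get_time": get_time, "add": add}

def scripted_model(transcript):
    # Stand-in for the LLM: request one tool call, then answer.
    if not any(m["role"] == "tool" for m in transcript):
        return {"tool": "add", "args": {"a": 19, "b": 23}}
    return {"answer": f"The sum is {transcript[-1]['content']}."}

def run_agent(task, model=scripted_model, max_steps=5):
    transcript = [{"role": "user", "content": task}]
    for _ in range(max_steps):  # hard cap keeps the agent from looping forever
        decision = model(transcript)
        if "answer" in decision:
            return decision["answer"]
        result = TOOLS[decision["tool"]](decision["args"])
        transcript.append({"role": "tool", "content": result})
    return "Gave up after max_steps."

print(run_agent("What is 19 + 23?"))  # → The sum is 42.
```

The interesting design questions all live inside this skeleton: which tools to expose, how to validate the model's tool choices, and what the `max_steps`-style guardrails should be.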

This is where AI engineering starts to feel genuinely novel rather than just “APIs with extra steps.”

Layer 4: LLMOps

This is deploying and operating AI systems at scale, and it is the bottleneck that most teams run into when they try to take an AI feature from prototype to production.

LLMOps covers: monitoring model performance over time, detecting drift (when model outputs start degrading without obvious cause), managing prompt versioning and rollbacks, evaluating outputs systematically rather than manually, handling cost optimization when you are making thousands of API calls per day, and managing latency and reliability for user-facing features.

A lot of this is analogous to DevOps but with new challenges. You cannot just run a test suite and get a binary pass or fail. Evaluating whether an AI response is good is a different kind of problem than evaluating whether a function returned the right value.
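A tiny sketch shows the difference. Instead of a binary pass or fail, an eval harness scores a set of cases and reports a rate; the `summarize` stub and the must-include checks below are invented stand-ins for a real model call and a real grading function:

```python
# Minimal eval harness of the kind LLMOps tooling builds on: run a case
# set through the system, score each output, report a pass rate.

def summarize(text: str) -> str:
    # Stand-in for the deployed model: keep the first sentence.
    return text.split(".")[0] + "."

EVAL_SET = [
    {"input": "The deploy failed. Logs show a timeout.", "must_include": "deploy"},
    {"input": "Revenue grew 12%. Costs were flat.", "must_include": "revenue"},
]

def run_evals(system, cases):
    """Score each case 0 or 1 with a keyword check, return the mean."""
    scores = [1.0 if c["must_include"] in system(c["input"]).lower() else 0.0 for c in cases]
    return sum(scores) / len(scores)

pass_rate = run_evals(summarize, EVAL_SET)  # → 1.0
```

Real eval suites replace the keyword check with richer graders (including model-graded rubrics) and track the rate over time, which is how drift gets caught.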


What Surprised Me

A few things caught me off guard in the first two days.

The tooling is more mature than I expected. I came in expecting to spend a lot of time fighting with immature libraries and inconsistent APIs. Some of that exists, but the core tools have stabilized significantly. LangChain had a rough early reputation for instability and over-engineering, but the ecosystem has settled. LlamaIndex is solid. The major cloud providers have invested heavily in ML infrastructure. You can build serious things without constantly hitting sharp edges in the tools.

The math requirement is much lower than I expected. I am not a pure mathematician. I can read the math in a technical paper but I am not going to derive gradient descent from scratch. I was bracing for that to be a bigger blocker. It is not. Building production AI applications does not require you to implement the underlying algorithms. It requires you to understand them well enough to use them intelligently, which is a different bar.

What is harder than I expected: evaluation. How do you know if your AI system is working well? Writing unit tests for a function that returns a number is easy. Evaluating whether an AI assistant gave a helpful, accurate, and safe response across thousands of diverse queries is a genuinely hard problem that the field is still working on. I am going to spend real time here.

And the domain knowledge angle caught my attention early. I kept seeing the stat that AI engineers with domain specialization command 30 to 50 percent higher salaries than generalists. After thinking about it for a day, I believe it. A general-purpose AI engineer can build a RAG pipeline over any corpus. An AI engineer who deeply understands legal documents, or medical records, or financial reporting can build a RAG pipeline that actually works for those specific use cases in ways a generalist cannot. The domain knowledge is the moat.


The Career Path I Am Mapping Out

Here is how I am thinking about the next six to twelve months personally.

I am starting with LLM APIs and building something small and real. Not a tutorial project, an actual thing that solves a problem I have or that I would use. The gap between tutorial projects and real projects is where a lot of learning happens that tutorials skip.

From there, I will go deep on RAG. This is the highest demand skill and the one with the clearest path from “I built a prototype” to “I built something a company would pay to run.” I want to build at least two serious RAG applications before I move to the next layer.

Agents and tool use is where I want to spend serious time. This is the part of the stack I find most intellectually interesting and where I think the real leverage is in the next two to three years. The overlap with what I have been learning about agentic coding is significant.

LLMOps I will learn alongside the other layers. You cannot evaluate what you have not built, so it makes sense to develop those skills in parallel rather than treating them as a later chapter.

The domain specialization question is one I am genuinely thinking about. I have a background in web development, startups, and developer tooling. The most natural domain for me to go deep on is probably developer tools and developer workflows, which is a fast-moving space with significant demand. I am watching that more carefully now.


What Makes This Different from Learning a New Framework

I have learned a lot of new tools and frameworks over my career. React, Next.js, Astro, various backend frameworks, database tools. There is a familiar shape to that learning: read the docs, build some things, hit the edge cases, develop intuitions.

AI engineering has some of that, but it has a different quality too.

When you learn a new framework, the system behaves deterministically. Given the same inputs, it produces the same outputs. You can test this precisely. You can reason about it mechanically if needed.

AI systems are probabilistic. The model might give a different answer to the same question twice. The quality of outputs is a distribution, not a fixed value. The failure modes are softer: instead of a crash, you get a subtly wrong answer that looks plausible. Testing requires different approaches. Debugging requires different approaches.
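One way this shows up in practice: tests assert on rates over many samples instead of exact outputs. The noisy extractor below is a made-up stand-in for a model call, but the threshold-assertion pattern is the point:

```python
import random

# Testing a probabilistic system: sample many runs and assert the
# success rate clears a threshold, rather than asserting one exact
# output. The flaky function stands in for a model call.

random.seed(0)  # fixed seed so the sketch is reproducible

def flaky_extract_year(text: str) -> str:
    # Pretend model: correct about 90% of the time.
    return "1969" if random.random() < 0.9 else "1970"

def success_rate(fn, n=500):
    return sum(fn("Apollo 11 landed in 1969.") == "1969" for _ in range(n)) / n

rate = success_rate(flaky_extract_year)
assert rate > 0.8  # threshold assertion, not exact equality
```

The threshold itself becomes a design decision: too strict and the suite is flaky, too loose and regressions slip through.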

This is an adjustment. I can feel myself recalibrating how I think about correctness and reliability. It is not better or worse; it is different. But it requires genuinely updating your mental model rather than just adding new syntax to an existing one.

There is also a pacing difference. Frameworks move fast but they have stable core APIs. The AI space is moving at a different speed. Models that were state of the art six months ago have been replaced. Frameworks that were the standard approach a year ago have been deprecated or superseded. The half-life of specific tool choices is shorter. Learning principles over tool specifics matters more here than in most other areas of the stack.


The Honest Hard Part

I want to be direct about what I think the actual difficulty is, not the theoretical difficulty.

The theoretical difficulty people warn about is the math. Understand transformers, attention mechanisms, loss functions, backpropagation. That is real knowledge, but it is not the blocker most people think it is for building applications.

The actual difficulty, I think, is developing good judgment about when AI is and is not the right tool for a given problem. There is enormous enthusiasm for applying AI to everything right now. A lot of those applications are bad ideas. The model adds cost, latency, unpredictability, and failure modes that a simpler system would not have. Knowing when to use a regular if-statement versus a language model is judgment that takes time to develop.

The second hard part is evaluation. I keep coming back to this. Building the system is learnable quickly. Knowing if it is actually working well, across a wide range of real inputs, in a way that degrades gracefully when it fails, is harder. I expect this to be the part that takes the longest.


Where I Am Going to Learn

Since people will ask: the resources I found most useful in the first two days.

The DeepLearning.AI short courses are genuinely good and efficient. Andrew Ng has kept them updated. The RAG and agents courses are worth the time.

The Anthropic documentation on prompt engineering and Claude’s API is unusually good. They have clearly invested in making it possible to understand how to use their models well, not just technically but conceptually.

Building things. I know that sounds obvious, but every hour I spent reading about embeddings was less useful than the forty minutes I spent building a small thing that used embeddings. The concepts crystallize fast when there is something running.

The AI engineering community on X is active and moves fast. More useful for staying current than for foundational learning, but following a handful of people who are building real things in this space is worth it.


Two Days In

I am early. Very early. I am not going to pretend I have achieved anything yet beyond orientation.

But I know where I am going. The path is clearer than I expected. The first steps are concrete. And the pull of what is on the other side of this learning curve is real.

I am going to keep writing about this as I go. Not polished retrospectives from a position of expertise, but honest ongoing documentation of what it actually looks like to make this transition. Including the parts where I get confused, make wrong assumptions, and have to backtrack.

If you are considering the same move, I hope this was useful. And if you are further along this path, I would genuinely like to hear what surprised you at the beginning that turned out to matter a lot.

The field is moving fast. The best time to start was probably a year ago. The second best time is now.