I am often asked how to build a commercial AI Agent:
- Should I choose Coze, Dify, or LangGraph?
- What are the do's and don'ts of the process?
- Where should my data be stored?
- Why can't some pages be crawled with this tool?
So in early 2025, I drew on two years of AI Agent development experience to launch a series, Building an AI Agent from Zero to One.
It covers theory, hands-on practice, and case studies, and walks step by step through the complete process of building a commercial AI Agent.
If you're non-technical, you should find it very helpful; if you're a developer, it will help you get started quickly with fewer detours!
This piece gives an overview of the seven steps to building an AI Agent: requirements analysis, platform and tool selection, prompt engineering, database selection, building the UI, testing and evaluation, and deployment and release.
Sorting out needs
Sorting out workflows
The first step is to sort out the requirements.
First of all, we need to be clear about what problem this AI Agent is meant to solve.
- If you are a content creator, you may want an AI Agent to handle repetitive tasks such as finding comparable accounts to benchmark against, spotting trending topics, doing analysis, and writing first drafts, so that you can focus on content creation;
- If you own a trading company, you might want an AI Agent to aggregate orders from different platforms and run product inquiries and price comparisons across them.
Remember, the tasks to focus on are the repetitive, mechanical ones that don't require much thought, and the more detailed your list, the better.
You can also talk the workflow through with an AI tool to produce a first draft, then fill in the gaps yourself.
- You are a workflow analysis expert. Please help me list the repetitive tasks this role performs in its daily work, and mark which ones can be AI-assisted and which ones I mainly do myself. First output a table (task / AI-assisted / done manually). When I think the table is complete, I will reply "continue", and you will then output a horizontal mermaid flowchart in which each node indicates whether it can be completed with AI assistance.
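To show what the final output of that prompt might look like, here is a minimal Python sketch that turns a task table into a horizontal Mermaid flowchart. The `Task` class, the `to_mermaid` function, and the sample workflow are hypothetical names made up for this illustration; they are not part of any platform mentioned here.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str          # what the step does
    ai_assisted: bool  # True if AI can help, False if done manually


def to_mermaid(tasks: list[Task]) -> str:
    """Render the tasks as a left-to-right (horizontal) Mermaid flowchart."""
    lines = ["flowchart LR"]
    # One node per task, labelled with who does the work.
    for i, task in enumerate(tasks):
        label = "AI-assisted" if task.ai_assisted else "Manual"
        lines.append(f'    T{i}["{task.name} ({label})"]')
    # Chain the nodes in order.
    for i in range(len(tasks) - 1):
        lines.append(f"    T{i} --> T{i + 1}")
    return "\n".join(lines)


# Hypothetical workflow for a content creator.
workflow = [
    Task("Find trending topics", True),
    Task("Pick a topic", False),
    Task("Write first draft", True),
    Task("Edit and publish", False),
]
print(to_mermaid(workflow))
```

Pasting the printed text into any Mermaid renderer should draw the workflow from left to right, which makes it easy to see at a glance which steps are candidates for automation.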
What tools are used
After sorting out the requirements, we need to list the tools we need to use based on the sorted workflow.
For example, collecting data requires a web crawler, and publishing articles requires integrating with the WeChat Official Accounts platform.
Tool selection therefore matters a great deal: with the right tools, the AI Agent can automate tasks across different systems and reduce manual work.
AI Agent Selection
In the second step, we need to choose the AI Agent development platform according to the scenario, select the appropriate large model, and choose different tools to perform the operations of different systems.
Which AI Agent platform to choose
Let's start with AI Agent development platforms. With Dify, Coze, FastGPT, and so many other no-code Agent platforms available, which one is the best fit?
- Coze can only be used in the cloud and cannot be deployed locally.
- Dify is completely open source and has no usage limitations, but is weak at knowledge-base Q&A.
- FastGPT has some usage limitations, but is relatively strong at knowledge-base Q&A.
More advanced development platforms, such as LangGraph, CrewAI, etc., allow the AI to plan and execute tasks on its own, but require code to be written.
Which of these platforms to choose depends on your specific needs, and you can of course mix them.
That requires a solid understanding of each platform: what it is good at, what it is not good at, and where its obvious flaws lie. Only with this information can you make the right choice for your scenario.
Which LLM to choose
Today there are overseas options such as OpenAI, Claude, and Gemini; domestic ones such as Kimi, Tongyi Qianwen, and the recently popular DeepSeek; open-source models such as LLaMA and Grok; and small models like Mistral.
So which of these models is the best fit for your AI Agent scenario?
If you have no private data to protect, the best choices are OpenAI and Claude, as they are the leading large models. If you only need tasks such as translating and summarizing articles, a domestic large model works about as well, and DeepSeek is currently very cost-effective.
Which model you choose depends on your specific scenario, and you can certainly mix models. It is worth gaining a deeper understanding of what the different models can do.
- What's the difference between a small model and a large model?
- Which model has the strongest reasoning power?
- What is the difference between variants of the same model with different context lengths, such as 8K and 32K?
- If deployed locally, what configurations can run what models? What are the capabilities of these models?
- If using a large model in the cloud, what is the billing unit price for the model?
- Is it possible to mix different models?
- Can cloud-hosted large models be used with private enterprise data?
I'll cover each of these issues in subsequent content.
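To make the billing question above concrete, here is a rough Python sketch that estimates monthly spend from token counts. It assumes the model is billed separately per million input and output tokens, which is a common scheme but should be checked against each provider's price list. The model names and all prices below are placeholders invented for the example, not real quotes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # price per 1M input tokens
    output_per_million: float  # price per 1M output tokens


# Placeholder prices for two hypothetical models. NOT real provider pricing.
PRICING = {
    "large-model": Pricing(input_per_million=10.0, output_per_million=30.0),
    "small-model": Pricing(input_per_million=0.5, output_per_million=1.5),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of a single call, in the same currency unit as the price table."""
    p = PRICING[model]
    total = input_tokens * p.input_per_million + output_tokens * p.output_per_million
    return total / 1_000_000


def monthly_cost(model: str, calls_per_day: int, input_tokens: int,
                 output_tokens: int, days: int = 30) -> float:
    """Cost of a month of identical calls."""
    return calls_per_day * days * estimate_cost(model, input_tokens, output_tokens)


# Compare the two placeholder models on the same assumed workload.
for name in PRICING:
    cost = monthly_cost(name, calls_per_day=1000, input_tokens=2000, output_tokens=500)
    print(f"{name}: {cost:.2f} per month")
```

Running the same workload through several price entries is a quick way to see whether mixing models, for example a small model for summaries and a large one for reasoning, is worth the extra complexity.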
Which tools to choose
Finally, there is tool selection. A tool is a capability, which can be as simple as generating an image or searching the web, or as involved as integrating with another system.
An AI Agent development platform by itself only provides access to large models, so tools are needed whenever the agent has to interact with external systems. Tools fall broadly into two categories: those with an API and those without.
Tools with an API are very simple to integrate. Platforms such as Coze and Dify already ship with many built-in tools that can be configured and used directly.
Tools without an API have to be driven through RPA (Robotic Process Automation). Simply put, RPA is automation software that can perform a series of actions by controlling the browser.
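The idea of a tool as a named capability can be sketched in a few lines of Python. This is only an illustration of the concept, not how Coze or Dify implement tools internally: the `TOOLS` registry, the `tool` decorator, and both sample tools are hypothetical, and the search tool is a stub that calls no real API.

```python
from typing import Callable

# Registry mapping a tool name to a plain Python function.
TOOLS: dict[str, Callable[..., str]] = {}


def tool(name: str):
    """Decorator that registers a function as a named tool."""
    def register(func: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = func
        return func
    return register


@tool("web_search")
def web_search(query: str) -> str:
    # Stub: a real tool would call a search API here.
    return f"(stub) top results for: {query}"


@tool("word_count")
def word_count(text: str) -> str:
    # Counts whitespace-separated words.
    return str(len(text.split()))


def call_tool(name: str, **kwargs) -> str:
    """Dispatch a tool call by name, as an agent would after the model picks a tool."""
    if name not in TOOLS:
        return f"error: unknown tool '{name}'"
    return TOOLS[name](**kwargs)


print(call_tool("web_search", query="AI Agent platforms"))
print(call_tool("word_count", text="build an agent from zero to one"))
```

A tool backed by an API would replace the stub body with an HTTP call, while a tool without an API would wrap an RPA script behind the same function signature.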
Prompt Engineering
The third step is prompt engineering, the core of an AI Agent. A good prompt can greatly improve the accuracy of the large model's output.
- A good prompt helps the AI Agent understand the task accurately and improves the quality of the large model's output.
- A good prompt reduces token consumption and lowers costs.
- A good prompt helps the AI Agent understand the context and keeps the conversation coherent.
Therefore, we need to master how to write effective prompts.
- What is the CRISPE framework?
- What is the BROKE framework?
- What is the ICIO framework?
- What is CoT (Chain of Thought)?
We also need to understand the rules for interacting with large models, for example:
- Generating a long article across multiple outputs gives higher quality than generating it in one pass.
- Using distinct symbols to separate different pieces of information helps the large model understand the prompt.
- Giving examples helps the large model grasp your requirements quickly.
- For complex tasks, it is more effective to break them into steps and guide the large model through them one at a time.
- Clearly define the limits of the output, such as word count, format, style, and language difficulty.
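Several of these rules can be combined in one small Python sketch: the article is generated section by section, each prompt uses `###` markers to separate its parts, and the output limits are stated explicitly. Everything here is an assumption made for illustration, including the `###` delimiter, the function names, and `fake_model`, which stands in for a real large-model call so the example runs offline.

```python
from typing import Callable


def section_prompt(topic: str, heading: str, index: int, total: int,
                   max_words: int) -> str:
    """Build the prompt for one section, with delimited parts and explicit limits."""
    return (
        f"You are writing part {index} of {total} of an article.\n"
        f"### Topic ###\n{topic}\n"
        f"### Section ###\n{heading}\n"
        f"### Output limits ###\n"
        f"At most {max_words} words, plain language, no bullet lists.\n"
    )


def write_article(topic: str, outline: list[str],
                  generate: Callable[[str], str], max_words: int = 300) -> str:
    """Generate one section per call instead of the whole article at once."""
    parts = []
    for i, heading in enumerate(outline, start=1):
        prompt = section_prompt(topic, heading, i, len(outline), max_words)
        parts.append(f"{heading}\n{generate(prompt)}")
    return "\n\n".join(parts)


def fake_model(prompt: str) -> str:
    # Stand-in for a real model call: echoes the section heading it was given.
    heading = prompt.split("### Section ###\n")[1].split("\n")[0]
    return f"[draft for: {heading}]"


outline = ["Why agents matter", "Choosing a platform", "Next steps"]
article = write_article("Building an AI Agent", outline, fake_model)
print(article)
```

Swapping `fake_model` for a function that calls your chosen model keeps the loop unchanged, which is the main point of splitting the work into steps.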
ICIO framework:
- Instruction: Specify the task you want the AI to perform, such as "translate a text" or "write a blog post about AI ethics".
- Context: Provide background information to help the AI understand the setting of the task, e.g., "This text is used for the opening of an internal company meeting."
- Input Data: Specify the data that the AI needs to process, e.g., "Please translate the following sentence: 'Artificial Intelligence is changing the world'".
- Output Indicator: Set the desired format and style of the output, e.g., "Please translate in formal business English style".
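The four ICIO parts map naturally onto a small template function. The Python sketch below is one possible layout rather than a standard: the `build_icio_prompt` name, the `##` section markers, and the sample values (adapted from the examples above) are all assumptions for illustration.

```python
def build_icio_prompt(instruction: str, context: str, input_data: str,
                      output_indicator: str) -> str:
    """Assemble a prompt with one labelled section per ICIO element."""
    sections = [
        ("Instruction", instruction),
        ("Context", context),
        ("Input Data", input_data),
        ("Output Indicator", output_indicator),
    ]
    # Refuse to build a prompt with a missing element.
    for title, body in sections:
        if not body.strip():
            raise ValueError(f"{title} must not be empty")
    return "\n\n".join(f"## {title}\n{body.strip()}" for title, body in sections)


prompt = build_icio_prompt(
    instruction="Translate the input sentence into French.",
    context="This text is used for the opening of an internal company meeting.",
    input_data="Artificial Intelligence is changing the world",
    output_indicator="Use a formal business style.",
)
print(prompt)
```

Keeping the four elements as separate arguments makes it obvious when one is missing, and the same pattern could be adapted to BROKE or CRISPE by changing the section list.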
BROKE framework:
- Background: e.g., "You are writing a press release for a tech startup about its latest product."
- Role: Designate the AI as the "press release writer" so that it can answer questions from a professional perspective.
- Objectives: Give a description of the task, e.g., "Write an engaging press release that highlights the unique selling points of the product."
- Key Result: Set the key result of the answer, e.g., "Use formal and professional language and include the product's key features and market positioning."
- Evolve: After the AI gives an answer, have it offer three ways to improve it, such as "Adjust language style to appeal to target audience," "Add product use cases," or "Optimize structure to improve reading fluency".
CRISPE framework:
- Capacity and Role: Clarify the role the AI should play in the interaction, such as educator, translator, or advisor.
- Insight: Provide background information and context so the AI understands its role in the given situation.
- Statement: State directly the task the AI needs to perform, so that it understands and executes the user's request.
- Personality: Set the style and format of the AI's responses to better match user expectations and the scenario.
- Experiment: If desired, ask the AI to provide multiple examples so the user can choose the best response.
CoT framework:
Chain of Thought (CoT) is a method of guiding a large model to work through a problem step by step, in sequence, the way a human would.
It is mainly applied in two ways: Few-Shot CoT and Zero-Shot CoT.
Few-Shot CoT (small number of examples)
Describe the thinking steps in the prompt: for example, first understand the customer's needs, then consider the available options, and finally give a recommendation and explain why.
Also provide examples that show how the AI should reason along the chain of thought to reach an answer.
Zero-Shot CoT (no examples)
Simply append a trigger phrase to the prompt:
Let's think step by step.
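The difference between the two CoT styles shows up clearly when the prompts are built in code. The Python sketch below assumes a simple "Q:/A:" layout, which is a common convention but not a requirement; the function names and the arithmetic example are invented for this illustration.

```python
ZERO_SHOT_TRIGGER = "Let's think step by step."


def zero_shot_cot(question: str) -> str:
    """Zero-Shot CoT: no examples, just the trigger phrase after the question."""
    return f"Q: {question}\nA: {ZERO_SHOT_TRIGGER}"


def few_shot_cot(question: str, examples: list[tuple[str, str, str]]) -> str:
    """Few-Shot CoT: each example is (question, reasoning, answer)."""
    blocks = [
        f"Q: {q}\nA: {reasoning} So the answer is {answer}."
        for q, reasoning, answer in examples
    ]
    # The new question goes last, with the answer left open for the model.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


# Hypothetical worked example that demonstrates the reasoning style.
examples = [
    (
        "A pen costs 3 yuan. How much do 4 pens cost?",
        "Each pen costs 3 yuan and there are 4 pens, so 3 * 4 = 12.",
        "12 yuan",
    ),
]

print(zero_shot_cot("A notebook costs 5 yuan. How much do 6 notebooks cost?"))
print()
print(few_shot_cot("A notebook costs 5 yuan. How much do 6 notebooks cost?", examples))
```

Few-Shot CoT costs more tokens per call because the examples are sent every time, so Zero-Shot CoT is a reasonable first attempt when cost matters.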
Database Selection
Step 4: where do you store the chat logs, collected data, and other content generated while the AI Agent runs? This is where you need a database.
For non-technical people, I recommend Feishu's multidimensional tables (Bitable), which are visual, easy to use, and simple to integrate.
The downside is that reads slow down as the data grows, and they cannot handle complex business logic.
Developers can use common databases such as MySQL or a NoSQL store.
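For a sense of what storing chat logs looks like in practice, here is a minimal Python sketch. It uses SQLite from the standard library as a stand-in so the example runs without a server; the text above recommends MySQL or NoSQL, and the same table design would carry over to MySQL with minor changes. The table name, columns, and helper functions are assumptions made for this example.

```python
import sqlite3
from datetime import datetime, timezone


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the chat log table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS chat_log (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            conversation_id TEXT NOT NULL,
            role TEXT NOT NULL CHECK (role IN ('user', 'assistant')),
            content TEXT NOT NULL,
            created_at TEXT NOT NULL
        )
        """
    )
    return conn


def log_message(conn: sqlite3.Connection, conversation_id: str,
                role: str, content: str) -> None:
    """Append one message; parameters are bound to avoid SQL injection."""
    conn.execute(
        "INSERT INTO chat_log (conversation_id, role, content, created_at) "
        "VALUES (?, ?, ?, ?)",
        (conversation_id, role, content, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


def history(conn: sqlite3.Connection, conversation_id: str) -> list[tuple[str, str]]:
    """Return (role, content) pairs for one conversation in insertion order."""
    rows = conn.execute(
        "SELECT role, content FROM chat_log WHERE conversation_id = ? ORDER BY id",
        (conversation_id,),
    )
    return rows.fetchall()


conn = open_db()
log_message(conn, "c1", "user", "Which platform should I choose?")
log_message(conn, "c1", "assistant", "It depends on your scenario.")
print(history(conn, "c1"))
```

Keying every row by a conversation ID makes it straightforward to replay a conversation as context for the model or to analyze it later.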
Building the UI Interface
Step 5: build your own UI. On Coze you can design your own interface, while Dify provides ready-made interfaces that cannot be modified.
Both platforms can also publish an agent as a service API, which means that instead of using the interfaces they provide, you can develop your own front end and connect it to the API.
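Calling a published agent from your own front end usually comes down to an authenticated HTTP request. The Python sketch below shows the general shape using only the standard library. The URL, the `/chat` path, the payload field names, and the Bearer-token header are all placeholders assumed for illustration; the real endpoint and fields must be taken from the API documentation of the platform you use.

```python
import json
import urllib.request


def build_agent_request(base_url: str, api_key: str, query: str,
                        user_id: str) -> urllib.request.Request:
    """Build a POST request for a hypothetical agent chat endpoint."""
    payload = {"query": query, "user": user_id}  # assumed field names
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat",  # assumed path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask_agent(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON reply. Needs a real endpoint to work."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Build (but do not send) a request against a placeholder host.
req = build_agent_request(
    base_url="https://agent.example.com/v1/",
    api_key="YOUR_API_KEY",
    query="Summarize today's orders",
    user_id="user-123",
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Separating request construction from sending makes the integration easy to test without network access, and lets one front end target several agents by changing only the base URL and key.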
If you want to develop your own interface, you can build one with the help of an AI programming tool like Cursor.
Another reason to build your own interface is that you can define multiple AI Agents on Coze and Dify and call all of them from it, so you always work in a single interface.
Testing and Evaluation
Step 6: test and evaluate. Testing makes sure your AI Agent runs without errors, such as the program throwing exceptions or the large model failing to handle user requests.
Evaluation makes sure the AI Agent outputs correct responses. During evaluation, we keep optimizing the AI Agent so that it answers correctly while consuming fewer tokens.
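Independent of any particular tool, the core of evaluation is a set of test cases run against the agent. The Python sketch below shows one very simple scoring rule, checking that each answer contains expected keywords. That rule, the `EvalCase` class, and `fake_agent` (a stand-in for a real agent call) are assumptions for illustration; real evaluations often need fuzzier matching or human review.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    question: str
    must_contain: list[str]  # keywords a correct answer should mention


def evaluate(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Run every case and return the pass rate between 0.0 and 1.0."""
    passed = 0
    for case in cases:
        try:
            answer = agent(case.question).lower()
        except Exception as exc:
            # Testing concern: the agent crashed instead of answering.
            print(f"ERROR {case.question!r}: {exc}")
            continue
        # Evaluation concern: the agent answered, but was it correct?
        ok = all(keyword.lower() in answer for keyword in case.must_contain)
        print(f"{'PASS' if ok else 'FAIL'} {case.question!r}")
        passed += ok
    return passed / len(cases) if cases else 0.0


def fake_agent(question: str) -> str:
    # Stand-in for a real agent: only knows about refunds.
    if "refund" in question.lower():
        return "Refunds are processed within 7 days."
    return "Sorry, I don't know."


cases = [
    EvalCase("How long does a refund take?", ["7 days"]),
    EvalCase("What are your opening hours?", ["9am"]),
]
print(f"pass rate: {evaluate(fake_agent, cases):.0%}")
```

Rerunning the same cases after each prompt or model change shows whether an optimization actually improved the agent or quietly broke an answer that used to be correct.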
We can use LangSmith to monitor the running of the project.
LangSmith helps you work better with large models:
• Debugging and Testing: It helps you identify and fix problems in your program so that the AI Agent answers questions or completes tasks correctly.
• Evaluation: It tests the AI Agent's performance against a variety of test cases, such as how accurately and reliably it answers questions.
• Monitoring: It allows real-time observation of the AI Agent's working status, such as the speed of processing requests and the cost incurred.
• Logging: It records all the details of the AI Agent's work, including the questions it receives, the answers it gives, and the parameters it uses, making it easy for you to analyze and improve.
Deployment and Release
Step 7: deploy and publish. Different AI Agent development platforms have different deployment methods. Coze can publish directly to platforms such as Doubao and mini programs, while Dify can publish directly as a web application or be embedded into your own system.
If you are developing the AI Agent independently, then you can purchase servers for standalone deployment.
Wrapping Up
That's the range of topics I'll be covering in the series Building an AI Agent from Zero to One. I look forward to digging into them with you and helping you build your own ideal AI Agent.
I'm sure you have many more questions, and we'll dive in and discuss them together in the follow-up.