
CZ Backs a Chinese College Junior's Education Agent with an $11 Million Seed Round

2025-10-30 15:22
In a nutshell, VideoTutor, a personalized educational explanation video service targeting the K-12 education sector, announced today that it has successfully raised $11 million in seed funding.
Original Title: "Zhao Changpeng Invested in a Chinese College Junior, $11 Million Seed Round, Creating an Education Agent"
Original Author: Founder Park, a community for entrepreneurs under GeekPark


A Chinese college junior has raised an $11 million seed round, currently the largest for a student-founded product in Silicon Valley.


VideoTutor, a K12 education agent that can generate personalized teaching and explanatory videos from a single sentence, announced today that it has closed an $11 million seed round. The round was led by YZi Labs, with participation from Baidu Ventures, Jin Qiu Fund, Amino Capital, BridgeOne Capital, and several well-known investors.


This is also the first AI product company invested in by YZi Labs.


Founder Kai Zhao said VideoTutor won recognition and support from CZ and the YZi Labs investment team, and YZi Labs ultimately led the round. The team received more than 10 term sheets and chose these investors in the end.


The product's first version launched on May 14 (debuting in the Founder Park product market), gained market recognition and validated product-market fit (PMF), and in less than five months the company closed this $11 million seed round.


In Kai's view, the core reason they secured this funding is that, having chosen the right direction, a "young genius team" used visual learning to address the pain points of SAT preparation in the U.S. K12 market.


"This field suits young people. Add very strong hands-on engineering ability, a founder with real insight and experience, and very fast execution."


They are not alone. With Cursor, Mercor, Pika, GPTZero, and others, Silicon Valley college students keep setting new funding records with AI products, reshaping how people think about AI entrepreneurship.


Entrepreneurship in the AI era is indeed somewhat different.


We talked to these young people at VideoTutor to understand why they were able to secure this seed funding, what kind of changes are happening in Silicon Valley entrepreneurship today, and why they are so eager to recruit employees from major Chinese tech companies.


Guests: CEO Kai Zhao, CTO James Zhan


Interview & Editing | Wanhu


The following is the interview content, edited and organized by Founder Park.



K12 Track: The True Direction is Visual Learning


Founder Park: With so many institutions showing great interest in your project, what key aspect do you think impressed them the most?


Kai: I believe the first key aspect is being on the right track. AI education has great potential, and the segment we are entering focuses on the American college entrance exams, the SAT and AP. Since we target K12 high school students, there is very little generation gap between us and our users. We have been through the entire exam preparation and study cycle ourselves, we understand where the pain points of exams and preparation lie, and we were able to build a product that truly addresses them.


Secondly, the team is extremely talented. James comes from the Gemini team, where he was a core Google engineer specializing in AI engineering and algorithms. I have been through three education startups, starting with educational software in my freshman year. In my sophomore year, I helped create the MathGPTPro project, which was selected for the Qijie Innovation Forum, among other programs. I have experience successfully building educational products.


Thirdly, in the kind of AI education we do, the core is the animation engine, and we are the core developers of VideoTutor's engine. We are the team that understands this core technology best and can make the engine render with great precision.


In addition, the team has very strong marketing DNA and knows how to promote the product effectively.


VideoTutor also fits a common investment thesis among mainstream American VCs known as the "young genius team": the field suits young people, the team has very strong hands-on engineering ability, and the founder has real insight and experience and executes very quickly. I believe this shared view is why all the investors were optimistic about us.


VideoTutor Rings the NYSE Bell at YZi Labs EASY Residency Demo Day


Founder Park: What core problem in the education industry does your product aim to solve?


Kai: Currently, learning products in the market can be classified into two categories: active learning products and passive learning products. Passive learning products, such as ByteDance's Gauth, Chegg, AnswersAi, etc., cover what we call the "Homework Help" scenario, where the learning process is very short and mainly involves students paying to get their homework answers.


VideoTutor, on the other hand, covers the active learning scenario. Here we don't need to worry about students' motivation, because they have to study and take exams such as the SAT and AP. This scenario has a great need for visualization: 80% of the content on the SAT involves topics like functions and calculus, which require complex graphics rendering. VideoTutor's animation engine is well equipped for this.



Moreover, the average order value in this field is very high. In the U.S., about 2.6 million students take the SAT every year, creating significant demand for paid services. Offline SAT courses are very expensive and are billed by the hour rather than as packages, averaging $150 per hour at the low end and going up to $230 in most cases. Many students and parents are willing to pay for them. VideoTutor, however, can divert demand from or even replace teacher-led tutoring, because today's AI-generated videos are almost indistinguishable from teacher-led content. This way, students can have their own personalized AI exam-prep teacher at minimal cost.


Founder Park: What was the catalyst for deciding to develop this product at that time?


Kai: Actually, even before us, a team at Stanford called Gatekeep AI was working on a similar visual learning concept, so we were already aware of the potential of this direction. In previous rounds of education startups, most products were essentially built on GPT's API, like ChatGPT wrappers. However, we realized that products based solely on text Q&A have a ceiling. Businesses like Chegg and Gauth are clearly declining, with much of their use case replaced by ChatGPT, since students can pay $20 and get their homework questions answered there.


The era of API-wrapper-based products with optimization layers has reached its peak.


However, multimodal visual generation has a very promising future, especially for visual learning scenarios such as the SAT. Unfortunately, Gatekeep pioneered the idea but did not continue, because it launched a little too early, when foundation models' programming abilities were still immature and GPT-4 had not yet been released. The math animation engine also involved rendering and algorithm problems they did not solve. Our team built all the core parts of the animation engine ourselves, solved these problems, and made video rendering very accurate.

PMF: Strong User Willingness to Pay


Founder Park: After your product went live, you partnered with several schools. In your opinion, when or which feature made you feel "we got this product right, hit the pain point correctly," and felt you found PMF?


Kai: You can look at it from three dimensions.


First, from a revenue standpoint: to date, VideoTutor has received API requests from 1,000 companies, including all the major educational institutions in the U.S. and even some in China, and many schools want to purchase the service. On the consumer side, one student's parent, who is also an investor, tried the product and shared it with all his relatives and friends, and every one of them was willing to pay. He then got my number somehow, texted me, and asked to invest. Consumers' willingness to pay is very strong.


Second, from a user demand perspective: why is one-on-one tutoring such an essential need in the U.S.? Because parents believe one-on-one teaching works and are willing to pay for it. Multimodal AI can now replicate the effect of one-on-one teaching, with instant, personalized responses. Moreover, the video lessons that U.S. teachers record for online one-on-one tutoring are really no different from AI-generated videos. This is what I call a "demand shift": the expensive prerecorded courses students buy are no different from what our AI generates. So why not use AI? It costs less and gets better teaching results.


We have received a lot of very positive feedback from students, and many teachers are willing to promote the product. Early completion rates and usage time were particularly good. Our current 200 seed users were selected from this early user base.


Third, it is a matter of product taste and sense. As you keep iterating, from the progress of the whole education industry, to the core needs of the students and parents who pay, to the evolution of the product itself, and then look back, the whole logic forms a closed loop. So across these three dimensions, PMF is already clear. Most critically, willingness to pay is very, very strong.


Collaborated with FIZZ


Founder Park: Many users are willing to pay proactively, and some have reached out to you proactively to invest.


Kai: Right. In the SAT and AP field, the willingness to pay is already strong. The average order value in this field starts at $100 to $200, and offline classes are even more expensive, possibly around $800. In the U.S., there are 2.6 million students taking the SAT, and 37% of these students are willing to pay proactively. This is a market with a very strong willingness to pay and demand. Our product can meet this demand very well.
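Taking Kai's two figures at face value, the implied pool of students willing to pay works out as follows:

```latex
2.6 \times 10^{6}\ \text{SAT takers} \times 0.37 \approx 9.6 \times 10^{5}\ \text{students}
```

That is roughly 960,000 students in the SAT market alone, before counting AP.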


Founder Park: In the SAT track, for test-takers, will they trust AI as much as a human teacher?


Kai: At the level of exams like the SAT and AP, AI now rarely makes factual errors when answering questions. So why is it better than an offline tutor? First, it is cheaper. Second, students can keep asking questions without worrying that a question is silly or that the teacher will lose patience. And they can learn 24/7, from anywhere.


Moreover, this market is scalable. After completing the U.S. market, we can expand to Canada, the UK's A-Level exams, and so on, where the demand for paid services is very high.


Founder Park: How are you currently considering the paid aspect?


Kai: We offer monthly subscriptions and also a pay-per-performance model. I think AI can now achieve pay-per-performance. We may introduce a package, for example, you pay $799, and we guarantee that your child can score a perfect SAT Math.


Founder Park: But with pay-per-performance, doesn't it still depend on the student's personal initiative?


Kai: This may not be feasible for China's national college entrance exam, which has more than a thousand assessment points. The SAT, however, has only 62 assessment points: 50 are routine ones that most students handle without trouble, and the remaining 12 can also be mastered. Unless a student has a genuine problem with logical reasoning, there is essentially no case where they cannot learn the material. And the efficiency gain from AI is very evident.


In fact, many American online tutors already offer this: you pay the tutor $1,800, the tutor coaches your child, and the success rate is about 100% because the content the SAT tests is fixed. As long as a student's ability is normal, there shouldn't be many issues. This approach doesn't work for the Gaokao (China's National College Entrance Exam), because Gaokao scores cannot be raised significantly in a short time. The Gaokao is also designed to separate students and may include very hard questions, whereas the American exams have no truly hard questions, since they mainly test whether you have mastered the knowledge points.


Pay-for-performance is a model human tutors have used before, and it depends on this precondition holding.


Founder Park: So, in your pricing, is the model cost a concern? Is it a high percentage?


Kai: The average order value in our field is very high, starting at $69 per month, and model costs are currently very low, so it's not a problem. Education is not like coding, where model costs bite because coding requires supporting a lot of context, and everyone is cutting prices.


Product Targeting High School Students, Web Platform Is Key


Founder Park: I remember you mentioned last time that your first version prototype took only a little over two months to develop. How did you consider the entire development cycle at that time, such as division of labor, deciding which features to include, and which not to include?


Kai: The consensus of our team is that iteration should be fast because speed is necessary to quickly receive feedback from early users.


After the first version was posted on Twitter, it caused a huge sensation and brought in a large number of users. However, many of these users were programmers, investors, or tech enthusiasts, whom we can collectively refer to as "tech early adopters." At that stage, the feedback we received from them was quite scattered and not very valuable. We still needed to sift through these diverse users and identify the truly core seed users, namely high-quality high school students, and then obtain useful feedback through consultations.


The key feedback we received was that the video rendering accuracy must reach 100%, which was the top priority for optimization. Features like UI aesthetics or support for different TTS (text-to-speech) voice selections were all cut. Reverting to the core of the product: what we are doing is knowledge learning in scientific scenarios, so the accuracy of graphic rendering is crucial.


Founder Park: How did you decide on the generation duration at that time?


Kai: At that time, the longest videos ran about 6 minutes. Our main consideration was that explaining an ordinary question or key concept should not take more than 6 minutes. However, later feedback showed that some students who learn more slowly wanted explanations that were slower and went deeper. We realized duration should not be capped; it depends more on each user's learning ability.


Founder Park: What is the longest duration now?


Kai: Up to about an hour now, and you can keep probing deeper. It's interactive and generated in real time, but this feature was added recently; it wasn't in the early versions.


Founder Park: Were there any features that you initially considered but later found not so important and decided not to implement?


Kai: An app, for example. At the time, we wondered whether to build an app quickly, but we later realized that most American students study mainly on laptops or iPads. Most U.S. K12 schools provide students with Chromebooks, which are widely used, and students complete their homework on computers too. In high school almost every student has a computer, and smartphones account for less than 5% of the learning environment, a very low share.


Founder Park: So, if it's a product primarily targeting education or student groups, the web version is more critical to develop first, and the app is not as important.


Kai: Yes. We actually already knew this, since I have studied in the U.S. for many years. Later we surveyed 100 students drawn from our early tens of thousands of users, and more than 90 of them had computers, which convinced us even further.


Founder Park: When you launched the first version, did you also target the K12 group?


Kai: Yes, and we have kept targeting this group since. We don't see ourselves as competitors to Gauth; we focus on exam-prep scenarios. Many American high school students already choose offline tutoring or online learning platforms, and VideoTutor effectively captures this existing demand.


Founder Park: Will K12 be your core user group at least within a year?


Kai: It should remain our core group for the next two years.


Using Large Models, But Not Relying Solely on Large Models


Founder Park: Could you briefly introduce your current technical implementation? In terms of generating courses and graphics, VideoTutor has indeed outperformed other video generation models by a large margin. Even when many models struggle to accurately generate text, your technology is very impressive.


James: The videos we generate contain both text and graphics. The general production process is as follows: we use a large language model to generate text and corresponding animation instructions, which are then rendered through our animation engine and ultimately displayed in the video.


The text part is relatively straightforward; we have the large language model generate the text, which is then rendered directly. However, for the animation part, we have our own mathematical animation rendering engine. Its advantage lies in the high precision of rendering content such as coordinate axes and geometric shapes, which is our core technology.


Currently, the output of the large language model is only text. The agent we have developed is like giving the large language model a piece of paper and a pen, allowing it to draw out the suitable educational animations it imagines. The part that is drawn out is entirely our technology.
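James describes the model emitting text instructions that the engine then draws. The sketch below illustrates that split under assumptions of our own: the JSON instruction format, the `Viewport` class, and the `FUNCTIONS` whitelist are hypothetical and not VideoTutor's actual engine. The idea it shows is that the model only names *what* to draw, while deterministic code computes every coordinate, which is one plausible way to keep axes and curves exact.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Viewport:
    """Hypothetical mapping from math coordinates to pixel coordinates."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    width_px: int
    height_px: int

    def to_pixels(self, x: float, y: float) -> tuple:
        # Linear map; y is flipped because screen coordinates grow downward.
        px = (x - self.x_min) / (self.x_max - self.x_min) * self.width_px
        py = (self.y_max - y) / (self.y_max - self.y_min) * self.height_px
        return round(px), round(py)


# Hypothetical whitelist: the model may only name functions listed here.
FUNCTIONS = {
    "square": lambda x: x * x,
    "double_plus_one": lambda x: 2 * x + 1,
}


def render(instructions_json: str, view: Viewport, samples: int = 5) -> list:
    """Turn model-emitted JSON instructions into exact drawing commands."""
    commands = []
    for ins in json.loads(instructions_json):
        op = ins["op"]
        if op == "point":
            commands.append({"draw": "dot",
                             "at": view.to_pixels(ins["x"], ins["y"]),
                             "label": ins.get("label", "")})
        elif op == "plot":
            f = FUNCTIONS[ins["fn"]]  # unknown names raise KeyError instead of being guessed
            lo, hi = ins["from"], ins["to"]
            xs = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
            commands.append({"draw": "polyline",
                             "points": [view.to_pixels(x, f(x)) for x in xs]})
        else:
            raise ValueError(f"unsupported op: {op}")
    return commands
```

Rejecting unknown operations and function names, rather than improvising, keeps a model mistake from silently turning into a wrong drawing.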


Founder Park: How is the entire video synthesis process, including audio and video, handled?


James: Initially, the user provides a prompt, such as "What is the Pythagorean Theorem?" The first step is to have the large language model reason through all scenarios, typically defining 3 to 5 scenarios depending on the question's difficulty. Then, the model generates a rough script for each scenario. Subsequently, based on the script for each scenario, a second round of reasoning is done to generate the text, corresponding graphics, and voice text for each scenario. The voice text is then synthesized using TTS.


Finally, we concatenate all scenarios to create a complete video.
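The two-pass flow James describes (plan 3 to 5 scenes, then generate text, graphics, and narration per scene, then TTS, then concatenate) can be sketched as below. This is a hedged illustration: `llm` and `tts` are hypothetical stand-ins for whichever model and speech service are used, and the prompts are placeholders, not VideoTutor's.

```python
from dataclasses import dataclass
from typing import Callable

LLM = Callable[[str], str]      # hypothetical: prompt in, text out
TTS = Callable[[str], bytes]    # hypothetical: narration in, audio out


@dataclass
class Scene:
    script: str
    board_text: str = ""
    graphics: str = ""
    narration: str = ""
    audio: bytes = b""


def plan_scenes(llm: LLM, question: str, max_scenes: int = 5) -> list:
    """First reasoning pass: one rough script per scene, one per line."""
    raw = llm(f"Plan 3-5 teaching scenes, one per line, for: {question}")
    scripts = [line.strip() for line in raw.splitlines() if line.strip()]
    return scripts[:max_scenes]


def build_scene(llm: LLM, tts: TTS, script: str) -> Scene:
    """Second reasoning pass: board text, animation instructions, and narration."""
    scene = Scene(script=script)
    scene.board_text = llm(f"Board text for: {script}")
    scene.graphics = llm(f"Animation instructions for: {script}")
    scene.narration = llm(f"Narration for: {script}")
    scene.audio = tts(scene.narration)
    return scene


def make_video(llm: LLM, tts: TTS, question: str) -> list:
    """Build every scene; returning them in order stands in for concatenation."""
    return [build_scene(llm, tts, s) for s in plan_scenes(llm, question)]
```

A real system would mux the rendered scenes and audio into one video file; the ordered list here only marks where that step happens.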


Founder Park: My understanding is that this was the approach for the first version. Now, with the addition of the on-demand interactive process, has the generation process changed?


James: Indeed, there has been a change. Now, in order for users to quickly see the content, we first generate the initial scene for them to view while the subsequent scenes are rendered in the background. When a user asks a question, we convert their speech to text and provide this text, along with the content from previous scenes, to a large language model for reasoning and planning the next teaching scene. The rendering process for the following scenes then proceeds as before.
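The interactive version renders the first scene up front and the rest in the background, then plans a new scene from a transcribed question plus earlier context. A minimal sketch of that scheduling, assuming a thread pool and hypothetical `render` and `plan_next` callables:

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable


class ProgressiveLesson:
    """Hypothetical sketch: render scene 1 up front, the rest in the background."""

    def __init__(self, render: Callable[[str], str], scripts: list[str], workers: int = 2):
        self._render = render
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self.history = list(scripts)        # context for follow-up questions
        self.first = render(scripts[0])     # the only render the student waits for
        self.pending: list[Future] = [self._pool.submit(render, s) for s in scripts[1:]]

    def remaining_scenes(self) -> list[str]:
        # Collect background renders in playback order (blocks only if not finished yet).
        return [f.result() for f in self.pending]

    def ask(self, transcript: str, plan_next: Callable[[list[str], str], str]) -> str:
        # The spoken question, already transcribed, plus prior scenes -> next scene.
        script = plan_next(self.history, transcript)
        self.history.append(script)
        return self._render(script)

    def close(self) -> None:
        self._pool.shutdown(wait=True)
```

The design choice is simply that perceived latency is bounded by one scene's render time rather than the whole video's.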


Founder Park: If a user has a question after listening for a minute, they would ask it directly. Upon receiving the question, do you return the user's question along with previously covered content to the model for processing? During this process, after the user asks the question, does the animation continue or does it pause?


James: Our current latency has been reduced from the initial 20 to 30 seconds to within 5 seconds. In terms of interaction, we implement transitions to ensure that users do not overly focus on these 5 seconds, making the overall process seamless. Within 4 to 5 seconds, the user can see newly presented content based on their question.


The current design involves the AI teacher saying, "Hmm, let me think about it" and then erasing the blackboard, just like a real teacher simulation. If you think there's a problem with what was explained, I'll erase it and write it again for you. This process feels more natural.


Furthermore, we don't just wait passively for user questions; we also run quizzes midway and reason based on both the quiz feedback and the user's questions. In addition, the microphone is not always on: users turn it on and off manually.
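The "let me think about it" transition works as latency masking: generation starts immediately, and the teacher's pause and board-wipe play while it runs. A minimal asyncio sketch, where `generate` and `play` are hypothetical coroutines standing in for the model call and the player:

```python
import asyncio
from typing import Awaitable, Callable


async def answer_with_transition(
    generate: Callable[[str], Awaitable[str]],
    question: str,
    play: Callable[[str], Awaitable[None]],
) -> str:
    """Hypothetical sketch: hide generation latency behind a 'thinking' transition."""
    task = asyncio.create_task(generate(question))   # start generating right away
    await play("Hmm, let me think about it.")        # teacher pauses to think...
    await play("<erase blackboard>")                 # ...and wipes the board
    scene = await task                               # waits only for whatever time is left
    await play(scene)
    return scene
```

If the transition lasts about as long as the roughly 5-second latency James mentions, the student sees almost no dead time.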


Founder Park: So, based on this mechanism, you can generate an explanation lasting up to approximately an hour.


James: To be precise, there is no limitation. If a user constantly has questions, they can keep asking.


Kai: Yes, there is no predefined limit. In fact, VideoTutor is moving in this direction as multimodal AI advances. We are not creating demand, but meeting existing needs better. Look at offline human education: why are American parents willing to pay high fees? Because the U.S. education and training industry is built largely around one-on-one teaching, starting at $100 per hour, and offline teachers can guide with questions, observe where you don't understand, and follow up. VideoTutor aims to achieve the same effect as a real teacher, so that every child gets real-time interaction and teaching.


Founder Park: During VideoTutor lessons, are students required to turn on their cameras?


Kai: Not really. Whether students turn on their cameras mainly depends on US privacy laws. The product is not designed with a mandatory camera-on feature. The decision to turn on the camera is up to the students. The main interaction is still through questioning and verbal feedback.


Founder Park: Technically, do you follow a strategy of using small models in combination with cloud-based large models, or how does it work?


Kai: It's a combination. We have an internal dataset of more than 100,000 videos. The best of this data is manually double-annotated and then used to fine-tune models; for example, we have more than 8,000 SAT training samples. These fine-tuned small models work together with cloud-based general-purpose models like Claude and Gemini.
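Kai doesn't describe the selection criteria for "the best data", so the filter below is an assumption: keep a video for fine-tuning only if both annotators rate it highly and agree. The 1 to 5 scale, `min_rating`, and `max_gap` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Annotated:
    video_id: str
    rating_a: int  # first annotator, hypothetical 1-5 scale
    rating_b: int  # second annotator


def select_for_finetuning(items: list, min_rating: int = 4, max_gap: int = 0) -> list:
    """Keep videos both annotators rate highly and (nearly) agree on."""
    return [
        it.video_id
        for it in items
        if min(it.rating_a, it.rating_b) >= min_rating
        and abs(it.rating_a - it.rating_b) <= max_gap
    ]
```

Requiring agreement trades dataset size for label reliability, which matters more when the data is used to fine-tune.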


Founder Park: Will using Claude, Gemini, or GPT impact the core performance of the product?


Kai: We mainly focus on the K12 field, and the level of the base model is already sufficient. However, to ensure 100% accuracy, we use two models simultaneously for verification. If the two models provide the same answer, then there are essentially no errors. Regarding code generation, Claude is primarily used because of its strong coding capabilities.
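The dual-model check can be sketched as an agreement test on final answers. How VideoTutor compares answers isn't stated; the normalization here (lowercasing, stripping an `answer:` or `x=` prefix, and treating `0.5`, `1/2`, and `2/4` as equal via `fractions.Fraction`) is a simplified assumption, and `model_a`/`model_b` are hypothetical callables.

```python
import re
from fractions import Fraction


def normalize(answer: str) -> str:
    """Canonicalize a short final answer so equivalent forms compare equal (simplified)."""
    s = answer.strip().lower().replace(" ", "")
    s = re.sub(r"^(answer[:=]|x=)", "", s)
    try:
        return str(Fraction(s))  # "0.5", "1/2", "2/4" -> "1/2"
    except (ValueError, ZeroDivisionError):
        return s


def cross_check(question: str, model_a, model_b):
    """Accept an answer only when both models agree; otherwise flag it."""
    a, b = model_a(question), model_b(question)
    if normalize(a) == normalize(b):
        return a, True
    return None, False  # disagreement: regenerate or escalate instead of showing it
```

Agreement doesn't prove correctness, since two models can share a mistake, but it cheaply catches many single-model errors.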


Founder Park: Where is the current technological bottleneck of the product? Is it in the model's capabilities or in code generation?


Kai: The model's capabilities are one aspect. Another aspect is rendering, which we have managed to reduce to under 5 seconds. With more GPU deployment, it will become even faster. Long-term memory capacity is another challenge. We need to accumulate long-term learning behavior data from students, understand which concepts a student does not grasp, and remind them if they have forgotten a topic learned a month ago.


James: In terms of rendering time, we have made a lot of efforts and continuous technological breakthroughs, from the initial 2 minutes to 1 minute, and now to under 10 seconds. Our ultimate goal is to achieve almost zero rendering delay, where as soon as a user asks a question, the reasoning finishes, and results are immediately displayed. This is a tough challenge our team is currently tackling, but we have found a new direction.
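Kai's long-term memory goal (know which concepts a student hasn't grasped, and remind them of topics learned a month ago) can be sketched with a per-concept record. The record fields and the fixed 30-day interval are assumptions taken from his example, not a spaced-repetition schedule VideoTutor has described.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ConceptRecord:
    last_seen: date           # when the student last studied this concept
    last_quiz_correct: bool   # result of the most recent quiz on it


def review_plan(records: dict[str, ConceptRecord], today: date,
                interval_days: int = 30) -> dict[str, list[str]]:
    """Split concepts into ones to re-teach now and ones due for a reminder."""
    weak = sorted(c for c, r in records.items() if not r.last_quiz_correct)
    stale = sorted(
        c for c, r in records.items()
        if r.last_quiz_correct and today - r.last_seen >= timedelta(days=interval_days)
    )
    return {"reteach": weak, "remind": stale}
```

A production system would likely grow the interval after each successful review; the fixed interval keeps the sketch short.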


Focus on Exam Results, Not Completion Rates


Founder Park: How do you currently measure the core metrics of your product? How do you determine if a video is helpful to users?


Kai: The most crucial metric is the exam. In the new version, after you watch a video, there will be a quiz at the end. If you answer correctly, it proves you understand; if not, it shows that the concept was not explained clearly.


Evaluating learning effectiveness cannot solely rely on completion rates since some students may grasp the content halfway through. When a student reaches a certain point in the video and demonstrates understanding through a test, they can skip the rest. The core metric of our product is to see how many students have improved their scores through this approach.
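The quiz-gated flow and the score-improvement metric can be sketched as two small functions. Where checkpoints sit and how scores are collected are assumptions; `checkpoints` maps a hypothetical scene index to a quiz callable that returns whether the student passed.

```python
from typing import Callable


def play_until_understood(scenes: list[str],
                          checkpoints: dict[int, Callable[[], bool]]) -> list[str]:
    """Play scenes in order; stop early once a checkpoint quiz is passed."""
    watched = []
    for i, scene in enumerate(scenes):
        watched.append(scene)
        quiz = checkpoints.get(i)
        if quiz is not None and quiz():
            break  # student showed understanding; skip the remaining scenes
    return watched


def improvement_rate(before: dict[str, int], after: dict[str, int]) -> float:
    """Share of students with both scores whose score went up."""
    both = [s for s in before if s in after]
    if not both:
        return 0.0
    return sum(after[s] > before[s] for s in both) / len(both)
```

Under this scheme a student who skips half the video counts as a success, not a drop-off, which is exactly why completion rate is the wrong metric here.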


Founder Park: However, the final exam is taken in a different setting. How do you obtain the results to determine if they passed?


Kai: This relates to the product culture in the United States, where users who achieve positive results through a product tend to spontaneously share their experiences. Many students who use VideoTutor to prepare for the SAT exam voluntarily come forward to share their experiences and scores. We also engage them as campus ambassadors for further dissemination.


We have a team of 20 high school students serving as campus ambassadors. If you look at Mercor's early success, they primarily used the "user success story" model. Mercor initially helped many Indian programmers secure jobs in the United States. Subsequently, they would reach out to these users, create a user story, and share how they used Mercor to find employment. This created excellent word-of-mouth promotion. The same applies to VideoTutor; we aim for more students to achieve significant results using the product and then share their experiences through user stories.


Founder Park: Where do students primarily share their experiences?


Kai: Students mainly share on TikTok, while parents engage in Facebook groups.


Founder Park: Looking at a six-month or one-year timeframe, what is your planned approach for product growth?


Kai: Fundamentally, VideoTutor is a B2C product where word-of-mouth promotion is crucial. Many successful AI applications initially relied on word-of-mouth from early adopters; for instance, when a designer found a product helpful, they spread the word. For us, the core metric is how many SAT test-takers improved their scores using our product and then shared this success with other students and parents. Parents mainly use Facebook and Instagram, while students prefer TikTok, so we leverage these platforms for dissemination. Once a consensus regarding the quality of our product is established through word-of-mouth, teachers in schools naturally take notice. The reason many schools became aware of us early on is because numerous teachers used the product, found it beneficial, and recommended it to the school's procurement officers. Therefore, the primary focus remains on B2C word-of-mouth promotion and the key metric is how many students improved their scores after using the product.


Founder Park: What is the current status of VideoTutor's new version, and when do you expect to release it?


Kai: We hope to officially release it within two months at the soonest. By then, students will be able to receive answers with very low latency and the graphical rendering in STEM scenarios will be 100% accurate. Of course, we will not cover competition scenarios or complex university subjects like linear algebra for now. Our focus will be more on the K12 field.


Founder Park: What are the current barriers or moats for VideoTutor?


Kai: I think there are a few points. Firstly, the data flywheel. Behind every video is code, and good user-generated video data, after secondary annotation, can be used to retrain and fine-tune models. The more data, the better the video quality. Additionally, there is learning behavior data. Knowing which topics are weak for different students allows us to establish a data flywheel; the more people use it, the better the product understands the students. Secondly, we have a leading technological advantage, such as the animation engine algorithm. Although the algorithm itself is not the core advantage, with our rapid iterations and increasing data, the advantage will become more apparent.


The third point is the brand. VideoTutor has already become a leading brand in the AI education field among North American parents' circles, and the trust of parents is also an intangible moat.


Founder Park: In three to five years, what do you expect VideoTutor to ultimately evolve into as a product?


Kai: In the future, we hope VideoTutor can become an AI teacher for everyone to learn STEM knowledge. We only focus on STEM. I believe it will surpass Duolingo. Duolingo is a world-class language learning product, but in the STEM field, there has not yet been a world-class product because STEM requires extensive graphical rendering. Now that the foundational model technology is ready, I believe the STEM field will give birth to the next "Duolingo."


Hiring: Seeking Talent from Major Chinese Tech Companies


Founder Park: You have had several entrepreneurial experiences before. What were they mainly about?


Kai: I am currently a junior. In my freshman year, I started a company with James to build an educational product and raised $200,000 in angel investment. Although that venture failed, I learned a valuable lesson: you cannot get stuck in undifferentiated competition. We had built an app, but the market was full of similar products, so we were dragged into a price war early and found it hard to charge for the service.


In my second venture, I joined another team, MathGPTPro, as a co-founder and stayed for a few months. There I learned how to analyze product metrics, build products, and drive user growth. It was also then that I concluded that text-based, answer-oriented educational products had hit their limit: they were not much different from ChatGPT, and the structured question banks that platforms such as Homework Help had invested heavily in were being replaced by the capabilities of large models. So for my third venture, I knew visualization was an inevitable trend.


Photo of Zhao Kai pitching at Harvard University with Sam Altman


Founder Park: In addition to realizing the limitations of text-based products through your past experiences, how did those experiences in terms of team or other aspects help you with what you are doing now with VideoTutor?


Kai: It was very helpful.


First, it helped me better assess the direction and future potential of the product. I would assess the overall product evolution by looking at competitor website traffic and revenue.


Second, in terms of product development, it helped me better gauge the pace of product development, including product design, frontend-backend integration, and which metrics to look at.


Third, in terms of team management and organizational culture, it enhanced my ability. I established a more complete management system, including defining each team member's responsibilities, rewards, and equity distribution. Additionally, I learned how to raise funds. We completed this $10 million funding round in less than 20 days.


Founder Park: How many people are currently on your team?


Kai: 6 people, and everyone lives together.


Founder Park: How was the team initially formed?


Kai: James and I have now started two businesses together. We went to the same school and built an app together in our freshman year. In our sophomore year, I started another business with two other people, and we all got to know one another. When we saw the significant product vision this technology could enable, we reached out to each other and formed a team to build this product. Everyone on the team is an alum of the same school, and our other partner, Nick, was also my college roommate.


Founder Park: You are now also planning to expand your team. What kind of people are you looking to hire?


Kai: We are mainly hiring for backend, frontend, large language models, and UI/UX, preferably people with experience. We have moved past the trial-and-error phase into rapid product building, so we need experienced people to help us grow.


Founder Park: You need experienced engineers, product managers, and growth leads to take the product from 1 to 10, or even from 10 to 100.


Kai: Exactly, that's the stage we are at. We anticipate expanding the team to 9 to 10 people, with a focus on hiring engineers.


This round of hiring will likely take place in China, so interviews will be a mix of in-person and remote.


Founder Park: What kind of person are you hoping to find?


Kai: We prefer people with experience at big tech companies such as ByteDance or Meituan. ByteDance has a fast-paced, dynamic organizational culture that values young talent. People trained there usually have solid methodologies and capabilities, and after joining us they can bring that successful experience with them so we can learn from each other.


We are looking for people who have been tested at top Chinese tech companies, have experience with rapid iteration, and are past the student-entrepreneur stage. We are not hiring beginners, but we also don't want traditional industry veterans, who may have family responsibilities that keep them from fully immersing themselves in the work. So we are looking for mid-level people: young, energetic, and willing to commit.


We are willing to offer generous stock options to outstanding people. We have raised $11 million, so why aren't we hiring engineers in the U.S.? Because we believe China's product sense and engineering capabilities are truly excellent. In this wave, a team of Chinese founders will certainly build great products that go global. Many AI applications today are built by Chinese engineers, which shows how formidable China's engineering capabilities are. This is our advantage, and we aim to combine the strengths of the U.S. and China.


Silicon Valley College Students Are All Building AI Startups


Founder Park: College-student entrepreneurship is especially prominent in Silicon Valley right now. What are you seeing?


Kai: Look at the facts about this round of $10 billion companies. Mercor, an AI recruiting company, has raised over $300 million in new funding at a $10 billion valuation, and Cursor has already reached a solid $10 billion valuation. There are others too, such as GPTZero and Pika. These all started as student projects; notably, the founders of Cursor and Mercor dropped out of college in their junior year.


The first thing this wave of young founders has in common is highly differentiated competition. They focus on a very narrow field instead of building something generic. Mercor, for example, focuses on AI recruiting and initially recruited only Indian programmers.


The second point is the environment. Silicon Valley's capital and its grassroots innovation ecosystem, including Stanford, YC, and Peter Thiel's fund, all back college founders at the very earliest stage. Whether or not your idea is mature, they are willing to support you and give you a powerful network of connections.


The third point, I think, is the caliber of these students. Whether it's us or the founders coming out of Silicon Valley, we all share a bold spirit of adventure and a strong ability to learn. Many students in China may not have this daring, exploratory spirit. In Silicon Valley, you are inspired by the many people your own age around you who have succeeded, and the capital environment is also willing to bet on young people.


For me personally, I also weighed the costs and benefits at the time. If I finished university and then found a job, I might never recoup the cost of studying abroad, so the return on that investment might be poor. If I started a business, I could learn at a furious pace while still very young, and my life would have unlimited possibilities. I have wanted to build a great company since I was young.


Founder Park: Why can today's college students start companies worth $10 billion or more, when in the past selling a company for one or two million dollars was considered remarkable? Is AI hype, or a bubble, part of this?


Kai: I don't think it's entirely a bubble. Cursor has $450 million in real revenue, which is very solid. Behind this, the methodology and insight of this young generation of teams are crucial. Look at these teams: their backgrounds are outstanding, and they learn very well.


Early on, Cursor relied on college-student programmers, who are very receptive to AI and gave strong feedback. The founder himself is a young engineering prodigy who understands users deeply and iterates fast. In the beginning, the four of them got the product up and running. Once they had iterated it well, they built a strong reputation with users and started earning revenue, and investors, afraid of missing the next Mark Zuckerberg, came in to back them.


At the most fundamental level, the key is that many of the technologies in this AI wave are new, and young people learn quickly and are practical, reliable, and daring. That lets them understand users extremely well and iterate extremely fast, which is how they surpass traditional products. Before Cursor, for example, GitHub Copilot did quite well, so why didn't it win? User experience and execution speed.


Founder Park: Can we say that because AI is a new technology, many products also need to be seen from a new perspective?


Kai: Yes. This younger generation has deeper insight than the previous generation of entrepreneurs and can stay closer to users. Today's mainstream AI users were born after 2000, and this generation learns faster, iterates on feedback faster, and has a higher tolerance than the previous generation of entrepreneurs.


So the speed at which you update your understanding is key. In the mobile internet era, technology iterated on a yearly or quarterly cycle; in the AI era, it can iterate daily. As a founder you have to learn fast, and young people can stay up late and are more driven.


Founder Park: Some media report that many Silicon Valley founders have also started working 996 (9 a.m. to 9 p.m., six days a week). What's your view?


Kai: Some of my white entrepreneur friends who have raised a lot of money also work 996. Like us, they rent a big house where everyone lives and works together. I think 996 is mostly forced by the environment. Silicon Valley today is a bit like a gold rush, and no one wants to fall behind, so the only option is to iterate on products quickly, which means working late into the night. The environment forces people into it.


Founder Park: Are there any trends in which sectors these Silicon Valley college founders choose?


Kai: Whether it's us in education or anyone else, I think there is a common trend: people start businesses within their comfort zone, meaning a field and a user base they understand well. Cursor's founder understands coding deeply, and we work on education because we understand this group of users well. Young people today are more likely to build within their existing comfort zone than to rush into an unfamiliar field, because that way they get user feedback quickly and accurately.


There is also the compounding of understanding. We have now worked on education three times, and my understanding keeps building on itself. These college students are unlikely to rashly take on something they have never done before; instead, they focus on doing it better. They think like a new generation, keep iterating within their circle of competence, and are bold in creating opportunities.


Another point is a daring, exploratory spirit: they are not easily swayed by others' negativity and have a very self-assured, "I don't care what you think of me" attitude. Behind this is a culture of rapid experimentation: I know my product isn't ready yet, but I don't care. Launch fast, iterate fast, get feedback fast.


Founder Park: When did this trend start?


Kai: I think it's a consensus built on success stories. When people see projects like GPTZero grow out of dorm rooms, keep iterating, and then win capital support and user recognition, and when there are many such cases of rapid trial and error and rapid growth, a consensus forms.


To sum it up in one phrase: "Done is better than perfect." People also aren't too worried about competition. Many founders in Silicon Valley are willing to share their product ideas and aren't afraid of being copied, as long as they iterate quickly. I think this wave of young people is also good at storytelling. That storytelling isn't empty talk; it's grounded in practicality and truth-seeking, combined with their vision of the future.


Founder Park: Market yourself first.


Kai: Yes. I think the underlying ingredients are a spirit of adventure and extreme confidence. Driven by these, they keep daring to make mistakes and aren't afraid of saying the wrong thing. They boldly articulate their product ideas and boldly execute, and if they get something wrong, they can always correct it. This culture of not fearing mistakes is what has driven the current wave of college-student entrepreneurship and its success.


VCs in the U.S. also look at college students' projects, and Y Combinator invests in a few of them in every batch.


Fundraising is the Last Thing VideoTutor Needs to Worry About Now


Founder Park: If you could go back to when you first started VideoTutor, what advice would you give yourself? What areas could have been improved?


Kai: I would tell myself to move faster. The other thing is team composition. The VideoTutor team has gone through many rounds of adjustment. Had I known earlier, I would have assembled the team around the skills the product required. I believe organizational ability is ultimately crucial to entrepreneurship, so I would spend more time on it: selecting people, recognizing talent, and using people effectively.


The current team is suitable for growing from 0 to 1, but to scale VideoTutor further, it is necessary to bring in more experienced individuals who can contribute their excellent expertise and skills to the team, helping the entire team grow together.


Founder Park: In the next six months, what kind of product or technical challenges do you think VideoTutor might encounter?


Kai: One challenge is rendering: achieving true zero latency will require an engineering breakthrough. The second is growth, which I believe comes down to the product's taste. That covers many things, such as whether the UI and interactions are smooth and polished, whether the features are bug-free, and whether the visual layout is appealing. All of these are tests for us.


James: We initially positioned VideoTutor as a visual teaching aid for all subjects, but we later went very vertical and focused only on mathematics, because that is where we excel. Our math rendering engine is top-notch. Our next key breakthrough may be horizontal expansion: how do we bring the advantage of visualization to the humanities? For example, how would we explain a line of classical Chinese poetry such as "Hoeing under the noonday sun, sweat drips onto the soil beneath the grain"? This is a technical question we are considering going forward.


Founder Park: Do you think the founders' background might pose challenges for future expansion?


Kai: Not really. In fact, many major VCs have approached us, such as a16z. They tend not to invest too early; they come in once a team has shown signs of success, so they can be more confident the investment won't fail. We have maintained very good relationships with many top VCs.


Funding is the least of VideoTutor's concerns; the most critical areas of focus are the user ecosystem and the product.

