
The 8 Billion Person AI Problem

7 min read · Oct 4, 2025
Photo by Timon Studler on Unsplash

TLDR: AI permeates daily life through tools like GPS and chatbots, but its types — generative, predictive, and argument mining — are not interchangeable for every problem. Larger models with more parameters can underperform if fed irrelevant or low-quality data, degrading outputs. Even with complete global data, predictive AI can’t guarantee certainty, so success hinges on asking the right questions.

The media and the world agree: AI is here. AI has actually been around for a long time, with AI applications dating back to the 1960s and earlier. However, it was not until recently that consumer-focused AI products became visible to users. These products shift AI technology from background processes we take for granted to everyday tools with explicit AI labels.

Now that AI is here, it is everywhere. It’s in your GPS, helping guide you to work during rush hour. It allows recruiters to prescreen thousands of applicants to find an optimal dozen interviewees. It summarizes a lengthy email into a few sentences to help you clear out your inbox quicker.

AI is sold as a solution for any problem. The list of applications above appears to prove this assumption. However, there are many problems that AI cannot and will not ever be able to solve. Let’s talk about the eight billion-person AI problem and issues to avoid if you want to implement AI in your work and life successfully.

What are popular applications of AI today?

AI is not a new entrant into your life. As mentioned, AI is part of technologies such as GPS, which guides you to places, the demand manager that regulates electricity to your house, and the voice assistant on your phone.

The applications are endless, but the underlying technology is not interchangeable. AI is a broad term that encompasses various technologies. To better understand the applicability of AI to specific problems, it is essential to be aware of the various types of AI.

Some standard AI technologies include generative AI, predictive AI, and AI argument mining. This article won’t create an exhaustive list, but it is essential to acknowledge that differences in AI exist.

Generative AI drives the current AI interest boom. Consumer tools like ChatGPT and Grok use large language models (LLMs) to turn user inputs into text outputs. They generate content based on their training data.
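
To make this concrete, here is a minimal sketch of a generative AI call, assuming the `openai` Python package (v1+) and an API key in your environment; the model name is a placeholder, not a recommendation.

```python
# Minimal generative AI sketch: text in, generated text out.
# Assumes the `openai` package (v1+) and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY automatically

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name; use any model you have access to
    messages=[
        {"role": "user", "content": "Summarize this email in two sentences: <email text>"},
    ],
)

# The output is generated from patterns in the model's training data.
print(response.choices[0].message.content)
```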

Figure 1. Examples of generative AI for different sensory outputs.

Predictive AI is similar to generative AI but focuses on returning answers based on datasets rather than generating content from queries. Predictive AI is well-suited for tasks such as optimizing revenue from insurance risk pools or planning electricity distribution for grid load management.
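
As a rough illustration of the difference, here is a toy predictive model for grid load, sketched with scikit-learn; every number is invented, and real forecasting systems use far richer features and models.

```python
# Toy predictive AI sketch: estimate grid load from historical observations.
# All numbers are hypothetical; real load forecasting is far more sophisticated.
import numpy as np
from sklearn.linear_model import LinearRegression

# Historical samples: (hour of day, temperature in °C) -> observed load in MW
X = np.array([[6, 15], [9, 18], [12, 24], [15, 27], [18, 22], [21, 17]])
y = np.array([320, 410, 480, 530, 560, 450])

model = LinearRegression().fit(X, y)

# The model returns an answer derived from the dataset, not generated prose.
print(model.predict(np.array([[17, 26]])))  # predicted load at 5 PM, 26 °C
```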

The last AI tech we will discuss is AI argument mining. This technology accepts user input and attempts to reason about the text to provide the most relevant response. It uses deeper context than generative AI to identify factual and logical issues with an input, returning more than a simple generative response.

An example of AI argument mining technology is a call center assistant app. This app listens to a call center operator's conversation in real time and helps the operator provide the best service and upsell customers based on the current conversation.
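
A real argument-mining system is a trained model, but a toy sketch conveys the shape of the idea; the cue lists and suggested responses below are hypothetical stand-ins.

```python
# Toy sketch of the call-center assistant idea: scan live utterances for
# argument cues and suggest a response. Real systems use trained models,
# not keyword lists; these cues are hypothetical.
OBJECTION_CUES = ("too expensive", "cancel", "not worth it")
BUYING_CUES = ("interested", "upgrade", "how much")

def mine_utterance(utterance: str) -> str:
    text = utterance.lower()
    if any(cue in text for cue in OBJECTION_CUES):
        return "Objection detected: suggest a retention offer"
    if any(cue in text for cue in BUYING_CUES):
        return "Buying signal detected: suggest a relevant upsell"
    return "Neutral: keep listening"

print(mine_utterance("Honestly, this plan feels too expensive for what I use."))
```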

Figure 2. AI Argument mining process diagram.

While all of these technologies leverage AI, including LLMs, their actual processes vary greatly. Not all AI is the same, and one type cannot simply be swapped in for another when solving problems. Artificial general intelligence (AGI) is not yet a reality, in part because today's AI applications are so use-case specific. To solve a problem, you must choose the right AI to apply.

Is smaller always worse for AI models?

AI companies highlight benchmark metrics when proving that their LLM is the best. Parameter count is a widely recognized and easily understood metric. It refers to the number of learned weights inside the model, which loosely tracks how much data and computation went into training it.
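
To see what "parameters" actually are, here is a sketch that counts the learned weights in a tiny PyTorch network; the architecture is arbitrary, chosen only to make the arithmetic visible.

```python
# Parameters are the learned weights inside the model, not training examples.
# Counting them for an arbitrary tiny network:
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))

n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # (10*32 + 32) + (32*1 + 1) = 385 weights and biases
```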

Figure 3. Different sizes of models based on their parameter counts.

As with many other consumer products, the assumption is that more is better. That is not always the case, especially with parameters. The more often a pattern appears in your LLM’s training data, the more strongly it influences the model’s outputs.
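
A toy bigram model shows the mechanism: the more often a continuation appears in the training text, the more likely it is to be sampled. The corpus here is obviously contrived.

```python
# Toy bigram model: continuations are sampled in proportion to how often
# they appear in the training text.
import random
from collections import Counter, defaultdict

corpus = "the cat sat . the cat ran . the dog sat .".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def next_word(prev: str) -> str:
    counts = bigrams[prev]
    return random.choices(list(counts), weights=list(counts.values()))[0]

print(next_word("the"))  # "cat" follows "the" twice as often as "dog" does
```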

While you need a certain number of parameters for a functional LLM that models human speech, adding data for the sake of more data can degrade model performance. There’s a saying in manufacturing engineering that you can’t add quality to a product at the last step. A quality product comes from quality actions at every step.

More data does not ensure good data. Obtaining sufficient, accurate data to train models is a challenge when creating a domain-specific expert. Let’s consider two ways that more data could make your model worse.

First, data that is irrelevant to your model does not make it more capable than a smaller model. Imagine padding the model with 20% non-applicable data just for the sake of more parameters. How would that look?

You have a model used for stock trading. Your product manager suggests that you need a larger parameter count to sell a new model to customers. You increase the size of the model by 20% by adding philosophy texts and articles. These additions increase the size of the model, but they will go mostly unused. The new parameters offer no new value.

Figure 4. Expanded model size with unrelated data.

Another way data can harm performance is the inclusion of poor-quality data. Not all data is equal. A well-researched book on a topic is more impactful than 10,000 internet comments on the same subject. By adding both the comments and the book to your model, you have increased the data on the topic. However, you have also degraded the quality of the model’s outputs.

The book may have one passage on a question, but there may be 50 comments on that same question. With equal weighting, the LLM will likely base its responses on the comments rather than on the authoritative book.
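
The arithmetic is stark. Treating every passage equally, the odds that a response draws on a comment rather than the book look like this:

```python
# Back-of-the-envelope: with equal weighting per passage, 50 comment passages
# swamp the single authoritative book passage on the same question.
book_passages = 1
comment_passages = 50

p_comment = comment_passages / (book_passages + comment_passages)
print(f"Chance a response draws on a comment: {p_comment:.0%}")  # 98%
```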

What is the problem with predictive AI?

Adding more data to your LLM does not always result in a better model or more accurate responses. Sometimes the extra data is benign and merely slows down response processing. More harmfully, the additional data may shift your LLM’s weights and produce poor or incorrect responses. In either case, the extra data does not make the LLM more powerful or better.

Good AI applications use AI to answer the right question. The request “tell me what my wife wants for dinner from the following text” will have a higher chance of success than requesting “tell me the winner of the 2026 Super Bowl.”

One request involves manipulating data that already exists in context, while the other attempts to predict an event without all the necessary information.

AI outputs are not deterministic, meaning input X will not always yield output Y. Some of this variance comes from variables within your control, like sampling temperature. Outputs also depend on variables such as the LLM’s parameters, its system prompt, and the phrasing of the request.
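
Temperature is the clearest of these knobs. Here is a sketch of how it reshapes the sampling distribution over candidate tokens; the logits are invented for illustration.

```python
# How temperature affects non-determinism: it reshapes the probability
# distribution the model samples from. Logits below are hypothetical.
import numpy as np

logits = np.array([2.0, 1.0, 0.1])  # scores for three candidate tokens

def softmax_with_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    scaled = logits / temperature
    exp = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    return exp / exp.sum()

print(softmax_with_temperature(logits, 0.5))  # sharper: nearly deterministic
print(softmax_with_temperature(logits, 1.5))  # flatter: more varied outputs
```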

Even when accounting for output variance, we may encounter incomplete or biased datasets. Samples are only helpful if they correctly represent the population under study.

If you wanted to know how people in your neighborhood felt about a shared landscaping service, you could ask 5 of the 20 houses. This method could work, unless you happen to ask the five wealthiest households, which would skew your results. You get the most certain results by surveying the entire neighborhood, but we usually don’t approach problems this way because it is time- and resource-intensive.
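
A quick simulation of that survey, with invented willingness-to-pay numbers, shows how badly the wealthy-five sample skews the estimate:

```python
# Simulating the neighborhood survey: sampling only the five wealthiest
# houses skews the estimate. Willingness-to-pay figures are invented.
import statistics

willingness = [20, 25, 25, 30, 30, 30, 35, 35, 40, 40,
               40, 45, 45, 50, 50, 120, 130, 140, 150, 160]

full_survey = statistics.mean(willingness)         # survey all 20 houses
biased_sample = statistics.mean(willingness[-5:])  # only the wealthiest five

print(f"Whole neighborhood: ${full_survey:.0f}/month")      # $62
print(f"Wealthiest five only: ${biased_sample:.0f}/month")  # $140, badly skewed
```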

What if time and cost were not a concern when gathering data? What if we solved this sampling issue by getting all relevant data? If we had all the information on all 8 billion humans in the world, could we answer any question about them?

The answer is no. AI is really good at generating answers and predictions, but it struggles to quantify certainty. Even if we had every human fingerprint on Earth, there is no way to say for sure what the next human being’s fingerprint would be.

Figure 5. Attempting to use previous unique fingerprints to predict a new fingerprint.

To get good results from AI, you must ask the right questions. When deciding to use AI, ask whether having all the data on your population would actually let you answer the question with certainty. As with all things AI, be aware of its strengths and limitations and apply them accordingly.

Key Takeaways

  • Choose AI wisely: Not all AI technologies (e.g., generative vs. predictive) are interchangeable; match the type to your specific use case for best results.
  • Bigger isn’t better: Adding more parameters can harm model performance if the data is irrelevant or low-quality, like mixing stock trading info with philosophy texts.
  • Prediction limits: AI struggles with certainty — even with data on all 8 billion people, it can’t reliably forecast unknowns, so focus on contextual, solvable queries.
