AI in the policy sector: addressing hallucinations
In PolicyTech, accuracy and reliability are not optional

The integration of artificial intelligence into governance processes and public policy is becoming increasingly common. Systems such as ChatGPT and other language models offer new opportunities to improve administrative efficiency, data analysis, and communication with citizens. However, like any technology, these tools have specific characteristics that need to be understood in order to use them effectively.
One such characteristic is the phenomenon of “hallucinations”, or the tendency of language models to generate information that appears plausible but is not necessarily accurate or verifiable. In the context of public applications, where the accuracy of information is particularly relevant, it becomes essential to understand this phenomenon and develop approaches to manage it.
Hallucinations in language models: a technical feature
Hallucinations are an intrinsic technical feature of current language models, which are built on the Transformer architecture introduced in 2017. The Transformer's self-attention mechanism, which lets the model process a text sequence while considering the relationships between all of its words simultaneously, has revolutionized natural language processing, yet text generation remains probabilistic. This characteristic should be understood and managed rather than eliminated, because it is structural to current-generation models.
Language models such as GPT work through a probabilistic process that analyzes patterns in training data to predict which word should follow in a sequence. This approach allows them to generate fluent and coherent text, but also carries with it the possibility of producing content that does not correspond to verifiable facts.
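As a purely illustrative sketch (the vocabulary and scores below are invented, not taken from any real model), this is roughly what that probabilistic step looks like: the model assigns a score to each candidate continuation, converts the scores into probabilities, and samples one of them.

```python
import math
import random

def softmax(scores):
    """Convert raw scores into a probability distribution."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Invented candidate continuations for the prompt
# "The approval rating for the new policy is ..."
candidates = ["62%", "unknown", "rising", "45%"]
scores = [2.1, 0.3, 1.5, 1.9]  # hypothetical model scores (logits)

probs = softmax(scores)
next_word = random.choices(candidates, weights=probs, k=1)[0]

for word, p in zip(candidates, probs):
    print(f"{word!r}: {p:.2f}")
print("sampled continuation:", next_word)
```

The continuation is chosen because it is probable given the patterns seen in training, not because it has been verified, which is precisely how a plausible-sounding but fabricated figure can appear.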
In the context of policy applications, hallucinations can manifest in different forms. The model may generate statistical information that appears credible but does not correspond to real data, such as approval ratings or fabricated demographics. Similarly, the system may refer to non-existent laws, decrees, or regulations, or attribute incorrect content to real regulations. It is not uncommon for the model to attribute statements or positions to public figures who have never expressed them, or to describe bureaucratic procedures and legislative processes inaccurately.
It is important to understand that hallucinations are not "errors" in the traditional sense, but a direct consequence of how these models work. As highlighted in technical literature, language models have no concept of what they do not know and tend to generate responses even when the requested information is not present in their training data. This behavior stems from their probabilistic nature: being guided by the search for the highest probability, they always exhibit a certain assertiveness that can lead to conclusions not supported by facts.
Applications in PolicyTech: opportunities and considerations
Despite these structural limitations, artificial intelligence is already used in contexts where precision is essential. In the public sector, for example, AI is used for document analysis, supporting the drafting of legislative texts, research reports, administrative documentation, official communications, and information materials for citizens. These systems are also used for research and analysis, helping to identify trends, correlations, and relevant information in large amounts of data, as well as to provide procedural assistance and guidance on administrative processes and available public services.
The use of AI in these contexts requires some specific considerations. The information generated should always be verifiable through official sources, and users should be aware of when they are interacting with an AI system. Furthermore, artificial intelligence works best as a support tool rather than a substitute for human judgment, always maintaining the need for supervision and control by experts in the field.
Strategies for reducing hallucinations
One of the most effective approaches to managing hallucinations is Retrieval Augmented Generation, commonly known as RAG. This method combines the generative capacity of language models with access to verified knowledge bases whose relevant content is included in the model's context (the prompt), making it a particularly effective solution wherever strict adherence to source data is required, including applications in the public sector.
The system is based on the interaction of three main components. The first is the knowledge base, the set of documents and sources the system can refer to. It can include legislative texts, institutional reports, judgments, and regulations, but also images, tables, or graphs if enriched with multimodal content. The data must be stored in a representation suited to how language models will use it, which often means vector representations.
The information is not simply stored, but transformed into vector representations (called embeddings) that allow texts and questions to be compared based on meaning, and not just on the exact words used.
The second is the retrieval model, which searches the knowledge base for the most relevant information when the user asks a question. Its task is to understand what the user wants to know and to find the most relevant passages in the corpus. Depending on the complexity of the corpus, determined not only by the number of documents but also by the number of dimensions along which the information can be filtered and selected, keyword search is often not enough: the knowledge must be described carefully, including at the semantic level, so that similarities in meaning are recognized even when the formulations differ. For example, the retriever can associate a question about "tax benefits for start-ups" with a document that talks about "tax breaks for new businesses", even though the words do not match exactly.
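As a rough sketch of how such meaning-based retrieval can be implemented, the example below uses the sentence-transformers library to embed a query and a few documents and rank the documents by cosine similarity; the model name, documents, and query are illustrative assumptions rather than a recommendation.

```python
# Minimal sketch of semantic retrieval; model name and documents are examples.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works

documents = [
    "Tax breaks for new businesses registered after January 2023.",
    "Rules for the disposal of municipal waste.",
    "Funding opportunities for agricultural cooperatives.",
]
query = "tax benefits for start-ups"

doc_vectors = model.encode(documents)   # embeddings of the knowledge base
query_vector = model.encode(query)      # embedding of the user question

# Rank documents by cosine similarity with the query
scores = util.cos_sim(query_vector, doc_vectors)[0]
ranked = sorted(zip(documents, scores.tolist()), key=lambda x: x[1], reverse=True)

for doc, score in ranked:
    print(f"{score:.2f}  {doc}")
# The "tax breaks for new businesses" document ranks first even though
# it shares almost no words with the query.
```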
Finally, there is the generative model, which receives the retrieved documents as input and constructs the response. This language model is trained to use only the information contained in the documents provided, avoiding reliance on previously learned knowledge or unjustified inferences. To obtain accurate and reliable answers, it is essential to guide the model correctly through a well-constructed prompt that clearly instructs it to stick to the sources provided and not to invent anything.
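What such a grounding prompt can look like is sketched below; the wording and the citation convention are only one possible pattern, not a prescribed template.

```python
def build_grounded_prompt(question: str, passages: list[str]) -> str:
    """Assemble a prompt that instructs the model to answer only from the
    retrieved passages and to say so when the answer is not in them."""
    sources = "\n\n".join(
        f"[Source {i + 1}]\n{text}" for i, text in enumerate(passages)
    )
    return (
        "Answer the question using ONLY the sources below. "
        "Cite the source number for every claim. "
        "If the sources do not contain the answer, reply: "
        "'The available documents do not cover this question.'\n\n"
        f"{sources}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_grounded_prompt(
    "What tax benefits are available to start-ups?",
    ["Tax breaks for new businesses registered after January 2023 ..."],
)
print(prompt)
```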
This synergy between components brings several advantages. The accuracy of the responses improves because they are based on real, up-to-date data. All information is traceable, meaning that the document from which it was extracted can always be identified. In addition, the knowledge base can be easily updated without having to retrain the entire system. This makes RAG flexible and adaptable to different domains: from legal and healthcare to complex administrative contexts.
To mitigate the risk of hallucinations in this area, the implementation of RAG architectures is essential. These technologies ensure that the system's responses are always anchored to verifiable documents in the knowledge base, limiting the possibility that the model will generate information not supported by reliable sources. The RAG system first retrieves relevant documents from the legal archive and then uses only these sources to formulate the response, creating a direct and traceable link between the query and existing regulatory content.
Ultimately, the challenge of hallucinations in large language models can be effectively addressed through a systematic and collaborative approach. The key to mitigating this phenomenon lies in creating an accurate representation of knowledge, developed in close collaboration with domain experts.
For critical applications, it is useful to implement control systems that include comparison of generated information with official databases, expert review for the most sensitive information, and mechanisms for collecting reports of inaccuracies from users. These additional controls create a multi-layered verification system that can identify and correct any inaccuracies before they reach end users.
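Part of this verification can be automated. The sketch below, built on an entirely hypothetical register of official acts and a simplified citation format, flags any regulation cited in a generated answer that cannot be confirmed, so that it can be routed to expert review.

```python
import re

# Hypothetical register of official acts; in practice this would be an
# authoritative source such as a national legislation portal.
OFFICIAL_REGISTER = {"Law 104/1992", "Decree 33/2013"}

def flag_unverified_citations(answer: str) -> list[str]:
    """Return cited acts that cannot be confirmed in the official register."""
    cited = re.findall(r"(?:Law|Decree) \d+/\d{4}", answer)
    return [act for act in cited if act not in OFFICIAL_REGISTER]

answer = "Accessibility duties follow from Law 104/1992 and Decree 99/2021."
for act in flag_unverified_citations(answer):
    print(f"Needs expert review: {act} not found in the official register")
```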
Intelligent querying of knowledge bases
In public law, AI systems allow legal practitioners to perform sophisticated semantic searches across vast regulatory and case-law archives, overcoming the limits of traditional keyword search: complex questions can be formulated in natural language to identify relevant precedents, detect regulatory conflicts, or extract legal principles from extensive document corpora. Effective implementation, however, requires a structured approach based on the principles of gradualism and complementarity. Gradualism means starting with low-risk applications, such as searching for precedents in established areas, and expanding usage as experience with the system grows. Complementarity means using these tools as a support to human work, always maintaining the supervision of the legal practitioner, while transparency requires clearly communicating to users when and how artificial intelligence is used in the search process.
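One possible shape for such a query layer, combining a natural-language question with the metadata filters a legal corpus typically needs (document type, decision date), is sketched below; the field names, corpus entries, and placeholder scorer are assumptions made for the example.

```python
from datetime import date

# Illustrative corpus entries; fields and values are invented for the example.
corpus = [
    {"text": "Ruling on tax breaks for new businesses.", "type": "judgment",
     "decided": date(2021, 5, 10)},
    {"text": "Circular on waste disposal procedures.", "type": "circular",
     "decided": date(2019, 3, 2)},
    {"text": "Judgment on start-up incentive eligibility.", "type": "judgment",
     "decided": date(2016, 7, 21)},
]

def search(query, doc_type=None, decided_after=None, scorer=None):
    """Filter by metadata, then rank the remaining documents by relevance."""
    # Placeholder scorer: word overlap. In a real system this would be an
    # embedding-based similarity, as in the retrieval sketch above.
    scorer = scorer or (lambda q, t: len(set(q.lower().split()) & set(t.lower().split())))
    hits = [d for d in corpus
            if (doc_type is None or d["type"] == doc_type)
            and (decided_after is None or d["decided"] >= decided_after)]
    return sorted(hits, key=lambda d: scorer(query, d["text"]), reverse=True)

for doc in search("tax breaks for start-ups", doc_type="judgment",
                  decided_after=date(2020, 1, 1)):
    print(doc["decided"], doc["text"])
```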
Ingestion and structuring of unstructured documents
The implementation of systems for transforming unstructured documents into organized information systems must follow a structured development process that includes consecutive phases of evaluation, design, and pilot deployment. AI systems are capable of automatically processing large volumes of heterogeneous documents - judgments, legal opinions, administrative acts, resolutions, ministerial circulars - extracting their informational content and transforming it into data structured according to predefined legal taxonomies. The initial assessment phase identifies the most appropriate document areas for the application, while the system design involves choosing the most suitable technologies and defining the overall architecture for document ingestion. This intelligent digitization process is not limited to simple scanning, but includes the automatic identification of legal entities, classification by type of document, and the extraction of relevant dates and regulatory references. Continuous improvement involves implementing mechanisms to collect feedback on the quality of structuring and constantly refine classification algorithms, creating consistent and semantically rich databases that enable cross-sectional analysis and facilitate the identification of regulatory and jurisprudential trends.
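To give a sense of what structuring against a predefined taxonomy can mean in practice, the sketch below extracts dates and regulatory references from a raw document and assigns a document type with simple keyword rules; the taxonomy, patterns, and sample text are assumptions, and a production system would complement them with trained classifiers designed together with legal experts.

```python
import re

# Illustrative taxonomy: document type -> keywords that suggest it.
TAXONOMY = {"circular": ["circular"], "judgment": ["judgment", "ruling"],
            "resolution": ["resolution"]}

def structure_document(raw_text: str) -> dict:
    """Extract dates, regulatory references, and a document type."""
    lowered = raw_text.lower()
    doc_type = next((label for label, keywords in TAXONOMY.items()
                     if any(k in lowered for k in keywords)), "unclassified")
    return {
        "type": doc_type,
        "dates": re.findall(r"\d{1,2} \w+ \d{4}", raw_text),
        "references": re.findall(r"(?:Law|Decree) \d+/\d{4}", raw_text),
        "text": raw_text,
    }

record = structure_document(
    "Ministerial circular of 12 March 2021 implementing Decree 33/2013."
)
print(record["type"], record["dates"], record["references"])
```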
In this context, hallucination mitigation is addressed through post-processing controls, with processing carried out on controlled infrastructures that guarantee the traceability and verifiability of each stage of the document ingestion pipeline. Processing on controlled infrastructure allows for the implementation of rigorous quality controls, automatic validation of extracted data through cross-checking, and rollback mechanisms that make it possible to identify and correct classification or extraction errors. This architecture ensures that each processed document maintains a verifiable transformation chain, significantly reducing the risk of generating incorrect metadata or classifications.
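The verifiable transformation chain can be as simple as an audit record attached to each processing step, as in the sketch below; the step names, file name, and fingerprinting choice are assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field

def fingerprint(text: str) -> str:
    """Stable hash of a piece of content, used to verify it later."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]

@dataclass
class ProcessingStep:
    name: str          # e.g. "text_extraction", "classification"
    input_hash: str
    output_hash: str

@dataclass
class DocumentRecord:
    source: str
    steps: list[ProcessingStep] = field(default_factory=list)

    def add_step(self, name: str, input_text: str, output_text: str):
        self.steps.append(ProcessingStep(name, fingerprint(input_text),
                                         fingerprint(output_text)))

# Hypothetical document moving through the pipeline.
record = DocumentRecord(source="circular_2021_registry.pdf")
raw = "Ministerial circular of 12 March 2021 ..."
cleaned = raw.strip()
record.add_step("text_extraction", raw, cleaned)
record.add_step("classification", cleaned, "type=circular")

for step in record.steps:
    print(step.name, step.input_hash, "->", step.output_hash)
```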
Conclusions
Hallucinations can be effectively mitigated through approaches such as Retrieval Augmented Generation, appropriate prompting techniques, and structured verification systems.
User education is an important element for successful implementation. Those who use AI systems in the public sector need to understand what the system can and cannot do effectively, how to formulate effective requests and interpret responses, and in which situations it is particularly important to verify the information provided. This understanding helps to set realistic expectations and use the technology more effectively. Training should include developing specific skills for interacting with AI systems, critically evaluating the responses generated, and understanding how to structure and maintain knowledge bases, as well as awareness of the ethical implications and principles of transparency and accountability in the public sector.
The implementation of AI systems in PolicyTech requires a balanced approach that considers both the opportunities and specific characteristics of these technologies, representing an opportunity to develop more efficient and accessible services for citizens. With the right combination of technical tools, organizational processes, and human skills, it is possible to fully exploit the benefits of artificial intelligence while maintaining high standards of accuracy and reliability.
The goal is not to achieve technical perfection, but to develop systems that are useful, reliable, and appropriate for the contexts in which they are used. In this sense, understanding hallucinations and the strategies for addressing them is a fundamental element of the responsible adoption of AI in PolicyTech applications. Ultimately, the key to effectively implementing AI systems based on language models lies in creating an accurate representation of knowledge, developed in close collaboration with domain experts. This synergy between technical expertise and specialist knowledge ensures reliable systems that are truly useful for end users.
Stay in the loop with Thembi
Thembi helps you stop chasing data and start driving policy work.
Sign up to be the first to know when we launch.


