Cloudflare is launching a new set of tools called AI Audit to give website owners and bloggers better control over their content and how it is made available to automated systems. With the security tool, users will be able to block or allow the AI bots that crawl websites to scrape data. The tool will be free for all existing Cloudflare customers. It will also include a feature that shows website owners detailed analytics on which bots are visiting their site and how they behave.

With the surging popularity of generative AI technology, there is a rush to train large language models (LLMs) on human-created data. This data not only forms the basis of a foundation model but also helps advance and improve it. The main problem, though, is access to data: most AI firms have already worked through the publicly available datasets and now need more data to train their models.
AI bots are designed specifically to help developers gather more information for training AI models. Such a bot is, in effect, a simple program that mimics a real user, visiting websites and copying text, image, and video data. These bots can scrape huge amounts of data in a short time and deliver it for model training. In recent times, many media firms, along with some of the biggest websites, have filed lawsuits against AI firms, accusing them of plagiarism and of illegally using their data to power LLMs.
Cloudflare’s AI Audit tool acts as a shield that can prevent such bots from gaining access to your website. In its announcement, the company also said it is working to give users control over which bots are blocked from their site and which are allowed access. This is useful when a site has an agreement with an AI firm and does not mind that firm's bots using its data. Alternatively, a website owner may wish to share data with specific AI models that attribute the source of the data, in order to reach a wider audience.
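To make the allow-or-block idea concrete, here is a minimal sketch of a per-bot access policy. It is only an illustration and not Cloudflare's implementation: the `decide` function and the two lists are hypothetical, and it assumes a bot identifies itself honestly in its User-Agent header. GPTBot, CCBot, and ClaudeBot are crawler names used by OpenAI, Common Crawl, and Anthropic; which of them a site allows or blocks here is an arbitrary choice made for the example.

```python
# Hypothetical per-bot access policy, for illustration only.
# Assumption: crawlers announce themselves in the User-Agent header.
ALLOWED_BOTS = {"gptbot"}               # e.g. a firm the site has an agreement with
BLOCKED_BOTS = {"ccbot", "claudebot"}   # crawlers the owner wants to keep out


def decide(user_agent: str) -> str:
    """Return "allow" or "block" for a request's User-Agent string."""
    ua = user_agent.lower()
    # Explicitly allowed bots take priority over the block list.
    if any(token in ua for token in ALLOWED_BOTS):
        return "allow"
    if any(token in ua for token in BLOCKED_BOTS):
        return "block"
    # Ordinary visitors and unlisted clients pass through.
    return "allow"


if __name__ == "__main__":
    for ua in (
        "Mozilla/5.0 (compatible; GPTBot/1.0)",
        "CCBot/2.0",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    ):
        print(ua, "->", decide(ua))
```

Because a bot that mimics a real user can also forge its User-Agent string, a check like this would catch only crawlers that identify themselves; a production tool would likely rely on additional signals.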
Cloudflare also said that it is working on a workflow in which website owners can set a fair price for their content. AI firms can then transact with them and, after paying the set amount, would be authorized to scan the content. The company noted that this marketplace-like tool would be particularly helpful for website owners who lack the bandwidth or resources to negotiate and close such deals with every firm that approaches them.
Bhupendra Singh Chundawat is a seasoned technology journalist with over 22 years of experience in the media industry. He specializes in covering the global technology landscape, with a deep focus on manufacturing trends and the geopolitical impact on tech companies. Currently serving as the Editor at Udaipur Kiran, his insights are shaped by decades of hands-on reporting and editorial leadership in the fast-evolving world of technology.




