As previously mentioned, OpenAI has just announced a new product, the SearchGPT search engine. However, not every website is available to SearchGPT when users search: a site's content can only be found and shown if OpenAI's crawlers are allowed to access it. You need to know how to guide OpenAI's crawlers to your website's data so that the AI can retrieve it and present it to users who search for it.
Benefits of Adding a Website to SearchGPT
There are many benefits to adding a website to SearchGPT, including:
- Increased Visibility: Your website will be displayed to more users when they search for information related to the content on your website.
- Improved Search Ranking: SearchGPT uses AI to evaluate the quality of your website and rank it accordingly.
- Attract More Potential Customers: When your website is ranked higher, you will attract more visitors.
How to Add a Website to the SearchGPT Search Engine
OpenAI uses web crawlers and user agents to carry out actions for its products, either automatically or in response to user requests. Website administrators control these crawlers through robots.txt, a file placed in the root directory of a website that tells crawlers which parts of the site they may access. OpenAI recognizes the following robots.txt user agents, and each one can be configured independently of the others: for example, an administrator can allow OAI-SearchBot so the site appears in search results while disallowing GPTBot so the collected content is not used to train OpenAI's foundational generative AI models. Note that after you update robots.txt, it may take about 24 hours for OpenAI's systems to pick up the change and reflect it in search results.
- OAI-SearchBot: Used for search. OAI-SearchBot finds and links to websites so that they can be displayed in SearchGPT search results. It is not used to collect content for training OpenAI's foundational generative AI models. To help ensure your site appears in search results, allow OAI-SearchBot in your site's robots.txt file and allow requests from OpenAI's published IP ranges:
Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot)
Published IP ranges: https://openai.com/searchbot.json
- ChatGPT-User: Used for user actions in ChatGPT and custom GPTs. When a user asks ChatGPT or a custom GPT a question, it may visit a website to answer and include a link to the source in its response. ChatGPT users can also interact with external applications through GPT Actions. The ChatGPT-User setting controls which websites these user-initiated requests can reach. It does not crawl the web automatically and is not used to collect content for generative AI training:
Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot)
Published IP ranges: https://openai.com/chatgpt-user.json
- GPTBot: Used to make OpenAI's foundational generative AI models more useful and safe. It collects content that may be used to train these models. Disallowing GPTBot signals that your website's content should not be used to train foundational AI models:
Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.1; +https://openai.com/gptbot)
Published IP ranges: https://openai.com/gptbot.json
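Because a User-Agent header is easy to fake, checking it against the published IP ranges is a more reliable way to confirm that a request really comes from one of these crawlers. The Python sketch below does both checks using only the standard library. It assumes nothing about the exact layout of the JSON files beyond their containing IP prefixes as strings, so it collects every string that parses as an IP network; token matching is a plain substring check, which is an assumption that suits the strings listed above. The function names are illustrative and not part of any OpenAI tooling.

```python
import ipaddress
import json
import urllib.request

# Keys double as the product tokens to look for in the User-Agent header;
# values are the published range files named in the list above.
RANGE_URLS = {
    "OAI-SearchBot": "https://openai.com/searchbot.json",
    "ChatGPT-User": "https://openai.com/chatgpt-user.json",
    "GPTBot": "https://openai.com/gptbot.json",
}


def openai_token(user_agent):
    """Return the OpenAI product token found in a User-Agent header, or None."""
    for token in RANGE_URLS:
        if token in user_agent:
            return token
    return None


def collect_networks(data):
    """Recursively collect every string in parsed JSON that parses as an IP network.

    Assumption: the exact JSON layout is not relied on; strings that are not
    addresses or prefixes (timestamps, labels) are skipped.
    """
    networks = []
    if isinstance(data, dict):
        for value in data.values():
            networks.extend(collect_networks(value))
    elif isinstance(data, list):
        for item in data:
            networks.extend(collect_networks(item))
    elif isinstance(data, str):
        try:
            networks.append(ipaddress.ip_network(data, strict=False))
        except ValueError:
            pass  # not an address or CIDR prefix
    return networks


def fetch_ranges(url, timeout=10.0):
    """Download one published range file and return its networks (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return collect_networks(json.load(response))


def ip_in_ranges(ip, networks):
    """Return True if the IP address falls inside any of the given networks."""
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        return False  # malformed client address
    return any(address in network for network in networks)


def is_verified_openai_request(user_agent, ip, ranges_by_token):
    """True only if the User-Agent names an OpenAI crawler AND the IP is in that crawler's ranges."""
    token = openai_token(user_agent)
    if token is None:
        return False
    return ip_in_ranges(ip, ranges_by_token.get(token, []))
```

To use it against live data, build `ranges_by_token` with `{token: fetch_ranges(url) for token, url in RANGE_URLS.items()}`, cache the result, and refresh it periodically in case the published ranges change.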
For example, you can create a robots.txt file for a WordPress website as follows:
User-agent: *
Disallow: /wp-admin/
User-agent: Bingbot
Disallow: /
User-agent: OAI-SearchBot
Disallow: /wp-admin/
Allow: /
With this configuration, every crawler is kept out of /wp-admin/, Bingbot is blocked from the entire site, and OAI-SearchBot may crawl everything else. Two details matter here: the User-agent line takes the short product token (OAI-SearchBot), not the full user-agent string, and a crawler follows only the most specific group that matches its name, so the /wp-admin/ rule must be repeated in the OAI-SearchBot group.
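To check a robots.txt file before publishing it, Python's standard-library `urllib.robotparser` can evaluate the rules offline. The sketch below parses a configuration with the same three groups as the WordPress example, written with the `OAI-SearchBot` product token and with `/wp-admin/` repeated in that group, and reports which agent may fetch which path. `example.com` is a placeholder host, and `build_parser`/`access_report` are illustrative helper names. `robotparser` only approximates how real crawlers read the file; in particular, it applies the first matching rule in a group rather than the longest match described in RFC 9309, so `Disallow` is listed before `Allow` to get the same answer under both.

```python
from urllib.robotparser import RobotFileParser

# Same three groups as the WordPress example, written with product tokens.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/

User-agent: Bingbot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /wp-admin/
Allow: /
"""


def build_parser(robots_txt):
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def access_report(parser, agents, paths, host="https://example.com"):
    """Map each (agent, path) pair to True if the rules allow that agent to fetch it."""
    return {
        (agent, path): parser.can_fetch(agent, host + path)
        for agent in agents
        for path in paths
    }


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    # GPTBot has no group of its own, so it falls back to the "*" group.
    report = access_report(rules, ["OAI-SearchBot", "Bingbot", "GPTBot"], ["/", "/wp-admin/"])
    for (agent, path), allowed in report.items():
        print(f"{agent:<14} {path:<11} {'allowed' if allowed else 'blocked'}")
```

Run as a script, it reports OAI-SearchBot allowed on / but blocked on /wp-admin/, Bingbot blocked everywhere, and GPTBot (which falls back to the * group) allowed on / but blocked on /wp-admin/. This also illustrates that each agent's access is decided by its own group, independently of the others.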
Good luck adding your website to SearchGPT and managing how AI tools access and use your website's data to suit your goals.