
Crawler Strict Limitations

Related products: AI
  • May 28, 2025
  • 3 replies
  • 13 views

During the configuration of the Zendesk crawler, we are encountering issues with the validation of sitemaps generated by e-commerce platforms such as BigCommerce and Shopify.

The validation policies seem to be too restrictive, preventing the proper indexing of content.

Specific examples:

  • Sitemap URLs that do not end in a .xml extension are rejected, even when the content returned is valid XML.
  • Sitemap URLs containing query parameters (for example, parameters used to filter a specific section of the sitemap, such as products) fail validation.

These restrictions prevent us from fully leveraging the crawler's features to effectively index our content.

We kindly ask that you review the sitemap validation policies to allow the use of URLs without the .xml extension (when the content is actually XML) and to properly handle URL parameters, especially when they are used to filter specific content within a sitemap file.
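
To illustrate the behaviour we are asking for, here is a minimal sketch (this is not Zendesk's validation code, just an illustration): ignore the query string when checking the extension, and fall back to the server's Content-Type when the path has no .xml suffix.

```python
# Minimal sketch of the validation behaviour we are requesting -- not Zendesk's
# actual crawler code. The example URLs further down are hypothetical.
from urllib.parse import urlparse
from urllib.request import Request, urlopen

XML_CONTENT_TYPES = ("application/xml", "text/xml")

def is_acceptable_sitemap_url(url: str, timeout: float = 10.0) -> bool:
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False

    # Query parameters (everything after '?') belong to the request, not the
    # file name, so they should not disqualify the URL.
    if parsed.path.lower().endswith(".xml"):
        return True

    # No .xml suffix: check what the server actually serves.
    try:
        req = Request(url, method="HEAD", headers={"User-Agent": "sitemap-check"})
        with urlopen(req, timeout=timeout) as resp:
            content_type = resp.headers.get_content_type()
    except OSError:
        return False
    return content_type in XML_CONTENT_TYPES

if __name__ == "__main__":
    examples = [
        "https://shop.example.com/sitemap_products_1.xml?from=1&to=250",  # query string, .xml path
        "https://store.example.com/xmlsitemap.php?type=products&page=1",  # XML served without a .xml suffix
    ]
    for url in examples:
        print(url, "->", is_acceptable_sitemap_url(url))
```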

Thank you for your attention.

3 replies

Shawna James
  • Community Manager
  • May 30, 2025
Hey Nicole,
 
Thank you for taking the time to provide us with your feedback. This has been logged for our PM team to review. For others who may be interested in this feature request, please add your support by upvoting this post and/or adding your use case to the comments below. Thank you again!

Tiphaine
  • December 6, 2025

Thank you, Nicole, for raising this exact issue.

I have just migrated my Shopify store support to Zendesk and am encountering the same problem with my Shopify sitemaps.

The strictness of the crawler's validation policies is severely limiting our ability to properly index product content. Specifically:

  1. URL Parameter Rejection: the crawler appears to treat everything after the ‘?’ (e.g., parameters used to filter specific sitemap sections such as products) as part of the file name, so valid sitemap URLs that carry a query string are rejected.
  2. Overly Restrictive Regex: the regular expression Zendesk uses to validate XML sitemap URLs appears far too restrictive and does not accommodate standard e-commerce sitemap structures (BigCommerce, Shopify, etc.); a small illustration follows this list.
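
To make this concrete, here is a minimal sketch. The strict pattern is only my guess at the kind of rule being applied (it is not Zendesk's actual regex), and the URLs are made-up examples of the shapes our platforms generate:

```python
# The strict pattern below is only my guess at the kind of rule that rejects our
# URLs -- it is not Zendesk's actual regex. The URLs are made-up examples.
import re
from urllib.parse import urlparse

STRICT_PATTERN = re.compile(r"^https?://\S+\.xml$")  # requires the URL itself to end in .xml

def path_only_check(url: str) -> bool:
    # Ignore the query string: '?from=1&to=250' selects a sitemap section,
    # it is not part of the file name.
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and parsed.path.lower().endswith(".xml")

examples = [
    "https://shop.example.com/sitemap_products_1.xml?from=1&to=250",   # Shopify-style section sitemap
    "https://store.example.com/xmlsitemap.php?type=products&page=1",   # BigCommerce-style endpoint
]
for url in examples:
    print(f"{url}\n  strict match: {bool(STRICT_PATTERN.match(url))}  path check: {path_only_check(url)}")
```

The first URL only passes once the query string is ignored; the second never ends in .xml at all, so it would also need a Content-Type check like the sketch earlier in this thread.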

These limitations completely defeat the purpose of using the crawler for large e-commerce catalogs.

I strongly support the request to review the sitemap validation policies.

Could you update us on this?

Big thanks,

Tiphaine


  • Newcomer
  • December 10, 2025

I agree that we need more control over the crawler. Basing it strictly on the sitemap is very limiting. It would be great if we could manually add pages that are not included in the sitemap as additional content sources, or remove ones we don't want crawled. In our case, we exclude some pages from our sitemap or add tags in the head so they don't appear on Google or other search engines, but those pages would still be useful content sources for our AI Agent. Just because we don't want a page to be indexed on Google doesn't mean that it's not a relevant content source for an AI Agent.
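
To illustrate the kind of control I mean, here is a purely hypothetical sketch; nothing in it is an existing Zendesk setting, the names and structure are made up:

```python
# Purely hypothetical sketch of the control I am describing -- none of these
# names are existing Zendesk crawler settings.
from dataclasses import dataclass, field

@dataclass
class CrawlSources:
    sitemap_urls: set[str]                                   # pages discovered from the sitemap
    manual_includes: set[str] = field(default_factory=set)   # pages we add by hand (e.g. noindex pages)
    manual_excludes: set[str] = field(default_factory=set)   # pages we never want used as content sources

    def pages_for_ai_agent(self) -> set[str]:
        # A page kept out of Google (left out of the sitemap, or marked noindex)
        # can still be a useful content source for the AI Agent.
        return (self.sitemap_urls | self.manual_includes) - self.manual_excludes

sources = CrawlSources(
    sitemap_urls={"https://example.com/help/returns", "https://example.com/help/shipping"},
    manual_includes={"https://example.com/internal/size-guide"},  # noindex page, still useful to the agent
    manual_excludes={"https://example.com/help/shipping"},
)
print(sorted(sources.pages_for_ai_agent()))
```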