
Crawler Strict Limitations

Related products: AI
  • May 28, 2025
  • 3 replies
  • 13 views

During the configuration of the Zendesk crawler, we are encountering issues with the validation of sitemaps generated by e-commerce platforms such as BigCommerce and Shopify.

The validation policies seem to be too restrictive, preventing the proper indexing of content.

Specific examples:

  • Sitemap URLs that do not end in a .xml extension are rejected, even when the content returned is valid XML.
  • Sitemap URLs containing query parameters (for example, parameters used to filter a specific section of the sitemap, such as products) fail validation.

These restrictions prevent us from fully leveraging the crawler's features to effectively index our content.

We kindly ask that you review the sitemap validation policies to allow the use of URLs without the .xml extension (when the content is actually XML) and to properly handle URL parameters, especially when they are used to filter specific content within a sitemap file.
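
To illustrate the behaviour we are asking for, here is a minimal sketch (this is not Zendesk's validation code, just an illustration): ignore the query string when checking the extension, and fall back to the server's Content-Type when the path has no .xml suffix.

```python
# Minimal sketch of the validation behaviour we are requesting -- not Zendesk's
# actual crawler code. The example URLs further down are hypothetical.
from urllib.parse import urlparse
from urllib.request import Request, urlopen

XML_CONTENT_TYPES = ("application/xml", "text/xml")

def is_acceptable_sitemap_url(url: str, timeout: float = 10.0) -> bool:
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False

    # Query parameters (everything after '?') belong to the request, not the
    # file name, so they should not disqualify the URL.
    if parsed.path.lower().endswith(".xml"):
        return True

    # No .xml suffix: check what the server actually serves.
    try:
        req = Request(url, method="HEAD", headers={"User-Agent": "sitemap-check"})
        with urlopen(req, timeout=timeout) as resp:
            content_type = resp.headers.get_content_type()
    except OSError:
        return False
    return content_type in XML_CONTENT_TYPES

if __name__ == "__main__":
    examples = [
        "https://shop.example.com/sitemap_products_1.xml?from=1&to=250",  # query string, .xml path
        "https://store.example.com/xmlsitemap.php?type=products&page=1",  # XML served without a .xml suffix
    ]
    for url in examples:
        print(url, "->", is_acceptable_sitemap_url(url))
```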

Thank you for your attention.

3 replies

Shawna James
  • Community Manager
  • May 30, 2025
Hey Nicole,
 
Thank you for taking the time to provide us with your feedback. This has been logged for our PM team to review. For others who may be interested in this feature request, please add your support by upvoting this post and/or adding your use case to the comments below. Thank you again!

Tiphaine
  • December 6, 2025

Thank you, Nicole, for raising this exact issue.

I have just migrated my Shopify store support to Zendesk and am encountering the same problem with my Shopify sitemaps.

The strictness of the crawler's validation policies is severely limiting our ability to properly index product content. Specifically:

  1. URL Parameter Rejection: the crawler appears to treat everything after the ‘?’ (e.g., parameters used to filter specific sitemap sections such as products) as part of the file name, so valid sitemap URLs that carry a query string are rejected.
  2. Overly Restrictive Regex: the regular expression Zendesk uses to validate XML sitemap URLs appears far too restrictive and does not accommodate standard e-commerce sitemap structures (BigCommerce, Shopify, etc.); a small illustration follows this list.
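
To make this concrete, here is a minimal sketch. The strict pattern is only my guess at the kind of rule being applied (it is not Zendesk's actual regex), and the URLs are made-up examples of the shapes our platforms generate:

```python
# The strict pattern below is only my guess at the kind of rule that rejects our
# URLs -- it is not Zendesk's actual regex. The URLs are made-up examples.
import re
from urllib.parse import urlparse

STRICT_PATTERN = re.compile(r"^https?://\S+\.xml$")  # requires the URL itself to end in .xml

def path_only_check(url: str) -> bool:
    # Ignore the query string: '?from=1&to=250' selects a sitemap section,
    # it is not part of the file name.
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and parsed.path.lower().endswith(".xml")

examples = [
    "https://shop.example.com/sitemap_products_1.xml?from=1&to=250",   # Shopify-style section sitemap
    "https://store.example.com/xmlsitemap.php?type=products&page=1",   # BigCommerce-style endpoint
]
for url in examples:
    print(f"{url}\n  strict match: {bool(STRICT_PATTERN.match(url))}  path check: {path_only_check(url)}")
```

The first URL only passes once the query string is ignored; the second never ends in .xml at all, so it would also need a Content-Type check like the sketch earlier in this thread.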

These limitations completely defeat the purpose of using the crawler for large e-commerce catalogs.

I strongly support the request to review the sitemap validation policies.

Could you update us on this?

Big thanks,

Tiphaine


  • Newcomer
  • December 10, 2025

I agree that we need more control over the crawler. Basing it strictly on the sitemap is very limiting. It would be great if we could manually add pages that are not included in the sitemap as additional content sources, or remove ones we don't want crawled. In our case, we exclude some pages from our sitemap or add tags in the head so they don't appear on Google or other search engines, but those pages would still be useful content sources for our AI Agent. Just because we don't want a page to be indexed on Google doesn't mean that it's not a relevant content source for an AI Agent.
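
To illustrate the kind of control I mean, here is a purely hypothetical sketch; nothing in it is an existing Zendesk setting, the names and structure are made up:

```python
# Purely hypothetical sketch of the control I am describing -- none of these
# names are existing Zendesk crawler settings.
from dataclasses import dataclass, field

@dataclass
class CrawlSources:
    sitemap_urls: set[str]                                   # pages discovered from the sitemap
    manual_includes: set[str] = field(default_factory=set)   # pages we add by hand (e.g. noindex pages)
    manual_excludes: set[str] = field(default_factory=set)   # pages we never want used as content sources

    def pages_for_ai_agent(self) -> set[str]:
        # A page kept out of Google (left out of the sitemap, or marked noindex)
        # can still be a useful content source for the AI Agent.
        return (self.sitemap_urls | self.manual_includes) - self.manual_excludes

sources = CrawlSources(
    sitemap_urls={"https://example.com/help/returns", "https://example.com/help/shipping"},
    manual_includes={"https://example.com/internal/size-guide"},  # noindex page, still useful to the agent
    manual_excludes={"https://example.com/help/shipping"},
)
print(sorted(sources.pages_for_ai_agent()))
```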