# As a condition of accessing this website, you agree to abide by the following
# content signals:

# (a) If a content-signal = yes, you may collect content for the corresponding
# use.
# (b) If a content-signal = no, you may not collect content for the
# corresponding use.
# (c) If the website operator does not include a content signal for a
# corresponding use, the website operator neither grants nor restricts
# permission via content signal with respect to the corresponding use.

# The content signals and their meanings are:

# search: building a search index and providing search results (e.g., returning
# hyperlinks and short excerpts from your website's contents). Search does not
# include providing AI-generated search summaries.
# ai-input: inputting content into one or more AI models (e.g., retrieval
# augmented generation, grounding, or other real-time taking of content for
# generative AI search answers).
# ai-train: training or fine-tuning AI models.

# ANY RESTRICTIONS EXPRESSED VIA CONTENT SIGNALS ARE EXPRESS RESERVATIONS OF
# RIGHTS UNDER ARTICLE 4 OF THE EUROPEAN UNION DIRECTIVE 2019/790 ON COPYRIGHT
# AND RELATED RIGHTS IN THE DIGITAL SINGLE MARKET.
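
# Illustrative example only (an assumption, not this site's configuration):
# a group that sets all three signals described above could look like the
# lines below. They are commented out so they have no effect on crawlers.
#
#   User-agent: *
#   Content-signal: search=yes,ai-input=no,ai-train=no
#   Allow: /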

# BEGIN Cloudflare Managed content

User-Agent: *
Content-signal: search=yes,ai-train=no
Allow: /

User-agent: Amazonbot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

# END Cloudflare Managed Content

# ====================================================================
# ULTIMATE ROBOTS.TXT - CCTV ONLINE KOTA PEMATANGSIANTAR
# Website: https://cctv.pematangsiantar.go.id/
# Managed by: Dinas Komunikasi dan Informatika Kota Pematangsiantar
# Version: Ultimate 5.0 - SEO Champion Edition
# ====================================================================

# ====================================================================
# ALLOW ALL LEGITIMATE CRAWLERS - MAXIMUM INDEXING
# ====================================================================

User-agent: *
Allow: /

# ====================================================================
# PREMIUM ACCESS FOR MAJOR SEARCH ENGINES
# ====================================================================

# Google - Maximum Access & Optimization
User-agent: Googlebot
Allow: /
Crawl-delay: 0.5
Request-rate: 10/1m

User-agent: Googlebot-Image
Allow: /
Allow: /assets/image/
Allow: /assets/image/thumbnail/
Crawl-delay: 0.5

User-agent: Googlebot-Video
Allow: /
Allow: /#cctv
Allow: /location/
Crawl-delay: 0.5

User-agent: Googlebot-News
Allow: /
Crawl-delay: 0.5

User-agent: Google-InspectionTool
Allow: /

User-agent: GoogleOther
Allow: /

# Bing - High Priority Access
User-agent: Bingbot
Allow: /
Crawl-delay: 1
Request-rate: 8/1m

User-agent: MicrosoftPreview
Allow: /

# Yahoo - Optimize Access
User-agent: Slurp
Allow: /
Crawl-delay: 2

# DuckDuckGo - Privacy-Focused Access
User-agent: DuckDuckBot
Allow: /
Crawl-delay: 1

# Yandex - International Access
User-agent: YandexBot
Allow: /
Crawl-delay: 2

User-agent: YandexImages
Allow: /assets/image/
Crawl-delay: 2

# Baidu - Asian Market Access
User-agent: Baiduspider
Allow: /
Crawl-delay: 3

User-agent: Baiduspider-image
Allow: /assets/image/
Crawl-delay: 3

# ====================================================================
# SOCIAL MEDIA CRAWLERS - MAXIMUM SHARING OPTIMIZATION
# ====================================================================

# Facebook - Social Sharing Optimization
User-agent: facebookexternalhit
Allow: /
Allow: /assets/image/
Crawl-delay: 1

User-agent: facebookcatalog
Allow: /

# Twitter - Tweet Optimization
User-agent: Twitterbot
Allow: /
Allow: /assets/image/
Crawl-delay: 1

# LinkedIn - Professional Network
User-agent: LinkedInBot
Allow: /
Allow: /about-us.php
Allow: /contact-us.php
Crawl-delay: 2

# WhatsApp - Messaging Optimization
User-agent: WhatsApp
Allow: /
Allow: /assets/image/
Crawl-delay: 1

# Telegram - Instant Messaging
User-agent: TelegramBot
Allow: /
Crawl-delay: 1

# Pinterest - Visual Discovery
User-agent: Pinterest
Allow: /
Allow: /assets/image/
Crawl-delay: 2

# Instagram - Visual Content
User-agent: Instagram
Allow: /
Allow: /assets/image/
Crawl-delay: 2

# ====================================================================
# SPECIALIZED CRAWLERS - STRATEGIC ACCESS
# ====================================================================

# Apple - iOS Integration
User-agent: Applebot
Allow: /
Crawl-delay: 2

# Internet Archive - Digital Preservation
User-agent: ia_archiver
Allow: /
Crawl-delay: 5

User-agent: archive.org_bot
Allow: /
Crawl-delay: 5

# SEO Tools - Limited Strategic Access
User-agent: AhrefsBot
Allow: /
Allow: /about-us.php
Allow: /contact-us.php
Disallow: /admin/
Disallow: /logs/
Crawl-delay: 10

User-agent: SemrushBot
Allow: /
Disallow: /admin/
Disallow: /logs/
Crawl-delay: 10

User-agent: MJ12bot
Allow: /
Disallow: /admin/
Crawl-delay: 15

# News Aggregators
User-agent: NewsNow
Allow: /
Crawl-delay: 5

User-agent: Moreover
Allow: /
Crawl-delay: 5

# ====================================================================
# BLOCK AGGRESSIVE CRAWLERS & SECURITY THREATS
# ====================================================================

# Block AI Training Bots - Protect Content
User-agent: ChatGPT-User
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: bard
Disallow: /

User-agent: OpenAI-SearchBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: CCBot
Disallow: /

# Block Aggressive SEO Crawlers
User-agent: DotBot
Disallow: /

User-agent: SeznamBot
Disallow: /

User-agent: BLEXBot
Disallow: /

User-agent: MagpieRSS
Disallow: /

User-agent: Nutch
Disallow: /

User-agent: omgili
Disallow: /

User-agent: psbot
Disallow: /

User-agent: SBIder
Disallow: /

User-agent: ScoutJet
Disallow: /

User-agent: Teoma
Disallow: /

User-agent: TwengaBot
Disallow: /

User-agent: Vagabondo
Disallow: /

User-agent: VoilaBot
Disallow: /

User-agent: ZyBorg
Disallow: /

# Block Content Scrapers
User-agent: HTTrack
Disallow: /

User-agent: wget
Disallow: /

User-agent: WebReaper
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: Offline Explorer
Disallow: /

User-agent: Teleport
Disallow: /

User-agent: TeleportPro
Disallow: /

User-agent: WebStripper
Disallow: /

User-agent: NetAnts
Disallow: /

User-agent: FlashGet
Disallow: /

User-agent: GetRight
Disallow: /

User-agent: GetWeb!
Disallow: /

User-agent: Go!Zilla
Disallow: /

User-agent: Go-Ahead-Got-It
Disallow: /

User-agent: GrabNet
Disallow: /

User-agent: TurnitinBot
Disallow: /

# Block Image Scrapers
User-agent: ImageWalker
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: WebPictures
Disallow: /

User-agent: WebImages
Disallow: /

# Block Email Harvesters
User-agent: EmailSiphon
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: CherryPicker
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: WebEnhancer
Disallow: /

User-agent: WebmasterWorldForumBot
Disallow: /

User-agent: SpankBot
Disallow: /

# Block Vulnerability Scanners
User-agent: Nikto
Disallow: /

User-agent: sqlmap
Disallow: /

User-agent: w3af
Disallow: /

User-agent: skipfish
Disallow: /

User-agent: Nessus
Disallow: /

User-agent: OpenVAS
Disallow: /

User-agent: Acunetix
Disallow: /

User-agent: Netsparker
Disallow: /

User-agent: AppScan
Disallow: /

User-agent: Burp
Disallow: /

User-agent: OWASP
Disallow: /

User-agent: ZAP
Disallow: /

# ====================================================================
# SECURITY RESTRICTIONS - PROTECT SENSITIVE AREAS
# ====================================================================

# Block access to sensitive directories
Disallow: /logs/
Disallow: /includes/
Disallow: /admin/
Disallow: /config/
Disallow: /private/
Disallow: /restricted/
Disallow: /error/
Disallow: /tmp/
Disallow: /temp/
Disallow: /cache/
Disallow: /backup/
Disallow: /backups/
Disallow: /.git/
Disallow: /.svn/
Disallow: /.env
Disallow: /.htaccess
Disallow: /web.config
Disallow: /composer.json
Disallow: /composer.lock
Disallow: /package.json
Disallow: /package-lock.json
Disallow: /yarn.lock
Disallow: /gulpfile.js
Disallow: /webpack.config.js
Disallow: /Gruntfile.js

# Block sensitive file types
Disallow: /*.log$
Disallow: /*.sql$
Disallow: /*.bak$
Disallow: /*.old$
Disallow: /*.orig$
Disallow: /*.tmp$
Disallow: /*.temp$
Disallow: /*.swp$
Disallow: /*.swo$
Disallow: /*.DS_Store$
Disallow: /*~$

# Block streaming URLs from direct crawling (prevent bandwidth issues)
Disallow: /stream/
Disallow: /*embed*
Disallow: /*player*
Disallow: /*streaming*

# Block query parameters that might cause issues
Disallow: /*?*utm_*
Disallow: /*?*session*
Disallow: /*?*token*
Disallow: /*?*key*
Disallow: /*?*password*
Disallow: /*?*admin*

# ====================================================================
# ALLOW IMPORTANT ASSETS - MAXIMIZE SEO VALUE
# ====================================================================

# Allow critical assets for SEO
Allow: /assets/
Allow: /assets/css/
Allow: /assets/js/
Allow: /assets/image/
Allow: /assets/image/thumbnail/
Allow: /assets/fonts/
Allow: /favicon.ico
Allow: /favicon.svg
Allow: /apple-touch-icon.png
Allow: /manifest.json
Allow: /browserconfig.xml
Allow: /sitemap.xml
Allow: /sitemap*.xml
Allow: /robots.txt
Allow: /feed.xml
Allow: /atom.xml
Allow: /feed.json

# Allow OpenGraph images
Allow: /assets/image/og-*

# Allow social media assets
Allow: /assets/image/social/

# Allow structured data files
Allow: /assets/schema/

# ====================================================================
# CRAWL OPTIMIZATION - MAXIMUM EFFICIENCY
# ====================================================================

# Optimize crawl budget for high-value content
# Priority URLs (fastest crawling)
User-agent: *
Allow: /
Allow: /about-us.php
Allow: /contact-us.php
Allow: /#cctv
Allow: /#wifi
Allow: /#peta
Allow: /location/

# ====================================================================
# SITEMAPS - COMPREHENSIVE DISCOVERY
# ====================================================================

# Main sitemap index
Sitemap: https://cctv.pematangsiantar.go.id/sitemap.xml

# Specialized sitemaps
Sitemap: https://cctv.pematangsiantar.go.id/sitemap-images.xml
Sitemap: https://cctv.pematangsiantar.go.id/sitemap-videos.xml
Sitemap: https://cctv.pematangsiantar.go.id/sitemap-news.xml
Sitemap: https://cctv.pematangsiantar.go.id/sitemap-locations.xml
Sitemap: https://cctv.pematangsiantar.go.id/sitemap-pages.xml

# Language-specific sitemaps
Sitemap: https://cctv.pematangsiantar.go.id/sitemap-id.xml
Sitemap: https://cctv.pematangsiantar.go.id/sitemap-en.xml

# Real-time sitemap
Sitemap: https://cctv.pematangsiantar.go.id/sitemap-realtime.xml

# ====================================================================
# HOST DIRECTIVE - PREFERRED DOMAIN
# ====================================================================

# Canonical domain preference
Host: https://cctv.pematangsiantar.go.id

# ====================================================================
# ADDITIONAL DIRECTIVES - ADVANCED OPTIMIZATION
# ====================================================================

# Clean URLs preference
User-agent: *
Disallow: /*.php$
Allow: /index.php
Allow: /about-us.php
Allow: /contact-us.php

# Mobile optimization
User-agent: Googlebot-Mobile
Allow: /
Crawl-delay: 0.5

# AMP optimization (if implemented)
User-agent: *
Allow: /amp/

# PWA optimization
User-agent: *
Allow: /sw.js
Allow: /manifest.json

# ====================================================================
# PERFORMANCE OPTIMIZATION
# ====================================================================

# Prevent crawling of heavy resources during peak hours
# Crawl-delay adjustments based on server load
User-agent: *
Crawl-delay: 1

# Weekend optimization (lower traffic)
# User-agent: *
# Crawl-delay: 0.5

# ====================================================================
# COMPLIANCE & LEGAL
# ====================================================================

# Government transparency requirements
Allow: /transparency/
Allow: /public-records/
Allow: /government-data/

# Accessibility compliance
Allow: /accessibility/
Allow: /wcag/

# Privacy compliance
Allow: /privacy-policy
Allow: /terms-of-service
Allow: /cookie-policy

# ====================================================================
# ANALYTICS & TRACKING
# ====================================================================

# Allow analytics tracking scripts
Allow: /analytics/
Allow: /tracking/
Allow: /stats/

# Block analytics from being crawled as content
Disallow: /analytics/*.js
Disallow: /tracking/*.js

# ====================================================================
# EMERGENCY DIRECTIVES
# ====================================================================

# Emergency access (uncomment during maintenance)
# User-agent: *
# Disallow: /
# Allow: /maintenance.html
# Allow: /emergency-contact.html

# ====================================================================
# END OF ULTIMATE ROBOTS.TXT
# Last Updated:
# For technical support: diskominfo@mail.pematangsiantar.go.id
# Website: https://cctv.pematangsiantar.go.id/
# Government: Pemerintah Kota Pematangsiantar
# ====================================================================