issues: 964322136
This data as json
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | pull_request | body | repo | type | active_lock_reason | performed_via_github_app | reactions | draft | state_reason |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
964322136 | MDU6SXNzdWU5NjQzMjIxMzY= | 1426 | Manage /robots.txt in Datasette core, block robots by default | 9599 | open | 0 | 9 | 2021-08-09T19:56:56Z | 2021-12-04T07:11:29Z | OWNER | See accompanying Twitter thread: https://twitter.com/simonw/status/1424820203603431439 > Datasette currently has a plugin for configuring robots.txt, but I'm beginning to think it should be part of core and crawlers should be blocked by default - having people explicitly opt-in to having their sites crawled and indexed feels a lot safer https://datasette.io/plugins/datasette-block-robots I have a lot of Datasettes deployed now, and tailing logs shows that they are being *hammered* by search engine crawlers even though many of them are not interesting enough to warrant indexing. I'm starting to think blocking crawlers would actually be a better default for most people, provided it was well documented and easy to understand how to allow them. Default-deny is usually a better policy than default-allow! | 107914493 | issue | {"url": "https://api.github.com/repos/simonw/datasette/issues/1426/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} |