{"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045024276", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045024276, "node_id": "IC_kwDOBm6k_c4-Sc4U", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T19:01:42Z", "updated_at": "2022-02-18T19:55:24Z", "author_association": "OWNER", "body": "> Maybe I should use `-/` to encode forward slashes too, to defend against any ASGI servers that might not implement `raw_path` correctly.\r\n```python\r\ndef dash_encode(s):\r\n return s.replace(\"-\", \"--\").replace(\".\", \"-.\").replace(\"/\", \"-/\")\r\n\r\ndef dash_decode(s):\r\n return s.replace(\"-/\", \"/\").replace(\"-.\", \".\").replace(\"--\", \"-\")\r\n```\r\n\r\n```pycon\r\n>>> dash_encode(\"foo/bar/baz.csv\")\r\n'foo-/bar-/baz-.csv'\r\n>>> dash_decode('foo-/bar-/baz-.csv')\r\n'foo/bar/baz.csv'\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045027067", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045027067, "node_id": "IC_kwDOBm6k_c4-Sdj7", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T19:03:26Z", "updated_at": "2022-02-18T19:03:26Z", "author_association": "OWNER", "body": "(If I make this change it may break some existing Datasette installations when they upgrade - I could try and build a plugin for them which triggers on 404s and checks to see if the old format would return a 200 response, then returns that.)", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045032377", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045032377, "node_id": "IC_kwDOBm6k_c4-Se25", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T19:06:50Z", "updated_at": "2022-02-18T19:06:50Z", "author_association": "OWNER", "body": "How does URL routing for https://latest.datasette.io/fixtures/table%2Fwith%2Fslashes.csv work?\r\n\r\nRight now it's https://github.com/simonw/datasette/blob/7d24fd405f3c60e4c852c5d746c91aa2ba23cf5b/datasette/app.py#L1098-L1101\r\n\r\nThat's not going to capture the dot-dash encoding version of that table name:\r\n```pycon\r\n>>> dot_dash_encode(\"table/with/slashes.csv\")\r\n'table-/with-/slashes-.csv'\r\n```\r\nProbably needs a fancy regex trick like a negative lookbehind assertion or similar.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": 
{"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045055772", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045055772, "node_id": "IC_kwDOBm6k_c4-Skkc", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T19:23:33Z", "updated_at": "2022-02-18T19:25:42Z", "author_association": "OWNER", "body": "I want a match for this URL:\r\n\r\n /db/table-/with-/slashes-.csv\r\n\r\nMaybe this:\r\n\r\n ^/(?P[^/]+)/(?P([^/]*|(\\-/)*|(\\-\\.)*|(\\-\\-)*)*$)\r\n\r\nHere we are matching a sequence of:\r\n\r\n ([^/]*|(\\-/)*|(\\-\\.)*|(\\-\\-)*)*\r\n\r\nSo a combination of non-slash characters or `-/`, `-.` or `--` sequences.\r\n\r\n\"image\"\r\n\r\n ^/(?P[^/]+)/(?P([^/]*|(\\-/)*|(\\-\\.)*|(\\-\\-)*)*$)\r\n\r\nTry that with non-capturing bits:\r\n\r\n ^/(?P[^/]+)/(?P(?:[^/]*|(?:\\-/)*|(?:\\-\\.)*|(?:\\-\\-)*)*$)\r\n\r\n`(?:[^/]*|(?:\\-/)*|(?:\\-\\.)*|(?:\\-\\-)*)*` visualized is:\r\n\r\n\"image\"\r\n\r\nHere's the explanation on regex101.com https://regex101.com/r/CPnsIO/1\r\n\r\n\"image\"\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045059427", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045059427, "node_id": "IC_kwDOBm6k_c4-Sldj", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T19:26:25Z", "updated_at": "2022-02-18T19:26:25Z", "author_association": "OWNER", "body": "With this new pattern I could probably extract out the optional `.json` format string as part of the initial route capturing regex too, rather than the current `table_and_format` hack.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045075207", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045075207, "node_id": "IC_kwDOBm6k_c4-SpUH", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T19:39:35Z", "updated_at": "2022-02-18T19:40:13Z", "author_association": "OWNER", "body": "> And if for some horrific reason you had a table with the name `/db/table-.csv.csv` (so `/db/` was the first part of the actual table name in SQLite) the URLs would look like this:\r\n> \r\n> * `/db/%2Fdb%2Ftable---.csv-.csv` - the HTML version\r\n> * `/db/%2Fdb%2Ftable---.csv-.csv.csv` - the CSV version\r\n> * `/db/%2Fdb%2Ftable---.csv-.csv.json` - the JSON version\r\n\r\nHere's what those look like with the updated version of `dot_dash_encode()` that also encodes `/` as `-/`:\r\n\r\n- `/db/-/db-/table---.csv-.csv` - HTML\r\n- `/db/-/db-/table---.csv-.csv.csv` - CSV\r\n- `/db/-/db-/table---.csv-.csv.json` - JSON\r\n\r\n\"image\"\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, 
\"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045077590", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045077590, "node_id": "IC_kwDOBm6k_c4-Sp5W", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T19:41:37Z", "updated_at": "2022-02-18T19:42:41Z", "author_association": "OWNER", "body": "Ugh, one disadvantage I just spotted with this: Datasette already has a `/-/versions.json` convention where \"system\" URLs are namespaced under `/-/` - but that could be confused under this new scheme with the `-/` escaping sequence.\r\n\r\nAnd I've thought about adding `/db/-/special` and `/db/table/-/special` URLs in the past too.\r\n\r\nMaybe change this system to use `.` as the escaping character instead of `-`?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045081042", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045081042, "node_id": "IC_kwDOBm6k_c4-SqvS", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T19:44:12Z", "updated_at": "2022-02-18T19:51:34Z", "author_association": "OWNER", "body": "```python\r\ndef dot_encode(s):\r\n return s.replace(\".\", \"..\").replace(\"/\", \"./\")\r\n\r\ndef dot_decode(s):\r\n return s.replace(\"./\", \"/\").replace(\"..\", \".\")\r\n```\r\nNo need for hyphen encoding in this variant at all, which simplifies things a bit.\r\n\r\n(Update: this is flawed, see https://github.com/simonw/datasette/issues/1439#issuecomment-1045086033)", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045082891", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045082891, "node_id": "IC_kwDOBm6k_c4-SrML", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T19:45:32Z", "updated_at": "2022-02-18T19:45:32Z", "author_association": "OWNER", "body": "```pycon\r\n>>> dot_encode(\"/db/table-.csv.csv\")\r\n'./db./table-..csv..csv'\r\n>>> dot_decode('./db./table-..csv..csv')\r\n'/db/table-.csv.csv'\r\n```\r\nI worry that web servers might treat `./` in a special way though.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045086033", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045086033, "node_id": "IC_kwDOBm6k_c4-Sr9R", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T19:47:43Z", "updated_at": "2022-02-18T19:51:11Z", "author_association": "OWNER", "body": "- https://datasette.io/-/asgi-scope/db/./db./table-..csv..csv\r\n- https://til.simonwillison.net/-/asgi-scope/db/./db./table-..csv..csv\r\n\r\nDo both of those survive the round-trip to populate `raw_path` correctly?\r\n\r\nNo! In both cases the `/./` bit goes missing.\r\n\r\nIt looks like this might even be a client issue - `curl` shows me this:\r\n\r\n```\r\n~ % curl -vv -i 'https://datasette.io/-/asgi-scope/db/./db./table-..csv..csv'\r\n* Trying 216.239.32.21:443...\r\n* Connected to datasette.io (216.239.32.21) port 443 (#0)\r\n* ALPN, offering http/1.1\r\n* TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256\r\n* Server certificate: datasette.io\r\n* Server certificate: R3\r\n* Server certificate: ISRG Root X1\r\n> GET /-/asgi-scope/db/db./table-..csv..csv HTTP/1.1\r\n```\r\nSo `curl` decided to turn `/-/asgi-scope/db/./db./table` into `/-/asgi-scope/db/db./table` before even sending the request.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045095348", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045095348, "node_id": "IC_kwDOBm6k_c4-SuO0", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T19:53:48Z", "updated_at": "2022-02-18T19:53:48Z", "author_association": "OWNER", "body": "> Ugh, one disadvantage I just spotted with this: Datasette already has a `/-/versions.json` convention where \"system\" URLs are namespaced under `/-/` - but that could be confused under this new scheme with the `-/` escaping sequence.\r\n> \r\n> And I've thought about adding `/db/-/special` and `/db/table/-/special` URLs in the past too.\r\n\r\nI don't think this matters. The new regex does indeed capture that kind of page:\r\n\r\n\"image\"\r\n\r\nBut Datasette goes through configured route regular expressions in order - so I can have the regex that captures `/db/-/special` routes listed before the one that captures tables and formats.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045099290", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045099290, "node_id": "IC_kwDOBm6k_c4-SvMa", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T19:56:18Z", "updated_at": "2022-02-18T19:56:30Z", "author_association": "OWNER", "body": "> ```python\r\n> def dash_encode(s):\r\n> return s.replace(\"-\", \"--\").replace(\".\", \"-.\").replace(\"/\", \"-/\")\r\n> \r\n> def dash_decode(s):\r\n> return s.replace(\"-/\", \"/\").replace(\"-.\", \".\").replace(\"--\", \"-\")\r\n> ```\r\n\r\nI think **dash-encoding** (new name for this) is the right way forward here.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045108611", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045108611, "node_id": "IC_kwDOBm6k_c4-SxeD", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T20:02:19Z", "updated_at": "2022-02-18T20:08:34Z", "author_association": "OWNER", "body": "One other potential variant:\r\n```python\r\ndef dash_encode(s):\r\n return s.replace(\"-\", \"-dash-\").replace(\".\", \"-dot-\").replace(\"/\", \"-slash-\")\r\n\r\ndef dash_decode(s):\r\n return s.replace(\"-slash-\", \"/\").replace(\"-dot-\", \".\").replace(\"-dash-\", \"-\")\r\n```\r\nExcept this has bugs - it doesn't round-trip safely, because it gets confused by sequences like `-dash-slash-`: is that a `-dash-` or a `-slash-`?\r\n```pycon\r\n>>> dash_encode(\"/db/table-.csv.csv\")\r\n'-slash-db-slash-table-dash--dot-csv-dot-csv'\r\n>>> 
dash_decode('-slash-db-slash-table-dash--dot-csv-dot-csv')\r\n'/db/table-.csv.csv'\r\n>>> dash_encode('-slash-db-slash-table-dash--dot-csv-dot-csv')\r\n'-dash-slash-dash-db-dash-slash-dash-table-dash-dash-dash--dash-dot-dash-csv-dash-dot-dash-csv'\r\n>>> dash_decode('-dash-slash-dash-db-dash-slash-dash-table-dash-dash-dash--dash-dot-dash-csv-dash-dot-dash-csv')\r\n'-dash/dash-db-dash/dash-table-dash--dash.dash-csv-dash.dash-csv'\r\n```\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045111309", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045111309, "node_id": "IC_kwDOBm6k_c4-SyIN", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T20:04:24Z", "updated_at": "2022-02-18T20:05:40Z", "author_association": "OWNER", "body": "This made me worry that my current `dash_decode()` implementation had unknown round-trip bugs, but thankfully this works OK:\r\n```pycon\r\n>>> dash_encode(\"/db/table-.csv.csv\")\r\n'-/db-/table---.csv-.csv'\r\n>>> dash_encode('-/db-/table---.csv-.csv')\r\n'---/db---/table-------.csv---.csv'\r\n>>> dash_decode('---/db---/table-------.csv---.csv')\r\n'-/db-/table---.csv-.csv'\r\n>>> dash_decode('-/db-/table---.csv-.csv')\r\n'/db/table-.csv.csv'\r\n``` \r\nThe regex still works against that double-encoded example too:\r\n\r\n\"image\"\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045117304", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045117304, "node_id": "IC_kwDOBm6k_c4-Szl4", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T20:09:22Z", "updated_at": "2022-02-18T20:09:22Z", "author_association": "OWNER", "body": "Adopting this could result in supporting database files with surprising characters in their filename too.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045131086", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045131086, "node_id": "IC_kwDOBm6k_c4-S29O", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T20:22:13Z", "updated_at": "2022-02-18T20:22:47Z", "author_association": "OWNER", "body": "Should it encode `%` symbols too, since they have a special meaning in URLs and we can't guarantee that every single web server / proxy out there will round-trip them safely using percent-encoding? If so, I would need to pick a different encoding character for them. Maybe `%` becomes `-p` - and in that case `/` could become `-s` too.\r\n\r\nIs it worth expanding dash-encoding beyond just `/`, `-` and `.` though? Not sure.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045134050", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045134050, "node_id": "IC_kwDOBm6k_c4-S3ri", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T20:25:04Z", "updated_at": "2022-02-18T20:25:04Z", "author_association": "OWNER", "body": "Here's a useful modern spec for how existing URL percent-encoding is supposed to work: https://url.spec.whatwg.org/#percent-encoded-bytes", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045269544", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045269544, "node_id": "IC_kwDOBm6k_c4-TYwo", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T22:19:29Z", "updated_at": "2022-02-18T22:19:29Z", "author_association": "OWNER", "body": "Note that I've ruled out using `Accept: application/json` to return JSON because it turns out Cloudflare and potentially other CDNs ignore the `Vary: Accept` header entirely:\r\n- https://github.com/simonw/datasette/issues/1534", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null}