issue_comments
48 rows where issue = 973139047
id ▼ | html_url | issue_url | node_id | user | created_at | updated_at | author_association | body | reactions | issue | performed_via_github_app |
---|---|---|---|---|---|---|---|---|---|---|---|
900699670 | https://github.com/simonw/datasette/issues/1439#issuecomment-900699670 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c41r5YW | simonw 9599 | 2021-08-17T23:34:23Z | 2021-08-17T23:34:23Z | OWNER | The challenge comes down to telling the difference between the following: - `/db/table` - an HTML table page - `/db/table.csv` - the CSV version of `/db/table` - `/db/table.csv` - no this one is actually a database table called `table.csv` - `/db/table.csv.csv` - the CSV version of `/db/table.csv` - `/db/table.csv.csv.csv` and so on... | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
900705226 | https://github.com/simonw/datasette/issues/1439#issuecomment-900705226 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c41r6vK | simonw 9599 | 2021-08-17T23:50:32Z | 2021-08-17T23:50:47Z | OWNER | An alternative solution would be to use some form of escaping for the characters that form the name of the table. The obvious way to do this would be URL-encoding - but it doesn't hold for `.` characters. The hex for that is `%2E` but watch what happens with that in a URL: ``` # Against Cloud Run: curl -s 'https://datasette.io/-/asgi-scope/foo/bar%2Fbaz%2E' | rg path 'path': '/-/asgi-scope/foo/bar/baz.', 'raw_path': b'/-/asgi-scope/foo/bar%2Fbaz.', 'root_path': '', # Against Vercel: curl -s 'https://til.simonwillison.net/-/asgi-scope/foo/bar%2Fbaz%2E' | rg path 'path': '/-/asgi-scope/foo/bar%2Fbaz%2E', 'raw_path': b'/-/asgi-scope/foo/bar%2Fbaz%2E', 'root_path': '', ``` Surprisingly in this case Vercel DOES keep it intact, but Cloud Run does not. It's still no good though: I need a solution that works on Vercel, Cloud Run and every other potential hosting provider too. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
900709703 | https://github.com/simonw/datasette/issues/1439#issuecomment-900709703 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c41r71H | simonw 9599 | 2021-08-18T00:03:09Z | 2021-08-18T00:03:09Z | OWNER | But... what if I invent my own escaping scheme? I actually did this once before, in https://github.com/simonw/datasette/commit/9fdb47ca952b93b7b60adddb965ea6642b1ff523 - while I was working on porting Datasette to ASGI in https://github.com/simonw/datasette/issues/272#issuecomment-494192779 because ASGI didn't yet have the `raw_path` mechanism. I could bring that back - it looked like this: ``` "table/and/slashes" => "tableU+002FandU+002Fslashes" "~table" => "U+007Etable" "+bobcats!" => "U+002Bbobcats!" "U+007Etable" => "UU+002B007Etable" ``` But I didn't particularly like it - it was quite verbose. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
900711967 | https://github.com/simonw/datasette/issues/1439#issuecomment-900711967 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c41r8Yf | simonw 9599 | 2021-08-18T00:08:09Z | 2021-08-18T00:08:09Z | OWNER | Here's an alternative I just made up which I'm calling "dot dash" encoding: ```python def dot_dash_encode(s): return s.replace("-", "--").replace(".", "-.") def dot_dash_decode(s): return s.replace("-.", ".").replace("--", "-") ``` And some examples: ```python for example in ( "hello", "hello.csv", "hello-and-so-on.csv", "hello-.csv", "hello--and--so--on-.csv", "hello.csv.", "hello.csv.-", "hello.csv.--", ): print(example) print(dot_dash_encode(example)) print(example == dot_dash_decode(dot_dash_encode(example))) print() ``` Outputs: ``` hello hello True hello.csv hello-.csv True hello-and-so-on.csv hello--and--so--on-.csv True hello-.csv hello---.csv True hello--and--so--on-.csv hello----and----so----on---.csv True hello.csv. hello-.csv-. True hello.csv.- hello-.csv-.-- True hello.csv.-- hello-.csv-.---- True ``` | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
900712981 | https://github.com/simonw/datasette/issues/1439#issuecomment-900712981 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c41r8oV | simonw 9599 | 2021-08-18T00:09:59Z | 2021-08-18T00:12:32Z | OWNER | So given the original examples, a table called `table.csv` would have the following URLs: - `/db/table-.csv` - the HTML version - `/db/table-.csv.csv` - the CSV version - `/db/table-.csv.json` - the JSON version And if for some horrific reason you had a table with the name `/db/table-.csv.csv` (so `/db/` was the first part of the actual table name in SQLite) the URLs would look like this: - `/db/%2Fdb%2Ftable---.csv-.csv` - the HTML version - `/db/%2Fdb%2Ftable---.csv-.csv.csv` - the CSV version - `/db/%2Fdb%2Ftable---.csv-.csv.json` - the JSON version | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
900714630 | https://github.com/simonw/datasette/issues/1439#issuecomment-900714630 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c41r9CG | simonw 9599 | 2021-08-18T00:13:33Z | 2021-08-18T00:13:33Z | OWNER | The documentation should definitely cover how table names become URLs, in case any third party code needs to be able to calculate this themselves. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
900715375 | https://github.com/simonw/datasette/issues/1439#issuecomment-900715375 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c41r9Nv | simonw 9599 | 2021-08-18T00:15:28Z | 2021-08-18T00:15:28Z | OWNER | Maybe I should use `-/` to encode forward slashes too, to defend against any ASGI servers that might not implement `raw_path` correctly. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1031141849 | https://github.com/simonw/datasette/issues/1439#issuecomment-1031141849 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c49dfnZ | simonw 9599 | 2022-02-07T07:11:11Z | 2022-02-07T07:11:11Z | OWNER | I added a Link header to solve this problem for the JSON version in: - #1533 | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1045024276 | https://github.com/simonw/datasette/issues/1439#issuecomment-1045024276 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4-Sc4U | simonw 9599 | 2022-02-18T19:01:42Z | 2022-02-18T19:55:24Z | OWNER | > Maybe I should use `-/` to encode forward slashes too, to defend against any ASGI servers that might not implement `raw_path` correctly. ```python def dash_encode(s): return s.replace("-", "--").replace(".", "-.").replace("/", "-/") def dash_decode(s): return s.replace("-/", "/").replace("-.", ".").replace("--", "-") ``` ```pycon >>> dash_encode("foo/bar/baz.csv") 'foo-/bar-/baz-.csv' >>> dash_decode('foo-/bar-/baz-.csv') 'foo/bar/baz.csv' ``` | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1045027067 | https://github.com/simonw/datasette/issues/1439#issuecomment-1045027067 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4-Sdj7 | simonw 9599 | 2022-02-18T19:03:26Z | 2022-02-18T19:03:26Z | OWNER | (If I make this change it may break some existing Datasette installations when they upgrade - I could try and build a plugin for them which triggers on 404s and checks to see if the old format would return a 200 response, then returns that.) | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1045032377 | https://github.com/simonw/datasette/issues/1439#issuecomment-1045032377 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4-Se25 | simonw 9599 | 2022-02-18T19:06:50Z | 2022-02-18T19:06:50Z | OWNER | How does URL routing for https://latest.datasette.io/fixtures/table%2Fwith%2Fslashes.csv work? Right now it's https://github.com/simonw/datasette/blob/7d24fd405f3c60e4c852c5d746c91aa2ba23cf5b/datasette/app.py#L1098-L1101 That's not going to capture the dot-dash encoding version of that table name: ```pycon >>> dot_dash_encode("table/with/slashes.csv") 'table-/with-/slashes-.csv' ``` Probably needs a fancy regex trick like a negative lookbehind assertion or similar. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1045055772 | https://github.com/simonw/datasette/issues/1439#issuecomment-1045055772 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4-Skkc | simonw 9599 | 2022-02-18T19:23:33Z | 2022-02-18T19:25:42Z | OWNER | I want a match for this URL: /db/table-/with-/slashes-.csv Maybe this: ^/(?P<db_name>[^/]+)/(?P<table_and_format>([^/]*|(\-/)*|(\-\.)*|(\-\-)*)*$) Here we are matching a sequence of: ([^/]*|(\-/)*|(\-\.)*|(\-\-)*)* So a combination of not-slashes OR -/ or -. or -- sequences <img width="224" alt="image" src="https://user-images.githubusercontent.com/9599/154748362-84909d4e-dccf-454b-a9cd-a036f9f66f09.png"> ^/(?P<db_name>[^/]+)/(?P<table_and_format>([^/]*|(\-/)*|(\-\.)*|(\-\-)*)*$) Try that with non-capturing bits: ^/(?P<db_name>[^/]+)/(?P<table_and_format>(?:[^/]*|(?:\-/)*|(?:\-\.)*|(?:\-\-)*)*$) `(?:[^/]*|(?:\-/)*|(?:\-\.)*|(?:\-\-)*)*` visualized is: <img width="193" alt="image" src="https://user-images.githubusercontent.com/9599/154748441-decea502-0d04-44f4-9ca9-fb6883767833.png"> Here's the explanation on regex101.com https://regex101.com/r/CPnsIO/1 <img width="1074" alt="image" src="https://user-images.githubusercontent.com/9599/154748720-cdda61db-5498-49a8-91c2-e726b394fa49.png"> | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1045059427 | https://github.com/simonw/datasette/issues/1439#issuecomment-1045059427 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4-Sldj | simonw 9599 | 2022-02-18T19:26:25Z | 2022-02-18T19:26:25Z | OWNER | With this new pattern I could probably extract out the optional `.json` format string as part of the initial route capturing regex too, rather than the current `table_and_format` hack. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1045069481 | https://github.com/simonw/datasette/issues/1439#issuecomment-1045069481 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4-Sn6p | simonw 9599 | 2022-02-18T19:34:41Z | 2022-03-05T21:32:22Z | OWNER | I think I got format extraction working! https://regex101.com/r/A0bW1D/1 ^/(?P<database>[^/]+)/(?P<table>(?:[^\/\-\.]*|(?:\-/)*|(?:\-\.)*|(?:\-\-)*)*?)(?:(?<!\-)\.(?P<format>\w+))?$ I had to make that crazy inner one even more complicated to stop it from capturing `.` that was not part of `-.`. (?:[^\/\-\.]*|(?:\-/)*|(?:\-\.)*|(?:\-\-)*)* Visualized: <img width="222" alt="image" src="https://user-images.githubusercontent.com/9599/154749714-44579899-5dc7-4e5f-ad4f-dc59dac48979.png"> So now I have a regex which can extract out the dot-encoded table name AND spot if there is an optional `.format` at the end: <img width="1090" alt="image" src="https://user-images.githubusercontent.com/9599/156900484-7912073f-28aa-4301-86e2-e5cbe625e1d5.png"> If I end up using this in Datasette it's going to need VERY comprehensive unit tests and inline documentation. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1045075207 | https://github.com/simonw/datasette/issues/1439#issuecomment-1045075207 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4-SpUH | simonw 9599 | 2022-02-18T19:39:35Z | 2022-02-18T19:40:13Z | OWNER | > And if for some horrific reason you had a table with the name `/db/table-.csv.csv` (so `/db/` was the first part of the actual table name in SQLite) the URLs would look like this: > > * `/db/%2Fdb%2Ftable---.csv-.csv` - the HTML version > * `/db/%2Fdb%2Ftable---.csv-.csv.csv` - the CSV version > * `/db/%2Fdb%2Ftable---.csv-.csv.json` - the JSON version Here's what those look like with the updated version of `dot_dash_encode()` that also encodes `/` as `-/`: - `/db/-/db-/table---.csv-.csv` - HTML - `/db/-/db-/table---.csv-.csv.csv` - CSV - `/db/-/db-/table---.csv-.csv.json` - JSON <img width="1050" alt="image" src="https://user-images.githubusercontent.com/9599/154750631-a8a23c62-3dfc-43e4-8026-4d117dc4bf8d.png"> | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1045077590 | https://github.com/simonw/datasette/issues/1439#issuecomment-1045077590 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4-Sp5W | simonw 9599 | 2022-02-18T19:41:37Z | 2022-02-18T19:42:41Z | OWNER | Ugh, one disadvantage I just spotted with this: Datasette already has a `/-/versions.json` convention where "system" URLs are namespaced under `/-/` - but that could be confused under this new scheme with the `-/` escaping sequence. And I've thought about adding `/db/-/special` and `/db/table/-/special` URLs in the past too. Maybe change this system to use `.` as the escaping character instead of `-`? | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1045081042 | https://github.com/simonw/datasette/issues/1439#issuecomment-1045081042 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4-SqvS | simonw 9599 | 2022-02-18T19:44:12Z | 2022-02-18T19:51:34Z | OWNER | ```python def dot_encode(s): return s.replace(".", "..").replace("/", "./") def dot_decode(s): return s.replace("./", "/").replace("..", ".") ``` No need for hyphen encoding in this variant at all, which simplifies things a bit. (Update: this is flawed, see https://github.com/simonw/datasette/issues/1439#issuecomment-1045086033) | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1045082891 | https://github.com/simonw/datasette/issues/1439#issuecomment-1045082891 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4-SrML | simonw 9599 | 2022-02-18T19:45:32Z | 2022-02-18T19:45:32Z | OWNER | ```pycon >>> dot_encode("/db/table-.csv.csv") './db./table-..csv..csv' >>> dot_decode('./db./table-..csv..csv') '/db/table-.csv.csv' ``` I worry that web servers might treat `./` in a special way though. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1045086033 | https://github.com/simonw/datasette/issues/1439#issuecomment-1045086033 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4-Sr9R | simonw 9599 | 2022-02-18T19:47:43Z | 2022-02-18T19:51:11Z | OWNER | - https://datasette.io/-/asgi-scope/db/./db./table-..csv..csv - https://til.simonwillison.net/-/asgi-scope/db/./db./table-..csv..csv Do both of those survive the round-trip to populate `raw_path` correctly? No! In both cases the `/./` bit goes missing. It looks like this might even be a client issue - `curl` shows me this: ``` ~ % curl -vv -i 'https://datasette.io/-/asgi-scope/db/./db./table-..csv..csv' * Trying 216.239.32.21:443... * Connected to datasette.io (216.239.32.21) port 443 (#0) * ALPN, offering http/1.1 * TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 * Server certificate: datasette.io * Server certificate: R3 * Server certificate: ISRG Root X1 > GET /-/asgi-scope/db/db./table-..csv..csv HTTP/1.1 ``` So `curl` decided to turn `/-/asgi-scope/db/./db./table` into `/-/asgi-scope/db/db./table` before even sending the request. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1045095348 | https://github.com/simonw/datasette/issues/1439#issuecomment-1045095348 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4-SuO0 | simonw 9599 | 2022-02-18T19:53:48Z | 2022-02-18T19:53:48Z | OWNER | > Ugh, one disadvantage I just spotted with this: Datasette already has a `/-/versions.json` convention where "system" URLs are namespaced under `/-/` - but that could be confused under this new scheme with the `-/` escaping sequence. > > And I've thought about adding `/db/-/special` and `/db/table/-/special` URLs in the past too. I don't think this matters. The new regex does indeed capture that kind of page: <img width="1052" alt="image" src="https://user-images.githubusercontent.com/9599/154752309-e1787755-3bdb-47c2-867c-7ac5fe65664d.png"> But Datasette goes through configured route regular expressions in order - so I can have the regex that captures `/db/-/special` routes listed before the one that captures tables and formats. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1045099290 | https://github.com/simonw/datasette/issues/1439#issuecomment-1045099290 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4-SvMa | simonw 9599 | 2022-02-18T19:56:18Z | 2022-02-18T19:56:30Z | OWNER | > ```python > def dash_encode(s): > return s.replace("-", "--").replace(".", "-.").replace("/", "-/") > > def dash_decode(s): > return s.replace("-/", "/").replace("-.", ".").replace("--", "-") > ``` I think **dash-encoding** (new name for this) is the right way forward here. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1045108611 | https://github.com/simonw/datasette/issues/1439#issuecomment-1045108611 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4-SxeD | simonw 9599 | 2022-02-18T20:02:19Z | 2022-02-18T20:08:34Z | OWNER | One other potential variant: ```python def dash_encode(s): return s.replace("-", "-dash-").replace(".", "-dot-").replace("/", "-slash-") def dash_decode(s): return s.replace("-slash-", "/").replace("-dot-", ".").replace("-dash-", "-") ``` Except this has bugs - it doesn't round-trip safely, because it can get confused about things like `-dash-slash-` in terms of is that a `-dash-` or a `-slash-`? ```pycon >>> dash_encode("/db/table-.csv.csv") '-slash-db-slash-table-dash--dot-csv-dot-csv' >>> dash_decode('-slash-db-slash-table-dash--dot-csv-dot-csv') '/db/table-.csv.csv' >>> dash_encode('-slash-db-slash-table-dash--dot-csv-dot-csv') '-dash-slash-dash-db-dash-slash-dash-table-dash-dash-dash--dash-dot-dash-csv-dash-dot-dash-csv' >>> dash_decode('-dash-slash-dash-db-dash-slash-dash-table-dash-dash-dash--dash-dot-dash-csv-dash-dot-dash-csv') '-dash/dash-db-dash/dash-table-dash--dash.dash-csv-dash.dash-csv' ``` | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1045111309 | https://github.com/simonw/datasette/issues/1439#issuecomment-1045111309 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4-SyIN | simonw 9599 | 2022-02-18T20:04:24Z | 2022-02-18T20:05:40Z | OWNER | This made me worry that my current `dash_decode()` implementation had unknown round-trip bugs, but thankfully this works OK: ```pycon >>> dash_encode("/db/table-.csv.csv") '-/db-/table---.csv-.csv' >>> dash_encode('-/db-/table---.csv-.csv') '---/db---/table-------.csv---.csv' >>> dash_decode('---/db---/table-------.csv---.csv') '-/db-/table---.csv-.csv' >>> dash_decode('-/db-/table---.csv-.csv') '/db/table-.csv.csv' ``` The regex still works against that double-encoded example too: <img width="1032" alt="image" src="https://user-images.githubusercontent.com/9599/154753916-b7d2159e-4284-4c92-ae61-110671fa320e.png"> | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1045117304 | https://github.com/simonw/datasette/issues/1439#issuecomment-1045117304 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4-Szl4 | simonw 9599 | 2022-02-18T20:09:22Z | 2022-02-18T20:09:22Z | OWNER | Adopting this could result in supporting database files with surprising characters in their filename too. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1045131086 | https://github.com/simonw/datasette/issues/1439#issuecomment-1045131086 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4-S29O | simonw 9599 | 2022-02-18T20:22:13Z | 2022-02-18T20:22:47Z | OWNER | Should it encode `%` symbols too, since they have a special meaning in URLs and we can't guarantee that every single web server / proxy out there will round-trip them safely using percentage encoding? If so, would need to pick a different encoding character for them. Maybe `%` becomes `-p` - and in that case `/` could become `-s` too. Is it worth expanding dash-encoding outside of just `/` and `-` and `.` though? Not sure. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1045134050 | https://github.com/simonw/datasette/issues/1439#issuecomment-1045134050 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4-S3ri | simonw 9599 | 2022-02-18T20:25:04Z | 2022-02-18T20:25:04Z | OWNER | Here's a useful modern spec for how existing URL percentage encoding is supposed to work: https://url.spec.whatwg.org/#percent-encoded-bytes | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1045269544 | https://github.com/simonw/datasette/issues/1439#issuecomment-1045269544 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4-TYwo | simonw 9599 | 2022-02-18T22:19:29Z | 2022-02-18T22:19:29Z | OWNER | Note that I've ruled out using `Accept: application/json` to return JSON because it turns out Cloudflare and potentially other CDNs ignore the `Vary: Accept` header entirely: - https://github.com/simonw/datasette/issues/1534 | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1049114724 | https://github.com/simonw/datasette/issues/1439#issuecomment-1049114724 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4-iDhk | simonw 9599 | 2022-02-23T19:04:40Z | 2022-02-23T19:04:40Z | OWNER | I'm going to try dash encoding for table names (and row IDs) in a branch and see how I like it. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1049124390 | https://github.com/simonw/datasette/issues/1439#issuecomment-1049124390 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4-iF4m | simonw 9599 | 2022-02-23T19:15:00Z | 2022-02-23T19:15:00Z | OWNER | I'll start by modifying this function: https://github.com/simonw/datasette/blob/458f03ad3a454d271f47a643f4530bd8b60ddb76/datasette/utils/__init__.py#L732-L749 Later I want to move this to the routing layer to split out `format` automatically, as seen in the regexes here: https://github.com/simonw/datasette/issues/1439#issuecomment-1045069481 | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1049126151 | https://github.com/simonw/datasette/issues/1439#issuecomment-1049126151 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4-iGUH | simonw 9599 | 2022-02-23T19:17:01Z | 2022-02-23T19:17:01Z | OWNER | Actually the relevant code looks to be: https://github.com/simonw/datasette/blob/7d24fd405f3c60e4c852c5d746c91aa2ba23cf5b/datasette/views/base.py#L481-L498 | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1053973425 | https://github.com/simonw/datasette/issues/1439#issuecomment-1053973425 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4-0lux | simonw 9599 | 2022-02-28T07:40:12Z | 2022-02-28T07:40:12Z | OWNER | If I make this change it will break existing links to one of the oldest Datasette demos: http://fivethirtyeight.datasettes.com/fivethirtyeight/avengers%2Favengers A plugin that fixes those by redirecting them on 404 would be neat. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1059802318 | https://github.com/simonw/datasette/issues/1439#issuecomment-1059802318 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4_K0zO | simonw 9599 | 2022-03-05T17:34:33Z | 2022-03-05T17:34:33Z | OWNER | Wrote documentation: <img width="741" alt="Dash encoding. Datasette uses a custom encoding scheme in some places, called dash encoding. This is primarily used for table names and row primary keys, to avoid any confusion between / characters in those values and the Datasette URL that references them. Dash encoding applies the following rules, in order: 1. All single - characters are replaced by -- 2. . characters are replaced by -. 3. / characters are replaced by -/ These rules are applied in reverse order to decode a dash encoded string." src="https://user-images.githubusercontent.com/9599/156893903-5723f60e-e054-4365-84bc-f3084d11183d.png"> | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1059822151 | https://github.com/simonw/datasette/issues/1439#issuecomment-1059822151 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4_K5pH | simonw 9599 | 2022-03-05T19:48:35Z | 2022-03-05T19:48:35Z | OWNER | Those new docs: https://github.com/simonw/datasette/blob/d1cb73180b4b5a07538380db76298618a5fc46b6/docs/internals.rst#dash-encoding | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1059822391 | https://github.com/simonw/datasette/issues/1439#issuecomment-1059822391 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4_K5s3 | simonw 9599 | 2022-03-05T19:50:12Z | 2022-03-05T19:50:12Z | OWNER | I'm going to move this work to a PR. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1059836599 | https://github.com/simonw/datasette/issues/1439#issuecomment-1059836599 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4_K9K3 | simonw 9599 | 2022-03-05T21:52:10Z | 2022-03-05T21:52:10Z | OWNER | Blogged about this here: https://simonwillison.net/2022/Mar/5/dash-encoding/ | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1059850369 | https://github.com/simonw/datasette/issues/1439#issuecomment-1059850369 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4_LAiB | simonw 9599 | 2022-03-05T23:28:56Z | 2022-03-05T23:28:56Z | OWNER | Lots of great conversations about the dash encoding implementation on Twitter: https://twitter.com/simonw/status/1500228316309061633 @dracos helped me figure out a simpler regex: https://twitter.com/dracos/status/1500236433809973248 `^/(?P<database>[^/]+)/(?P<table>[^\/\-\.]*|\-/|\-\.|\-\-)*(?P<format>\.\w+)?$`  | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1059851259 | https://github.com/simonw/datasette/issues/1439#issuecomment-1059851259 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4_LAv7 | simonw 9599 | 2022-03-05T23:35:47Z | 2022-03-05T23:35:59Z | OWNER | This [comment from glyph](https://twitter.com/glyph/status/1500244937312329730) got me thinking: > Have you considered replacing % with some other character and then using percent-encoding? What happens if a table name includes a `%` character and that ends up getting mangled by a misbehaving proxy? I should consider `%` in the escaping system too. And maybe go with that suggestion of using percent-encoding directly but with a different character. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1059853526 | https://github.com/simonw/datasette/issues/1439#issuecomment-1059853526 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4_LBTW | simonw 9599 | 2022-03-05T23:49:59Z | 2022-03-05T23:49:59Z | OWNER | I want to try regular percentage encoding, except that it also encodes both the `-` and the `.` characters, AND it uses `-` instead of `%` as the encoding character. Should check what it does with emoji too. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1059854864 | https://github.com/simonw/datasette/issues/1439#issuecomment-1059854864 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4_LBoQ | simonw 9599 | 2022-03-05T23:59:05Z | 2022-03-05T23:59:05Z | OWNER | OK, for that percentage thing: the Python core implementation of URL percentage escaping deliberately ignores two of the characters we want to escape: `.` and `-`: https://github.com/python/cpython/blob/6927632492cbad86a250aa006c1847e03b03e70b/Lib/urllib/parse.py#L780-L783 ```python _ALWAYS_SAFE = frozenset(b'ABCDEFGHIJKLMNOPQRSTUVWXYZ' b'abcdefghijklmnopqrstuvwxyz' b'0123456789' b'_.-~') ``` It also defaults to skipping `/` (passed as a `safe=` parameter to various things). I'm going to try borrowing and modifying the core of the Python implementation: https://github.com/python/cpython/blob/6927632492cbad86a250aa006c1847e03b03e70b/Lib/urllib/parse.py#L795-L814 ```python class _Quoter(dict): """A mapping from bytes numbers (in range(0,256)) to strings. String values are percent-encoded byte values, unless the key < 128, and in either of the specified safe set, or the always safe set. """ # Keeps a cache internally, via __missing__, for efficiency (lookups # of cached keys don't call Python code at all). def __init__(self, safe): """safe: bytes object.""" self.safe = _ALWAYS_SAFE.union(safe) def __repr__(self): return f"<Quoter {dict(self)!r}>" def __missing__(self, b): # Handle a cache miss. Store quoted string in cache and return. res = chr(b) if b in self.safe else '%{:02X}'.format(b) self[b] = res return res ``` | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1059855418 | https://github.com/simonw/datasette/issues/1439#issuecomment-1059855418 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4_LBw6 | simonw 9599 | 2022-03-06T00:00:53Z | 2022-03-06T00:04:18Z | OWNER | ```python _ESCAPE_SAFE = frozenset( b'ABCDEFGHIJKLMNOPQRSTUVWXYZ' b'abcdefghijklmnopqrstuvwxyz' b'0123456789_' ) # I removed b'.-~') class Quoter(dict): # Keeps a cache internally, via __missing__ def __missing__(self, b): # Handle a cache miss. Store quoted string in cache and return. res = chr(b) if b in _ESCAPE_SAFE else '-{:02X}'.format(b) self[b] = res return res quoter = Quoter().__getitem__ ''.join([quoter(char) for char in b'foo/bar.csv']) # 'foo-2Fbar-2Ecsv' ``` | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1059863997 | https://github.com/simonw/datasette/issues/1439#issuecomment-1059863997 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4_LD29 | karlcow 505230 | 2022-03-06T00:57:57Z | 2022-03-06T00:57:57Z | NONE | Probably too late… but I have just seen this because http://simonwillison.net/2022/Mar/5/dash-encoding/#atom-everything And it reminded me of comma tools at W3C. http://www.w3.org/,tools Example, the text version of W3C homepage https://www.w3.org/,text > The challenge comes down to telling the difference between the following: > > * `/db/table` - an HTML table page `/db/table` > * `/db/table.csv` - the CSV version of `/db/table` `/db/table,csv` > * `/db/table.csv` - no this one is actually a database table called `table.csv` `/db/table.csv` > * `/db/table.csv.csv` - the CSV version of `/db/table.csv` `/db/table.csv,csv` > * `/db/table.csv.csv.csv` and so on... `/db/table.csv.csv,csv` I haven't checked all the cases in the thread. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1059864154 | https://github.com/simonw/datasette/issues/1439#issuecomment-1059864154 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4_LD5a | simonw 9599 | 2022-03-06T00:59:04Z | 2022-03-06T00:59:04Z | OWNER | Needs more testing, but this seems to work for decoding the percent-escaped-with-dashes format: `urllib.parse.unquote(s.replace('-', '%'))` | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1059903309 | https://github.com/simonw/datasette/issues/1439#issuecomment-1059903309 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4_LNdN | simonw 9599 | 2022-03-06T06:17:51Z | 2022-03-06T06:17:51Z | OWNER | Suggestion from a conversation with Seth Michael Larson: it would be neat if plugins could easily integrate with whatever scheme this ends up using, maybe with the `/db/table/-/plugin-name` standardized pattern or similar. Making it easy for plugins to do the right, consistent thing is a good idea. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1060044007 | https://github.com/simonw/datasette/issues/1439#issuecomment-1060044007 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4_Lvzn | simonw 9599 | 2022-03-06T21:38:15Z | 2022-03-06T21:38:15Z | OWNER | Test: https://github.com/simonw/datasette/blob/d2e3fe3facf0ed0abf8b00cd54463af90dd6904d/tests/test_utils.py#L651-L666 One big advantage to this scheme is that redirecting old links to `%2F` pages (e.g. https://fivethirtyeight.datasettes.com/fivethirtyeight/twitter-ratio%2Fsenators) is easy - if you see a `%` in the `raw_path`, redirect to that page with the `%` replaced by `-`. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1060870237 | https://github.com/simonw/datasette/issues/1439#issuecomment-1060870237 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4_O5hd | simonw 9599 | 2022-03-07T16:19:22Z | 2022-03-07T16:19:22Z | OWNER | I didn't need to do any of the fancy regular expression routing stuff after all, since the new dash encoding format avoids using `/` so a simple `[^/]+` can capture the correct segments from the URL. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1065987808 | https://github.com/simonw/datasette/issues/1439#issuecomment-1065987808 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4_ia7g | simonw 9599 | 2022-03-13T00:02:32Z | 2022-03-13T00:02:32Z | OWNER | OK, this has broken a lot more than I expected it would. Turns out `-` is a very common character in existing Datasette database names! https://datasette.io/-/databases for example has two: ```json [ { "name": "docs-index", "path": "docs-index.db", "size": 1007616, "is_mutable": false, "is_memory": false, "hash": "0ac6c3de2762fcd174fd249fed8a8fa6046ea345173d22c2766186bf336462b2" }, { "name": "dogsheep-index", "path": "dogsheep-index.db", "size": 5496832, "is_mutable": false, "is_memory": false, "hash": "d1ea238d204e5b9ae783c86e4af5bcdf21267c1f391de3e468d9665494ee012a" } ] ``` | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1065988403 | https://github.com/simonw/datasette/issues/1439#issuecomment-1065988403 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4_ibEz | simonw 9599 | 2022-03-13T00:06:38Z | 2022-03-13T00:07:19Z | OWNER | If I want to reserve `-` as a character that CAN be used in URLs, the only remaining character that might make sense for escape sequences is `~` - based on this last line of characters that are exempt from percent-encoding: ```python _ALWAYS_SAFE = frozenset(b'ABCDEFGHIJKLMNOPQRSTUVWXYZ' b'abcdefghijklmnopqrstuvwxyz' b'0123456789' b'_.-~') ``` So I'd add both `-` and `_` back to the safe list, but use `~` to escape `.` and `/` and suchlike. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 | |
1068461449 | https://github.com/simonw/datasette/issues/1439#issuecomment-1068461449 | https://api.github.com/repos/simonw/datasette/issues/1439 | IC_kwDOBm6k_c4_r22J | simonw 9599 | 2022-03-15T20:51:26Z | 2022-03-15T20:51:26Z | OWNER | I'm happy with this now that I've landed Tilde encoding in #1657. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047 |
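The encoding discussed in the comments above can be sketched as a round trip. This is a minimal illustration under stated assumptions, not the code that landed in #1657: the function names `escape_encode` and `escape_decode`, the `escape` and `safe` parameters, and the `TILDE_SAFE` set are hypothetical. The safe set is the `_ESCAPE_SAFE` set from the thread, and decoding uses the `urllib.parse.unquote(s.replace('-', '%'))` approach suggested there.

```python
import urllib.parse

# Safe set taken from the thread: letters, digits and underscore only.
_ESCAPE_SAFE = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789_"
)

# Assumption: for the tilde proposal, "-" goes back into the safe set.
TILDE_SAFE = _ESCAPE_SAFE | frozenset(b"-")


def escape_encode(s, escape="-", safe=_ESCAPE_SAFE):
    """Percent-style encoding of the UTF-8 bytes of s, using `escape` instead of '%'."""
    return "".join(
        chr(b) if b in safe else "{}{:02X}".format(escape, b)
        for b in s.encode("utf-8")
    )


def escape_decode(s, escape="-"):
    """Reverse escape_encode by swapping the escape character back to '%'."""
    return urllib.parse.unquote(s.replace(escape, "%"))


print(escape_encode("foo/bar.csv"))                   # foo-2Fbar-2Ecsv
print(escape_encode("docs-index"))                    # docs-2Dindex
print(escape_encode("docs-index", "~", TILDE_SAFE))   # docs-index
print(escape_decode("foo-2Fbar-2Ecsv"))               # foo/bar.csv
```

The second and third calls show the trade-off raised in the thread: with `-` as the escape character, a common database name such as `docs-index` changes in the URL, whereas with `~` as the escape character and `-` treated as safe it passes through unchanged.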
CREATE TABLE [issue_comments] ( [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY, [node_id] TEXT, [user] INTEGER REFERENCES [users]([id]), [created_at] TEXT, [updated_at] TEXT, [author_association] TEXT, [body] TEXT, [reactions] TEXT, [issue] INTEGER REFERENCES [issues]([id]) , [performed_via_github_app] TEXT); CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]); CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);