issue_comments

48 rows where issue = 973139047

id ▼ html_url issue_url node_id user created_at updated_at author_association body reactions issue performed_via_github_app
900699670 https://github.com/simonw/datasette/issues/1439#issuecomment-900699670 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c41r5YW simonw 9599 2021-08-17T23:34:23Z 2021-08-17T23:34:23Z OWNER The challenge comes down to telling the difference between the following: - `/db/table` - an HTML table page - `/db/table.csv` - the CSV version of `/db/table` - `/db/table.csv` - no this one is actually a database table called `table.csv` - `/db/table.csv.csv` - the CSV version of `/db/table.csv` - `/db/table.csv.csv.csv` and so on... {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
900705226 https://github.com/simonw/datasette/issues/1439#issuecomment-900705226 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c41r6vK simonw 9599 2021-08-17T23:50:32Z 2021-08-17T23:50:47Z OWNER An alternative solution would be to use some form of escaping for the characters that form the name of the table. The obvious way to do this would be URL-encoding - but it doesn't hold for `.` characters. The hex for that is `%2E` but watch what happens with that in a URL: ``` # Against Cloud Run: curl -s 'https://datasette.io/-/asgi-scope/foo/bar%2Fbaz%2E' | rg path 'path': '/-/asgi-scope/foo/bar/baz.', 'raw_path': b'/-/asgi-scope/foo/bar%2Fbaz.', 'root_path': '', # Against Vercel: curl -s 'https://til.simonwillison.net/-/asgi-scope/foo/bar%2Fbaz%2E' | rg path 'path': '/-/asgi-scope/foo/bar%2Fbaz%2E', 'raw_path': b'/-/asgi-scope/foo/bar%2Fbaz%2E', 'root_path': '', ``` Surprisingly in this case Vercel DOES keep it intact, but Cloud Run does not. It's still no good though: I need a solution that works on Vercel, Cloud Run and every other potential hosting provider too. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
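For reference, Python's own `urllib.parse.quote` shows the same limitation: `.` is in its always-safe set, so it is never turned into `%2E`, even when `safe=""` is passed. A minimal sketch, using illustrative sample strings that are not from the thread:

```python
from urllib.parse import quote, unquote

# "/" is percent-encoded once it is removed from the safe set...
encoded = quote("bar/baz.", safe="")
print(encoded)

# ...but "." is always treated as safe, so it passes through unchanged.
unchanged = quote("table.csv", safe="")
print(unchanged)

# A hand-written %2E still decodes back to ".", which is the rewrite an
# intermediary is free to apply before the application ever sees the path.
decoded = unquote("bar%2Fbaz%2E")
print(decoded)
```

So standard percent-encoding cannot, on its own, mark a `.` as "part of the table name" in a way that is guaranteed to survive the trip to the application.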
900709703 https://github.com/simonw/datasette/issues/1439#issuecomment-900709703 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c41r71H simonw 9599 2021-08-18T00:03:09Z 2021-08-18T00:03:09Z OWNER But... what if I invent my own escaping scheme? I actually did this once before, in https://github.com/simonw/datasette/commit/9fdb47ca952b93b7b60adddb965ea6642b1ff523 - while I was working on porting Datasette to ASGI in https://github.com/simonw/datasette/issues/272#issuecomment-494192779 because ASGI didn't yet have the `raw_path` mechanism. I could bring that back - it looked like this: ``` "table/and/slashes" => "tableU+002FandU+002Fslashes" "~table" => "U+007Etable" "+bobcats!" => "U+002Bbobcats!" "U+007Etable" => "UU+002B007Etable" ``` But I didn't particularly like it - it was quite verbose. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
900711967 https://github.com/simonw/datasette/issues/1439#issuecomment-900711967 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c41r8Yf simonw 9599 2021-08-18T00:08:09Z 2021-08-18T00:08:09Z OWNER Here's an alternative I just made up which I'm calling "dot dash" encoding:

```python
def dot_dash_encode(s):
    return s.replace("-", "--").replace(".", "-.")

def dot_dash_decode(s):
    return s.replace("-.", ".").replace("--", "-")
```

And some examples:

```python
for example in (
    "hello",
    "hello.csv",
    "hello-and-so-on.csv",
    "hello-.csv",
    "hello--and--so--on-.csv",
    "hello.csv.",
    "hello.csv.-",
    "hello.csv.--",
):
    print(example)
    print(dot_dash_encode(example))
    print(example == dot_dash_decode(dot_dash_encode(example)))
    print()
```

Outputs:

```
hello
hello
True

hello.csv
hello-.csv
True

hello-and-so-on.csv
hello--and--so--on-.csv
True

hello-.csv
hello---.csv
True

hello--and--so--on-.csv
hello----and----so----on---.csv
True

hello.csv.
hello-.csv-.
True

hello.csv.-
hello-.csv-.--
True

hello.csv.--
hello-.csv-.----
True
```

{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
900712981 https://github.com/simonw/datasette/issues/1439#issuecomment-900712981 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c41r8oV simonw 9599 2021-08-18T00:09:59Z 2021-08-18T00:12:32Z OWNER So given the original examples, a table called `table.csv` would have the following URLs: - `/db/table-.csv` - the HTML version - `/db/table-.csv.csv` - the CSV version - `/db/table-.csv.json` - the JSON version And if for some horrific reason you had a table with the name `/db/table-.csv.csv` (so `/db/` was the first part of the actual table name in SQLite) the URLs would look like this: - `/db/%2Fdb%2Ftable---.csv-.csv` - the HTML version - `/db/%2Fdb%2Ftable---.csv-.csv.csv` - the CSV version - `/db/%2Fdb%2Ftable---.csv-.csv.json` - the JSON version {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
900714630 https://github.com/simonw/datasette/issues/1439#issuecomment-900714630 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c41r9CG simonw 9599 2021-08-18T00:13:33Z 2021-08-18T00:13:33Z OWNER The documentation should definitely cover how table names become URLs, in case any third party code needs to be able to calculate this themselves. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
900715375 https://github.com/simonw/datasette/issues/1439#issuecomment-900715375 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c41r9Nv simonw 9599 2021-08-18T00:15:28Z 2021-08-18T00:15:28Z OWNER Maybe I should use `-/` to encode forward slashes too, to defend against any ASGI servers that might not implement `raw_path` correctly. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1031141849 https://github.com/simonw/datasette/issues/1439#issuecomment-1031141849 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c49dfnZ simonw 9599 2022-02-07T07:11:11Z 2022-02-07T07:11:11Z OWNER I added a Link header to solve this problem for the JSON version in: - #1533 {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1045024276 https://github.com/simonw/datasette/issues/1439#issuecomment-1045024276 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4-Sc4U simonw 9599 2022-02-18T19:01:42Z 2022-02-18T19:55:24Z OWNER > Maybe I should use `-/` to encode forward slashes too, to defend against any ASGI servers that might not implement `raw_path` correctly.

```python
def dash_encode(s):
    return s.replace("-", "--").replace(".", "-.").replace("/", "-/")

def dash_decode(s):
    return s.replace("-/", "/").replace("-.", ".").replace("--", "-")
```

```pycon
>>> dash_encode("foo/bar/baz.csv")
'foo-/bar-/baz-.csv'
>>> dash_decode('foo-/bar-/baz-.csv')
'foo/bar/baz.csv'
```

{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1045027067 https://github.com/simonw/datasette/issues/1439#issuecomment-1045027067 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4-Sdj7 simonw 9599 2022-02-18T19:03:26Z 2022-02-18T19:03:26Z OWNER (If I make this change it may break some existing Datasette installations when they upgrade - I could try and build a plugin for them which triggers on 404s and checks to see if the old format would return a 200 response, then returns that.) {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1045032377 https://github.com/simonw/datasette/issues/1439#issuecomment-1045032377 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4-Se25 simonw 9599 2022-02-18T19:06:50Z 2022-02-18T19:06:50Z OWNER How does URL routing for https://latest.datasette.io/fixtures/table%2Fwith%2Fslashes.csv work? Right now it's https://github.com/simonw/datasette/blob/7d24fd405f3c60e4c852c5d746c91aa2ba23cf5b/datasette/app.py#L1098-L1101 That's not going to capture the dot-dash encoding version of that table name: ```pycon >>> dot_dash_encode("table/with/slashes.csv") 'table-/with-/slashes-.csv' ``` Probably needs a fancy regex trick like a negative lookbehind assertion or similar. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1045055772 https://github.com/simonw/datasette/issues/1439#issuecomment-1045055772 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4-Skkc simonw 9599 2022-02-18T19:23:33Z 2022-02-18T19:25:42Z OWNER I want a match for this URL: /db/table-/with-/slashes-.csv Maybe this: ^/(?P<db_name>[^/]+)/(?P<table_and_format>([^/]*|(\-/)*|(\-\.)*|(\-\-)*)*$) Here we are matching a sequence of: ([^/]*|(\-/)*|(\-\.)*|(\-\-)*)* So a combination of not-slashes or `-/` or `-.` or `--` sequences <img width="224" alt="image" src="https://user-images.githubusercontent.com/9599/154748362-84909d4e-dccf-454b-a9cd-a036f9f66f09.png"> ^/(?P<db_name>[^/]+)/(?P<table_and_format>([^/]*|(\-/)*|(\-\.)*|(\-\-)*)*$) Try that with non-capturing bits: ^/(?P<db_name>[^/]+)/(?P<table_and_format>(?:[^/]*|(?:\-/)*|(?:\-\.)*|(?:\-\-)*)*$) `(?:[^/]*|(?:\-/)*|(?:\-\.)*|(?:\-\-)*)*` visualized is: <img width="193" alt="image" src="https://user-images.githubusercontent.com/9599/154748441-decea502-0d04-44f4-9ca9-fb6883767833.png"> Here's the explanation on regex101.com https://regex101.com/r/CPnsIO/1 <img width="1074" alt="image" src="https://user-images.githubusercontent.com/9599/154748720-cdda61db-5498-49a8-91c2-e726b394fa49.png"> {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1045059427 https://github.com/simonw/datasette/issues/1439#issuecomment-1045059427 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4-Sldj simonw 9599 2022-02-18T19:26:25Z 2022-02-18T19:26:25Z OWNER With this new pattern I could probably extract out the optional `.json` format string as part of the initial route capturing regex too, rather than the current `table_and_format` hack. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1045069481 https://github.com/simonw/datasette/issues/1439#issuecomment-1045069481 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4-Sn6p simonw 9599 2022-02-18T19:34:41Z 2022-03-05T21:32:22Z OWNER I think I got format extraction working! https://regex101.com/r/A0bW1D/1 ^/(?P<database>[^/]+)/(?P<table>(?:[^\/\-\.]*|(?:\-/)*|(?:\-\.)*|(?:\-\-)*)*?)(?:(?<!\-)\.(?P<format>\w+))?$ I had to make that crazy inner one even more complicated to stop it from capturing `.` that was not part of `-.`. (?:[^\/\-\.]*|(?:\-/)*|(?:\-\.)*|(?:\-\-)*)* Visualized: <img width="222" alt="image" src="https://user-images.githubusercontent.com/9599/154749714-44579899-5dc7-4e5f-ad4f-dc59dac48979.png"> So now I have a regex which can extract out the dot-encoded table name AND spot if there is an optional `.format` at the end: <img width="1090" alt="image" src="https://user-images.githubusercontent.com/9599/156900484-7912073f-28aa-4301-86e2-e5cbe625e1d5.png"> If I end up using this in Datasette it's going to need VERY comprehensive unit tests and inline documentation. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
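As a rough way to exercise that pattern outside regex101, the sketch below compiles the regex from the comment above with Python's `re` module and runs it over a few of the paths discussed in this thread. The `ROUTE` constant and the `parse_path()` helper are hypothetical names introduced only for this illustration; they are not part of Datasette.

```python
import re

# Pattern copied from the comment above; ROUTE is a name made up for this sketch.
ROUTE = re.compile(
    r"^/(?P<database>[^/]+)/"
    r"(?P<table>(?:[^\/\-\.]*|(?:\-/)*|(?:\-\.)*|(?:\-\-)*)*?)"
    r"(?:(?<!\-)\.(?P<format>\w+))?$"
)


def parse_path(path):
    """Return (database, encoded_table, format), or None if nothing matches."""
    match = ROUTE.match(path)
    if match is None:
        return None
    return match.group("database"), match.group("table"), match.group("format")


for path in (
    "/db/table",
    "/db/table.csv",
    "/db/table-.csv",
    "/db/table-.csv.json",
):
    print(path, "=>", parse_path(path))
```

The table component that comes back is still in its encoded form, so it would need to go through the matching decode function before being looked up in SQLite.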
1045075207 https://github.com/simonw/datasette/issues/1439#issuecomment-1045075207 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4-SpUH simonw 9599 2022-02-18T19:39:35Z 2022-02-18T19:40:13Z OWNER > And if for some horrific reason you had a table with the name `/db/table-.csv.csv` (so `/db/` was the first part of the actual table name in SQLite) the URLs would look like this: > > * `/db/%2Fdb%2Ftable---.csv-.csv` - the HTML version > * `/db/%2Fdb%2Ftable---.csv-.csv.csv` - the CSV version > * `/db/%2Fdb%2Ftable---.csv-.csv.json` - the JSON version Here's what those look like with the updated version of `dot_dash_encode()` that also encodes `/` as `-/`: - `/db/-/db-/table---.csv-.csv` - HTML - `/db/-/db-/table---.csv-.csv.csv` - CSV - `/db/-/db-/table---.csv-.csv.json` - JSON <img width="1050" alt="image" src="https://user-images.githubusercontent.com/9599/154750631-a8a23c62-3dfc-43e4-8026-4d117dc4bf8d.png"> {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1045077590 https://github.com/simonw/datasette/issues/1439#issuecomment-1045077590 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4-Sp5W simonw 9599 2022-02-18T19:41:37Z 2022-02-18T19:42:41Z OWNER Ugh, one disadvantage I just spotted with this: Datasette already has a `/-/versions.json` convention where "system" URLs are namespaced under `/-/` - but that could be confused under this new scheme with the `-/` escaping sequence. And I've thought about adding `/db/-/special` and `/db/table/-/special` URLs in the past too. Maybe change this system to use `.` as the escaping character instead of `-`? {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1045081042 https://github.com/simonw/datasette/issues/1439#issuecomment-1045081042 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4-SqvS simonw 9599 2022-02-18T19:44:12Z 2022-02-18T19:51:34Z OWNER ```python def dot_encode(s): return s.replace(".", "..").replace("/", "./") def dot_decode(s): return s.replace("./", "/").replace("..", ".") ``` No need for hyphen encoding in this variant at all, which simplifies things a bit. (Update: this is flawed, see https://github.com/simonw/datasette/issues/1439#issuecomment-1045086033) {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1045082891 https://github.com/simonw/datasette/issues/1439#issuecomment-1045082891 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4-SrML simonw 9599 2022-02-18T19:45:32Z 2022-02-18T19:45:32Z OWNER ```pycon >>> dot_encode("/db/table-.csv.csv") './db./table-..csv..csv' >>> dot_decode('./db./table-..csv..csv') '/db/table-.csv.csv' ``` I worry that web servers might treat `./` in a special way though. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1045086033 https://github.com/simonw/datasette/issues/1439#issuecomment-1045086033 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4-Sr9R simonw 9599 2022-02-18T19:47:43Z 2022-02-18T19:51:11Z OWNER - https://datasette.io/-/asgi-scope/db/./db./table-..csv..csv - https://til.simonwillison.net/-/asgi-scope/db/./db./table-..csv..csv Do both of those survive the round-trip to populate `raw_path` correctly? No! In both cases the `/./` bit goes missing. It looks like this might even be a client issue - `curl` shows me this: ``` ~ % curl -vv -i 'https://datasette.io/-/asgi-scope/db/./db./table-..csv..csv' * Trying 216.239.32.21:443... * Connected to datasette.io (216.239.32.21) port 443 (#0) * ALPN, offering http/1.1 * TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 * Server certificate: datasette.io * Server certificate: R3 * Server certificate: ISRG Root X1 > GET /-/asgi-scope/db/db./table-..csv..csv HTTP/1.1 ``` So `curl` decided to turn `/-/asgi-scope/db/./db./table` into `/-/asgi-scope/db/db./table` before even sending the request. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
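The loss of the `/./` segment can be reproduced locally. The sketch below uses `posixpath.normpath` as a rough stand-in for the dot-segment removal that URL clients perform; that substitution is an assumption made for illustration (curl follows URL rules, not filesystem rules), but for this particular path the result matches the request line curl sent above.

```python
import posixpath


def dot_encode(s):
    # The dot-encoding variant proposed earlier in this thread.
    return s.replace(".", "..").replace("/", "./")


encoded = dot_encode("/db/table-.csv.csv")
path = "/-/asgi-scope/db/" + encoded

# normpath drops "." path segments, much like a client normalising the URL.
normalized = posixpath.normpath(path)

print(path)
print(normalized)
```

Because an encoded leading `/` always produces a bare `.` path segment, any table name that starts with `/` is exposed to this kind of normalisation.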
1045095348 https://github.com/simonw/datasette/issues/1439#issuecomment-1045095348 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4-SuO0 simonw 9599 2022-02-18T19:53:48Z 2022-02-18T19:53:48Z OWNER > Ugh, one disadvantage I just spotted with this: Datasette already has a `/-/versions.json` convention where "system" URLs are namespaced under `/-/` - but that could be confused under this new scheme with the `-/` escaping sequence. > > And I've thought about adding `/db/-/special` and `/db/table/-/special` URLs in the past too. I don't think this matters. The new regex does indeed capture that kind of page: <img width="1052" alt="image" src="https://user-images.githubusercontent.com/9599/154752309-e1787755-3bdb-47c2-867c-7ac5fe65664d.png"> But Datasette goes through configured route regular expressions in order - so I can have the regex that captures `/db/-/special` routes listed before the one that captures tables and formats. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1045099290 https://github.com/simonw/datasette/issues/1439#issuecomment-1045099290 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4-SvMa simonw 9599 2022-02-18T19:56:18Z 2022-02-18T19:56:30Z OWNER > ```python > def dash_encode(s): > return s.replace("-", "--").replace(".", "-.").replace("/", "-/") > > def dash_decode(s): > return s.replace("-/", "/").replace("-.", ".").replace("--", "-") > ``` I think **dash-encoding** (new name for this) is the right way forward here. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1045108611 https://github.com/simonw/datasette/issues/1439#issuecomment-1045108611 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4-SxeD simonw 9599 2022-02-18T20:02:19Z 2022-02-18T20:08:34Z OWNER One other potential variant: ```python def dash_encode(s): return s.replace("-", "-dash-").replace(".", "-dot-").replace("/", "-slash-") def dash_decode(s): return s.replace("-slash-", "/").replace("-dot-", ".").replace("-dash-", "-") ``` Except this has bugs - it doesn't round-trip safely, because it can get confused about things like `-dash-slash-` in terms of is that a `-dash-` or a `-slash-`? ```pycon >>> dash_encode("/db/table-.csv.csv") '-slash-db-slash-table-dash--dot-csv-dot-csv' >>> dash_decode('-slash-db-slash-table-dash--dot-csv-dot-csv') '/db/table-.csv.csv' >>> dash_encode('-slash-db-slash-table-dash--dot-csv-dot-csv') '-dash-slash-dash-db-dash-slash-dash-table-dash-dash-dash--dash-dot-dash-csv-dash-dot-dash-csv' >>> dash_decode('-dash-slash-dash-db-dash-slash-dash-table-dash-dash-dash--dash-dot-dash-csv-dash-dot-dash-csv') '-dash/dash-db-dash/dash-table-dash--dash.dash-csv-dash.dash-csv' ``` {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1045111309 https://github.com/simonw/datasette/issues/1439#issuecomment-1045111309 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4-SyIN simonw 9599 2022-02-18T20:04:24Z 2022-02-18T20:05:40Z OWNER This made me worry that my current `dash_decode()` implementation had unknown round-trip bugs, but thankfully this works OK: ```pycon >>> dash_encode("/db/table-.csv.csv") '-/db-/table---.csv-.csv' >>> dash_encode('-/db-/table---.csv-.csv') '---/db---/table-------.csv---.csv' >>> dash_decode('---/db---/table-------.csv---.csv') '-/db-/table---.csv-.csv' >>> dash_decode('-/db-/table---.csv-.csv') '/db/table-.csv.csv' ``` The regex still works against that double-encoded example too: <img width="1032" alt="image" src="https://user-images.githubusercontent.com/9599/154753916-b7d2159e-4284-4c92-ae61-110671fa320e.png"> {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
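One way to gain more confidence than a handful of hand-picked examples is a brute-force round-trip check. The sketch below repeats the `dash_encode()` and `dash_decode()` functions from this thread and tries every string up to a small length over an alphabet made of the three special characters plus one ordinary letter. The helper name `find_round_trip_failures()`, the alphabet and the length limit are arbitrary choices for this illustration.

```python
from itertools import product


def dash_encode(s):
    return s.replace("-", "--").replace(".", "-.").replace("/", "-/")


def dash_decode(s):
    return s.replace("-/", "/").replace("-.", ".").replace("--", "-")


# The three characters the scheme treats specially, plus one ordinary character.
ALPHABET = "-./a"


def find_round_trip_failures(max_length=5):
    """Return every string up to max_length that does not survive a round trip."""
    failures = []
    for length in range(max_length + 1):
        for chars in product(ALPHABET, repeat=length):
            original = "".join(chars)
            if dash_decode(dash_encode(original)) != original:
                failures.append(original)
    return failures


failures = find_round_trip_failures()
print(len(failures), "round-trip failures")
```

An empty result is not a proof, but it does cover every short combination of the characters most likely to interact badly.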
1045117304 https://github.com/simonw/datasette/issues/1439#issuecomment-1045117304 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4-Szl4 simonw 9599 2022-02-18T20:09:22Z 2022-02-18T20:09:22Z OWNER Adopting this could result in supporting database files with surprising characters in their filename too. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1045131086 https://github.com/simonw/datasette/issues/1439#issuecomment-1045131086 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4-S29O simonw 9599 2022-02-18T20:22:13Z 2022-02-18T20:22:47Z OWNER Should it encode `%` symbols too, since they have a special meaning in URLs and we can't guarantee that every single web server / proxy out there will round-trip them safely using percentage encoding? If so, would need to pick a different encoding character for them. Maybe `%` becomes `-p` - and in that case `/` could become `-s` too. Is it worth expanding dash-encoding outside of just `/` and `-` and `.` though? Not sure. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1045134050 https://github.com/simonw/datasette/issues/1439#issuecomment-1045134050 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4-S3ri simonw 9599 2022-02-18T20:25:04Z 2022-02-18T20:25:04Z OWNER Here's a useful modern spec for how existing URL percentage encoding is supposed to work: https://url.spec.whatwg.org/#percent-encoded-bytes {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1045269544 https://github.com/simonw/datasette/issues/1439#issuecomment-1045269544 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4-TYwo simonw 9599 2022-02-18T22:19:29Z 2022-02-18T22:19:29Z OWNER Note that I've ruled out using `Accept: application/json` to return JSON because it turns out Cloudflare and potentially other CDNs ignore the `Vary: Accept` header entirely: - https://github.com/simonw/datasette/issues/1534 {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1049114724 https://github.com/simonw/datasette/issues/1439#issuecomment-1049114724 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4-iDhk simonw 9599 2022-02-23T19:04:40Z 2022-02-23T19:04:40Z OWNER I'm going to try dash encoding for table names (and row IDs) in a branch and see how I like it. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1049124390 https://github.com/simonw/datasette/issues/1439#issuecomment-1049124390 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4-iF4m simonw 9599 2022-02-23T19:15:00Z 2022-02-23T19:15:00Z OWNER I'll start by modifying this function: https://github.com/simonw/datasette/blob/458f03ad3a454d271f47a643f4530bd8b60ddb76/datasette/utils/__init__.py#L732-L749 Later I want to move this to the routing layer to split out `format` automatically, as seen in the regexes here: https://github.com/simonw/datasette/issues/1439#issuecomment-1045069481 {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1049126151 https://github.com/simonw/datasette/issues/1439#issuecomment-1049126151 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4-iGUH simonw 9599 2022-02-23T19:17:01Z 2022-02-23T19:17:01Z OWNER Actually the relevant code looks to be: https://github.com/simonw/datasette/blob/7d24fd405f3c60e4c852c5d746c91aa2ba23cf5b/datasette/views/base.py#L481-L498 {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1053973425 https://github.com/simonw/datasette/issues/1439#issuecomment-1053973425 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4-0lux simonw 9599 2022-02-28T07:40:12Z 2022-02-28T07:40:12Z OWNER If I make this change it will break existing links to one of the oldest Datasette demos: http://fivethirtyeight.datasettes.com/fivethirtyeight/avengers%2Favengers A plugin that fixes those by redirecting them on 404 would be neat. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1059802318 https://github.com/simonw/datasette/issues/1439#issuecomment-1059802318 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4_K0zO simonw 9599 2022-03-05T17:34:33Z 2022-03-05T17:34:33Z OWNER Wrote documentation: <img width="741" alt="Dash encoding. Datasette uses a custom encoding scheme in some places, called dash encoding. This is primarily used for table names and row primary keys, to avoid any confusion between / characters in those values and the Datasette URL that references them. Dash encoding applies the following rules, in order: 1. All single - characters are replaced by -- 2. . characters are replaced by -. 3. / characters are replaced by -/ These rules are applied in reverse order to decode a dash encoded string." src="https://user-images.githubusercontent.com/9599/156893903-5723f60e-e054-4365-84bc-f3084d11183d.png"> {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1059822151 https://github.com/simonw/datasette/issues/1439#issuecomment-1059822151 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4_K5pH simonw 9599 2022-03-05T19:48:35Z 2022-03-05T19:48:35Z OWNER Those new docs: https://github.com/simonw/datasette/blob/d1cb73180b4b5a07538380db76298618a5fc46b6/docs/internals.rst#dash-encoding {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1059822391 https://github.com/simonw/datasette/issues/1439#issuecomment-1059822391 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4_K5s3 simonw 9599 2022-03-05T19:50:12Z 2022-03-05T19:50:12Z OWNER I'm going to move this work to a PR. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1059836599 https://github.com/simonw/datasette/issues/1439#issuecomment-1059836599 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4_K9K3 simonw 9599 2022-03-05T21:52:10Z 2022-03-05T21:52:10Z OWNER Blogged about this here: https://simonwillison.net/2022/Mar/5/dash-encoding/ {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1059850369 https://github.com/simonw/datasette/issues/1439#issuecomment-1059850369 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4_LAiB simonw 9599 2022-03-05T23:28:56Z 2022-03-05T23:28:56Z OWNER Lots of great conversations about the dash encoding implementation on Twitter: https://twitter.com/simonw/status/1500228316309061633 @dracos helped me figure out a simpler regex: https://twitter.com/dracos/status/1500236433809973248 `^/(?P<database>[^/]+)/(?P<table>[^\/\-\.]*|\-/|\-\.|\-\-)*(?P<format>\.\w+)?$` ![image](https://user-images.githubusercontent.com/9599/156903088-c01933ae-4713-4e91-8d71-affebf70b945.png) {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1059851259 https://github.com/simonw/datasette/issues/1439#issuecomment-1059851259 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4_LAv7 simonw 9599 2022-03-05T23:35:47Z 2022-03-05T23:35:59Z OWNER This [comment from glyph](https://twitter.com/glyph/status/1500244937312329730) got me thinking: > Have you considered replacing % with some other character and then using percent-encoding? What happens if a table name includes a `%` character and that ends up getting mangled by a misbehaving proxy? I should consider `%` in the escaping system too. And maybe go with that suggestion of using percent-encoding directly but with a different character. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1059853526 https://github.com/simonw/datasette/issues/1439#issuecomment-1059853526 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4_LBTW simonw 9599 2022-03-05T23:49:59Z 2022-03-05T23:49:59Z OWNER I want to try regular percentage encoding, except that it also encodes both the `-` and the `.` characters, AND it uses `-` instead of `%` as the encoding character. Should check what it does with emoji too. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1059854864 https://github.com/simonw/datasette/issues/1439#issuecomment-1059854864 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4_LBoQ simonw 9599 2022-03-05T23:59:05Z 2022-03-05T23:59:05Z OWNER OK, for that percentage thing: the Python core implementation of URL percentage escaping deliberately ignores two of the characters we want to escape: `.` and `-`: https://github.com/python/cpython/blob/6927632492cbad86a250aa006c1847e03b03e70b/Lib/urllib/parse.py#L780-L783

```python
_ALWAYS_SAFE = frozenset(b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
                         b'abcdefghijklmnopqrstuvwxyz'
                         b'0123456789'
                         b'_.-~')
```

It also defaults to skipping `/` (passed as a `safe=` parameter to various things). I'm going to try borrowing and modifying the core of the Python implementation: https://github.com/python/cpython/blob/6927632492cbad86a250aa006c1847e03b03e70b/Lib/urllib/parse.py#L795-L814

```python
class _Quoter(dict):
    """A mapping from bytes numbers (in range(0,256)) to strings.

    String values are percent-encoded byte values, unless the key < 128, and
    in either of the specified safe set, or the always safe set.
    """
    # Keeps a cache internally, via __missing__, for efficiency (lookups
    # of cached keys don't call Python code at all).
    def __init__(self, safe):
        """safe: bytes object."""
        self.safe = _ALWAYS_SAFE.union(safe)

    def __repr__(self):
        return f"<Quoter {dict(self)!r}>"

    def __missing__(self, b):
        # Handle a cache miss. Store quoted string in cache and return.
        res = chr(b) if b in self.safe else '%{:02X}'.format(b)
        self[b] = res
        return res
```

{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1059855418 https://github.com/simonw/datasette/issues/1439#issuecomment-1059855418 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4_LBw6 simonw 9599 2022-03-06T00:00:53Z 2022-03-06T00:04:18Z OWNER ```python _ESCAPE_SAFE = frozenset( b'ABCDEFGHIJKLMNOPQRSTUVWXYZ' b'abcdefghijklmnopqrstuvwxyz' b'0123456789_' ) # I removed b'.-~') class Quoter(dict): # Keeps a cache internally, via __missing__ def __missing__(self, b): # Handle a cache miss. Store quoted string in cache and return. res = chr(b) if b in _ESCAPE_SAFE else '-{:02X}'.format(b) self[b] = res return res quoter = Quoter().__getitem__ ''.join([quoter(char) for char in b'foo/bar.csv']) # 'foo-2Fbar-2Ecsv' ``` {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1059863997 https://github.com/simonw/datasette/issues/1439#issuecomment-1059863997 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4_LD29 karlcow 505230 2022-03-06T00:57:57Z 2022-03-06T00:57:57Z NONE Probably too late… but I have just seen this because http://simonwillison.net/2022/Mar/5/dash-encoding/#atom-everything And it reminded me of comma tools at W3C. http://www.w3.org/,tools Example, the text version of W3C homepage https://www.w3.org/,text > The challenge comes down to telling the difference between the following: > > * `/db/table` - an HTML table page `/db/table` > * `/db/table.csv` - the CSV version of `/db/table` `/db/table,csv` > * `/db/table.csv` - no this one is actually a database table called `table.csv` `/db/table.csv` > * `/db/table.csv.csv` - the CSV version of `/db/table.csv` `/db/table.csv,csv` > * `/db/table.csv.csv.csv` and so on... `/db/table.csv.csv,csv` I haven't checked all the cases in the thread. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1059864154 https://github.com/simonw/datasette/issues/1439#issuecomment-1059864154 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4_LD5a simonw 9599 2022-03-06T00:59:04Z 2022-03-06T00:59:04Z OWNER Needs more testing, but this seems to work for decoding the percent-escaped-with-dashes format: `urllib.parse.unquote(s.replace('-', '%'))` {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1059903309 https://github.com/simonw/datasette/issues/1439#issuecomment-1059903309 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4_LNdN simonw 9599 2022-03-06T06:17:51Z 2022-03-06T06:17:51Z OWNER Suggestion from a conversation with Seth Michael Larson: it would be neat if plugins could easily integrate with whatever scheme this ends up using, maybe with the `/db/table/-/plugin-name` standardized pattern or similar. Making it easy for plugins to do the right, consistent thing is a good idea. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1060044007 https://github.com/simonw/datasette/issues/1439#issuecomment-1060044007 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4_Lvzn simonw 9599 2022-03-06T21:38:15Z 2022-03-06T21:38:15Z OWNER Test: https://github.com/simonw/datasette/blob/d2e3fe3facf0ed0abf8b00cd54463af90dd6904d/tests/test_utils.py#L651-L666 One big advantage to this scheme is that redirecting old links to `%2F` pages (e.g. https://fivethirtyeight.datasettes.com/fivethirtyeight/twitter-ratio%2Fsenators) is easy - if you see a `%` in the `raw_path`, redirect to that page with the `%` replaced by `-`. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1060870237 https://github.com/simonw/datasette/issues/1439#issuecomment-1060870237 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4_O5hd simonw 9599 2022-03-07T16:19:22Z 2022-03-07T16:19:22Z OWNER I didn't need to do any of the fancy regular expression routing stuff after all, since the new dash encoding format avoids using `/` so a simple `[^/]+` can capture the correct segments from the URL. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1065987808 https://github.com/simonw/datasette/issues/1439#issuecomment-1065987808 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4_ia7g simonw 9599 2022-03-13T00:02:32Z 2022-03-13T00:02:32Z OWNER OK, this has broken a lot more than I expected it would. Turns out `-` is a very common character in existing Datasette database names! https://datasette.io/-/databases for example has two: ```json [ { "name": "docs-index", "path": "docs-index.db", "size": 1007616, "is_mutable": false, "is_memory": false, "hash": "0ac6c3de2762fcd174fd249fed8a8fa6046ea345173d22c2766186bf336462b2" }, { "name": "dogsheep-index", "path": "dogsheep-index.db", "size": 5496832, "is_mutable": false, "is_memory": false, "hash": "d1ea238d204e5b9ae783c86e4af5bcdf21267c1f391de3e468d9665494ee012a" } ] ``` {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1065988403 https://github.com/simonw/datasette/issues/1439#issuecomment-1065988403 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4_ibEz simonw 9599 2022-03-13T00:06:38Z 2022-03-13T00:07:19Z OWNER If I want to reserve `-` as a character that CAN be used in URLs, the only remaining character that might make sense for escape sequences is `~` - based on this last line of characters that are exempt from percentage encoding: ```python _ALWAYS_SAFE = frozenset(b'ABCDEFGHIJKLMNOPQRSTUVWXYZ' b'abcdefghijklmnopqrstuvwxyz' b'0123456789' b'_.-~') ``` So I'd add both `-` and `_` back to the safe list, but use `~` to escape `.` and `/` and suchlike. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
1068461449 https://github.com/simonw/datasette/issues/1439#issuecomment-1068461449 https://api.github.com/repos/simonw/datasette/issues/1439 IC_kwDOBm6k_c4_r22J simonw 9599 2022-03-15T20:51:26Z 2022-03-15T20:51:26Z OWNER I'm happy with this now that I've landed Tilde encoding in #1657. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047  
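
The dash-based scheme discussed in the comments above can be sketched as a round trip. This is a minimal illustration assembled from the snippets in the thread, not the code that eventually shipped: the function names `dash_encode` and `dash_decode` are hypothetical, and the safe set is assumed to match the modified `_ESCAPE_SAFE` quoted earlier.

```python
import urllib.parse

# Assumption: the same safe set as the modified snippet in the thread
# (letters, digits and underscore; '.', '-' and '~' are NOT safe).
_ESCAPE_SAFE = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789_"
)


def dash_encode(s: str) -> str:
    """Percent-style encoding that uses '-' instead of '%' as the escape character."""
    return "".join(
        chr(b) if b in _ESCAPE_SAFE else "-{:02X}".format(b)
        for b in s.encode("utf-8")
    )


def dash_decode(s: str) -> str:
    """Reverse dash_encode by turning '-' back into '%' and unquoting."""
    return urllib.parse.unquote(s.replace("-", "%"))


print(dash_encode("foo/bar.csv"))   # foo-2Fbar-2Ecsv
print(dash_encode("docs-index"))    # docs-2Dindex
print(dash_decode("foo-2Fbar-2Ecsv"))  # foo/bar.csv
```

The second call shows why the scheme was disruptive for existing instances: a database named `docs-index` would get a different URL segment, because every literal `-` has to be escaped.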

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);