github: issue_comments: 48 rows where issue = 973139047

id ▼	html_url	issue_url	node_id	user	created_at	updated_at	author_association	body	reactions	issue
900699670	https://github.com/simonw/datasette/issues/1439#issuecomment-900699670	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c41r5YW	simonw 9599	2021-08-17T23:34:23Z	2021-08-17T23:34:23Z	OWNER	The challenge comes down to telling the difference between the following: - `/db/table` - an HTML table page - `/db/table.csv` - the CSV version of `/db/table` - `/db/table.csv` - no this one is actually a database table called `table.csv` - `/db/table.csv.csv` - the CSV version of `/db/table.csv` - `/db/table.csv.csv.csv` and so on...	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
900705226	https://github.com/simonw/datasette/issues/1439#issuecomment-900705226	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c41r6vK	simonw 9599	2021-08-17T23:50:32Z	2021-08-17T23:50:47Z	OWNER	An alternative solution would be to use some form of escaping for the characters that form the name of the table. The obvious way to do this would be URL-encoding - but it doesn't hold for `.` characters. The hex for that is `%2E` but watch what happens with that in a URL: ``` # Against Cloud Run: curl -s 'https://datasette.io/-/asgi-scope/foo/bar%2Fbaz%2E' \| rg path 'path': '/-/asgi-scope/foo/bar/baz.', 'raw_path': b'/-/asgi-scope/foo/bar%2Fbaz.', 'root_path': '', # Against Vercel: curl -s 'https://til.simonwillison.net/-/asgi-scope/foo/bar%2Fbaz%2E' \| rg path 'path': '/-/asgi-scope/foo/bar%2Fbaz%2E', 'raw_path': b'/-/asgi-scope/foo/bar%2Fbaz%2E', 'root_path': '', ``` Surprisingly in this case Vercel DOES keep it intact, but Cloud Run does not. It's still no good though: I need a solution that works on Vercel, Cloud Run and every other potential hosting provider too.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
900709703	https://github.com/simonw/datasette/issues/1439#issuecomment-900709703	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c41r71H	simonw 9599	2021-08-18T00:03:09Z	2021-08-18T00:03:09Z	OWNER	But... what if I invent my own escaping scheme? I actually did this once before, in https://github.com/simonw/datasette/commit/9fdb47ca952b93b7b60adddb965ea6642b1ff523 - while I was working on porting Datasette to ASGI in https://github.com/simonw/datasette/issues/272#issuecomment-494192779 because ASGI didn't yet have the `raw_path` mechanism. I could bring that back - it looked like this: ``` "table/and/slashes" => "tableU+002FandU+002Fslashes" "~table" => "U+007Etable" "+bobcats!" => "U+002Bbobcats!" "U+007Etable" => "UU+002B007Etable" ``` But I didn't particularly like it - it was quite verbose.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
900711967	https://github.com/simonw/datasette/issues/1439#issuecomment-900711967	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c41r8Yf	simonw 9599	2021-08-18T00:08:09Z	2021-08-18T00:08:09Z	OWNER	Here's an alternative I just made up which I'm calling "dot dash" encoding: ```python def dot_dash_encode(s): return s.replace("-", "--").replace(".", "-.") def dot_dash_decode(s): return s.replace("-.", ".").replace("--", "-") ``` And some examples: ```python for example in ( "hello", "hello.csv", "hello-and-so-on.csv", "hello-.csv", "hello--and--so--on-.csv", "hello.csv.", "hello.csv.-", "hello.csv.--", ): print(example) print(dot_dash_encode(example)) print(example == dot_dash_decode(dot_dash_encode(example))) print() ``` Outputs: ``` hello hello True hello.csv hello-.csv True hello-and-so-on.csv hello--and--so--on-.csv True hello-.csv hello---.csv True hello--and--so--on-.csv hello----and----so----on---.csv True hello.csv. hello-.csv-. True hello.csv.- hello-.csv-.-- True hello.csv.-- hello-.csv-.---- True ```	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
900712981	https://github.com/simonw/datasette/issues/1439#issuecomment-900712981	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c41r8oV	simonw 9599	2021-08-18T00:09:59Z	2021-08-18T00:12:32Z	OWNER	So given the original examples, a table called `table.csv` would have the following URLs: - `/db/table-.csv` - the HTML version - `/db/table-.csv.csv` - the CSV version - `/db/table-.csv.json` - the JSON version And if for some horrific reason you had a table with the name `/db/table-.csv.csv` (so `/db/` was the first part of the actual table name in SQLite) the URLs would look like this: - `/db/%2Fdb%2Ftable---.csv-.csv` - the HTML version - `/db/%2Fdb%2Ftable---.csv-.csv.csv` - the CSV version - `/db/%2Fdb%2Ftable---.csv-.csv.json` - the JSON version	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
900714630	https://github.com/simonw/datasette/issues/1439#issuecomment-900714630	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c41r9CG	simonw 9599	2021-08-18T00:13:33Z	2021-08-18T00:13:33Z	OWNER	The documentation should definitely cover how table names become URLs, in case any third party code needs to be able to calculate this themselves.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
900715375	https://github.com/simonw/datasette/issues/1439#issuecomment-900715375	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c41r9Nv	simonw 9599	2021-08-18T00:15:28Z	2021-08-18T00:15:28Z	OWNER	Maybe I should use `-/` to encode forward slashes too, to defend against any ASGI servers that might not implement `raw_path` correctly.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1031141849	https://github.com/simonw/datasette/issues/1439#issuecomment-1031141849	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c49dfnZ	simonw 9599	2022-02-07T07:11:11Z	2022-02-07T07:11:11Z	OWNER	I added a Link header to solve this problem for the JSON version in: - #1533	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1045024276	https://github.com/simonw/datasette/issues/1439#issuecomment-1045024276	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4-Sc4U	simonw 9599	2022-02-18T19:01:42Z	2022-02-18T19:55:24Z	OWNER	> Maybe I should use `-/` to encode forward slashes too, to defend against any ASGI servers that might not implement `raw_path` correctly. ```python def dash_encode(s): return s.replace("-", "--").replace(".", "-.").replace("/", "-/") def dash_decode(s): return s.replace("-/", "/").replace("-.", ".").replace("--", "-") ``` ```pycon >>> dash_encode("foo/bar/baz.csv") 'foo-/bar-/baz-.csv' >>> dash_decode('foo-/bar-/baz-.csv') 'foo/bar/baz.csv' ```	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1045027067	https://github.com/simonw/datasette/issues/1439#issuecomment-1045027067	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4-Sdj7	simonw 9599	2022-02-18T19:03:26Z	2022-02-18T19:03:26Z	OWNER	(If I make this change it may break some existing Datasette installations when they upgrade - I could try and build a plugin for them which triggers on 404s and checks to see if the old format would return a 200 response, then returns that.)	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1045032377	https://github.com/simonw/datasette/issues/1439#issuecomment-1045032377	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4-Se25	simonw 9599	2022-02-18T19:06:50Z	2022-02-18T19:06:50Z	OWNER	How does URL routing for https://latest.datasette.io/fixtures/table%2Fwith%2Fslashes.csv work? Right now it's https://github.com/simonw/datasette/blob/7d24fd405f3c60e4c852c5d746c91aa2ba23cf5b/datasette/app.py#L1098-L1101 That's not going to capture the dot-dash encoding version of that table name: ```pycon >>> dot_dash_encode("table/with/slashes.csv") 'table-/with-/slashes-.csv' ``` Probably needs a fancy regex trick like a negative lookbehind assertion or similar.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1045055772	https://github.com/simonw/datasette/issues/1439#issuecomment-1045055772	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4-Skkc	simonw 9599	2022-02-18T19:23:33Z	2022-02-18T19:25:42Z	OWNER	I want a match for this URL: /db/table-/with-/slashes-.csv Maybe this: ^/(?P<db_name>[^/]+)/(?P<table_and_format>([^/]\|(\-/)\|(\-\.)\|(\-\-))*$) Here we are matching a sequence of: ([^/]\|(\-/)\|(\-\.)\|(\-\-)) So a combination of not-slashes OR -/ or -. or -- sequences <img width="224" alt="image" src="https://user-images.githubusercontent.com/9599/154748362-84909d4e-dccf-454b-a9cd-a036f9f66f09.png"> ^/(?P<db_name>[^/]+)/(?P<table_and_format>([^/]\|(\-/)\|(\-\.)\|(\-\-))*$) Try that with non-capturing bits: ^/(?P<db_name>[^/]+)/(?P<table_and_format>(?:[^/]\|(?:\-/)\|(?:\-\.)\|(?:\-\-))*$) `(?:[^/]\|(?:\-/)\|(?:\-\.)\|(?:\-\-))*` visualized is: <img width="193" alt="image" src="https://user-images.githubusercontent.com/9599/154748441-decea502-0d04-44f4-9ca9-fb6883767833.png"> Here's the explanation on regex101.com https://regex101.com/r/CPnsIO/1 <img width="1074" alt="image" src="https://user-images.githubusercontent.com/9599/154748720-cdda61db-5498-49a8-91c2-e726b394fa49.png">	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1045059427	https://github.com/simonw/datasette/issues/1439#issuecomment-1045059427	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4-Sldj	simonw 9599	2022-02-18T19:26:25Z	2022-02-18T19:26:25Z	OWNER	With this new pattern I could probably extract out the optional `.json` format string as part of the initial route capturing regex too, rather than the current `table_and_format` hack.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1045069481	https://github.com/simonw/datasette/issues/1439#issuecomment-1045069481	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4-Sn6p	simonw 9599	2022-02-18T19:34:41Z	2022-03-05T21:32:22Z	OWNER	I think I got format extraction working! https://regex101.com/r/A0bW1D/1 ^/(?P<database>[^/]+)/(?P<table>(?:[^\/\-\.]\|(?:\-/)\|(?:\-\.)\|(?:\-\-))*?)(?:(?<!\-)\.(?P<format>\w+))?$ I had to make that crazy inner one even more complicated to stop it from capturing `.` that was not part of `-.`. (?:[^\/\-\.]\|(?:\-/)\|(?:\-\.)\|(?:\-\-)) Visualized: <img width="222" alt="image" src="https://user-images.githubusercontent.com/9599/154749714-44579899-5dc7-4e5f-ad4f-dc59dac48979.png"> So now I have a regex which can extract out the dot-encoded table name AND spot if there is an optional `.format` at the end: <img width="1090" alt="image" src="https://user-images.githubusercontent.com/9599/156900484-7912073f-28aa-4301-86e2-e5cbe625e1d5.png"> If I end up using this in Datasette it's going to need VERY comprehensive unit tests and inline documentation.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1045075207	https://github.com/simonw/datasette/issues/1439#issuecomment-1045075207	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4-SpUH	simonw 9599	2022-02-18T19:39:35Z	2022-02-18T19:40:13Z	OWNER	> And if for some horrific reason you had a table with the name `/db/table-.csv.csv` (so `/db/` was the first part of the actual table name in SQLite) the URLs would look like this: > > * `/db/%2Fdb%2Ftable---.csv-.csv` - the HTML version > * `/db/%2Fdb%2Ftable---.csv-.csv.csv` - the CSV version > * `/db/%2Fdb%2Ftable---.csv-.csv.json` - the JSON version Here's what those look like with the updated version of `dot_dash_encode()` that also encodes `/` as `-/`: - `/db/-/db-/table---.csv-.csv` - HTML - `/db/-/db-/table---.csv-.csv.csv` - CSV - `/db/-/db-/table---.csv-.csv.json` - JSON <img width="1050" alt="image" src="https://user-images.githubusercontent.com/9599/154750631-a8a23c62-3dfc-43e4-8026-4d117dc4bf8d.png">	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1045077590	https://github.com/simonw/datasette/issues/1439#issuecomment-1045077590	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4-Sp5W	simonw 9599	2022-02-18T19:41:37Z	2022-02-18T19:42:41Z	OWNER	Ugh, one disadvantage I just spotted with this: Datasette already has a `/-/versions.json` convention where "system" URLs are namespaced under `/-/` - but that could be confused under this new scheme with the `-/` escaping sequence. And I've thought about adding `/db/-/special` and `/db/table/-/special` URLs in the past too. Maybe change this system to use `.` as the escaping character instead of `-`?	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1045081042	https://github.com/simonw/datasette/issues/1439#issuecomment-1045081042	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4-SqvS	simonw 9599	2022-02-18T19:44:12Z	2022-02-18T19:51:34Z	OWNER	```python def dot_encode(s): return s.replace(".", "..").replace("/", "./") def dot_decode(s): return s.replace("./", "/").replace("..", ".") ``` No need for hyphen encoding in this variant at all, which simplifies things a bit. (Update: this is flawed, see https://github.com/simonw/datasette/issues/1439#issuecomment-1045086033)	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1045082891	https://github.com/simonw/datasette/issues/1439#issuecomment-1045082891	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4-SrML	simonw 9599	2022-02-18T19:45:32Z	2022-02-18T19:45:32Z	OWNER	```pycon >>> dot_encode("/db/table-.csv.csv") './db./table-..csv..csv' >>> dot_decode('./db./table-..csv..csv') '/db/table-.csv.csv' ``` I worry that web servers might treat `./` in a special way though.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1045086033	https://github.com/simonw/datasette/issues/1439#issuecomment-1045086033	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4-Sr9R	simonw 9599	2022-02-18T19:47:43Z	2022-02-18T19:51:11Z	OWNER	- https://datasette.io/-/asgi-scope/db/./db./table-..csv..csv - https://til.simonwillison.net/-/asgi-scope/db/./db./table-..csv..csv Do both of those survive the round-trip to populate `raw_path` correctly? No! In both cases the `/./` bit goes missing. It looks like this might even be a client issue - `curl` shows me this: ``` ~ % curl -vv -i 'https://datasette.io/-/asgi-scope/db/./db./table-..csv..csv' * Trying 216.239.32.21:443... * Connected to datasette.io (216.239.32.21) port 443 (#0) * ALPN, offering http/1.1 * TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 * Server certificate: datasette.io * Server certificate: R3 * Server certificate: ISRG Root X1 > GET /-/asgi-scope/db/db./table-..csv..csv HTTP/1.1 ``` So `curl` decided to turn `/-/asgi-scope/db/./db./table` into `/-/asgi-scope/db/db./table` before even sending the request.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1045095348	https://github.com/simonw/datasette/issues/1439#issuecomment-1045095348	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4-SuO0	simonw 9599	2022-02-18T19:53:48Z	2022-02-18T19:53:48Z	OWNER	> Ugh, one disadvantage I just spotted with this: Datasette already has a `/-/versions.json` convention where "system" URLs are namespaced under `/-/` - but that could be confused under this new scheme with the `-/` escaping sequence. > > And I've thought about adding `/db/-/special` and `/db/table/-/special` URLs in the past too. I don't think this matters. The new regex does indeed capture that kind of page: <img width="1052" alt="image" src="https://user-images.githubusercontent.com/9599/154752309-e1787755-3bdb-47c2-867c-7ac5fe65664d.png"> But Datasette goes through configured route regular expressions in order - so I can have the regex that captures `/db/-/special` routes listed before the one that captures tables and formats.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1045099290	https://github.com/simonw/datasette/issues/1439#issuecomment-1045099290	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4-SvMa	simonw 9599	2022-02-18T19:56:18Z	2022-02-18T19:56:30Z	OWNER	> ```python > def dash_encode(s): > return s.replace("-", "--").replace(".", "-.").replace("/", "-/") > > def dash_decode(s): > return s.replace("-/", "/").replace("-.", ".").replace("--", "-") > ``` I think dash-encoding (new name for this) is the right way forward here.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1045108611	https://github.com/simonw/datasette/issues/1439#issuecomment-1045108611	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4-SxeD	simonw 9599	2022-02-18T20:02:19Z	2022-02-18T20:08:34Z	OWNER	One other potential variant: ```python def dash_encode(s): return s.replace("-", "-dash-").replace(".", "-dot-").replace("/", "-slash-") def dash_decode(s): return s.replace("-slash-", "/").replace("-dot-", ".").replace("-dash-", "-") ``` Except this has bugs - it doesn't round-trip safely, because it can get confused about things like `-dash-slash-` in terms of is that a `-dash-` or a `-slash-`? ```pycon >>> dash_encode("/db/table-.csv.csv") '-slash-db-slash-table-dash--dot-csv-dot-csv' >>> dash_decode('-slash-db-slash-table-dash--dot-csv-dot-csv') '/db/table-.csv.csv' >>> dash_encode('-slash-db-slash-table-dash--dot-csv-dot-csv') '-dash-slash-dash-db-dash-slash-dash-table-dash-dash-dash--dash-dot-dash-csv-dash-dot-dash-csv' >>> dash_decode('-dash-slash-dash-db-dash-slash-dash-table-dash-dash-dash--dash-dot-dash-csv-dash-dot-dash-csv') '-dash/dash-db-dash/dash-table-dash--dash.dash-csv-dash.dash-csv' ```	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1045111309	https://github.com/simonw/datasette/issues/1439#issuecomment-1045111309	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4-SyIN	simonw 9599	2022-02-18T20:04:24Z	2022-02-18T20:05:40Z	OWNER	This made me worry that my current `dash_decode()` implementation had unknown round-trip bugs, but thankfully this works OK: ```pycon >>> dash_encode("/db/table-.csv.csv") '-/db-/table---.csv-.csv' >>> dash_encode('-/db-/table---.csv-.csv') '---/db---/table-------.csv---.csv' >>> dash_decode('---/db---/table-------.csv---.csv') '-/db-/table---.csv-.csv' >>> dash_decode('-/db-/table---.csv-.csv') '/db/table-.csv.csv' ``` The regex still works against that double-encoded example too: <img width="1032" alt="image" src="https://user-images.githubusercontent.com/9599/154753916-b7d2159e-4284-4c92-ae61-110671fa320e.png">	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1045117304	https://github.com/simonw/datasette/issues/1439#issuecomment-1045117304	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4-Szl4	simonw 9599	2022-02-18T20:09:22Z	2022-02-18T20:09:22Z	OWNER	Adopting this could result in supporting database files with surprising characters in their filename too.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1045131086	https://github.com/simonw/datasette/issues/1439#issuecomment-1045131086	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4-S29O	simonw 9599	2022-02-18T20:22:13Z	2022-02-18T20:22:47Z	OWNER	Should it encode `%` symbols too, since they have a special meaning in URLs and we can't guarantee that every single web server / proxy out there will round-trip them safely using percentage encoding? If so, would need to pick a different encoding character for them. Maybe `%` becomes `-p` - and in that case `/` could become `-s` too. Is it worth expanding dash-encoding outside of just `/` and `-` and `.` though? Not sure.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1045134050	https://github.com/simonw/datasette/issues/1439#issuecomment-1045134050	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4-S3ri	simonw 9599	2022-02-18T20:25:04Z	2022-02-18T20:25:04Z	OWNER	Here's a useful modern spec for how existing URL percentage encoding is supposed to work: https://url.spec.whatwg.org/#percent-encoded-bytes	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1045269544	https://github.com/simonw/datasette/issues/1439#issuecomment-1045269544	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4-TYwo	simonw 9599	2022-02-18T22:19:29Z	2022-02-18T22:19:29Z	OWNER	Note that I've ruled out using `Accept: application/json` to return JSON because it turns out Cloudflare and potentially other CDNs ignore the `Vary: Accept` header entirely: - https://github.com/simonw/datasette/issues/1534	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1049114724	https://github.com/simonw/datasette/issues/1439#issuecomment-1049114724	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4-iDhk	simonw 9599	2022-02-23T19:04:40Z	2022-02-23T19:04:40Z	OWNER	I'm going to try dash encoding for table names (and row IDs) in a branch and see how I like it.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1049124390	https://github.com/simonw/datasette/issues/1439#issuecomment-1049124390	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4-iF4m	simonw 9599	2022-02-23T19:15:00Z	2022-02-23T19:15:00Z	OWNER	I'll start by modifying this function: https://github.com/simonw/datasette/blob/458f03ad3a454d271f47a643f4530bd8b60ddb76/datasette/utils/__init__.py#L732-L749 Later I want to move this to the routing layer to split out `format` automatically, as seen in the regexes here: https://github.com/simonw/datasette/issues/1439#issuecomment-1045069481	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1049126151	https://github.com/simonw/datasette/issues/1439#issuecomment-1049126151	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4-iGUH	simonw 9599	2022-02-23T19:17:01Z	2022-02-23T19:17:01Z	OWNER	Actually the relevant code looks to be: https://github.com/simonw/datasette/blob/7d24fd405f3c60e4c852c5d746c91aa2ba23cf5b/datasette/views/base.py#L481-L498	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1053973425	https://github.com/simonw/datasette/issues/1439#issuecomment-1053973425	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4-0lux	simonw 9599	2022-02-28T07:40:12Z	2022-02-28T07:40:12Z	OWNER	If I make this change it will break existing links to one of the oldest Datasette demos: http://fivethirtyeight.datasettes.com/fivethirtyeight/avengers%2Favengers A plugin that fixes those by redirecting them on 404 would be neat.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1059802318	https://github.com/simonw/datasette/issues/1439#issuecomment-1059802318	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4_K0zO	simonw 9599	2022-03-05T17:34:33Z	2022-03-05T17:34:33Z	OWNER	Wrote documentation: <img width="741" alt="Dash encoding. Datasette uses a custom encoding scheme in some places, called dash encoding. This is primarily used for table names and row primary keys, to avoid any confusion between / characters in those values and the Datasette URL that references them. Dash encoding applies the following rules, in order: 1. All single - characters are replaced by -- 2. . characters are replaced by -. 3. / characters are replaced by -/ These rules are applied in reverse order to decode a dash encoded string." src="https://user-images.githubusercontent.com/9599/156893903-5723f60e-e054-4365-84bc-f3084d11183d.png">	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1059822151	https://github.com/simonw/datasette/issues/1439#issuecomment-1059822151	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4_K5pH	simonw 9599	2022-03-05T19:48:35Z	2022-03-05T19:48:35Z	OWNER	Those new docs: https://github.com/simonw/datasette/blob/d1cb73180b4b5a07538380db76298618a5fc46b6/docs/internals.rst#dash-encoding	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1059822391	https://github.com/simonw/datasette/issues/1439#issuecomment-1059822391	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4_K5s3	simonw 9599	2022-03-05T19:50:12Z	2022-03-05T19:50:12Z	OWNER	I'm going to move this work to a PR.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1059836599	https://github.com/simonw/datasette/issues/1439#issuecomment-1059836599	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4_K9K3	simonw 9599	2022-03-05T21:52:10Z	2022-03-05T21:52:10Z	OWNER	Blogged about this here: https://simonwillison.net/2022/Mar/5/dash-encoding/	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1059850369	https://github.com/simonw/datasette/issues/1439#issuecomment-1059850369	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4_LAiB	simonw 9599	2022-03-05T23:28:56Z	2022-03-05T23:28:56Z	OWNER	Lots of great conversations about the dash encoding implementation on Twitter: https://twitter.com/simonw/status/1500228316309061633 @dracos helped me figure out a simpler regex: https://twitter.com/dracos/status/1500236433809973248 `^/(?P<database>[^/]+)/(?P<table>(?:[^\/\-\.]\|\-/\|\-\.\|\-\-)*)(?P<format>\.\w+)?$` ![image](https://user-images.githubusercontent.com/9599/156903088-c01933ae-4713-4e91-8d71-affebf70b945.png)	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1059851259	https://github.com/simonw/datasette/issues/1439#issuecomment-1059851259	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4_LAv7	simonw 9599	2022-03-05T23:35:47Z	2022-03-05T23:35:59Z	OWNER	This [comment from glyph](https://twitter.com/glyph/status/1500244937312329730) got me thinking: > Have you considered replacing % with some other character and then using percent-encoding? What happens if a table name includes a `%` character and that ends up getting mangled by a misbehaving proxy? I should consider `%` in the escaping system too. And maybe go with that suggestion of using percent-encoding directly but with a different character.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1059853526	https://github.com/simonw/datasette/issues/1439#issuecomment-1059853526	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4_LBTW	simonw 9599	2022-03-05T23:49:59Z	2022-03-05T23:49:59Z	OWNER	I want to try regular percentage encoding, except that it also encodes both the `-` and the `.` characters, AND it uses `-` instead of `%` as the encoding character. Should check what it does with emoji too.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1059854864	https://github.com/simonw/datasette/issues/1439#issuecomment-1059854864	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4_LBoQ	simonw 9599	2022-03-05T23:59:05Z	2022-03-05T23:59:05Z	OWNER	OK, for that percentage thing: the Python core implementation of URL percentage escaping deliberately ignores two of the characters we want to escape: `.` and `-`: https://github.com/python/cpython/blob/6927632492cbad86a250aa006c1847e03b03e70b/Lib/urllib/parse.py#L780-L783 ```python _ALWAYS_SAFE = frozenset(b'ABCDEFGHIJKLMNOPQRSTUVWXYZ' b'abcdefghijklmnopqrstuvwxyz' b'0123456789' b'_.-~') ``` It also defaults to skipping `/` (passed as a `safe=` parameter to various things). I'm going to try borrowing and modifying the core of the Python implementation: https://github.com/python/cpython/blob/6927632492cbad86a250aa006c1847e03b03e70b/Lib/urllib/parse.py#L795-L814 ```python class _Quoter(dict): """A mapping from bytes numbers (in range(0,256)) to strings. String values are percent-encoded byte values, unless the key < 128, and in either of the specified safe set, or the always safe set. """ # Keeps a cache internally, via __missing__, for efficiency (lookups # of cached keys don't call Python code at all). def __init__(self, safe): """safe: bytes object.""" self.safe = _ALWAYS_SAFE.union(safe) def __repr__(self): return f"<Quoter {dict(self)!r}>" def __missing__(self, b): # Handle a cache miss. Store quoted string in cache and return. res = chr(b) if b in self.safe else '%{:02X}'.format(b) self[b] = res return res ```	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1059855418	https://github.com/simonw/datasette/issues/1439#issuecomment-1059855418	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4_LBw6	simonw 9599	2022-03-06T00:00:53Z	2022-03-06T00:04:18Z	OWNER	```python _ESCAPE_SAFE = frozenset( b'ABCDEFGHIJKLMNOPQRSTUVWXYZ' b'abcdefghijklmnopqrstuvwxyz' b'0123456789_' ) # I removed b'.-~') class Quoter(dict): # Keeps a cache internally, via __missing__ def __missing__(self, b): # Handle a cache miss. Store quoted string in cache and return. res = chr(b) if b in _ESCAPE_SAFE else '-{:02X}'.format(b) self[b] = res return res quoter = Quoter().__getitem__ ''.join([quoter(char) for char in b'foo/bar.csv']) # 'foo-2Fbar-2Ecsv' ```	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1059863997	https://github.com/simonw/datasette/issues/1439#issuecomment-1059863997	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4_LD29	karlcow 505230	2022-03-06T00:57:57Z	2022-03-06T00:57:57Z	NONE	Probably too late… but I have just seen this because http://simonwillison.net/2022/Mar/5/dash-encoding/#atom-everything And it reminded me of comma tools at W3C. http://www.w3.org/,tools Example, the text version of W3C homepage https://www.w3.org/,text > The challenge comes down to telling the difference between the following: > > * `/db/table` - an HTML table page `/db/table` > * `/db/table.csv` - the CSV version of `/db/table` `/db/table,csv` > * `/db/table.csv` - no this one is actually a database table called `table.csv` `/db/table.csv` > * `/db/table.csv.csv` - the CSV version of `/db/table.csv` `/db/table.csv,csv` > * `/db/table.csv.csv.csv` and so on... `/db/table.csv.csv,csv` I haven't checked all the cases in the thread.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1059864154	https://github.com/simonw/datasette/issues/1439#issuecomment-1059864154	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4_LD5a	simonw 9599	2022-03-06T00:59:04Z	2022-03-06T00:59:04Z	OWNER	Needs more testing, but this seems to work for decoding the percent-escaped-with-dashes format: `urllib.parse.unquote(s.replace('-', '%'))`	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1059903309	https://github.com/simonw/datasette/issues/1439#issuecomment-1059903309	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4_LNdN	simonw 9599	2022-03-06T06:17:51Z	2022-03-06T06:17:51Z	OWNER	Suggestion from a conversation with Seth Michael Larson: it would be neat if plugins could easily integrate with whatever scheme this ends up using, maybe with the `/db/table/-/plugin-name` standardized pattern or similar. Making it easy for plugins to do the right, consistent thing is a good idea.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1060044007	https://github.com/simonw/datasette/issues/1439#issuecomment-1060044007	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4_Lvzn	simonw 9599	2022-03-06T21:38:15Z	2022-03-06T21:38:15Z	OWNER	Test: https://github.com/simonw/datasette/blob/d2e3fe3facf0ed0abf8b00cd54463af90dd6904d/tests/test_utils.py#L651-L666 One big advantage to this scheme is that redirecting old links to `%2F` pages (e.g. https://fivethirtyeight.datasettes.com/fivethirtyeight/twitter-ratio%2Fsenators) is easy - if you see a `%` in the `raw_path`, redirect to that page with the `%` replaced by `-`.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1060870237	https://github.com/simonw/datasette/issues/1439#issuecomment-1060870237	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4_O5hd	simonw 9599	2022-03-07T16:19:22Z	2022-03-07T16:19:22Z	OWNER	I didn't need to do any of the fancy regular expression routing stuff after all, since the new dash encoding format avoids using `/` so a simple `[^/]+` can capture the correct segments from the URL.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1065987808	https://github.com/simonw/datasette/issues/1439#issuecomment-1065987808	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4_ia7g	simonw 9599	2022-03-13T00:02:32Z	2022-03-13T00:02:32Z	OWNER	OK, this has broken a lot more than I expected it would. Turns out `-` is a very common character in existing Datasette database names! https://datasette.io/-/databases for example has two: ```json [ { "name": "docs-index", "path": "docs-index.db", "size": 1007616, "is_mutable": false, "is_memory": false, "hash": "0ac6c3de2762fcd174fd249fed8a8fa6046ea345173d22c2766186bf336462b2" }, { "name": "dogsheep-index", "path": "dogsheep-index.db", "size": 5496832, "is_mutable": false, "is_memory": false, "hash": "d1ea238d204e5b9ae783c86e4af5bcdf21267c1f391de3e468d9665494ee012a" } ] ```	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1065988403	https://github.com/simonw/datasette/issues/1439#issuecomment-1065988403	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4_ibEz	simonw 9599	2022-03-13T00:06:38Z	2022-03-13T00:07:19Z	OWNER	If I want to reserve `-` as a character that CAN be used in URLs, the only remaining character that might make sense for escape sequences is `~` - based on this last line of characters that are exempt from percentage encoding: ```python _ALWAYS_SAFE = frozenset(b'ABCDEFGHIJKLMNOPQRSTUVWXYZ' b'abcdefghijklmnopqrstuvwxyz' b'0123456789' b'_.-~') ``` So I'd add both `-` and `_` back to the safe list, but use `~` to escape `.` and `/` and suchlike.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
1068461449	https://github.com/simonw/datasette/issues/1439#issuecomment-1068461449	https://api.github.com/repos/simonw/datasette/issues/1439	IC_kwDOBm6k_c4_r22J	simonw 9599	2022-03-15T20:51:26Z	2022-03-15T20:51:26Z	OWNER	I'm happy with this now that I've landed Tilde encoding in #1657.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Rethink how .ext formats (v.s. ?_format=) works before 1.0 973139047
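
The last few comments describe the scheme only in prose: keep letters, digits, `_` and `-` as they are, and escape every other byte as `~` plus two hex digits. A minimal sketch of that idea follows. The names `tilde_encode` and `tilde_decode` and the exact safe set are assumptions for illustration; this is not the code that landed in #1657, which may handle some characters differently.

```python
import urllib.parse

# Assumed safe set: the _ALWAYS_SAFE characters quoted above, minus '.' and '~'.
_TILDE_SAFE = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789_-"
)


def tilde_encode(s: str) -> str:
    """Escape every UTF-8 byte outside the safe set as '~XX' (hypothetical helper)."""
    return "".join(
        chr(b) if b in _TILDE_SAFE else "~{:02X}".format(b)
        for b in s.encode("utf-8")
    )


def tilde_decode(s: str) -> str:
    """Reverse tilde_encode by reusing the standard percent-decoder.

    Assumes the input came from tilde_encode, so it contains no literal '%'.
    """
    return urllib.parse.unquote(s.replace("~", "%"))


print(tilde_encode("foo/bar.csv"))
print(tilde_encode("docs-index"))
print(tilde_decode(tilde_encode("table.csv")))
```

Because `-` stays in the safe set, a database name such as `docs-index` passes through unchanged, while `/` and `.` no longer appear literally in an encoded path segment.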


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);