{"html_url": "https://github.com/simonw/datasette/issues/1293#issuecomment-899915829", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1293", "id": 899915829, "node_id": "IC_kwDOBm6k_c41o6A1", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-17T01:02:35Z", "updated_at": "2021-08-17T01:02:35Z", "author_association": "OWNER", "body": "New approach: this time I'm building a simplified executor for the bytecode operations themselves.\r\n```python\r\nfrom typing import Any, Dict, Tuple\r\n\r\ndef execute_operations(operations, max_iterations=100, trace=None):\r\n    trace = trace or (lambda *args: None)\r\n    registers: Dict[int, Any] = {}\r\n    cursors: Dict[int, Tuple[str, Dict]] = {}\r\n    instruction_pointer = 0\r\n    iterations = 0\r\n    result_row = None\r\n    while True:\r\n        iterations += 1\r\n        if iterations > max_iterations:\r\n            break\r\n        operation = operations[instruction_pointer]\r\n        trace(instruction_pointer, dict(operation))\r\n        opcode = operation[\"opcode\"]\r\n        if opcode == \"Init\":\r\n            if operation[\"p2\"] != 0:\r\n                instruction_pointer = operation[\"p2\"]\r\n                continue\r\n            else:\r\n                instruction_pointer += 1\r\n                continue\r\n        elif opcode == \"Goto\":\r\n            instruction_pointer = operation[\"p2\"]\r\n            continue\r\n        elif opcode == \"Halt\":\r\n            break\r\n        elif opcode == \"OpenRead\":\r\n            cursors[operation[\"p1\"]] = (\"database_table\", {\r\n                \"rootpage\": operation[\"p2\"],\r\n                \"connection\": operation[\"p3\"],\r\n            })\r\n        elif opcode == \"OpenEphemeral\":\r\n            cursors[operation[\"p1\"]] = (\"ephemeral\", {\r\n                \"num_columns\": operation[\"p2\"],\r\n                \"index_keys\": [],\r\n            })\r\n        elif opcode == \"MakeRecord\":\r\n            registers[operation[\"p3\"]] = (\"MakeRecord\", {\r\n                \"registers\": list(range(operation[\"p1\"] + operation[\"p2\"]))\r\n            })\r\n        elif opcode == \"IdxInsert\":\r\n            record = registers[operation[\"p2\"]]\r\n            cursors[operation[\"p1\"]][1][\"index_keys\"].append(record)\r\n        elif opcode == \"Rowid\":\r\n            registers[operation[\"p2\"]] = (\"rowid\", {\r\n                \"table\": operation[\"p1\"]\r\n            })\r\n        elif opcode == \"Sequence\":\r\n            registers[operation[\"p2\"]] = (\"sequence\", {\r\n                \"next_from_cursor\": operation[\"p1\"]\r\n            })\r\n        elif opcode == \"Column\":\r\n            registers[operation[\"p3\"]] = (\"column\", {\r\n                \"cursor\": operation[\"p1\"],\r\n                \"column_offset\": operation[\"p2\"]\r\n            })\r\n        elif opcode == \"ResultRow\":\r\n            p1 = operation[\"p1\"]\r\n            p2 = operation[\"p2\"]\r\n            trace(\"ResultRow: \", list(range(p1, p1 + p2)), registers)\r\n            result_row = [registers.get(i) for i in range(p1, p1 + p2)]\r\n        elif opcode == \"Integer\":\r\n            registers[operation[\"p2\"]] = (\"Integer\", operation[\"p1\"])\r\n        elif opcode == \"String8\":\r\n            registers[operation[\"p2\"]] = (\"String\", operation[\"p4\"])\r\n        instruction_pointer += 1\r\n    return {\"registers\": registers, \"cursors\": cursors, \"result_row\": result_row}\r\n```\r\nResults are promising!\r\n```\r\nexecute_operations(db.execute(\"explain select 'hello', 55, rowid, * from searchable\").fetchall())\r\n\r\n{'registers': {1: ('String', 'hello'),\r\n 2: ('Integer', 55),\r\n 3: ('rowid', {'table': 0}),\r\n 4: ('rowid', {'table': 0}),\r\n 5: ('column', {'cursor': 0, 'column_offset': 1}),\r\n 6: ('column', {'cursor': 0, 'column_offset': 2}),\r\n 7: ('column', {'cursor': 0, 'column_offset': 3})},\r\n 'cursors': {0: ('database_table', {'rootpage': 32, 'connection': 0})},\r\n 'result_row': [('String', 'hello'),\r\n ('Integer', 55),\r\n ('rowid', {'table': 0}),\r\n ('rowid', {'table': 0}),\r\n ('column', {'cursor': 0, 'column_offset': 1}),\r\n ('column', {'cursor': 0, 'column_offset': 2}),\r\n ('column', {'cursor': 0, 'column_offset': 3})]}\r\n```\r\nHere's what happens with a union across three tables:\r\n```\r\nexecute_operations(db.execute(f\"\"\"\r\nexplain select data as content from binary_data\r\nunion\r\nselect pk as content from complex_foreign_keys\r\nunion\r\nselect name as content from facet_cities\r\n\"\"\").fetchall())\r\n\r\n{'registers': {1: ('column', {'cursor': 4, 
'column_offset': 0}),\r\n 2: ('MakeRecord', {'registers': [0, 1, 2, 3]}),\r\n 3: ('column', {'cursor': 0, 'column_offset': 1}),\r\n 4: ('column', {'cursor': 3, 'column_offset': 0})},\r\n 'cursors': {3: ('ephemeral',\r\n {'num_columns': 1,\r\n 'index_keys': [('MakeRecord', {'registers': [0, 1]}),\r\n ('MakeRecord', {'registers': [0, 1]}),\r\n ('MakeRecord', {'registers': [0, 1, 2, 3]})]}),\r\n 2: ('database_table', {'rootpage': 44, 'connection': 0}),\r\n 4: ('database_table', {'rootpage': 24, 'connection': 0}),\r\n 0: ('database_table', {'rootpage': 42, 'connection': 0})},\r\n 'result_row': [('column', {'cursor': 3, 'column_offset': 0})]}\r\n```\r\nNote how the result_row refers to cursor 3, which is an ephemeral table which had three different sets of `MakeRecord` index keys assigned to it - indicating that the output column is NOT from the same underlying table source.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 849978964, "label": "Show column metadata plus links for foreign keys on arbitrary query results"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1438#issuecomment-900500824", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1438", "id": 900500824, "node_id": "IC_kwDOBm6k_c41rI1Y", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-17T17:38:16Z", "updated_at": "2021-08-17T17:38:16Z", "author_association": "OWNER", "body": "Relevant template code: https://github.com/simonw/datasette/blob/adb5b70de5cec3c3dd37184defe606a082c232cf/datasette/templates/query.html#L71\r\n\r\n`renderers` comes from here: https://github.com/simonw/datasette/blob/2883098770fc66e50183b2b231edbde20848d4d6/datasette/views/base.py#L593-L608", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", 
"issue": {"value": 972918533, "label": "Query page .csv and .json links are not correctly URL-encoded on Vercel under unknown specific conditions"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1438#issuecomment-900502364", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1438", "id": 900502364, "node_id": "IC_kwDOBm6k_c41rJNc", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-17T17:40:41Z", "updated_at": "2021-08-17T17:40:41Z", "author_association": "OWNER", "body": "Bug is likely in `path_with_format` itself: https://github.com/simonw/datasette/blob/adb5b70de5cec3c3dd37184defe606a082c232cf/datasette/utils/__init__.py#L710-L729", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 972918533, "label": "Query page .csv and .json links are not correctly URL-encoded on Vercel under unknown specific conditions"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1438#issuecomment-900513267", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1438", "id": 900513267, "node_id": "IC_kwDOBm6k_c41rL3z", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-17T17:57:05Z", "updated_at": "2021-08-17T17:57:05Z", "author_association": "OWNER", "body": "I'm having trouble replicating this bug outside of Vercel. Against Cloud Run: view-source:https://latest.datasette.io/fixtures?sql=select+*+from+searchable+where+text1+like+%22%25cat%25%22\r\n\r\nThe HTML here is:\r\n\r\n```html\r\n

This data as\r\n json, \r\n ...\r\n CSV\r\n

\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 972918533, "label": "Query page .csv and .json links are not correctly URL-encoded on Vercel under unknown specific conditions"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1438#issuecomment-900516826", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1438", "id": 900516826, "node_id": "IC_kwDOBm6k_c41rMva", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-17T18:02:27Z", "updated_at": "2021-08-17T18:02:27Z", "author_association": "OWNER", "body": "The key difference I can spot between Vercel and Cloud Run is that `+` in a query string gets converted to `%20` by Vercel before it gets to my app, but does not for Cloud Run:\r\n```\r\n# Vercel\r\n~ % curl -s 'https://til.simonwillison.net/-/asgi-scope?sql=select+*+from+tunes+where+name+like+%22%25wise+maid%25%22%0D%0A' | rg 'query_string' -C 2\r\n 'method': 'GET',\r\n 'path': '/-/asgi-scope',\r\n 'query_string': b'sql=select%20*%20from%20tunes%20where%20name%20like%20%22%25'\r\n b'wise%20maid%25%22%0D%0A',\r\n 'raw_path': b'/-/asgi-scope',\r\n\r\n# Cloud Run\r\n~ % curl -s 'https://latest-with-plugins.datasette.io/-/asgi-scope?sql=select+*+from+tunes+where+name+like+%22%25wise+maid%25%22%0D%0A' | rg 'query_string' -C 2\r\n 'method': 'GET',\r\n 'path': '/-/asgi-scope',\r\n 'query_string': b'sql=select+*+from+tunes+where+name+like+%22%25wise+maid%25%2'\r\n b'2%0D%0A',\r\n 'raw_path': b'/-/asgi-scope',\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 972918533, "label": "Query page .csv and .json links are not correctly URL-encoded on Vercel under unknown specific conditions"}, "performed_via_github_app": null} {"html_url": 
"https://github.com/simonw/datasette/issues/1438#issuecomment-900518343", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1438", "id": 900518343, "node_id": "IC_kwDOBm6k_c41rNHH", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-17T18:04:42Z", "updated_at": "2021-08-17T18:04:42Z", "author_association": "OWNER", "body": "Here's how `request.query_string` works: https://github.com/simonw/datasette/blob/adb5b70de5cec3c3dd37184defe606a082c232cf/datasette/utils/asgi.py#L86-L88", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 972918533, "label": "Query page .csv and .json links are not correctly URL-encoded on Vercel under unknown specific conditions"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1438#issuecomment-900681413", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1438", "id": 900681413, "node_id": "IC_kwDOBm6k_c41r07F", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-17T22:47:44Z", "updated_at": "2021-08-17T22:47:44Z", "author_association": "OWNER", "body": "I deployed another copy of `fixtures.db` on Vercel at https://til.simonwillison.net/fixtures so I can compare it with `fixtures.db` on Cloud Run at https://latest.datasette.io/fixtures", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 972918533, "label": "Query page .csv and .json links are not correctly URL-encoded on Vercel under unknown specific conditions"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1438#issuecomment-900690998", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1438", "id": 900690998, "node_id": "IC_kwDOBm6k_c41r3Q2", "user": {"value": 9599, "label": "simonw"}, 
"created_at": "2021-08-17T23:11:16Z", "updated_at": "2021-08-17T23:12:25Z", "author_association": "OWNER", "body": "I have completely failed to replicate this initial bug - but it's still there on the `thesession.vercel.app` deployment (even though my own deployments to Vercel do not exhibit it). Here's a one-liner to replicate it against that deployment:\r\n\r\n`curl -s 'https://thesession.vercel.app/thesession?sql=select+*+from+tunes+where+name+like+%22%25wise+maid%25%22' | rg '.csv'`\r\n\r\nWhit outputs this:\r\n\r\n`

This data as json, CSV

`\r\n\r\nIt looks like, rather than being URL-encoded, the original query string is somehow making it through to Jinja and then being auto-escaped there.\r\n\r\nThe weird thing is that the equivalent query executed against my `til.simonwillison.net` Vercel instance does this:\r\n\r\n`curl -s 'https://til.simonwillison.net/fixtures?sql=select+*+from+searchable+where+text1+like+%22%25a%25%22' | rg '.csv'`\r\n\r\n`

This data as json, CSV

`", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 972918533, "label": "Query page .csv and .json links are not correctly URL-encoded on Vercel under unknown specific conditions"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-900699670", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 900699670, "node_id": "IC_kwDOBm6k_c41r5YW", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-17T23:34:23Z", "updated_at": "2021-08-17T23:34:23Z", "author_association": "OWNER", "body": "The challenge comes down to telling the difference between the following:\r\n\r\n- `/db/table` - an HTML table page\r\n- `/db/table.csv` - the CSV version of `/db/table`\r\n- `/db/table.csv` - no this one is actually a database table called `table.csv`\r\n- `/db/table.csv.csv` - the CSV version of `/db/table.csv`\r\n- `/db/table.csv.csv.csv` and so on...", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-900705226", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 900705226, "node_id": "IC_kwDOBm6k_c41r6vK", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-17T23:50:32Z", "updated_at": "2021-08-17T23:50:47Z", "author_association": "OWNER", "body": "An alternative solution would be to use some form of escaping for the characters that form the name of the table.\r\n\r\nThe obvious way to do this would be URL-encoding - but it doesn't hold for `.` characters. 
The hex for that is `%2E` but watch what happens with that in a URL:\r\n\r\n```\r\n# Against Cloud Run:\r\ncurl -s 'https://datasette.io/-/asgi-scope/foo/bar%2Fbaz%2E' | rg path\r\n 'path': '/-/asgi-scope/foo/bar/baz.',\r\n 'raw_path': b'/-/asgi-scope/foo/bar%2Fbaz.',\r\n 'root_path': '',\r\n# Against Vercel:\r\ncurl -s 'https://til.simonwillison.net/-/asgi-scope/foo/bar%2Fbaz%2E' | rg path\r\n 'path': '/-/asgi-scope/foo/bar%2Fbaz%2E',\r\n 'raw_path': b'/-/asgi-scope/foo/bar%2Fbaz%2E',\r\n 'root_path': '',\r\n```\r\nSurprisingly in this case Vercel DOES keep it intact, but Cloud Run does not.\r\n\r\nIt's still no good though: I need a solution that works on Vercel, Cloud Run and every other potential hosting provider too.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null}