{"html_url": "https://github.com/simonw/datasette/issues/1518#issuecomment-999831967", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1518", "id": 999831967, "node_id": "IC_kwDOBm6k_c47mDmf", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-22T20:04:47Z", "updated_at": "2021-12-22T20:10:11Z", "author_association": "OWNER", "body": "I think I might be able to clean up a lot of the stuff in here using the `render_cell` plugin hook: https://github.com/simonw/datasette/blob/6b1384b2f529134998fb507e63307609a5b7f5c0/datasette/views/table.py#L87-L89\r\n\r\nThe catch with that hook - https://docs.datasette.io/en/stable/plugin_hooks.html#render-cell-value-column-table-database-datasette - is that it gets called for every single cell. I don't want the overhead of looking up the foreign key relationships etc once for every value in a specific column.\r\n\r\nBut maybe I could extend the hook to include a shared cache that gets used for all of the cells in a specific table? Something like this:\r\n```python\r\nrender_cell(value, column, table, database, datasette, cache)\r\n```\r\n`cache` is a dictionary - and the same dictionary is passed to every call to that hook while rendering a specific page.\r\n\r\nIt's a bit of a gross hack though, and would it ever be useful for plugins outside of the default plugin in Datasette which does the foreign key stuff?\r\n\r\nIf I can think of one other potential application for this `cache` then I might implement it.\r\n\r\nNo, this optimization doesn't make sense: the most complex cell enrichment logic is the stuff that does a `select * from categories where id in (2, 5, 6)` query, using just the distinct set of IDs that are rendered on the current page. That's not going to fit in the `render_cell` hook no matter how hard I try to warp it into the right shape, because it needs full visibility of all of the results that are being rendered in order to collect those unique ID values.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1058072543, "label": "Complete refactor of TableView and table.html template"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1518#issuecomment-999837220", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1518", "id": 999837220, "node_id": "IC_kwDOBm6k_c47mE4k", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-22T20:15:04Z", "updated_at": "2021-12-22T20:15:04Z", "author_association": "OWNER", "body": "I think I can move this much higher up in the method, it's a bit confusing having it half way through: https://github.com/simonw/datasette/blob/6b1384b2f529134998fb507e63307609a5b7f5c0/datasette/views/table.py#L414-L436", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1058072543, "label": "Complete refactor of TableView and table.html template"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1518#issuecomment-999837569", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1518", "id": 999837569, "node_id": "IC_kwDOBm6k_c47mE-B", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-22T20:15:45Z", "updated_at": "2021-12-22T20:15:45Z", "author_association": "OWNER", "body": "Also the whole `special_args` v.s. `request.args` thing is pretty confusing, I think that might be an older code pattern back from when I was using Sanic.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1058072543, "label": "Complete refactor of TableView and table.html template"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1518#issuecomment-999850191", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1518", "id": 999850191, "node_id": "IC_kwDOBm6k_c47mIDP", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-22T20:29:38Z", "updated_at": "2021-12-22T20:29:38Z", "author_association": "OWNER", "body": "New short-term goal: get facets and suggested facets to execute in parallel with the main query. Generate a trace graph that proves that is happening using `datasette-pretty-traces`.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1058072543, "label": "Complete refactor of TableView and table.html template"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1518#issuecomment-999863269", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1518", "id": 999863269, "node_id": "IC_kwDOBm6k_c47mLPl", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-22T20:35:41Z", "updated_at": "2021-12-22T20:37:13Z", "author_association": "OWNER", "body": "It looks like the count has to be executed before facets can be, because the facet_class constructor needs that total count figure: https://github.com/simonw/datasette/blob/6b1384b2f529134998fb507e63307609a5b7f5c0/datasette/views/table.py#L660-L671\r\n\r\nIt's used in facet suggestion logic here: https://github.com/simonw/datasette/blob/ace86566b28280091b3844cf5fbecd20158e9004/datasette/facets.py#L172-L178", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1058072543, "label": "Complete refactor of TableView and table.html template"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1518#issuecomment-999870282", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1518", "id": 999870282, "node_id": "IC_kwDOBm6k_c47mM9K", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-22T20:45:56Z", "updated_at": "2021-12-22T20:46:08Z", "author_association": "OWNER", "body": "> New short-term goal: get facets and suggested facets to execute in parallel with the main query. Generate a trace graph that proves that is happening using `datasette-pretty-traces`.\r\n\r\nI wrote code to execute those in parallel using `asyncio.gather()` - which seems to work but causes the SQL run inside the parallel `async def` functions not to show up in the trace graph at all.\r\n\r\n```diff\r\ndiff --git a/datasette/views/table.py b/datasette/views/table.py\r\nindex 9808fd2..ec9db64 100644\r\n--- a/datasette/views/table.py\r\n+++ b/datasette/views/table.py\r\n@@ -1,3 +1,4 @@\r\n+import asyncio\r\n import urllib\r\n import itertools\r\n import json\r\n@@ -615,44 +616,37 @@ class TableView(RowTableShared):\r\n if request.args.get(\"_timelimit\"):\r\n extra_args[\"custom_time_limit\"] = int(request.args.get(\"_timelimit\"))\r\n \r\n- # Execute the main query!\r\n- results = await db.execute(sql, params, truncate=True, **extra_args)\r\n-\r\n- # Calculate the total count for this query\r\n- filtered_table_rows_count = None\r\n- if (\r\n- not db.is_mutable\r\n- and self.ds.inspect_data\r\n- and count_sql == f\"select count(*) from {table} \"\r\n- ):\r\n- # We can use a previously cached table row count\r\n- try:\r\n- filtered_table_rows_count = self.ds.inspect_data[database][\"tables\"][\r\n- table\r\n- ][\"count\"]\r\n- except KeyError:\r\n- pass\r\n-\r\n- # Otherwise run a select count(*) ...\r\n- if count_sql and filtered_table_rows_count is None and not nocount:\r\n- try:\r\n- count_rows = list(await db.execute(count_sql, from_sql_params))\r\n- filtered_table_rows_count = count_rows[0][0]\r\n- except QueryInterrupted:\r\n- pass\r\n-\r\n- # Faceting\r\n- if not self.ds.setting(\"allow_facet\") and any(\r\n- arg.startswith(\"_facet\") for arg in request.args\r\n- ):\r\n- raise BadRequest(\"_facet= is not allowed\")\r\n+ async def execute_count():\r\n+ # Calculate the total count for this query\r\n+ filtered_table_rows_count = None\r\n+ if (\r\n+ not db.is_mutable\r\n+ and self.ds.inspect_data\r\n+ and count_sql == f\"select count(*) from {table} \"\r\n+ ):\r\n+ # We can use a previously cached table row count\r\n+ try:\r\n+ filtered_table_rows_count = self.ds.inspect_data[database][\r\n+ \"tables\"\r\n+ ][table][\"count\"]\r\n+ except KeyError:\r\n+ pass\r\n+\r\n+ if count_sql and filtered_table_rows_count is None and not nocount:\r\n+ try:\r\n+ count_rows = list(await db.execute(count_sql, from_sql_params))\r\n+ filtered_table_rows_count = count_rows[0][0]\r\n+ except QueryInterrupted:\r\n+ pass\r\n+\r\n+ return filtered_table_rows_count\r\n+\r\n+ filtered_table_rows_count = await execute_count()\r\n \r\n # pylint: disable=no-member\r\n facet_classes = list(\r\n itertools.chain.from_iterable(pm.hook.register_facet_classes())\r\n )\r\n- facet_results = {}\r\n- facets_timed_out = []\r\n facet_instances = []\r\n for klass in facet_classes:\r\n facet_instances.append(\r\n@@ -668,33 +662,58 @@ class TableView(RowTableShared):\r\n )\r\n )\r\n \r\n- if not nofacet:\r\n- for facet in facet_instances:\r\n- (\r\n- instance_facet_results,\r\n- instance_facets_timed_out,\r\n- ) = await facet.facet_results()\r\n- for facet_info in instance_facet_results:\r\n- base_key = facet_info[\"name\"]\r\n- key = base_key\r\n- i = 1\r\n- while key in facet_results:\r\n- i += 1\r\n- key = f\"{base_key}_{i}\"\r\n- facet_results[key] = facet_info\r\n- facets_timed_out.extend(instance_facets_timed_out)\r\n-\r\n- # Calculate suggested facets\r\n- suggested_facets = []\r\n- if (\r\n- self.ds.setting(\"suggest_facets\")\r\n- and self.ds.setting(\"allow_facet\")\r\n- and not _next\r\n- and not nofacet\r\n- and not nosuggest\r\n- ):\r\n- for facet in facet_instances:\r\n- suggested_facets.extend(await facet.suggest())\r\n+ async def execute_suggested_facets():\r\n+ # Calculate suggested facets\r\n+ suggested_facets = []\r\n+ if (\r\n+ self.ds.setting(\"suggest_facets\")\r\n+ and self.ds.setting(\"allow_facet\")\r\n+ and not _next\r\n+ and not nofacet\r\n+ and not nosuggest\r\n+ ):\r\n+ for facet in facet_instances:\r\n+ suggested_facets.extend(await facet.suggest())\r\n+ return suggested_facets\r\n+\r\n+ async def execute_facets():\r\n+ facet_results = {}\r\n+ facets_timed_out = []\r\n+ if not self.ds.setting(\"allow_facet\") and any(\r\n+ arg.startswith(\"_facet\") for arg in request.args\r\n+ ):\r\n+ raise BadRequest(\"_facet= is not allowed\")\r\n+\r\n+ if not nofacet:\r\n+ for facet in facet_instances:\r\n+ (\r\n+ instance_facet_results,\r\n+ instance_facets_timed_out,\r\n+ ) = await facet.facet_results()\r\n+ for facet_info in instance_facet_results:\r\n+ base_key = facet_info[\"name\"]\r\n+ key = base_key\r\n+ i = 1\r\n+ while key in facet_results:\r\n+ i += 1\r\n+ key = f\"{base_key}_{i}\"\r\n+ facet_results[key] = facet_info\r\n+ facets_timed_out.extend(instance_facets_timed_out)\r\n+\r\n+ return facet_results, facets_timed_out\r\n+\r\n+ # Execute the main query, facets and facet suggestions in parallel:\r\n+ (\r\n+ results,\r\n+ suggested_facets,\r\n+ (facet_results, facets_timed_out),\r\n+ ) = await asyncio.gather(\r\n+ db.execute(sql, params, truncate=True, **extra_args),\r\n+ execute_suggested_facets(),\r\n+ execute_facets(),\r\n+ )\r\n+\r\n+ results = await db.execute(sql, params, truncate=True, **extra_args)\r\n \r\n # Figure out columns and rows for the query\r\n columns = [r[0] for r in results.description]\r\n```\r\nHere's the trace for `http://127.0.0.1:4422/fixtures/compound_three_primary_keys?_trace=1&_facet=pk1&_facet=pk2` with the missing facet and facet suggestion queries:\r\n\r\n\"image\"\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1058072543, "label": "Complete refactor of TableView and table.html template"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1518#issuecomment-999870993", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1518", "id": 999870993, "node_id": "IC_kwDOBm6k_c47mNIR", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-22T20:47:18Z", "updated_at": "2021-12-22T20:50:24Z", "author_association": "OWNER", "body": "The reason they aren't showing up in the traces is that traces are stored just for the currently executing `asyncio` task ID: https://github.com/simonw/datasette/blob/ace86566b28280091b3844cf5fbecd20158e9004/datasette/tracer.py#L13-L25\r\n\r\nThis is so traces for other incoming requests don't end up mixed together. But there's no current mechanism to track async tasks that are effectively \"child tasks\" of the current request, and hence should be tracked the same.\r\n\r\nhttps://stackoverflow.com/a/69349501/6083 suggests that you pass the task ID as an argument to the child tasks that are executed using `asyncio.gather()` to work around this kind of problem.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1058072543, "label": "Complete refactor of TableView and table.html template"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1576#issuecomment-999874484", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1576", "id": 999874484, "node_id": "IC_kwDOBm6k_c47mN-0", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-22T20:54:52Z", "updated_at": "2021-12-22T20:54:52Z", "author_association": "OWNER", "body": "Here's the full current relevant code from `tracer.py`: https://github.com/simonw/datasette/blob/ace86566b28280091b3844cf5fbecd20158e9004/datasette/tracer.py#L8-L64\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1087181951, "label": "Traces should include SQL executed by subtasks created with `asyncio.gather`"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1576#issuecomment-999874886", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1576", "id": 999874886, "node_id": "IC_kwDOBm6k_c47mOFG", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-22T20:55:42Z", "updated_at": "2021-12-22T20:57:28Z", "author_association": "OWNER", "body": "One way to solve this would be to introduce a `set_task_id()` method, which sets an ID which will be returned by `get_task_id()` instead of using `id(current_task(loop=loop))`.\r\n\r\nIt would be really nice if I could solve this using `with` syntax somehow. Something like:\r\n```python\r\nwith trace_child_tasks():\r\n (\r\n suggested_facets,\r\n (facet_results, facets_timed_out),\r\n ) = await asyncio.gather(\r\n execute_suggested_facets(),\r\n execute_facets(),\r\n )\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1087181951, "label": "Traces should include SQL executed by subtasks created with `asyncio.gather`"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1576#issuecomment-999876666", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1576", "id": 999876666, "node_id": "IC_kwDOBm6k_c47mOg6", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-22T20:59:22Z", "updated_at": "2021-12-22T21:18:09Z", "author_association": "OWNER", "body": "This article is relevant: [Context information storage for asyncio](https://blog.sqreen.com/asyncio/) - in particular the section https://blog.sqreen.com/asyncio/#context-inheritance-between-tasks which describes exactly the problem I have and their solution, which involves this trickery:\r\n\r\n```python\r\ndef request_task_factory(loop, coro):\r\n child_task = asyncio.tasks.Task(coro, loop=loop)\r\n parent_task = asyncio.Task.current_task(loop=loop)\r\n current_request = getattr(parent_task, 'current_request', None)\r\n setattr(child_task, 'current_request', current_request)\r\n return child_task\r\n\r\nloop = asyncio.get_event_loop()\r\nloop.set_task_factory(request_task_factory)\r\n```\r\n\r\nThey released their solution as a library: https://pypi.org/project/aiocontext/ and https://github.com/sqreen/AioContext - but that company was acquired by Datadog back in April and doesn't seem to be actively maintaining their open source stuff any more: https://twitter.com/SqreenIO/status/1384906075506364417", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1087181951, "label": "Traces should include SQL executed by subtasks created with `asyncio.gather`"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1576#issuecomment-999878907", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1576", "id": 999878907, "node_id": "IC_kwDOBm6k_c47mPD7", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-22T21:03:49Z", "updated_at": "2021-12-22T21:10:46Z", "author_association": "OWNER", "body": "`context_vars` can solve this but they were introduced in Python 3.7: https://www.python.org/dev/peps/pep-0567/\r\n\r\nPython 3.6 support ends in a few days time, and it looks like Glitch has updated to 3.7 now - so maybe I can get away with Datasette needing 3.7 these days?\r\n\r\nTweeted about that here: https://twitter.com/simonw/status/1473761478155010048", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1087181951, "label": "Traces should include SQL executed by subtasks created with `asyncio.gather`"}, "performed_via_github_app": null}