{"html_url": "https://github.com/simonw/datasette/issues/1727#issuecomment-1111451790", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1727", "id": 1111451790, "node_id": "IC_kwDOBm6k_c5CP2iO", "user": {"value": 716529, "label": "glyph"}, "created_at": "2022-04-27T20:30:33Z", "updated_at": "2022-04-27T20:30:33Z", "author_association": "NONE", "body": "> I should try seeing what happens with WAL mode enabled.\r\n\r\nI've only skimmed above but it looks like you're doing mainly read-only queries? WAL mode is about better interactions between writers & readers, primarily.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1217759117, "label": "Research: demonstrate if parallel SQL queries are worthwhile"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1727#issuecomment-1111432375", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1727", "id": 1111432375, "node_id": "IC_kwDOBm6k_c5CPxy3", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-04-27T20:07:57Z", "updated_at": "2022-04-27T20:07:57Z", "author_association": "OWNER", "body": "Also useful: https://avi.im/blag/2021/fast-sqlite-inserts/ - from a tip on Twitter: https://twitter.com/ricardoanderegg/status/1519402047556235264", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1217759117, "label": "Research: demonstrate if parallel SQL queries are worthwhile"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1727#issuecomment-1111535818", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1727", "id": 1111535818, "node_id": "IC_kwDOBm6k_c5CQLDK", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-04-27T22:18:45Z", 
"updated_at": "2022-04-27T22:18:45Z", "author_association": "OWNER", "body": "Another avenue: https://twitter.com/weargoggles/status/1519426289920270337\r\n\r\n> SQLite has its own mutexes to provide thread safety, which as another poster noted are out of play in multi process setups. Perhaps downgrading from the \u201cserializable\u201d to \u201cmulti-threaded\u201d safety would be okay for Datasette? https://sqlite.org/c3ref/c_config_covering_index_scan.html#sqliteconfigmultithread\r\n\r\nDoesn't look like there's an obvious way to access that from Python via the `sqlite3` module though.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1217759117, "label": "Research: demonstrate if parallel SQL queries are worthwhile"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1724#issuecomment-1110369004", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1724", "id": 1110369004, "node_id": "IC_kwDOBm6k_c5CLuLs", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-04-27T00:16:35Z", "updated_at": "2022-04-27T00:17:04Z", "author_association": "OWNER", "body": "I bet this is because it's exceeding the size limit: https://github.com/simonw/datasette/blob/da53e0360da4771ffb56a8e3eb3f7476f3168299/datasette/tracer.py#L80-L88\r\n\r\nhttps://github.com/simonw/datasette/blob/da53e0360da4771ffb56a8e3eb3f7476f3168299/datasette/tracer.py#L102-L113", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1216619276, "label": "?_trace=1 doesn't work on Global Power Plants demo"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1727#issuecomment-1111385875", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1727", "id": 1111385875, 
"node_id": "IC_kwDOBm6k_c5CPmcT", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-04-27T19:16:57Z", "updated_at": "2022-04-27T19:16:57Z", "author_association": "OWNER", "body": "I just remembered the `--setting num_sql_threads` option... which defaults to 3! https://github.com/simonw/datasette/blob/942411ef946e9a34a2094944d3423cddad27efd3/datasette/app.py#L109-L113\r\n\r\nWould explain why the first trace never seems to show more than three SQL queries executing at once.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1217759117, "label": "Research: demonstrate if parallel SQL queries are worthwhile"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1727#issuecomment-1111558204", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1727", "id": 1111558204, "node_id": "IC_kwDOBm6k_c5CQQg8", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-04-27T22:58:39Z", "updated_at": "2022-04-27T22:58:39Z", "author_association": "OWNER", "body": "I should check my timing mechanism. 
Am I capturing the time taken just in SQLite or does it include time spent in Python crossing between async and threaded world and waiting for a thread pool worker to become available?\r\n\r\nThat could explain the longer query times.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1217759117, "label": "Research: demonstrate if parallel SQL queries are worthwhile"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1727#issuecomment-1111431785", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1727", "id": 1111431785, "node_id": "IC_kwDOBm6k_c5CPxpp", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-04-27T20:07:16Z", "updated_at": "2022-04-27T20:07:16Z", "author_association": "OWNER", "body": "I think I need some much more in-depth tracing tricks for this.\r\n\r\nhttps://www.maartenbreddels.com/perf/jupyter/python/tracing/gil/2021/01/14/Tracing-the-Python-GIL.html looks relevant - uses the `perf` tool on Linux.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1217759117, "label": "Research: demonstrate if parallel SQL queries are worthwhile"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1727#issuecomment-1111553029", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1727", "id": 1111553029, "node_id": "IC_kwDOBm6k_c5CQPQF", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-04-27T22:48:21Z", "updated_at": "2022-04-27T22:48:21Z", "author_association": "OWNER", "body": "I wonder if it would be worth exploring multiprocessing here.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", 
"issue": {"value": 1217759117, "label": "Research: demonstrate if parallel SQL queries are worthwhile"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/159#issuecomment-1111506339", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/159", "id": 1111506339, "node_id": "IC_kwDOCGYnMM5CQD2j", "user": {"value": 154364, "label": "dracos"}, "created_at": "2022-04-27T21:35:13Z", "updated_at": "2022-04-27T21:35:13Z", "author_association": "NONE", "body": "Just stumbled across this, wondering why none of my deletes were working.", "reactions": "{\"total_count\": 2, \"+1\": 2, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 702386948, "label": ".delete_where() does not auto-commit (unlike .insert() or .upsert())"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1727#issuecomment-1111390433", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1727", "id": 1111390433, "node_id": "IC_kwDOBm6k_c5CPnjh", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-04-27T19:21:02Z", "updated_at": "2022-04-27T19:21:02Z", "author_association": "OWNER", "body": "One weird thing: I noticed that in the parallel trace above the SQL query bars are wider. 
Mouseover shows duration in ms, and I got 13ms for this query:\r\n\r\n select message as value, count(*) as n from (\r\n\r\nBut in the `?_noparallel=1` version that same query took 2.97ms.\r\n\r\nGiven those numbers though I would expect the overall page time to be MUCH worse for the parallel version - but the page load times are instead very close to each other, with parallel often winning.\r\n\r\nThis is super-weird.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1217759117, "label": "Research: demonstrate if parallel SQL queries are worthwhile"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1727#issuecomment-1111551076", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1727", "id": 1111551076, "node_id": "IC_kwDOBm6k_c5CQOxk", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-04-27T22:44:51Z", "updated_at": "2022-04-27T22:45:04Z", "author_association": "OWNER", "body": "Really wild idea: what if I created three copies of the SQLite database file - as three separate file names - and then balanced the parallel queries across all these? 
Any chance that could avoid any mysterious locking issues?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1217759117, "label": "Research: demonstrate if parallel SQL queries are worthwhile"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1727#issuecomment-1111408273", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1727", "id": 1111408273, "node_id": "IC_kwDOBm6k_c5CPr6R", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-04-27T19:40:51Z", "updated_at": "2022-04-27T19:42:17Z", "author_association": "OWNER", "body": "Relevant: here's the code that sets up a Datasette SQLite connection: https://github.com/simonw/datasette/blob/7a6654a253dee243518dc542ce4c06dbb0d0801d/datasette/database.py#L73-L96\r\n\r\nIt's using `check_same_thread=False` - here's [the Python docs on that](https://docs.python.org/3/library/sqlite3.html#sqlite3.connect):\r\n\r\n> By default, *check_same_thread* is [`True`](https://docs.python.org/3/library/constants.html#True \"True\") and only the creating thread may use the connection. If set [`False`](https://docs.python.org/3/library/constants.html#False \"False\"), the returned connection may be shared across multiple threads. 
When using multiple threads with the same connection writing operations should be serialized by the user to avoid data corruption.\r\n\r\nThis is why Datasette reserves a single connection for write queries and queues them up in memory, [as described here](https://simonwillison.net/2020/Feb/26/weeknotes-datasette-writes/).", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1217759117, "label": "Research: demonstrate if parallel SQL queries are worthwhile"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1724#issuecomment-1110370095", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1724", "id": 1110370095, "node_id": "IC_kwDOBm6k_c5CLucv", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-04-27T00:18:30Z", "updated_at": "2022-04-27T00:18:30Z", "author_association": "OWNER", "body": "So this isn't a bug here, it's working as intended.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1216619276, "label": "?_trace=1 doesn't work on Global Power Plants demo"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1727#issuecomment-1111442012", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1727", "id": 1111442012, "node_id": "IC_kwDOBm6k_c5CP0Jc", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-04-27T20:19:00Z", "updated_at": "2022-04-27T20:19:00Z", "author_association": "OWNER", "body": "Something worth digging into: are these parallel queries running against the same SQLite connection or are they each running against a separate SQLite connection?\r\n\r\nJust realized I know the answer: they're running against separate SQLite connections, because that's how the time limit mechanism works: 
it installs a progress handler for each connection which terminates it after a set time.\r\n\r\nThis means that if SQLite benefits from multiple threads using the same connection (due to shared caches or similar) then Datasette will not be seeing those benefits.\r\n\r\nIt also means that if there's some mechanism within SQLite that penalizes you for having multiple parallel connections to a single file (just guessing here, maybe there's some kind of locking going on?) then Datasette will suffer those penalties.\r\n\r\nI should try seeing what happens with WAL mode enabled.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1217759117, "label": "Research: demonstrate if parallel SQL queries are worthwhile"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1727#issuecomment-1111462442", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1727", "id": 1111462442, "node_id": "IC_kwDOBm6k_c5CP5Iq", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-04-27T20:40:59Z", "updated_at": "2022-04-27T20:42:49Z", "author_association": "OWNER", "body": "This looks VERY relevant: [SQLite Shared-Cache Mode](https://www.sqlite.org/sharedcache.html):\r\n\r\n> SQLite includes a special \"shared-cache\" mode (disabled by default) intended for use in embedded servers. If shared-cache mode is enabled and a thread establishes multiple connections to the same database, the connections share a single data and schema cache. 
This can significantly reduce the quantity of memory and IO required by the system.\r\n\r\nEnabled as part of the URI filename:\r\n\r\n ATTACH 'file:aux.db?cache=shared' AS aux;\r\n\r\nTurns out I'm already using this for in-memory databases that have `.memory_name` set, but not (yet) for regular file-backed databases:\r\n\r\nhttps://github.com/simonw/datasette/blob/7a6654a253dee243518dc542ce4c06dbb0d0801d/datasette/database.py#L73-L75\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1217759117, "label": "Research: demonstrate if parallel SQL queries are worthwhile"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1727#issuecomment-1111485722", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1727", "id": 1111485722, "node_id": "IC_kwDOBm6k_c5CP-0a", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-04-27T21:08:20Z", "updated_at": "2022-04-27T21:08:20Z", "author_association": "OWNER", "body": "Tried that and it didn't seem to make a difference either.\r\n\r\nI really need a much deeper view of what's going on here.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1217759117, "label": "Research: demonstrate if parallel SQL queries are worthwhile"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1727#issuecomment-1111460068", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1727", "id": 1111460068, "node_id": "IC_kwDOBm6k_c5CP4jk", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-04-27T20:38:32Z", "updated_at": "2022-04-27T20:38:32Z", "author_association": "OWNER", "body": "WAL mode didn't seem to make a difference. 
I thought there was a chance it might help multiple read connections operate at the same time but it looks like it really does only matter for when writes are going on.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1217759117, "label": "Research: demonstrate if parallel SQL queries are worthwhile"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1727#issuecomment-1111380282", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1727", "id": 1111380282, "node_id": "IC_kwDOBm6k_c5CPlE6", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-04-27T19:10:27Z", "updated_at": "2022-04-27T19:10:27Z", "author_association": "OWNER", "body": "Wrote more about that here: https://simonwillison.net/2022/Apr/27/parallel-queries/\r\n\r\nCompare https://latest-with-plugins.datasette.io/github/commits?_facet=repo&_facet=committer&_trace=1\r\n\r\n![image](https://user-images.githubusercontent.com/9599/165601503-2083c5d2-d740-405c-b34d-85570744ca82.png)\r\n\r\nWith the same thing but with parallel execution disabled:\r\n\r\nhttps://latest-with-plugins.datasette.io/github/commits?_facet=repo&_facet=committer&_trace=1&_noparallel=1\r\n\r\n![image](https://user-images.githubusercontent.com/9599/165601525-98abbfb1-5631-4040-b6bd-700948d1db6e.png)\r\n\r\nThose total page load time numbers are very similar. Is this parallel optimization worthwhile?\r\n\r\nMaybe it's only worth it on larger databases? 
Or maybe larger databases perform worse with this?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1217759117, "label": "Research: demonstrate if parallel SQL queries are worthwhile"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1727#issuecomment-1111456500", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1727", "id": 1111456500, "node_id": "IC_kwDOBm6k_c5CP3r0", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-04-27T20:36:01Z", "updated_at": "2022-04-27T20:36:01Z", "author_association": "OWNER", "body": "Yeah all of this is pretty much assuming read-only connections. Datasette has a separate mechanism for ensuring that writes are executed one at a time against a dedicated connection from an in-memory queue:\r\n- https://github.com/simonw/datasette/issues/682", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1217759117, "label": "Research: demonstrate if parallel SQL queries are worthwhile"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1724#issuecomment-1110585475", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1724", "id": 1110585475, "node_id": "IC_kwDOBm6k_c5CMjCD", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-04-27T06:15:14Z", "updated_at": "2022-04-27T06:15:14Z", "author_association": "OWNER", "body": "Yeah, that page is 438K (but only 20K gzipped).", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1216619276, "label": "?_trace=1 doesn't work on Global Power Plants demo"}, "performed_via_github_app": null} {"html_url": 
"https://github.com/simonw/datasette/issues/1727#issuecomment-1111448928", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1727", "id": 1111448928, "node_id": "IC_kwDOBm6k_c5CP11g", "user": {"value": 716529, "label": "glyph"}, "created_at": "2022-04-27T20:27:05Z", "updated_at": "2022-04-27T20:27:05Z", "author_association": "NONE", "body": "You don't want to re-use an SQLite connection from multiple threads anyway: https://www.sqlite.org/threadsafe.html\r\n\r\nMultiple connections can operate on the file in parallel, but a single connection can't:\r\n\r\n> Multi-thread. In this mode, SQLite can be safely used by multiple threads **provided that no single database connection is used simultaneously in two or more threads**.\r\n\r\n(emphasis mine)", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1217759117, "label": "Research: demonstrate if parallel SQL queries are worthwhile"}, "performed_via_github_app": null}