{"html_url": "https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008526736", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/364", "id": 1008526736, "node_id": "IC_kwDOCGYnMM48HOWQ", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-10T04:07:29Z", "updated_at": "2022-01-10T04:07:29Z", "author_association": "OWNER", "body": "I think this test is right:\r\n```python\r\ndef test_insert_streaming_batch_size_1(db_path):\r\n # https://github.com/simonw/sqlite-utils/issues/364\r\n # Streaming with --batch-size 1 should commit on each record\r\n # Can't use CliRunner().invoke() here because we need to\r\n # run assertions in between writing to process stdin\r\n proc = subprocess.Popen(\r\n [\r\n sys.executable,\r\n \"-m\",\r\n \"sqlite_utils\",\r\n \"insert\",\r\n db_path,\r\n \"rows\",\r\n \"-\",\r\n \"--nl\",\r\n \"--batch-size\",\r\n \"1\",\r\n ],\r\n stdin=subprocess.PIPE,\r\n )\r\n proc.stdin.write(b'{\"name\": \"Azi\"}')\r\n proc.stdin.flush()\r\n assert list(Database(db_path)[\"rows\"].rows) == [{\"name\": \"Azi\"}]\r\n proc.stdin.write(b'{\"name\": \"Suna\"}')\r\n proc.stdin.flush()\r\n assert list(Database(db_path)[\"rows\"].rows) == [{\"name\": \"Azi\"}, {\"name\": \"Suna\"}]\r\n proc.stdin.close()\r\n proc.wait()\r\n```\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem to commit for every item"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008537194", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/364", "id": 1008537194, "node_id": "IC_kwDOCGYnMM48HQ5q", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-10T04:29:53Z", "updated_at": "2022-01-10T04:31:29Z", "author_association": "OWNER", "body": "After a bunch of debugging with `print()` 
statements it's clear that the problem isn't with when things are committed or the size of the batches - it's that the data sent to standard input is all being processed in one go, not a line at a time.\r\n\r\nI think that's because it is being buffered by this: https://github.com/simonw/sqlite-utils/blob/d2a79d200f9071a86027365fa2a576865b71064f/sqlite_utils/cli.py#L759-L770\r\n\r\nThe buffering is there so that we can sniff the first few bytes to detect if it's a CSV file - added in 99ff0a288c08ec2071139c6031eb880fa9c95310 for #230. So maybe for non-CSV inputs we should disable buffering?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem to commit for every item"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008545140", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/364", "id": 1008545140, "node_id": "IC_kwDOCGYnMM48HS10", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-10T05:01:34Z", "updated_at": "2022-01-10T05:01:34Z", "author_association": "OWNER", "body": "Urgh, tests are still failing intermittently - for example:\r\n```\r\n time.sleep(0.4)\r\n> assert list(Database(db_path)[\"rows\"].rows) == [{\"name\": \"Azi\"}]\r\nE AssertionError: assert [] == [{'name': 'Azi'}]\r\nE Right contains one more item: {'name': 'Azi'}\r\nE Full diff:\r\nE - [{'name': 'Azi'}]\r\nE + []\r\n```\r\nI'm going to change this code to keep on trying up to 10 seconds - that should get the tests to pass faster on most machines.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem to commit for every item"}, "performed_via_github_app": null} 
{"html_url": "https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008546573", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/364", "id": 1008546573, "node_id": "IC_kwDOCGYnMM48HTMN", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-10T05:05:15Z", "updated_at": "2022-01-10T05:05:15Z", "author_association": "OWNER", "body": "Bit nasty but it might work:\r\n```python\r\n def try_until(expected):\r\n tries = 0\r\n while True:\r\n rows = list(Database(db_path)[\"rows\"].rows)\r\n if rows == expected:\r\n return\r\n tries += 1\r\n if tries > 10:\r\n assert False, \"Expected {}, got {}\".format(expected, rows)\r\n time.sleep(tries * 0.1)\r\n\r\n try_until([{\"name\": \"Azi\"}])\r\n proc.stdin.write(b'{\"name\": \"Suna\"}\\n')\r\n proc.stdin.flush()\r\n try_until([{\"name\": \"Azi\"}, {\"name\": \"Suna\"}])\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem to commit for every item"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008557414", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/364", "id": 1008557414, "node_id": "IC_kwDOCGYnMM48HV1m", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-10T05:36:19Z", "updated_at": "2022-01-10T05:36:19Z", "author_association": "OWNER", "body": "That did the trick.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem to commit for every item"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1009273525", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/366", 
"id": 1009273525, "node_id": "IC_kwDOCGYnMM48KEq1", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-10T19:32:39Z", "updated_at": "2022-01-10T19:32:39Z", "author_association": "OWNER", "body": "I'm going to implement the Python library methods based on the prototype:\r\n```diff\r\ncommit 650f97a08f29a688c530e5f6c9eedc9269ed7bdc\r\nAuthor: Simon Willison \r\nDate: Sat Jan 8 13:34:01 2022 -0800\r\n\r\n Initial prototype of .analyze(), refs #366\r\n\r\ndiff --git a/sqlite_utils/db.py b/sqlite_utils/db.py\r\nindex dfc4723..1348b4a 100644\r\n--- a/sqlite_utils/db.py\r\n+++ b/sqlite_utils/db.py\r\n@@ -923,6 +923,13 @@ class Database:\r\n \"Run a SQLite ``VACUUM`` against the database.\"\r\n self.execute(\"VACUUM;\")\r\n \r\n+ def analyze(self, name=None):\r\n+ \"Run ``ANALYZE`` against the entire database or a named table or index.\"\r\n+ sql = \"ANALYZE\"\r\n+ if name is not None:\r\n+ sql += \" [{}]\".format(name)\r\n+ self.execute(sql)\r\n+\r\n \r\n class Queryable:\r\n def exists(self) -> bool:\r\n@@ -2902,6 +2909,10 @@ class Table(Queryable):\r\n )\r\n return self\r\n \r\n+ def analyze(self):\r\n+ \"Run ANALYZE against this table\"\r\n+ self.db.analyze(self.name)\r\n+\r\n def analyze_column(\r\n self, column: str, common_limit: int = 10, value_truncate=None, total_rows=None\r\n ) -> \"ColumnDetails\":\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096563265, "label": "Python library methods for calling ANALYZE"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1009285627", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/366", "id": 1009285627, "node_id": "IC_kwDOCGYnMM48KHn7", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-10T19:49:19Z", "updated_at": "2022-01-10T19:51:25Z", "author_association": "OWNER", "body": 
"Documentation for those two new methods: https://sqlite-utils.datasette.io/en/latest/python-api.html#optimizing-index-usage-with-analyze", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096563265, "label": "Python library methods for calling ANALYZE"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1009286373", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/366", "id": 1009286373, "node_id": "IC_kwDOCGYnMM48KHzl", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-10T19:50:22Z", "updated_at": "2022-01-10T19:50:22Z", "author_association": "OWNER", "body": "With respect to #365, I'm now thinking that having the ability to say \"... and then run ANALYZE\" could be useful for a bunch of Python methods. For example:\r\n\r\n```python\r\ndb[\"dogs\"].insert_all(list_of_dogs, analyze=True)\r\ndb[\"dogs\"].create_index([\"name\"], analyze=True)\r\n```\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096563265, "label": "Python library methods for calling ANALYZE"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1009288898", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/366", "id": 1009288898, "node_id": "IC_kwDOCGYnMM48KIbC", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-10T19:54:04Z", "updated_at": "2022-01-10T19:54:04Z", "author_association": "OWNER", "body": "Having browsed the API reference I think the methods that would benefit from an `analyze=True` parameter are:\r\n\r\n- `db.create_index`\r\n- `table.insert_all`\r\n- `table.upsert_all`\r\n- `table.delete_where`", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 
0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096563265, "label": "Python library methods for calling ANALYZE"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/375#issuecomment-1008556706", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/375", "id": 1008556706, "node_id": "IC_kwDOCGYnMM48HVqi", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-10T05:33:41Z", "updated_at": "2022-01-10T05:33:41Z", "author_association": "OWNER", "body": "I tested the prototype like this:\r\n\r\n sqlite-utils blah.db 'create table blah (id integer primary key, name text)' \r\n echo 'id,name\r\n 1,Cleo\r\n 2,Chicken' > blah.csv\r\n sqlite-utils bulk blah.db 'insert into blah (id, name) values (:id, :name)' blah.csv --csv\r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1097251014, "label": "`sqlite-utils bulk` command"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/367#issuecomment-1009272446", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/367", "id": 1009272446, "node_id": "IC_kwDOCGYnMM48KEZ-", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-10T19:31:08Z", "updated_at": "2022-01-10T19:31:08Z", "author_association": "OWNER", "body": "I'm going to implement this in a separate commit from this PR.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1097041471, "label": "Initial prototype of .analyze() methods"}, "performed_via_github_app": null}