html_url,issue_url,id,node_id,user,user_label,created_at,updated_at,author_association,body,reactions,issue,issue_label,performed_via_github_app https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008526736,https://api.github.com/repos/simonw/sqlite-utils/issues/364,1008526736,IC_kwDOCGYnMM48HOWQ,9599,simonw,2022-01-10T04:07:29Z,2022-01-10T04:07:29Z,OWNER,"I think this test is right: ```python def test_insert_streaming_batch_size_1(db_path): # https://github.com/simonw/sqlite-utils/issues/364 # Streaming with --batch-size 1 should commit on each record # Can't use CliRunner().invoke() here bacuse we need to # run assertions in between writing to process stdin proc = subprocess.Popen( [ sys.executable, ""-m"", ""sqlite_utils"", ""insert"", db_path, ""rows"", ""-"", ""--nl"", ""--batch-size"", ""1"", ], stdin=subprocess.PIPE, ) proc.stdin.write(b'{""name"": ""Azi""}') proc.stdin.flush() assert list(Database(db_path)[""rows""].rows) == [{""name"": ""Azi""}] proc.stdin.write(b'{""name"": ""Suna""}') proc.stdin.flush() assert list(Database(db_path)[""rows""].rows) == [{""name"": ""Azi""}, {""name"": ""Suna""}] proc.stdin.close() proc.wait() ``` ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1095570074,`--batch-size 1` doesn't seem to commit for every item, https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008537194,https://api.github.com/repos/simonw/sqlite-utils/issues/364,1008537194,IC_kwDOCGYnMM48HQ5q,9599,simonw,2022-01-10T04:29:53Z,2022-01-10T04:31:29Z,OWNER,"After a bunch of debugging with `print()` statements it's clear that the problem isn't with when things are committed or the size of the batches - it's that the data sent to standard input is all being processed in one go, not a line at a time. I think that's because it is being buffered by this: https://github.com/simonw/sqlite-utils/blob/d2a79d200f9071a86027365fa2a576865b71064f/sqlite_utils/cli.py#L759-L770 The buffering is there so that we can sniff the first few bytes to detect if it's a CSV file - added in 99ff0a288c08ec2071139c6031eb880fa9c95310 for #230. So maybe for non-CSV inputs we should disable buffering?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1095570074,`--batch-size 1` doesn't seem to commit for every item, https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008545140,https://api.github.com/repos/simonw/sqlite-utils/issues/364,1008545140,IC_kwDOCGYnMM48HS10,9599,simonw,2022-01-10T05:01:34Z,2022-01-10T05:01:34Z,OWNER,"Urgh, tests are still failing intermittently - for example: ``` time.sleep(0.4) > assert list(Database(db_path)[""rows""].rows) == [{""name"": ""Azi""}] E AssertionError: assert [] == [{'name': 'Azi'}] E Right contains one more item: {'name': 'Azi'} E Full diff: E - [{'name': 'Azi'}] E + [] ``` I'm going to change this code to keep on trying up to 10 seconds - that should get the tests to pass faster on most machines.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1095570074,`--batch-size 1` doesn't seem to commit for every item, https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008546573,https://api.github.com/repos/simonw/sqlite-utils/issues/364,1008546573,IC_kwDOCGYnMM48HTMN,9599,simonw,2022-01-10T05:05:15Z,2022-01-10T05:05:15Z,OWNER,"Bit nasty but it might work: ```python def try_until(expected): tries = 0 while True: rows = list(Database(db_path)[""rows""].rows) if rows == expected: return tries += 1 if tries > 10: assert False, ""Expected {}, got {}"".format(expected, rows) time.sleep(tries * 0.1) try_until([{""name"": ""Azi""}]) proc.stdin.write(b'{""name"": ""Suna""}\n') proc.stdin.flush() try_until([{""name"": ""Azi""}, {""name"": ""Suna""}]) ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1095570074,`--batch-size 1` doesn't seem to commit for every item, https://github.com/simonw/sqlite-utils/issues/375#issuecomment-1008556706,https://api.github.com/repos/simonw/sqlite-utils/issues/375,1008556706,IC_kwDOCGYnMM48HVqi,9599,simonw,2022-01-10T05:33:41Z,2022-01-10T05:33:41Z,OWNER,"I tested the prototype like this: sqlite-utils blah.db 'create table blah (id integer primary key, name text)' echo 'id,name 1,Cleo 2,Chicken' > blah.csv sqlite-utils bulk blah.db 'insert into blah (id, name) values (:id, :name)' blah.csv --csv ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1097251014,`sqlite-utils bulk` command, https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008557414,https://api.github.com/repos/simonw/sqlite-utils/issues/364,1008557414,IC_kwDOCGYnMM48HV1m,9599,simonw,2022-01-10T05:36:19Z,2022-01-10T05:36:19Z,OWNER,That did the trick.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1095570074,`--batch-size 1` doesn't seem to commit for every item, https://github.com/simonw/sqlite-utils/pull/367#issuecomment-1009272446,https://api.github.com/repos/simonw/sqlite-utils/issues/367,1009272446,IC_kwDOCGYnMM48KEZ-,9599,simonw,2022-01-10T19:31:08Z,2022-01-10T19:31:08Z,OWNER,I'm going to implement this in a separate commit from this PR.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1097041471,Initial prototype of .analyze() methods, https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1009273525,https://api.github.com/repos/simonw/sqlite-utils/issues/366,1009273525,IC_kwDOCGYnMM48KEq1,9599,simonw,2022-01-10T19:32:39Z,2022-01-10T19:32:39Z,OWNER,"I'm going to implement the Python library methods based on the prototype: ```diff commit 650f97a08f29a688c530e5f6c9eedc9269ed7bdc Author: Simon Willison Date: Sat Jan 8 13:34:01 2022 -0800 Initial prototype of .analyze(), refs #366 diff --git a/sqlite_utils/db.py b/sqlite_utils/db.py index dfc4723..1348b4a 100644 --- a/sqlite_utils/db.py +++ b/sqlite_utils/db.py @@ -923,6 +923,13 @@ class Database: ""Run a SQLite ``VACUUM`` against the database."" self.execute(""VACUUM;"") + def analyze(self, name=None): + ""Run ``ANALYZE`` against the entire database or a named table or index."" + sql = ""ANALYZE"" + if name is not None: + sql += "" [{}]"".format(name) + self.execute(sql) + class Queryable: def exists(self) -> bool: @@ -2902,6 +2909,10 @@ class Table(Queryable): ) return self + def analyze(self): + ""Run ANALYZE against this table"" + self.db.analyze(self.name) + def analyze_column( self, column: str, common_limit: int = 10, value_truncate=None, total_rows=None ) -> ""ColumnDetails"": ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1096563265,Python library methods for calling ANALYZE, https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1009286373,https://api.github.com/repos/simonw/sqlite-utils/issues/366,1009286373,IC_kwDOCGYnMM48KHzl,9599,simonw,2022-01-10T19:50:22Z,2022-01-10T19:50:22Z,OWNER,"With respect to #365, I'm now thinking that having the ability to say ""... and then run ANALYZE"" could be useful for a bunch of Python methods. For example: ```python db[""dogs""].insert_all(list_of_dogs, analyze=True) db[""dogs""].create_index([""name""], analyze=True) ``` ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1096563265,Python library methods for calling ANALYZE, https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1009285627,https://api.github.com/repos/simonw/sqlite-utils/issues/366,1009285627,IC_kwDOCGYnMM48KHn7,9599,simonw,2022-01-10T19:49:19Z,2022-01-10T19:51:25Z,OWNER,Documentation for those two new methods: https://sqlite-utils.datasette.io/en/latest/python-api.html#optimizing-index-usage-with-analyze,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1096563265,Python library methods for calling ANALYZE, https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1009288898,https://api.github.com/repos/simonw/sqlite-utils/issues/366,1009288898,IC_kwDOCGYnMM48KIbC,9599,simonw,2022-01-10T19:54:04Z,2022-01-10T19:54:04Z,OWNER,"Having browsed the API reference I think the methods that would benefit from an `analyze=True` parameter are: - `db.create_index` - `table.insert_all` - `table.upsert_all` - `table.delete_where`","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1096563265,Python library methods for calling ANALYZE,