html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,issue,performed_via_github_app
https://github.com/simonw/sqlite-utils/issues/147#issuecomment-683528149,https://api.github.com/repos/simonw/sqlite-utils/issues/147,683528149,MDEyOklzc3VlQ29tbWVudDY4MzUyODE0OQ==,9599,2020-08-31T03:17:26Z,2020-08-31T03:17:26Z,OWNER,"+1 to making this something that users can customize. An optional argument to the `Database` constructor would be a neat way to do this. I think there's a terrifying way that we could find this value... we could perform a binary search for it! Open up a memory connection and try running different bulk inserts against it and catch the exceptions - then adjust and try again. My hunch is that we could perform just 2 or 3 probes (maybe against carefully selected values) to find the highest value that works. If this process took less than a few ms to run I'd be happy to do it automatically when the class is instantiated (and let users disable that automatic proving by setting a value using the constructor argument).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",688670158,
https://github.com/simonw/sqlite-utils/issues/147#issuecomment-779409770,https://api.github.com/repos/simonw/sqlite-utils/issues/147,779409770,MDEyOklzc3VlQ29tbWVudDc3OTQwOTc3MA==,9599,2021-02-15T19:23:11Z,2021-02-15T19:23:11Z,OWNER,"On my Mac right now I'm seeing a limit of 500,000:
```
% sqlite3 -cmd "".limits variable_number""
variable_number 500000
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",688670158,
https://github.com/simonw/sqlite-utils/issues/147#issuecomment-779416619,https://api.github.com/repos/simonw/sqlite-utils/issues/147,779416619,MDEyOklzc3VlQ29tbWVudDc3OTQxNjYxOQ==,9599,2021-02-15T19:40:57Z,2021-02-15T21:27:55Z,OWNER,"Tried this experiment (not proper
binary search, it only searches downwards):
```python
import sqlite3

db = sqlite3.connect("":memory:"")

def tryit(n):
    sql = ""select 1 where 1 in ({})"".format("", "".join(""?"" for i in range(n)))
    db.execute(sql, [0 for i in range(n)])

def find_limit(min=0, max=5_000_000):
    value = max
    while True:
        print('Trying', value)
        try:
            tryit(value)
            return value
        except:
            value = value // 2
```
Running `find_limit()` with those default parameters takes about 1.47s on my laptop:
```
In [9]: %timeit find_limit()
Trying 5000000
Trying 2500000...
1.47 s ± 28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
Interestingly the value it suggested was 156250 - suggesting that the macOS `sqlite3` binary with a 500,000 limit isn't the same as whatever my Python is using here.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",688670158,
https://github.com/simonw/sqlite-utils/issues/147#issuecomment-779417723,https://api.github.com/repos/simonw/sqlite-utils/issues/147,779417723,MDEyOklzc3VlQ29tbWVudDc3OTQxNzcyMw==,9599,2021-02-15T19:44:02Z,2021-02-15T19:47:00Z,OWNER,"`%timeit find_limit(max=1_000_000)` took 378ms on my laptop

`%timeit find_limit(max=500_000)` took 197ms

`%timeit find_limit(max=200_000)` reported 53ms per loop

`%timeit find_limit(max=100_000)` reported 26.8ms per loop.

All of these are still slow enough that I'm not comfortable running this search for every time the library is imported.
Allowing users to opt-in to this as a performance enhancement might be better.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",688670158,
https://github.com/simonw/sqlite-utils/issues/147#issuecomment-779445423,https://api.github.com/repos/simonw/sqlite-utils/issues/147,779445423,MDEyOklzc3VlQ29tbWVudDc3OTQ0NTQyMw==,9599,2021-02-15T21:00:44Z,2021-02-15T21:01:09Z,OWNER,"I tried changing the hard-coded value from 999 to 156_250 and running `sqlite-utils insert` against a 500MB CSV file, with these results:
```
(sqlite-utils) sqlite-utils % time sqlite-utils insert slow-ethos.db ethos ../ethos-datasette/ethos.csv --no-headers
[###################################-] 99% 00:00:00sqlite-utils insert slow-ethos.db ethos ../ethos-datasette/ethos.csv 44.74s user 7.61s system 92% cpu 56.601 total
# Increased the setting here
(sqlite-utils) sqlite-utils % time sqlite-utils insert fast-ethos.db ethos ../ethos-datasette/ethos.csv --no-headers
[###################################-] 99% 00:00:00sqlite-utils insert fast-ethos.db ethos ../ethos-datasette/ethos.csv 39.40s user 5.15s system 96% cpu 46.320 total
```
Not as big a difference as I was expecting.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",688670158,
https://github.com/simonw/sqlite-utils/issues/147#issuecomment-779446652,https://api.github.com/repos/simonw/sqlite-utils/issues/147,779446652,MDEyOklzc3VlQ29tbWVudDc3OTQ0NjY1Mg==,9599,2021-02-15T21:04:19Z,2021-02-15T21:04:19Z,OWNER,"...
but it looks like `batch_size` is hard-coded to 100, rather than `None` - which means it's not being calculated using that value:

https://github.com/simonw/sqlite-utils/blob/1f49f32814a942fa076cfe5f504d1621188097ed/sqlite_utils/db.py#L704

And

https://github.com/simonw/sqlite-utils/blob/1f49f32814a942fa076cfe5f504d1621188097ed/sqlite_utils/db.py#L1877","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",688670158,
https://github.com/simonw/sqlite-utils/issues/147#issuecomment-779448912,https://api.github.com/repos/simonw/sqlite-utils/issues/147,779448912,MDEyOklzc3VlQ29tbWVudDc3OTQ0ODkxMg==,9599,2021-02-15T21:09:50Z,2021-02-15T21:09:50Z,OWNER,"I fiddled around and replaced that line with `batch_size = SQLITE_MAX_VARS // num_columns` - which evaluated to `10416` for this particular file. That got me this:

40.71s user 1.81s system 98% cpu 43.081 total

43s is definitely better than 56s, but it's still not as big as the ~26.5s to ~3.5s improvement described by @simonwiles at the top of this issue. I wonder what I'm missing here.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",688670158,