github
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | issue | performed_via_github_app |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/simonw/sqlite-utils/issues/147#issuecomment-779409770 | https://api.github.com/repos/simonw/sqlite-utils/issues/147 | 779409770 | MDEyOklzc3VlQ29tbWVudDc3OTQwOTc3MA== | 9599 | 2021-02-15T19:23:11Z | 2021-02-15T19:23:11Z | OWNER | On my Mac right now I'm seeing a limit of 500,000: ``` % sqlite3 -cmd ".limits variable_number" variable_number 500000 ``` | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | 688670158 | |
https://github.com/simonw/sqlite-utils/issues/147#issuecomment-779416619 | https://api.github.com/repos/simonw/sqlite-utils/issues/147 | 779416619 | MDEyOklzc3VlQ29tbWVudDc3OTQxNjYxOQ== | 9599 | 2021-02-15T19:40:57Z | 2021-02-15T21:27:55Z | OWNER | Tried this experiment (not proper binary search, it only searches downwards): ```python import sqlite3 db = sqlite3.connect(":memory:") def tryit(n): sql = "select 1 where 1 in ({})".format(", ".join("?" for i in range(n))) db.execute(sql, [0 for i in range(n)]) def find_limit(min=0, max=5_000_000): value = max while True: print('Trying', value) try: tryit(value) return value except: value = value // 2 ``` Running `find_limit()` with those default parameters takes about 1.47s on my laptop: ``` In [9]: %timeit find_limit() Trying 5000000 Trying 2500000... 1.47 s ± 28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ``` Interestingly the value it suggested was 156250 - suggesting that the macOS `sqlite3` binary with a 500,000 limit isn't the same as whatever my Python is using here. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | 688670158 | |
https://github.com/simonw/sqlite-utils/issues/147#issuecomment-779417723 | https://api.github.com/repos/simonw/sqlite-utils/issues/147 | 779417723 | MDEyOklzc3VlQ29tbWVudDc3OTQxNzcyMw== | 9599 | 2021-02-15T19:44:02Z | 2021-02-15T19:47:00Z | OWNER | `%timeit find_limit(max=1_000_000)` took 378ms on my laptop; `%timeit find_limit(max=500_000)` took 197ms; `%timeit find_limit(max=200_000)` reported 53ms per loop; `%timeit find_limit(max=100_000)` reported 26.8ms per loop. All of these are still slow enough that I'm not comfortable running this search every time the library is imported. Allowing users to opt in to this as a performance enhancement might be better. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | 688670158 | |
https://github.com/simonw/sqlite-utils/issues/147#issuecomment-779445423 | https://api.github.com/repos/simonw/sqlite-utils/issues/147 | 779445423 | MDEyOklzc3VlQ29tbWVudDc3OTQ0NTQyMw== | 9599 | 2021-02-15T21:00:44Z | 2021-02-15T21:01:09Z | OWNER | I tried changing the hard-coded value from 999 to 156_250 and running `sqlite-utils insert` against a 500MB CSV file, with these results: ``` (sqlite-utils) sqlite-utils % time sqlite-utils insert slow-ethos.db ethos ../ethos-datasette/ethos.csv --no-headers [###################################-] 99% 00:00:00sqlite-utils insert slow-ethos.db ethos ../ethos-datasette/ethos.csv 44.74s user 7.61s system 92% cpu 56.601 total # Increased the setting here (sqlite-utils) sqlite-utils % time sqlite-utils insert fast-ethos.db ethos ../ethos-datasette/ethos.csv --no-headers [###################################-] 99% 00:00:00sqlite-utils insert fast-ethos.db ethos ../ethos-datasette/ethos.csv 39.40s user 5.15s system 96% cpu 46.320 total ``` Not as big a difference as I was expecting. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | 688670158 | |
https://github.com/simonw/sqlite-utils/issues/147#issuecomment-779446652 | https://api.github.com/repos/simonw/sqlite-utils/issues/147 | 779446652 | MDEyOklzc3VlQ29tbWVudDc3OTQ0NjY1Mg== | 9599 | 2021-02-15T21:04:19Z | 2021-02-15T21:04:19Z | OWNER | ... but it looks like `batch_size` is hard-coded to 100, rather than `None` - which means it's not being calculated using that value: https://github.com/simonw/sqlite-utils/blob/1f49f32814a942fa076cfe5f504d1621188097ed/sqlite_utils/db.py#L704 And https://github.com/simonw/sqlite-utils/blob/1f49f32814a942fa076cfe5f504d1621188097ed/sqlite_utils/db.py#L1877 | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | 688670158 | |
https://github.com/simonw/sqlite-utils/issues/147#issuecomment-779448912 | https://api.github.com/repos/simonw/sqlite-utils/issues/147 | 779448912 | MDEyOklzc3VlQ29tbWVudDc3OTQ0ODkxMg== | 9599 | 2021-02-15T21:09:50Z | 2021-02-15T21:09:50Z | OWNER | I fiddled around and replaced that line with `batch_size = SQLITE_MAX_VARS // num_columns` - which evaluated to `10416` for this particular file. That got me this: 40.71s user 1.81s system 98% cpu 43.081 total 43s is definitely better than 56s, but it's still not as big as the ~26.5s to ~3.5s improvement described by @simonwiles at the top of this issue. I wonder what I'm missing here. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | 688670158 | |
https://github.com/simonw/datasette/issues/1226#issuecomment-779467160 | https://api.github.com/repos/simonw/datasette/issues/1226 | 779467160 | MDEyOklzc3VlQ29tbWVudDc3OTQ2NzE2MA== | 9599 | 2021-02-15T22:01:53Z | 2021-02-15T22:01:53Z | OWNER | This check needs to happen in two places: https://github.com/simonw/datasette/blob/9603d893b9b72653895318c9104d754229fdb146/datasette/cli.py#L222-L227 https://github.com/simonw/datasette/blob/9603d893b9b72653895318c9104d754229fdb146/datasette/cli.py#L328-L333 | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | 808843401 | |
https://github.com/simonw/datasette/issues/1226#issuecomment-779467451 | https://api.github.com/repos/simonw/datasette/issues/1226 | 779467451 | MDEyOklzc3VlQ29tbWVudDc3OTQ2NzQ1MQ== | 9599 | 2021-02-15T22:02:46Z | 2021-02-15T22:02:46Z | OWNER | I'm OK with the current error message shown if you try to use too low a port: ``` datasette fivethirtyeight.db -p 800 INFO: Started server process [45511] INFO: Waiting for application startup. INFO: Application startup complete. ERROR: [Errno 13] error while attempting to bind on address ('127.0.0.1', 800): permission denied INFO: Waiting for application shutdown. INFO: Application shutdown complete. ``` | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | 808843401 | |