issue_comments: 1008232075
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | issue | performed_via_github_app |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/simonw/sqlite-utils/issues/369#issuecomment-1008232075 | https://api.github.com/repos/simonw/sqlite-utils/issues/369 | 1008232075 | IC_kwDOCGYnMM48GGaL | 9599 | 2022-01-09T05:13:15Z | 2022-01-09T05:13:56Z | OWNER | I think the query that will help solve this is: `explain query plan select * from ny_times_us_counties where state = 1 and county = 2` In this case, the query planner needs to decide if it should use the index for the `state` column or the index for the `county` column. That's where the statistics come into play. In particular: | tbl | idx | stat | |----------------------|---------------------------------|---------------| | ny_times_us_counties | idx_ny_times_us_counties_date | 2092871 2915 | | ny_times_us_counties | idx_ny_times_us_counties_fips | 2092871 651 | | ny_times_us_counties | idx_ny_times_us_counties_county | 2092871 1085 | | ny_times_us_counties | idx_ny_times_us_counties_state | 2092871 37373 | Those numbers are explained by this comment in the SQLite C code: https://github.com/sqlite/sqlite/blob/5622c7f97106314719740098cf0854e7eaa81802/src/analyze.c#L41-L55 ``` ** There is normally one row per index, with the index identified by the ** name in the idx column. The tbl column is the name of the table to ** which the index belongs. In each such row, the stat column will be ** a string consisting of a list of integers. The first integer in this ** list is the number of rows in the index. (This is the same as the ** number of rows in the table, except for partial indices.) The second ** integer is the average number of rows in the index that have the same ** value in the first column of the index. ``` So that table is telling us that using a value in the `county` column will filter down to an average of 1,085 rows, whereas filtering on the `state` column will filter down to an average of 37,373 - so clearly the `county` index is the better index to use here! 
Just one catch: against both my `covid.db` and my `covid-analyzed.db` databases the `county` index is picked for both of them - so SQLite is somehow guessing that `county` is a better index even though it doesn't have statistics for that. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | 1097091527 |
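
The behaviour described in the comment can be reproduced on a small scale with Python's built-in `sqlite3` module. This is only a rough sketch under stated assumptions: the `counties` table, the `idx_state` and `idx_county` index names, and the synthetic rows are all made up for illustration and are not the real `ny_times_us_counties` data. It runs `ANALYZE`, reads the `stat` strings back from `sqlite_stat1`, and then asks `EXPLAIN QUERY PLAN` which index the planner prefers.

```python
import sqlite3


def build_demo(conn):
    # Hypothetical stand-in for ny_times_us_counties: 10,000 rows,
    # 10 distinct state values and 1,000 distinct county values.
    conn.execute("create table counties (state integer, county integer)")
    rows = [(i % 10, i % 1000) for i in range(10000)]
    conn.executemany("insert into counties values (?, ?)", rows)
    conn.execute("create index idx_state on counties(state)")
    conn.execute("create index idx_county on counties(county)")


def read_stats(conn):
    # sqlite_stat1 only exists after ANALYZE has been run.
    sql = "select idx, stat from sqlite_stat1 where tbl = 'counties'"
    return {idx: stat for idx, stat in conn.execute(sql)}


def chosen_plan(conn):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    sql = (
        "explain query plan "
        "select * from counties where state = 1 and county = 2"
    )
    return " ".join(row[-1] for row in conn.execute(sql).fetchall())


conn = sqlite3.connect(":memory:")
build_demo(conn)
conn.execute("analyze")

stats = read_stats(conn)
plan = chosen_plan(conn)

for idx, stat in sorted(stats.items()):
    total_rows, avg_per_value = stat.split()[:2]
    print(f"{idx}: {total_rows} rows, ~{avg_per_value} rows per distinct value")
print(plan)
```

The second integer in each `stat` string should be much smaller for the county index than for the state index, which mirrors the 1,085 versus 37,373 comparison above. Skipping the `analyze` call and inspecting only the plan is one way to explore the "catch" about the un-analyzed database, although the sketch does not attempt to explain how SQLite makes that guess.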