{"html_url": "https://github.com/simonw/sqlite-utils/issues/529#issuecomment-1592110694", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/529", "id": 1592110694, "node_id": "IC_kwDOCGYnMM5e5a5m", "user": {"value": 7908073, "label": "chapmanjacobd"}, "created_at": "2023-06-14T23:11:47Z", "updated_at": "2023-06-14T23:12:12Z", "author_association": "CONTRIBUTOR", "body": "sorry i was wrong. `sqlite-utils --raw-lines` works correctly\r\n\r\n```\r\nsqlite-utils --raw-lines :memory: \"SELECT * FROM (VALUES ('test'), ('line2'))\" | cat -A\r\ntest$\r\nline2$\r\n\r\nsqlite-utils --csv --no-headers :memory: \"SELECT * FROM (VALUES ('test'), ('line2'))\" | cat -A\r\ntest$\r\nline2$\r\n```\r\n\r\nI think this was fixed somewhat recently", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1581090327, "label": "Microsoft line endings"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/535#issuecomment-1592052320", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/535", "id": 1592052320, "node_id": "IC_kwDOCGYnMM5e5Mpg", "user": {"value": 7908073, "label": "chapmanjacobd"}, "created_at": "2023-06-14T22:05:28Z", "updated_at": "2023-06-14T22:05:28Z", "author_association": "CONTRIBUTOR", "body": "piping to `jq` is good enough usually", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1655860104, "label": "rows: --transpose or psql extended view-like functionality"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/555#issuecomment-1592047502", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/555", "id": 1592047502, "node_id": "IC_kwDOCGYnMM5e5LeO", "user": {"value": 7908073, "label": "chapmanjacobd"}, "created_at": "2023-06-14T22:00:10Z", "updated_at": "2023-06-14T22:01:57Z", "author_association": "CONTRIBUTOR", "body": "You may want to try doing a performance comparison between this and just selecting all the ids with few constraints and then doing the filtering within python.\r\n\r\nThat might seem like a lazy-programmer, inefficient way but queries with large resultsets are a different profile than what databases like SQLITE are designed for. That is not to say that SQLITE is slow or that python is always faster but when you start reading >20% of an index there is an equilibrium that is reached. Especially when adding in writing extra temp tables and stuff to memory/disk. And especially given the `NOT IN` style of query...\r\n\r\nYou may also try chunking like this:\r\n\r\n```py\r\ndef chunks(lst, n) -> Generator:\r\n for i in range(0, len(lst), n):\r\n yield lst[i : i + n]\r\n\r\nSQLITE_PARAM_LIMIT = 32765\r\n\r\ndata = []\r\nchunked = chunks(video_ids, consts.SQLITE_PARAM_LIMIT)\r\nfor ids in chunked:\r\n data.expand(\r\n list(\r\n db.query(\r\n f\"\"\"SELECT * from videos\r\n WHERE id in (\"\"\"\r\n + \",\".join([\"?\"] * len(ids))\r\n + \")\",\r\n (*ids,),\r\n )\r\n )\r\n )\r\n```\r\n\r\nbut that actually won't work with your `NOT IN` requirements. 
Since you are doing stuff with files/videos in SQLite, you might be interested in my side project: https://github.com/chapmanjacobd/library", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1733198948, "label": "Filter table by a large bunch of ids"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/557#issuecomment-1590531892", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/557", "id": 1590531892, "node_id": "IC_kwDOCGYnMM5ezZc0", "user": {"value": 7908073, "label": "chapmanjacobd"}, "created_at": "2023-06-14T06:09:21Z", "updated_at": "2023-06-14T06:09:21Z", "author_association": "CONTRIBUTOR", "body": "I put together a [simple script](https://github.com/chapmanjacobd/library/blob/42129c5ebe15f9d74653c0f5ca4ed0c991d383e0/xklb/scripts/dedupe_db.py) to upsert and remove duplicate rows based on business keys. If anyone has a similar problem to the above, this might help:\r\n\r\n```sql\r\nCREATE TABLE my_table (\r\n    id INTEGER PRIMARY KEY,\r\n    column1 TEXT,\r\n    column2 TEXT,\r\n    column3 TEXT\r\n);\r\n\r\nINSERT INTO my_table (column1, column2, column3)\r\nVALUES\r\n    ('Value 1', 'Duplicate 1', 'Duplicate A'),\r\n    ('Value 2', 'Duplicate 2', 'Duplicate B'),\r\n    ('Value 3', 'Duplicate 2', 'Duplicate C'),\r\n    ('Value 4', 'Duplicate 3', 'Duplicate D'),\r\n    ('Value 5', 'Duplicate 3', 'Duplicate E'),\r\n    ('Value 6', 'Duplicate 3', 'Duplicate F');\r\n```\r\n\r\n```\r\nlibrary dedupe-db test.db my_table --bk column2\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1740150327, "label": "Aliased ROWID option for tables created from alter=True commands"}, "performed_via_github_app": null}