github: issue_comments: 9 rows where issue = 707427200

id ▼	html_url	issue_url	node_id	user	created_at	updated_at	author_association	body	reactions	issue
697466497	https://github.com/simonw/sqlite-utils/issues/172#issuecomment-697466497	https://api.github.com/repos/simonw/sqlite-utils/issues/172	MDEyOklzc3VlQ29tbWVudDY5NzQ2NjQ5Nw==	simonw 9599	2020-09-23T14:41:17Z	2020-09-23T14:41:17Z	OWNER	Steps to produce that database: ``` curl -o salaries.csv 'https://data.sfgov.org/api/views/88g8-5mnd/rows.csv?accessType=DOWNLOAD' sqlite-utils insert salaries.db salaries salaries.csv --csv ```	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Improve performance of extract operations 707427200
697467833	https://github.com/simonw/sqlite-utils/issues/172#issuecomment-697467833	https://api.github.com/repos/simonw/sqlite-utils/issues/172	MDEyOklzc3VlQ29tbWVudDY5NzQ2NzgzMw==	simonw 9599	2020-09-23T14:42:03Z	2020-09-23T14:42:03Z	OWNER	Here's the loop that's taking the time: https://github.com/simonw/sqlite-utils/blob/1ebffe1dbeaed7311e5b61ed988f4cd701e84808/sqlite_utils/db.py#L892-L897	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Improve performance of extract operations 707427200
697473247	https://github.com/simonw/sqlite-utils/issues/172#issuecomment-697473247	https://api.github.com/repos/simonw/sqlite-utils/issues/172	MDEyOklzc3VlQ29tbWVudDY5NzQ3MzI0Nw==	simonw 9599	2020-09-23T14:45:13Z	2020-09-23T14:45:13Z	OWNER	`lookup_table.lookup(lookups)` is doing a SQL lookup. This could be cached in-memory, maybe with a LRU cache, to avoid looking up the primary key for records that we have recently used. The `.update()` method it is calling first does a `get()` and then does a SQL `UPDATE ... WHERE`: https://github.com/simonw/sqlite-utils/blob/1ebffe1dbeaed7311e5b61ed988f4cd701e84808/sqlite_utils/db.py#L1244-L1264 Batching those updates may have an effect. Or finding a way to skip the `.get()` since we already know we have a valid record.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Improve performance of extract operations 707427200
697835956	https://github.com/simonw/sqlite-utils/issues/172#issuecomment-697835956	https://api.github.com/repos/simonw/sqlite-utils/issues/172	MDEyOklzc3VlQ29tbWVudDY5NzgzNTk1Ng==	simonw 9599	2020-09-23T18:22:49Z	2020-09-23T18:22:49Z	OWNER	I ran `sudo py-spy top -p 123` against the process while it was running and the most time is definitely spent in `.update()`: ``` Total Samples 1000 GIL: 0.00%, Active: 90.00%, Threads: 1 %Own %Total OwnTime TotalTime Function (filename:line) 38.00% 38.00% 3.85s 3.85s update (sqlite_utils/db.py:1283) 27.00% 27.00% 2.12s 2.12s execute (sqlite_utils/db.py:161) 10.00% 10.00% 0.890s 0.890s execute (sqlite_utils/db.py:163) 10.00% 17.00% 0.870s 1.54s columns (sqlite_utils/db.py:553) 0.00% 0.00% 0.110s 0.210s <listcomp> (sqlite_utils/db.py:554) 0.00% 3.00% 0.100s 0.320s table_names (sqlite_utils/db.py:191) 0.00% 0.00% 0.100s 0.100s __new__ (<string>:1) ```	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Improve performance of extract operations 707427200
697859772	https://github.com/simonw/sqlite-utils/issues/172#issuecomment-697859772	https://api.github.com/repos/simonw/sqlite-utils/issues/172	MDEyOklzc3VlQ29tbWVudDY5Nzg1OTc3Mg==	simonw 9599	2020-09-23T18:38:43Z	2020-09-23T18:38:52Z	OWNER	I wonder if I could make this faster by separating it out into a few steps: - Create the new lookup table with all of the distinct rows - Add the blank foreign key column - run a `UPDATE table SET blah_id = (select id from lookup where thang = table.thang)` - Drop the value columns	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Improve performance of extract operations 707427200
697863116	https://github.com/simonw/sqlite-utils/issues/172#issuecomment-697863116	https://api.github.com/repos/simonw/sqlite-utils/issues/172	MDEyOklzc3VlQ29tbWVudDY5Nzg2MzExNg==	simonw 9599	2020-09-23T18:41:06Z	2020-09-23T18:41:06Z	OWNER	Problem with this approach is it's not compatible with progress bars - but if it's a multiple of times faster it's worth it.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Improve performance of extract operations 707427200
697866885	https://github.com/simonw/sqlite-utils/issues/172#issuecomment-697866885	https://api.github.com/repos/simonw/sqlite-utils/issues/172	MDEyOklzc3VlQ29tbWVudDY5Nzg2Njg4NQ==	simonw 9599	2020-09-23T18:43:37Z	2020-09-23T18:43:37Z	OWNER	Also what would happen if the table had new rows added to it while that command was running?	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Improve performance of extract operations 707427200
697869886	https://github.com/simonw/sqlite-utils/issues/172#issuecomment-697869886	https://api.github.com/repos/simonw/sqlite-utils/issues/172	MDEyOklzc3VlQ29tbWVudDY5Nzg2OTg4Ng==	simonw 9599	2020-09-23T18:45:30Z	2020-09-23T18:45:30Z	OWNER	There's something to be said for making this operation pausable and resumable, especially if I'm going to make it available in a Datasette plugin at some point.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Improve performance of extract operations 707427200
698178101	https://github.com/simonw/sqlite-utils/issues/172#issuecomment-698178101	https://api.github.com/repos/simonw/sqlite-utils/issues/172	MDEyOklzc3VlQ29tbWVudDY5ODE3ODEwMQ==	simonw 9599	2020-09-24T07:48:57Z	2020-09-24T07:49:20Z	OWNER	> I wonder if I could make this faster by separating it out into a few steps: > > * Create the new lookup table with all of the distinct rows > > * Add the blank foreign key column > > * run a `UPDATE table SET blah_id = (select id from lookup where thang = table.thang)` > > * Drop the value columns My prototype of this knocked the time down from 10 minutes to 4 seconds, so I think the change is worth it! ``` % date sqlite-utils extract salaries.db salaries \ 'Department Code' 'Department' \ --table 'departments' \ --fk-column 'department_id' \ --rename 'Department Code' code \ --rename 'Department' name date sqlite-utils extract salaries.db salaries \ 'Union Code' 'Union' \ --table 'unions' \ --fk-column 'union_id' \ --rename 'Union Code' code \ --rename 'Union' name date sqlite-utils extract salaries.db salaries \ 'Job Family Code' 'Job Family' \ --table 'job_families' \ --fk-column 'job_family_id' \ --rename 'Job Family Code' code \ --rename 'Job Family' name date sqlite-utils extract salaries.db salaries \ 'Job Code' 'Job' \ --table 'jobs' \ --fk-column 'job_id' \ --rename 'Job Code' code \ --rename 'Job' name date Thu Sep 24 00:48:16 PDT 2020 Thu Sep 24 00:48:20 PDT 2020 Thu Sep 24 00:48:24 PDT 2020 Thu Sep 24 00:48:28 PDT 2020 Thu Sep 24 00:48:32 PDT 2020 ```	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Improve performance of extract operations 707427200
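
The multi-step approach described in the comments above (build the lookup table from the distinct values, add an empty foreign key column, then fill it with one set-based `UPDATE`) can be sketched roughly as follows. This is a minimal illustration using only Python's standard `sqlite3` module, not the code that sqlite-utils actually ships. The function name `extract_column` and its parameters are hypothetical, it handles a single column only, and it leaves the original value column in place because dropping a column depends on the SQLite version in use.

```python
import sqlite3


def extract_column(conn, table, column, lookup_table, fk_column):
    """Hypothetical helper: copy the distinct values of `column` into
    `lookup_table`, then point `fk_column` on `table` at the matching rows.

    Identifiers are interpolated directly, so they must come from trusted input.
    """
    with conn:  # commit on success, roll back if any statement fails
        # Step 1: create the lookup table and fill it with the distinct values.
        conn.execute(
            f'CREATE TABLE "{lookup_table}" '
            f'(id INTEGER PRIMARY KEY, "{column}" TEXT)'
        )
        conn.execute(
            f'INSERT INTO "{lookup_table}" ("{column}") '
            f'SELECT DISTINCT "{column}" FROM "{table}" ORDER BY "{column}"'
        )
        # An index on the value keeps the correlated subquery below fast.
        conn.execute(
            f'CREATE UNIQUE INDEX "idx_{lookup_table}_{column}" '
            f'ON "{lookup_table}" ("{column}")'
        )
        # Step 2: add the blank foreign key column.
        conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{fk_column}" INTEGER')
        # Step 3: populate every row with a single UPDATE statement
        # instead of one lookup and one update per row.
        conn.execute(
            f'UPDATE "{table}" SET "{fk_column}" = ('
            f'SELECT id FROM "{lookup_table}" '
            f'WHERE "{lookup_table}"."{column}" = "{table}"."{column}")'
        )
```

Calling `extract_column(conn, "salaries", "Department", "departments", "department_id")` on a connection to a database like the `salaries.db` built earlier would be one way to try it. Because all the work happens inside a few SQL statements, there is no per-row loop to attach a progress bar to, which is the trade-off noted in the comments.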

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id]),
   [performed_via_github_app] TEXT
);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
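
As a rough illustration of how the rows on this page relate to the schema, the filter `issue = 707427200` can be run against the `issue_comments` table with Python's standard `sqlite3` module. The helper name `comments_for_issue` is hypothetical, and the choice of returned columns is an assumption made for brevity. The `idx_issue_comments_issue` index should let SQLite satisfy the `WHERE` clause without scanning the whole table.

```python
import sqlite3


def comments_for_issue(conn, issue_id):
    """Hypothetical helper: return (id, created_at, body) for every comment
    on one issue, ordered by comment id."""
    return conn.execute(
        "SELECT id, created_at, body FROM issue_comments "
        "WHERE issue = ? ORDER BY id",
        (issue_id,),  # bound parameter, never string-formatted into the SQL
    ).fetchall()
```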