(Meta) TML Archiving and cleanup - Vareck Bostrom (26 Sep 2024 22:02 UTC)
Re: [TML] (Meta) TML Archiving and cleanup - kaladorn@xxxxxx (26 Sep 2024 23:32 UTC)
Re: [TML] (Meta) TML Archiving and cleanup - Vareck Bostrom (26 Sep 2024 23:36 UTC)
Re: [TML] (Meta) TML Archiving and cleanup - kaladorn@xxxxxx (27 Sep 2024 23:15 UTC)
Re: [TML] (Meta) TML Archiving and cleanup - Vareck Bostrom (29 Sep 2024 22:53 UTC)
Re: [TML] (Meta) TML Archiving and cleanup - Charles McKnight (30 Sep 2024 00:08 UTC)
Re: [TML] (Meta) TML Archiving and cleanup - Vareck Bostrom (30 Sep 2024 00:11 UTC)
Re: [TML] (Meta) TML Archiving and cleanup - kaladorn@xxxxxx (07 Oct 2024 03:47 UTC)
Re: [TML] (Meta) TML Archiving and cleanup - Phil Pugliese (07 Oct 2024 04:19 UTC)
Re: [TML] (Meta) TML Archiving and cleanup - lotions.felines0x@xxxxxx (07 Oct 2024 06:10 UTC)
Re: [TML] (Meta) TML Archiving and cleanup - Phil Pugliese (07 Oct 2024 11:17 UTC)
Re: [TML] (Meta) TML Archiving and cleanup - kaladorn@xxxxxx (07 Oct 2024 12:19 UTC)
Re: [TML] (Meta) TML Archiving and cleanup - Thomas Jones-Low (07 Oct 2024 12:43 UTC)
Re: [TML] (Meta) TML Archiving and cleanup - kaladorn@xxxxxx (07 Oct 2024 14:01 UTC)
Re: [TML] (Meta) TML Archiving and cleanup - Jeff Zeitlin (07 Oct 2024 14:09 UTC)
Re: [TML] (Meta) TML Archiving and cleanup - lotions.felines0x@xxxxxx (07 Oct 2024 15:30 UTC)
Re: [TML] (Meta) TML Archiving and cleanup - lotions.felines0x@xxxxxx (07 Oct 2024 15:30 UTC)
Re: [TML] (Meta) TML Archiving and cleanup - Vareck Bostrom (07 Oct 2024 15:33 UTC)
Re: [TML] (Meta) TML Archiving and cleanup - lotions.felines0x@xxxxxx (07 Oct 2024 20:18 UTC)
Re: [TML] (Meta) TML Archiving and cleanup - Phil Pugliese (07 Oct 2024 20:22 UTC)
Re: [TML] (Meta) TML Archiving and cleanup - Shannon Appelcline (08 Oct 2024 06:21 UTC)
Re: [TML] (Meta) TML Archiving and cleanup - lotions.felines0x@xxxxxx (08 Oct 2024 16:33 UTC)
Re: [TML] (Meta) TML Archiving and cleanup - Phil Pugliese (08 Oct 2024 18:08 UTC)
Re: [TML] (Meta) TML Archiving and cleanup - Phil Pugliese (08 Oct 2024 18:11 UTC)
Re: [TML] (Meta) TML Archiving and cleanup - Shannon Appelcline (09 Oct 2024 02:05 UTC)
Re: [TML] (Meta) TML Archiving and cleanup - kaladorn@xxxxxx (08 Oct 2024 20:44 UTC)
Re: [TML] (Meta) TML Archiving and cleanup - Brett Kruger (30 Sep 2024 01:03 UTC)
|
I am using Obsidian to read the Markdown files as it is. What I'm doing now is importing them from the TML archive, cleaning the extraneous information out of the message text, and then converting them to Markdown. If someone already has an Obsidian vault of the TML archive, that would be very helpful.

On Thursday, September 26th, 2024 at 4:32 PM, kaladorn at gmail.com (via tml list) <xxxxxx@simplelists.com> wrote:

> Are you looking to import to some tool such as Obsidian? (You could, if you are using Markdown.)
>
> Is your idea to bring in other sources (one that seems interesting to have would be the HIWG archive)? I think it is in a similarly okay but cluttered mail format.
>
> Tom
>
> On Thu, Sep 26, 2024 at 6:03 PM Vareck Bostrom - vareck at proton.me (via tml list) <xxxxxx@simplelists.com> wrote:
>
> > I've recently discovered NotebookLM (at Google), and it has been interesting to load a large amount of source data into it and ask it questions about that data.
> >
> > I thought I'd try it with the TML list data. I've wanted to create a more usable archive of the data for a while, so I came up with a Python script to walk through the simplelists URL and collect the text. For purposes of source material, though, many of the messages are 'polluted': much of the text is quoted from another message and is therefore already present as a source, and there are list control messages ('to unsubscribe...') and "Registered Trademark" and other lawyer-repellent notices in some of the message footers.
> >
> > It turns out that GPT, called with an appropriate prompt through the API, does a pretty good job of cleaning that up. Quoted text blocks can be identified and removed, as can lawyer-repellent notices, signature blocks, and so on, so that just the content remains. Even gpt-4o-mini does a pretty good job, and it is dirt-cheap per token. I then format the messages as blocks of Markdown text, which NotebookLM seems to ingest easily.
> >
> > Before I go too far with this, though (even though gpt-4o-mini is inexpensive, it still costs some money), I thought I'd ask: has someone already done something like this, taking the entire archive and cleaning it up?
> >
> > -----
> > The Traveller Mailing List
> > Archives at http://archives.simplelists.com/tml
> > Report problems to xxxxxx@simplelists.com
> > To unsubscribe from this list please go to
> > https://www.simplelists.com/subs/
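In the thread, the cleanup is done with gpt-4o-mini through the API. For comparison, the purely mechanical parts of it (">"-quoted lines, single-line "On ... wrote:" attributions, and the "-----" list footer) can also be stripped with plain rules. What follows is a minimal Python sketch of that rule-based alternative. It is not the script described in the thread. The function names `clean_message` and `to_markdown`, the cutoff markers, and the Markdown layout are all hypothetical choices for illustration. Free-form legal notices and signatures without a standard "-- " delimiter are cases these simple rules miss, and those are probably where a prompted model helps most.

```python
import re

# Hypothetical rule set, not the thread's GPT-based cleanup.
# A line equal to one of these (after trailing whitespace is stripped) ends the
# useful content: "-----" opens the TML list footer, and "--" is the usual
# "-- " email signature delimiter with its trailing space removed.
CUTOFF_LINES = {"-----", "--"}

# Single-line attribution such as "On Thu, Sep 26, 2024 at 6:03 PM ... wrote:".
# Attributions that wrap onto a second line are not caught by this pattern.
ATTRIBUTION = re.compile(r"^On .+ wrote:\s*$")


def clean_message(body: str) -> str:
    """Remove quoted lines, attribution lines, and everything from a footer on."""
    kept = []
    for line in body.splitlines():
        if line.lstrip().startswith(">"):
            continue  # quoted text is already archived in the earlier message
        if line.rstrip() in CUTOFF_LINES:
            break  # list footer or signature: drop the rest of the message
        if ATTRIBUTION.match(line):
            continue
        kept.append(line)
    text = "\n".join(kept)
    # Collapse the runs of blank lines left behind by removed quotes.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def to_markdown(subject: str, author: str, date: str, body: str) -> str:
    """Render one cleaned message as a Markdown block (layout is illustrative)."""
    return (
        f"## {subject}\n\n"
        f"- From: {author}\n"
        f"- Date: {date}\n\n"
        f"{clean_message(body)}\n"
    )


sample = """I am using Obsidian to read the Markdown files as it is.

On Thursday, September 26th, 2024 at 4:32 PM, kaladorn wrote:
> Are you looking to import to some tool such as Obsidian?

-----
The Traveller Mailing List
"""

md = to_markdown(
    "Re: [TML] (Meta) TML Archiving and cleanup",
    "Vareck Bostrom",
    "26 Sep 2024 23:36 UTC",
    sample,
)
print(md)
```

Rules like these cost nothing per message and behave predictably, so one option is to run them first and send only the remaining text to the model, which would cut the token count.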