Bugs in card splitting logic

Hi, I’ve seen this bug for a while but finally reporting it :slight_smile:

If I paste this text (> 300 characters):

And God said, “Let the water under the sky be gathered to one place, and let dry ground appear.” And it was so. 10 God called the dry ground “land,” and the gathered waters he called “seas.” And God saw that it was good.

11 Then God said, “Let the land produce vegetation: seed-bearing plants and trees on the land that bear fruit with seed in it, according to their various kinds.” And it was so. 12 The land produced vegetation: plants bearing seed according to their kinds and trees bearing fruit with seed in it according to their kinds. And God saw that it was good. 13 And there was evening, and there was morning—the third day.

I get the option to split it into 2 cards.

When this happens, I get the 1st paragraph in one card, and part of the 2nd paragraph in the second. The second paragraph seems like it only has the first 300 characters—the rest has been truncated.

It seems the logic here is that Kinopio detects the line breaks and assumes one paragraph per card. But it is not handling the case where a paragraph itself is > 300 characters.

Desired behavior: I think it makes sense to split first along line breaks. But if a paragraph is longer, then keep as much in one card as possible, but split at the closest sentence boundary that fits. So in this example, the result should be:

The first paragraph is a card. The second paragraph got broken up after the first sentence because the first and second were more than 300 characters.
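To make the desired behavior concrete, here is a minimal Python sketch of "split along line breaks first, then pack as many whole sentences as fit". It is not Kinopio's code: the names `split_sentences`, `pack_sentences`, and `split_paste` are made up for illustration, and the sentence detection is a simple assumption (a `.`, `!` or `?`, optionally followed by a closing quote, then whitespace), so abbreviations like "e.g." would be treated as sentence ends.

```python
import re

MAX_LEN = 300  # card character limit mentioned in this thread

# Whitespace that follows . ! or ? (optionally with a closing quote).
# Only the whitespace is consumed, so the punctuation stays on the sentence.
SENTENCE_GAP = re.compile(r'(?:(?<=[.!?])|(?<=[.!?][”"]))\s+')


def split_sentences(paragraph):
    """Split a paragraph into sentences, keeping the end punctuation."""
    return [s for s in SENTENCE_GAP.split(paragraph.strip()) if s]


def pack_sentences(paragraph, max_len=MAX_LEN):
    """Greedily fill each card with as many whole sentences as fit."""
    cards, current = [], ""
    for sentence in split_sentences(paragraph):
        candidate = f"{current} {sentence}" if current else sentence
        if not current or len(candidate) <= max_len:
            current = candidate
        else:
            cards.append(current)
            current = sentence
    if current:
        cards.append(current)
    return cards


def split_paste(text, max_len=MAX_LEN):
    """One card per paragraph; oversized paragraphs are packed by sentence."""
    cards = []
    for line in text.splitlines():
        paragraph = line.strip()
        if not paragraph:
            continue  # skip the blank lines between paragraphs
        if len(paragraph) <= max_len:
            cards.append(paragraph)
        else:
            cards.extend(pack_sentences(paragraph, max_len))
    return cards
```

A single sentence longer than the limit is left as one oversized card in this sketch; a real implementation would still need a fallback that cuts it at the character limit.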

hopefully that makes sense…

On a related note, when Kinopio splits a single block of text into sentences, it removes the periods at the ends of sentences. I feel like it shouldn’t remove any data. For example, pasting

11 Then God said, “Let the land produce vegetation: seed-bearing plants and trees on the land that bear fruit with seed in it, according to their various kinds.” And it was so. 12 The land produced vegetation: plants bearing seed according to their kinds and trees bearing fruit with seed in it according to their kinds. And God saw that it was good. 

yielded cards with the periods missing.

thanks!


If you did that then instead of three cards you’d have a lot more than that.

Here’s the logic I’m working towards, optimizing for fewer cards being split into:

  • if the paste is more than 300 characters, split the next card by paragraph
  • else if the paragraph is too big, then split the next card by sentence instead
  • else if the sentence is too big, then split by 300 chars

repeated over and over (recursion) until the last card is less than 300 characters
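The steps above can be sketched roughly like this in Python. This is only one reading of the list, not the shipped Kinopio code: `next_card` and `split_cards` are hypothetical names, "by sentence" is assumed to mean peeling off the first sentence only, and a sentence is assumed to end at `.`, `!` or `?` (optionally followed by a closing quote).

```python
import re

MAX_LEN = 300  # card character limit from the thread

# Assumed sentence end: . ! or ?, an optional closing quote,
# then whitespace or the end of the text.
SENTENCE_END = re.compile(r'[.!?][”"]?(?=\s|$)')


def next_card(text, max_len=MAX_LEN):
    """Peel one card off text that is longer than max_len.

    Returns (card, rest). Tries the first paragraph, then the first
    sentence, then falls back to a hard cut at max_len characters.
    """
    paragraph, _, after_paragraph = text.partition("\n")
    if len(paragraph) <= max_len:
        return paragraph, after_paragraph
    match = SENTENCE_END.search(paragraph)
    if match and match.end() <= max_len:
        # Cut after the punctuation so the period is kept on the card.
        return text[:match.end()], text[match.end():]
    return text[:max_len], text[max_len:]


def split_cards(text, max_len=MAX_LEN):
    """Recursively split text until the remainder fits in one card."""
    text = text.strip()
    if not text:
        return []
    if len(text) <= max_len:
        return [text]
    card, rest = next_card(text, max_len)
    return [card.strip()] + split_cards(rest, max_len)
```

For example, with a small limit of 20, `split_cards("Short one.\nThis one is long. Tiny too.\n" + "A" * 25, max_len=20)` goes through all three branches: a paragraph split, a sentence split, and a hard cut of the 25-character run. Because the recursion stops as soon as the remainder fits, a short tail that still contains a line break stays together as one card.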


Your logic here makes sense and is consistent with what I had in mind. So perhaps I wasn’t clear with what I was saying :slight_smile:

my second screenshot has 3 cards, and this is what I would expect with the logic I had in mind. so I’m pretty sure we’re on the same page ( ͡° ᴥ ͡°)

this is what i get with the new logic (which also now compensates for trailing ‘.’ in sentences). still have some QA to do before shipping


thank you, that lines up exactly with what I was thinking.

QQ: when I’ve seen this from pasting large text, I always assumed it cut off part of the text, which was then lost.

Am I reading this right that I can paste a bunch of text, split, and it’ll bring in everything across however many cards it takes?


Yes. As in the first screenshot in my first post, when you see the message about Max Length, you can hit the Split into x cards button and it will do what you describe. Except, there was a bug where sometimes some of those cards would still truncate data. :slight_smile:

I should add, it has mostly worked and I’ve used it extensively since that feature was implemented :slight_smile:


To be able to also use this logic to properly split by sentences, it looks like I might need to update my logic to prefer splitting a card by sentence or paragraph based on whichever comes first.

This would mean though, that your snippet would be split into 6 cards. Sometimes it splits by paragraph, other times by sentence, whichever comes first. does that work?
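A rough Python sketch of the "whichever comes first" idea, to show why it produces more cards. Again this is an illustration under assumptions, not the shipped code: `split_first_boundary` is a made-up name, and a boundary is assumed to be either a run of line breaks or a `.`, `!` or `?` (optionally followed by a closing quote) before whitespace.

```python
import re

MAX_LEN = 300  # card character limit from the thread

# Earliest boundary wins: a run of newlines, or an assumed sentence end.
BOUNDARY = re.compile(r'\n+|[.!?][”"]?(?=\s|$)')


def split_first_boundary(text, max_len=MAX_LEN):
    """Cut at the first paragraph or sentence boundary, then recurse."""
    text = text.strip()
    if not text:
        return []
    if len(text) <= max_len:
        return [text]
    match = BOUNDARY.search(text)
    if match and len(text[:match.end()].rstrip()) <= max_len:
        cut = match.end()  # sentence punctuation stays on the card
    else:
        cut = max_len  # no usable boundary: hard cut at the limit
    return [text[:cut].strip()] + split_first_boundary(text[cut:], max_len)
```

Since every sentence end counts as a boundary, an oversized paste is peeled off one sentence at a time until the remainder fits, which is where the higher card count comes from.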

alrighty, shipped. This took way longer than I thought it would :frowning: , probably because the new logic is now recursive and more conditional.

I tested this as well as I could, but there are so many permutations possible with card names, so let me know if you see anything amiss.


I really appreciate it. So far so good. There are some tweaks I would like, but I will sit on that for a while after more testing/use. The big thing is, no more data truncation/loss that I’ve seen :wink:


I found a bug where this is not working as desired. It seems to happen when you paste more than 300 characters and have 2 or more line breaks. This is a very common case, for example, when you want a list of items split into that many cards:

:watermelon:Watermelon Ham, The Non-Linear Toolbox, and Your Desktop Pal - The Land of Random (substack.com)
https://stegriff.co.uk/upblog/web-pages-with-personality/ and https://stegriff.co.uk/upblog/baby-griff/
https://tiv.today/2021/06/kinopio
https://design.futureland.tv/vin/futureland-design/82632?fullscreen=1

Or also:

  • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
  • Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
  • Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.

The current logic seems to always try to split it into only 2 cards.


Can you paste this into a new thread? That’ll make it easier for me to get to next week or a bit after

Thx!
