Bug in card splitting logic

Continuing the discussion from Bugs in card splitting logic:

I found a bug where this is not working as desired. It seems to happen when you paste more than 300 characters containing 2 or more line breaks. This is a very common case, for example when you want a list of items split into one card per item:

:watermelon:Watermelon Ham, The Non-Linear Toolbox, and Your Desktop Pal - The Land of Random (substack.com)
https://stegriff.co.uk/upblog/web-pages-with-personality/ and https://stegriff.co.uk/upblog/baby-griff/
https://tiv.today/2021/06/kinopio
https://design.futureland.tv/vin/futureland-design/82632?fullscreen=1

Or also:

  • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
  • Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
  • Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.

The current logic seems to always try to split it into only 2 cards.

working on this now

multi-paragraph should be fixed now

let me know if you find any other issues

Here is a test case which I consider a bug:

1 In the beginning God created the heavens and the earth. 2 Now the earth was formless and empty, darkness was over the surface of the deep, and the Spirit of God was hovering over the waters.

3 And God said, “Let there be light,” and there was light. 4 God saw that the light was good, and he separated the light from the darkness. 5 God called the light “day,” and the darkness he called “night.” And there was evening, and there was morning—the first day.

This is two paragraphs. Both paragraphs individually fit into a single card. So I think the algorithm should prioritize keeping paragraphs together if possible. However, the algorithm splits this into 5 sentences, which is OK, but loses the paragraph information.

i would consider it a different interpretation, rather than a bug. Doing the ‘right’/smart thing here may mean doing the ‘wrong’ thing in another case, so I lean towards predictability in all cases

as a non-subject-matter expert, having 1, 2, 3, etc. as separate cards in this case seems a lot more readable though?

What would be a case where the proposed algorithm would do the “wrong” thing?

Readability is important, but I think retaining information is more important. This is not specific to bible passages, but for any kind of writing: I have two paragraphs, and when I paste those into Kinopio, I want my paragraphs to be preserved because I treat those as units of information. The current algorithm throws that information away.

What I think is predictable is, “Keep paragraphs together as much as possible. If a paragraph is longer than 300 characters, start taking off sentences from the end until it fits.” Admittedly there are edge cases here which are ambiguous and we can talk about.
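To make that rule concrete, here is a rough sketch of how I imagine it. It is only an illustration, not Kinopio's actual code: the function names, the naive sentence regex, and the choice to treat every line break as a paragraph boundary are my own assumptions. A single sentence that is longer than the limit is left as one oversized card, since that is one of the ambiguous edge cases.

```python
import re

MAX_CHARS = 300  # card limit mentioned in this thread


def split_sentences(paragraph):
    # Naive (assumed) sentence split: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def split_paragraph(paragraph, max_chars=MAX_CHARS):
    """Take sentences off the end until the head fits, then repeat on the rest."""
    cards = []
    sentences = split_sentences(paragraph)
    while sentences:
        end = len(sentences)
        # Drop trailing sentences until what remains fits on one card.
        # A lone oversized sentence (end == 1) is kept as-is.
        while end > 1 and len(" ".join(sentences[:end])) > max_chars:
            end -= 1
        cards.append(" ".join(sentences[:end]))
        sentences = sentences[end:]
    return cards


def split_into_cards(text, max_chars=MAX_CHARS):
    cards = []
    # Assumption: one or more line breaks separate paragraphs.
    for paragraph in re.split(r"\n+", text.strip()):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        if len(paragraph) <= max_chars:
            cards.append(paragraph)  # the paragraph fits, keep it whole
        else:
            cards.extend(split_paragraph(paragraph, max_chars))
    return cards
```

With this, two short paragraphs always come out as two cards, and only a paragraph over the limit gets broken up.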

Also, what would make the current algorithm more palatable would be a way to combine cards, making it easier to put paragraphs back together. So, paint some cards, and if they all fit on one card, add a button to combine/join them together.
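The check behind such a combine button could be very small. This is a hypothetical sketch: `combine_cards`, the space separator, and the 300-character limit are assumptions on my part, not anything Kinopio exposes today.

```python
MAX_CHARS = 300  # assumed card limit, taken from the number used in this thread


def combine_cards(cards, separator=" ", max_chars=MAX_CHARS):
    """Join the selected cards; return None if the result would not fit on one card."""
    combined = separator.join(card.strip() for card in cards if card.strip())
    return combined if len(combined) <= max_chars else None
```

The button would only be shown (or enabled) when this returns something other than `None`.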

But I still think doing the smart/right thing is preferable :slight_smile: What is your idea of the “right” thing?

So when I brought in a numbered list, I was hoping it would treat it like the “Observed” above. But since my numbered list was formatted as:

  1. First item
  2. Second item
  3. Third item

It actually split into six cards “1.”, “First item”, “2.”, etc.

It's nice to know that I can remove the periods and it will split into three cards…
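My guess at why this happens (purely an assumption, since I haven't seen the actual splitting code) is that every period followed by whitespace is treated as the end of a sentence, so "1." counts as a sentence of its own. A hypothetical splitter like this reproduces what I saw:

```python
import re


def naive_split(text):
    # Hypothetical rule: break after any period followed by whitespace,
    # and also at line breaks.
    return [piece for piece in re.split(r"(?<=\.)\s+|\n+", text) if piece.strip()]


with_periods = "1. First item\n2. Second item\n3. Third item"
without_periods = "1 First item\n2 Second item\n3 Third item"

print(naive_split(with_periods))     # six pieces: "1.", "First item", "2.", ...
print(naive_split(without_periods))  # three pieces, one per line
```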

I think my proposed algorithm would handle this correctly because it would first split based on new lines, which I assume separate each bullet item. Then you wouldn’t need special logic to detect a numbered list either.

i have a fix for this that i’ll ship near the end of the week (when i’m back from break)

it splits by paragraph, then splits a paragraph into sentences if the paragraph is too big, then splits a sentence into 150-char chunks if the sentence is too big.
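For anyone following along, the three-level fallback described above could look roughly like this. This is a sketch and not the code that will ship: the function names, the 300-char "too big" threshold, the sentence regex, and treating each line break as a paragraph boundary are all assumptions. Only the 150-char chunk size comes from the description.

```python
import re

MAX_CHARS = 300    # assumed "too big" threshold (the 300 mentioned earlier in the thread)
CHUNK_CHARS = 150  # chunk size for oversized sentences, from the description above


def chunk(sentence, size=CHUNK_CHARS):
    # Cut an oversized sentence into fixed-size pieces.
    return [sentence[i:i + size] for i in range(0, len(sentence), size)]


def split_cards(text):
    cards = []
    # Level 1: paragraphs (assumed to be separated by line breaks).
    for paragraph in (p.strip() for p in re.split(r"\n+", text)):
        if not paragraph:
            continue
        if len(paragraph) <= MAX_CHARS:
            cards.append(paragraph)
            continue
        # Level 2: sentences, only when the paragraph is too big.
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            if len(sentence) <= MAX_CHARS:
                cards.append(sentence)
            else:
                # Level 3: fixed-size chunks, only when the sentence is too big.
                cards.extend(chunk(sentence))
    return cards
```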

here are the test cases and results

1. First item
2. Second item
3.Third item

[screenshot: Screen Shot 2021-08-23 at 8.29.35 PM]

feel free to provide additional test cases in the meantime

Here is a test case that currently fails (in my opinion):

Elegent pay duty spectacular price treat also price messy. Industry go space juicy, clean mountain the fast handling crystals. Zesty proven advertising and, aroma, rich. 

The, grab easily challenge full affordable burst absorbent, terrific bigger any our buy improved sleek. Boast inside however, makes gentle double. Special, screamin' you advertising any extravaganza high.

  • Current behavior: option to split into 6 cards.
  • Desired behavior: option to split into 2 cards.
  • Rationale: This is two paragraphs of text. Paragraphs are an atomic unit of thought. We shouldn’t break them up unless they are longer than the max characters. Doing so loses information. If a paragraph is too long, then the behavior is more ambiguous, and any of these seem okay:
    • break it apart into sentences
    • break it apart into sentences, but from the last one, keeping as many sentences together as possible.
    • if it’s only one sentence, break it up at the nearest word boundary.
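For the last option (one long sentence, broken at word boundaries), Python's standard `textwrap.wrap` already does roughly this, so a sketch is short. The function name and the 300-character default are my assumptions. Note that with `textwrap`'s defaults, a single word longer than the limit would still be cut mid-word.

```python
import textwrap


def split_at_word_boundaries(sentence, max_chars=300):
    # Greedily fill each card with whole words, breaking only at whitespace.
    return textwrap.wrap(sentence, width=max_chars)


long_sentence = " ".join(["word"] * 100)  # 499 characters, no punctuation
pieces = split_at_word_boundaries(long_sentence)
print([len(p) for p in pieces])
```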

I just realized, this is probably the same test case as the Genesis one above, and that you haven’t pushed the fix yet. So let’s see :slight_smile:

i haven’t pushed it yet, gonna QA a bunch of new features tomorrow

with the new code, this creates two cards

new splitting logic shipped
