Talk:Red–black tree
This is the talk page for discussing improvements to the Red–black tree article. This is not a forum for general discussion of the article's subject. |
Article policies
|
Find sources: Google (books · news · scholar · free images · WP refs) · FENS · JSTOR · TWL |
This level-5 vital article is rated C-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects: | |||||||||||||||||||||||||||||||
|
Red–black tree received a peer review by Wikipedia editors, which is now archived. It may contain ideas you can use to improve this article. |
Properties
[edit]Property 1 is trivial. Property 2 prevents the children of the root node from being red-black trees, since their roots may need to be recolored black. This seems like an undesired property in a theoretical context. Property 3 is useful, but only because it allows us to say "black" instead of "black or NIL" or "not red." Properties 4 and 5 are the important invariants.
Given this, why not renumber the properties? 99.11.197.75 (talk) 00:42, 5 May 2011 (UTC)
NIL nodes yet again
[edit]The current description of "NIL" nodes is confusing. First, I would suggest that it is more precise to call a "NIL node" "the empty tree". Then one could say that NIL is a symbol for the empty tree. Second, the use of NIL nodes is not peculiar to red-black trees. In fact, many tree implementations use the empty tree such that every non-empty tree has two children, and one doesn't need separate cases for nodes with two children, one left child, one right child, or no children. However, it should be clear that this is an implementation detail, not a requirement. Calling these NIL nodes leaves is unnecessarily confusing, as a "leaf node" could also refer to a node with both children empty.
The only thing about NIL nodes that is special for red-black trees is that it is useful to consider them to be black. 99.11.197.75 (talk) 00:04, 5 May 2011 (UTC)
NIL nodes seem to be an unneeded complexity while treating red-black trees, just the set of properties should be somewhat adjusted:
- Properties 2 and 3 deleted
- Property 4 reversed: If a node is red, then it has a black parent.
- Property 5: Every path from a given node to any of its descendant nodes contains the same number of black nodes.
Of course black-height should be defined under terminology slightly differently than presently under property 5, e.g. 'the number of black nodes in any path from the root to a leaf'. --Jangell2 (talk) 10:23, 19 April 2016 (UTC)
- I also agree that NIL nodes introduce an unneeded complexity (even to RB-trees). The only thing which possibly is understood more easily is counting the black nodes in a path. But Ben Pfaff who does not have them defines the RB-tree as a binary tree with:
- No red node has a red child.
- Every simple path from a given node to one of its non-branching node descendants contains the same number of black nodes.
- Without the NIL nodes, especially the wording within the article should become shorter. –Nomen4Omen (talk) 13:52, 5 June 2023 (UTC)
- Agreed, "NIL" makes the article overly complex for no value. "A node whose both children are NIL" could be simplified to to "A node with no children", or "A node with one non-NIL and one NIL node" could be "a node with a single child". The double-negative is especially confusing. NefzenK (talk) — Preceding undated comment added 14:05, 31 October 2023 (UTC)
Insert description more complicated than necessary?
[edit]The following is from http://en.wikibooks.org/wiki/F_Sharp_Programming/Advanced_Data_Structures:
B(z) / \ R(x) d / \ a R(y) / \ b c || \/ B(z) R(y) B(x) / \ / \ / \ R(y) d => B(x) B(z) <= a R(y) / \ / \ / \ / \ R(x) c a b c d b R(z) / \ / \ a b c d /\ || B(x) / \ a R(z) / \ R(y) d / \ b c
These changes are followed by recursion if R(y)'s new parent is also red. The only other step is to recolor R(y) Black if it is the new root. 128.146.32.245 (talk) 00:08, 29 April 2011 (UTC)
Mistake in the formal proof?
[edit]I think the following sentence from the formal proof is wrong (problematic part highlighted):
As such it has two children both of which have a black-height of either bh() or bh()-1 (depending on whether is red or black).
Given the definition of the black-height bh (relevant part highlighted):
bh(v) = the number of black nodes (not counting v if it is black) from v to any leaf in the subtree (called the black-height).
I think the black-height of the children is bh(v') or bh(v') - 1 depending on whether these children are red or black, and not depending on v' color.
Also, the sentence may be interpreted as if both children have the same color, which is not necessarily the case (the children may have different colors).
Can someone confirm? These details do not invalidate the proof, but they make it a bit harder to follow. Olivier Teuliere (talk) 10:16, 21 March 2011 (UTC)
- I made that edit, and you are correct. I'm sorry about that, I've reverted it. 193.77.101.149 (talk) 09:44, 23 July 2011 (UTC)
Deletion error?
[edit]It cannot be correct to say that if N is the new root then the tree is balanced. In a three element tree where all elements are black deleting the root node produces a tree where the black height is different on either left or right unless the child element is coloured red in an additional step. Or have I missed something here? 79.69.42.27 (talk) 21:23, 17 July 2010 (UTC)
- The various deletion cases all refer to a node with at most one (non-leaf) child. The case of nodes with 2 children (like in your example) is dealt with in the first paragraph of the "Removal" section. Olivier Teuliere (talk) 09:56, 21 March 2011 (UTC)
rotate missing
[edit]for completeness, adn to be able to see what is going on at one point, there should be implementations of the rotation functions available. —Preceding unsigned comment added by 213.61.9.75 (talk) 09:20, 9 June 2010 (UTC)
tree
[edit]I would like to say thank you for this great article. Also, it might be good to link to http://webpages.ull.es/users/jriera/Docencia/AVL/AVL%20tree%20applet.htm. It's a java applet that demonstrates a red-black tree, as well as AVL and splay trees. --Anonymous CS Student
I think the tree in the image is actually invalid, since the left-left-left path and the left-right-right path only passes through two black nodes. Κσυπ Cyp 01:46, 30 Nov 2003 (UTC)
- Now right-right-right-... passes through 3 black nodes, the rest through 2. Swapping the colour of 15 and 17 should work. Nice tree, though... Κσυπ Cyp 01:41, 4 Dec 2003 (UTC)
- Oops, I'm a complete idiot. I was trying not to copy an existing image, but I guess I didn't understand red-black trees as well as I thought. Sorry for the problems.
- Derrick Coetzee 14:58, 4 Dec 2003 (UTC)
- I fixed the image, but I'm wondering if it could be misleading now, since no node now has a red and a black child (not counting nil children). I don't want to give any misimpressions.
- Derrick Coetzee 15:05, 4 Dec 2003 (UTC)
- The tree seems correct now. To have a node with both a red and a black child, it would be possible to remove node 6, and invert the colour of node 1, 8 and 11. And then if the tree looks too small, maybe add a red child for 15. Nice tree, in either case. Κσυπ Cyp 16:58, 4 Dec 2003 (UTC)
tree does not seem correct. The shorted path is not all black nodes.
The tree image is not correct since nodes 6, 22, and 27 are red even though they are also leaves. Red-Black trees don't allow for red leaf nodes. —Preceding unsigned comment added by 99.5.101.121 (talk) 06:08, 30 May 2009 (UTC)
About that whole nil-leaves discussion: Why do you need to remember that there are nil leaves? What does this actually do to change the definition? The example tree without the nil leaves would not violate the definition, and I cannot think of any others that would. Tjdw 19:43, 30 May 2004 (UTC)
- It would violate the requirement that all leaves are black. It may be a good idea to add this to the article. I think nil leaves aren't strictly necessary, but red-black trees are typically presented this way. Derrick Coetzee 22:18, 30 May 2004 (UTC)
- More importantly, it would violate the rule that each direct path must have the same number of black nodes. If one disregards the leaves, a black root with a black left child would pass the rule (the only path is two black nodes). Include the leaves and one can more clearly see that there's actually 3 paths, one with 2 black nodes, and two with 3 black nodes (including the leaves).
- The picture should be updated to include a black node that has a red and a black child, this would make it more clear why the leaves are important to include in the counting.
- It is possible though to have a correctly balanced red black tree without taking leaves into account, by replacing the counting rule by three simpler rules:
- 1) Any black node, except for the root node, must always have a sibling (ie, it cannot be an only child).
- 2) A red node that has a sibling always has exactly two black children.
- 3) A red node that has no sibling always has no children. John Hendrikx (talk) 13:10, 4 March 2010 (UTC)
This article leaves me confused as to how the behaviour and performance of a red-black tree differs from other binary trees. Could someone who understands the subject please add a note explaining why one might choose this structure over e.g. AVL trees? Alternatively, might a new topic like "binary tree implementations: advantages and disadvantages" be helpful?
- A comparison is a great idea; of course, we know their theoretical (asympototic worst-case) performances are all the same. I don't really know how they differ in practice. Maybe someone does? Derrick Coetzee 14:09, 27 Aug 2004 (UTC)
On another note, this page needs well-organized, detailed explanations of the operations and their performance. I'll do this eventually if no one else does. Derrick Coetzee 16:22, 27 Aug 2004 (UTC)
- And sure enough, I did, complete with pretty diagrams. It'd be really helpful if someone could fact-check me, though — this stuff is pretty complicated. Derrick Coetzee 07:41, 18 Sep 2004 (UTC)
I'm not sure if the description of the insertion operation is complete. Consider the following tree(. is a placeholder, ignore it):
0(B)
.\
.45(R)
After inserting the value 207, the algorithm yields:
207(B)
./
45(B)
.\
..0(R)
Which isn't a valid Red-Black tree(it violates property 4). --Ryan Stone
- You may find it easier to write your trees using s-expression syntax, like this:
- (0 black (45 red nil nil) nil)
- (0 black (45 red nil nil) (207 black nil nil))
- This insertion does fall into Case 2, but this tree does not actually occur, because newly inserted nodes are always initially coloured red. The actual tree produced would look like this:
- (0 black (45 red nil nil) (207 red nil nil))
- This satisfies property 4. If you think the text is unclear about this, please edit. Perhaps a diagram for Case 2 would help. Derrick Coetzee 22:29, 20 Sep 2004 (UTC)
- That tree isn't a binary search tree, is it? You have 0 at the root and 47 in the left subtree, which can't happen.
- No, it isn't; just the example they gave. Good catch. I'm sure they meant to put a negative number, though, just as you meant to put 45. Derrick Coetzee 07:34, 21 Sep 2004 (UTC)
- Ok, maybe I'm just applying the algorithm wrong, but here's what I get:
- Step 1: Start with: (0 black nil (45 red nil nil))
- Step 2: Insert value 207 into tree as in normal binary search tree: (0 black nil(45 red nil (207 red nil nil)))
- Step 3: Parent is red, uncle is black; apply case 4(rotate around parent): (0 black nil(207 red (45 red nil nil) nil))
- Step 4: Apply case 5(rotate around grandparent): (207 black (0 red nil (45 black nil nil)) nil)
- Ok, maybe I'm just applying the algorithm wrong, but here's what I get:
- --Ryan Stone
Ok, I think that I've found the algorithm that will handle the above case(and similar ones correctly):
Cases 1-3 are unchanged
Cases 4 and 5 only apply when the parent(P in the diagram) is a left child of the grandparent
Case 6: The parent(P) is red, the uncle(U) is black, the new node(N) is the left child of P and P is a right child of its parent(G). Perform a right rotation on P, and then apply case 7 to P.
Case 7: The parent(P) is red, the uncle(U) is black, the new node(N) is the right child of P and P is a right child of its parent(G). Perform a left rotation on G, paint G red and paint P black.
I've tried implementing this and it seems to work, but I'm hesitant to make any changes to the article until somebody who actual knows this stuff confirms that this should work.
--Ryan Stone
- It's true that when P is the right child of its parent, we're in a different, symmetrical situation, where N must be the opposite child of its parent, and rotations are performed in the opposite direction. I admit this isn't obvious. One way of dealing with this is to add the steps you have described. Another way is to say that when P is on the right side, right and left are reversed throughout steps 4 and 5; we are effectively acting on the mirrored tree. I've added a note to this effect — do you think this is a good solution? Derrick Coetzee 15:56, 21 Sep 2004 (UTC)
- A similar note was added for deletion, where left and right must be reversed if N is the right child instead of the left. Although I didn't put it in the article, reversing left and right throughout is always okay, because you're effectively acting on the mirrored red-black tree, which is a binary search tree when considered with the opposite order. Derrick Coetzee 16:03, 21 Sep 2004 (UTC)
I don't understand a couple of things about the deletion algorithm. The cases are supposed to handle the case deleting a black node with two black children, at least one of which is leaf. However, wouldn't property 4 be violated if a black node had one child who was a leaf and the other child was black but was not a leaf node?(in other words, wouldn't be simpler just to say the cases apply to a black node with two leaf nodes as children?)
My second question is about case 3 of the deletion algorithm. I can see that the algorithm will ensure that all paths through P will have one fewer black node. However, won't any path that does not pass through P be unaffected, and have one more black node than any path through P?
--Ryan Stone
- Ok, I see the problem. If you look closely at case 2a of the San Diego State University: CS 660, after colouring the sibling red, you colour the parent double black and then perform the same procedure on it. I think that this also answers my first question, as well.
- --Ryan Stone
- I've changed the article to note that the parent node becomes double-black after case 3 is applied. Am I correct when I say that a double-black node has one fewer black node along any paths running through it? Also, is my wording clear enough?--Ryan Stone 19:03, 22 Sep 2004 (UTC)
Case 3 still mentions double-black nodes. I'm not sure how to reword that to avoid using the term double-black; anyone else have any ideas?--Ryan Stone 16:13, 24 Sep 2004 (UTC)
I've removed the reference to double-black nodes in case 3 of deletion. I've also removed the statement in case 1 that it applies when the root node was deleted and replaced. Because the rebalancing algorithm is recursive, it's possible to fall into case 1 even if the root node was not deleted.--Ryan Stone 21:24, 29 Sep 2004 (UTC)
I've slightly reworded deletion case 3 to make it clearer why property 4 is still violated. I've made a change to the operations paragraph. Case 3 of the insertion rebalancing procedure and case 3 of the deletion rebalancing procedure are the only two recursive cases. Both only involve colour changes; not rotations, so we can say that the insertion and deletion operations require O(log n) colour changes and no more than 2 rotations. Because colour changes are so quick(in practice, merely changing the value of a variable), it gives a better picture of the speed of the insertion and deletion algorithms IMO.
Now, the article states that insertion and deletion require O(log n) space. I assume that's because of the stack space needed when we hit a recursive case. However, aren't both algorithms tail-recursive?(ie wouldn't it be possible to convert them to iterative implementations?) The algorithms are complex enough that I'm not sure if I'd like to make that change, but I think that it would be quite possible to do so.--Ryan Stone 16:03, 30 Sep 2004 (UTC)
- Ok, I understand now. Persistent implementations for Red-Black Trees require O(log n) space, but not the normal version.--Ryan Stone 14:50, 1 Oct 2004 (UTC)
- Hey, thanks for your changes. I believe they're all correct. Sorry that the statement about the persistent implementations was unclear. I was confused why the deletion algorithm didn't seem to be recursive before, so that helps. Thanks again. Derrick Coetzee 22:03, 1 Oct 2004 (UTC)
I don't understand the purpose of property #2, "All leaves are black", if you're going to use "nil" nodes for all leaves, and then say you don't even need to draw them. If you just got rid of property #2, and "nil" nodes, would it not work just as well?
(Yes, the nil nodes confused me, too. You started out by saying nodes hold data, and leaf nodes are black, and then show a diagram where the leaves appear to be red -- and then 3 paragraphs later say that the boxes that are shaped differently from the other nodes and don't hold any data are actually the leaves.)
- The terminology section does conflict with the use of nil leaves. Leaf nodes do not store data in this presentation. It is possible to eliminate nil leaves by removing rule 2 and changing rule 3 to "red nodes only have black children". Unfortunately, this introduces a large number of special cases in the later proofs, since we have to deal with each internal node having 1 or 2 children. By introducing nil leaves and colouring them black, we can now say there is a black child where before we'd have to say "black or no child". I would like to admit the simplest presentation possible of the main invariants, but not if it sacrifices the overall presentation too much. Thanks for your feedback though; I realise nil leaves are often confusing and will think about how to ameliorate this problem. Deco 12:15, 28 Dec 2004 (UTC)
Leaf nodes again
[edit]- Note that the use of the leaf-nodes in the articles conflict with the link in the terminology section.--Loading 12:11, 11 May 2005 (UTC)
Perhaps these "Nil" nodes code be done away with altogether? I think it'd be easier just to say "black or missing" (or "non-red") than to introduce a funny construct to make certain special cases disappear. It's also not that hard in actual code to do without them.--ghakko
- Well, it's easy to say "black or missing" but it's not so easy to draw black or missing in the diagrams. I copied the nil leaf presentation from an online source, but it could be presented either way, it would just be a lot of work to change over and I don't see a particularly compelling reason not to do it this way. Deco 19:42, 14 May 2005 (UTC)
- A very important information is missing: Red Black Trees are isomorphics to 2-3-4 Trees (B-Trees of order 4) (http://pedia.nodeworks.com/2/2-/2-3/2-3-4_trees/). 22:15, 28 May 2005 (UTC)
I think, we should add an explanation to the Delete case about the possibility that N (the deleted node's child) may be a "Nil" node. In order that the tree remains well defined, N must remain a leaf node even after all transformations. You can verify (for example from the images) that it will, but I think it is not obvious. Should I add the explanation? Another thing is that maybe the text would be simplified if we used consistently "P" instead of "N's parent" throughout the text. What do you think? Drak 22:58, 6 January 2006 (UTC)
- Go for it. I'll review your changes. Thanks for looking over this. Deco 00:39, 7 January 2006 (UTC)
The diagrams in the CASE are flatout WRONG. If your inserting into a Btree, then N is RED and with two Black Nils.... Not trees. There is no trees attached to new intertions, at least not before rotations.
This thing is more confusing then helpful. If I had time I'd scrap it and rewrite it completely. 07:00, 6 April 2015 (UTC)07:00, 6 April 2015 (UTC)07:00, 6 April 2015 (UTC)07:00, 6 April 2015 (UTC)07:00, 6 April 2015 (UTC)07:00, 6 April 2015 (UTC)~~ — Preceding unsigned comment added by 96.57.23.82 (talk)
Deleting a red node
[edit]In the early paragraphs of the Deletion section, you state "If we are deleting a red node, we can simply replace it with its black child, if any." This is indeed the case if there is at most one child. If this red node has two children, however, the algorithm becomes a lot more complicated, unless you simply replace the red node by one of the children and reinsert every element of the other child's subtree. Schellhammer
- You can assume that the node has at most one child. The reason for this is explained in the previous paragraph. I'm sorry that it wasn't clear. The idea is that you delete the node exactly as you would delete a node in a BST. To do this, you must find a node that immediately precedes or follows the node to be deleted in inorder traversal order. We need to copy its value into the node to be deleted, then delete the node it was copied from. Such a node will always have zero or one children, which simplifies the remaining cases. Deco 23:52, 28 July 2005 (UTC)
- Okay, got you now. Thanks!
Something else. I was just wondering: in deleting a node with two children, is there's a simple way to determine if replacing the value by the largest preceeding or the smallest following value is the faster alternative? Schellhammer- The simplest way I can think of is to add two values to each node indicating the number of steps needed to reach the smallest and largest node in that subtree. These are easy to maintain in log time and allow you to choose optimally. This requires quite a bit of space and time to maintain though; the most effective strategy is probably just to choose one at random. Deco 02:10, 3 August 2005 (UTC)
- Okay, got you now. Thanks!
I think I encountered a problem in the deletion algorithm yet again connected with the leaf-nodes. You state: "If the node has no non-leaf children, we simply remove it." However, if the node is black and has only leaf children (which are also black), we reduce the black height in this branch if we simply remove the node. Consider the tree built by inserting the integers 1 to 4 and then delete the integer 1. Here's a nice applet which can be used to see the restructuring following the deletion. (It is German, but if you consider the translations Anfang=start Zurück=back Einfügen=insert Weiter=proceed and Löschen=delete (mark node to delete after hitting the button) it should be easy to handle.) Schellhammer
- Thanks for catching this. I'm afraid I've forgotten far too much about red-black trees. I believe we proceed through all the cases whether or not the child is a leaf node or non-leaf node, and they all work out if the child is simply a leaf node since they don't assume that N has children. Deco 02:10, 3 August 2005 (UTC)
- For what it's worth, to avoid further correctness issues I'm going to try to come up with some ML code that precisely mirrors the discussion. I should be able to test it and inject it into the article then to provide some concreteness. Deco 02:52, 3 August 2005 (UTC)
- I've just finished implementing the red-black tree as Visual-Basic classes (don't ask why!) and I took the deletion algorithm from this wiki-page. Empiric evidence (i.e. deleting any one of 1000 nodes in the tree) shows that the algorithm now is correct. Schellhammer 13:16, 3 August 2005 (GMT+1)
- Thanks for your feedback, that's reassuring. I've got some C code for insertion and as soon as I figure out how the article formatting will work, I'll add a bit of code to each case. Then I'll do the same for deletion. This should help clarify things further. Out of curiosity, does your code use parent pointers or do you get around this somehow? The description in this article seems to almost require it. Deco 19:19, 3 August 2005 (UTC)
- Yes, my code uses parent pointers as well. Actually, I don't know how you could do without, seeing that Insertion Case 3 has a recursive call upwards. By the way, the note in the diagram for this case should read "May violate property 2". Schellhammer 11:13, 4 August 2005 (UTC)
- It's possible to do without parent pointers by keeping a linked list during downward traversal; push a pointer to the current node onto the list just before descending into a child. The list elements can allocated on the stack. My C implementation was like this because I was being silly and trying to save space.
- The only other way I can think of handling Insertion Case 3 without parent pointers by having all the insertion functions return boolean values; this would let the caller check if the rebalancing has propagated upwards and continue it if necessary. Deletion is like this, but much messier (because of the need to swap two elements just before starting). I had to do it this way in Haskell (which has no pointers), and it was very painful.
- —Ghakko 14:30, 4 August 2005 (UTC)
- Yes, my code uses parent pointers as well. Actually, I don't know how you could do without, seeing that Insertion Case 3 has a recursive call upwards. By the way, the note in the diagram for this case should read "May violate property 2". Schellhammer 11:13, 4 August 2005 (UTC)
- Thanks for your feedback, that's reassuring. I've got some C code for insertion and as soon as I figure out how the article formatting will work, I'll add a bit of code to each case. Then I'll do the same for deletion. This should help clarify things further. Out of curiosity, does your code use parent pointers or do you get around this somehow? The description in this article seems to almost require it. Deco 19:19, 3 August 2005 (UTC)
- I've just finished implementing the red-black tree as Visual-Basic classes (don't ask why!) and I took the deletion algorithm from this wiki-page. Empiric evidence (i.e. deleting any one of 1000 nodes in the tree) shows that the algorithm now is correct. Schellhammer 13:16, 3 August 2005 (GMT+1)
The code
[edit]I've now added some C code to this page. All this code has been compiled and thoroughly tested, not just against crashes but that the red-black properties are preserved after every single insert/delete operation. I tried to keep the code as simple and readable as possible, and in line with what the text and diagrams describe to the letter. I hope it adds some concreteness to the article. I chose C because it's good for this presentation, since it deals well with trees with parent pointers (cyclic data structures are kinda annoying in functional languages) and is familiar to most programmers. Deco 07:32, 4 August 2005 (UTC)
- Nice job! That'll help a lot in future implementations. However, in the Deletion part, you state "the insertion code works with either representation", i.e. designing leaf-nodes as node objects or NULL pointers. I think you rather meant the insertion algorithm, since you use constructions like
child->color
in thedelete_one_child
function andchild
may be a leaf if the originaln
(the one we deleted) had no non-leaf children. Similarly, calling any of thedelete_caseX
functions with a NULL pointer as parameter will cause problems. The code probably still works if you'd add the (admittedly irritating)if
-statements everywhere to catch this case. Schellhammer 11:32, 4 August 2005 (UTC)- Sorry, I meant the code in the insertion section. There was only one place in which a leaf could be encountered during insertion, but deletion turned out to be much trickier, because it may be that the parameter n is a leaf, which prevents me from accessing any of the surrounding nodes if it's represented with NULL. I would have to additionally pass in at least its parent, which requires updating the parent as we perform rotations, and it's just a mess. Deco 17:30, 4 August 2005 (UTC)
I find it confusing that the insertion code is using NULL-pointer as leaves, but the removal code suddenly implies that you use real nodes as leaves. Perhaps it should be either both null pointer, or both real nodes. 21:53, 12. March 2010 —Preceding unsigned comment added by 85.177.97.53 (talk) 20:53, 12 March 2010 (UTC)
Errors in algorithm
[edit](copied from an e-mail sent to Deco by Pavel)
case 3. if G was a root, you violate rule 2.
- Also, if its parent is red, it violates rule 4. This is mentioned in the text and the code, and used to be in the image, but I wanted to keep the image easy to update. I fixed the text to mention that it might violate rule 2 also, and to be clearer that we always recursively process G.
case 5. path from leaf 4 and 5 contains 2 black nodes, path from other leafs contains 1 black nodes, that violates rule 5.
- The triangles are not meant to represent leaves but subtrees. Some subtrees have black circles at the top to indicate that their root must be black. The rotation does not alter the number of black nodes along any path (it is the same before and after), and so does not threaten rule 5. The reason rule 5 is not violated initially is that subtrees 4 and 5 are not as deep as subtrees 1, 2, and 3; in fact, the uncle may even be a leaf itself.
- If there's any way I can clarify this further please ask. Thanks again. Deco 14:16, 18 August 2005 (UTC)
case 5 IS violated. The the "leaf" are subtrees then it is even worse because then the lack of balance is potentially greater. As for the nulls idictating trees and not nulls, that needs to be better explained but is not what the page says. I would alter this, but I am frankly unqualified.
Also, FWIW, why G flips color is not explained. If you think that is understandable, it is not. — Preceding unsigned comment added by 96.57.23.82 (talk) 06:43, 6 April 2015 (UTC)
Top-Down Insertion
[edit]Can someone change this to the more effecient single pass top-down insertion method?
- It sounds interesting. Can you give a reference? However, I am not completely sure if it should replace the algorithm explained in the article. One of the objectives of this article is to explain the well-established algorithm for red-black trees, and I think the current article does it very well (thank you, authors!). Of course a note about a more efficient algorithm would be helpful.
- Here's a reference on the top-down insertion: [1]. The notes paraphrase Data Structures & Problem Solving Using Java by Mark Allen Weiss.
- The top-down approach to insertion works by preventing insertion case 3. During the tree descent, any black node with two red children is recolored to red and its children recolored to black (like in case 3). This preserves property 5 but may violate property 4. When property 4 is violated, it is guaranteed that the uncle of the new red node is black, since the grandparent node would have been recolored during the descent if both of its children were black. (See the reference if this is confusing.) As a result, one or two rotations (like in case 5) can be applied to restore property 4. The search then continues.
- Once the leaf is ready to be inserted, insertion case 3 is impossible. This prevents the need to recursively ascend back up the tree. This makes the entire operation top-down. Rotations may still be required to preserve properties 4 and 5 after inserting the leaf.
- Top-down insertion seems to be less efficient than bottom-up insertion. More rotations are required since each insertion can require multiple sets of rotations. Rather than letting the leaf node "push up" on the tree to find the proper rotation point, the top-down approach rotates at every such rotation point along the path on the way down. I will try to make a diagram to explain this better.
- I wrote a C program to test my hypothesis that the top-down insertion is less efficient, and it appears to be correct. A top-down insertion requires more rotations for the same data set. However the top-down algorithm does work and it may make sense to publish it anyway. Cintrom 22:39, 5 May 2006 (UTC)
- Thank you, Cintrom! I see that even if the top-down algorithm is less efficient than the bottom-up one, it is sometimes useful for educational purposes because of its simplicity. In addition, it is probably good to keep in mind that there is also a top-down algorithm for red-black trees, because it may come in handy when you derive a new algorithm from red-black trees.
Error in Insertion Case 4?
[edit]According to the rules of a R-B tree, "both children of a red node are black". In the Case 4 example, N and P are both red after insertion.
- Case 4 does not complete the insertion, as you can see from the sample code - the node P is then processed using Case 5. The text says, "then, we deal with the former parent node P using Case 5". Deco 17:33, 17 October 2006 (UTC)
In case 4 the code that calls insert_case5 is indented as if there were a line with an else missing, but from the narrative, I think the code is right, but the indention is misleading. Also, is it too obvious to say in the last sentence that property 4 is still violated? I found the parenthesized comment about the sub-tree labelled "1" confusing, so I tried to touch it up. The other two things I left alone. Bill Smith 23:35, 24 November 2006 (UTC)
- I implemented the insertion algorithm in OCaml and verified that the indention was wrong, so I corrected it.Bill Smith 17:11, 26 November 2006 (UTC)
- The misleading indention was probably a result of tabs. There is no else. Your clarifications are welcome. Deco 14:42, 27 November 2006 (UTC)
The code
[edit]Is the C code correct? For example:
node grandparent(node n) { return n->parent->parent; }
I don't think that's correct: shouldn't dots (.) instead of arrows (->) for accessing members in a struct that isn't referenced by a pointer? And isn't the parent member a pointer to the parent struct? If it is, then the function definition should be:
node* grandparent(node n)
Tell me if I'm wrong.
--wj32 talk | contribs 07:22, 8 November 2006 (UTC)
- You are correct that the arrow operator is used with pointers. I just assumed that somewhere in the code that isn't shown there was "
typedef struct rbtreenode * node;
". This explains the lack of thestruct
keyword, as well. Since the article is about the data structure rather than the C language, I don't think that noting C language syntax is strictly necessary. If we were going to present a play-by-play on how to implement a red/black tree in C, we'd have to note that code is required to keep track of which node is the root node, maybe reccomend the definition of a structure to hold the root and a comparator function pointer, etc. - Perhaps the wording should be changed to something like "We'll demonstrate each case with example C-like code"? Conor H. 06:54, 23 November 2006 (UTC)
- This is actual C code copied from a real working C program. That was how I verified its correctness. Yes, node is a pointer type. I suppose my stylistic conventions are a bit unusual. Deco 05:53, 28 November 2006 (UTC)
I think that the assertion that deletions are O(1) complexity must be incorrect. If the deleted node has two subtrees, then it must find the rightmost node of the left subtree or leftmost node of the right subtree. This find is admittedly quick since it is just following pointers, but nevertheless could be the height of the tree, so it is O(log(n)). Any comments? Caviare 01:08, 25 January 2007 (UTC)
- Yes, it should be log n... just check other reference sources and that's what they'll say. enochlau (talk) 02:47, 25 January 2007 (UTC)
npov
[edit]A- This article isn't neutral
B- It uses "we" an unnecessary number of times.
- You need to actually provide instances of NPOV. You can't slap a template up there and expect us to guess at what you're talking about. I'm removing the template until you do so. TheWarlock 01:37, 9 May 2007 (UTC)
- Oh, and I forgot to mention: sign your posts. Put four tildes after your post so we know who you are. TheWarlock 01:38, 9 May 2007 (UTC)
- You need to actually provide instances of NPOV. You can't slap a template up there and expect us to guess at what you're talking about. I'm removing the template until you do so. TheWarlock 01:37, 9 May 2007 (UTC)
- Should it still be changed to use a non-we structure? j.engelh 08:20, 11 May 2007 (UTC)
- It is preferable to have it use "we" than for it to be unintelligible but use the third-person out of grammatical tradition. If you can rephrase out the "we" and still have the article at least as understandable as it is now, go for it. TheWarlock 01:01, 12 May 2007 (UTC)
- Should it still be changed to use a non-we structure? j.engelh 08:20, 11 May 2007 (UTC)
- The use of "we" has nothing to do with NPOV. If you don't like the style feel free to fix it, just make it readable. Dcoetzee 20:06, 11 May 2007 (UTC)
- Just out of curiosity, what does "neutral" even mean when you're talking about a computer data structure? Whether the math is correct? I suppose you could argue over whether it or some other structure is best used in <something> - but that would necessarily mean an example, which either means it's citing someone else saying something, or it's OR, which is bad on the grounds that it's OR. Now, maybe this article did have OR in it - the article itself reads like a chapter out of "Data Structures and Algorithms" by Weiss.. — Preceding unsigned comment added by Jimw338 (talk • contribs) 03:19, 27 March 2012 (UTC)
Mistake in proof of correctness in Step 3?
[edit]The article currently says
Note that this [call to insert_case1] is the only recursive call, and it occurs prior to any rotations, which proves that a constant number of rotations occur.
This is far from evident. I'm not saying it's false that a constant number of rotations occur, only that this can't properly be called proof. It's easy to imagine a recursive function that makes its recursive call before doing some chunk of work (like a rotation) and yields a linear number of chunks of work done. For example, consider
postorder_tree_print(node n) { postorder_tree_print(n->left); postorder_tree_print(n->right); print_node(n); }
The only recursive calls occur before print_node, but calling postorder_tree_print on the root of a tree results in a number of calls to print_node equal to the number of nodes in the tree, which is not a constant.
Like Heawood reading Kempe's four-color proof, I'm smart enough to see the mistake but I'm not smart enough (or at least not interested enough) to fix it.
Pqrstuv 23:13, 1 July 2007 (UTC)
- The fact that it's the only recursive call, and that it returns immediately after calling it, is what makes this evident. Apologies for not being clearer - I'll attempt to reword it. Dcoetzee 00:31, 19 January 2008 (UTC)
Removed from article
[edit]- Every simple path from a node to a descendant leaf contains the same number of black nodes, either counting or not counting the null black nodes. (Counting or not counting the null black nodes does not affect the structure as long as the choice is used consistently.). (Someone needs to fix this wording, since it is not true in its present form. It could be interpreted to mean that a path from node A to a descendant leaf would have the same number of black nodes as a path from node B to a descendent leaf. It must be made clear that the "from a node" is fixed but the "descendant leaf" is allowed to vary.)
The section from (Someone needs... has been removed because it's not part of the article, it's a comment on the article. Text moved here so that someone who knows about the subject can hopefully fix it. -- Roleplayer (talk) 11:49, 18 January 2008 (UTC)
Possible insert case 5 error
[edit]Insertion, case 5 says:
"In this case, a right rotation on the parent P is performed; the result is a tree where the former parent P is now the parent of both the new node N and the former grandparent G."
It should say
"...a right rotation on the GRANDparent G is performed;..."
Most implementations define the rotation functions RotateRight(Node* n) or RotateLeft(Node* n) such that n is the root of the subtree being rotated, rather than one of the children. The code to implement the line in question would then be:
RotateRight(g);
rather than
RotateRight(p);
This could be just a terminology issue, but it will confuse people when they read source code and see irreconcilable differences with what is listed here.
Iterative implementation
[edit]The current article contains the iterative version (the version without using recursion) of implemenation of the removal operation. However, I doubt its usefulness because of the following reasons. Before removing the code, I would like to know what others think.
- I do not think a long code like this fits well in the Wikipedia. The aim of the article is not to build a ready-to-use library, but to explain clearly what a red-black tree is, including how it works and how it is used. The article does its job well without this code.
- It contains a small bug. In the lines
the assignment to
/* delete_case3(n): */ //s = sibling(n);//not needed
s
is needed, as is explained in the text of case 2. I hesitate to correct just this because I am not sure about the correctness of the other part. - IMHO this code is hard to understand because almost everything is written inside the single
for(;;)
loop. Cases 4–6 (and possibly also case 2) should be moved out of the loop for better readability.
In addition, I do not see any reasons to include the iterative implementation only for the removal operation and not for the insertion operation. If the current code ever helps understanding of how removal works, then I think the iterative implementation of insertion would also be helpful. --fcp2007 (talk) 14:31, 4 May 2008 (UTC)
- I personally am using the iterative implementation in benchmarks vs treaps. I find it very useful. It's kind of funny... I only read this discussion because I wanted to post about the bug described above! The same bug exists in case 6. Smilindog2000 (talk) 14:48, 21 May 2008 (UTC)
- I agree about removing the iterative implementation personally. Nobody should be using code from Wikipedia articles, they're not appropriately licensed - the code is only there to explain the concepts. Dcoetzee 02:07, 25 December 2008 (UTC)
C code vs. pseudo-code
[edit]I find that pseudo-code would be far more appropriate for this article. After all, the article is about a data structure and the theory behind implementing it. A reference implementation should be hosted elsewhere, not inside an encyclopedia. I would recommend replacing the C code with pseudo-code, however I didn't want to do this myself until certain everybody agrees. -- Jerome Baum —Preceding unsigned comment added by 77.176.73.67 (talk) 01:19, 15 December 2008 (UTC)
- There used to be pseudocode. I replaced it with C code fragments originally because of a series of spurious "fixes" that made the cases incorrect; it was the only way I could verify with any certainty that the code was valid, since with runnable code these fixes could be tested. However, nowadays I've moved more towards using pseudocode and the code block templates to control spurious fixes. I think the article should also give more attention to the diversity of different methods for implementing the operations, and to various choices that can be made for the properties. Dcoetzee 01:50, 25 December 2008 (UTC)
citation for Rudolf Bayer
[edit]Paper title: Symmetric binary B-Trees: Data structure and maintenance algorithms there is a pdf available here for 34 bucks. http://www.springerlink.com/content/qh51m2014673513j/ —Preceding unsigned comment added by 140.177.205.91 (talk) 20:55, 14 August 2009 (UTC)
"Simple path"?
[edit]The fifth listed property under "Properties" gave me long pause. Red-black trees do not contain cycles. Qualifying "paths" with "simple" suggests otherwise, and merely confused me. Is there any reason "simple" should not be deleted? Strebe (talk) 20:42, 12 April 2010 (UTC)
- I think the intended meaning is that the path does not repeat edges. For example, it doesn't go from the root down to the leaf, then back up, then back down to some other leaf (this is normally called a walk). In the context of binary trees I think this clarification is unnecessary. Dcoetzee 21:31, 14 April 2010 (UTC)
Is insertion case 4 mandatory? why?
[edit]I am very new to this subject, but with careful study, I am able to follow this article. However I found insertion case 4 to be quite confusing. It took a little too much pondering and a little guessing to get to where I think I understand it. A couple points:
1) Unlike all the other cases, this case doesn't appear to ever complete any insertions. This is kinda confusing in itself. So I suggest that it is explained that this step does not solve a particular case, but rather converts the case, so it's the same as case 5. (I would do this edit myself if I was sure I'm right about what's going on.)
2) The language makes this step sound optional. ie it says that the rotation "can be performed" not "must be performed" or "is performed". This coupled with my confusion as to why this is being done in the first place makes for a confused me. So I suggest that this step be made explicitly mandatory (assuming I'm right and it is.)
I hope the above feedback enables someone to make this article even more clear.
Thank you, - JasonWoof (talk) 02:12, 19 April 2010 (UTC)
Removal:case 3
[edit]>> Case 3: P, S, and S's children are black
We remove N and all N's childs are leafs, so if P ans S are black, S's children should be leafs too(becouse we must have the same number of black nodes on every path).
On the picture we see them as nodes that have children... —Preceding unsigned comment added by 91.103.66.4 (talk) 16:32, 18 May 2010 (UTC)
Incomplete case 5
[edit]When inserting 3, 2, 1 in order you get the following tree:
3[black] (child: 2[red] (left: 1[red]))
And then you run into case 5, which attempts to rotate using 1's grand parent (3) as pivot, which fails, because 3 has no parent. In this case, 2 should have been the pivot. So the wording instead of:
- In this case, a right rotation on the parent of P is performed;
should be
- In this case, a right rotation on the parent of P, G is performed, if G is not the root node. In that case, a rotation on P is performed.
And the code accordingly
rotate_right( g->parent ? g : n->parent ); rotate_left( g->parent ? g : n->parent );
Also it should be clarified that the "rotate_"-functions use the Pivot as the argument (which is the explained in Tree rotation)
--77.8.172.59 (talk) 02:59, 20 May 2010 (UTC)
You're right about the pivot being the argument, but your suggestion for a fix is wrong. I don't know why, but I've tried with and without your suggestion, and your suggestion causes problems. ;-) ArlenCuss (talk) 02:59, 20 April 2011 (UTC)
Insertion Case 3
[edit]According to the wiki:
Case 3: If both the parent P and the uncle U are red, then both nodes can be repainted black and the grandparent G becomes red (to maintain Property 5 (All paths from any given node to its leaf nodes contain the same number of black nodes)).
Shouldn't it be to maintain Property 4? As Property 5 isn't violated by the addition of a red node.
Edit: After proper analysis I think both Properties should be mentioned. Property 4 for the reason why P and U are turned black and Property 5 for the reason why G turns red. RevoltPT (talk) 15:04, 11 June 2010 (UTC)
C++ reference
[edit]The "often" in "In the C++ Standard Template Library, the containers std::set<Value> and std::map<Key,Value> are often based on red-black trees" does not make sense to me...134.58.42.46 (talk) 15:34, 22 December 2010 (UTC)
- Most (possibly all) implementations of these STL containers use red-black trees, but any data structure that fulfills the behavioral requirements could be used. Hence, “often”. Strebe (talk) 19:04, 22 December 2010 (UTC)
- I'll change it to "typically". (Prevents giving the impression that any red-black implementation might actually turn out to be based on a different data structure on Wednesdays and bank holidays.) Card Zero (talk) 08:26, 7 February 2011 (UTC)
Possible Error in "Removal" Section
[edit]We're referring to this paragraph:
"The complex case is when both M and C are black. (This can only occur when deleting a black node which has two leaf children, ..... (S cannot be a leaf because if N is black, which we presumed, then P's one subtree which includes N counts two black-height and thus P's other subtree which includes S must also count two black-height, which cannot be the case if S is a leaf node)."
Our comments: Since N, which used to be called C, is a leaf, P's one subtree which includes N can not count two black heights as stated in the original article.
Also note that this changes a lot of the cases in the Removal procedure. For example, any figure that shows N having subtrees hanging from it is incorrect.
In fact, in case M and C are both black, perhaps the way to do Removal is just to replace M with C, thereby messing up the black height. This is then to be followed by some form of black height rebalancing which we haven't thought out completely.
Vitit Kantabutra, Idaho State University, Pocatello, ID, U.S.A. Vkantabu (talk) 20:12, 3 November 2011 (UTC)
"All leaves are the same color of the root"
[edit]Hi,
In the section Red-black_tree#Properties, it's said in point 3 number that "All leaves are the same color of the root". However, on the red black tree example picture, there are 3 leaves which are red. What did I miss? :)
Regards, J 195.83.155.55 (talk) 03:31, 28 May 2012 (UTC)
- Nodes are circles; leaves are boxes. Glrx (talk) 18:03, 30 May 2012 (UTC)
Request move
[edit]- The following discussion is an archived discussion of a requested move. Please do not modify it. Subsequent comments should be made in a new section on the talk page. Editors desiring to contest the closing decision should consider a move review. No further edits should be made to this section.
The result of the move request was: not moved. Favonian (talk) 19:11, 9 August 2012 (UTC)
Red–black tree → Red-black tree –
Per WP:COMMONNAME. It is spelled with a hyphen in almost every single source in google books and google scholar.
Some will quote WP:DASH without providing any source that backs the dash spelling, but the Manual of Style warns that "The title of an article should be based on the Article titles policy.", and WP:COMMONNAME is part of that policy. --Enric Naval (talk) 19:41, 2 August 2012 (UTC)
- Oppose. It is a tree of red and black nodes. The conjunction is "and", so WP:DASH#En dashes: other uses number 2 applies. See example of "red–green colorblind". COMMONNAME does not address punctuation style; whether hypen or dash, we are using the common name. See WP:MOS#Allowable typographical changes, styling of dashes and hypens bullet. Glrx (talk) 23:32, 2 August 2012 (UTC)
- Oppose – sources that have a style of distinguishing the roles of hyphen and en dash as WP has (see MOS:DASH) do use the en dash for this one. Selected sources: [2], [3], [4], [5]. I never move articles without verifying the sources support my interpretation of the structure that the hyphen or en dash represents, in this case an alternation between parallel items, not a red-black color like a hot poker. I suspect that ErikHaugen also wouldn't move without checking. In fact, I see an article cited already with the en dash correctly in its title, from sciencedirect.com, a publisher that has a style that distinguishes. Enric, please stop running around doing controversial moves that are contrary to WP style. Styling is an an MOS issue, not a COMMONNAME issue; there's no controversy about the name or the spelling; it's just a matter of styling the puctuation per our MOS. Dicklyon (talk) 00:35, 3 August 2012 (UTC)
- Oppose—I have an occasional looks at your contribs list, Enric. This time I was disappointed to see that there's some kind of anti-MoS punctuation/typography campaign aswing. Please do not change against the style guides, which have been carefully built through consensus, without a very good reason: I don't see one, and citing poor typographical practices elsewhere isn't going to cut the mustard. Tony (talk) 04:51, 3 August 2012 (UTC) PS, I::Enric, I'm not sure about these ones from Googlebooks: [6], [7], [8]. Don't they look like en dashes to you? And these from Scholar: [9], [10], [11]. Just checking that we're working with the same data. Tony (talk) 07:32, 3 August 2012 (UTC)
- Oppose Per Dick and Tony. —Ruud 12:19, 3 August 2012 (UTC)
- (replying to several people above) OK, another styling choice. But there are hundreds of sources from reputable publishers using a hyphen (in google books[12] and in google scholar[13]). You are hand-picking the few sources that use a dash and claiming that those use the "correct" styling. I have a problem when someone claims that the huge majority of the sources uses the "wrong" styling. --Enric Naval (talk) 15:51, 3 August 2012 (UTC)
- The way to test whether sources disagree here is to find a source that uses the en dash to connect parallel terms, but nevertheless uses the hyphen in red-black tree. If you find more of those than sources using the en dash, then there's a case to be made that sources prefer the hyphen. But the "hundreds" that you refer to are just the typical majority of sources who have a style of representing the role of the en dash (or long hyphen as some guides call it) with an ordinary hyphen. These are not "wrong", just a different style that what we have in MOS:DASH; the hyphen would be "wrong" in our style, but not in theirs. Dicklyon (talk) 16:36, 3 August 2012 (UTC)
- There have been numerous occasions where Tony has tried—sometimes succesfully, sometimes not—to force his arguably idiosyncratic views on proper use of spelling, grammar and typography upon us, against proper use of proper nouns, proper names, common names and common sense, generally wasting a lot of editor's time in the process for little to no gain. This isn't one of them. Even without discounting the poor typographical choices made in many sources, you have not demonstrated the use of a hyphen is the ubiquitously used common name of this data structure. As Dick pointed out red–black here indicates "an alternation between parallel items, not a red-black color", making a strong argument for choosing this particular typography from among the choices offered to use by the sources. Ergo, leave the article where it is and has been for a long time. —Ruud 16:51, 3 August 2012 (UTC)
- Now I see why they call you Ruud. Dicklyon (talk) 16:54, 3 August 2012 (UTC)
- Either because that was the name my parents gave me, or because you're mispronouncing it. —Ruud 16:57, 3 August 2012 (UTC)
- Both, probably. Dicklyon (talk) 17:48, 3 August 2012 (UTC)
- Either because that was the name my parents gave me, or because you're mispronouncing it. —Ruud 16:57, 3 August 2012 (UTC)
- Now I see why they call you Ruud. Dicklyon (talk) 16:54, 3 August 2012 (UTC)
- In all fairness, Knuth—a recognized expert on both typography and computer science—in The Art of Computer Programming, as well as the standard textbooks by Cormen et al. and Goodrich & Tamassia and the article by Guibas & Sedgewick introducing this term, all write "red-black tree" with a hyphen. —Ruud 18:15, 3 August 2012 (UTC)
- In Knuth's style, the hyphen would be correct; he advocates, and is perfectly entitled to, a style in which the en dash is used for nothing but number ranges. If you want to make a point of someone choosing the hyphen when their style uses en dash to connect parallel items, you'll have to look harder. Dicklyon (talk) 21:22, 4 August 2012 (UTC)
- That was not my point, or I wouldn't still be opposing this move. Just an observation that the world at large doesn't seem to care as much about "correct" usage of hyphens and dashes, or doens't quite agree with us what a correct usage constitutes; I had expected at least one of those sources to use a hyphen. Out of interest, where did D.E.K. state the en dash should only be used for number ranges? —Ruud 01:01, 5 August 2012 (UTC)
- On p.4 of the TeXbook, he gives only the one use for the en dash. He doesn't explicitly say there are no others, but that's his implication. Dicklyon (talk) 01:29, 5 August 2012 (UTC)
- That was not my point, or I wouldn't still be opposing this move. Just an observation that the world at large doesn't seem to care as much about "correct" usage of hyphens and dashes, or doens't quite agree with us what a correct usage constitutes; I had expected at least one of those sources to use a hyphen. Out of interest, where did D.E.K. state the en dash should only be used for number ranges? —Ruud 01:01, 5 August 2012 (UTC)
- In Knuth's style, the hyphen would be correct; he advocates, and is perfectly entitled to, a style in which the en dash is used for nothing but number ranges. If you want to make a point of someone choosing the hyphen when their style uses en dash to connect parallel items, you'll have to look harder. Dicklyon (talk) 21:22, 4 August 2012 (UTC)
- N.B.: WP:COMMONNAME does not apply to punctuation; punctuation decisions are left to our internal style standards. Powers T 23:31, 4 August 2012 (UTC)
- The above discussion is preserved as an archive of a requested move. Please do not modify it. Subsequent comments should be made in a new section on this talk page or in a move review. No further edits should be made to this section.
Origin of name
[edit]Source: https://www.coursera.org/learn/algorithms-graphs-data-structures/lecture/8acpe/red-black-trees - Coursera Graph Search, Shortest Paths, and Data Structures course by Tim Roughgarden of Stanford
At about 5:08 in this lecture Tim explains the origin of the name as he had asked the question to Guibas. Indeed it acquired the name from the paper mentioned in the article. The reason was due to new limited colour printing technology of the journal which the authors were keen to use and have red and black pictures of the structure. The story has a twist in that due to a problem the technology wasn't available for use, but the name stuck.
Sedgewick explains the naming here in Algorithms, Part 1: https://class.coursera.org/algs4partI-002/lecture/50 at 31:55 — Preceding unsigned comment added by 82.34.15.102 (talk) 09:16, 10 March 2013 (UTC)
David — Preceding unsigned comment added by 86.18.202.252 (talk) 02:06, 5 March 2013 (UTC)
How does U in case 4 not have any leaves?
[edit]I know why: Because it would cause there to be 3 black nodes from the root node to the leaves for the leaves attached to U, as opposed to 2 black nodes elsewhere. The only thing is, How would the program know not to create nodes? Does it check all the paths or something? --Cornince (talk) 11:03, 27 May 2013 (UTC)
Usage in functional programming
[edit]The article says "Red–black trees are also particularly valuable in functional programming, where they are one of the most common persistent data structures" which is wrong. The most used data structure is AVL trees because they are simpler and support set operations easily. Currently in the functional programming standard libraries - AVL are used by OCaml, FSharp - Red-black are used by SML-NJ - Size balanced trees are used by Haskell — Preceding unsigned comment added by 90.46.151.222 (talk) 01:26, 12 December 2013 (UTC)
Incorrect information?
[edit]The last paragraph in "Analogy to B-trees of order 4" states that "The worst case occurs when all nodes in a colored binary tree are black, the best case occurs when only a third of them are black (and the other two thirds are red nodes)." This is untrue; when all nodes in a red-black tree are black then by the 5th property the tree is perfectly balanced. The worst case is instead when the tree is most unbalanced, when one side of root it all black and the other half only a third black.
I feel as though my intuition is failing me in the very last sentence above, but besides that is the article actually incorrect in what it says.
-Node — Preceding unsigned comment added by 129.2.129.93 (talk) 22:29, 28 September 2014 (UTC)
- I guess you're mixing up execution time and fill factor. What is talked about is: fill factor. --Nomen4Omen (talk) 09:23, 1 November 2018 (UTC)
Irrelevent "See also" links?
[edit]I am tempted to remove some of the "See also" links. AA trees are a variation of the red-black tree, AVL and B-trees are discussed in the article, but scapegoat trees, splay trees, and T-trees are not. The article about scapegoat trees mentions red-black trees, but the connection is weak. The other two don't refer to red-black trees at all. Is there some relevence that I've missed? 2602:306:CD0E:D410:213:72FF:FE32:59B5 (talk) 01:51, 12 August 2015 (UTC)
- The guideline is WP:ALSO. I suppose there are a lot of tree structures with articles, and this see also certainly should not be a list of them. However, it seems reasonable to have a couple of links to other articles about similar topics, and scapegoat tree and the others look suitable. A target does not have to be about red-black trees to appear. Johnuniq (talk) 02:10, 12 August 2015 (UTC)
- And yet there's already a link to trees in general as well as the "Tree data structures" template box at bottom (with links to scapegoat trees &c.). If one expects the "See also" links to be directly relevent to the article, as I did, then the unrelated links are leading the reader off topic IMHO.
- The Manual of Style/Layout suggests, "Editors should provide a brief annotation when a link's relevance is not immediately apparent". What might the annotation be for scapegoat trees &c? 2602:306:CD0E:D410:213:72FF:FE32:59B5 (talk) 23:02, 13 August 2015 (UTC)
- An annotation is not needed for an article on a tree with concerns similar to a red–black tree. However, I take your point about the "Tree data structures" navbox at the bottom, so go ahead and clean up the list if you want. Johnuniq (talk) 07:28, 14 August 2015 (UTC)
Missing cases for deletion of nodes with zero non-leaf children
[edit]It is explained how the deletion of an arbitrary node can be reduced to the problem of deleting a node with at most one non-leaf child, but then then following cases all assume there is a non-leaf child. Where are the cases that cover the situation where there are no non-leaf children? — Preceding unsigned comment added by Jan Hidders (talk • contribs) 10:32, 25 August 2015
- See the parentheses at: "The complex case is when both M and C are black. (This can only occur when deleting a black node which has two leaf children, because if the black node M had a black non-leaf child on one side but just a leaf child on the other side, then the count of black nodes on both sides would be different, thus the tree would have been an invalid red–black tree by violation of property 5.)" This means that all the cases 1 thru 6 cover only situations where there are only leaf children C resp. N. Of course, that remark makes some other Note:s obsolete. --Nomen4Omen (talk) 13:19, 25 August 2015 (UTC)
I agree with Jan Hidders. The case where there are 1 or 2 children is covered above. The case where there are 0 children is not covered. If M is red, case is simple, it is deleted. If M is black, unless it is the root node, it is not simple. Deleting it will reduce the black height of the tree which needs repairing. The parenthesized comment does not cover it above. Stephen Howe (talk) 22:00, 30 October 2018 (UTC)
- In my opinion the case with 0 non-leaf (≠ NIL) children IS covered (irrespective of [but in accordance with] the parenthesized comment).
- A leaf node (= NIL node) has of course 0 non-leaf children. But there is no requirement and thus is illegal to delete such a key-less node. Otherwise ...
- 0 non-leaf children then means at least 1 proper or 2 leaf children. The case of at least 1 proper children is not under discussion here. 2 leaf children are shown as the children of node 6 (red) or node 11 (black) in the figure "An example of a red–black tree" in § Properties. These leaf children are black according to property 3.
- The removal starts with the sentence "We begin by replacing M with its child C." in Removal 5th §. What is meant by "its child C" is explained in Removal 2nd § by "If M does have a non-leaf child, call that its child, C; otherwise, choose either leaf as its child, C." In the case under discussion, one of these NIL leaves is chosen as M’s (its) child C.
- --Nomen4Omen (talk) 10:17, 31 October 2018 (UTC)
What is a row or a black row?
[edit]Property 4 has been changed from "(This property means that node colored red cannot be in a row.)" to "This property means that a node that is colored red cannot be in a black row." But either way: What is a row in a red–black tree ? --Nomen4Omen (talk) 09:41, 2 September 2015 (UTC)
A row in a tree consists of all nodes at the same depth. Row 0 would consist of the root node, Row 1 would consist of the children of the root node, Row 2 would consist of the children of the nodes in Row 1, etc. Jmonty42 (talk) 21:03, 29 September 2015 (UTC)
Insertion/Deletion average complexity
[edit]The run time complexity of insertion and deletion from any binary search tree will be O(log n) because you must first either find where the element should be inserted or find the element to delete. The mention of an insertion after a supposed "NEXT" operation does not represent the average case of the insert or delete operations. Furthermore, there is no standard NEXT operation for binary search trees. If you have a reference that suggests a NEXT operation is indeed part of the standard operations supported by general binary search trees, please add it. Jmonty42 (talk) 21:17, 29 September 2015 (UTC)
- @Jmonty42 Maybe INSERT or DELETE without SEARCH is not your "average case". Nevertheless it has an average complexity excluding SEARCH, and this
- is of interest for some applications (whether the RBtree is a search tree or not).
- If SEARCH would be always included there would be no need mentioning because then always logarithmic.
- Some authors strongly emphasise that their amortized complexity is constant. Then there has to be some means different from SEARCH for accessing a node.
- --Nomen4Omen (talk) 21:35, 29 September 2015 (UTC)
- I fixed the article to report O(log n) costs for all amortized operations.. 24.206.111.9 (talk) 14:18, 29 May 2024 (UTC)
@Nomen4Omen, please cite your references for authors claiming that the amortized complexity of insert into a binary search tree is constant. A red black tree by definition is a binary search tree. The rules for coloring nodes red/black and for inserting, deleting, and balancing afterwards would be completely meaningless if each node did not conform to the rules of a node in a binary search tree (greater than the children to its left, less than the children to its right).
There are data structures that have constant insert and delete operations, such as a hash map. That is an entirely different class of data structure intended for a completely different purpose. Every single binary search tree will have average complexities of O(log n) for insert and delete operations. Jmonty42 (talk) 21:43, 29 September 2015 (UTC)
Each of these references lists the run-time complexity of insert and delete as O(log n): http://pages.cs.wisc.edu/~paton/readings/Red-Black-Trees/ https://www.topcoder.com/community/data-science/data-science-tutorials/an-introduction-to-binary-search-and-red-black-trees/ https://www.cs.princeton.edu/~rs/talks/LLRB/RedBlack.pdf
Please stop spreading misinformation. Jmonty42 (talk) 21:51, 29 September 2015 (UTC)
- I'm not spreading misinformation. The other sources implicitly include SEARCH, and I explicitly exclude it by use of a footnote.
- Amortized modification costs are proved in Kurt Mehlhorn, Peter Sanders: Algorithms and Data Structures. The Basic Toolbox. Springer, Berlin/Heidelberg 2008, ISBN 978-3-540-77977-3, doi:10.1007/978-3-540-77978-0 p. 165.
- Although an RBtree is called a binary search tree, it also is a binary tree and has all these properties in addition to a SEARCH operation.
- --Nomen4Omen (talk) 21:58, 29 September 2015 (UTC)
@Nomen4Omen, I looked up your reference in Mehlhorn and Sanders and you are misunderstanding the text. From the text on page 165 from section 7.7 "Historical Notes and Further Findings": "On the other hand, AVL trees do not have constant amortized update costs. ... In comparison [to AVL trees], red-black trees have slightly higher costs for locate, but they have faster updates ..." It makes no claim that an insert operation has constant amortized cost. You may be confused from the section 7.4 titled "Amortized Analysis of Update Operations". The update operations refer to the balancing and updating of binary search trees after the initial insert or delete is done. This is only part of the operation that we are discussing when we are referring to the insert operation of red-black trees. The insert operation (including searching for the place to insert, actually inserting the new node as a leaf in the correct spot, and then balancing the tree - or updating to use the terms from your reference) is O(log n) when we compare to a similar operation on say, a linked list (where the insert operation is O(n)) or a hash map (where the insert operation - which includes a hashing function) is O(1).
Again, to quote from your reference material, please turn to page 151. This section is talking about (a,b) trees and red-black trees. The paragraph directly under Exercise 7.5 clearly states "To insert an element e, we first descend the tree recursively to find ...". When discussing the complexity of the insert operation on data structures, you must include the entire operation. On page 149, in the introduction of binary search trees it also states "Thus any insertion requires O(log n) comparisons on average." Again, that will be true for any binary search tree, of which red-black trees are a type of. Jmonty42 (talk) 22:50, 29 September 2015 (UTC)
This link might help to clear things up when discussing the complexity of these common operations: http://bigocheatsheet.com/. Jmonty42 (talk) 23:22, 29 September 2015 (UTC)
- @Jmonty42 The latter source is not consistent in not including search for stacks and lists or in always INcluding it for the other data structures. Anyway, it is more information for the reader to give time complexity EXcluding search (and telling him about that), because INcluding logarithmic to constant is an easy task. (Why should emphasizing amortized constant update costs be so important for Mehlhorn/Sanders when UPDATE does not exist without SEARCH?) --Nomen4Omen (talk) 02:37, 30 September 2015 (UTC)
@Nomen4Omen You're missing the fundamental concept here. Inserting into a red black tree depends on search. You cannot insert an element into a red-black tree without searching for where it goes first, it is meaningless to try to analyze the insert operation without that necessary step. You are correct that the linked resource does not include searching as part of insertion for other data structures, because other data structures don't require searching for insertion. Take a stack for example. Placing an element onto a stack is always an O(1) operation because it just goes onto the top of the stack (think of stacking plates). Stacks do not insert elements in sorted order, it's a last-in, first-out (LIFO) data structure by definition. A red-black tree inserts elements in sorted order by definition, you cannot insert an element into a red-black tree without log n comparisons on average. An insert operation on an array doesn't require searching, first, either (assuming you're inserting by index). But inserting into an array requires all of the elements after the insertion point to be moved down, resulting in an average O(n) complexity operation, which is the same as linearly searching in an array O(n). The two operations having the same complexity does not mean that one depends on the other.
Your changes only affected the summary section of the article, which is supposed to give a way to quickly compare this particular data structure to other data structures with similar summary sections. If you wish to provide more information about the amortized update operation from Melhorn/Sanders, include it in the relevant section in the article. Jmonty42 (talk) 05:06, 30 September 2015 (UTC)
- @Jmonty42 My very last attempt!
- The main (possibly only) difference is that SEARCH is absolutely mandatory for you − and not for me. I say, you cannot know (and should not preclude) how the keys in the binary search tree are organized and how the user accesses a specific element. An example from real life: An application (e.g. in the context of the Windows registry) may have keys which have a major part (e.g. the registry "key") and a subordinate part (e.g. the registry "value"). The comparison subroutine of the binary search tree is capable to compare every leading subset (substring starting at offset 0) of the hierarchical key:=("key","value"). Parts of the application access the data by SEARCHing for ("key","value"). But there may also be other parts of the application which SEARCH only for the registry "key" and access the subordinate registry "value"s and associated "data" by means of (inOrder) sequential NEXT operations (while using only the doubly linked property of the binary tree), thereby possibly INSERTing and/or DELETEing. My complexity assessment takes care of this latter kind of access, too. Because a reader can be assumed to be able to add: constant (ave INSERT without SEARCH) + logarithmic (ave SEARCH) = logarithmic (ave INSERT with SEARCH), my complexity assessment contains both cases, but yours (in my opinion unnecessarily) withholds some information to the reader.
- IMHO my changes do not at all affect the summary section of the article, because there only the worstcase ("guarantee"d) complexities are given − and no average complexities. (This is quite common for summary sections.)
- --Nomen4Omen (talk) 10:32, 30 September 2015 (UTC)
@Nomen4Omen, no, search is not only mandatory for me, search is mandatory for all inserts into any binary search tree. Every single reference I have cited, including yours, explicitly states as much. In your example of inserting while traversing the tree in order, you've increased the complexity of the insert operation to O(n). Insert cannot be considered independent for binary search trees. Even if you disregard how the operation got to the point of where the element is to be inserted, in the case of a red-black or AVL tree you'll still need to rebalance the tree, which on average is O(log n).
Also, regarding your second point, the only change of yours I affected on this page was the change to the info box (which I mistakenly called the summary). You had incorrectly (according to every reference we have discussed here) edited the average complexity for insert and delete operations to O(1). That is the change I am referring to. It is factual and supported by references that these operations have a complexity of O(log n).
Your example of the Windows registry containing key value pairs has no relevance to the discussion of the complexity of the insert operation. The elements in a red-black tree can be any homogenous data type so long as that data type supports comparison. Jmonty42 (talk) 16:22, 30 September 2015 (UTC)
- To offer a third perspective: I agree with Jmonty42 that listing the amortized complexity of Insert and Delete operations as O(1) is potentially confusing and unintuitive (at least, it confused me, even even with the footnote). I would second Jmonty42's suggestion to remove these from the table, leaving only the O(log n) costs (as treated in standard textbooks like "Introduction to Algorithms, 4th Edition" by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein). Einabla (talk) 16:42, 8 November 2022 (UTC)
Why is Sedgewick not listed as the inventor?
[edit]It looks like the infobox lists the inventor of the antecedent to red-black trees rather than the inventor of red-black trees. — Preceding unsigned comment added by 45.49.18.32 (talk • contribs) 09:47, 23 January 2016
Question about delete_case5 and delete_case6
[edit]Is the color exchange in delete_case5 actually necessary? As far as I can tell the delete_case5 sibling and near nephew nodes always have their colors re-assigned in delete_case6.
If delete_case5 rotated then "s" in delete_case6 was the near nephew in delete_case5 and the node that is the far nephew in delete_case6 (the one re-colored inside the delete_case6 if block) was s in delete_case5.
If that understanding is correct then I see no reason for delete_case5 to bother assigning colors.
Soronel~enwiki (talk) 19:32, 9 February 2016 (UTC)
- I guess you are right in essence. And the implementation could be optimized in this way, especially because there are no color tests left in delete_case6. There may be more such optimizations possible in the algorithm. However, the goal of the text is not code optimization, it is clarity. --Nomen4Omen (talk) 21:16, 9 February 2016 (UTC)
I would have found it far clearer if the operations were presented as single blocks of pseudo-code rather than broken out C functions for the various cases. I've come up with the following.
For insert:
insert_rebalance(node n): If n is the tree root then set n's color to black and done. If n's parent color is black then done.
- There is a red violation, if the uncle is also red then push the red node up the tree and start over at n's grandparent.
- Because this is the only path that starts over it can be seen that n must always be red on entry, and that insertion is limited to O(log n) iterations, and also because tree rotations only occur after this point that no more than two rotations will be needed to complete an insertion.
If n's uncle's color is red then set n's parent's color to black and n's uncle's color to black, set n's grandparent's color to red and call insert_rebalance(n's grandparent) and done
- The uncle is black, perform at least a single rotation and possibly a double rotation. A double rotation will be needed if n and n's parent do not have the same orientation.
- If n and n's parent do have the same orientation then set n to n's parent for the following steps.
If n is the left child of a right child or n is the right child of a left child rotate so that n's parent becomes a child of n otherwise set n = n's parent.
- It is not necessary to color the node that will remain a child of n - it must already be red.
set n's color to black
set n's parent's color to red
rotate so that n's parent is child to n. done
rb_insert(node n) insert n as for any BST. set n's color to red call insert_rebalance(n) done
And for removal: delete_rebalance(node n): if n is the tree root then done if n's sibling is red then flip the colors of n's sibling and n's parent and rotate so that n's parent is child to n's sibling.
- n's sibling will always be black after the above step, this is guaranteed because a red sibling of a black non-leaf node will have two black non-leaf children and one of those children becomes the new sibling of n.
- It can be seen that the above step can only occur once per node removal, if the new sibling's children are both black then the newly red parent will be re-colored black again and and rebalancing is complete otherwise the possibility of further recursion is skipped.
If n's sibling's children are both black then set n's sibling's color to red and if n's parent is black then call delete_rebalance(n's parent) otherwise set n's parent black and done
- n's sibling has at least one red child, at least one (and possibly two rotations will be needed to complete rebalancing). A double rotation will be needed if the far nephew is black, otherwise just color that far nephew black (the far nephew will remain a child of n's sibling after the final rotation).
- If the far nephew is already black then it is not necessary to re-color n's sibling (the node that becomes n's far nephew) because it is already black.
if n's far nephew is black then rotate so that n's sibling is child to n's near nephew otherwise set n's far nephew's color to black.
- After the final rotation n's sibling will occupy the tree position now held by n's parent, so give n's sibling the color of n's parent.
set n's sibling's color to the color of n's parent set n's parent's color to black. rotate so that n's parent is child to n's sibling. done
rb_remove(node n): if n's left child and n's right child are non-leaf nodes replace the value in n with the value from n's in-order predecessor or in-order successor and set n to that in-order predecessor or in-order successor node
if n's color is black then begin declare child if n's left child is non-leaf set child = n's left child otherwise set child = n's right child
- If child is not a leaf node then it must be red
if child is a non-leaf node then set child's color to black otherwise call delete_rebalance(n) end
- Note that this remove operation may need to move a single non-leaf child into the position now occupied by n.
- It is an error to reach this point with n having two non-leaf children.
remove n from the tree. done.
I came up with the above after collapsing the wiki code into single functions and then working them over quite awhile. Note that while I am very comfortable with insert_rebalance and delete_rebalance I am not as sure of rb_delete (I consider rb_insert to be all but trivial).
The one thing I see being potentially confusing is that the relationships are always for the current statement rather than whatever relationship existed at the start of the operation. For instance in insert_rebalance after the choice between rotate and re-assign "n's parent" refers to what was the grandparent.
Even if the C code is to be kept I think combining delete cases 3 and 4 would make sense, doing so would emphasize what checks they have in common as well as what differences they rely on. void delete_case3(struct node * n) { struct node * s = sibling(n); if(s->color == BLACK /* This check is trivial due to case 2 above,*/ && s->left->color == BLACK && s->right->color == BLACK) { s->color = RED;
if(n->parent->color == BLACK) delete_case1(n->parent); else n->parent->color = BLACK; } else delete_case5(n); }
Of course if that were done then delete_case5 and delete_case6 should be renamed to delete_case4 and delete_case5 respectively.
Soronel~enwiki (talk) 08:04, 11 February 2016 (UTC)
Set operations and bulk operations
[edit]@user:Henryy321 Could you pls show the point where the "duplicates" of t1 and t2 are suppressed in the function
- union(t1, t2) ?
A similar question arises with intersection and difference, but may be answered by the previous answer. --Nomen4Omen (talk) 07:57, 14 December 2016 (UTC)
"Sedgewick originally allowed nodes whose two children are red making his trees more like 2-3-4 trees but later this restriction was added"
"This restriction": no restriction has been named, so "this restriction" is dangling. Is what is meant "but ;ater did not allow two red children"? Then that is what should be written! GeneCallahan (talk) 12:40, 16 February 2017 (UTC)
There appears to be a typo in the cited paper "Parallel Ordered Sets Using Join" v4 Nov 12 2016; in figure 3 line 10 it has else T'' which should probably be else T' instead.
Also Note
(1) |
(2) |
L black height | R black height | r(L) | r(R) | r(R)/2 | floor(r(R)/2)*2 | r(L)=floor(r(R)/2)*2 | L=black | bh(L)=bh(R) | (L=black) and (bh(L)=bh(R)) |
---|---|---|---|---|---|---|---|---|---|
3 | 3 | 5 | 5 | 2.5 | 4 | FALSE | FALSE | TRUE | FALSE |
3 | 3 | 5 | 4 | 2 | 4 | FALSE | FALSE | TRUE | FALSE |
3 | 3 | 4 | 5 | 2.5 | 4 | TRUE | TRUE | TRUE | TRUE |
3 | 3 | 4 | 4 | 2 | 4 | TRUE | TRUE | TRUE | TRUE |
3 | 4 | 5 | 7 | 3.5 | 6 | FALSE | FALSE | FALSE | FALSE |
3 | 4 | 5 | 6 | 3 | 6 | FALSE | FALSE | FALSE | FALSE |
3 | 4 | 4 | 7 | 3.5 | 6 | FALSE | TRUE | FALSE | FALSE |
3 | 4 | 4 | 6 | 3 | 6 | FALSE | TRUE | FALSE | FALSE |
4 | 3 | 7 | 5 | 2.5 | 4 | FALSE | FALSE | FALSE | FALSE |
4 | 3 | 7 | 4 | 2 | 4 | FALSE | FALSE | FALSE | FALSE |
4 | 3 | 6 | 5 | 2.5 | 4 | FALSE | TRUE | FALSE | FALSE |
4 | 3 | 6 | 4 | 2 | 4 | FALSE | TRUE | FALSE | FALSE |
4 | 4 | 7 | 7 | 3.5 | 6 | FALSE | FALSE | TRUE | FALSE |
4 | 4 | 7 | 6 | 3 | 6 | FALSE | FALSE | TRUE | FALSE |
4 | 4 | 6 | 7 | 3.5 | 6 | TRUE | TRUE | TRUE | TRUE |
4 | 4 | 6 | 6 | 3 | 6 | TRUE | TRUE | TRUE | TRUE |
GregoryKaiser (talk) 23:14, 17 February 2022 (UTC)
Terminology: Sentinel node implementation gain
[edit]Hi,
In the terminology part it is explained that using sentinel nodes as leaf nodes instead of using null pointers allows to save execution time.
I think this affirmation is not verified.
I allow myself to highlight this point given since 2007 it is explained that using sentinel nodes provide advantage in term of memory usage or execution timing.
The first occurrence of this (memory) gain came during the refactoring of the page at the 29 November 2007: [14].
The next modification was the 8 August 2017 involves, among other things, the conversion of this memory gain to an execution time optimization. The motive was Terminology: pointer to sentinel node: [15].
In my opinion, allocating a sentinel node is less efficient than testing the leaf propriety of a node by checking if his pointer is null, just as memory view (allocating an extra structure) as timing view (dereferencing a pointer instead comparing a value).
I suggest to provide justifications or delete the gain's references by using this implementation.
Phil Gekni (talk) 18:43, 26 September 2017 (UTC)
- In my opinion there is a small advantage in efficiency (not memory) to use a ponter to a sentinel node instead of a null pointer. Example:
typedef struct NODE { // node of binary search tree
Node* lChild; // -> left child
Node* rChild; // -> right child
int key; // key
} Node;
typedef struct RESULT { // result structure of search
Node* resNod; // -> result node
int resDir; // compare result (Equal, LessThan, GreaterThan or Empty)
} Result;
typedef struct BinarySearchTree { // the binary search tree
Node* root; // -> its root
} Bst;
Bst bst;
Node Sentinel, *sentinel = &Sentinel; // the sentinel node
Sentinel.lChild = sentinel; Sentinel.rChild = sentinel;
// Initialisation of the binary search tree:
bst.root = sentinel; // indicates the missing root.
int FindWithSentinel(Bst* t, // anchor of the tree
int sKey, // key to be searched for
Result *r) // -> result structure
{ Node *x, *y;
sentinel->key = sKey; // prepare node Sentinel with search key
y = (Node*)t; // »parent« of the root
// search loop:
for (x = t->root; sKey != x->key; ) {
if (sKey < x->key)
y = x->lChild; // -> genuine node or sentinel node
else
y = x->rChild; // dito
if (sKey == y->key) { // keys equal! really?
r->resNod = y; // set result node
goto Epilog;
}
if (sKey < y->key)
x = y->lChild; // dito
else
x = y->rChild; // dito
}
// fall thru
// sKey == x->key // keys equal! really?
r->resNod = x;
x = y; // keep parent of x
Epilog:
if (r->resNod != sentinel) { // resNod is a genuine node
// and r->resNod->key == sKey.
r->resDir = Equal; // sKey has been found
}
else { // we ran into Sentinel
// sKey has not been found: x is »parent« of r->resNod
r->resNod = x; // ->Node or ->Bst
if (x != t) { // x is a ->Node
if (sKey < x->key) {
r->resDir = LessThan;
else
r->resDir = GreaterThan;
}
else // x is ->Bst
r->resDir = Empty;
}
return r->resDir;
// *r contains (->Node,compare result) or (->Bst,Empty)
}
Performance comparison with AVL trees
[edit]The current "Applications and related data structures" section [16] reads:
The performance measurements of Ben Pfaff with realistic test cases in 79 runs find AVL to RB ratios between 0.677 and 1.077, median at 0.947, and geometric mean 0.910.
This sentence doesn't state clearly whether AVL trees or RB trees are better. Also, the numbers are not found in the reference cited.
Could someone clear this paragraph up?
--MaigoAkisame (talk) 03:20, 6 December 2017 (UTC)
replace_node code missing
[edit]This article mentions a replace_node
function, but has no source code showing how that works. Since other algorithms are explained in source code, it would seem to make sense for this algorithm to also be explained in source code. - Gilgamesh (talk) 16:18, 23 December 2017 (UTC)
- Simple enough, just find the node and change the contents. We don't need to do anything else, because we haven't changed the structure.50.28.190.226 (talk) 21:30, 20 March 2018 (UTC)
Better deletion algorithm
[edit]When we remove a black node and replace it with a child (that isn't null, if possible), this algorithm is nicer.
A few notes: to rotate at a node, rotate so it goes up and its parent goes down.
Also, rotation implies that the colors become switched (so that if the parent was red before, the parent is red after).
Start by pointing at the node such that paths through it need an extra black. Go to the earliest case that fits.
Case 1: The node is red.
Color it black. Done.
Case 2: The node is now the root (and is black). (Case 1 in the main page)
Done.
Case 3: The sibling is red (and the node is black). (Case 2 in the main page)
Rotate at the sibling. Recurse (to cases 4-6).
Case 4: The far nephew is red (and the node and sibling are black). (Case 6 in the main page)
Rotate at the sibling, point at the far nephew (now the uncle), and recurse (to case 1).
Case 5: The near nephew is red (and the node, sibling and far nephew are black). (Case 5 in the main page)
Rotate at the near nephew. Recurse.
Case 6: Default (the node, sibling, and both nephews are black). (Cases 3 and 4 in the main page)
Recolor the sibling red, point at the parent, and recurse.
Opinions? 50.28.190.226 (talk) 21:30, 20 March 2018 (UTC)
When removing a black node with a non-leaf child it is known that child is red (because the removal already requires that the node being removed have one or no non-leaf children). In that situation all that needs to be done is move the child node into the tree position of the node to remove and color that child black, there is no need at all for further re-balancing in that situation. The complicated re-balance path is only needed when the node being removed has no non-leaf children. In the case where the node to remove has two non-leaf children you can swap the value of the node being removed with either the in-order predecessor or in-order successor node (or swap the nodes themselves although that is often more complicated) and then the node to remove is where the value ends up. At that point the node to remove has either one or no non-leaf children (the BST nature of the tree is messed up at that point but re-balancing and removing the node will fix that, re-balancing is concerned with node colors and not value ordering).
Beyond that, the standard deletion algorithm can never need more than three rotations, I haven't spent a lot of time examining the subtleties of your algorithm but it appears that it could potentially require O(log n) rotations, one at each level of the tree.
66.223.182.108 (talk) 22:31, 1 April 2018 (UTC)
- To your section "Beyond that, the standard deletion algorithm ...":
- I do not know what you mean by standard deletion algorithm. But the deletion algorithm in the article's text really requires maximally 3 rotations which is totally easy to see:
- The rotation in case2 leads to case4 which either exits or leads to case5. The rotation in case5 leads to case6 which contains a second (resp. third) rotation and then exits. So, there is a maximum of 3 rotations. (case3 which is the only case leading to a new iteration step does NOT contain a rotation.) --Nomen4Omen (talk) 09:07, 2 April 2018 (UTC)
- So in Case 1 of main page, we skip to case 2 and do nothing.
In Case 2 of main page, we skip to case 3 and recurse, which is the same as before.
In Case 3 of main page, we skip to case 6 and recurse, which leads to a recursion same as the main page.
In Case 4 of main page, we skip to case 6 and recurse, which leads to case 1, which is the same effect as the main page.
In Case 5 of main page, we skip to case 5 and recurse, which leads to case 4, equivalent to the effect on the main page.
In Case 6 of main page, we skip to case 4 and recurse, which leads to case 1, total effect same as main page.
This is simpler than the algorithm given, and leads to the same result, as you can see. (Same guy as above.) 130.132.173.201 (talk) 16:35, 7 June 2018 (UTC)
Hi. I don't find your algorithm simpler. I am speaking as a programmer. Ideally any loops should be small with unnecessary parts of the loop lifted outside (if they are technically not part of the loop). For deletion, it is basically a 2-prong algorithm: If the parent, sibling and sibling's children are black, you keep making the sibling red, and moving up a level. The problem stops if you reach the root node (cases 1 & 3). Otherwise you deal with the various cases of finding that the parent, sibling or sibling's children are red, and you exploit that to create a black node on the side of tree lacking one, by rotation and recoloring (cases 2, 4, 5 & 6). Your algorithm lacks clarity. Recursion should only occur once due to case 1 & 3, all other cases are aftermaths of discovering one of parent, sibling or siblings children is red and after a few steps, they halt (and not recurse). But the main Wiki algorithm I am still thinking about as well. I want insertion and deletion to be clear, succinct and minimal. See the point I raise below. Thanks. Stephen Howe (talk) 18:39, 18 November 2018 (UTC)
For Delete, why must case 2 be done between case 1 and case 3?
[edit]If you look at case 2 for delete, why is it considered between case 1 and case 3 ? Nothing in the rotation of case 2 or recoloring of nodes will affect the consideration of case 3. And case 3 loops back to case 1 and its change of the red-black tree, does not alter what should happen for case 2.
Therefore in my mind a more efficient algorithm would have the tight loop of cases 1 & 3 ("Keep moving upwards while the parent, sibling and siblings children are black") and then dealing with case 2, 4, 5 & 6 afterwards.
Have I got something wrong with this analysis?
Stephen Howe (talk) 23:26, 17 November 2018 (UTC)
- I agree that there may be possible some streamlining of the logic.
The current version has been entered by user:Dcoetzee at 05:17, 18 September 2004 (Added explanation of insertion and deletion, and revised other stuff) without giving a source. S/he has been banned by the Wikimedia Foundation from editing Wikimedia sites and stopped contributing on 3 October 2014, so that it may be impossible to ask her/him for a source.
If you have got a source pls don't hesitate to enter your version. --Nomen4Omen (talk) 11:06, 19 November 2018 (UTC)
I think that is Derrick Coetzee. Googling on his name and "court case" may explain the Wiki ban. What I have turned up is that "Introduction to Algorithms, 3rd Edition" by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein covers Red-Black Delete. Interestingly enough, their fixup reduces to 4 cases not 6 cases. I like that. I turned up numerous YouTube videos that reference Wiki, GeeksForGeeks covers Delete but it has a number of mistakes. I think Microsoft's std::set is based on the book above. Now the problem reduces to, Can the books algorithm in 4 cases be explained simply? Algorithm here: https://stackoverflow.com/questions/23700857/deletion-in-red-black-tree Stephen Howe (talk) 04:15, 20 November 2018 (UTC)
What I think is irreducible for Red Black Tree delete is (i) looping and moving upwards if Parent, Sibling and Siblings children are all black with exit if root node is reached (ii) The up to 3 rotation cases with various cases of Parent, Sibling and Siblings children being Red. Stephen Howe (talk) 16:47, 20 November 2018 (UTC)
- I agree with your last "irreducibility" remark. The tricky part is how to order resp. mixup the other cases dealing with the 5 resources you mention: Current, Parent, Sibling and Siblings children. And the FIXUP in https://stackoverflow.com/questions/23700857/deletion-in-red-black-tree is rather tricky and cannot be understood without good figures containing at least the colors RED and BLACK and the proper names of the nodes as used in the FIXUP. (Btw, the article has only 1 case in addition because it counts
x == root[T]
as an extra case; thus the FIXUP would have 5 cases. Secondly, the article's case 2 is exactly equal to case 1 of the FIXUP – a position to which you seem to object.)
If FIXUP, Microsoft's std::set, and Cormen et al. all have the very same approach, this would be an excellent source. --Nomen4Omen (talk) 17:43, 21 November 2018 (UTC)
"Secondly, the article's case 2 is exactly equal to case 1 of the FIXUP – a position to which you seem to object" Yes I noticed that, well done, I am impressed. That makes me wonder if I have overlooked something (very possible) or noticed something that Cormen et all have not (and that is possible as well). I have the book on order. 3rd edition is suitable source, well respected. And I agree with the 5 cases. There are also the trivial cases easy to fixup: (i) red node, no children (just delete it). (ii) black node with red child (delete black node, replace with child, recolour child as black). I think it is time to write a red-black tree class in C++, check it out Stephen Howe (talk) 23:47, 21 November 2018 (UTC)
- Btw, as I found out in the meantime: FIXUP is exactly the same as the article at least wrt the input conditions.
- case F1 = case A1
- case F2 with B black = case A2
- case F2 with B red = case A3
- case F3 = case A4
- case F4 = case A5
- Especially, the handling of case F2 with B red looks quite different in that B is colored black immediately outside the do-loop, whereas case F2 with B black loops because B is black.
- The number of lines of code may be fewer in FIXUP this way, but I cannot see any performance gain. In contrast, when using code duplication (or goto-statements) it is possible to avoid any tests for the very same condition in the code flow (this is NOT obeyed in the article: s->color is checked at least twice: in case 2 and case 4). --Nomen4Omen (talk) 08:48, 28 November 2018 (UTC)
- To your original question: Because (s->color == RED) is a very strong condition which if the outcome is YES has implication to all other colors, this check is placed early (as case 2). The branch to the else-part of this check (cases 3-6) has almost no performance penalty vs. an early check (s->color != RED), so the loop would be tighter almost only optically and not performancewise.
- IMHO the problem here is how to sequence the partially overlapping input conditions of the cases without much code duplication or goto-statements. And in any case, for a Wikipedia article clarity is more important than efficiency. --Nomen4Omen (talk) 10:59, 28 November 2018 (UTC)
Thanks for the analysis. I was thinking the same on cases. I was comparing Microsoft's XTREE header (and I have found the Insert & Delete fixups) with FIXUP and Wiki's version.
To your original question: Because (s->color == RED) is a very strong condition which if the outcome is YES has implication to all other colors, this check is placed early (as case 2). The branch to the else-part of this check (cases 3-6) has almost no performance penalty vs. an early check (s->color != RED), so the loop would be tighter almost only optically and not performancewise.
That can't be right. Lets say for 10 levels, there is nothing but black nodes, no red nodes at all among parent, sibling & sibling children as we progress towards the root and after that the first red (11th level). Then for Wiki's code, it will loop 10 times checking for case 1 and case 3 (the right one 10 times). But it also needlessly evaluate for case 2 10 times. I am saying if case 3 is not met, the loop is quit and then it is right to do case 2, 4, 5 & 6 in that order, working out which case applies for various nodes being red. If sibling is red, parent and sibling children must be black. That has to be more efficient. Stephen Howe (talk) 19:14, 28 November 2018 (UTC)
- Which statements exactly do you mean by ckeck for case 1 and case 3 ? --Nomen4Omen (talk) 19:34, 28 November 2018 (UTC)
I mean in pseudo code (X being the current node and BLACK)
LOOP IF X == ROOT RETURN 'CASE 1 IF PARENT(X).COLOUR == RED OR SIBLING(X).COLOUR == RED OR SIBLING(X).LEFT.COLOUR == RED OR SIBLING(X).RIGHT.COLOUR == RED 'CASE 3 BREAK END IF SIBLING(X).COLOUR = RED X = PARENT(X) END LOOP IF SIBLING(X).COLOUR == RED 'CASE 2, NODES ABOVE & BELOW MUST BE BLACK 'TO BE DONE END IF 'CASE 4, 5 & 6 HERE
Stephen Howe (talk) 01:18, 29 November 2018 (UTC)
- What do you think about the following, whereby the statements 'CASE_x contain only the transformation (i.e. rotation, recolouring and/or reassignments) whereas the IF-statements are pulled out together with the GOTOs (RETURN, BREAK, CONTINUE) so that one can see which statement follows.
LOOP IF X == ROOT 'CASE_1 RETURN IF SIBLING(X).COLOUR == RED 'CASE_2 GOTO 'CASE_4 // SIBLING(X).COLOUR == BLACK: IF PARENT(X).COLOUR == BLACK IF SIBLING(X).LEFT.COLOUR == BLACK AND SIBLING(X).RIGHT.COLOUR == BLACK 'CASE_3 CONTINUE // loops GOTO 'CASE_5 'CASE_4: // PARENT(X).COLOUR == RED: // SIBLING(X).COLOUR == BLACK: IF SIBLING(X).LEFT.COLOUR == BLACK AND SIBLING(X).RIGHT.COLOUR == BLACK 'CASE_4 RETURN 'CASE_5: // SIBLING(X).COLOUR == BLACK: IF SIBLING(X).LEFT.COLOUR == RED 'CASE_5 // fall thru 'CASE_6 RETURN END LOOP
- The assumption is that the transformations (especially recolouring and/or reassignments) are not much more expensive than the IFs. --Nomen4Omen (talk) 11:04, 29 November 2018 (UTC)
LEAF...anybody?
[edit]Did anybody notice that as far as I can tell User:David_of_Earth introduced a LEAF constant that is not defined anywhere? I suppose it is meant to be NULL but I am mildly struggling to follow the presented algorithm... --Sven Pauli (talk) 18:40, 3 January 2019 (UTC)
- Yes, just #define it as NULL and all appears to work fine. I have taken the liberty of adding the basic types as used in the code, without which it is more difficult to follow. I also updated the info on libavl. The link to Eternally Confuzzled appears to be down, but I left it in in case that is just a temporary thing. If it is still down in a couple of month's time, perhaps remove it then. 82.13.91.100 (talk) 09:02, 27 June 2019 (UTC)
Changes as of Feb 16th 21
[edit]The proposed leaves are no individuals, nowhere in the article. They all have the same value, either NULL or the address of the ONE sentinel node. (Maybe it is helpful to call this type LEAF.) They do not contain a key or data. Nevertheless, they are considered a black unity for easier counting of the black nodes in the path.
The difficult case of delete, black node without child, has to replace the latter by a such a LEAF. –Nomen4Omen (talk) 19:03, 16 February 2021 (UTC)
Changes as of 2 Mar 2021
[edit]Mainly the sections #Terminology, #Properties, #Operations, #Proof of bounds have been revised.
- section #Properties
- without rule 3: "root is black" (because this disturbs recursions)
- then rules number 4 and 5 become 3 and 4
- NIL leaves are never individuals, never "null leaves as actual node objects".
- section #Operations
- Both, Insertion and Deletion, now programmed iteratively. Advantages:
- more precise visibility of the logic, the conditions of the cases
- no need to observe tail-recursivity
- rotations with root involvement easier
- if-s with
goto
separate the iteration from solution which improves structure - (performance)
- Both, Insertion and Deletion, cases slightly renumbered. Case 1 = iteration, Case 2 loop break, Insert Case 3 simple exit from loop.
Rotations are commutative with color changes, but consistently placed behind.
- diagrams:
- 2 to 4 phases; from top to bottom
- 2 or 3 if complete
- 3rd or 4th if new assignment of current node for further processing
- section #Insertion:
- new simple insert case 6
- section #Removal turned into 2 sections
- simple cases (never loop)
- case black leaf without child is more complex and possibly loops
- Finally, section #Proof of bounds
- The section "Proof of asymptotic bounds" has been rewritten, mainly because the old version was introducing slightly deviating definitions of height and black height.
The new version uses the definitions of #Properties. - It additionally gives an exact formula of the number of nodes of minimal RB trees.
- The section "Proof of asymptotic bounds" has been rewritten, mainly because the old version was introducing slightly deviating definitions of height and black height.
- stylistic: fewer "Note that ..."
- fewer "we"
- fewer "in this case"
- no "this test is trivial due to ..."
Conflicting History behind the name of Red-black trees
[edit]The article suggests two possible sources for the name of Red-black trees. But this video by Prof.Tim Roughgarden who was a colleague of Prof. Leonidas J. Guibas, suggests both reasons might be wrong. In this video from coursera, at the 5:08 mark, you can hear a second hand account from Prof. Tim say that, the name came about because when the inventors were writing the article for a journal which was going to use some new technology that allowed them to print in limited colors (red and black). So they prematurely named their data structure as red-black trees, but they ended up not being able to use the new printing method anyway.
This above story is not corroborated by other accounts found on stackexchange, one of which is from Prof. Leonidas himself which says they did it because they had black and red pens! Prof Sedgedwick seems to say it's because of the laser printer at Xerox PARC. Hazkaz (talk) 12:42, 11 June 2021 (UTC)
Reversal of Red-black tree edit on June 29
[edit]@WarmCabinet: This talk here is the right place for discussing the article. So I reverted your edit of my talk page and placed the post here:
If I just remove that "fall through to loop" comment, would everything be fine? These are all fairly independent code snippets and I don't see a reason that particular return statement needs to come before the loop. Actually, it looks like the parent == NULL case is in that first snippet, commented "// There is no parent"
- I do not at all understand your edit:
- You have twice entered the lines:
// I_case1:
\CRLFT->root = N;
\CRLFreturn;
- I am unable to see how the loop is reached after your second
return;
Of course, this way there is no// fall through to the loop
. - The first return has the condition
if (P == NULL)
, but the 2nd and 3rd are without a condition, so even the labelI_case_2:
cannot be reached.
- Pls stop editing things you do not understand. WP is not a institution for learning some computer programming.
- −Nomen4Omen (talk) 20:10, 8 July 2021 (UTC)
- Sorry for starting this discussion in the wrong place.
- I understand Red-black trees quite well. I find the insert cases to be confusing and disorganized and I'm trying to make them more intuitive. I don't think this section is necessarily supposed to be a valid C program with bits of Wikepedia interjected, but we can make it that way if you want.
- In the version you keep reverting to, the I_case_3 bit is already a duplicated snippet from that first bit. This is no different if you decide to label it as I_case_1.
if (P == NULL) { // There is no parent T->root = N; // N is the new root of the tree T. return; // insertion complete } P->child[dir] = N; // insert N as dir-child of P // fall through to the loop
// I_case_3 (P == NULL): return; // insertion complete
- How does this strike you?
There is an extremely big difference between I_case_1 and I_case_3, namely that in I_case_3 the root is already set and in I_case_1 it has to be set. How can you miss that? ―Nomen4Omen (talk) 10:02, 18 July 2021 (UTC)
void RBinsert1( RBtree* T, // -> red–black tree struct RBnode* P, // -> parent node of N (may be NULL) struct RBnode* N, // -> node to be inserted byte dir) // side of P on which to insert N (∈ { LEFT, RIGHT }) { struct RBnode* G; // -> parent node of P struct RBnode* U; // -> uncle of N N->color = RED; N->left = NIL; N->right = NIL; N->parent = P; // Fall through to I_case_1
- Insert case 1: The current node N is the root of the tree...
if (P == NULL) { // There is no parent // I_case1: T->root = N; // N is the new root of the tree T. return; // insertion complete } P->child[dir] = N; // insert N as dir-child of P // Fall through to I_case_2
- Insert case 2: The parent P is red and the root...
I_case_2: // P is the root and red: P->color = BLACK; return; // insertion complete } // fall through to the loop
- Ok, I think I see the source of my confusuion. I_case_3 is not, in fact, a duplicate of that first snippet, but the line after the grandparent iterating loop. I think that's a bit confusing, because the root node case is actually handled in that first snippet.
- I suggest we make that insert method into one code block and let the explanations be separate. Here's what that might look like:
void RBinsert1( RBtree* T, // -> red–black tree struct RBnode* P, // -> parent node of N (may be NULL) struct RBnode* N, // -> node to be inserted byte dir) // side of P on which to insert N (∈ { LEFT, RIGHT }) { struct RBnode* G; // -> parent node of P struct RBnode* U; // -> uncle of N N->color = RED; N->left = NIL; N->right = NIL; N->parent = P; if (P == NULL) { // There is no parent // I_case_3 (P == NULL): T->root = N; // N is the new root of the tree T. return; // insertion complete } P->child[dir] = N; // insert N as dir-child of P do { if (P->color == BLACK) { // I_case_1 (P black): return; // insertion complete } // From now on P is red. if ((G = GetParent(P)) == NULL) goto I_case_6; // P red and root // P red and G!=NULL. dir = childDir(P); // the side of parent G on which node P is located U = G->child[1-dir]; // uncle if (U == NIL || U->color == BLACK) // considered black goto I_case_45; // P red && U black // I_case_2 (P+U red): P->color = BLACK; U->color = BLACK; G->color = RED; N = G; // new current node // iterate 1 black level (2 tree levels) higher } while ((P = N->parent) != NULL); // end of (do while)-loop return; // insertion complete I_case_45: // P red && U black: if (N == P->child[1-dir]) { // I_case_4 (P red && U black && N inner grandchild of G): RotateDir(P,dir); // P is never the root N = P; // new current node P = G->child[dir]; // new parent of N // fall through to I_case_5 } // I_case_5 (P red && U black && N outer grandchild of G): RotateDirRoot(T,G,1-dir); // G may be the root P->color = BLACK; G->color = RED; return; // insertion complete I_case_6: // P is the root and red: P->color = BLACK; return; // insertion complete }
- Insert case 1: The current node’s parent P is black...
- Insert case 2: If both the parent P and the uncle U are red...
As far as I can see you have been able to absolutely cleanly separate the program text from the explaining text. Why do you think that this is difficult for other people? This distinction is available also to blind people when the typography is verbalized.
A major reason for this intertweening («with bits of Wikepedia interjected» and «I suggest we make that insert method into one code block and let the explanations be separate») are the diagrams which are explaining text as well and which have to be placed very close to the cases («code snippets») so that the reader can check the code against the diagram.
Why do you think that your trailing text:
- Insert case 1: The current node’s parent P is black...
- Insert case 2: If both the parent P and the uncle U are red...
is of greater help to somebody than the comments on the code lines? ―Nomen4Omen (talk) 10:02, 18 July 2021 (UTC)
Changes as of 2021-07-24
[edit]- Introduction of a case synopsis for insert and delete with each column explained
- Cases again renumbered
- More consistent wording
- Pull-out of first iteration clearer
- Case defining code at the top of the section
―Nomen4Omen (talk) 15:02, 24 July 2021 (UTC)
Changes as of 16:53, 29 December 2021
[edit]@Mebeim: Your proposal may be more easily understandable. But it has the big disadvantage that the RB software has to know what the user data are, in other words: it is NOT generic. If the RB software touches and changes ONLY the RB pointers and RB data it can deal with any kind of user layout (which potentially can be extremely complicated). This has been the intention of the version before your change and would not be achievable with your version. –Nomen4Omen (talk) 18:36, 29 December 2021 (UTC)
- @Nomen4Omen: That's fair enough. However, the version prior to my edit seems misleading: it says "all red–black tree pointers of this node and N are exchanged". It is not enough to simply exchange left/right/parent pointers of N and the replacement node, you would also have to update N's parent child pointer and the parent pointers of both nodes' children, and you would also need to exchange the colors of the two nodes (after all you are basically changing everything except the node key and user data). Of course, simply swapping keys and pointers to user data is much simpler than all of this dance, but as you say it would require the implementation to know that user data exists. You can revert my edit if you wish, but I'd suggest rephrasing the paragraph regardless. – Mebeim (talk) 19:24, 29 December 2021 (UTC)
Inventor & year of invention
[edit]The info box names R. Bayer&1972. Paragraph History associates both with 2–4 trees, and names L.J. Guibas & R. Sedgewick for publishing red–black trees in 1978. 178.6.237.24 (talk) 13:04, 28 July 2022 (UTC)
- As I interpret it: The scientific community agrees that Bayer did not name his tree RB-tree, nevertheless is considered its inventor because his invention and analysis is extremely close to RB-trees. In 1978 Guibas and Sedgewick rebaptized Bayer's symmetric binary B-tree to RB-tree. −Nomen4Omen (talk) 14:42, 28 July 2022 (UTC)
Removal - simple case 3 not clear
[edit]In the "Removal -> Simple cases" section, the second and third cases state (the case i'm wondering about is the third one, but it has a reference to the second case, so adding it as well):
(2nd case) If N has exactly one non-NIL child, it must be a red child, because if it were black, its NIL descendants would sit at a different black depth than N’s NIL child, violating requirement 4.
(3rd case) If N is a red node, it cannot have exactly one non-NIL child, because this would have to be black by requirement 3. Furthermore, it cannot have exactly one black child as argued just above. As a consequence, the red node N is without any child and can simply be removed.
Regarding the 3rd case: it is clear that a removed red node N cannot have one non-NIL black child. But the 3rd case also states that it is safe to assume that N has no children. However, it is not explained why it cannot have two non-NIL black children. Maybe i'm dumb, but this is not clear to me.
Imho the 3rd case needs an explanation on why a red N node cannot have two black children. Krepst (talk) 12:13, 24 May 2023 (UTC)
- You're absolutely right: a red node can have two black children. But you must admit that
- "two black children" ≠ "exactly one non-NIL child"
- –Nomen4Omen (talk) 14:13, 24 May 2023 (UTC)
- Yes, these are not the same, of course. But the reason why i wrote this topic is that from the way this 3rd case is written - it seems that it says: "If N is red - it cannot have one non-NIL child, as a consequence it has no children".
- Which would be ok imho, if the case of "N having two children" would be skipped, because it is already covered by another case. But it is not.
- So i'm genuinely puzzled. I feel like there is either something missing in the list of cases (or their descriptions), or the descriptions (at least the 3rd case) should be clarified. Krepst (talk) 08:27, 25 May 2023 (UTC)
- After looking at this again, i'm guessing that case 3 is in fact meant as a sub-case of case 2 (i.e. covering N having a single child). The 3rd case simply stating that it is not possible for a red N to have one child.
- And then the case of N having two children is covered by the last simple case "If N has two non-NIL children, [...]". I somehow missed this.
- However, imho the 3rd case could still use a clarification. Because i read it as "A red N cannot have one child, as a consequence it has no children", instead of "A red N cannot have one child". I think other people may read it the same way. Imho either that "as a consequence" part should be removed or it should be made more clear that cases 2 and 3 are related (that they both discuss only the cases of N having one child). Krepst (talk) 08:53, 25 May 2023 (UTC)
- Pls, IMHO try to place your clarification into the article. –Nomen4Omen (talk) 14:35, 25 May 2023 (UTC)
- "two (black and non-NIL) children" ≠ "exactly one non-NIL child" ≠ "only NIL children" ≠ "two (black and non-NIL) children"
- Above I added »"only NIL children"« (= "no child at all").
As it appears, after having been »genuinely puzzled«, you are able to read it »in a mathematical way«, so there is no need for a further clarification. –Nomen4Omen (talk) 13:28, 29 May 2023 (UTC)- aaa
- bbb
- cccc Krepst (talk) 16:51, 1 June 2023 (UTC)
- whoops.. sorry, i was testing this editor's formatting and hit ctrl+enter, which saved the comment. I don't see a way to delete it. Apologies. I will write the actual comment in a bit. Krepst (talk) 16:52, 1 June 2023 (UTC)
- How about changing:
As a consequence, the red node N is without any child and can simply be removed.
- into
As a consequence, the red node N either has no non-NIL children (and can simply be removed) or has two non-NIL children (covered below, in the last simple case).
- ?
- Alternatively, a bigger change could be to restructure the "Simple cases" paragraph so the cases are listed in a clear and consistent order, like this for example (apologies, i'm not very familiar with wikipedia editor's formatting):
* If N has only NIL children (i.e. no children):
* * If N is black:
* * * If N is the root node: it is replaced by a NIL node, after which the tree is empty and in RB-shape.
* * * If N is not the root node: see complex case below.
* * If N is red: N can simply be removed.
* If N has one non-NIL child: the non-NIL child must be a red child, because if it was black - its NIL descendants would sit at a different black depth than N’s NIL child, violating requirement 4.
* * If N is black: it is simply replaced with its red child, after painting the child black.
* * It is not possible for N to be red. As explained above - the single non-NIL child must be a red child and a red N cannot have a red child, as this would violate requirement 3.
* If N has two non-NIL children: an additional navigation to the minimum element in its right subtree [...] (same as the current last simple case)
- What do you think?
- Because the current structure (at least for me, when reading) added to the confusion, as it mixes cases based on number of non-NIL children with cases based on N’s color, with no clear order. As a consequence - it is harder to know if you've built a complete mental model of all cases in your head, after reading.
- Also (though this is probably not very important), in the current description the case of "a black N having a single red child" is mentioned in two separate cases (one case based on number of non-NIL children, the other based on N’s color):
If N has exactly one non-NIL child, it must be a red child [...]
[...] If N has a single red child, it is simply replaced with this child after painting the latter black.
- Anyway, these are just suggestions. Maybe it is all fine the way it is. Krepst (talk) 17:01, 1 June 2023 (UTC)
- "two (black and non-NIL) children" ≠ "exactly one non-NIL child" ≠ "only NIL children" ≠ "two (black and non-NIL) children"
I made some minor changes, mainly the introduction of Conclusion 5. This could to some extent alleviate a reading problem. –Nomen4Omen (talk) 10:13, 3 June 2023 (UTC)
i.e. versus i. .e.
[edit]This page uses i. .e.
, i .e.
and e. g.
. For the first two of i. .e.
and i. e.
this seems to be inconsistent. Checking the style guides acronym section, I was able to find that the specific first two examples should be ie.
and the last should be e.g.
.
Since these currently seem to be inconsistent, I'm going to make a preliminary edit to change them to what is shown in the manual of style's table. (This post is mostly to say my reasoning in case I'm wrong) KindlingFlame (talk) 06:51, 6 March 2024 (UTC)
History section confusion
[edit]"Sedgewick originally allowed nodes whose two children are red, making his trees more like 2–3–4 trees, but later this restriction was added"
What does "this restriction" refer to? Gwideman (talk) 11:44, 30 March 2024 (UTC)
- The 'restriction' added later is that 'a black node may not have two red children. This means that only 2-nodes and 3-nodes are allowed, making them isomorphic to 2-3 trees instead. Left-leaning RB-trees have a 2-3 version and a 2-3-4 version. Ordinary RB-trees also have a 2-3 version and a 2-3-4 version. IntGrah (talk) 19:48, 26 April 2024 (UTC)
"B-trees of order 4" versus "2–3–4 tree"
[edit]There is a section titled "B-trees of order 4". Would it not be clearer to just say "2–3–4 tree" instead, given that most textbooks introduce 2–3–4 trees first? I.e.
- Red–black trees are similar in structure to 2–3–4 trees (B-trees of order 4), ...
and proceed in the rest of the section to refer to them as 2–3–4 trees IntGrah (talk) 19:53, 26 April 2024 (UTC)
C not C++
[edit]The "Operations" section says that the examples are C++.
Are they. Looks like like plain C to me. Didn't want to change it, in case there is some C++ usage in there which I didn't spot.
It's certainly 90%+ C and a proper C++ implementation would look very different.
Good to change to "C" Oschonrock (talk) 16:33, 17 July 2024 (UTC)
:struct RBnode { : RBnode* parent; // C would require 'struct RBnode* parent', : RBnode* child[2]; // but C++ allows this kind of declaration : enum color_t color; : int key; :}; :
- Technically no, but it could be changed easily. The style isn't consistent in the article. I suggest changing it all to the C++ style, or removing it entirely. (edited 10:40, 21 July 2024 (UTC)) IntGrah (talk) 17:07, 17 July 2024 (UTC)
Re-writing the code
[edit]The code in this article is not very well-written. In my opinion it should be pseudocode, since that's what most other CS articles seem to do. If not it should still be cleaned up a bit. Some glaring issues:
- Although the code is technically C++, it doesn’t really use any C++ features, the style is much more similar to C
- RBNode’s children being an array of length 2 with a bunch of associated macros is needlessly confusing. The children should be `left` and `right`. I understand the purpose of the array is to combine the left/right rotation functions into a single function, but why not just have two separate functions if it makes things clearer?
- The type `byte` is used arbitrarily in spots that don't make sense
- `The `goto` statements are confusing and it would be easier to just bunch all of the code together, and then have the sections afterwards explain each part. Or at least something that doesn't use `goto`.
The pseudocode in the Introduction To Algorithms textbook (linked below, start at page 331) is very concise but I'm unsure of how to carry it over without it being considered plagiarism.
[17]https://dl.ebooksworld.ir/books/Introduction.to.Algorithms.4th.Leiserson.Stein.Rivest.Cormen.MIT.Press.9780262046305.EBooksWorld.ir.pdf#page=353 Begilbert238 (talk) 23:56, 20 July 2024 (UTC)
- I agree that the code is quite repugnant. GOTO considered harmful, inconsiderate use of macros, even the fact it is written in C …
- My proposal is to just straight up remove it. In my opinion, the pseudocode is useless in the current state of the article; most red–black tree implementations take several hundreds of lines of code, and spreading this across the article makes the operations seem a lot more complicated than they really are. The cryptic tables used in the explanation also have to go (The section Notes to the sample code and diagrams of insertion and removal > Notes to the insert diagrams).
- IntGrah (talk) 10:36, 21 July 2024 (UTC)
- C-Class level-5 vital articles
- Wikipedia level-5 vital articles in Mathematics
- C-Class vital articles in Mathematics
- C-Class Computing articles
- Low-importance Computing articles
- C-Class software articles
- Low-importance software articles
- C-Class software articles of Low-importance
- All Software articles
- All Computing articles
- C-Class Computer science articles
- High-importance Computer science articles
- WikiProject Computer science articles
- Old requests for peer review