Test Space of the LLM Leaderboard

T | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K | Type | Architecture | Precision | Merged | Hub License | #Params (B) | Hub ❤️ | Model sha | Model
🔶 | 81.22 | 79.78 | 91.15 | 77.95 | 75.18 | 87.85 | 76.12 | fine-tuned on domain-specific datasets | MixtralForCausalLM | bfloat16 | false | cc-by-nc-nd-4.0 | 125.35 | 3536 | fda5cf998a0f2d89b53b5fa490793e3e50bb8239 | eren23/ogno-monarch-jaskier-merge-7b-OH-PREF-DPO-v4-test |
🔶 | 81.22 | 79.78 | 91.15 | 77.95 | 74.5 | 87.85 | 76.12 | fine-tuned on domain-specific datasets | LlamaForCausalLM | float16 | false | apache-2.0 | 72.29 | 29 | fda5cf998a0f2d89b53b5fa490793e3e50bb8239 | davidkim205/Rhea-72b-v0.5 | |
💬 | 81 | 78.67 | 89.77 | 78.22 | 75.18 | 87.53 | 76.65 | chat models (RLHF, DPO, IFT, ...) | LlamaForCausalLM | bfloat16 | false | other | 72.29 | 2 | ea2b4ff8e5acd7a48993f56b2d7b99e049eb6939 | MTSAIR/MultiVerse_70B | |
🔶 | 80.98 | 78.58 | 89.74 | 78.27 | 75.09 | 87.37 | 76.8 | fine-tuned on domain-specific datasets | LlamaForCausalLM | float16 | false | other | 72.29 | 2 | ea2b4ff8e5acd7a48993f56b2d7b99e049eb6939 | MTSAIR/MultiVerse_70B | |
🔶 | 80.81 | 76.79 | 89.02 | 77.2 | 79.02 | 84.06 | 78.77 | fine-tuned on domain-specific datasets | ? | bfloat16 | false | apache-2.0 | 72 | 10 | 40d451f32b1a6c9ad694b32ba8ed4822c27f3022 | SF-Foundation/Ein-72B-v0.11 | |
🔶 | 80.79 | 76.19 | 89.44 | 77.07 | 77.82 | 84.93 | 79.3 | fine-tuned on domain-specific datasets | ? | bfloat16 | false | apache-2.0 | 72 | 4 | 1f302e0e15f3d3711778cd61686eb9b28b0c72ae | SF-Foundation/Ein-72B-v0.13 | |
🔶 | 80.72 | 76.19 | 89.46 | 77.17 | 77.78 | 84.45 | 79.23 | fine-tuned on domain-specific datasets | ? | bfloat16 | false | apache-2.0 | 72 | 3 | 84d38e29fec0dc9c274237968fdafe9396702f9b | SF-Foundation/Ein-72B-v0.12 | |
🔶 | 80.48 | 76.02 | 89.27 | 77.15 | 76.67 | 85.08 | 78.7 | fine-tuned on domain-specific datasets | LlamaForCausalLM | bfloat16 | false | other | 72.29 | 430 | 54a8c35600ec5cb30ca2129247854ece23e57f57 | abacusai/Smaug-72B-v0.1 | |
🔶 | 79.3 | 73.89 | 88.16 | 77.4 | 72.69 | 86.03 | 77.63 | fine-tuned on domain-specific datasets | LlamaForCausalLM | float16 | false | other | 72.29 | 19 | 4df251a558c53b6b6a4c459045b161951cfc3c4e | ibivibiv/alpaca-dragon-72b-v1 | |
💬 | 78.55 | 70.82 | 85.96 | 77.13 | 74.71 | 84.06 | 78.62 | chat models (RLHF, DPO, IFT, ...) | LlamaForCausalLM | bfloat16 | false | mit | 72.29 | 67 | c64edea08b27be1e7e2ae6a95bcdd74849cb887e | moreh/MoMo-72B-lora-1.8.7-DPO | |
🔶 | 77.91 | 74.06 | 86.74 | 76.65 | 72.24 | 83.35 | 74.45 | fine-tuned on domain-specific datasets | MixtralForCausalLM | bfloat16 | false | other | 60.81 | 9 | cd29cfa124072c96ba8601230bead65d76e04dcb | cloudyu/TomGrc_FusionNet_34Bx2_MoE_v0.1_DPO_f16 | |
🔶 | 77.74 | 77.47 | 91.88 | 68.1 | 79.17 | 87.45 | 62.4 | fine-tuned on domain-specific datasets | LlamaForCausalLM | bfloat16 | false | apache-2.0 | 21.42 | 19 | ba3403eaafc6d1f6e3a73245314ee96025c08d96 | saltlux/luxia-21.4b-alignment-v1.0 | |
🔶 | 77.52 | 74.06 | 86.67 | 76.69 | 71.32 | 83.43 | 72.93 | fine-tuned on domain-specific datasets | MixtralForCausalLM | float16 | false | other | 60.81 | 2 | e8e558b5fd4ac9da839577b1295d10ca75fc2663 | cloudyu/TomGrc_FusionNet_34Bx2_MoE_v0.1_full_linear_DPO | |
🔶 | 77.5 | 73.81 | 89.22 | 64.92 | 78.57 | 87.37 | 71.11 | fine-tuned on domain-specific datasets | MixtralForCausalLM | bfloat16 | false | apache-2.0 | 12.88 | 27 | 2d8cff968dbfb31e0c1ccc42053ccc4d2698a390 | zhengr/MixTAO-7Bx2-MoE-v8.1 | |
💬 | 77.44 | 74.91 | 89.3 | 64.67 | 78.02 | 88.24 | 69.52 | chat models (RLHF, DPO, IFT, ...) | MixtralForCausalLM | bfloat16 | false | mit | 12.88 | 46 | 915651208ea9f40c65a60d1f971a09f9461ee691 | yunconglong/Truthful_DPO_TomGrc_FusionNet_7Bx2_MoE_13B | |
🔶 | 77.43 | 73.89 | 89.07 | 75.44 | 71.75 | 86.35 | 68.08 | fine-tuned on domain-specific datasets | LlamaForCausalLM | float16 | false | cc-by-nc-4.0 | 68.98 | 2 | 7dd3ddea090bd63f3143e70d7d6237cc40c046e4 | JaeyeonKang/CCK_Asura_v1 | |
🔶 | 77.41 | 74.57 | 86.74 | 76.68 | 70.17 | 83.82 | 72.48 | fine-tuned on domain-specific datasets | LlamaForCausalLM | bfloat16 | false | apache-2.0 | 34.39 | 14 | e1cdc5b02c662c5f29a50d0b22c64a8902ca856b | fblgit/UNA-SimpleSmaug-34b-v1beta | |
🔶 | 77.38 | 73.72 | 86.46 | 76.72 | 71.01 | 83.35 | 73.01 | fine-tuned on domain-specific datasets | MixtralForCausalLM | bfloat16 | false | mit | 60.81 | 7 | 6c7ec6d2ca1c0d126a26963fedc9bbdf5210b0d1 | TomGrc/FusionNet_34Bx2_MoE_v0.1 | |
🔶 | 77.3 | 71.25 | 85.53 | 76.63 | 71.99 | 81.45 | 76.95 | fine-tuned on domain-specific datasets | LlamaForCausalLM | float16 | false | other | 72.29 | 12 | dc092ecc5d5a424678eac445a9f4443069776691 | migtissera/Tess-72B-v1.5b | |
💬 | 77.29 | 70.14 | 86.03 | 77.4 | 69 | 84.37 | 76.8 | chat models (RLHF, DPO, IFT, ...) | LlamaForCausalLM | bfloat16 | false | mit | 72.29 | 32 | 76389d5d825c3743cc70bc75b902bbfdad11beba | moreh/MoMo-72B-lora-1.8.6-DPO | |
🔶 | 77.29 | 74.23 | 86.76 | 76.66 | 70.22 | 83.66 | 72.18 | fine-tuned on domain-specific datasets | LlamaForCausalLM | bfloat16 | false | other | 34.39 | 50 | 7b74a95019f01b59630cbd6469814c752d0e59e5 | abacusai/Smaug-34B-v0.1 | |
🔶 | 77.28 | 72.87 | 86.52 | 76.96 | 73.28 | 83.19 | 70.89 | fine-tuned on domain-specific datasets | MixtralForCausalLM | bfloat16 | false | mit | 60.81 | 4 | 097b951c2524e6113252fcd98ba5830c85dc450f | cloudyu/Truthful_DPO_TomGrc_FusionNet_34Bx2_MoE | |
🔶 | 77.22 | 73.63 | 89.04 | 75.99 | 70.19 | 85.48 | 68.99 | fine-tuned on domain-specific datasets | MixtralForCausalLM | float16 | false | apache-2.0 | 125.35 | 3 | 95b3b4e432d98b804d64cfe42dd9fa6b67198e5b | ibivibiv/orthorus-125b-v2 | |
🔶 | 77.19 | 74.49 | 86.76 | 76.55 | 70.21 | 83.27 | 71.87 | fine-tuned on domain-specific datasets | LlamaForCausalLM | float16 | false | other | 34.39 | 11 | 3880710724abcaffbdf8fa4031e1d02066fbfe9d | ConvexAI/Luminex-34B-v0.2 | |
🔶 | 77.1 | 74.32 | 89.5 | 64.47 | 78.66 | 88.08 | 67.55 | fine-tuned on domain-specific datasets | MixtralForCausalLM | bfloat16 | false | other | 12.88 | 9 | 74c6e4fbd272c9d897e8c93ee7de8a234f61900f | yunconglong/DARE_TIES_13B | |
🔶 | 77.08 | 74.66 | 89.51 | 64.53 | 78.63 | 88.08 | 67.1 | fine-tuned on domain-specific datasets | MixtralForCausalLM | bfloat16 | false | other | 12.88 | 0 | 96c62ad90f2b82016a1cdbfe96cfa5c4bb278e21 | yunconglong/13B_MATH_DPO | |
🔶 | 77.07 | 72.95 | 86.22 | 77.05 | 71.31 | 83.98 | 70.89 | fine-tuned on domain-specific datasets | MixtralForCausalLM | bfloat16 | false | mit | 60.81 | 8 | c5575550053c84a401baf56174cb2e5d5bd9e79a | TomGrc/FusionNet_34Bx2_MoE | |
🔶 | 77.06 | 73.63 | 86.59 | 76.55 | 69.68 | 83.43 | 72.48 | fine-tuned on domain-specific datasets | LlamaForCausalLM | float16 | false | other | 34.39 | 7 | d3efc551679d7ec00da14722d44151c948a48d25 | ConvexAI/Luminex-34B-v0.1 | |
🔶 | 77.05 | 74.32 | 89.39 | 64.48 | 78.47 | 88 | 67.63 | fine-tuned on domain-specific datasets | MixtralForCausalLM | bfloat16 | false | other | 12.88 | 4 | d8d6a47f877fee3e638a158c2bd637c0013ed4e4 | yunconglong/MoE_13B_DPO | |
🔶 | 77.03 | 72.95 | 88.86 | 75.41 | 69.1 | 85.08 | 70.81 | fine-tuned on domain-specific datasets | LlamaForCausalLM | float16 | false | cc-by-nc-4.0 | 68.98 | 0 | 06fd0e293aeb3b2722e3910daefcd185fad4558c | JaeyeonKang/CCK_Asura_v3.0 | |
🔶 | 76.95 | 73.21 | 86.11 | 75.44 | 72.78 | 82.95 | 71.19 | fine-tuned on domain-specific datasets | MixtralForCausalLM | 4bit | false | other | 31.8 | 1 | 331bb6bdba4140bbf0031bd37076f2c8a76d7dbb | cloudyu/4bit_quant_TomGrc_FusionNet_34Bx2_MoE_v0.1_DPO | |
🔶 | 76.74 | 73.38 | 89.15 | 64.32 | 78.24 | 84.93 | 70.43 | fine-tuned on domain-specific datasets | MistralForCausalLM | float16 | false | apache-2.0 | 7.24 | 71 | bbaef291e93a7f6c9f8cb76a4dbd8c3c054d3f3c | yam-peleg/Experiment26-7B | |
🔶 | 76.74 | 72.87 | 89.2 | 64.4 | 77.92 | 84.77 | 71.27 | fine-tuned on domain-specific datasets | MistralForCausalLM | bfloat16 | false | apache-2.0 | 7.24 | 19 | a4ca706d1bbc263b95e223a80ad68b0f125840b3 | MTSAIR/multi_verse_model | |
🔶 | 76.7 | 72.95 | 89.23 | 64.42 | 78.41 | 84.93 | 70.28 | fine-tuned on domain-specific datasets | MistralForCausalLM | float16 | false | apache-2.0 | 7.24 | 2 | f136ec75c9fb7c86c071291ddf418089c8f43da0 | chihoonlee10/T3Q-Mistral-Orca-Math-DPO | |
🔶 | 76.67 | 73.12 | 89.12 | 64.3 | 78.04 | 85 | 70.43 | fine-tuned on domain-specific datasets | MistralForCausalLM | bfloat16 | false | apache-2.0 | 7.24 | 71 | bbaef291e93a7f6c9f8cb76a4dbd8c3c054d3f3c | yam-peleg/Experiment26-7B | |
🔶 | 76.65 | 73.29 | 89.11 | 64.35 | 77.86 | 84.93 | 70.36 | fine-tuned on domain-specific datasets | MistralForCausalLM | bfloat16 | false | mit | 7.24 | 0 | cd8bfad664fb7f9b017388d974dd3265f8c40396 | rwitz/experiment26-truthy-iter-0 | |
🔶 | 76.62 | 73.38 | 89.13 | 64.28 | 77.98 | 84.93 | 70.05 | fine-tuned on domain-specific datasets | MistralForCausalLM | float16 | false | apache-2.0 | 7.24 | 0 | ff261dadc107d0ce67b836a052d7131f9d9e4260 | yam-peleg/Experiment30-7B | |
🔶 | 76.62 | 73.04 | 89.04 | 64.44 | 78.49 | 85.4 | 69.29 | fine-tuned on domain-specific datasets | MistralForCausalLM | float16 | false | apache-2.0 | 7.24 | 0 | 5efde29924cf7158e4cbd642311a92a14e85597c | yam-peleg/Experiment28-7B | |
🔶 | 76.61 | 73.12 | 89.19 | 64.36 | 78 | 84.93 | 70.05 | fine-tuned on domain-specific datasets | MistralForCausalLM | float16 | false | apache-2.0 | 7.24 | 7 | a283f4e8169009d683b329ae1a96c9a77ce5936a | MaziyarPanahi/Calme-7B-Instruct-v0.2 | |
🔶 | 76.6 | 73.21 | 89.13 | 64.34 | 77.66 | 84.85 | 70.43 | fine-tuned on domain-specific datasets | MistralForCausalLM | bfloat16 | false | mit | 7.24 | 0 | cb04e33c4ff559b31767765100cd50c24ec2531c | rwitz/experiment26-truthy-iter-1 | |
🔶 | 76.6 | 73.38 | 89.11 | 64.36 | 77.3 | 85 | 70.43 | fine-tuned on domain-specific datasets | MistralForCausalLM | bfloat16 | false | mit | 7.24 | 0 | 1dc4edde961960f7263dc3bdd37ca9e9f7e451ea | rwitz/experiment26-truthy-iter-2 | |
🔶 | 76.59 | 72.95 | 89.15 | 64.44 | 77.96 | 85 | 70.05 | fine-tuned on domain-specific datasets | MistralForCausalLM | float16 | false | apache-2.0 | 7.24 | 2 | 03405145ca06170f1b2e0acc838f573f0e090df8 | chlee10/T3Q-Merge-Mistral7B | |
🔶 | 76.59 | 74.06 | 88.96 | 64.45 | 77.67 | 85 | 69.37 | fine-tuned on domain-specific datasets | MistralForCausalLM | float16 | false | openrail | 7.24 | 1 | 0da1865ae1ce682d4002dd9935d20520e79ed520 | LeroyDyer/Mixtral_AI_Cyber_3.m1 | |
🔶 | 76.58 | 73.55 | 89.19 | 64.36 | 78.31 | 85 | 69.07 | fine-tuned on domain-specific datasets | MistralForCausalLM | bfloat16 | false | apache-2.0 | 7.24 | 1 | a27e0dfaf79af8da32fc4ff6c5eb8be46c9f5a13 | yam-peleg/Experiment31-7B | |
🔶 | 76.57 | 73.55 | 89.14 | 64.29 | 78.43 | 85.16 | 68.84 | fine-tuned on domain-specific datasets | MistralForCausalLM | float16 | false | apache-2.0 | 7.24 | 1 | a27e0dfaf79af8da32fc4ff6c5eb8be46c9f5a13 | yam-peleg/Experiment31-7B | |
🔶 | 76.56 | 73.81 | 89.06 | 64.34 | 78.54 | 85.16 | 68.46 | fine-tuned on domain-specific datasets | MistralForCausalLM | float16 | false | apache-2.0 | 7.24 | 0 | b7f5aa8d4c899c175a1dad40a03b4071df90bd8e | yam-peleg/Experiment24-7B | |
💬 | 76.55 | 74.23 | 89.37 | 64.54 | 74.26 | 87.77 | 69.14 | chat models (RLHF, DPO, IFT, ...) | MixtralForCausalLM | bfloat16 | false | apache-2.0 | 12.88 | 16 | 69b9280ee4d2a20ef5645798621e62dd9777c139 | zhengr/MixTAO-7Bx2-MoE-Instruct-v7.0 | |
🤝 | 76.55 | 73.21 | 89.19 | 64.39 | 76.82 | 85.32 | 70.36 | base merges and moerges | MistralForCausalLM | bfloat16 | false | apache-2.0 | 7.24 | 2 | 4774173a54be9a648e1cf03248af3ae3d51a0434 | bobofrut/ladybird-base-7B-v8 | |
🔶 | 76.53 | 73.12 | 89.06 | 64.49 | 78.72 | 85 | 68.76 | fine-tuned on domain-specific datasets | MistralForCausalLM | float16 | false | apache-2.0 | 7.24 | 1 | 11a51df04f85047e166d63eb64cedc1ec02732a1 | yam-peleg/Experiment29-7B | |
🔶 | 76.53 | 73.46 | 89.09 | 64.4 | 77.76 | 84.85 | 69.6 | fine-tuned on domain-specific datasets | MistralForCausalLM | bfloat16 | false | apache-2.0 | 7.24 | 0 | ff261dadc107d0ce67b836a052d7131f9d9e4260 | yam-peleg/Experiment30-7B | |
🔶 | 76.5 | 72.78 | 89.15 | 64.51 | 78.8 | 84.85 | 68.92 | fine-tuned on domain-specific datasets | MistralForCausalLM | float16 | false | apache-2.0 | 7.24 | 0 | 1e1cd6e84d02a9c1d70c2a2037f485bc2b646391 | CorticalStack/pastiche-crown-clown-7b-dare-dpo | |
🔶 | 76.49 | 72.95 | 89.26 | 64.32 | 78.1 | 85.16 | 69.14 | fine-tuned on domain-specific datasets | MistralForCausalLM | float16 | false | apache-2.0 | 7.24 | 9 | 22a9da7289d20a1d5452f77aa5bc49e97344af52 | MaziyarPanahi/Calme-7B-Instruct-v0.1.1 | |
💬 | 76.49 | 73.04 | 89.25 | 64.4 | 78.17 | 84.85 | 69.22 | chat models (RLHF, DPO, IFT, ...) | MistralForCausalLM | float16 | false | apache-2.0 | 7.24 | 4 | cd343f0846ceb4180297920b2da50d6b28dcb242 | mlabonne/UltraMerge-7B | |
🔶 | 76.48 | 71.25 | 85.24 | 77.28 | 66.74 | 84.29 | 74.07 | fine-tuned on domain-specific datasets | MixtralForCausalLM | bfloat16 | false | mit | 60.81 | 0 | 6ba7b5acb65dd62c28585cba298e0d3671c14f3a | cloudyu/Truthful_DPO_cloudyu_Mixtral_34Bx2_MoE_60B
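The rows above are a raw pipe-delimited export rather than a rendered table. As a minimal sketch, the snippet below shows one way such an export could be loaded into a pandas DataFrame for sorting and filtering; the file name leaderboard.txt and the column names in COLUMNS are assumptions inferred from the Open LLM Leaderboard layout, not part of the original export.

```python
# Minimal sketch (assumptions: the rows above are saved as "leaderboard.txt",
# and the column order matches the header given at the top of this page).
import pandas as pd

COLUMNS = [
    "T", "Average", "ARC", "HellaSwag", "MMLU", "TruthfulQA", "Winogrande",
    "GSM8K", "Type", "Architecture", "Precision", "Merged", "Hub License",
    "#Params (B)", "Hub ❤️", "Model sha", "Model",
]
NUMERIC = ["Average", "ARC", "HellaSwag", "MMLU", "TruthfulQA",
           "Winogrande", "GSM8K", "#Params (B)", "Hub ❤️"]


def parse_rows(path: str) -> pd.DataFrame:
    """Parse pipe-delimited leaderboard rows into a typed DataFrame."""
    records = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            cells = [c.strip() for c in line.split("|")]
            while cells and not cells[-1]:   # drop empty trailing cells
                cells.pop()
            if len(cells) < len(COLUMNS):    # skip the title line and blanks
                continue
            try:
                float(cells[1])              # data rows have a numeric average
            except ValueError:
                continue                     # skips the header row
            records.append(cells[: len(COLUMNS)])
    df = pd.DataFrame(records, columns=COLUMNS)
    df[NUMERIC] = df[NUMERIC].apply(pd.to_numeric, errors="coerce")
    return df


if __name__ == "__main__":
    df = parse_rows("leaderboard.txt")
    # Example: the five highest-scoring models by average.
    print(df.nlargest(5, "Average")[["Model", "Average", "#Params (B)"]])
```

Note that the export can list the same model more than once under different precisions (e.g., yam-peleg/Experiment26-7B appears in both float16 and bfloat16), so any per-model analysis should group on Model and Precision together.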